SiaScoreNet: a siamese neural network-based model integrating prediction scores for HLA-peptide interaction prediction

Abstract

Motivation

Cancer immunotherapy uses the immune system to recognize and eliminate tumor cells by presenting tumor antigens through Human Leukocyte Antigen (HLA) molecules. Accurate prediction of HLA–peptide interactions is essential for personalized immunotherapy development. Allele-specific models achieve high accuracy and handle variable peptide lengths but require separate training for each allele, limiting scalability to rare or unseen HLAs. Pan-specific models generalize across multiple alleles and match or surpass allele-specific methods. Ensemble methods improve prediction by combining outputs from multiple predictors, often via linear combinations, though nonlinear strategies may better capture HLA–peptide complexities.

We propose SiaScoreNet, a three-step predictive pipeline enhancing HLA–peptide interaction prediction. First, ESM, a pretrained transformer-based protein language model, embeds HLA and peptide sequences into fixed-length representations, accommodating varying sequence lengths. Second, we integrate predicted scores from state-of-the-art models into a comprehensive feature vector. Third, a nonlinear ensemble strategy combines features, capturing complex dependencies and boosting performance.
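The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SiaScoreNet implementation: random vectors stand in for ESM per-residue embeddings, mean pooling is assumed as the way to obtain fixed-length representations, the three predictor scores are made-up numbers, and a small untrained one-hidden-layer network stands in for the nonlinear ensemble. All function names (`mean_pool`, `build_features`, `make_mlp`, `mlp_score`) are hypothetical.

```python
import math
import random

def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length vector (assumed pooling)."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]

def build_features(hla_emb, peptide_emb, predictor_scores):
    """Steps 1 and 2: pooled embeddings concatenated with external predictor scores."""
    return mean_pool(hla_emb) + mean_pool(peptide_emb) + list(predictor_scores)

def make_mlp(n_in, n_hidden, seed=0):
    """Random, untrained weights for a one-hidden-layer network (illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2

def mlp_score(features, params):
    """Step 3: nonlinear combination (ReLU hidden layer, sigmoid output)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Placeholder data: random vectors stand in for ESM per-residue embeddings.
DIM = 4
rng = random.Random(42)

def fake_embedding(length):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(length)]

hla = fake_embedding(34)         # hypothetical HLA sequence of 34 residues
peptide_9 = fake_embedding(9)    # 9-mer peptide
peptide_11 = fake_embedding(11)  # 11-mer peptide, same feature size after pooling
scores = [0.81, 0.64, 0.92]      # made-up outputs of three external predictors

features_9 = build_features(hla, peptide_9, scores)
features_11 = build_features(hla, peptide_11, scores)
params = make_mlp(n_in=len(features_9), n_hidden=8)
probability = mlp_score(features_9, params)
print(len(features_9), len(features_11), probability)
```

Both peptides yield feature vectors of the same length (2 × `DIM` plus the number of scores), which is what lets a single downstream network handle variable-length inputs. In practice the weights would be learned from labeled binding data rather than drawn at random.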

Results

Benchmark evaluations show that SiaScoreNet outperforms existing models in accuracy and performs comparably to TransPHLA, BigMHC, and CapHLA. Recent models prioritize recall over precision, which is valuable for identifying potential binders but resource-intensive, since more false positives must be screened. SiaScoreNet offers improved performance and runtime efficiency compared to these models when evaluated on HLA–peptide prediction for human papillomavirus (HPV).

Availability and implementation

The data and source code for the predictions and experiments presented in this study are publicly available in the SiaScoreNet repository on GitHub: https://github.com/CBRC-lab/SiaScoreNet.

Publication
Bioinformatics Advances
Mahsa Sa'adat
Postdoctoral researcher

Ph.D. candidate in Computer Science specializing in Soft Computing and Artificial Intelligence, with a strong focus on bioinformatics, immunoinformatics, and computational drug design. Experienced in teaching, academic leadership, and organizing international conferences. Passionate about employing AI and machine learning to solve complex problems in healthcare and biology.

Fatemeh Zare-Mirakabad
Associate Professor

My research interests include bioinformatics, computational biology and artificial intelligence.

Milad Besharatifard

Ph.D. student in Computer Science specializing in Soft Computing and Artificial Intelligence.