A novel data augmentation approach for influenza A subtype prediction based on HA proteins

Abstract

Influenza, a pervasive viral respiratory illness, remains a significant global health concern. Because the influenza A virus can cause pandemics, timely identification of specific subtypes is needed for effective prevention and control, as highlighted by the World Health Organization. The genetic diversity of the influenza A virus, especially in the hemagglutinin (HA) protein, makes accurate subtype prediction challenging. This study introduces PreIS, a novel pipeline that uses pre-trained protein language models and supervised data augmentation to discern subtle differences in hemagglutinin protein sequences. PreIS makes two key contributions: it leverages pre-trained protein language models for influenza subtype classification, and it uses supervised data augmentation to generate additional training data without extensive annotations. The effectiveness of the pipeline has been rigorously assessed through …

Publication
Computers in Biology and Medicine
Fatemeh Zare
Associate Professor

My research interests include bioinformatics, computational biology and artificial intelligence.

Mahsa Sa'adat

Ph.D. candidate in Computer Science specializing in Soft Computing and Artificial Intelligence, with a strong focus on bioinformatics, immunoinformatics, and computational drug design. Experienced in teaching, academic leadership, and organizing international conferences. Passionate about employing AI and machine learning to solve complex problems in healthcare and biology.