Enhancing Predictive Accuracy of CRISPR-Cas9 On-Target Efficiency Using Deep Learning and Active Learning Optimization for Small Datasets

Abstract

The CRISPR-Cas9 gene editing system has revolutionized genetics, but predicting sgRNA cleavage efficiency remains challenging, particularly when only small datasets are available. We present a deep learning framework optimized for small datasets that integrates active learning, which iteratively prioritizes the most informative data points for labeling. Our model outperforms previous methods on benchmark datasets and captures complex sequence features by incorporating domain-specific properties. Active learning reduces the dataset size required for training, enabling high predictive accuracy even with limited data. This approach offers a scalable and robust way to improve sgRNA design for precise CRISPR-Cas9 gene editing across diverse genomic contexts, and it demonstrates the potential of active learning to improve deep learning performance in data-scarce scenarios.
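The abstract does not specify the network architecture or the criterion used to rank "informative" data points, so the following is only a minimal sketch of a generic pool-based active learning loop under stated assumptions. It assumes one-hot encoding of 20-nt guides, query-by-committee (variance across a bootstrap ensemble) as the informativeness score, and a small linear regressor standing in for the deep model. The oracle `toy_efficiency` is hypothetical: it returns GC fraction as a placeholder for a measured cleavage efficiency and is not a biological model. All function names here are illustrative, not the authors' code.

```python
import random
import statistics

BASES = "ACGT"


def one_hot(seq):
    """Encode a guide sequence as a flat 4*len(seq) vector (A, C, G, T per position)."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def train_linear(X, y, epochs=100, lr=0.02, seed=0):
    """Fit w.x + b by SGD on squared error (ASSUMPTION: stand-in for the deep model)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(model, x):
    """Linear prediction w.x + b for a single feature vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_committee(X, y, n_models=5, seed=0):
    """Train n_models regressors, each on a bootstrap resample of the labelled set."""
    rng = random.Random(seed)
    committee = []
    for k in range(n_models):
        picks = [rng.randrange(len(X)) for _ in range(len(X))]
        committee.append(train_linear([X[i] for i in picks], [y[i] for i in picks], seed=seed + k))
    return committee


def query(committee, pool_X, batch_size):
    """Indices of the pool items on which the committee disagrees most (highest variance)."""
    scores = [statistics.pvariance([predict(m, x) for m in committee]) for x in pool_X]
    ranked = sorted(range(len(pool_X)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def active_learning(seqs, oracle, n_init=10, batch_size=5, rounds=4, seed=0):
    """Pool-based loop: train, query the most uncertain guides, label them, repeat."""
    rng = random.Random(seed)
    pool = list(range(len(seqs)))
    rng.shuffle(pool)
    labelled, pool = pool[:n_init], pool[n_init:]
    labels = {i: oracle(seqs[i]) for i in labelled}
    for r in range(rounds):
        committee = fit_committee([one_hot(seqs[i]) for i in labelled],
                                  [labels[i] for i in labelled], seed=seed + r)
        picked = query(committee, [one_hot(seqs[i]) for i in pool], batch_size)
        chosen = [pool[p] for p in picked]
        for i in chosen:
            labels[i] = oracle(seqs[i])  # in practice: an experimental measurement
        labelled += chosen
        chosen_set = set(chosen)
        pool = [i for i in pool if i not in chosen_set]
    final = fit_committee([one_hot(seqs[i]) for i in labelled],
                          [labels[i] for i in labelled], seed=seed + rounds)
    return final, labelled


def toy_efficiency(seq):
    """HYPOTHETICAL oracle: GC fraction as a placeholder for measured efficiency."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    rng = random.Random(42)
    guides = ["".join(rng.choice(BASES) for _ in range(20)) for _ in range(60)]
    committee, labelled = active_learning(guides, toy_efficiency)
    print(f"labelled {len(labelled)} of {len(guides)} guides")
```

With the defaults, the loop labels 10 random guides and then 5 more per round, so only 30 of the 60 candidates are ever sent to the oracle. Committee variance is just one common uncertainty proxy; other query strategies (for example, Monte Carlo dropout in a neural network) slot into `query` without changing the rest of the loop.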

Publication: 4th International & 13th Iranian Conference on Bioinformatics
Fatemeh Zare-Mirakabad
Associate Professor

My research interests include bioinformatics, computational biology and artificial intelligence.