Tabriz University of Medical Sciences
BioImpacts, ISSN 2228-5652, 15(1), 2025-01-19
Hybrid deep learning models for text-based identification of gene-disease associations
31226, DOI: 10.34172/bi.31226, EN
Noor Fadhil Jumaa (https://orcid.org/0009-0000-4248-3707), Jafar Razmara (https://orcid.org/0000-0002-6320-8517), Sepideh Parvizpour, Jaber Karimpour
Journal Article, 2025-04-08, 2025-05-28

Introduction: Identifying gene-disease associations is crucial for advancing medical research and improving clinical outcomes. However, the rapid expansion of the biomedical literature makes it difficult to extract meaningful relationships from large text collections.
Methods: This study applies deep learning to automate the extraction, classifying gene-disease associations in three publicly available datasets (EU-ADR, GAD, and SNPPhenA). Each dataset was pre-processed through entity identification and preparation, word embedding with pre-trained Word2Vec and fastText models, and position embedding to capture semantic and contextual relationships within the text. Three hybrid deep learning models were implemented and compared: CNN-LSTM, CNN-GRU, and CNN-GRU-LSTM. Each model was equipped with an attention mechanism to enhance its performance.
Results: The CNN-GRU model achieved the highest accuracy, 91.23%, on the SNPPhenA dataset, while the CNN-GRU-LSTM model attained an accuracy of 90.14% on the EU-ADR dataset. The CNN-LSTM model performed best on the GAD dataset, with an accuracy of 84.90%. Compared to previous state-of-the-art methods, such as BioBERT-based models, our hybrid approach demonstrates superior classification performance by effectively capturing local and sequential features without relying on heavy pre-training.
Conclusion: The developed models and their evaluation data are available at https://github.com/NoorFadhil/Deep-GDAE.
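
The position embedding step named in the Methods can be illustrated with a minimal sketch. The abstract does not specify how positions are encoded, so the scheme below is an assumption: each token receives its clipped relative offset from the gene mention and from the disease mention, shifted to a non-negative index suitable for an embedding lookup. The function names, the `GENE` and `DISEASE` placeholder tokens, and the `max_distance` parameter are all hypothetical and are not taken from the published code.

```python
from typing import List, Tuple


def relative_positions(num_tokens: int, entity_index: int,
                       max_distance: int = 30) -> List[int]:
    """Return one non-negative index per token encoding its clipped
    offset from the entity at entity_index (assumed scheme)."""
    ids = []
    for i in range(num_tokens):
        offset = i - entity_index
        # Clip so that far-away tokens share one index.
        clipped = max(-max_distance, min(max_distance, offset))
        # Shift into the range [0, 2 * max_distance].
        ids.append(clipped + max_distance)
    return ids


def position_features(tokens: List[str], gene_index: int, disease_index: int,
                      max_distance: int = 30) -> Tuple[List[int], List[int]]:
    """Build two position-index sequences, one per entity mention."""
    n = len(tokens)
    return (relative_positions(n, gene_index, max_distance),
            relative_positions(n, disease_index, max_distance))


# Hypothetical sentence with the entity mentions already replaced by placeholders.
tokens = "The GENE variant is associated with DISEASE risk".split()
gene_ids, disease_ids = position_features(
    tokens, tokens.index("GENE"), tokens.index("DISEASE"), max_distance=3
)
print(gene_ids)     # [2, 3, 4, 5, 6, 6, 6, 6]
print(disease_ids)  # [0, 0, 0, 0, 1, 2, 3, 4]
```

In a setup like this, each index sequence would typically be passed through its own small embedding table and concatenated with the word vectors before the convolutional layer. Whether the published models do this is not stated in the abstract.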