﻿<?xml version="1.0" encoding="UTF-8"?>
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName>Tabriz University of Medical Sciences</PublisherName>
      <JournalTitle>BioImpacts</JournalTitle>
      <Issn>2228-5652</Issn>
      <Volume>15</Volume>
      <Issue>1</Issue>
      <PubDate PubStatus="ppublish">
        <Year>2025</Year>
        <Month>01</Month>
        <Day>19</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Hybrid deep learning models for text-based identification of gene-disease associations</ArticleTitle>
    <FirstPage>31226</FirstPage>
    <LastPage>31226</LastPage>
    <ELocationID EIdType="doi">10.34172/bi.31226</ELocationID>
    <Language>EN</Language>
    <AuthorList>
      <Author>
        <FirstName>Noor</FirstName>
        <LastName>Fadhil Jumaa</LastName>
        <Identifier Source="ORCID">https://orcid.org/0009-0000-4248-3707</Identifier>
      </Author>
      <Author>
        <FirstName>Jafar</FirstName>
        <LastName>Razmara</LastName>
        <Identifier Source="ORCID">https://orcid.org/0000-0002-6320-8517</Identifier>
      </Author>
      <Author>
        <FirstName>Sepideh</FirstName>
        <LastName>Parvizpour</LastName>
      </Author>
      <Author>
        <FirstName>Jaber</FirstName>
        <LastName>Karimpour</LastName>
      </Author>
    </AuthorList>
    <PublicationType>Journal Article</PublicationType>
    <ArticleIdList>
      <ArticleId IdType="doi">10.34172/bi.31226</ArticleId>
    </ArticleIdList>
    <History>
      <PubDate PubStatus="received">
        <Year>2025</Year>
        <Month>04</Month>
        <Day>08</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2025</Year>
        <Month>05</Month>
        <Day>28</Day>
      </PubDate>
    </History>
    <Abstract>Introduction: Identifying gene-disease associations is crucial for advancing medical research and improving clinical outcomes. However, the rapid expansion of biomedical literature makes it difficult to extract meaningful relationships from large text collections. Methods: This study applies deep learning to automate this process, using publicly available datasets (EU-ADR, GAD, and SNPPhenA) to classify these associations. Each dataset was pre-processed through entity identification and preparation, word embedding with pre-trained Word2Vec and fastText models, and position embedding to capture semantic and contextual relationships within the text. Three hybrid deep learning models were implemented and compared: CNN-LSTM, CNN-GRU, and CNN-GRU-LSTM. Each model was equipped with an attention mechanism to enhance its performance. Results: The CNN-GRU model achieved the highest accuracy, 91.23%, on the SNPPhenA dataset, while the CNN-GRU-LSTM model attained an accuracy of 90.14% on the EU-ADR dataset. The CNN-LSTM model performed best on the GAD dataset, with an accuracy of 84.90%. Compared with previous state-of-the-art methods, such as BioBERT-based models, the hybrid approach demonstrates superior classification performance by effectively capturing local and sequential features without relying on heavy pre-training. Conclusion: The developed models and their evaluation data are available at https://github.com/NoorFadhil/Deep-GDAE.</Abstract>
    <ObjectList>
      <Object Type="keyword">
        <Param Name="value">Gene-disease association extraction</Param>
      </Object>
      <Object Type="keyword">
        <Param Name="value">Deep learning</Param>
      </Object>
      <Object Type="keyword">
        <Param Name="value">Attention mechanism</Param>
      </Object>
      <Object Type="keyword">
        <Param Name="value">Feature extraction</Param>
      </Object>
    </ObjectList>
  </Article>
</ArticleSet>