Elham Nazari 1,2*
Ghazaleh Khalili-Tanha 2,3
Ghazaleh Pourali 2
Fatemeh Khojasteh-Leylakoohi 3
Hanieh Azari 3
Mohammad Dashtiahangar 4
Hamid Fiuji 5
Zahra Yousefli 2,3
Alireza Asadnia 2,3
Mina Maftooh 2,6
Hamed Akbarzade 2
Mohammadreza Nassiri 7
Seyed Mahdi Hassanian 2
Gordon A Ferns 8
Godefridus J Peters 5,9
Elisa Giovannetti 5,10
Jyotsna Batra 11,12
Majid Khazaei 2
Amir Avan 2,12*

1 Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
3 Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
4 School of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran
5 Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, VU University Medical Center (VUMC), Amsterdam, The Netherlands
6 College of Medicine, University of Warith Al-Anbiyaa, Karbala, Iraq
7 Recombinant Proteins Research Group, The Research Institute of Biotechnology, Ferdowsi University of Mashhad, Mashhad, Iran
8 Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex BN1 9PH, UK
9 Professor in Biochemistry, Medical University of Gdansk, Gdansk, Poland
10 Cancer Pharmacology Lab, AIRC Start-Up Unit, Fondazione Pisana per la Scienza, Pisa, Italy
11 Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane 4059, Australia
12 Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane 4059, Australia
Abstract
Introduction: Colorectal cancer (CRC) is among the most lethal cancers, underscoring the need for novel biomarkers that can detect patients at earlier stages. RNA and microRNA sequencing data were analyzed using bioinformatics and machine learning algorithms to identify differentially expressed genes (DEGs), followed by validation in CRC patients.
Methods: Genome-wide RNA sequencing data for 631 samples, comprising 398 CRC patients and 233 normal cases, were extracted from The Cancer Genome Atlas (TCGA). DEGs were identified using the DESeq package in R. Survival was evaluated by Kaplan–Meier analysis to identify prognostic biomarkers. Predictive biomarkers were determined by machine learning algorithms, including deep learning, decision tree, and support vector machine (SVM). Biological pathways, protein-protein interaction (PPI) networks, co-expression of DEGs, and correlations between DEGs and clinical data were also evaluated. Additionally, diagnostic markers were assessed with the CombiROC package. Finally, the top-scoring candidate gene was validated by real-time PCR (RT-PCR) in CRC patients.
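Prognostic genes were flagged with Kaplan–Meier analysis. As a minimal sketch, and not the authors' code (the abstract does not describe their software), the product-limit estimator below computes the survival curve for one patient group, for example patients with high expression of a candidate gene. How patients were split into groups (median cut-off or otherwise) is an assumption not stated here, and comparing two curves would additionally require a test such as the log-rank test, which is omitted for brevity.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) survival estimate for one patient group.

    times  -- follow-up time of each patient (e.g. months)
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns (time, S(t)) pairs at each distinct time with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at time t; patients censored
        # at t are still counted as at risk at t (standard convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data (not from this study): patients with high expression
# of a candidate gene, with follow-up times and event indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

The curve only steps down at times where an event occurs; censored patients shrink the risk set without lowering the estimate.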
Results: Survival analysis revealed five novel prognostic genes: KCNK13, C1orf174, CLEC18A, SRRM5, and GPR89A. SVM identified 39 upregulated genes, 40 downregulated genes, and 20 miRNAs with high accuracy and AUC. Upregulation of KRT20 and FAM118A and downregulation of LRAT and PROZ had the highest coefficients in the advanced stage. Furthermore, three miRNAs (mir-19b-1, mir-326, and mir-330) were upregulated in the advanced stage. The novel gene C1orf174 was validated by RT-PCR in CRC patients. CombiROC analysis indicated that the C1orf174-AKAP4-DIRC1-SKIL-Scan29A4 combination could serve as a diagnostic marker panel, with sensitivity, specificity, and AUC values of 0.90, 0.94, and 0.92, respectively.
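The diagnostic panel is summarized by three numbers: sensitivity, specificity, and AUC. The sketch below shows how each is computed from a combined panel score. It assumes a hypothetical weighted sum of marker values as the combined score; CombiROC's own combination procedure is not reproduced here, and the weights, threshold, and toy scores are illustrative placeholders. AUC is computed as the probability that a randomly chosen CRC sample scores higher than a randomly chosen normal sample (ties count as one half), which equals the area under the empirical ROC curve.

```python
from typing import Sequence, Tuple


def combined_score(markers: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical linear panel score: weighted sum of marker expression values."""
    if len(markers) != len(weights):
        raise ValueError("one weight per marker is required")
    return sum(w * x for w, x in zip(weights, markers))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """labels: 1 = CRC, 0 = normal; a score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one CRC and one normal sample")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability P(score_CRC > score_normal), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one CRC and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy panel scores (hypothetical, not data from this study).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(sensitivity_specificity(scores, labels, 0.5), roc_auc(scores, labels))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes discrimination across all thresholds, which is why the three values are reported together.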
Conclusion: Machine learning algorithms can be used to identify key dysregulated genes and miRNAs involved in disease pathogenesis, which could enable the detection of patients at earlier stages. Our data also demonstrated the prognostic value of C1orf174 in colorectal cancer.