Abstract
Large-scale genome-wide association studies (GWAS) have been successfully applied to identify a wide range of genetic variants underlying complex diseases. Network-based regression approaches have been developed to incorporate a biological genetic network and to overcome the computational challenges of analyzing high-dimensional genomic data. In this paper, we propose a gene selection approach that incorporates genetic networks into case-control association studies of DNA sequence data or DNA methylation data. Instead of using traditional dimension reduction techniques such as principal component analysis and supervised principal component analysis, we use a linear combination of genotypes at SNPs or methylation values at CpG sites in a gene to capture gene-level signals. We employ three linear combination approaches: optimally weighted sum (OWS), beta-based weighted sum (BWS), and LD-adjusted polygenic risk score (LD-PRS). OWS and LD-PRS are supervised approaches that depend on the effect of each SNP or CpG site on case-control status, while BWS can be computed without using case-control status. After using one of these linear combinations of genotypes or methylation values in each gene to capture gene-level signals, we regularize the gene-level signals to perform gene selection based on the biological network. Simulation studies show that the proposed approaches have higher true positive rates than traditional dimension reduction techniques. We also apply our approaches to DNA methylation data and UK Biobank DNA sequence data to analyze rheumatoid arthritis. The results show that the proposed methods can select genes potentially related to rheumatoid arthritis that are missed by existing methods.
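The two steps named in the abstract, collapsing the variants in a gene into one gene-level signal and then penalizing gene-level coefficients along a network, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the OWS, BWS, or LD-PRS weight formulas, so the supervised weights below are a simple case-control mean difference, and the penalty is the graph-Laplacian quadratic form commonly used in network-constrained regularization, not necessarily the exact penalty used in the paper. All function names are hypothetical.

```python
import numpy as np


def gene_score(genotypes, weights):
    """Collapse an (n_samples, n_variants) matrix into one score per sample.

    The score is a plain linear combination of the columns; the choice of
    `weights` is what distinguishes one weighting scheme from another.
    """
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


def mean_difference_weights(genotypes, status):
    """Illustrative supervised weights (an assumption, not OWS or LD-PRS):
    per-variant difference in mean value between cases (1) and controls (0).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    status = np.asarray(status)
    return genotypes[status == 1].mean(axis=0) - genotypes[status == 0].mean(axis=0)


def normalized_laplacian(adjacency):
    """Normalized graph Laplacian of a gene network without self-loops.

    Isolated genes (degree 0) get an all-zero row and column, so they are
    not penalized by the network term.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    connected = degree > 0
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    scaled = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def network_penalty(beta, laplacian):
    """Quadratic form beta' L beta: small when linked genes have similar
    degree-scaled coefficients, large when they differ.
    """
    beta = np.asarray(beta, dtype=float)
    return float(beta @ laplacian @ beta)


if __name__ == "__main__":
    # Toy data: 4 samples, 2 variants in one gene; status 1 = case.
    G = np.array([[0, 1], [2, 1], [2, 2], [0, 0]])
    y = np.array([0, 1, 1, 0])
    w = mean_difference_weights(G, y)
    print("weights:", w)
    print("gene-level scores:", gene_score(G, w))

    # Toy network: two genes joined by one edge.
    L = normalized_laplacian(np.array([[0, 1], [1, 0]]))
    print("penalty, equal effects:", network_penalty([1.0, 1.0], L))
    print("penalty, opposite effects:", network_penalty([1.0, -1.0], L))
```

In a full analysis the gene-level scores would be the predictors in a penalized logistic regression on case-control status, with the network term added to a sparsity penalty; that fitting step is omitted here.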
Data availability
All data analyzed during this study are included in this published article and its supplemental materials.
Acknowledgements
Part of this research has been conducted using the UK Biobank Resource under application number 41722 and the NHGRI-EBI GWAS Catalog. XC was partially supported by the Michigan Technological University Health Research Institute Fellowship program and the Portage Health Foundation Graduate Assistantship.
Funding
No financial assistance was received in support of the study.
Author information
Contributions
Formal analysis: XC; Methodology: XC, SZ, XL, and QS; Data curation: XC and XL; Visualization: XC; Writing original draft: XC, XL, and QS; Writing review and editing: XC, SZ, XL, and QS.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
This study used DNA sequence data from the UK Biobank, which has approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB) (approval number: 11/NW/0382). No specific ethical approval was required for the DNA methylation data used in this study, which were downloaded from the publicly available GEO database under accession number GSE42861.
About this article
Cite this article
Cao, X., Liang, X., Zhang, S. et al. Gene selection by incorporating genetic networks into case-control association studies. Eur J Hum Genet 32, 270–277 (2024). https://doi.org/10.1038/s41431-022-01264-x
DOI: https://doi.org/10.1038/s41431-022-01264-x