An efficient framework to identify disease-associated genes is needed to evaluate genomic data for both individuals with an unknown disease etiology and those undergoing genomic screening. Here, we propose a framework for selecting genes for genomic analyses that supports both applications limited to genes with strong or established evidence and applications that also include genes with limited or emerging evidence of disease association.
We extracted genes with evidence for gene–disease association from the Human Gene Mutation Database, OMIM, and ClinVar to build a comprehensive gene list of 6,145 genes. Next, we applied stringent filters in conjunction with computationally curated evidence (DisGeNET) to create a restrictive list limited to 3,929 genes with stronger disease associations.
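The two-tier construction described above can be illustrated with a minimal sketch. It assumes the comprehensive list is the union of genes reported by each source, and it stands in for the "stringent filters" with two hypothetical criteria (a minimum number of supporting sources and a minimum computational evidence score). The function name, thresholds, placeholder gene symbols, and scores are all illustrative assumptions, not the filters used in the study.

```python
from typing import Dict, Set, Tuple


def build_gene_lists(
    sources: Dict[str, Set[str]],
    scores: Dict[str, float],
    min_sources: int = 2,      # hypothetical threshold, not from the study
    min_score: float = 0.3,    # hypothetical threshold, not from the study
) -> Tuple[Set[str], Set[str]]:
    """Return (comprehensive, restrictive) gene sets.

    comprehensive: every gene reported by at least one source.
    restrictive: genes meeting both illustrative filters.
    """
    # Union of all per-source gene sets.
    comprehensive: Set[str] = set().union(*sources.values())

    restrictive: Set[str] = set()
    for gene in comprehensive:
        # Count how many sources report this gene.
        n_sources = sum(gene in genes for genes in sources.values())
        # Genes without a score are treated as having no computational support.
        if n_sources >= min_sources and scores.get(gene, 0.0) >= min_score:
            restrictive.add(gene)
    return comprehensive, restrictive


# Toy input with placeholder gene symbols (not real data).
sources = {
    "HGMD": {"GENE_A", "GENE_B"},
    "OMIM": {"GENE_A", "GENE_C"},
    "ClinVar": {"GENE_A", "GENE_B", "GENE_D"},
}
scores = {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.8}

comprehensive, restrictive = build_gene_lists(sources, scores)
print(sorted(comprehensive))
print(sorted(restrictive))
```

Because both lists are derived from the same inputs, rerunning the function on refreshed source exports regenerates them together.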
In comparisons with manual gene curation efforts, including the Clinical Genome Resource, both gene lists include a high percentage of genes with strong or definitive disease associations, while genes with limited evidence are largely removed. We further confirmed the utility of this approach in identifying pathogenic and likely pathogenic variants in 45 genomes.
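One way to carry out this kind of comparison is to compute, for each curated evidence level, the percentage of curated genes that appear in a given list. The sketch below is illustrative only: the function name, the evidence labels, and the placeholder gene symbols are assumptions, and the numbers it prints are not results from the study.

```python
from collections import defaultdict
from typing import Dict, Set


def inclusion_by_evidence(
    gene_list: Set[str], curated: Dict[str, str]
) -> Dict[str, float]:
    """Percentage of curated genes found in gene_list, per evidence level.

    curated maps a gene symbol to its manually assigned evidence level.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for gene, level in curated.items():
        totals[level] += 1
        if gene in gene_list:
            hits[level] += 1
    # Every level in totals has at least one gene, so no division by zero.
    return {level: 100.0 * hits[level] / totals[level] for level in totals}


# Toy input with placeholder gene symbols (not real data).
curated = {
    "GENE_A": "Definitive",
    "GENE_B": "Definitive",
    "GENE_C": "Limited",
    "GENE_D": "Limited",
}
gene_list = {"GENE_A", "GENE_B", "GENE_C"}

percentages = inclusion_by_evidence(gene_list, curated)
for level, pct in sorted(percentages.items()):
    print(f"{level}: {pct:.1f}% included")
```

A high percentage for strong or definitive levels together with a low percentage for limited levels is the pattern the comparison above looks for.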
Our approach efficiently creates highly sensitive gene lists that remain dynamic and updatable, saving time in genomic applications.
The gene lists and data used to develop the lists can be found at https://Broad.io/genelist.
Funding support was partly provided by grant 5R01HL143295 from the National Institutes of Health/National Heart, Lung, and Blood Institute (L.L.d.l.V., C.L.B.Z., R.C.G., H.L.R., M.S.L.). The authors would like to thank the Gene Curation Coalition (GenCC) for generating curated content used in this project. GenCC’s curated content was obtained at www.thegencc.org (13 March 2021) and includes contributions from the following organizations: Invitae, Illumina, Myriad Women’s Health, Ambry Genetics, and TGMI/G2P.
This project was reviewed and approved by the Mass General Brigham institutional review board (IRB). All individuals consented to clinical genomic screening, and all individual data were de-identified.
M.S.L., L.L.d.l.V., and C.L.B.Z. report grants from the National Institutes of Health (NIH) during the conduct of the study. H.L.R. reports grants from NIH during the conduct of the study; she also reports personal fees from Genome Medical outside the submitted work. R.C.G. reports grants from NIH during the conduct of the study; he also reports personal fees from AIA, SavvySherpa, Verily, and Wamberg, all outside the submitted work. The other authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Lazo de la Vega, L., Yu, W., Machini, K. et al. A framework for automated gene selection in genomic applications. Genet Med (2021). https://doi.org/10.1038/s41436-021-01213-x