Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts

Frésard, Laure; Smail, Craig; Ferraro, Nicole M.; Teran, Nicole A.; Li, Xin; Smith, Kevin S.; Bonner, Devon; Kernohan, Kristin D.; Marwaha, Shruti; Zappala, Zachary; Balliu, Brunilda; Davis, Joe R.; Liu, Boxiang; Prybol, Cameron J.; Kohler, Jennefer N.; Zastrow, Diane B.; Reuter, Chloe M.; Fisk, Dianna G.; Grove, Megan E.; Davidson, Jean M.; Hartley, Taila; Joshi, Ruchi; Strober, Benjamin J.; Utiramerur, Sowmithri; Lind, Lars; Ingelsson, Erik; Battle, Alexis; Bejerano, Gill; Bernstein, Jonathan A.; Ashley, Euan A.; Boycott, Kym M.; Merker, Jason D.; Wheeler, Matthew T.; Montgomery, Stephen B.

doi:10.1038/s41591-019-0457-8

Letter
Published: 03 June 2019

Laure Frésard ORCID: orcid.org/0000-0001-8154-6328¹,
Craig Smail²,
Nicole M. Ferraro²,
Nicole A. Teran³,
Xin Li ORCID: orcid.org/0000-0002-2122-7461¹,
Kevin S. Smith¹,
Devon Bonner⁴,
Kristin D. Kernohan⁵,
Shruti Marwaha^4,6,
Zachary Zappala³,
Brunilda Balliu¹,
Joe R. Davis³,
Boxiang Liu ORCID: orcid.org/0000-0002-2595-4463⁷,
Cameron J. Prybol³,
Jennefer N. Kohler⁴,
Diane B. Zastrow⁴,
Chloe M. Reuter⁴,
Dianna G. Fisk⁸,
Megan E. Grove⁸,
Jean M. Davidson⁴,
Taila Hartley⁹,
Ruchi Joshi⁸,
Benjamin J. Strober¹⁰,
Sowmithri Utiramerur⁸,
Undiagnosed Diseases Network,
Care4Rare Canada Consortium,
Lars Lind¹¹,
Erik Ingelsson^6,12,
Alexis Battle^10,13,
Gill Bejerano^14,15,16,17,
Jonathan A. Bernstein¹⁵,
Euan A. Ashley ORCID: orcid.org/0000-0001-9418-9577^3,4,12,
Kym M. Boycott⁹,
Jason D. Merker^1,8,
Matthew T. Wheeler ORCID: orcid.org/0000-0001-8721-3022^4,6 &
…
Stephen B. Montgomery ORCID: orcid.org/0000-0002-5200-3903^1,3

Nature Medicine volume 25, pages 911–919 (2019)

Abstract

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene¹. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches^2,3,4,5. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases^6,7,8. This includes muscle biopsies from patients with undiagnosed rare muscle disorders^6,9, and cultured fibroblasts from patients with mitochondrial disorders⁷. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

**Fig. 1: Using blood RNA-seq to study rare-disease genes.**

**Fig. 2: Expression outliers in rare-disease samples.**

**Fig. 4: Identification of disease gene through splicing outlier detection.**

Data availability

UDN data are accessible through the UDN Gateway and through dbGaP entry at phs001232.v1.p1. DGN RNA-seq data are available by application through the NIMH Center for Collaborative Genomic Studies on Mental Disorders. Instructions for requesting access to data can be found at https://www.nimhgenetics.org/access_data_biomaterial.php, and inquiries should reference the ‘Depression Genes and Networks study (D. Levinson, PI)’. The GTEx Analysis v.7 release allele-specific expression data are available from dbGaP (dbGaP Accession phs000424.v7.p2). PIVUS RNA-seq data are accessible on the European Genome-Phenome Archive (EGAS00001003583). The Care4Rare data are available through Genomics4RD.

Code availability

Code for running the analysis and producing the figures throughout the manuscript is available at https://github.com/lfresard/blood_rnaseq_rare_disease_paper. Our pipeline to highlight candidate variants is available at https://github.com/lfresard/blood_rnaseq_rare_disease_paper/blob/master/pipeline.md

References

Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).
Boycott, K. M. et al. International cooperation to enable the diagnosis of all rare genetic diseases. Am. J. Hum. Genet. 100, 695–705 (2017).
Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Unlocking Mendelian disease using exome sequencing. Genome Biol. 12, 228 (2011).
Yang, Y. et al. Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N. Engl. J. Med. 369, 1502–1511 (2013).
Ewans, L. J. et al. Whole-exome sequencing reanalysis at 12 months boosts diagnosis and is cost-effective when applied early in Mendelian disorders. Genet. Med. 20, 1564–1574 (2018).
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).
Kernohan, K. D. et al. Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum. Mutat. 38, 611–614 (2017).
Hamanaka, K. et al. RNA sequencing solved the most common but unrecognized NEB pathogenic variant in Japanese nemaline myopathy. Genet. Med. https://doi.org/10.1038/s41436-018-0360-6 (2018).
Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).
Lind, L. A comparison of three different methods to evaluate endothelium-dependent vasodilation in the elderly: the Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) Study. Arterioscler. Thromb. Vasc. Biol. 25, 2368–2375 (2005).
GTEx Consortium Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Zeng, Y. et al. Aberrant gene expression in humans. PLOS Genet. 11, e1004942 (2015).
Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).
Pala, M. et al. Population- and individual-specific regulatory variation in Sardinia. Nat. Genet. 49, 700–707 (2017).
Cao, D. & Parker, R. Computational modeling and experimental analysis of nonsense-mediated decay in yeast. Cell 113, 533–545 (2003).
Lykke-Andersen, S. & Jensen, T. H. Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes. Nat. Rev. Mol. Cell Biol. 16, 665–677 (2015).
Nickless, A., Bailis, J. M. & You, Z. Control of gene expression through the nonsense-mediated RNA decay pathway. Cell Biosci. 7, 26 (2017).
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
Köhler, S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 45, D865–D876 (2017).
Estivill, X. Genetic variation and alternative splicing. Nat. Biotechnol. 33, 357–359 (2015).
Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
Walter, K. et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Soens, Z. T. et al. Leveraging splice-affecting variant predictors and a minigene validation system to identify Mendelian disease-causing variants among exon-captured variants of uncertain significance. Hum. Mutat. 38, 1521–1533 (2017).
Albers, C. A. et al. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat. Genet. 44, S1–S2 (2012).
Reinius, B. & Sandberg, R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
Barbosa, M. et al. Identification of rare de novo epigenetic variations in congenital disorders. Nat. Commun. 9, 2064 (2018).
Avramidou, A. et al. The novel adaptor protein Swiprosin-1 enhances BCR signals and contributes to BCR-induced apoptosis. Cell Death Differ. 14, 1936–1947 (2007).
Kroczek, C. et al. Swiprosin-1/EFhd2 controls B cell receptor signaling through the assembly of the B cell receptor, Syk, and phospholipase C gamma2 in membrane rafts. J. Immunol. 184, 3665–3676 (2010).
Dütting, S., Brachs, S. & Mielenz, D. Fraternal twins: Swiprosin-1/EFhd2 and Swiprosin-2/EFhd1, two homologous EF-hand containing calcium binding adaptor proteins with distinct functions. Cell Commun. Signal. 9, 2 (2011).
Thylur, R. P., Gowda, R., Mishra, S. & Jun, C.-D. Swiprosin-1: its expression and diverse biological functions. J. Cell. Biochem. 119, 150–156 (2018).
Heimer, G. et al. MECR mutations cause childhood-onset dystonia and optic atrophy, a mitochondrial fatty acid synthesis disorder. Am. J. Hum. Genet. 99, 1229–1244 (2016).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Eldomery, M. K. et al. Lessons learned from additional research analyses of unsolved clinical exome cases. Genome Med. 9, 26 (2017).
Wright, C. F. et al. Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders. Genet. Med. 20, 1216–1223 (2018).
Dewey, F. E. et al. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study. Science 354, aaf6814 (2016).
Eilbeck, K., Quinlan, A. & Yandell, M. Settling the score: variant prioritization and Mendelian disease. Nat. Rev. Genet. 18, 599–612 (2017).
Rao, A. R. & Nelson, S. F. Calculating the statistical significance of rare variants causal for Mendelian and complex disorders. BMC Med. Genomics 11, 53 (2018).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Tange, O. GNU Parallel - The Command-Line Power Tool. The USENIX Magazine 36, 42–47 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Ganna, A. et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. Am. J. Hum. Genet. 102, 1204–1211 (2018).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Pedersen, B. S., Layer, R. M. & Quinlan, A. R. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol. 17, 118 (2016).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Quang, D., Chen, Y. & Xie, X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31, 761–763 (2015).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
Siepel, A. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Meth. 9, 215–216 (2012).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

Acknowledgements

The authors would like to thank the patients and their families for their participation in this study. S.B.M. is supported by NIH grants nos. R01HG008150 (NoVa) and U01HG009080 (GSPAC) and the Glenn Center for Aging at Stanford. L.F. was supported by the Stanford Center for Computational, Evolutionary, and Human Genomics Fellowship. C.S. is supported by a BD2K Training Grant (T32 LM012409). N.M.F. is supported by a National Science Foundation Graduate Research Fellowship. N.A.T. is supported by the Stanford Genome Training Program (2T32HG000044-21). B.L. was supported by the Stanford Computational, Evolutionary, and Human Genomics Fellowship and the National Key R&D Program of China (2016YFD0400800). K.M.B. is supported by a CIHR Foundation grant (FDN-154279). Z.Z. was supported by the CEHG Fellowship, the National Science Foundation GRFP (DGE-114747) and the Stanford Genome Training Program (NIH/NHGRI T32HG000044). B.B. was supported by the Stanford Genome Training Program and Dean’s Postdoctoral Fellowship. J.R.D. was supported by a Lucille P. Markey Biomedical Research Stanford Graduate Fellowship. J.R.D. acknowledges the Stanford Genome Training Program (NIH/NHGRI T32HG000044). C.J.P. is supported by NIST/JIMB grant no. 70NANB15H268. A.B. is supported by NIH grant no. R01HG008150 (NoVa) and the Searle Scholar Fund. Clinical sample collection was supported, in part, by the Care4Rare Canada Consortium funded by Genome Canada, the Canadian Institutes of Health Research, the Ontario Genomics Institute, the Ontario Research Fund and the Children’s Hospital of Eastern Ontario Foundation. Research reported in this manuscript was in part supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number U01HG007708. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Jason D. Merker
Present address: Departments of Pathology and Laboratory Medicine & Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC, USA

Authors and Affiliations

Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
Laure Frésard, Xin Li, Kevin S. Smith, Brunilda Balliu, Jason D. Merker & Stephen B. Montgomery
Biomedical Informatics Program, Stanford University, Stanford, CA, USA
Craig Smail & Nicole M. Ferraro
Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
Nicole A. Teran, Zachary Zappala, Joe R. Davis, Cameron J. Prybol, Euan A. Ashley & Stephen B. Montgomery
Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
Devon Bonner, Shruti Marwaha, Jennefer N. Kohler, Diane B. Zastrow, Chloe M. Reuter, Jean M. Davidson, Euan A. Ashley & Matthew T. Wheeler
Newborn Screening Ontario (NSO), Children’s Hospital of Eastern Ontario, Ottawa, Ontario, Canada
Kristin D. Kernohan
Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA
Shruti Marwaha, Erik Ingelsson & Matthew T. Wheeler
Department of Biology, School of Humanities and Sciences, Stanford University, Stanford, CA, USA
Boxiang Liu
Stanford Medicine Clinical Genomics Program, School of Medicine, Stanford University, Stanford, CA, USA
Dianna G. Fisk, Megan E. Grove, Ruchi Joshi, Sowmithri Utiramerur & Jason D. Merker
Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
Taila Hartley, Kym Boycott, Alex MacKenzie, Dennis Bulman, David Dyment & Kym M. Boycott
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Benjamin J. Strober & Alexis Battle
Department of Medical Sciences, Cardiovascular Epidemiology, Uppsala University, Uppsala, Sweden
Lars Lind
Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
Erik Ingelsson & Euan A. Ashley
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Alexis Battle
Department of Computer Science, Stanford University, Stanford, CA, USA
Gill Bejerano
Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA
Gill Bejerano & Jonathan A. Bernstein
Department of Developmental Biology, School of Medicine, Stanford University, Stanford, CA, USA
Gill Bejerano
Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
Gill Bejerano
NIH Undiagnosed Diseases Network, National Institutes of Health, Bethesda, MD, USA
David R. Adams, Aaron Aday, Mercedes E. Alejandro, Patrick Allard, Euan A. Ashley, Mahshid S. Azamian, Carlos A. Bacino, Eva Baker, Ashok Balasubramanyam, Hayk Barseghyan, Gabriel F. Batzli, Alan H. Beggs, Babak Behnam, Hugo J. Bellen, Jonathan A. Bernstein, Gerard T. Berry, Anna Bican, David P. Bick, Camille L. Birch, Devon Bonner, Braden E. Boone, Bret L. Bostwick, Lauren C. Briere, Elly Brokamp, Donna M. Brown, Matthew Brush, Elizabeth A. Burke, Lindsay C. Burrage, Manish J. Butte, Shan Chen, Gary D. Clark, Terra R. Coakley, Joy D. Cogan, Heather A. Colley, Cynthia M. Cooper, Heidi Cope, William J. Craigen, Precilla D’Souza, Mariska Davids, Jean M. Davidson, Jyoti G. Dayal, Esteban C. Dell’Angelica, Shweta U. Dhar, Katrina M. Dipple, Laurel A. Donnell-Fink, Naghmeh Dorrani, Daniel C. Dorset, Emilie D. Douine, David D. Draper, Annika M. Dries, Laura Duncan, David J. Eckstein, Lisa T. Emrick, Christine M. Eng, Gregory M. Enns, Ascia Eskin, Cecilia Esteves, Tyra Estwick, Liliana Fernandez, Carlos Ferreira, Elizabeth L. Fieg, Paul G. Fisher, Brent L. Fogel, Noah D. Friedman, William A. Gahl, Emily Glanton, Rena A. Godfrey, Alica M. Goldman, David B. Goldstein, Sarah E. Gould, Jean-Philippe F. Gourdine, Catherine A. Groden, Andrea L. Gropman, Melissa Haendel, Rizwan Hamid, Neil A. Hanchard, Frances High, Ingrid A. Holm, Jason Hom, Ellen M. Howerton, Yong Huang, Fariha Jamal, Yong-hui Jiang, Jean M. Johnston, Angela L. Jones, Lefkothea Karaviti, David M. Koeller, Isaac S. Kohane, Jennefer N. Kohler, Donna M. Krasnewich, Susan Korrick, Mary Koziura, Joel B. Krier, Jennifer E. Kyle, Seema R. Lalani, C. Christopher Lau, Jozef Lazar, Kimberly LeBlanc, Brendan H. Lee, Hane Lee, Shawn E. Levy, Richard A. Lewis, Sharyn A. Lincoln, Sandra K. Loo, Joseph Loscalzo, Richard L. Maas, Ellen F. Macnamara, Calum A. MacRae, Valerie V. Maduro, Marta M. Majcherska, May Christine V. Malicdan, Laura A. Mamounas, Teri A. Manolio, Thomas C. Markello, Ronit Marom, Martin G. 
Martin, Julian A. Martínez-Agosto, Shruti Marwaha, Thomas May, Allyn McConkie-Rosell, Colleen E. McCormack, Alexa T. McCray, Jason D. Merker, Thomas O. Metz, Matthew Might, Paolo M. Moretti, Marie Morimoto, John J. Mulvihill, David R. Murdock, Jennifer L. Murphy, Donna M. Muzny, Michele E. Nehrebecky, Stan F. Nelson, J. Scott Newberry, John H. Newman, Sarah K. Nicholas, Donna Novacic, Jordan S. Orange, James P. Orengo, J. Carl Pallais, Christina GS. Palmer, Jeanette C. Papp, Neil H. Parker, Loren DM. Pena, John A. Phillips III, Jennifer E. Posey, John H. Postlethwait, Lorraine Potocki, Barbara N. Pusey, Genecee Renteria, Chloe M. Reuter, Lynette Rives, Amy K. Robertson, Lance H. Rodan, Jill A. Rosenfeld, Jacinda B. Sampson, Susan L. Samson, Kelly Schoch, Daryl A. Scott, Lisa Shakachite, Prashant Sharma, Vandana Shashi, Rebecca Signer, Edwin K. Silverman, Janet S. Sinsheimer, Kevin S. Smith, Rebecca C. Spillmann, Joan M. Stoler, Nicholas Stong, Jennifer A. Sullivan, David A. Sweetser, Queenie K.-G. Tan, Cynthia J. Tifft, Camilo Toro, Alyssa A. Tran, Tiina K. Urv, Eric Vilain, Tiphanie P. Vogel, Daryl M. Waggott, Colleen E. Wahl, Nicole M. Walley, Chris A. Walsh, Melissa Walker, Jijun Wan, Michael F. Wangler, Patricia A. Ward, Katrina M. Waters, Bobbie-Jo M. Webb-Robertson, Monte Westerfield, Matthew T. Wheeler, Anastasia L. Wise, Lynne A. Wolfe, Elizabeth A. Worthey, Shinya Yamamoto, John Yang, Yaping Yang, Amanda J. Yoon, Guoyun Yu, Diane B. Zastrow, Chunli Zhao & Allison Zheng
McGill University, Montreal, Quebec, Canada
Jacek Majewski
University of Toronto, Toronto, Ontario, Canada
Michael Brudno

Consortia

Care4Rare Canada Consortium

Kym Boycott, Alex MacKenzie, Jacek Majewski, Michael Brudno, Dennis Bulman & David Dyment

Contributions

S.B.M., M.T.W., J.D.M., E.A.A. and K.M.B. conceived and planned the experiments. K.S.M., D.B., J.N.K., D.B.Z., D.G.F., M.E.G., C.M.R., J.M.D. and R.J. contributed to sample preparation and case review. L.L. and E.I. provided phenotypic data together with blood RNA-seq of PIVUS control samples. S.M., X.L., K.K., R.J. and S.U. helped with processing the variant data. L.F., C.S., N.M.F., N.A.T., Z.Z., X.L., B.B., J.R.D. and B.L. carried out the analyses. K.D.K., B.J.S., A.B., G.B. and J.A.B. contributed to the interpretation of the results. K.D.K., T.H., C.J.P., D.B., J.N.K., D.Z., D.G.F. and M.E.G. performed the validation of results. L.F. and S.B.M. wrote the manuscript with support from C.S., N.M.F. and N.A.T. All authors provided critical feedback and helped shape the research, analysis and manuscript.

Corresponding authors

Correspondence to Laure Frésard or Stephen B. Montgomery.

Ethics declarations

Competing interests

J.D.M. is on Genoox Scientific advisory board and Rainbow Genomics Clinical advisory board and consults for Illumina. E.A.A. is co-founder of Personalis, DeepCell and advisor to Genome Medical and Sequence Bio. E.I. is a scientific advisor for Precision Wellness for work unrelated to the present project. S.B.M. is on the scientific advisory board for Prime Genomics.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Gene expression patterns across whole blood samples.

We used a total of 1,061 whole-blood samples from our control cohorts and rare-disease samples. a, Density plot of the proportion of annotated junctions covered per gene, restricted to genes for which at least one junction is covered by at least five uniquely mapped reads in at least 20% of the samples. On average, 86% of junctions fulfil these criteria (blue dashed line; the median of 100% is shown as a red dashed line). b, Percentage of genes from disease gene panels in which at least one junction is covered by at least five uniquely mapped reads in at least 20% of samples. About 50% of genes from the OMIM, Neurology, Musculoskeletal, Ophthalmology or Hematology panels fulfil this criterion. c, Tolerance to different types of mutations (from ExAC) as a function of expression status in a single versus multiple tissues (two-sided Wilcoxon test, P value ≤ 2 × 10⁻¹⁶). Analysis performed on 620 individuals from GTEx v.7 across 22 tissues. Boxplots show the median value; lower and upper hinges correspond to the 25th and 75th percentiles, and lower and upper whiskers extend from the hinge to the smallest and largest values within 1.5× the interquartile range of the hinge, respectively. Genes that are expressed in multiple tissues tend to be more sensitive to missense and LoF mutations. d, Number of LoF-intolerant genes stratified by expression level in blood. We considered genes with a pLI score ≥ 0.9 to be LoF intolerant.
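The junction coverage criterion used in panels a and b can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `covered_junction_fraction`, the nested-dictionary input layout and the toy read counts are hypothetical; only the two thresholds (at least five uniquely mapped reads, in at least 20% of samples) come from the caption.

```python
def covered_junction_fraction(junction_counts, min_reads=5, min_sample_frac=0.2):
    """Return, per gene, the fraction of annotated junctions that are covered.

    junction_counts: {gene: {junction_id: [read count per sample]}}
    (hypothetical layout chosen for this sketch).
    A junction is "covered" when at least `min_sample_frac` of samples have
    at least `min_reads` uniquely mapped reads spanning it. Genes without
    any covered junction are dropped, mirroring the subset in panel a.
    """
    fractions = {}
    for gene, junctions in junction_counts.items():
        covered = 0
        for counts in junctions.values():
            n_pass = sum(1 for c in counts if c >= min_reads)
            if counts and n_pass / len(counts) >= min_sample_frac:
                covered += 1
        if covered > 0:
            fractions[gene] = covered / len(junctions)
    return fractions


# Toy data: five samples, invented counts for illustration only.
toy_counts = {
    "GENE_A": {"j1": [10, 0, 7, 0, 0], "j2": [1, 2, 0, 0, 0]},
    "GENE_B": {"j1": [0, 0, 0, 4, 0]},
}
junction_result = covered_junction_fraction(toy_counts)
print(junction_result)
```

In the toy data, GENE_A keeps one of its two junctions and GENE_B is dropped because no junction reaches five reads in any sample.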

Extended Data Fig. 2 Correction for batch effects: Expression data.

Analyses performed on n = 909 DGN samples and 143 rare-disease samples (cases and family controls). a, First two principal components of the uncorrected gene expression data. Samples are coloured by batch. The largest cluster (green dots) corresponds to the DGN control samples (n = 909). b, First two principal components of the gene expression data after regressing out the significant surrogate variables found by SVA. c, Correlation between known covariates and all significant surrogate variables (SVs). SV2 is highly correlated with read type and sequencing technology, which correspond to differences between DGN and the other samples.
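The idea of "regressing out" a surrogate variable, as in panel b, can be illustrated with a single covariate and ordinary least squares. This is a simplified sketch: it does not implement SVA, it assumes the surrogate variable has already been estimated, and it handles one covariate where the analysis used all significant SVs. The function name `residualize` and the toy values are hypothetical.

```python
def residualize(expression, covariate):
    """Remove the linear effect of one covariate from one gene's expression.

    Fits expression = intercept + slope * covariate by ordinary least squares
    and returns the residuals, i.e. the part of expression the covariate
    does not explain.
    """
    n = len(expression)
    mean_x = sum(covariate) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


# Toy example: a surrogate variable that separates two batches (0 and 1),
# and a gene whose expression is shifted by about 4 units in batch 1.
toy_sv = [0, 0, 0, 1, 1, 1]
toy_expression = [5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
toy_residuals = residualize(toy_expression, toy_sv)
print([round(r, 3) for r in toy_residuals])
```

After correction the batch shift is gone and only the within-batch variation remains, which is the behaviour the principal component plots in panels a and b are meant to show at the scale of the whole dataset.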

Extended Data Fig. 3 Use of regression splines in expression data normalization.

a, Normalized gene expression residuals from 1,052 samples in an example gene without correction (left panel), after regressing out significant surrogate variables (SVs) (middle panel) and after regressing out significant SVs plus regression splines on top SVs significantly associated with batch and study (right panel). Residuals were plotted against SV2 for illustration purposes (SV2 is significantly associated with batch; P value < 1 × 10⁻³⁰, two-sided t-test from linear regression, no adjustment for multiple testing). b, Mean number of outlier genes per sample (n = 990) in each batch (absolute Z-score > 8) after correction with SVs (left panel) and SVs with regression splines (right panel). Standard deviation is displayed above each bar. Regression splines resulted in a more consistent number of outlier genes across samples in all batches. c, Benjamini & Hochberg adjusted P values resulting from a per-gene likelihood ratio test comparing linear regression model fit with and without regression splines. Regression splines improve the model fit for 2,644 genes (P value ≤ 0.05; 17.6% of all genes in the dataset). Red dashed line indicates the P value = 0.05 cutoff. d, Change in R², in decreasing order, across all genes in the dataset (n = 14,988) after correcting data using significant SVs with regression splines, compared to correcting data using significant SVs without regression splines. Mean change in R² is 0.036 (s.d. = 0.025).
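The Benjamini & Hochberg adjustment applied to the per-gene P values in panel c can be sketched in a few lines. This is a generic implementation of the standard step-up procedure, not code from the study; the function name `benjamini_hochberg` and the four toy P values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values, one per hypothetical gene.
toy_p = [0.01, 0.04, 0.03, 0.5]
toy_adjusted = benjamini_hochberg(toy_p)
n_significant = sum(1 for p in toy_adjusted if p <= 0.05)
print([round(p, 4) for p in toy_adjusted], n_significant)
```

Counting adjusted values at or below 0.05, as in the last lines, corresponds to the cutoff marked by the red dashed line in panel c.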

Extended Data Fig. 4 Impact of the number of controls on loss-of-function intolerance enrichment.

a, Enrichment of case (red, n = 64) under-expression outliers in LoF-sensitive genes as we increase the number of controls (7,600 random subsets for each sample size indicated in the legend). This enrichment was not observed for rare-disease family member controls (gray, n = 34). b, Benjamini & Hochberg adjusted −log₁₀ P value associated with the enrichment at different numbers of controls (two-sided t-test, n = 64 cases). Horizontal line indicates the 0.05 significance cutoff. P values decrease as the number of controls increases. When switching cases for controls (gray), we observed significant negative log odds when using a smaller number of controls, but this trend disappeared when using the full set of 900 controls. For a and b, boxplots represent the median value, with lower and upper hinges corresponding to the 25th and 75th percentiles, and lower and upper whiskers extending from the hinge to the smallest and largest value at most 1.5× interquartile range of the hinge, respectively.
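
As a reminder of what the adjusted values in panel b represent, the following is a minimal sketch of the standard Benjamini & Hochberg step-up adjustment followed by a −log₁₀ transform. The function names and the toy P values are illustrative only and are not taken from the study.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(pvalues):
    """Convert P values (all > 0) to the -log10 scale used on the plot axis."""
    return [-math.log10(p) for p in pvalues]


toy_p = [0.01, 0.04, 0.03, 0.005]  # made-up values for illustration
adj = benjamini_hochberg(toy_p)
print(adj)
print(neg_log10(adj))
```

On the −log₁₀ scale, the 0.05 cutoff corresponds to a horizontal line at about 1.3.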

Extended Data Fig. 5 Percentage of samples left when filtering outliers.

Filters have various impacts on the number of samples with at least one candidate gene. By combining several layers of filters, we drastically reduce the number of candidate genes but also the number of samples for which we have candidates. We recommend relaxing filter stringency after looking at the sets of genes that match the most stringent criterion. a, Expression outliers. After filtering for outlier genes matching HPO terms, with a deleterious rare variant within 10 kb, we observed less than 2.6% of samples with over 25 candidate genes. b, Splicing outliers. When keeping only genes with an HPO match and a deleterious rare variant within 20 bp of the outlier junction, we observed less than 1.3% of samples with more than five candidate genes.

Extended Data Fig. 6 Correction for batch effects - Splicing data.

Analyses performed on 65 PIVUS samples and 143 rare-disease samples. a, Plot of the first two principal components (PCs) computed on uncorrected splicing ratio data. Samples are coloured by batch. We observed that PC1 separated PIVUS control samples (left) from rare-disease samples (right). b, Plot of the first two PCs on splicing ratios after regressing out the PCs explaining up to 95% of the variance in the data. Batches were no longer separated on the first PCs. c, Correlation between known covariates and the first 10 PCs. We observed that PC1 is highly correlated with the batch, whereas PCs 2 and 3 separated samples from one institution (batch 1, CHEO) from the others. We also observed that PC1 is highly correlated with RIN, highlighting differences in quality across samples.

Extended Data Fig. 7 Allele specific expression across rare disease samples.

a, Prevalence of ASE events in rare-disease samples (n = 112). Results are displayed separately for exome and genome sequencing. b, Difference in the proportion of genes matching HPO terms for the top 20 ASE outliers per case in comparison to random genes (100 random gene sets for each sample, n = 109 samples). Analysis performed for all genes, genes with pLI ≥ 0.9, genes with a rare variant (RV) and genes with an RV with CADD score ≥ 10. The top 20 ASE outlier genes are enriched for overlap with HPO-associated genes per case, regardless of the filters applied to the extreme ASE genes and background genes (**** P value ≤ 1 × 10⁻⁴, two-sided Wilcoxon test). For a and b, boxplots represent the median value, with lower and upper hinges corresponding to the 25th and 75th percentiles, and lower and upper whiskers extending from the hinge to the smallest and largest value at most 1.5× interquartile range of the hinge, respectively. c, Rare deleterious variants are biased toward the alternative allele across all samples. A stop–gain variant was highly expressed in EFHD2 for one sample where there were matching symptoms.

Extended Data Fig. 8 Diagnostic rate after analysis of 80 distinct cases.

a, Overview of cases. Solved: causal gene found and further validated. Strong candidate: strong candidate after RNA-seq analysis (out of a subset of 30 affected individuals for which we have prior candidate gene information from the literature). Unsolved: other cases for which further investigation is needed. b, Percentage of cases for which the prior candidate gene is in the final set of filtered genes (outlier with a deleterious rare variant in a gene linked to symptoms). Analysis was performed only on a subset of 30 cases for which we have both prior candidate gene information and genetic information. Shuffling candidates corresponds to the percentage of cases for which we observe a prior candidate gene in the most stringent gene list when shuffling gene lists across individuals (10,000 permutations). On average, no match is found. Shuffling genes corresponds to the percentage of prior candidate genes we observed within the final set of DNA-only filters when sampling from this list a matched number of genes corresponding to the expression filters. The average matched percentage is 4.1% after 10,000 permutations. Real data corresponds to the percentage of cases for which we found a candidate gene in the most stringent RNA-based filter set. We find a match for 7 affected samples out of 30, that is, 25.9% of cases. There are significantly more matches in real data than in permuted data (two-sided Wilcoxon rank sum test, P value < 10^–5). Boxplots represent the median value, with lower and upper hinges corresponding to the 25th and 75th percentiles, and lower and upper whiskers extending from the hinge to the smallest and largest value at most 1.5× interquartile range of the hinge, respectively.
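
The "shuffling candidates" control in panel b can be sketched as a simple permutation: filtered gene lists are reassigned to individuals at random, and the fraction of cases whose prior candidate gene appears in the reassigned list is recorded. The sketch below is illustrative only; the function names, the set-based data layout and the toy gene labels are assumptions, and the shuffle used here may occasionally return a list to its original owner, which the authors' procedure may or may not allow.

```python
import random


def match_rate(prior_candidates, filtered_lists):
    """Fraction of cases whose prior candidate genes overlap their filtered list.

    prior_candidates: list of sets of gene names, one set per case.
    filtered_lists:   list of sets of gene names, one set per case, same order.
    """
    hits = sum(
        1 for prior, genes in zip(prior_candidates, filtered_lists) if prior & genes
    )
    return hits / len(prior_candidates)


def shuffled_match_rates(prior_candidates, filtered_lists, n_permutations, seed=0):
    """Match rates obtained after shuffling filtered lists across individuals."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rates = []
    for _ in range(n_permutations):
        shuffled = filtered_lists[:]
        rng.shuffle(shuffled)
        rates.append(match_rate(prior_candidates, shuffled))
    return rates


# Toy example with hypothetical gene labels.
priors = [{"A"}, {"B"}, {"C"}, {"D"}]
filtered = [{"A", "X"}, {"Y"}, {"C"}, {"Z"}]
observed = match_rate(priors, filtered)
null_rates = shuffled_match_rates(priors, filtered, n_permutations=1000)
print(observed, sum(null_rates) / len(null_rates))
```

Comparing the observed rate with the distribution of shuffled rates gives the kind of real-versus-permuted contrast described in the legend.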

Extended Data Fig. 9 Identification of disease gene through expression outlier detection.

MECR case. a, Proband results. After our most stringent filter, 11 candidate genes are left and MECR is ranked second by Z-score. b, Proband’s brother. After filtering, only 15 out of 1,099 expression outliers are left and MECR is ranked 10th in that list.

Extended Data Fig. 10 Solved case without genetic data: ASAH1 case.

a, After filtering our detected splicing outliers for genes related to the phenotype (through HPO IDs), only eight genes were left. ASAH1 was the most extreme outlier (Z-score = 3.9), and we had previously confirmed its association with the SMA-PME phenotype in the case. b, Sashimi plot of the ASAH1 gene for the case and two controls. For the case (red track), we observed an alternative transcript skipping exon 6 (supported by 142 reads). This pattern was never observed in controls.

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–4

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 5

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 7

Source Data Extended Data Fig. 8

Source Data Extended Data Fig. 9

Source Data Extended Data Fig. 10

About this article

Cite this article

Frésard, L., Smail, C., Ferraro, N.M. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med 25, 911–919 (2019). https://doi.org/10.1038/s41591-019-0457-8

Received: 30 August 2018
Accepted: 15 April 2019
Published: 03 June 2019
Issue Date: June 2019
DOI: https://doi.org/10.1038/s41591-019-0457-8

This article is cited by

Identification of skewed X chromosome inactivation using exome and transcriptome sequencing in patients with suspected rare genetic disease
- Numrah Fadra
- Laura E Schultz-Rogers
- Eric W Klee
BMC Genomics (2024)

Genomes in clinical care
- Olaf Riess
- Marc Sturm
- Tobias Haack
npj Genomic Medicine (2024)

Next-generation sequencing and bioinformatics in rare movement disorders
- Michael Zech
- Juliane Winkelmann
Nature Reviews Neurology (2024)

From Mendel to multi-omics: shifting paradigms
- Tesfaye B. Mersha
European Journal of Human Genetics (2024)

The expanding diagnostic toolbox for rare genetic diseases
- Kristin D. Kernohan
- Kym M. Boycott
Nature Reviews Genetics (2024)
