Primary immunodeficiency (PID) is characterized by recurrent and often life-threatening infections, autoimmunity and cancer, and it poses major diagnostic and therapeutic challenges. Although the most severe forms of PID are identified in early childhood, most patients present in adulthood, typically with no apparent family history and a variable clinical phenotype of widespread immune dysregulation: about 25% of patients have autoimmune disease, allergy is prevalent and up to 10% develop lymphoid malignancies1,2,3. Consequently, in sporadic (or non-familial) PID genetic diagnosis is difficult and the role of genetics is not well defined. Here we address these challenges by performing whole-genome sequencing in a large PID cohort of 1,318 participants. An analysis of the coding regions of the genome in 886 index cases of PID found that disease-causing mutations in known genes that are implicated in monogenic PID occurred in 10.3% of these patients, and a Bayesian approach (BeviMed4) identified multiple new candidate PID-associated genes, including IVNS1ABP. We also examined the noncoding genome, and found deletions in regulatory regions that contribute to disease causation. In addition, we used a genome-wide association study to identify loci that are associated with PID, and found evidence for the colocalization of—and interplay between—novel high-penetrance monogenic variants and common variants (at the PTPN2 and SOCS1 loci). This begins to explain the contribution of common variants to the variable penetrance and phenotypic complexity that are observed in PID. Thus, using a cohort-based whole-genome-sequencing approach in the diagnosis of PID can increase diagnostic yield and further our understanding of the key pathways that influence immune responsiveness in humans.
Subscribe to Journal
Get full journal access for 1 year
only $3.90 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
WGS and phenotype data from participants are available from one of three data repositories, determined by the informed consent of the participant. (1) Data from participants enrolled in the NIHR BioResource for the 100,000 Genomes Project–Rare Diseases Pilot can be accessed via Genomics England (https://www.genomicsengland.co.uk/about-gecip/joining-research-community/); (2) data from the UK Biobank samples are available through a data release process overseen by UK Biobank (https://www.ukbiobank.ac.uk/); (3) data from the remaining NIHR BioResource participants are available from the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) at the EMBL European Bioinformatics Institute (accession code EGAD00001004523). Patients all fall into group 3 and controls into groups 1–3. Variants listed in Supplementary Table 1 (diagnostic findings) have been submitted to ClinVar and are accessible under NIHR_Bioresource_Rare_Diseases_PID (https://www.ncbi.nlm.nih.gov/clinvar/?term=NIHR_Bioresource_Rare_Diseases_PID). Summary statistics will be made available through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).
The R code for running major analyses is available at https://github.com/ollyburren/pid_thaventhiran_et_al.
Gathmann, B. et al. Clinical picture and treatment of 2212 patients with common variable immunodeficiency. J. Allergy Clin. Immunol. 134, 116–126 (2014).
Lenardo, M., Lo, B. & Lucas, C. L. Genomics of immune diseases and new therapies. Annu. Rev. Immunol. 34, 121–149 (2016).
Bousfiha, A. et al. The 2017 IUIS phenotypic classification for primary immunodeficiencies. J. Clin. Immunol. 38, 129–143 (2018).
Greene, D., Richardson, S. & Turro, E. A fast association test for identifying pathogenic variants involved in rare diseases. Am. J. Hum. Genet. 101, 104–114 (2017).
Casanova, J.-L. Human genetic basis of interindividual variability in the course of infection. Proc. Natl Acad. Sci. USA 112, E7118–E7127 (2015).
Edgar, J. D. M. et al. The United Kingdom Primary Immune Deficiency (UKPID) registry: report of the first 4 years’ activity 2008–2012. Clin. Exp. Immunol. 175, 68–78 (2014).
Pan-Hammarström, Q. et al. Reexamining the role of TACI coding variants in common variable immunodeficiency and selective IgA deficiency. Nat. Genet. 39, 429–430 (2007).
Li, J. et al. Association of CLEC16A with human common variable immunodeficiency disorder and role in murine B cells. Nat. Commun. 6, 6804 (2015).
Shillitoe, B. et al. The United Kingdom Primary Immune Deficiency (UKPID) registry 2012 to 2017. Clin. Exp. Immunol. 192, 284–291 (2018).
Bousfiha, A. et al. The 2015 IUIS phenotypic classification for primary immunodeficiencies. J. Clin. Immunol. 35, 727–738 (2015).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Fromer, M. et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am. J. Hum. Genet. 91, 597–607 (2012).
van de Veerdonk, F. L. et al. STAT1 mutations in autosomal dominant chronic mucocutaneous candidiasis. N. Engl. J. Med. 365, 54–61 (2011).
Liu, L. et al. Gain-of-function human STAT1 mutations impair IL-17 immunity and underlie chronic mucocutaneous candidiasis. J. Exp. Med. 208, 1635–1648 (2011).
Breuer, O. et al. Autosomal dominant gain of function STAT1 mutation and severe bronchiectasis. Respir. Med. 126, 39–45 (2017).
Toubiana, J. et al. Heterozygous STAT1 gain-of-function mutations underlie an unexpectedly broad clinical phenotype. Blood 127, 3154–3164 (2016).
Fliegauf, M. et al. Haploinsufficiency of the NF-κB1 subunit p50 in common variable immunodeficiency. Am. J. Hum. Genet. 97, 389–403 (2015).
Kuijpers, T. W. et al. Combined immunodeficiency with severe inflammation and allergy caused by ARPC1B deficiency. J. Allergy Clin. Immunol. 140, 273–277.e10 (2017).
Tuijnenburg, P. et al. Loss-of-function nuclear factor κB subunit 1 (NFKB1) variants are the most common monogenic cause of common variable immunodeficiency in Europeans. J. Allergy Clin. Immunol. 142, 1285–1296 (2018).
Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Kahr, W. H. A. et al. Loss of the Arp2/3 complex component ARPC1B causes platelet abnormalities and predisposes to inflammatory disease. Nat. Commun. 8, 14816 (2017).
Burns, S., Cory, G. O., Vainchenker, W. & Thrasher, A. J. Mechanisms of WASp-mediated hematologic and immunologic disease. Blood 104, 3454–3462 (2004).
Engelhardt, K. R. et al. Identification of heterozygous single- and multi-exon deletions in IL7R by whole exome sequencing. J. Clin. Immunol. 37, 42–50 (2017).
Schepp, J. et al. Deficiency of adenosine deaminase 2 causes antibody deficiency. J. Clin. Immunol. 36, 179–186 (2016).
Salzer, U. et al. Mutations in TNFRSF13B encoding TACI are associated with common variable immunodeficiency in humans. Nat. Genet. 37, 820–828 (2005).
Alexander, W. S. et al. SOCS1 is a critical inhibitor of interferon γ signaling and prevents the potentially fatal neonatal actions of this cytokine. Cell 98, 597–608 (1999).
Sobreira, N., Schiettecatte, F., Valle, D. & Hamosh, A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum. Mutat. 36, 928–930 (2015).
Starr, R. et al. Liver degeneration and lymphoid deficiencies in mice lacking suppressor of cytokine signaling-1. Proc. Natl Acad. Sci. USA 95, 14395–14399 (1998).
Horino, J. et al. Suppressor of cytokine signaling-1 ameliorates dextran sulfate sodium-induced colitis in mice. Int. Immunol. 20, 753–762 (2008).
Yoshida, T. et al. SOCS1 is a suppressor of liver fibrosis and hepatitis-induced carcinogenesis. J. Exp. Med. 199, 1701–1707 (2004).
Bourdeau, A. et al. TC-PTP-deficient bone marrow stromal cells fail to support normal B lymphopoiesis due to abnormal secretion of interferon-γ. Blood 109, 4220–4228 (2007).
You-Ten, K. E. et al. Impaired bone marrow microenvironment and immune function in T cell protein tyrosine phosphatase-deficient mice. J. Exp. Med. 186, 683–693 (1997).
Wiede, F., Sacirbegovic, F., Leong, Y. A., Yu, D. & Tiganis, T. PTPN2-deficiency exacerbates T follicular helper cell and B cell responses and promotes the development of autoimmunity. J. Autoimmun. 76, 85–100 (2017).
Forbes, L. R. et al. Jakinibs for the treatment of immune dysregulation in patients with gain-of-function signal transducer and activator of transcription 1 (STAT1) or STAT3 mutations. J. Allergy Clin. Immunol. 142, 1665–1669 (2018).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Okada, Y. et al. Meta-analysis identifies nine new loci associated with rheumatoid arthritis in the Japanese population. Nat. Genet. 44, 511–516 (2012).
Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature https://doi.org/10.1038/s41586-020-2434-2 (2020).
Carss, K. J. et al. Comprehensive rare variant analysis via whole-genome sequencing to determine the molecular pathology of inherited retinal disease. Am. J. Hum. Genet. 100, 75–90 (2017).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).
Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616 (2018).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8, e64683 (2013).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Burren, O. S. et al. Chromosome contacts in activated T cells identify autoimmune disease candidate genes. Genome Biol. 18, 165 (2017).
The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2009).
The Wellcome Trust Case Control Consortium. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Trynka, G. et al. Disentangling the effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex-trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Hou, T. Z. et al. Identifying functional defects in patients with immune dysregulation due to LRBA and CTLA-4 mutations. Blood 129, 1458–1468 (2017).
The NBR-RD PID Consortium is part of the NIHR BioResource, for which funding was provided by the NIHR (NIHR, grant number RG65966). We acknowledge the participation of all NIHR BioResource volunteers, and thank the NIHR BioResource centre and staff for their contribution. J.E.D.T. is supported by the Medical Research Council (MRC) (RG95376 and MR/L006197/1); A.J.T. is supported by the Wellcome Trust (104807/Z/14/Z) and the NIHR Biomedical Research Centre at Great Ormond Street Hospital for Children NHS Foundation Trust and University College London; K.G.C.S. is supported by the MRC (programme grant MR/L019027) and is a Wellcome Investigator; A.J.C. was supported by the Wellcome Trust (091157/Z/10/Z, 107212/Z/15/Z, 100140/Z/12/Z, 203141/Z/16/Z), JDRF (9-2011-253, 5-SRA-2015-130-A-N), NIHR Oxford Biomedical Research Centre and NIHR Cambridge Biomedical Research Centre; E.E. has received funding from the European Union Seventh Framework Programme (FP7-PEOPLE-2013-COFUND) under grant agreement number 609020- Scientia Fellows; E.R. is supported by the Wellcome Trust (201250/Z/16/Z); D.E. is supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (SysInflame grant 01ZX1306A; GB-XMAP grant 01ZX1709) and funded by the Deutsche Forschungsgemeinschaft (DFG; German Research Foundation) under Germany’s Excellence Strategy (EXC 2167-390884018). The NIHR Cambridge Biomedical Research Centre (BRC) is a partnership between Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge, funded by the NIHR. This research was co-funded by the support listed above and the NIHR Cambridge BRC.
The authors declare no competing interests.
Peer review information Nature thanks Luigi Notarangelo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Summary of the WGS analysis approach and findings.
Extended Data Fig. 2 Genetic testing in the PID cohort before WGS recruitment, in sporadic versus familial cases.
Any type of genetic test is included; for example, single-exon or gene sequencing, MLPA, or targeted gene-panel or exome sequencing. The information was supplied on the referral form and is probably an underestimate of the number of patients who had additional genetic testing.
Extended Data Fig. 3 BeviMed simulation study of positive predictive value with increasing size of the disease cohort.
We simulated genotypes at 25 rare-variant sites in a hypothetical locus among 20,000 controls and a further 1,000, 2,000, 3,000, 4,000 or 5,000 cases. We simulated that 0.2%, 0.3%, 0.4% or 0.5% of the cases had the hypothetical locus as their causal locus. We distinguish between cases that are caused by the hypothetical locus (CHLs) and cases that are caused by other loci (COLs). The allele frequency of 20 variants was set to 1/10,000 in the cases and COLs. The allele frequency of the remaining five variants was set to zero in the controls and COLs. One of the five variants was assigned a heterozygous genotype among the CHLs at random. Thus, we represent a dominant disorder caused by variants with full penetrance. As inference is typically performed across thousands of loci, with only a small number being causal, we assumed a mixture of 100 to 1 non-causal to causal loci. To compute the positive predictive value (PPV) for a given threshold on the PPA, we computed PPAs for 10,000 datasets without permutation of the case or control labels and 10,000 more datasets with a permutation of the case or control labels. We then sampled 1,000 PPAs from the permuted set and 10 PPAs from the non-permuted set to compute the PPV obtained when the posterior probability threshold was set to achieve 100% power. The mean over 2,000 repetitions of this procedure is shown on the y axis. The x axis shows the number of cases in a hypothetical cohort. As the number of cases increases from 1,000 to 5,000, the PPV increases above 87.5% irrespective of the proportion of cases with the same genetic aetiology. This demonstrates the utility of expanding the size of the PID case collection for detecting even very rare aetiologies that result in the same broad phenotype as cases with different aetiologies. In practice, the PPV–power relationship may be much better, as the wealth of phenotypic information of the cases can allow subcategorization of cases to better approximate shared genetic aetiologies.
a, Filtering strategy to identify cHET pathogenic variants that consist of a rare coding variant in a PID-associated gene and a deletion of a CRE for the same gene. b, Regional plot of the cHET variants. Gene annotations are taken from Ensembl release 75, and the transcripts shown are those with mRNA identifiers in RefSeq (ENST00000357115 and ENST00000510413). The position of each variant relative to the gene transcript is shown by a red bar, with the longer bar indicating the extent of the deleted region. Variant coordinates are shown for the GRCh37 genome build. c, Pedigree of the patient with LRBA variants, showing the phase of the causal variants. d, FACS dot plot of CTLA-4 and FOXP3 expression in the patient with LRBA cHET variants and in a healthy control individual (representative of two independent experiments). Numbers in black are the percentage in each quadrant; numbers in red are the MFI of CTLA-4 staining in FOXP-negative and FOXP3-positive cells. e, Normalized CTLA-4 expression, assessed as previously described54, in the patient with LRBA cHET variants (n = 1), in healthy control individuals (n = 8) and in patients deficient in CTLA-4 (n = 4) or LRBA (n = 3) (positive controls). Data are mean ± s.e.m.
a, Regional plot of the cHET variants. Gene annotations are taken from Ensembl release 75, and the transcripts shown are those with mRNA identifiers in RefSeq (ENST00000432829 and ENST00000469391). The position of each variant relative to the gene transcript is shown by a red bar, with the longer bar indicating the extent of the deleted region. Variant coordinates are shown for the GRCh37 genome build. b, Photographs of the extensive human papilloma virus (HPV)-associated wart infection in the patient with DOCK8 cHET variants. c, cHET variant phasing. Top, cartoon representation of phasing using high-quality heterozygous calls from short-read WGS data and long-read nanopore sequencing data. Bottom, WGS and nanopore data from the patient with DOCK8 variants. The two variants (large deletion and missense substitution) are shown in the bottom track (orange), and a single phase block (green) that spans the entire region between the two variants confirmed them to be in trans. d, Dye-dilution proliferation assessment in response to phytohaemagglutinin (PHA) and anti-CD3/anti-CD28 beads in CD4+ T cells and CD8+ T cells in cells from the patient and from a control individual (representative of two independent experiments). Staining was performed with carboxyfluorescein succinimidyl ester (CFSE) dye (Invitrogen) with the same additional fluorochrome markers as described in ‘Flow cytometry’ in the Methods.
a–c, GWASs for the all-PID cohort with MAF > 0.05 (a), the AD-PID cohort with MAF > 0.05 (b) and the AD-PID cohort with 0.005 < MAF < 0.05 (c). Sample sizes: all-PID cases n = 886; AD-PID cases n = 733; control individuals n = 9,225. Each point represents an individual SNP association P value, adjusted for genomic inflation. Only signals with P < 1 × 10−2 are shown. None of the SNPs in c appear in the results of the common-variant GWAS in b, and are therefore additional signals gained from a GWAS that included variants of intermediate MAF. Red and blue lines represent genome-wide (P < 5 × 10−8) and suggestive (P < 1 × 10−5) associations, respectively. Note the additional signal of genome-wide significance that represents the TNFRSF13B locus, and several suggestive associations that only become apparent with variants in the 0.005–0.5 MAF range in c. Suggestive loci are indicated by the rsID of the lead SNP in each chromosome. Note that lead SNPs in the AD-PID GWAS (b) may differ from lead SNPs in the meta-analysis.
Sample sizes: n = 733 cases; n = 9,225 controls. a, LocusZoom association plots of the results of the initial (top) and conditional (middle, bottom) analyses of the MHC locus in the AD-PID GWAS. The x axis and the left y axis represent the chromosomal position and the −log10 of the association P value, respectively. Each point represents an analysed SNP, with the lead SNP indicated by a purple diamond and all other points coloured according to the strength of their linkage disequilibrium with the lead SNP. Purple lines represent HapMap CEU population recombination hotspots. A selection of genes in the region is shown at the very bottom, with over 150 genes omitted. Top, association plot of the most significant signal, rs1265053, which is in the class I region and close to the HLA-B and HLA-C genes. Middle, plot showing the association signal that remains after conditioning on rs1265053, with the strongest signal rs9273841 mapping to the class II region, close to the HLA-DRB1 and HLA-DQA1 genes. Bottom, plot showing the association signal that remains after conditioning on both rs1265053 and rs9273841. b, c, MHC locus conditional analyses of the classical HLA alleles (b) and amino acids of individual HLA genes (c). Each point represents a single imputed classical allele or amino acid, with those marked in red indicating those added as covariates to the logistic regression model: the class I signal (second row), the class II signal (third row) and both class I and class II signals (bottom row). The HLA allele and amino acid shown in the bottom plots are those with the lowest P value remaining after conditioning on both class I and class II signals; as there are no genome-wide-significant signals remaining, the results suggest that there are two independent signals at the MHC locus. d, Protein modelling of two independent MHC locus signals—HLA-DRB1 residue E71 and HLA-B residue N114—using PDB 1BX2 and PDB 4QRQ, respectively. The protein is depicted in white, the highlighted residue in red and the peptide in green.
This file contains: Strategy Outline that provides the rationale for the prioritisation approaches undertaken within the study; Supplementary Methods that provide additional recruitment and ethics details; five Supplementary Notes clarifying clinical information and providing further details of some of the analytical strategies described in the manuscript; and Supplementary Figure 1, which contains the raw images of Western blots and gating strategies for B and T cell immunophenotyping experiments.
This table contains the list of PID genes and clinically relevant variants identified in the NBR-PID cohort, phenotypes of the patients with reported variants, and OMIM phenotypes associated with defects in the same genes.
This table is a comprehensively annotated list of variants that contribute to the BeviMed prioritisation of candidate genes as causes of PID.
This table lists candidate pathogenic non-coding deletions of a conserved regulatory element (CRE) of a known PID gene, in which the same patient also has a rare damaging exonic variant. Compound heterozygosity (cHET) for a non-coding CRE deletion and a coding variant could lead to effectively absent gene expression and a recessively inherited disease.
This table lists candidate PID genes within the AD-PID GWAS loci, prioritised through capture-HiC omnibus gene score (COGS) approach.
About this article
Cite this article
Thaventhiran, J.E.D., Lango Allen, H., Burren, O.S. et al. Whole-genome sequencing of a sporadic primary immunodeficiency cohort. Nature 583, 90–95 (2020). https://doi.org/10.1038/s41586-020-2265-1
Human Genetics (2020)