Whole-genome sequencing of a sporadic primary immunodeficiency cohort


Primary immunodeficiency (PID) is characterized by recurrent and often life-threatening infections, autoimmunity and cancer, and it poses major diagnostic and therapeutic challenges. Although the most severe forms of PID are identified in early childhood, most patients present in adulthood, typically with no apparent family history and a variable clinical phenotype of widespread immune dysregulation: about 25% of patients have autoimmune disease, allergy is prevalent and up to 10% develop lymphoid malignancies1,2,3. Consequently, in sporadic (or non-familial) PID genetic diagnosis is difficult and the role of genetics is not well defined. Here we address these challenges by performing whole-genome sequencing in a large PID cohort of 1,318 participants. An analysis of the coding regions of the genome in 886 index cases of PID found that disease-causing mutations in known genes that are implicated in monogenic PID occurred in 10.3% of these patients, and a Bayesian approach (BeviMed4) identified multiple new candidate PID-associated genes, including IVNS1ABP. We also examined the noncoding genome, and found deletions in regulatory regions that contribute to disease causation. In addition, we used a genome-wide association study to identify loci that are associated with PID, and found evidence for the colocalization of—and interplay between—novel high-penetrance monogenic variants and common variants (at the PTPN2 and SOCS1 loci). This begins to explain the contribution of common variants to the variable penetrance and phenotypic complexity that are observed in PID. Thus, using a cohort-based whole-genome-sequencing approach in the diagnosis of PID can increase diagnostic yield and further our understanding of the key pathways that influence immune responsiveness in humans.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Fig. 1: Description of the immunodeficiency cohort and disease associations in coding regions.
Fig. 2: Discovery of novel PID-associated genes in a large-cohort WGS analysis.
Fig. 3: Using the WGS data to detect regulatory region deletions that are implicated in PID.
Fig. 4: AD-PID GWAS identifies common variants that mediate disease risk and suggests monogenic candidate genes.

Data availability

WGS and phenotype data from participants are available from one of three data repositories, determined by the informed consent of the participant. (1) Data from participants enrolled in the NIHR BioResource for the 100,000 Genomes Project–Rare Diseases Pilot can be accessed via Genomics England (https://www.genomicsengland.co.uk/about-gecip/joining-research-community/); (2) data from the UK Biobank samples are available through a data release process overseen by UK Biobank (https://www.ukbiobank.ac.uk/); (3) data from the remaining NIHR BioResource participants are available from the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) at the EMBL European Bioinformatics Institute (accession code EGAD00001004523). Patients all fall into group 3 and controls into groups 1–3. Variants listed in Supplementary Table 1 (diagnostic findings) have been submitted to ClinVar and are accessible under NIHR_Bioresource_Rare_Diseases_PID (https://www.ncbi.nlm.nih.gov/clinvar/?term=NIHR_Bioresource_Rare_Diseases_PID). Summary statistics will be made available through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).

Code availability

The R code for running major analyses is available at https://github.com/ollyburren/pid_thaventhiran_et_al.


  1. 1.

    Gathmann, B. et al. Clinical picture and treatment of 2212 patients with common variable immunodeficiency. J. Allergy Clin. Immunol. 134, 116–126 (2014).

    PubMed  Google Scholar 

  2. 2.

    Lenardo, M., Lo, B. & Lucas, C. L. Genomics of immune diseases and new therapies. Annu. Rev. Immunol. 34, 121–149 (2016).

    CAS  PubMed  Google Scholar 

  3. 3.

    Bousfiha, A. et al. The 2017 IUIS phenotypic classification for primary immunodeficiencies. J. Clin. Immunol. 38, 129–143 (2018).

    PubMed  Google Scholar 

  4. 4.

    Greene, D., Richardson, S. & Turro, E. A fast association test for identifying pathogenic variants involved in rare diseases. Am. J. Hum. Genet. 101, 104–114 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Casanova, J.-L. Human genetic basis of interindividual variability in the course of infection. Proc. Natl Acad. Sci. USA 112, E7118–E7127 (2015).

    CAS  PubMed  Google Scholar 

  6. 6.

    Edgar, J. D. M. et al. The United Kingdom Primary Immune Deficiency (UKPID) registry: report of the first 4 years’ activity 2008–2012. Clin. Exp. Immunol. 175, 68–78 (2014).

    CAS  PubMed  Google Scholar 

  7. 7.

    Pan-Hammarström, Q. et al. Reexamining the role of TACI coding variants in common variable immunodeficiency and selective IgA deficiency. Nat. Genet. 39, 429–430 (2007).

    PubMed  PubMed Central  Google Scholar 

  8. 8.

    Li, J. et al. Association of CLEC16A with human common variable immunodeficiency disorder and role in murine B cells. Nat. Commun. 6, 6804 (2015).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Shillitoe, B. et al. The United Kingdom Primary Immune Deficiency (UKPID) registry 2012 to 2017. Clin. Exp. Immunol. 192, 284–291 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Bousfiha, A. et al. The 2015 IUIS phenotypic classification for primary immunodeficiencies. J. Clin. Immunol. 35, 727–738 (2015).

    PubMed  PubMed Central  Google Scholar 

  11. 11.

    Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

    PubMed  PubMed Central  Google Scholar 

  12. 12.

    Fromer, M. et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am. J. Hum. Genet. 91, 597–607 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  13. 13.

    van de Veerdonk, F. L. et al. STAT1 mutations in autosomal dominant chronic mucocutaneous candidiasis. N. Engl. J. Med. 365, 54–61 (2011).

    PubMed  Google Scholar 

  14. 14.

    Liu, L. et al. Gain-of-function human STAT1 mutations impair IL-17 immunity and underlie chronic mucocutaneous candidiasis. J. Exp. Med. 208, 1635–1648 (2011).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  15. 15.

    Breuer, O. et al. Autosomal dominant gain of function STAT1 mutation and severe bronchiectasis. Respir. Med. 126, 39–45 (2017).

    PubMed  Google Scholar 

  16. 16.

    Toubiana, J. et al. Heterozygous STAT1 gain-of-function mutations underlie an unexpectedly broad clinical phenotype. Blood 127, 3154–3164 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Fliegauf, M. et al. Haploinsufficiency of the NF-κB1 subunit p50 in common variable immunodeficiency. Am. J. Hum. Genet. 97, 389–403 (2015).

    CAS  Google Scholar 

  18. 18.

    Kuijpers, T. W. et al. Combined immunodeficiency with severe inflammation and allergy caused by ARPC1B deficiency. J. Allergy Clin. Immunol. 140, 273–277.e10 (2017).

    PubMed  Google Scholar 

  19. 19.

    Tuijnenburg, P. et al. Loss-of-function nuclear factor κB subunit 1 (NFKB1) variants are the most common monogenic cause of common variable immunodeficiency in Europeans. J. Allergy Clin. Immunol. 142, 1285–1296 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  20. 20.

    Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).

    CAS  PubMed  Google Scholar 

  21. 21.

    Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  22. 22.

    Kahr, W. H. A. et al. Loss of the Arp2/3 complex component ARPC1B causes platelet abnormalities and predisposes to inflammatory disease. Nat. Commun. 8, 14816 (2017).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  23. 23.

    Burns, S., Cory, G. O., Vainchenker, W. & Thrasher, A. J. Mechanisms of WASp-mediated hematologic and immunologic disease. Blood 104, 3454–3462 (2004).

    CAS  PubMed  Google Scholar 

  24. 24.

    Engelhardt, K. R. et al. Identification of heterozygous single- and multi-exon deletions in IL7R by whole exome sequencing. J. Clin. Immunol. 37, 42–50 (2017).

    CAS  PubMed  Google Scholar 

  25. 25.

    Schepp, J. et al. Deficiency of adenosine deaminase 2 causes antibody deficiency. J. Clin. Immunol. 36, 179–186 (2016).

    CAS  PubMed  Google Scholar 

  26. 26.

    Salzer, U. et al. Mutations in TNFRSF13B encoding TACI are associated with common variable immunodeficiency in humans. Nat. Genet. 37, 820–828 (2005).

    CAS  PubMed  Google Scholar 

  27. 27.

    Alexander, W. S. et al. SOCS1 is a critical inhibitor of interferon γ signaling and prevents the potentially fatal neonatal actions of this cytokine. Cell 98, 597–608 (1999).

    CAS  PubMed  Google Scholar 

  28. 28.

    Sobreira, N., Schiettecatte, F., Valle, D. & Hamosh, A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum. Mutat. 36, 928–930 (2015).

    PubMed  PubMed Central  Google Scholar 

  29. 29.

    Starr, R. et al. Liver degeneration and lymphoid deficiencies in mice lacking suppressor of cytokine signaling-1. Proc. Natl Acad. Sci. USA 95, 14395–14399 (1998).

    ADS  CAS  PubMed  Google Scholar 

  30. 30.

    Horino, J. et al. Suppressor of cytokine signaling-1 ameliorates dextran sulfate sodium-induced colitis in mice. Int. Immunol. 20, 753–762 (2008).

    CAS  PubMed  Google Scholar 

  31. 31.

    Yoshida, T. et al. SOCS1 is a suppressor of liver fibrosis and hepatitis-induced carcinogenesis. J. Exp. Med. 199, 1701–1707 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Bourdeau, A. et al. TC-PTP-deficient bone marrow stromal cells fail to support normal B lymphopoiesis due to abnormal secretion of interferon-γ. Blood 109, 4220–4228 (2007).

    CAS  PubMed  Google Scholar 

  33. 33.

    You-Ten, K. E. et al. Impaired bone marrow microenvironment and immune function in T cell protein tyrosine phosphatase-deficient mice. J. Exp. Med. 186, 683–693 (1997).

    CAS  PubMed  PubMed Central  Google Scholar 

  34. 34.

    Wiede, F., Sacirbegovic, F., Leong, Y. A., Yu, D. & Tiganis, T. PTPN2-deficiency exacerbates T follicular helper cell and B cell responses and promotes the development of autoimmunity. J. Autoimmun. 76, 85–100 (2017).

    CAS  PubMed  Google Scholar 

  35. 35.

    Forbes, L. R. et al. Jakinibs for the treatment of immune dysregulation in patients with gain-of-function signal transducer and activator of transcription 1 (STAT1) or STAT3 mutations. J. Allergy Clin. Immunol. 142, 1665–1669 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  36. 36.

    Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Okada, Y. et al. Meta-analysis identifies nine new loci associated with rheumatoid arthritis in the Japanese population. Nat. Genet. 44, 511–516 (2012).

    CAS  PubMed  Google Scholar 

  38. 38.

    The NIHR BioResource. Whole-genome sequencing of rare disease patients in a national healthcare system. Preprint at biorXiv https://doi.org/10.1101/507244 (2020).

  39. 39.

    Carss, K. J. et al. Comprehensive rare variant analysis via whole-genome sequencing to determine the molecular pathology of inherited retinal disease. Am. J. Hum. Genet. 100, 75–90 (2017).

    CAS  PubMed  Google Scholar 

  40. 40.

    McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

    PubMed  PubMed Central  Google Scholar 

  41. 41.

    Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  42. 42.

    Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616 (2018).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  43. 43.

    Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).

    CAS  PubMed  MATH  Google Scholar 

  45. 45.

    Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8, e64683 (2013).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  46. 46.

    Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  47. 47.

    Burren, O. S. et al. Chromosome contacts in activated T cells identify autoimmune disease candidate genes. Genome Biol. 18, 165 (2017).

    PubMed  PubMed Central  Google Scholar 

  48. 48.

    The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).

    PubMed Central  Google Scholar 

  49. 49.

    Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2009).

    PubMed  Google Scholar 

  50. 50.

    The Wellcome Trust Case Control Consortium. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).

    PubMed Central  Google Scholar 

  51. 51.

    Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  52. 52.

    Trynka, G. et al. Disentangling the effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex-trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  53. 53.

    Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. 54.

    Hou, T. Z. et al. Identifying functional defects in patients with immune dysregulation due to LRBA and CTLA-4 mutations. Blood 129, 1458–1468 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

Download references


The NBR-RD PID Consortium is part of the NIHR BioResource, for which funding was provided by the NIHR (NIHR, grant number RG65966). We acknowledge the participation of all NIHR BioResource volunteers, and thank the NIHR BioResource centre and staff for their contribution. J.E.D.T. is supported by the Medical Research Council (MRC) (RG95376 and MR/L006197/1); A.J.T. is supported by the Wellcome Trust (104807/Z/14/Z) and the NIHR Biomedical Research Centre at Great Ormond Street Hospital for Children NHS Foundation Trust and University College London; K.G.C.S. is supported by the MRC (programme grant MR/L019027) and is a Wellcome Investigator; A.J.C. was supported by the Wellcome Trust (091157/Z/10/Z, 107212/Z/15/Z, 100140/Z/12/Z, 203141/Z/16/Z), JDRF (9-2011-253, 5-SRA-2015-130-A-N), NIHR Oxford Biomedical Research Centre and NIHR Cambridge Biomedical Research Centre; E.E. has received funding from the European Union Seventh Framework Programme (FP7-PEOPLE-2013-COFUND) under grant agreement number 609020- Scientia Fellows; E.R. is supported by the Wellcome Trust (201250/Z/16/Z); D.E. is supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (SysInflame grant 01ZX1306A; GB-XMAP grant 01ZX1709) and funded by the Deutsche Forschungsgemeinschaft (DFG; German Research Foundation) under Germany’s Excellence Strategy (EXC 2167-390884018). The NIHR Cambridge Biomedical Research Centre (BRC) is a partnership between Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge, funded by the NIHR. This research was co-funded by the support listed above and the NIHR Cambridge BRC.

Author information





J.E.D.T., E.S., J.S., Z.Z., W.R., N.S.G., P.T., E.R. and A.J.C. carried out experiments; H.L.A., O.S.B., J.E.D.T., J.H.R.F., D.G., I.S., C.J.P., S.V.V.D., A.S.-J., J.M., J.S., P.A.L., A.G.L., K.M., E.E., D.E., S.F.J., T.H.K. and E.T. performed computational analysis of the data; H.L.A., I.S., C.J.P., M.B., C. Samarghitean, R.L., P.J.R.-M., J.S. and K.E.S. conducted sample and data processing; J.E.D.T., E.S., W.R., M.J.T., R.B.S., P.G., H.E.B., A.W., S.H., R.L., M.S.B., K.C.G., D.S.K., A.C., D.E., A. Herwadkar, N.C., S. Grigoriadou, A. Huissoon., S. Goddard, S.J., C. Schuetz, F.B., S.S., S.O.B., T.W.K., W.H.O., S.L.S, J.D.M.E and A.J.T. recruited patients, provided clinical phenotype data and confirmed genetic diagnosis. All authors contributed to the analysis of the presented results. K.G.C.S., J.E.D.T., H.L.A., W.R. and O.S.B. wrote the paper with input from all other authors. K.G.C.S., W.H.O., A.J.T. and T.W.K. conceived the research programme. K.G.C.S. supervised the work, with input from T.W.K., E.T., W.H.O. and A.J.T. D.G., E.S., Z.Z. and J.H.R.F. contributed equally as second authors.

Corresponding authors

Correspondence to James E. D. Thaventhiran or Kenneth G. C. Smith.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Luigi Notarangelo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Graphical abstract.

Summary of the WGS analysis approach and findings.

Extended Data Fig. 2 Genetic testing in the PID cohort before WGS recruitment, in sporadic versus familial cases.

Any type of genetic test is included; for example, single-exon or gene sequencing, MLPA, or targeted gene-panel or exome sequencing. The information was supplied on the referral form and is probably an underestimate of the number of patients who had additional genetic testing.

Extended Data Fig. 3 BeviMed simulation study of positive predictive value with increasing size of the disease cohort.

We simulated genotypes at 25 rare-variant sites in a hypothetical locus among 20,000 controls and a further 1,000, 2,000, 3,000, 4,000 or 5,000 cases. We simulated that 0.2%, 0.3%, 0.4% or 0.5% of the cases had the hypothetical locus as their causal locus. We distinguish between cases that are caused by the hypothetical locus (CHLs) and cases that are caused by other loci (COLs). The allele frequency of 20 variants was set to 1/10,000 in the cases and COLs. The allele frequency of the remaining five variants was set to zero in the controls and COLs. One of the five variants was assigned a heterozygous genotype among the CHLs at random. Thus, we represent a dominant disorder caused by variants with full penetrance. As inference is typically performed across thousands of loci, with only a small number being causal, we assumed a mixture of 100 to 1 non-causal to causal loci. To compute the positive predictive value (PPV) for a given threshold on the PPA, we computed PPAs for 10,000 datasets without permutation of the case or control labels and 10,000 more datasets with a permutation of the case or control labels. We then sampled 1,000 PPAs from the permuted set and 10 PPAs from the non-permuted set to compute the PPV obtained when the posterior probability threshold was set to achieve 100% power. The mean over 2,000 repetitions of this procedure is shown on the y axis. The x axis shows the number of cases in a hypothetical cohort. As the number of cases increases from 1,000 to 5,000, the PPV increases above 87.5% irrespective of the proportion of cases with the same genetic aetiology. This demonstrates the utility of expanding the size of the PID case collection for detecting even very rare aetiologies that result in the same broad phenotype as cases with different aetiologies. In practice, the PPV–power relationship may be much better, as the wealth of phenotypic information of the cases can allow subcategorization of cases to better approximate shared genetic aetiologies.

Extended Data Fig. 4 Candidate cHET filtering strategy and patient with LRBA variants.

a, Filtering strategy to identify cHET pathogenic variants that consist of a rare coding variant in a PID-associated gene and a deletion of a CRE for the same gene. b, Regional plot of the cHET variants. Gene annotations are taken from Ensembl release 75, and the transcripts shown are those with mRNA identifiers in RefSeq (ENST00000357115 and ENST00000510413). The position of each variant relative to the gene transcript is shown by a red bar, with the longer bar indicating the extent of the deleted region. Variant coordinates are shown for the GRCh37 genome build. c, Pedigree of the patient with LRBA variants, showing the phase of the causal variants. d, FACS dot plot of CTLA-4 and FOXP3 expression in the patient with LRBA cHET variants and in a healthy control individual (representative of two independent experiments). Numbers in black are the percentage in each quadrant; numbers in red are the MFI of CTLA-4 staining in FOXP-negative and FOXP3-positive cells. e, Normalized CTLA-4 expression, assessed as previously described54, in the patient with LRBA cHET variants (n = 1), in healthy control individuals (n = 8) and in patients deficient in CTLA-4 (n = 4) or LRBA (n = 3) (positive controls). Data are mean ± s.e.m.

Extended Data Fig. 5 Patient with DOCK8 cHET variants.

a, Regional plot of the cHET variants. Gene annotations are taken from Ensembl release 75, and the transcripts shown are those with mRNA identifiers in RefSeq (ENST00000432829 and ENST00000469391). The position of each variant relative to the gene transcript is shown by a red bar, with the longer bar indicating the extent of the deleted region. Variant coordinates are shown for the GRCh37 genome build. b, Photographs of the extensive human papilloma virus (HPV)-associated wart infection in the patient with DOCK8 cHET variants. c, cHET variant phasing. Top, cartoon representation of phasing using high-quality heterozygous calls from short-read WGS data and long-read nanopore sequencing data. Bottom, WGS and nanopore data from the patient with DOCK8 variants. The two variants (large deletion and missense substitution) are shown in the bottom track (orange), and a single phase block (green) that spans the entire region between the two variants confirmed them to be in trans. d, Dye-dilution proliferation assessment in response to phytohaemagglutinin (PHA) and anti-CD3/anti-CD28 beads in CD4+ T cells and CD8+ T cells in cells from the patient and from a control individual (representative of two independent experiments). Staining was performed with carboxyfluorescein succinimidyl ester (CFSE) dye (Invitrogen) with the same additional fluorochrome markers as described in ‘Flow cytometry’ in the Methods.

Extended Data Fig. 6 Manhattan plots of GWAS results.

ac, GWASs for the all-PID cohort with MAF > 0.05 (a), the AD-PID cohort with MAF > 0.05 (b) and the AD-PID cohort with 0.005 < MAF < 0.05 (c). Sample sizes: all-PID cases n = 886; AD-PID cases n = 733; control individuals n = 9,225. Each point represents an individual SNP association P value, adjusted for genomic inflation. Only signals with P < 1 × 10−2 are shown. None of the SNPs in c appear in the results of the common-variant GWAS in b, and are therefore additional signals gained from a GWAS that included variants of intermediate MAF. Red and blue lines represent genome-wide (P < 5 × 10−8) and suggestive (P < 1 × 10−5) associations, respectively. Note the additional signal of genome-wide significance that represents the TNFRSF13B locus, and several suggestive associations that only become apparent with variants in the 0.005–0.5 MAF range in c. Suggestive loci are indicated by the rsID of the lead SNP in each chromosome. Note that lead SNPs in the AD-PID GWAS (b) may differ from lead SNPs in the meta-analysis.

Extended Data Fig. 7 Analyses of the MHC locus in the AD-PID GWAS.

Sample sizes: n = 733 cases; n = 9,225 controls. a, LocusZoom association plots of the results of the initial (top) and conditional (middle, bottom) analyses of the MHC locus in the AD-PID GWAS. The x axis and the left y axis represent the chromosomal position and the −log10 of the association P value, respectively. Each point represents an analysed SNP, with the lead SNP indicated by a purple diamond and all other points coloured according to the strength of their linkage disequilibrium with the lead SNP. Purple lines represent HapMap CEU population recombination hotspots. A selection of genes in the region is shown at the very bottom, with over 150 genes omitted. Top, association plot of the most significant signal, rs1265053, which is in the class I region and close to the HLA-B and HLA-C genes. Middle, plot showing the association signal that remains after conditioning on rs1265053, with the strongest signal rs9273841 mapping to the class II region, close to the HLA-DRB1 and HLA-DQA1 genes. Bottom, plot showing the association signal that remains after conditioning on both rs1265053 and rs9273841. b, c, MHC locus conditional analyses of the classical HLA alleles (b) and amino acids of individual HLA genes (c). Each point represents a single imputed classical allele or amino acid, with those marked in red indicating those added as covariates to the logistic regression model: the class I signal (second row), the class II signal (third row) and both class I and class II signals (bottom row). The HLA allele and amino acid shown in the bottom plots are those with the lowest P value remaining after conditioning on both class I and class II signals; as there are no genome-wide-significant signals remaining, the results suggest that there are two independent signals at the MHC locus. d, Protein modelling of two independent MHC locus signals—HLA-DRB1 residue E71 and HLA-B residue N114—using PDB 1BX2 and PDB 4QRQ, respectively. The protein is depicted in white, the highlighted residue in red and the peptide in green.

Extended Data Table 1 ESID definition of PID subtypes
Extended Data Table 2 Description of the NIHR BioResource PID cohort
Extended Data Table 3 Genome-wide-significant and suggestive signals in our AD-PID GWAS and a previous CVID meta-analysis

Supplementary information

Supplementary Information

This file contains: Strategy Outline that provides the rationale for the prioritisation approaches undertaken within the study; Supplementary Methods that provide additional recruitment and ethics details; five Supplementary Notes clarifying clinical information and providing further details of some of the analytical strategies described in the manuscript; and Supplementary Figure 1, which contains the raw images of Western blots and gating strategies for B and T cell immunophenotyping experiments.

Reporting Summary

Supplementary Table 1

This table contains the list of PID genes and clinically relevant variants identified in the NBR-PID cohort, phenotypes of the patients with reported variants, and OMIM phenotypes associated with defects in the same genes.

Supplementary Table 2

This table is a comprehensively annotated list of variants that contribute to the BeviMed prioritisation of candidate genes as causes of PID.

Supplementary Table 3

This table lists candidate pathogenic non-coding deletions of a conserved regulatory element (CRE) of a known PID gene, in which the same patient also has a rare damaging exonic variant. Compound heterozygosity (cHET) for a non-coding CRE deletion and a coding variant could lead to effectively absent gene expression and a recessively inherited disease.

Supplementary Table 4

This table lists candidate PID genes within the AD-PID GWAS loci, prioritised through capture-HiC omnibus gene score (COGS) approach.

This file contains the full list of consortium members and working groups.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Thaventhiran, J.E.D., Lango Allen, H., Burren, O.S. et al. Whole-genome sequencing of a sporadic primary immunodeficiency cohort. Nature 583, 90–95 (2020). https://doi.org/10.1038/s41586-020-2265-1

Download citation

Further reading

  • Rare variant association testing in the non-coding genome

    • Ozvan Bocher
    •  & Emmanuelle Génin

    Human Genetics (2020)

  • Whole-genome sequencing of patients with rare diseases in a national health system

    • Ernest Turro
    • , William J. Astle
    • , Karyn Megy
    • , Stefan Gräf
    • , Daniel Greene
    • , Olga Shamardina
    • , Hana Lango Allen
    • , Alba Sanchis-Juan
    • , Mattia Frontini
    • , Chantal Thys
    • , Jonathan Stephens
    • , Rutendo Mapeta
    • , Oliver S. Burren
    • , Kate Downes
    • , Matthias Haimel
    • , Salih Tuna
    • , Sri V. V. Deevi
    • , Timothy J. Aitman
    • , David L. Bennett
    • , Paul Calleja
    • , Keren Carss
    • , Mark J. Caulfield
    • , Patrick F. Chinnery
    • , Peter H. Dixon
    • , Daniel P. Gale
    • , Roger James
    • , Ania Koziell
    • , Michael A. Laffan
    • , Adam P. Levine
    • , Eamonn R. Maher
    • , Hugh S. Markus
    • , Joannella Morales
    • , Nicholas W. Morrell
    • , Andrew D. Mumford
    • , Elizabeth Ormondroyd
    • , Stuart Rankin
    • , Augusto Rendon
    • , Sylvia Richardson
    • , Irene Roberts
    • , Noemi B. A. Roy
    • , Moin A. Saleem
    • , Kenneth G. C. Smith
    • , Hannah Stark
    • , Rhea Y. Y. Tan
    • , Andreas C. Themistocleous
    • , Adrian J. Thrasher
    • , Hugh Watkins
    • , Andrew R. Webster
    • , Martin R. Wilkins
    • , Catherine Williamson
    • , James Whitworth
    • , Sean Humphray
    • , David R. Bentley
    • , Nathalie Kingston
    • , Neil Walker
    • , John R. Bradley
    • , Sofie Ashford
    • , Christopher J. Penkett
    • , Kathleen Freson
    • , Kathleen E. Stirrups
    • , F. Lucy Raymond
    •  & Willem H. Ouwehand

    Nature (2020)


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.