International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways



Primary biliary cirrhosis (PBC) is a classical autoimmune liver disease for which effective immunomodulatory therapy is lacking. Here we perform meta-analyses of discovery data sets from genome-wide association studies of European subjects (n=2,764 cases and 10,475 controls) followed by validation genotyping in an independent cohort (n=3,716 cases and 4,261 controls). We discover and validate six previously unknown risk loci for PBC (Pcombined<5 × 10−8) and use pathway analysis to identify JAK-STAT/IL12/IL27 signalling and cytokine–cytokine pathways, for which relevant therapies exist.


Primary biliary cirrhosis (PBC) is a rare cholestatic liver disease characterized by progressive autoimmune destruction of intrahepatic bile ducts, leading to cirrhosis and liver failure in a substantial proportion of cases1. To date, four genome-wide association studies (GWAS) and two Illumina immunochip studies of PBC have confirmed associations at the human leukocyte antigen (HLA) locus and identified 27 non-HLA risk loci2,3,4,5,6,7,8. Consistent with GWAS data for other autoimmune diseases, results of these studies implicate immune-related genes in disease pathogenesis, but in general fail to pinpoint the disease-causal variants within the identified risk loci. To identify risk alleles that may be relevant to disease biology and treatment and illuminate additional PBC risk loci, we undertook a genome-wide meta-analysis (GWMA) combining North American, Italian and UK PBC GWAS data sets2,4,5. Functional annotation of the risk loci and pathway analyses were then performed to identify the alleles and pathways most relevant to disease cause and treatment.


Discovery of new PBC risk loci

Following quality control, the combined discovery data set for GWMA consisted of 1,143,634 genotyped or imputed single-nucleotide polymorphisms (SNPs) in 2,764 cases and 10,475 controls (Supplementary Table 1). After genomic control correction and exclusion of known PBC risk loci from the final set of results, the inflation factor was λ=1.043 (Supplementary Fig. 1). Meta-analysis of this data set identified 23 loci at genome-wide level of significance (P<5 × 10−8, calculated using logistic regression of individual discovery data sets in ProbABEL followed by genomic control correction of individual discovery data sets in R and fixed-effects meta-analysis in META, see Methods). Of these, 22 had been detected in previous studies and the 23rd corresponded to a most-likely spurious signal from a single imputed SNP on chromosome 13 (Supplementary Fig. 2, Supplementary Table 2). However, we found suggestive evidence of association (P<2 × 10−5 from fixed-effects meta-analysis in META) at 41 loci not previously known to be associated with PBC. The top-scoring SNPs (or close proxies in strong linkage disequilibrium with the top-scoring SNP) from these and nine other loci (including the likely spurious chromosome 13 signal) were taken forward for genotyping in an independent panel consisting of 3,716 PBC cases and 4,261 controls. In total, 120 SNPs at 50 independent loci were taken forward for validation, of which 114 were successfully genotyped (Supplementary Data 1).

In the validation analysis, we confirmed association with SNPs at six loci not previously known to be associated with PBC (P<4.4 × 10−4, equivalent to P=0.05 with a Bonferroni correction for 114 tests, calculated using logistic regression analysis of individual validation data sets in PLINK followed by meta-analysis in META, see Methods); meta-analysis of discovery and validation cohorts at these loci reached genome-wide levels of significance (Pcombined <5 × 10−8 from fixed-effect meta-analysis in META) (Table 1, Supplementary Figs 3 and 4). Furthermore, SNPs at two additional loci achieved P values suggestive of association (P<1 × 10−3 from fixed-effect meta-analysis in META, equivalent to P=0.05 with a Bonferroni correction for testing at 50 independent loci; Table 1, Supplementary Fig. 5). Newly identified PBC risk loci overlap with those of other autoimmune disorders and harbour several immunologically relevant candidate genes, most notably chemokine ligand 20 (CCL20) and interleukin 12B (IL12B; Table 1).
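The two validation thresholds quoted above follow directly from a Bonferroni correction. As a minimal sketch (the function and variable names are illustrative and not part of the study's pipeline), the arithmetic can be checked as follows:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 114 successfully genotyped SNPs and 50 independent loci, as stated in the text.
snp_threshold = bonferroni_threshold(0.05, 114)
locus_threshold = bonferroni_threshold(0.05, 50)

print(f"{snp_threshold:.1e}")    # 4.4e-04
print(f"{locus_threshold:.1e}")  # 1.0e-03
```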

Table 1 PBC risk loci identified in the current study.

Discovery of candidate causal disease variants

In functional annotation of risk loci, we identified 199 candidate variants across 28 non-HLA risk loci with probabilistic identification of causal SNPs (PICS) probability >0.0275 (ref. 9). At each risk locus, the most-likely causal variant was the index variant, with median PICS probability of 0.224 and values up to 0.998 for rs2546890 at 5q33.3 (Supplementary Data 2). Looking at all candidate variants across all risk loci, about half were intronic, upstream or downstream gene variants with no predicted functional consequence (99/199, 50%). However, a substantial proportion (59/199, 30%) were regulatory region variants, defined as SNPs located within regulatory features, including enhancers, promoters, transcription factor-binding sites and open chromatin regions (Supplementary Data 3). Notably, candidate variants at 18 (64%) of the 28 annotated risk loci included at least one regulatory region variant. In contrast, only 5 of 199 candidates were missense variants (2.5%) (Supplementary Table 3a). However, these included rs2297067 in EXOC3L4 at 14q32.32 and rs2304256 in TYK2 at 19p13.2, both predicted by SIFT and/or PolyPhen to be deleterious or potentially damaging10,11. Candidate variants included a single splice region variant, rs17641524 at 1q31.3, which is predicted to affect splicing of DENND1B (Supplementary Table 3b).

We found that candidate variants at several risk loci are methylation quantitative trait loci (mQTLs), including mQTLs for DENND1B, PLCL2, IRF5 and TNFRSF1A, all genes that are implicated in risk for other autoimmune diseases (Supplementary Data 4). We also found that candidate variants at several risk loci are expression quantitative trait loci (eQTLs) in lymphoblastoid and other cell lineages, including eQTLs for CCL20, IL12A, IRF5 and TYK2 (Supplementary Data 5).

At many risk loci, functional annotation highlighted a single candidate gene (Supplementary Data 2). However, most risk loci contained multiple compelling candidate variants. This complexity is well illustrated by the composite of candidate variants at the PLCL2 gene and MANBA gene loci, which include multiple eQTL and mQTL SNPs. Thus, despite the presence of many candidate variants with regulatory or epigenetic roles within PBC risk loci, more direct biological experimental approaches are required to pinpoint the disease-causal variants at these loci.

We also applied functional GWAS (FGWAS) and its associated annotation file12 to our full set of discovery GWMA results and thereby identified 75 annotations with enrichment (P<0.01 from FGWAS) of GWMA association signals (Supplementary Data 6). After a stepwise selection approach similar to that of Pickrell12, the best-fitting model included six annotations highlighting negative enrichment of repressed chromatin regions in a lymphoblastoid cell line, and positive enrichment of DNase-I-hypersensitive sites in a variety of cell types, in particular CD20+ B cells and Th1 T cells (Supplementary Table 4).

Identification of candidate targetable biological pathways

To identify biological pathways involved in development of PBC, we conducted pathway analysis using GCTA13 followed by i-GSEA4GWAS14. We identified several immunoregulatory pathways associated with PBC, in particular, IL-12 and other cytokines as well as T-cell signalling pathways. To account for bias that might result from the strong HLA association with PBC, we repeated this analysis with SNPs/genes in the HLA region excluded. Notably, IL-12, IL-27 and JAK-STAT signalling pathways were still associated with PBC, even after their HLA contribution had been removed (Table 2).

Table 2 Results from pathway analysis in i-GSEA4GWAS.

We identified molecules that targeted these pathways by overlaying the Drug Gene Interaction database15 and calculating a pathway specificity score and Jaccard index of each drug for each of the pathways that remained associated with PBC after the HLA contribution had been removed (Table 2, Supplementary Data 7). This combined analysis identified pathways and immunomodulatory agents that represent promising leads for further study in models of PBC.


The current study adds to our knowledge of the genetic architecture of PBC. Notably, our data identify CCL20 as a candidate risk gene for PBC. Chemokine ligand 20 (CCL20) and its chemokine receptor CCR6 contribute to the formation and function of mucosal lymphoid tissues and are notably, in the context of the immune-mediated lymphocytic cholangitis characteristic of PBC, involved in the localization of Th17 cells and CD8 effector T cells to cholangiocytes and the periductal area in portal tracts16. This study also reinforces the importance of IL-12 and JAK-STAT signalling in this disease.

The functional annotation of risk loci has helped to assign priority to the candidate genes at newly identified and established risk loci. Furthermore, the identification of disease-associated regulatory variants at multiple risk loci emphasizes the potential importance of gene regulation in the pathogenesis of PBC (and presumably other complex disorders). This possibility is corroborated by the finding of numerous risk loci wherein the index and/or closely related SNPs appear to represent regulatory, mQTL and/or eQTL variants related to the nearby gene. Via the FGWAS analysis, this study also suggests particular importance of CD20+ B cells and Th1 cells in the pathogenesis of PBC. However, both the cell types and the specific gene variants most relevant to PBC require further investigation and in particular exploration of the tissue-specific functional effects of the disease-associated variants.

By looking for drug–gene interactions, we have identified candidate drugs targeting specific, PBC-associated pathways, creating new opportunities to re-purpose available drugs for targeted immune therapy. Despite the speculative nature of this analysis, the data provide a starting point in the search for novel therapies that are urgently needed to improve outcomes for PBC patients.


Study samples and genotyping

The use of human subjects for this study was approved by the University Health Network Research Ethics Board, The Mayo Clinic Institutional Review Board, Etico Indipendente IRCCS Istituto Clinico Humanitas, UC Davis Institutional Review Board and the Oxford Research Ethics Committee.

All PBC cases included in the Canadian–US, Italian and UK discovery and validation cohorts fulfilled the American Association for the Study of Liver Diseases criteria for PBC.

The Canadian–US discovery cohort included 499 PBC cases who were self-reported whites of European descent and 390 healthy Canadian controls, all genotyped using the Illumina HumanHap370 BeadChip. Additional controls included in this cohort were 1,094 control subjects provided from the Prostate Cancer Genetics Markers Susceptibility (CGEMS), 1,142 controls from the Breast CGEMS studies and 1,748 controls from the New York Cancer Project, all of whom were genotyped on an Illumina 550 K bead array4. Following all quality control (QC) procedures, the final Canadian–US discovery set included 499 PBC cases and 4,374 controls.

The PBC cases included in the Italian discovery cohort were self-reported whites of Italian descent genotyped using the Illumina Human610-Quad BeadChip. Controls in this cohort were healthy Italians genotyped using the Illumina 1M-duo array. Following QC procedures, the final Italian discovery set comprised 449 cases and 940 controls.

The PBC cases included in the UK discovery cohort were self-reported whites of British descent, genotyped using the Illumina Human-660 W Quad array. Controls in this cohort were 5,163 population controls genotyped on the Illumina 1M-Duo array as part of the Wellcome Trust Case Control Consortium 2 project. Following QC procedures, the UK discovery set comprised 1,816 cases and 5,161 controls.

The 903 ‘Canadian’ PBC cases and 834 controls included in the validation studies were self-reported whites of European descent recruited from Canada, Europe and the United States to an ongoing PBC genetics study based in Toronto. The 721 ‘US’ PBC cases and 294 controls included in the validation studies were self-reported whites of European descent enrolled in the Mayo Clinic PBC Genetic Epidemiology registry and biorepository based at the Mayo Clinic in Rochester. Italian PBC cases and controls included in the validation studies were self-reported whites of Italian descent recruited to the Italian PBC Genetics study based at Istituto Humanitas in Milan. The Italian controls were obtained from Ospedale Alessandro Manzoni, Lecco, Italy and were unrelated healthy volunteers with no known non-Italian heritage. Cases and controls from the Canadian, Italian and the US cohorts were genotyped at the University Health Network/Mount Sinai Hospital Clinical Genomics Centre using a Sequenom iPLEX Gold assay. Following QC procedures, the final validation set included 903 cases and 834 controls from Canada; 300 cases and 618 controls from Italy; and 721 cases and 294 controls from the United States (Supplementary Table 1).

The ‘UK’ PBC cases included in the validation studies were self-reported whites of British descent recruited to the UK-PBC project via the UK-PBC Consortium. Cases were genotyped using the Sequenom iPLEX Gold assay at the Wellcome Trust Sanger Institute Genotyping Facility. The UK validation control data were obtained from the TwinsUK resource, an adult twin registry comprising 12,000 (predominantly female) British twins. Genotype data for 3,512 twin individuals (genotyped using the Illumina HumanHap610 array) were obtained from the Department of Twin Research and Genetic Epidemiology at King’s College London. One twin from each genotyped pair was included in the current study, amounting to 2,603 unrelated individuals. Following QC procedures, the final UK validation set comprised 1,792 PBC cases and 2,515 TwinsUK controls (Supplementary Table 1).

Quality control

We implemented a standard QC pipeline across all three discovery data sets, over-and-above QC procedures carried out in the respective primary analyses2,4,5. QC checks were carried out using the software package PLINK17. Within each discovery data set we removed SNPs with a genotype call rate <95%; minor allele frequency <0.05; significant deviation from Hardy–Weinberg equilibrium in controls (P<10−5) or a large difference (>5%) in the proportion of missing genotypes in cases versus controls. We removed samples showing high rates of missing data (>90%); whole-genome heterozygosity more than six s.d. from the mean; estimated proportion of identity by descent (IBD) sharing with another sample >0.1, or apparent gender discrepancies (based on X-chromosomal heterozygosity >0.2 for men and <0.2 for women). Principal component analysis (based on a subset of 32,000 highly informative SNPs) was carried out using the ‘smartpca’ routine of the EIGENSOFT package18 to identify population outliers for exclusion and to identify principal components that differed between cases and controls; these principal components were used as covariates in subsequent association analyses.
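The SNP-level exclusion criteria above can be restated as a simple predicate. The study applied these filters in PLINK; the sketch below is only a plain-Python illustration of the stated thresholds, and the `SnpStats` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical field names)."""
    call_rate: float         # fraction of samples with a called genotype
    maf: float               # minor allele frequency
    hwe_p_controls: float    # Hardy-Weinberg test P value in controls
    missing_cases: float     # fraction of missing genotypes in cases
    missing_controls: float  # fraction of missing genotypes in controls


def passes_snp_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives all four exclusion criteria in the text."""
    if s.call_rate < 0.95:          # genotype call rate <95%
        return False
    if s.maf < 0.05:                # minor allele frequency <0.05
        return False
    if s.hwe_p_controls < 1e-5:     # HWE deviation in controls
        return False
    if abs(s.missing_cases - s.missing_controls) > 0.05:  # differential missingness >5%
        return False
    return True


good = SnpStats(call_rate=0.99, maf=0.20, hwe_p_controls=0.3,
                missing_cases=0.01, missing_controls=0.01)
rare = SnpStats(call_rate=0.99, maf=0.01, hwe_p_controls=0.3,
                missing_cases=0.01, missing_controls=0.01)
print(passes_snp_qc(good), passes_snp_qc(rare))
```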

Genome-wide imputation

We used the SNPs and samples passing QC to carry out genome-wide imputation within each of our cohorts using the software package MaCH19 with HapMap3 CEU+TSI samples as reference data sets. Within each cohort we used approximately the same set of genotyped SNPs in cases and controls to ensure similar levels of informativity. Following imputation, we retained only those SNPs displaying minor allele frequency >0.005 and imputation quality score R2>0.5 in all three cohorts.
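The post-imputation filter requires a SNP to pass in every cohort, which amounts to a set intersection. A minimal sketch, assuming each cohort is represented as a dictionary mapping a SNP identifier to a (minor allele frequency, R2) pair; this data layout and the toy values are illustrative assumptions only:

```python
def retained_snps(cohorts, min_maf=0.005, min_r2=0.5):
    """Return the SNP IDs whose MAF and imputation R2 exceed the cut-offs in every cohort."""
    passing = [
        {snp for snp, (maf, r2) in stats.items() if maf > min_maf and r2 > min_r2}
        for stats in cohorts
    ]
    return set.intersection(*passing) if passing else set()


# Toy data for three cohorts (made-up identifiers and values).
cohorts = [
    {"snp_a": (0.20, 0.90), "snp_b": (0.001, 0.90), "snp_c": (0.30, 0.40)},
    {"snp_a": (0.21, 0.80), "snp_b": (0.20, 0.90), "snp_c": (0.30, 0.90)},
    {"snp_a": (0.19, 0.95), "snp_c": (0.30, 0.90)},
]
kept = retained_snps(cohorts)
print(sorted(kept))  # ['snp_a']
```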

Statistical analysis of discovery cohorts

Within each cohort we carried out association analysis of the genome-wide imputed data allowing for imputation uncertainty using the software package ProbABEL20. We performed logistic regression of disease phenotype on allele dosage; principal components that differed between cases and controls were included as covariates to help correct for population stratification. Quantile–quantile plots of the genome-wide set of test statistics were examined and genomic control correction was carried out within each cohort by multiplying the standard error of the estimated log odds ratio for each SNP by the square root of the genomic control inflation factor λ (ref. 21). The resulting log odds ratios and adjusted standard errors from all three cohorts were meta-analysed using the software package META to produce the final set of genome-wide discovery results22.
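The genomic control and meta-analysis steps can be sketched as follows. This is an illustration only: it uses a standard inverse-variance fixed-effects estimator, which may differ in detail from what META computes, and leaving λ values below 1 uncorrected is an assumption of this sketch rather than something stated in the text. All function names are hypothetical.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def inflation_factor(betas, ses):
    """Genomic control lambda: median observed chi-square over its null median."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def gc_adjust_se(se, lam):
    """Multiply a standard error by sqrt(lambda); lambda < 1 is treated as 1 (assumption)."""
    return se * math.sqrt(max(lam, 1.0))


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted pooled log odds ratio, its SE and a two-sided P value."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return pooled, pooled_se, p


# Toy log odds ratios and standard errors for one SNP in three cohorts (made-up numbers).
betas = [0.25, 0.18, 0.22]
ses = [0.08, 0.10, 0.05]
lambdas = [1.05, 1.02, 1.08]  # hypothetical per-cohort inflation factors
adjusted = [gc_adjust_se(s, lam) for s, lam in zip(ses, lambdas)]
pooled, pooled_se, p_value = fixed_effects_meta(betas, adjusted)
print(pooled, pooled_se, p_value)
```

Because the correction inflates each standard error before pooling, a cohort with a larger λ contributes less weight to the combined estimate.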

Validation analysis

We selected loci for validation if they achieved suggestive level of significance in the discovery analysis (minimum P<2 × 10−5) and were not already known to be associated with PBC. We also selected loci for validation if they had achieved genome-wide significant association in one previous study but had never been validated in an independent cohort. We selected approximately two validation SNPs per locus; for loci displaying extended patterns of linkage disequilibrium or harbouring several putative independent association signals we attempted to select two validation SNPs within each subregion.

Within each locus chosen for validation we assigned priority to SNPs according to whether they had been genotyped in the TwinsUK cohort (which provided the controls for the UK validation cases). One SNP selected for validation (rs2297067) did not have genotype data available in TwinsUK and was therefore imputed within TwinsUK based on genotyped SNPs in the surrounding 5-Mb region using the software packages SHAPEIT23 and IMPUTE2 (ref. 24), with 1,000 Genomes (Phase I version 3 integrated data, released in March 2012) used as a reference sample. The TwinsUK cohort was subjected to a variety of additional QC checks as described previously25; the 2,515 controls used here correspond to the 2,520 controls used previously with an additional five exclusions due to discrepant gender25.

Within each validation cohort we carried out case/control association analysis of those SNPs that were successfully genotyped using logistic regression in PLINK. Results from the four validation cohorts (or from the combined discovery and validation cohorts) were combined via meta-analysis in META.

Imputation to 1,000 Genomes within validated loci

Imputation within the discovery cohorts was carried out at the six validated loci using the software packages SHAPEIT23 and IMPUTE2 (ref. 24) with the 1,000 Genomes (Phase I integrated variant set, release December and June 2013) used as a reference panel. The same genotyped SNPs that had been used to inform HapMap3 imputation for the discovery analysis were used for the 1,000 Genomes imputation within these targeted regions. Association analysis of SNPs passing post-imputation QC (‘info’ score >0.5) was carried out separately within each cohort, the results were genomic control corrected by multiplying the standard error of the estimated log odds ratio for each SNP by the square root of the previously estimated genomic control inflation factor λ for each cohort, and results were combined across the cohorts via meta-analysis in META. This confirmed the findings from our original (HapMap3) imputation experiment but did not identify any substantially stronger associations or candidate causal variants than we had already found.

Functional annotation of validated loci

Left and right boundaries for each associated region were defined by finding a 0.1-cM interval either side of the most strongly associated SNP where no SNP has P<1 × 10−5. We looked for overlap between PBC risk loci and confirmed risk loci for other autoimmune conditions using ImmunoBase, a web-based resource focused on the genetics and genomics of immunologically related human diseases. To assign priority to candidate genes and candidate variants at risk loci, we used the online PICS (Probabilistic Identification of Causal SNPs) algorithm to identify candidate variants at each risk locus with a PICS probability >0.0275. We adopted this threshold to be consistent with Farh et al.9 in their paper describing the approach. Given an index SNP corresponding to the most associated SNP in a locus, the PICS algorithm calculates (based on the known linkage disequilibrium pattern in the region, as measured in a large Immunochip or 1000 Genomes reference sample) a score for each SNP in the region, representing the extent to which that SNP could, in fact, be the true causal SNP, allowing for statistical sampling variation.

We then used the Ensembl Variant Effect Predictor web tool to annotate candidate variants for their predicted functional consequences. We used Genevar to evaluate the measured effects of these variants on DNA methylation in tissue collected from 856 healthy female twins of the MuTHER resource27. We used Genevar26, seeQTL and the University of Chicago eQTL browser to identify eQTLs amongst candidate variants.

We also used the FGWAS software and its associated annotation file (containing 450 genomic annotations of various types), applied to our full set of GWMA results, to investigate the extent to which genetic variants associated with PBC were enriched within specific annotation categories12. Testing each annotation individually, we found 75 annotations that showed enrichment (P<0.01) of GWMA association signals; as many of these annotations are correlated with one another we used a stepwise selection approach followed by cross-validation to mitigate overfitting (similar to the procedure performed by Pickrell12) on these 75 annotations to identify a final best-fitting model that included 6 annotations. Annotation information used by FGWAS was derived from a variety of sources including Maurano et al.29, Thurman et al.30 and Hoffman et al.31 (see Appendix of Pickrell12 for details).

Pathway analysis

Using summary results from the GWMA (effect size, standard error and allele frequency) along with SNP linkage disequilibrium estimated from the Italian GWAS individual-level genotype data, we performed approximate conditional analysis using the software GCTA13. Only the independently associated signals with conditional P value and PGWMA both <0.001 were retained for further consideration. We submitted the rsIDs and PGWMA of these SNPs as well as gene sets from BioCarta, KEGG, PID and Reactome curated by MSigDB (as of 26 March 2014) to the i-GSEA4GWAS web server14. This programme identified genes within 20 kb of the SNPs and represented each gene by the greatest –log PGWMA of the SNP(s) mapped to it. Gene sets were then assessed for enrichment with significant genes while SNP label permutations were conducted to correct for bias from variations in gene size and gene set size. False discovery rate was used to correct for multiple testing based on the distributions of enrichment scores generated by permutation.
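The SNP-to-gene mapping step described above can be illustrated with a short sketch. It is not the i-GSEA4GWAS implementation: the input layout, the gene names and coordinates, and the use of base-10 logarithms are all assumptions made for illustration.

```python
import math


def gene_scores(snps, genes, window=20_000):
    """Map each gene to the largest -log10(P) among SNPs within `window` bp of it.

    snps:  iterable of (chromosome, position, p_value)
    genes: dict of gene name -> (chromosome, start, end)
    Genes with no SNP in range are omitted from the result.
    """
    scores = {}
    for name, (g_chr, start, end) in genes.items():
        best = None
        for s_chr, pos, p in snps:
            if s_chr == g_chr and start - window <= pos <= end + window:
                value = -math.log10(p)
                if best is None or value > best:
                    best = value
        if best is not None:
            scores[name] = best
    return scores


# Toy input (made-up coordinates and P values).
snps = [("1", 100_000, 1e-6), ("1", 125_000, 1e-3), ("2", 100_000, 1e-9)]
genes = {"GENE_A": ("1", 110_000, 120_000), "GENE_B": ("1", 500_000, 510_000)}
scores = gene_scores(snps, genes)
print(scores)
```

Here GENE_A is represented by the stronger of the two nearby chromosome 1 SNPs, while GENE_B receives no score because no SNP lies within 20 kb of it.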

Drug-pathway analysis

To identify drugs that affected the pathways associated with PBC (when the HLA locus was excluded), we first identified the genes participating in each pathway. We then downloaded drug–gene associations from the Drug Gene Interaction database15 and scored each drug by the proportion of its targets that were in each pathway, which we termed the drug’s pathway specificity. As a secondary scoring metric, we evaluated the proportion of each pathway affected by the drug using the Jaccard index on the respective sets of pathway genes and targeted genes. To identify promising drug candidates, we ranked drugs first by our primary specificity metric and then by the secondary Jaccard index.
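The two scoring metrics and the ranking rule can be sketched as follows. The function names and the toy gene and drug identifiers are hypothetical, and the sketch is meant only to make the definitions concrete, not to reproduce the analysis code.

```python
def pathway_specificity(targets: set, pathway: set) -> float:
    """Proportion of a drug's targets that fall inside the pathway."""
    return len(targets & pathway) / len(targets) if targets else 0.0


def jaccard_index(targets: set, pathway: set) -> float:
    """Size of the intersection over the size of the union of the two gene sets."""
    union = targets | pathway
    return len(targets & pathway) / len(union) if union else 0.0


def rank_drugs(drug_targets: dict, pathway: set) -> list:
    """Sort drugs by specificity, breaking ties with the Jaccard index (both descending)."""
    return sorted(
        drug_targets,
        key=lambda d: (pathway_specificity(drug_targets[d], pathway),
                       jaccard_index(drug_targets[d], pathway)),
        reverse=True,
    )


# Toy pathway and drug-target sets (made-up identifiers).
pathway = {"G1", "G2", "G3", "G4"}
drug_targets = {
    "drug_c": {"G1", "G9"},  # specificity 0.5, Jaccard 1/5
    "drug_b": {"G1"},        # specificity 1.0, Jaccard 1/4
    "drug_a": {"G1", "G2"},  # specificity 1.0, Jaccard 2/4
}
print(rank_drugs(drug_targets, pathway))  # ['drug_a', 'drug_b', 'drug_c']
```

Ranking on specificity first favours drugs whose targets lie mostly within the pathway; the Jaccard index then separates equally specific drugs by how much of the pathway they cover.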

Additional information

How to cite this article: Cordell, H. J. et al. International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways. Nat. Commun. 6:8019 doi: 10.1038/ncomms9019 (2015).


  1. Hirschfield, G. M. & Gershwin, M. E. The immunobiology and pathophysiology of primary biliary cirrhosis. Annu. Rev. Pathol. 8, 303–330 (2013).
  2. Hirschfield, G. M. et al. Primary biliary cirrhosis associated with HLA, IL12A, and IL12RB2 variants. N. Engl. J. Med. 360, 2544–2555 (2009).
  3. Hirschfield, G. M. et al. Variants at IRF5-TNPO3, 17q12-21 and MMEL1 are associated with primary biliary cirrhosis. Nat. Genet. 42, 655–657 (2010).
  4. Liu, X. et al. Genome-wide meta-analyses identify three loci associated with primary biliary cirrhosis. Nat. Genet. 42, 658–660 (2010).
  5. Mells, G. F. et al. Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 43, 329–332 (2011).
  6. Juran, B. D. et al. Immunochip analyses identify a novel risk locus for primary biliary cirrhosis at 13q14, multiple independent associations at four established risk loci and epistasis between 1p31 and 7q32 risk variants. Hum. Mol. Genet. 21, 5209–5221 (2012).
  7. Liu, J. Z. et al. Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 44, 1137–1141 (2012).
  8. Nakamura, M. et al. Genome-wide association study identifies TNFSF15 and POU2AF1 as susceptibility loci for primary biliary cirrhosis in the Japanese population. Am. J. Hum. Genet. 91, 721–728 (2012).
  9. Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
  10. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
  11. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
  12. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
  13. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
  14. Zhang, K., Cui, S., Chang, S., Zhang, L. & Wang, J. i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Res. 38, W90–W95 (2010).
  15. Griffith, M. et al. DGIdb: mining the druggable genome. Nat. Methods 10, 1209–1210 (2013).
  16. Oo, Y. H. et al. CXCR3-dependent recruitment and CCR6-mediated positioning of Th-17 cells in the inflamed liver. J. Hepatol. 57, 1044–1051 (2012).
  17. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  18. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
  19. Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
  20. Aulchenko, Y. S., Struchalin, M. V. & van Duijn, C. M. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics 11, 134 (2010).
  21. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
  22. Liu, J. Z. et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat. Genet. 42, 436–440 (2010).
  23. Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012).
  24. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
  25. Cordell, H. J. et al. Genome-wide association study of multiple congenital heart disease phenotypes identifies a susceptibility locus for atrial septal defect at chromosome 4p16. Nat. Genet. 45, 822–824 (2013).
  26. Yang, T. P. et al. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics 26, 2474–2476 (2010).
  27. Grundberg, E. et al. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am. J. Hum. Genet. 93, 876–890 (2013).
  28. Xia, K. et al. seeQTL: a searchable database for human eQTLs. Bioinformatics 28, 451–452 (2012).
  29. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  30. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
  31. Hoffman, M. M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).



This study was funded by the Isaac Newton Trust, PBC Foundation, Medical Research Council (grant reference MR/L001489/1), Wellcome Trust (grants 085925), Ontario Physician Services Inc., Canadian Institutes for Health Research (MOP74621), the Ontario Research Fund (RE01-061) and National Institutes of Health (R01DK091823 and RO1DK80670). H.J.C. is a Wellcome Trust Research Fellow in the Basic Biomedical Science (087436 and 102858). C.S.G. is a Moore Investigator in Data-Driven Discovery (GBMF4552) and was supported in part by GM103534. G.F.M. is a post-doctoral clinical fellow of the National Institute for Health Research Rare Diseases (NIHR-RD) initiative. R.N.S. and G.M.H. receive salary support from a MRC-stratified medicine award (UK-PBC). C.I.A. is partially supported by P30 CA023108. K.A.S. is supported by the Sherman Family Chair in Genomic Medicine and a Canada Research Chair award. This study makes use of data generated by the WTCCC2 and WTCCC3, funded by the Wellcome Trust under awards 085475 and 090355. Access to genotype data from the TwinsUK cohort was kindly provided by the Department of Twin Research and Genetic Epidemiology at King’s College London. TwinsUK is funded by the Wellcome Trust and the European Community’s Seventh Framework Programme (FP7/2007-2013) and also receives support from the UK Department of Health via a National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre award to Guy’s & St Thomas’ NHS Foundation Trust in partnership with King’s College London. TwinsUK SNP genotyping was performed by the Wellcome Trust Sanger Institute and the National Eye Institute via the US National Institutes of Health/Center for Integrated Disease Research. Data derived from CGEMS studies were retrieved from dbGAP under a protocol led by Dr Amos. Data derived from the New York Cancer Project were provided by Dr Peter Gregersen.

Author information

This study was initially conceived and designed by H.J.C., G.F.M., C.A.A., M.F.S., R.N.S., C.I.A. and K.A.S.; the collection and processing of samples for the study were supervised and coordinated by G.F.M., G.M.H., D.P., A.L., D.C., M.E.G., P.I., K.N.L., M.F.S., R.N.S. and K.A.S.; lab work was supervised by G.X. and the statistical analyses of the data were performed by H.J.C., G.F.M., G.M.H., C.S.G., C.I.A. and K.A.S.; the paper was written primarily by H.J.C., G.F.M., G.M.H., C.S.G., C.I.A. and K.A.S. and critically reviewed and revised by all of the above authors.

Correspondence to Heather J. Cordell or Katherine A. Siminovitch.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

Supplementary Figures 1-5, Supplementary Tables 1-4 and Supplementary References (PDF 951 kb)

Supplementary Data 1

SNPs taken forward for genotyping in the validation cohort (XLSX 26 kb)

Supplementary Data 2

Most-likely causal variants (‘candidate variants’) at PBC risk loci (XLSX 27 kb)

Supplementary Data 3

Regulatory feature variants at PBC risk loci (XLSX 19 kb)

Supplementary Data 4

Methylation quantitative trait loci (mQTLs) at PBC risk loci (XLSX 21 kb)

Supplementary Data 5

Expression quantitative trait loci (eQTLs) at PBC risk loci (XLSX 19 kb)

Supplementary Data 6

Enrichment of genomic annotations in FGWAS (XLSX 14 kb)

Supplementary Data 7

Jaccard Index and Specificity Score of drugs for PBC-associated gene sets (XLSX 57 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit
