We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs, constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource enables accurate genotype imputation at minor allele frequencies as low as 0.1%, substantially increases the number of SNPs that can be tested in association studies, and can help to discover and refine causal loci. We also describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
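To make the scale of the panel concrete, the short sketch below works through two pieces of arithmetic implied by the abstract. It assumes a diploid panel (two haplotypes per individual) and treats the minor allele frequency as an exact proportion of panel haplotypes; it illustrates the numbers only and is not part of the consortium's pipeline.

```python
# Back-of-envelope arithmetic for the panel size quoted in the abstract.
# Assumption: every individual is diploid and contributes two haplotypes.

N_HAPLOTYPES = 64_976


def n_individuals(n_haplotypes: int) -> int:
    """Number of diploid individuals implied by a haplotype count."""
    return n_haplotypes // 2


def expected_minor_allele_copies(maf: float, n_haplotypes: int) -> float:
    """Expected number of panel haplotypes carrying the minor allele."""
    return maf * n_haplotypes


if __name__ == "__main__":
    print(n_individuals(N_HAPLOTYPES))  # 32488
    print(round(expected_minor_allele_copies(0.001, N_HAPLOTYPES)))  # 65
```

Under these assumptions, a variant at 0.1% frequency is still carried by roughly 65 haplotypes in the panel, which gives the imputation model multiple template haplotypes to copy from even at this low frequency.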
We are grateful to all participants of all the studies that have contributed data to the HRC. J.M. acknowledges support from the ERC (grant 617306). W.K. acknowledges support from the Wellcome Trust (grant WT097307). S. McCarthy and R.D. acknowledge support from Wellcome Trust grant WT090851. A full list of acknowledgments for the cohorts is given in the Supplementary Note.
The authors declare no competing financial interests.
Integrated supplementary information
The top panel shows the per-sample transition-transversion ratio (Ts/Tv) for chromosome 20 after running the GLPhase genotype calling method on the full MAC5 site list. In the bottom panel, GLPhase was run after the site filtering described in the text.
(a) The number of sites in the unfiltered and filtered MAC5 site lists (chromosome 20), stratified by non-reference allele frequency. Allele frequencies are calculated from the genotypes obtained by running the GLPhase genotype calling method on the full MAC5 site list. (b) The corresponding transition-transversion ratio (Ts/Tv) of these sites.
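The Ts/Tv ratio used in these figures is a standard quality indicator for SNP calls: genuine human SNPs show a genome-wide Ts/Tv of roughly 2.0 to 2.1, whereas random errors spread evenly over all substitutions would give about 0.5 (4 of the 12 possible base changes are transitions). The minimal sketch below shows how a Ts/Tv ratio can be computed and stratified by non-reference allele frequency, as in panel (b). The input format (tuples of ref, alt and allele frequency) and the bin edges are illustrative assumptions, not the format used by GLPhase.

```python
from collections import defaultdict

# A transition swaps a purine for a purine (A<->G) or a pyrimidine for a
# pyrimidine (C<->T); every other change among A, C, G, T is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")


def is_transition(ref: str, alt: str) -> bool:
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a biallelic SNP: {ref}>{alt}")
    return frozenset((ref, alt)) in TRANSITIONS


def ts_tv_ratio(snps) -> float:
    """Ts/Tv for an iterable of (ref, alt) SNPs; NaN when there are no transversions."""
    ts = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("nan")


def ts_tv_by_frequency(sites, bin_edges):
    """Site count and Ts/Tv per non-reference allele frequency bin.

    `sites` holds (ref, alt, af) tuples; `bin_edges` is an increasing list of
    illustrative bin boundaries, and a site lands in the half-open bin
    [lo, hi). Sites outside every bin are ignored.
    """
    bins = defaultdict(list)
    for ref, alt, af in sites:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= af < hi:
                bins[(lo, hi)].append((ref, alt))
                break
    return {b: (len(v), ts_tv_ratio(v)) for b, v in sorted(bins.items())}
```

For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])` returns `3.0` (three transitions, one transversion). In practice, a filtered call set whose low-frequency bins move back toward a Ts/Tv of about 2 is the pattern the filtering is meant to produce.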
The x-axis shows the non-reference allele frequency of the imputed SNP on a log scale. The y-axis shows imputation accuracy, measured by aggregate r², when imputing SNP genotypes into 10 CEU samples. These results are based on genotypes from sites on the Illumina Core Exome SNP array.
The x-axis shows the non-reference allele frequency of the imputed SNP on a log scale. The y-axis shows imputation accuracy, measured by aggregate r², when imputing SNP genotypes into 10 CEU samples. These results are based on genotypes from sites on the Illumina OMNI 5M SNP array.
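The captions do not define aggregate r², so the sketch below assumes the commonly used definition: within an allele-frequency bin, pool the true genotypes (coded 0/1/2) and imputed dosages across all SNPs and compute a single squared Pearson correlation. Pooling is especially useful here because a per-SNP r² over only 10 samples would be very noisy for rare variants. The function names and input layout are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for constant input
    return sxy * sxy / (sxx * syy)


def aggregate_r2(snps):
    """Aggregate r^2 over a set of SNPs, e.g. all SNPs in one frequency bin.

    `snps` is a list of (true_genotypes, imputed_dosages) pairs, one pair per
    SNP, with genotypes coded 0/1/2 and dosages in [0, 2]. All values are
    pooled across SNPs before one squared correlation is computed.
    """
    truth, dosage = [], []
    for genotypes, dosages in snps:
        truth.extend(genotypes)
        dosage.extend(dosages)
    return pearson_r2(truth, dosage)
```

Perfectly imputed dosages give an aggregate r² of 1, while dosages that are uncorrelated with the true genotypes give 0.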
On the x-axis we show the number of studies (out of 20) in which a variant was called, and on the y-axis the number of studies whose cohort-specific internal QC pipeline filtered it out. The color shows the percentage of variants in each cell (red means more than 10% of variants lie in that cell; blue means less than 0.1%). The number at the top right of each cell is the Ts/Tv ratio for all sites in that cell. Cells higher in the plot contain variants that were filtered out relatively often; these are usually poor-quality variants, as their low Ts/Tv ratios also indicate. All variants above the red line were removed, that is, all cells in which variants had been filtered out independently by more than four studies or whose Ts/Tv ratio was below 1.7.
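The cell-based filtering rule in this caption can be sketched as follows, under the assumption that each variant is summarized by its alleles and the two counts plotted in the figure. The `Variant` record, function names and default thresholds are hypothetical stand-ins that mirror the caption (more than four independent QC failures, or a cell Ts/Tv below 1.7); this is not the consortium's actual QC code.

```python
from collections import defaultdict
from typing import NamedTuple

# Transitions: A<->G and C<->T; all other SNP changes are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


class Variant(NamedTuple):
    # Hypothetical record: alleles plus the two counts plotted in the figure.
    ref: str
    alt: str
    n_called: int    # studies (out of 20) in which the variant was called
    n_filtered: int  # studies whose internal QC filtered it out


def cell_ts_tv(members):
    """Ts/Tv of one cell; infinite if the cell contains only transitions."""
    ts = sum(frozenset((v.ref, v.alt)) in TRANSITIONS for v in members)
    tv = len(members) - ts
    return ts / tv if tv else float("inf")


def filter_by_cell(variants, max_filtered=4, min_ts_tv=1.7):
    """Drop every variant whose (n_called, n_filtered) cell fails either rule."""
    cells = defaultdict(list)
    for v in variants:
        cells[(v.n_called, v.n_filtered)].append(v)
    kept = []
    for (_, n_filtered), members in cells.items():
        if n_filtered > max_filtered or cell_ts_tv(members) < min_ts_tv:
            continue  # cell lies above the red line: drop all of its variants
        kept.extend(members)
    return kept
```

Because the decision is made per cell rather than per variant, a variant is judged by the aggregate behavior of all variants with the same calling and filtering history, which is what the Ts/Tv annotation in each cell summarizes.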
The figure shows a log-log plot of run time versus sample size for four different methods of genotype calling from genotype likelihood (GL) data. For each sample size, five random chunks of 1,024 sites from chromosome 20 were used. Each dot represents the run time on a single dataset. Lines connect the mean run times at successive sample sizes.
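On a log-log plot, the slope of run time against sample size approximates the scaling exponent of a method (a slope of about 1 suggests roughly linear scaling, about 2 roughly quadratic). The caption does not report fitted exponents; the sketch below is a generic way to estimate one from repeated timings by averaging runs per sample size, as the plotted lines do, and then fitting a least-squares line on the log scale. The input format and function names are illustrative assumptions.

```python
import math
from collections import defaultdict


def mean_runtimes(runs):
    """Average run time per sample size; `runs` is a list of (n_samples, seconds)."""
    groups = defaultdict(list)
    for n, seconds in runs:
        groups[n].append(seconds)
    return {n: sum(ts) / len(ts) for n, ts in sorted(groups.items())}


def loglog_slope(means):
    """Least-squares slope of log(mean run time) against log(sample size)."""
    xs = [math.log(n) for n in means]
    ys = [math.log(t) for t in means.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For instance, timings that grow exactly as the square of the sample size give a fitted slope of 2. Real timings also include fixed overheads, so slopes fitted over small sample sizes tend to understate the asymptotic exponent.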
About this article
Cite this article
the Haplotype Reference Consortium, McCarthy, S., Das, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279–1283 (2016). https://doi.org/10.1038/ng.3643