Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
The UK Biobank is a prospective cohort study with deep genetic, physical and health data collected on ~500,000 individuals across the United Kingdom from 2006 to 2010. This unprecedented open-access database has enabled studies an order of magnitude larger than before on genetic and epidemiological associations for an extensive range of health-related traits. The UK Biobank has generously made its datasets, and the research results derived from them, accessible to researchers as an open-access resource to benefit public health.
This collection accompanies the publication of the first main papers from UK Biobank in Nature and associated commentaries. We also highlight a selection of research publications from Nature journals that showcase how these UK Biobank datasets have already been used in a broad range of studies to advance the understanding of the genetic basis of disease, genetic epidemiology and public health.
- Orli G. Bahcall, Senior Editor, Nature
LISTEN: Professor Jonathan Marchini discusses the UK Biobank genetics and brain imaging publications in Nature. Podcast: UK Biobank opens a new era of health research.
UK Biobank publications
Genome-wide association studies of brain imaging data from 8,428 individuals in UK Biobank show that many of the 3,144 traits studied are heritable, and genes associated with individual phenotypes are identified.
Analysis of genotyping data for more than 150,000 individuals from the UK Biobank using long-range phase information sheds light on mechanisms of clonal haematopoiesis.
The UK Biobank combines detailed phenotyping and genotyping with tracking of long-term health outcomes in a large cohort. This study describes the recently launched brain-imaging component that will ultimately scan 100,000 individuals. Results from the first 5,000 subjects are reported, including thousands of associations, population modes and hypothesis-driven results.
News & Commentaries
Treatments tailored to individuals rely on the wisdom of crowds.
UK Biobank contains a wealth of data on genetics, health and more from 500,000 participants. A detailed overview of the biobank and an analysis of its brain-imaging data show the value of this resource.
Two studies in Nature describe the full data set of the UK Biobank resource, which contains genome-wide genetic data, clinical measurements and health records for ~500,000 individuals, and reveal insights into the brain’s genetic architecture.
Polygenic risk scores represent a giant leap for gene-based diagnostic tests. Here’s why they’re still so controversial.
Association analysis in over 329,000 individuals identifies 116 independent variants influencing neuroticism
Analysis of 329,000 individuals in the UK Biobank identifies 116 loci associated with neuroticism. Genes implicated are enriched in neuronal differentiation pathways, and genetic correlations between neuroticism and other mental health traits are elucidated.
Genome-wide analyses using UK Biobank data provide insights into the genetic architecture of osteoarthritis
Genome-wide association study for osteoarthritis using data from UK Biobank identifies loci for knee- and hip-specific disease. Functional analyses of chondrocytes provide further insight into candidate causal genes.
Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability
Genome-wide meta-analysis identifies >100 loci associated with hair color variation in humans of European ancestry. These loci explain a large portion of the heritability of this trait and provide insights into pathways regulating hair pigmentation.
BayesS estimates SNP-based heritability, polygenicity, and the relationship between effect size and minor allele frequency using genome-wide SNP data. Applying BayesS to UK Biobank data identifies signatures of natural selection for 23 complex traits.
Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence
Meta-analysis of genome-wide association studies for cognitive ability identifies 190 new loci and implicates 939 new genes related to neurogenesis, neuron differentiation and synaptic structure.
Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways
A meta-analysis of genome-wide association studies for neuroticism identifies novel loci, pathways and potential drug targets. Further analysis implicates specific brain regions and evaluates genetic overlap with other neuropsychiatric traits.
Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals
Gene discovery and polygenic predictions from a genome-wide association study of educational attainment in 1.1 million individuals.
This large, multi-ethnic genome-wide association study identifies 97 loci significantly associated with atrial fibrillation. These loci are enriched for genes involved in cardiac development, electrophysiology, structure and contractile function.
Large-scale association analyses identify 142 independent risk variants for atrial fibrillation. Pathway and functional enrichment analyses suggest that many of the putative risk genes act via cardiac structural remodeling.
Genome-wide analyses identify 42 risk loci for diverticular disease, 39 of which are new. Genes in associated regions are enriched for expression in connective tissue cell types and are coexpressed with genes involved in vascular and mesenchymal biology.
Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits
Association analyses in over 1 million individuals identify 535 new loci influencing blood pressure traits. The results provide new insights into blood pressure regulation and highlight shared genetic architecture between blood pressure and lifestyle exposures.
Neuroticism can be assessed as a composite score of individual items. Here, Nagel et al. perform genetic association studies for 12 neuroticism items and the sum-score and demonstrate genetic heterogeneity at the item-level.
Analysis of predicted loss-of-function variants in UK Biobank identifies variants protective for disease
Examination of predicted loss-of-function (pLOF) genetic variants allows direct identification of genes with therapeutic potential. Here, Emdin et al. perform association analysis for 3,759 pLOF variants with 24 traits and highlight protective variants against cardiometabolic and immune phenotypes.
Protein-truncating variants (PTVs) are predicted to significantly affect a gene’s function and, thus, human traits. Here, DeBoever et al. systematically analyze PTVs in more than 300,000 individuals across 135 phenotypes and identify 27 associations between PTVs and medical conditions.
Testing the association between genetic variants and a range of phenotypes can assist drug development. Here, in a phenome-wide association study in up to 697,815 individuals, Diogo et al. identify genotype–phenotype associations predicting efficacy, alternative indications or adverse drug effects.
Little is known about the genetic determinants of social isolation and loneliness despite their well-established importance for health. Here, using multi-trait GWAS, Day et al. identify 15 genomic loci for loneliness and further show a bidirectional causal relationship between BMI and loneliness by Mendelian randomization (MR).
Genome-wide association study for risk taking propensity indicates shared pathways with body mass index
Emma Clifton et al. report a genome-wide association study of risk-taking propensity among UK Biobank participants. They identify 26 loci, 24 of which are novel, and use Mendelian randomisation analysis to explore the relationship between risk-taking propensity and BMI.
Genome-wide association study of developmental dysplasia of the hip identifies an association with GDF5
Konstantinos Hatzikotoulas et al. report the largest genome-wide association study to date for developmental dysplasia of the hip using national clinical audit data from the UK. They find a significant association with the GDF5 locus and evidence for shared genetic architecture with hip osteoarthritis.
Rosa Thorolfsdottir et al. report a genome-wide association study of atrial fibrillation in 29,502 cases and 767,760 controls from Iceland and the UK Biobank. They identify a significant association with coding variants in RPL3L, the first ribosomal gene implicated in atrial fibrillation, and MYZAP, an intercalated disc gene.
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies
SAIGE (Scalable and Accurate Implementation of GEneralized mixed model) is a generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for >1,400 binary phenotypes.
Identifying loci affecting trait variability and detecting interactions in genome-wide association studies
The heteroskedastic linear mixed model is a new framework for testing both mean and variance effects on quantitative traits. Applying the heteroskedastic linear mixed model to body mass index in the UK Biobank shows that the approach increases the power to detect associated loci.
MTAG is a new method for joint analysis of summary statistics from genome-wide association studies of different traits. Applying MTAG to summary statistics for depressive symptoms, neuroticism and subjective well-being increased discovery of associated loci as compared to single-trait analyses.