Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A. Day-Williams, J. McElwee, D. Diogo, W. Astle, E. Di Angelantonio, E. Birney, A. Richard, J. Mason and M. Inouye commented on the manuscript, and M. Sharp helped with mapping drug indications to GWAS traits. We thank INTERVAL study participants; staff at recruiting NHSBT blood donation centres; and the INTERVAL Study Co-ordination team, Operations Team (led by R. Houghton and C. Moore) and Data Management Team (led by M. Walker). Funding sources are listed in the Supplementary Information.
Reviewer information
Nature thanks T. Lappalainen, M. McCarthy and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Flowchart of sample processing and quality control stages for proteomic and genetic measurements before genetic analyses.
SDS–PAGE with Alexa-647-labelled proteins captured by the IL1RL2 SOMAmer (a) or GP1BA SOMAmer (b). For each protein target, the protein captured by the SOMAmer is compared to the standard. The cognate targets are the only ones with protein visible in the capture lanes, whereas the proteins homologous to the target proteins show no evidence of binding. These experiments were performed once. MW markers, molecular weight markers.
Extended Data Fig. 3 Evidence for the reliability of protein measurements made using the SOMAscan assay.
a, Distribution of coefficients of variation of all proteins on the SOMAscan assay in each subcohort. b, Spearman’s correlations for all proteins passing QC derived from contemporaneous assay of baseline and two-year samples from 60 participants. c, Scatterplot of pQTL effect size estimates from SOMAscan versus Olink showing all 163 pQTLs tested (top) and the 106 that replicated (bottom). r is Pearson’s correlation coefficient. d, Distribution of inflation factors across proteins that underwent genome-wide association testing, stratified by subcohort and allele frequency (MAF ≥ 5%, MAF < 5%).
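As an illustration of the quantity summarized in panel d, the following is a minimal sketch of how a genomic inflation factor is commonly computed from association P values. It assumes two-sided P values in the interval (0, 1] and the usual median-based definition; this section does not state the exact procedure used in the study, and the function name `inflation_factor` is hypothetical.

```python
from statistics import NormalDist, median


def inflation_factor(p_values):
    """Median observed 1-d.f. chi-square statistic divided by its
    expected median under the null (assumed definition, see lead-in)."""
    std_normal = NormalDist()
    # Convert each two-sided P value to a 1-d.f. chi-square statistic (z squared).
    chi_sq = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of a 1-d.f. chi-square distribution, approximately 0.455.
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi_sq) / expected_median


# Evenly spaced P values mimic a well-calibrated (null) test.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = inflation_factor(null_like)
```

Values close to 1 indicate little inflation of test statistics; values well above 1 can indicate confounding or polygenicity.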
a, Regional association plots of the trans pQTL (sentinel variant rs11079936) for GDF11/8 before and after adjusting for levels of WFIKKN2 (upper panels), and the WFIKKN2 cis pQTL after adjusting for GDF11/8 levels (bottom panel). A similar pattern of association for WFIKKN2 was seen before GDF11/8 adjustment (not shown). b, Attenuation of the GDF11/8 trans pQTL upon adjustment for plasma levels of the cis protein WFIKKN2.
pQTL mapping in n = 3,301 individuals. a, Distribution of the predicted consequences of the sentinel pQTL variants compared to matched permuted null sets of variants, stratified by cis and trans. Asterisks indicate empirical enrichment using a permutation test (10,000 permuted sets of non-associated variants) at a Bonferroni-corrected significance value (P < 0.005). Bar height represents the mean proportion of variants within each class and error bars reflect one standard deviation from the mean. b, Number of proteins associated (P < 1.5 × 10⁻¹¹) with each sentinel variant across the genome.
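The permutation-based enrichment test in panel a can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a one-sided test and the common "add one" convention for empirical P values, and the names `empirical_p` and `bonferroni_threshold` are hypothetical. A threshold of 0.005 would be consistent with 0.05 divided across ten tests, but the number of tests is an assumption here.

```python
def empirical_p(observed, null_values):
    """One-sided empirical P value for enrichment against permuted null sets."""
    n_as_extreme = sum(1 for value in null_values if value >= observed)
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (n_as_extreme + 1) / (len(null_values) + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy example: proportion of sentinel variants in one consequence class,
# compared with the same proportion in 10,000 permuted sets (made-up values).
null_proportions = [0.10] * 9999 + [0.40]
p_enrichment = empirical_p(0.30, null_proportions)
is_enriched = p_enrichment < bonferroni_threshold(0.05, 10)
```

With 10,000 permuted sets, the smallest attainable empirical P value under this convention is 1/10,001.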
Circle shows enrichment for DNase I hypersensitive sites (‘hotspots’) for each of 55 tissues (183 cell types) available from the ENCODE and Roadmap Epigenomics projects, with tissues or cell types clustered and coloured by anatomical grouping. Some tissues have multiple values owing to the availability of multiple cell types or multiple tests per cell type. Radial lines show fold-enrichment, and dots around the inside edge of the circle denote statistically significant enrichment at a Bonferroni-corrected significance threshold of P < 5 × 10⁻⁵. Enrichment testing was performed using GARFIELD (which tests enrichment against permuted sets of variants matched for MAF, distance to TSS and LD). pQTL data from n = 3,301 individuals.
Extended Data Fig. 7 Scheme outlining the combined ‘bottom-up’ and ‘top-down’ process used for candidate gene annotation of trans pQTL regions.
See Methods. GbA, guilt-by-association; KEGG, Kyoto Encyclopedia of Genes and Genomes; OMIM, Online Mendelian Inheritance in Man; STRINGdb, STRING database.
These experiments were repeated three times independently with similar results. a, SOMAmer pulldowns with purified PR3, A1AT, and PR3–A1AT complex. SOMAmer PRTN3.3514.49.2 enriched the PR3–A1AT complex to a much greater degree than free PR3. Conversely, SOMAmer PRTN3.13720.95.3 enriched free PR3 to a greater degree than the PR3–A1AT complex. b, Solution affinity of PRTN3.3514.49.2 and PRTN3.13720.95.3 for PR3, A1AT, and the PR3–A1AT complex. SOMAmer PRTN3.3514.49.2 has a higher affinity for the PR3–A1AT complex than for free PR3. SOMAmer PRTN3.13720.95.3, on the other hand, has a higher affinity for free PR3 than SOMAmer PRTN3.3514.49.2. c, Competitive binding of SOMAmers PRTN3.13720.95.3 and PRTN3.3514.49.2 to PR3. A limiting amount of radiolabelled PRTN3.13720.95.3 was incubated with 1 nM proteinase-3 and a titration of either cold PRTN3.13720.95.3 or cold PRTN3.3514.49.2.
Comparison between a randomized controlled trial and Mendelian randomization to assess the causal effect of changes in protein biomarker levels on disease risk.
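To make the Mendelian randomization side of this comparison concrete, the following is a minimal sketch of two standard estimators: the Wald ratio for a single genetic variant and a fixed-effect inverse-variance-weighted (IVW) combination across independent variants. These are generic textbook estimators shown under stated assumptions (first-order standard errors, uncorrelated variants); this section does not specify which estimator the study used, and all function names and numbers are hypothetical.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one variant: variant-outcome effect divided by
    variant-exposure effect, with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(ratios):
    """Fixed-effect inverse-variance-weighted mean of (estimate, se) pairs,
    assuming the variants are independent."""
    weights = [1.0 / se ** 2 for _, se in ratios]
    total_weight = sum(weights)
    estimate = sum(w * est for w, (est, _) in zip(weights, ratios)) / total_weight
    return estimate, sqrt(1.0 / total_weight)


# Made-up summary statistics: effect of each variant on protein level
# (exposure) and on disease risk (outcome, log-odds scale).
single = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
combined = ivw_estimate([single, wald_ratio(0.5, 0.1, 0.02)])
```

The analogy with a trial is that the randomly inherited allele plays the role of the randomized allocation, so the ratio estimates the effect of a change in protein level on disease risk, provided the variant affects the disease only through the protein.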
a, Compartment distribution with annotations of all proteins in the Human Protein Atlas for comparison. b, GO molecular functions.
This file contains funding details, full Supplementary Table Legends, Supplementary Notes and Supplementary References.
A three-dimensional interactive plot of sentinel variant–protein associations (red, cis; blue, trans). The x axis (“pQTL position”) represents the position of the sentinel variant along chromosomes 1–22. The y axis (“Protein position”) represents the start position of the gene encoding the protein. The z axis represents the −log10(P) of the association. Additional details can be viewed by hovering over the points. Clicking on cis or trans in the legend toggles the display of the corresponding points. Additional viewing controls are available at the top right of the window. For clarity, associations with P < 10⁻³⁰⁰ (diamonds) are plotted at −log10(P) = 300. The plot was generated using the “plotly” R package v4.5.6 (Plotly Technologies Inc., Montréal, Canada).
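The capping of extreme associations described above can be sketched as follows. The original plot was generated in R with plotly; this Python fragment illustrates only the transformation of P values onto the capped z axis, and the function name `capped_neg_log10` is hypothetical.

```python
import math


def capped_neg_log10(p_value, cap=300.0):
    """Return -log10(P), limited to `cap` for very small P values."""
    if p_value <= 0.0:
        # A P value that has underflowed to zero is plotted at the cap.
        return cap
    return min(-math.log10(p_value), cap)


z_values = [capped_neg_log10(p) for p in (1e-5, 1e-310, 0.0)]
```

Capping keeps the axis range readable and avoids infinite values when a P value underflows floating-point precision.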
Supplementary Tables 1-21 – see Supplementary Information file for full descriptions