Genetic studies of plasma analytes identify novel potential biomarkers for several complex traits

Genome-wide association studies of 146 plasma protein levels in 818 individuals revealed 56 genome-wide significant associations (28 novel) with 47 analytes. Loci associated with plasma levels of 39 proteins tested have been previously associated with various complex traits such as heart disease, inflammatory bowel disease, Type 2 diabetes, and multiple sclerosis. These data suggest that these plasma protein levels may constitute informative endophenotypes for these complex traits. We found three potential pleiotropic genes: ABO for plasma SELE and ACE levels, FUT2 for CA19-9 and CEA plasma levels, and APOE for ApoE and CRP levels. We also found multiple independent signals in loci associated with plasma levels of ApoH, CA19-9, FetuinA, IL6r, and LPa. Our study highlights the power of biological traits for genetic studies to identify genetic variants influencing clinically relevant traits, potential pleiotropic effects, and complex disease associations in the same locus.

Scientific Reports | 6:18092 | DOI: 10.1038/srep18092

Technological developments have made possible the quantification of multiple proteins in a single analytical procedure, allowing both broader and deeper molecular profiling of large cohorts 2,7-10 . Genetic analyses of these data have discovered numerous genomic regions associated with clinically relevant proteins, with recent large-scale proteome analyses having identified many loci associated with serum and plasma concentrations of individual proteins 2,7-10 . Nevertheless, our understanding of the genetic basis and pathophysiological impact of variations in protein levels remains far from complete. Most of these studies limited analyses to cis variants or focused on candidate regions rather than genome-wide scans 2,7-9 . Recent research suggests the importance of investigating protein phenotypes beyond those used in traditional genetic studies 10 .
Here we present the results of an unbiased large genetic investigation of protein phenotypes in 818 unrelated individuals from the Washington University Knight Alzheimer's Disease Research Center (KADRC) and Alzheimer's Disease Neuroimaging Initiative (ADNI) who were analyzed for both genome-wide SNP genotypes and for 146 phenotypic measures obtained from multi-analyte panels (Human DiscoveryMAP) of human plasma samples.

Results
Before any genetic analyses we performed extensive quality control (QC) in the genotype and phenotype data. After log transformation and standardization (see materials and methods) we confirmed that the protein levels followed a normal distribution. We also tested the correlation between the analyte values and covariates such as age, gender, and Alzheimer's disease (AD) status (Supplementary Tables S1 and S2). Age, gender, disease status, study, and principal components factors (PCs) from population stratification were included as covariates.
We decided to perform a one-stage GWAS rather than a two-stage GWAS because 1) we have GWAS for all the samples, and 2) it has been shown that combining data from both stages of a two-stage GWAS to perform a single analysis almost always has increased power to identify genetic association than analyzing the groups separately even though a lower statistical threshold is required to determine significance 11 . So to maximize our statistical power, we combined the two datasets to perform a joint one-stage GWAS with all 818 individuals from ADNI and KADRC (characteristics shown in Table 1). To verify our results, we followed up with additional analyses stratified by study and performed meta-analyses of the results from each dataset for each analyte, and we found that the p-values from the meta-analyses were similar to the joint GWAS p-values (Supplementary Table S3). In order to avoid spurious association and consider a single nucleotide polymorphism (SNP) as a real signal, we required each genome-wide significant association from the joint analysis to meet additional criteria: 1) the SNP association had to be consistent between the two series, in the same direction and with similar effect size, which represents an internal replication (Supplementary Table S3) and 2) since we were using cohorts from AD studies, we wanted to be sure our results were not confounded by AD status. In addition to using AD status as a covariate in our initial analyses, we performed separate GWAS on cases and controls and found no difference in effect size or direction indicating the associations found in the combined GWAS were not confounded by AD status (Supplementary Table S4).
We decided to use the common threshold for genome-wide significance (p < 5.0 × 10 −8 ) instead of p < 3.42 × 10 −10 (Bonferroni multiple test correction taking into account SNPs and phenotypes) because the latter assumes that all the analytes are independent and uncorrelated. However, there is extensive evidence that this is not the case, and in a recent study we demonstrated that some analytes are highly correlated 12 . Additionally, five of the associations in this study in the p = 5 × 10 −8 to 3.42 × 10 −10 range have been previously reported, and others are located in receptors and genes known to regulate levels of the analyte (Table 2 and Supplementary Table S5), which indicates that these are real signals. We also found complex loci and potential pleiotropic effects that support the evidence that not all of the SNPs and analytes act independently of others. These findings suggest that a multiple test correction threshold of p < 3.42 × 10 −10 would be too stringent. For this reason we decided to report all the loci with p < 5.0 × 10 −8 , but we also highlight in Table 2 those that pass the p < 3.42 × 10 −10 threshold.
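The relationship between the two thresholds can be checked with a short calculation. The sketch below assumes that the stricter value is obtained by dividing the conventional genome-wide threshold by the 146 analytes tested, which reproduces 3.42 × 10 −10 ; the function names and tier labels are illustrative and are not part of the original analysis pipeline. The three example p-values are those reported in this study for MIP1b, ACE, and vWF.

```python
GENOME_WIDE_ALPHA = 5.0e-8   # conventional genome-wide threshold
N_ANALYTES = 146             # analytes that passed QC in this study


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: the stricter threshold corrects the genome-wide threshold
# for the number of analytes (phenotypes) tested.
STRICT_ALPHA = bonferroni_threshold(GENOME_WIDE_ALPHA, N_ANALYTES)


def significance_tier(p_value: float) -> str:
    """Label a p-value by the strictest threshold it passes (labels are illustrative)."""
    if p_value < STRICT_ALPHA:
        return "passes analyte-corrected threshold"
    if p_value < GENOME_WIDE_ALPHA:
        return "genome-wide significant only"
    return "not genome-wide significant"


if __name__ == "__main__":
    print(f"analyte-corrected threshold: {STRICT_ALPHA:.2e}")
    for p in (2.58e-10, 1.90e-8, 8.87e-8):
        print(f"p = {p:.2e}: {significance_tier(p)}")
```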
Genome-wide association study results. After performing the linear regression with each analyte as a phenotype, there were 56 genome-wide significant loci for 47 analytes (Table 2). Twenty-eight of these associations have been reported in the literature previously and 28 (50%) were novel. Thirty-two of the 56 associations (9 novel) pass the p < 3.42 × 10 −10 threshold.
Previously reported findings. Twenty-eight of our genome-wide signals replicated associations reported by 14 different genetic studies of plasma or serum protein levels in humans (Table 2 and Supplementary Table S5) 6,9,13-24 . Six of our most significant SNPs were the same SNPs reported previously and the remaining SNPs were in linkage disequilibrium (LD) with reported SNPs (Supplementary Table S5). Fifteen of these 28 genome-wide loci had p < 3.42 × 10 −10 in our study and five others were in the p = 5 × 10 −8 to 3.42 × 10 −10 range, indicating that signals in this range in our study constitute strong associations. Twenty-three of these previously reported loci are cis effects and five are trans effects (Table 2). None of the trans effects are located in untranslated regions (UTR), but two of the analytes had cis effects that are in the UTR (CD40: 5′ UTR of CD40 and HCC4: 3′ UTR of CCL16). All of the trans effects are located within genes (four coding, one intronic) whose interactions with the analyte are unknown or not well understood (Table 2). However, our results and the previously published studies suggest that these trans loci play an important role in regulating the levels of CA19-9, CEA, CRP, SELE, and ACE in plasma 15,17,21,24 . More interestingly, some of these loci, like ABO or FUT2, are genome-wide significant for more than one analyte, which also indicates that these may constitute master regulatory signals (see pleiotropic section).
Twenty-three of our 28 novel findings were trans effects. Twelve analytes were associated with loci that contained only intergenic SNPs and eleven analytes (ANG2, BLC, CEA, F7, FGF4, GROa, MIP1b, MMP7, RAGE, THPO, TNC) were associated with SNPs on intronic regions in gene-rich areas. Interestingly some of these loci contain intronic SNPs that are likely to be regulatory based on RegulomeDB 25 : SCARA5 (associated with TNC levels) and PARVG (associated with ANG2 levels) contain SNPs with RegulomeDB 25 scores lower than 3 (Supplementary Table  S6). Plasma MIP1b levels were also associated with a locus that contains SNPs that are likely to be regulatory. We found that rs145617407, located in the intron of CCR3, was significantly associated with MIP1b levels in plasma (p = 2.58 × 10 −10 ) and this SNP is located less than 119 KB from CCR5 which is the receptor for CCL4/MIP1b ( Supplementary Fig. S21).

GWAS Conditional on top hits revealed additional signals within same loci.
We then performed conditional analyses to determine whether more than one signal exists at the same locus. When we added the most significant SNP to the linear regression model, five analytes (ApoH, CA19-9, FetuinA, IL6r, and LPa) still showed independent, genome-wide significant SNPs at the same locus (Fig. 1, Table 3 and Supplementary Fig. S5, S13, S17 and S19). It is interesting to note that three of four of the complex loci we found were in cis with the respective protein whereas the FUT2/FUT6/FUT3 locus was associated with CA19-9 plasma levels. Since we used the traditional genome-wide p-value threshold (p < 5 × 10 −8 ) for the conditional analyses, we may be missing some additional independent signals.
After conditioning on rs2070633, located in an AHSG intron, we found that rs4917, a missense variant also located in AHSG, was still significantly associated with plasma levels of FetuinA (p = 7.27 × 10 −9 , original p = 2.61 × 10 −42 ; Table 3 and Supplementary Fig. S13). After conditioning on both SNPs no additional signals were found. An intronic variant in IL6R, rs7526131, was still significantly associated with IL6r plasma levels after conditioning on rs12126142, also located in an intron of IL6R (p = 1.43 × 10 −10 , original p = 4.47 × 10 −72 ; Table 3 and Supplementary Fig. S17). Plasma levels of LPa were significantly associated with rs783147, located in an intron of PLG 506 KB from LPA, and after conditioning on this SNP an intronic variant of SLC22A1 approximately 0.4 MB from LPA (rs783147), was still significantly associated with LPa levels (p = 1.64 × 10 −9 , original p = 9.86 × 10 −9 ; Table 3 and Supplementary Fig. S19).
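The logic of conditioning on the top SNP can be illustrated with a small, self-contained sketch. It is not the PLINK-based analysis used in this study: it omits the covariates and p-values, uses hypothetical genotype dosages, and relies on the Frisch-Waugh-Lovell result that the coefficient of a candidate SNP in a regression that also includes the top SNP equals the slope obtained after both the phenotype and the candidate SNP have been residualized on the top SNP. The function names are illustrative.

```python
from typing import List, Sequence


def residualize(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]


def conditional_beta(phenotype: Sequence[float],
                     candidate: Sequence[float],
                     top_snp: Sequence[float]) -> float:
    """Effect of the candidate SNP with the top SNP included as a covariate."""
    res_y = residualize(phenotype, top_snp)
    res_x = residualize(candidate, top_snp)
    return sum(a * b for a, b in zip(res_x, res_y)) / sum(a * a for a in res_x)


# Hypothetical allele dosages (0, 1 or 2 copies) for eight individuals.
top_snp = [0, 1, 2, 0, 1, 2, 0, 1]
second_snp = [0, 0, 1, 1, 2, 2, 0, 1]

# Noise-free toy phenotypes: one locus with two independent signals,
# and one locus where only the top SNP has an effect.
two_signals = [1.0 * t + 0.5 * s for t, s in zip(top_snp, second_snp)]
one_signal = [1.0 * t for t in top_snp]

beta_two = conditional_beta(two_signals, second_snp, top_snp)
beta_one = conditional_beta(one_signal, second_snp, top_snp)

if __name__ == "__main__":
    print(f"second signal present: conditional beta = {beta_two:.3f}")
    print(f"no second signal:      conditional beta = {beta_one:.3f}")
```

In the toy data the second SNP keeps its effect after conditioning only when it contributes to the phenotype independently of the top SNP, which is the pattern that defines a secondary signal in the analyses above.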
Potential pleiotropy. In addition to finding that some proteins have complex regulation within the structural gene (or a different gene in the case of CA19-9), we also found potentially pleiotropic effects with one gene affecting more than one protein. Potential pleiotropic effects were found for three groups of analyte/associations even though the analyte levels were not correlated: ABO associated with plasma levels of SELE, ACE, and vWF (p = 1.01 × 10 −52 , beta = -0.882; p = 1.90 × 10 −8 , beta = -0.352; p = 8.87 × 10 −8 , beta = 0.253 respectively; Table 4 and Fig. 2). ABO has been previously reported to be associated with ACE activity 26 and SELE plasma and serum levels 15,17 . ABO has also been associated with vWF plasma levels, and although the locus did not reach genome-wide significance in our analysis it was very close 27 .

Figure caption (partial): (b) Regional plot for genome-wide significant association on chromosome 17 with ApoH plasma levels; (c) Regional plot for genome-wide significant association on chromosome 18 with ApoH plasma levels; (d) Regional plot for genome-wide significant association on chromosome 9 with ApoH plasma levels; (e) Regional plot for genome-wide significant association on chromosome 17 after conditioning for rs2873966; (f) Regional plot for genome-wide significant association on chromosome 17 after conditioning for rs2873966 and rs17690171.

Interestingly none of these analyte pairs or trios are highly correlated (r < 0.25; Table 4), which again supports the idea that these loci (ABO, FUT2, and APOE-TOMM40 region) are truly master-regulatory regions, that protein levels are highly and complexly regulated, and that studying the genetic architecture of biological traits can lead to a deeper knowledge of the biological processes.
Impact of these findings with complex diseases. Of the 56 loci that we found associated with plasma protein levels, 46 loci have also been reported to be associated with complex traits and diseases including coronary artery disease (ACE and SELE), stroke (ACE and SELE), various cancers (ACE, CA19-9, CEA, RAGE, and SELE), age-related macular degeneration (ApoE, CFHR1, and CRP), periodontitis (ApoH), multiple sclerosis (BLC and CD40), inflammatory bowel disease (CD40 and ENA78), and Type 2 diabetes (IL13, MCSF, and RAGE) ( Table 5; see supplementary results for a complete description). As an example, the AGER variant rs2070600, which in our study was associated with plasma RAGE levels (p = 1.86 × 10 −11 ) has been reported to be associated with pulmonary function 28 . A recent study of RAGE plasma levels suggests they are a promising biomarker for acute respiratory distress syndrome, supporting our hypothesis 29 .
Similarly our genetic analysis for BLC revealed a significant association with SNPs located in DDAH1 (rs7541151, p = 6.44 × 10 −9 ; Table 2), a gene that has been associated with multiple sclerosis (MS). Interestingly BLC levels have recently been reported to be different between patients with MS and controls 30 , which further supports BLC as a potential biomarker.
Since levels of CD40 in plasma were associated with the CD40 locus and CD40 variants have been associated with MS in three independent GWAS 30-32 , we hypothesized that plasma levels of CD40 may also be associated with MS status. As a proof of concept, we used a Quantikine sandwich ELISA kit (R&D Systems cat #DCCD40) to measure plasma levels of CD40 in 20 individuals with relapsing remitting MS in remission at time of plasma collection (   We found plasma levels of CD40 were significantly higher in MS cases (753.26 ± 235.71 pg/mL) than controls (603.02 ± 139.01 pg/mL; p = 0.041, beta = − 1.837; Fig. 3), supporting our hypothesis.
More than half of the loci associated with plasma protein levels in our study have been previously reported to be associated with various complex diseases. Based on the current knowledge for RAGE and BLC, and in the concept of Mendelian randomization, we hypothesize that these protein levels constitute informative biomarkers for these complex traits although additional studies would be necessary to validate this hypothesis. More detailed information about potential novel biomarkers for complex traits is included in Supplementary Results and analyte abbreviations with full names are in Supplementary Table S8.

Figure caption (partial): Regional plot for genome-wide significant associations in ABO locus with ACE plasma levels; (c) Manhattan plot of −log 10 p-values for association with plasma levels of SELE; (d) Regional plot for genome-wide significant associations in ABO locus with SELE plasma levels; (e) Manhattan plot of −log 10 p-values for association with plasma levels of vWF; (f) Regional plot for associations in ABO locus with vWF plasma levels, rs687289 was close to genome-wide significance (p = 8.87 × 10 −8 ).

Discussion
GWAS of complex traits have been very successful in identifying novel loci associated with those traits, but these studies require extremely large sample sizes, and in some cases it is difficult to interpret the results because the associations are with surrogate tag SNPs which may not be the causal SNPs. Many loci contain multiple genes which also makes it difficult to determine the causal gene or variant. Additionally some loci are located in non-protein coding regions where functional effects are poorly understood. Genetic analyses of biological traits may provide more power than traditional GWAS and may be more informative about the biological effects for specific loci.
Using a more unbiased approach than previous genetic studies, we were able to replicate many previously reported associations with various plasma protein levels and uncover several novel associations that could warrant further research. The results from our careful analyses suggest that even though we utilized two datasets from Alzheimer's disease studies there was no confounding effect due to disease status or dataset. Combining datasets from high-throughput technologies that deliver genome-wide genetic data and quantification of protein levels in a single procedure provides a great deal of power to analyses that may help researchers understand the biology of complex traits including the complex loci involved and pleiotropic effects.
Our results clearly indicate that the protein levels are highly and complexly regulated. We found master regulatory regions (pleiotropic; Table 4, Fig. 2, and Supplementary Fig. S4, S10) as well as several independent regulatory elements in the same locus for the same proteins (Table 3, Fig. 1, and Supplementary Fig. S5, S13, S17 and S19). We found protein levels associated with variants in or near the gene coding that protein (cis effects) as well as variants located elsewhere in the genome (trans effects) demonstrating that protein levels are not only affected by the genes that encode the protein but also by interaction with other proteins as in the case of ABO or FUT2 (Table 4).
Interestingly, we found that for almost half of the cis effects (13 out of 28), the association could be explained by a coding variant but for the trans effects most of the loci (24 out of 28) only contain regulatory variants ( Table 2). Although these non-coding signals could be synthetic association and are being driven by low frequency variants, our results and those recently published by ENCODE and the GTEx consortium would suggest that those associations are likely to affect gene expression 33,34 . For this same reason, it is more likely that the association in cis (more frequently due to a non-synonymous variant) will present a higher effect size and are easier to identify in a genetic study than a trans signal, which is more likely to affect gene expression through regulation. Table 2 shows that most of the trans effects associated with plasma protein levels had less significant p-values and lower betas than most of the cis effects. This could explain why only three of the trans effects we found were previously reported while the other 24 were novel. It is of vital importance to identify trans effects because that will help us to identify novel biological interactions and pathways. Of the 28 trans effects we found in our study, only one corresponded to a protein that constituted the receptor of the studied analyte or a gene known to interact directly with the analyte (rs145617407 located less than 119 KB from CCR5 which is the receptor for CCL4/ MIP1b) 35 . However, the fact that the associations of SELE, ACE, and vWF with the ABO locus or CA19-9 and CEA with FUT2 have been identified in other studies, indicates that these signals are real and some of these novel loci may be implicated in regulating the levels of one or more proteins. Additional work is needed because currently it is not clear how ABO regulates plasma levels of SELE, ACE, and vWF or how FUT2 regulates CEA and CA19-9 levels. 
For the novel loci this can be more complicated because several signals are located in very gene-rich regions and several genes could drive the association (Fig. 1 and Supplementary Fig. S1, S6, S8, S10, S21, S24, S28, S29, S33, S36-S37, S42, S44, S46).
Another important finding related to this study is its implication on complex traits. Proteins play a key role in many complex traits, so understanding the genetic variations associated with protein levels is important in understanding the biological basis of these traits. We used the concepts of Mendelian randomization, our data, and the data from the NHGRI GWAS catalog to identify genetic regions that are genome-wide significant for various analyte levels as well as previously associated with complex traits. While most of these loci have been associated with complex traits, the associations of most of the plasma analytes with the complex traits have not been previously reported. Our results suggest that some of these plasma protein levels could be novel biomarkers or even endophenotypes for these complex traits.
As an example of our approach providing information useful for understanding potential pleiotropic effects in promising biomarkers for complex diseases that has been supported by previous research: rs485073 in FUT2 was associated in our study with plasma levels of both CEA and CA19-9, which are only weakly correlated in plasma (r = 0.166, p = 2.98 × 10 −6 ). This potential pleiotropy strongly suggests that rs485073 is part of a master regulatory region. In this case this means that plasma levels of CEA and CA19-9 could be important for understanding gastric cancer because FUT2 variants have also been associated with gastric cancer risk 36 . This is further supported by the fact that both CEA and CA19-9 have been reported as FDA approved biomarkers for other types of cancer 37 .
We found several promising plasma biomarkers for complex traits including IL13, ENA78, BLC, and CD40. Based on our results, plasma levels of IL13 may be informative in Type 2 diabetes research. We found rs7433647, located near UBE2E2, was associated with IL13 plasma levels (p = 1.21 × 10 −8 ). UBE2E2 has previously been associated with Type 2 diabetes in a large GWAS meta-analysis of more than 26,000 cases and 83,000 controls with varied ancestry 38 . A recent study using a mouse model for Type 2 diabetes suggests that expression of IL13 plays a key role in adipose tissue inflammation and insulin resistance, further supporting the idea that IL13 levels may be important in studying Type 2 diabetes 39 . ENA78/CXCL5 expression is elevated in the inflamed tissues of patients with rheumatoid arthritis, ulcerative colitis and Crohn's disease 40,41 . Several studies have reported association of CXCL5 variants with inflammatory bowel disease and metabolite levels 42,43 . In our study rs409336, near the CXCL5 gene, showed the strongest effect on plasma ENA78/CXCL5 levels. Because of the similarity in genetic influences on ENA78/CXCL5 levels and inflammatory bowel disease, it is possible that these traits share a common pathophysiological pathway and our findings support further investigation of the involvement of ENA78/CXCL5 in the etiology of inflammatory bowel disease.
We found two promising plasma protein biomarkers for MS: BLC and CD40. In our study rs7541151 in DDAH1 was associated with plasma BLC levels. DDAH1 is responsible for the degradation of ADMA into citrulline and dimethylamine, and previous studies showed an association of DDAH1 variants with MS and ADMA levels 30,44 .
Previous studies indicate that CSF levels of BLC/CXCL13 may be an informative biomarker for studying treatment effects in MS 45-47 . Our results indicate plasma BLC/CXCL13 levels may be informative as well. The CD40 locus has been associated with MS 30-32 but our study appears to be the first to associate CD40 plasma levels with CD40 variants. Plasma levels of CD40 have not previously been reported as a potential biomarker for MS, but our preliminary data suggest they may be one. Although we did find a significant difference in CD40 levels in plasma between MS cases and controls, our sample size was small and only contained patients in remission, so it would be prudent to evaluate a larger, more varied cohort to determine the possible utility of plasma levels of CD40 as an MS biomarker.  Table 1.

Washington University Knight Alzheimer's Disease Research Center (KADRC) cohort. The KADRC sample included 124 AD cases and 188 cognitively normal controls. These individuals were evaluated by Clinical Core personnel of Washington University. Cases received a clinical diagnosis of Alzheimer's disease in accordance with standard criteria and dementia severity was determined using the Clinical Dementia Rating (CDR) 48 . Plasma from all KADRC samples was collected in the morning after an overnight fast, immediately centrifuged, and stored at −80 °C until assayed according to standard procedures 49 .
Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The ADNI sample included 434 AD cases and 72 cognitively normal controls. Data used in the preparation of this article were obtained from the ADNI database (http://adni.loni.usc.edu/). See Supplementary Methods for further information about ADNI's methods and for up-to-date information see http://www.adni-info.org/. Plasma was collected in the morning after an overnight fast, immediately centrifuged, and stored at −80 °C until assayed as described previously 9 . Genetic and phenotypic data for 506 samples were available for this study.
Genotyping and Quality Control. The ADNI protocol for collecting genomic DNA samples has been previously described 50 . All ADNI samples were genotyped using the Illumina Human610-Quad BeadChip, which contains over 600,000 SNP markers. KADRC samples were genotyped with the Human610-Quad BeadChip or the Omniexpress chip 51 . Prior to association analysis, all samples and genotypes underwent stringent QC. Genotype data was cleaned using PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) 52 by applying a minimum call rate for SNPs and individuals (98%) and minimum minor allele frequencies (MAF = 0.02). SNPs not in Hardy-Weinberg equilibrium (P < 1 × 10 −6 ) were excluded. Gender identification was verified by analysis of X-chromosome SNPs. We tested for unanticipated duplicates and cryptic relatedness (Pihat ≥ 0.5) using pairwise genome-wide estimates of proportion identity-by-descent using PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) 52 . When a pair of identical samples or a pair of samples with cryptic relatedness was identified, the sample with a higher number of SNPs that passed QC was prioritized. EIGENSTRAT 53 was used for each cohort separately to calculate principal component factors for each sample and confirm the ethnicity of the samples. The 1000 genomes data (June 2011 release) and BEAGLE v3.3.1 54 were used to impute up to 6 million SNPs. SNPs with a BEAGLE R 2 < 0.3, a minor allele frequency (MAF) < 0.025, a call rate lower than 95%, a Gprobs score lower than 0.90 and those out of Hardy-Weinberg equilibrium (p < 1 × 10 −5 ) were removed. After imputation, 5,815,690 SNPs passed the QC process.
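As an illustration of the pre-imputation SNP filters, the following sketch computes the call rate and minor allele frequency for a single SNP and applies the 98% and 0.02 cut-offs stated above. It is a simplified stand-in for the PLINK filters and not the code used in the study: genotypes are hypothetical allele dosages with None marking a missing call, the Hardy-Weinberg and relatedness checks are omitted, and the function names are illustrative.

```python
from typing import List, Optional

Genotypes = List[Optional[int]]  # allele dosage 0/1/2 per sample, None = no call


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes: Genotypes,
                  min_call_rate: float = 0.98,
                  min_maf: float = 0.02) -> bool:
    """Apply the call-rate and MAF cut-offs described in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNPs typed in 100 samples.
common_snp = [0] * 90 + [1] * 10            # MAF 0.05, fully called
rare_snp = [0] * 99 + [1]                   # MAF 0.005, fully called
poorly_called_snp = [None] * 5 + [0] * 85 + [1] * 10  # call rate 0.95

if __name__ == "__main__":
    for name, snp in [("common", common_snp), ("rare", rare_snp),
                      ("poorly called", poorly_called_snp)]:
        print(name, passes_snp_qc(snp))
```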

Assessment of Analyte Profiles and Quality Control.
A set of 0.5 mL EDTA plasma samples from ADNI and KADRC participants was selected and shipped to Myriad Rules Based Medicine, Inc. (Myriad RBM, Austin, TX). A set of 190 protein levels from plasma for each selected individual was measured by multiplex immunoassay on the Human DiscoveryMAP panel v1.0 (https://rbm.myriad.com/products-services/humanmap-services/human-discoverymap/) using the Luminex100 platform by RBM. Samples with more than 10% of missing data across analytes were removed, then analytes were excluded if they had missing data for 10% of the samples or values were below the detection limit, in either of the studies. After the QC step, a total of 146 analytes were included in each dataset of the present study.
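The two-step analyte QC can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as implemented: values below the detection limit are treated as missing (None), the analyte cut-off is interpreted as missing data in 10% or more of the retained samples, the requirement that an analyte pass in both studies is not shown, and all names and toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Dict[str, List[Optional[float]]]  # sample ID -> analyte values


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_samples(data: Matrix, max_missing: float = 0.10) -> Matrix:
    """Step 1: drop samples with more than max_missing of analytes missing."""
    return {s: row for s, row in data.items()
            if missing_fraction(row) <= max_missing}


def filter_analytes(data: Matrix, names: List[str],
                    max_missing: float = 0.10) -> Tuple[Matrix, List[str]]:
    """Step 2: drop analytes missing in max_missing or more of the samples."""
    keep = [j for j in range(len(names))
            if missing_fraction([row[j] for row in data.values()]) < max_missing]
    kept_names = [names[j] for j in keep]
    kept_data = {s: [row[j] for j in keep] for s, row in data.items()}
    return kept_data, kept_names


# Toy data: 11 samples measured on 10 hypothetical analytes A0..A9.
analyte_names = [f"A{j}" for j in range(10)]
raw: Matrix = {f"s{i}": [1.0] * 10 for i in range(10)}
raw["s0"][3] = None                          # one missing value (10%): sample kept
raw["s_bad"] = [None, None] + [1.0] * 8      # 20% missing: sample removed

kept_samples = filter_samples(raw)
kept_data, kept_names = filter_analytes(kept_samples, analyte_names)

if __name__ == "__main__":
    print(len(kept_samples), "samples and", len(kept_names), "analytes retained")
```

Filtering samples before analytes matters in this ordering: a sample with many failed measurements is removed first so that it does not inflate the missing rate of otherwise well-measured analytes.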
Statistical analyses. For each study, prior to the analyses, all analyte values were log-transformed, standardized so the mean for each analyte was equal to zero, and outliers were removed as previously described 12,51,55-59 . Log-transformed, standardized values were tested for significant deviations from a normal distribution using the Shapiro-Wilk test. We performed a single variant analysis for each analyte using PLINK v1.9 (http://pngu.mgh.harvard.edu/purcell/plink/) 52 , including age, gender, AD status, and the first 2 principal components as covariates. The significance threshold for the joint analyses was defined as p < 5.0 × 10 −8 based on the commonly used threshold thought to be appropriate for the likely number of independent tests with Bonferroni correction. To approximate an internal replication, all SNPs that passed the genome-wide significance threshold had to pass the threshold p < 0.05 in single variant analyses of the individual datasets and had to have similar effect sizes in the same direction. To ensure that results were not confounded by AD status, single variant analyses were performed on all of the AD cases from both datasets separately from all of the controls from both datasets. All genome-wide significant SNPs from the joint analyses also had to have similar effect sizes in the same direction in the case-control stratified analyses. QQ plots were generated for each analysis to illustrate the distribution of the observed and expected p-values for all eligible SNPs 60 . Regional plots showing LD and the location of nearby genes were generated for the top ranking SNPs for each analyte using LocusZoom v1.1, build hg19/1000 Genomes Mar 2012 EUR (http://csg.sph.umich.edu/locuszoom/) 61 . If more than one significant SNP clustered at a locus, the SNP with the smallest p-value was reported as the sentinel marker.
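The transformation step can be illustrated with a minimal sketch. The text specifies only that values were log-transformed and standardized to a mean of zero; the base-10 logarithm and the scaling to unit standard deviation used here are assumptions, the outlier removal and Shapiro-Wilk test are not shown, and the function name and input values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def log_standardize(values: Sequence[float]) -> List[float]:
    """Log-transform analyte values, then center to mean 0 and scale to SD 1.

    Assumptions: base-10 logarithm and z-score scaling; the source states
    only that transformed values were standardized to a mean of zero.
    """
    logs = [math.log10(v) for v in values]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return [(x - mean) / sd for x in logs]


# Hypothetical raw concentrations for one analyte (arbitrary units).
raw_levels = [1.0, 10.0, 100.0]
z_scores = log_standardize(raw_levels)

if __name__ == "__main__":
    print([round(z, 3) for z in z_scores])
```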
All analyses were performed using BEAGLE v3.3.1 54 , EIGENSTRAT 53 , SAS v9.2 for Linux (copyright © 2008 by SAS Institute Inc) and PLINK v1.07 and v1.9 (http://pngu.mgh.harvard.edu/purcell/plink/) 52 software.

Meta-analyses.
We performed the single variant analyses as described above for ADNI and KADRC separately. We used METAL (version released 2011-03-25, http://www.sph.umich.edu/csg/abecasis/Metal/index.html) 62 to perform meta-analyses of the two datasets for each analyte by combining p-values across studies, weighting each study by its sample size.
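A sample-size weighted combination of p-values can be sketched as below. This follows the commonly described form of the scheme (each two-sided p-value is converted to a Z-score signed by the direction of effect, weighted by the square root of the study sample size, and the weighted sum is rescaled to a standard normal), and it is offered as an illustration rather than as METAL's exact implementation. The cohort sizes are those of KADRC (312) and ADNI (506); the p-values and betas are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

_STD_NORMAL = NormalDist()

Study = Tuple[float, float, int]  # (two-sided p-value, beta, sample size)


def signed_z(p_value: float, beta: float) -> float:
    """Convert a two-sided p-value to a Z-score carrying the sign of beta."""
    z = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return math.copysign(z, beta)


def sample_size_weighted_meta(studies: List[Study]) -> Tuple[float, float]:
    """Combine studies with weights sqrt(N); return (Z, two-sided p-value)."""
    numerator = 0.0
    sum_sq_weights = 0.0
    for p_value, beta, n in studies:
        weight = math.sqrt(n)
        numerator += weight * signed_z(p_value, beta)
        sum_sq_weights += weight * weight
    z = numerator / math.sqrt(sum_sq_weights)
    # erfc keeps precision for the very small p-values typical of GWAS.
    p_combined = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_combined


# Hypothetical per-cohort results for one SNP-analyte pair.
kadrc = (1e-4, -0.30, 312)
adni = (1e-6, -0.35, 506)
meta_z, meta_p = sample_size_weighted_meta([kadrc, adni])

if __name__ == "__main__":
    print(f"Z = {meta_z:.2f}, p = {meta_p:.2e}")
```

Because the Z-scores carry the direction of effect, associations that point in opposite directions in the two cohorts cancel rather than reinforce each other, which complements the requirement that effects be consistent between the two series.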

Conditional analyses.
To identify additional independent signals in a locus we conducted conditional analyses. We performed a series of sequential conditional analyses by adding the most strongly associated SNP into the regression model as a covariate and testing all remaining regional SNPs for association. This approach was used to determine additional secondary signals and was performed by adding SNPs one at a time until no significance was seen. Consistent with the locus-specific analysis, statistical significance for the conditional analysis was defined at p < 5.0 × 10 −8 .

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/