Lymphocyte activation gene-3-associated protein networks are associated with HDL-cholesterol and mortality in the Trans-omics for Precision Medicine program

Deficiency of the immune checkpoint lymphocyte activation gene-3 (LAG3) protein is significantly associated with both elevated HDL-cholesterol (HDL-C) and myocardial infarction risk. We determined the association of genetic variants within ±500 kb of LAG3 with plasma LAG3 and defined LAG3-associated plasma proteins with HDL-C and clinical outcomes. Whole genome sequencing and plasma proteomics were obtained from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Framingham Heart Study (FHS) cohorts as part of the Trans-Omics for Precision Medicine program. In situ Hi-C chromatin capture was performed in EBV-transformed cell lines isolated from four MESA participants. Genetic association analyses were performed in MESA using multivariate regression models, with validation in FHS. A LAG3-associated protein network was tested for association with HDL-C, coronary heart disease, and all-cause mortality. We identify an association between the LAG3 rs3782735 variant and plasma LAG3 protein. Proteomics analysis reveals 183 proteins significantly associated with LAG3 with four proteins associated with HDL-C. Four proteins discovered for association with all-cause mortality in FHS shows nominal associations in MESA. Chromatin capture analysis reveals significant cis interactions between LAG3 and C1S, LRIG3, TNFRSF1A, and trans interactions between LAG3 and B2M. A LAG3-associated protein network has significant associations with HDL-C and mortality. Rodriguez et al. use whole genome sequencing and plasma proteomics from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Framingham Heart Study (FHS) cohorts of the Trans-Omics for Precision Medicine program and perform in situ Hi-C chromatin capture in cell lines isolated from four MESA participants. They demonstrate that lymphocyte activation gene-3 protein networks are associated with HDL-cholesterol and mortality, which could guide the development of precision medicine.

A recent review on the topic of high HDL-C paradox examined findings from two large observational studies (Cardiovascular Health in Ambulatory Care Research Team [CANHEART] and Copenhagen City Heart Study and the Copenhagen General Population Study [Copenhagen Heart Studies]) that identified a U-shaped relationship between mortality and HDL-C showing increased mortality risk with low HDL-C but also with high HDL-C levels 13 . Voight et al 12 . showed that in a mendelian randomization study, using either a single nucleotide polymorphism (SNP) for the endothelial lipase gene (LIPG) or a genetic score encompassing fourteen common SNPs, high HDL-C levels did not predict lower myocardial infarction (MI) risk. The study concluded that selected genetic variants associated with higher HDL-C levels were not causally associated with MI risk. There have now been a number of randomized clinical trials showing that pharmacologic inhibition of cholesteryl ester transfer protein did not reduce or minimally reduced MI risk despite the effect of significantly raising HDL-C levels from baseline values 7,10,14 .
Candidate gene approaches have shown that common and rare genetic variants within the HDL receptor, scavenger receptor class B type I (SCARB1) gene, are significantly associated with increased CVD risk, contributing to the concept of high HDL-C paradox [15][16][17][18][19] . In CARDIoGRAM, a common variant within SCARB1, rs10846744 with effect allele C that resides within an enhancer region in the first intron of the gene, is significantly associated with prevalent CVD 20 . A number of experimental approaches have been used to examine the effects of rs10846744 on distally regulating neighboring genes on chromosome 12 (wherein SCARB1 is located) with a novel physical interaction between SCARB1 and the immune checkpoint molecule lymphocyte activation gene-3 (LAG3 gene) and effects on the LAG3 protein having been shown 15 .
LAG3 is a member of the IgG superfamily and is an important immune checkpoint molecule in regulating further activation of T effector cells 21 . The prevailing paradigm is that the extracellular domain of LAG3 on T cells binds with high avidity to a select region on MHC-II molecules on antigen presenting cells to suppress further activation of T cells and regulate T cell homeostasis 22 . In humans, the LAG3 gene resides on the short arm of chromosome 12 (12p13.32) and is within 8.4 kB of CD4 23 . LAG3 is expressed in B cells, T cells, NK lymphocytes, monocytes, and dendritic cells and its distribution is approximately 50% intracellular and 50% on the cell surface [24][25][26] . Activation of these cells promotes transit of intracellular LAG3 to the cell surface, where extracellular LAG3 is then subject to cleavage by ADAM10 and ADAM17 metalloproteases, resulting in soluble LAG3 (sLAG3) 27 . In addition to transmembrane LAG3 binding to MHC class II to limit effector T cell expansion, in vitro studies have demonstrated that sLAG3 also binds to MHC class II and regulates CD4-driven signaling pathways 28 . A subset of Tregs (alternative Tr1 Tregs) has been characterized using flow cytometric LAG3 expression, with these cells being a major source of secretion of the immunosuppressive interleukin 10 (IL-10) cytokine [29][30][31][32] . Zhu et al. 33 observed that in patients with documented coronary artery disease there was a significantly lower expression by flow cytometry of these CD49b + LAG3 + Tr1 Tregs cells compared with control subjects. Our results and that of Zhu et al 33 . are consistent that humans with LAG3 deficiency are at increased risk for CVD and have lower circulating IL-10 levels.
In participants of the Multi-Ethnic Study of Atherosclerosis (MESA), we observed that deficiency of plasma or soluble LAG3 protein was significantly associated with elevated HDL-C levels, and paradoxically with increased risk of CVD 15 . We also reported that age, smoking, lipid medications, and the SCARB1 rs10846744 were independent predictors of plasma LAG3 levels 15 . In MESA and the Framingham Heart Study (FHS), we have now directly examined the region surrounding the LAG3 locus to identify common and rare variants associated with plasma LAG3 and HDL-C concentrations, and with clinical outcomes. Additionally, in adjusted models, we examined the association of LAG3 coexpressed proteins and their association with HDL-C, coronary heart disease (CHD) and all-cause mortality. We have now identified a common noncoding variant within the LAG3 gene that is significantly associated with plasma LAG3, and a number of LAG3 co-expressed proteins that are significantly associated with LAG3 protein, HDL-C, and all-cause mortality.

Results
Characteristics of the MESA and FHS participants. The median baseline age of participants from MESA (Exam 1) was 61 years (with an interquartile range of 53.0-69.0 years); 50.7 % were women; and self-reported race/ethnic distribution was 41.5% White, 23.2% Hispanic, 22.8% African-American, and 12.5% Chinese-American (Table 1). In FHS (Offspring Exam 5), the median age of participants was 55 years (with an interquartile range of 47-63 years); 53.5% were women; and were 100% White participants (Table 2).

Genetic associations with phenotypic traits
Discovery study in MESA. Meta-analysis identified one significant genetic variant within the LAG3 region at FDR < 0.05 (based on a Z-test from meta-analysis for the regression coefficient; Table 3, Fig. 1). The common LAG3 variant rs3782735 (chr12:6775910) effect allele G was positively associated with LAG3 protein levels (Beta = 0.24 for the additive genetic effect on inverse normal transformed protein level; 95% confidence interval [CI] = 0.14-0.34; Z-stat = 4.8; P-value = 4.28 × 10 −6 ; FDR = 0.014) at Exam 1. In race/ethnic-stratified analysis, rs3782735 showed statistically significant association in the larger White group only (FDR = 0.003) (associations of rs3782735 with other phenotypic traits shown in Supplementary Data 1). The common C1S variant rs7970720 (chr12:7048232) effect allele G was negatively associated with peripheral vascular disease as measured by ABI at Exam 5 (FDR = 0.016) ( Supplementary  Fig. 1). In race/ethnic-stratified analysis, rs7970720 was statistically significant in the larger White group only (FDR = 0.0004).
Validation study in FHS. In FHS, rs3782735 was significantly associated with LAG3 protein levels (Beta = 0.105 for the additive genetic effect on log-transformed protein level; 95% CI = 0.018-0.192; t-stat = 2.38; P-value = 0.018 based on a two-sided t-test for the coefficient from regression with n = 1007), but rs7970720 did not demonstrate association with ABI (P-value > 0.05).
Discovery study in FHS. In the FHS cohort, we did not identify any SNPs that were significantly associated with LAG3 protein levels after adjusting for multiple testing (FDR < 0.05 based on a two-sided t-test for each coefficient from regression with n = 1007). Thus, there was no follow-up of FHS-based findings in MESA (Fig. 2, validation study 2).
LAG3 protein association analysis Discovery study in MESA and validation in FHS. Next, we examined the extent of plasma LAG3-associated protein networks by performing a multivariate linear regression analysis with Overlap of proteins discovered and validated in both MESA and FHS. We identified 183 LAG3 associated proteins that overlapped among the validated lists of associated proteins in FHS, MESA Exam 1, and MESA Exam 5 (Supplementary Data 6). Pathway analysis of the 183 overlapped proteins using GeneAnalytics TM algorithms revealed sixty-four significant high disease score matches   Race/ethnic-specific P-values are based on two-sided t-tests for regression coefficients with covariate adjustment for age, sex, study site, principal components (PCs) of ancestry (2 PCs for White, 1 PC for Chinese, 1 for African-American, and 3 for Hispanic, and 5 PCs for race/ ethnic pooled analyses), self-reported race/ethnicity (pooled-group analysis only), HDL-C, LDL-C, triglycerides, body mass index (BMI), fasting glucose, SBP, diastolic blood pressure (DBP), current smoking, former smoking, and lipid medication use.
P-values combined across race/ethnic groups are based on Z-tests for fixed-effect meta-analysis across the four groups.
with malignancy, autoimmune disease, neurological disorders, and vascular disease and one hundred fifty-one high scores in the innate immune system pathway (Supplementary Data 7).
LAG3 protein network and association with HDL-C. Since we previously observed a significant inverse association between plasma LAG3 and HDL-C 15  LAG3 protein network and association with clinical outcomes (coronary heart disease and all-cause mortality). Of the 183 proteins examined for association with CHD and all-cause mortality in MESA, we did not observe any results at FDR < 0.05. In FHS, while there were no associations at FDR < 0.05 for CHD, we observed that 18 of the 183 LAG3-associated proteins demonstrating FDR < 0.05 were significantly associated with all-cause mortality based on two-sided Z-tests for the coefficients from Cox regression with a total sample of n    Chromatin capture interactions between LAG3 and loci associated with HDL-C and all-cause mortality. In a hypothesis seeking effort, we performed in situ Hi-C chromatin capture to examine possible chromatin high frequency interactions between LAG3 (chr12: 6772483) and the loci associated with HDL-C (LRIG3, GFRA1, IGF1R, and DCTPP1) and all-cause mortality (TNFRSF1A, TNFRSF1B, B2M, and CST3). As a representative example of the Hi-C assays, results from an African-American female MESA participant homozygous for the SCARB1 rs10846744 effect C allele showed significant cis interactions between LAG3 and LRIG3 (chr12: 58872149; interaction score [IS] = 17.4; n = 22 read pairs; P < 0.0001 based on two-sided Student t-test) (Fig. 4). No significant trans interactions were observed between LAG3 and GFRA1 (chr10:116056925), IGF1R (chr15:98648539) or DCTPP1 (chr16: 30423615).

Discussion
LAG3 is an important immune checkpoint molecule with continued interest in its role as an immuno-oncology therapeutic agent 34 . We were the first to report the association of LAG3 protein deficiency with HDL-C and increased risk of MI, and had identified independent predictors for plasma LAG3 protein 15 . In the present study, we examined the association of variants within ±500 kb of the LAG3 locus in TOPMed MESA and FHS to identify those significantly associated with plasma LAG3 protein concentrations and clinical outcomes. We identified the LAG3 rs3782735 variant as significantly associated with plasma LAG3 protein levels, and neighboring C1S rs7970720 variant as significantly associated with peripheral vascular disease (ABI) in the MESA population ( Supplementary Fig. S1). The LAG3 rs3782735 is an intronic variant that appears to reside within an enhancer region, has been associated with multiple myeloma risk in women, but has not yet been associated with lipids, CHD or mortality risk 35 .
Given that plasma LAG3 was shown to be inversely associated with HDL-C, we examined the association of the 183 LAG3-associated overlapping proteins in MESA and FHS with Fig. 4 Hi-C near cis and cis interactions from the LAG3 locus on chromosome 12: MESA. We performed in situ Hi-C analysis in EBV-transformed B lymphoblasts from two female (one African-American and one Hispanic) MESA carriers homozygous for the SCARB1 rs10846744 reference G and two female (one African-American and one Hispanic) MESA carriers homozygous for the effect C alleles. The in situ Hi-C analysis was performed as recommended by the 4D Nucleome Consortium using the four base pair cutter DpnII restriction enzyme and high read depth next gen sequencing (NGS) to maximize resolution of the high frequency interactions between chromatin contacts (https://www.4dnucleome.org/protocols.html). Each cell library underwent deep NGS at read depths between 1.4-3.3 billion and this was done independently twice as technical replicates for each cell library. Bioinformatic analysis was conducted using Hi-C Pro software with binning of the matrix at different resolutions and iterative correction and eigenvector decompensation normalization of the matrix for each of the four libraries 62,63 . Readouts were all valid paired-end reads and corresponding high frequency contact interaction scores. We then used data generated from the million binning resolution and filtered it based on the LAG3 chromosomal coordinates (chr12:677250-6778455) using human assembly GRCh38/hg38 (https://genome.ucsc.edu/cgi-bin/hgGateway), which yielded both cis (chr12) and trans interactions. We set interaction scores at the LAG3 locus arbitrarily at 1 and then compared interaction scores from direct and indirect cis and trans interactions. Student t-test was performed with P ≤ 0.05 considered statistically significant. The results shown are representative from one of the MESA cell lines, with the schematic representing the near cis (panel a) and cis interactions (panel b) from the LAG3 locus on chromosome 12.
HDL-C using adjusted models as shown in Fig. 3. In MESA and FHS, GFRA1 showed an inverse association with HDL-C, while LRIG3, IGF1R, and DCTPP1 were all positively associated with HDL-C. A PubMed literature search did not reveal a known connection between these four proteins and LAG3 or connections with HDL-C except for LRIG3, suggesting novel findings of these proteins with lipid metabolism. LRIG3 has been shown to be associated with HDL-C in both humans and animals [36][37][38] . In particular, FHS investigators examined single-locus and epistasis variants on HDL-C and identified a region near LRIG3 as being significant 36 . In another study in African-American subjects, the region near LRIG3 was found to be significantly associated with incident congestive heart failure (CHF) 37 . In Lrig3 −/− mice, investigators showed that aging mice demonstrated cardiac hypertrophy and low plasma HDL-C levels 38 .
In MESA plasma LAG3 was shown to be inversely associated with increased MI risk 15 . Therefore, we examined the association of the overlapping 183 proteins with CHD and all-cause mortality in MESA and FHS as shown in Fig. 3. In MESA, there were significant associations of EFNA5, ENG, IL1R1, and NRCAM with CHD risk but the number of events were extremely small and thus caution was taken in interpreting these results. In MESA, we did not observe significant association with all-cause mortality. In FHS, there were 18 proteins significantly associated with all-cause mortality, with 4 proteins nominally associated with mortality in MESA. Abers et al 39 . recently reported the significant independent association of soluble TNFRSF1A with mortality in patients hospitalized with COVID-19. Schnabel et al 40 . reported that TNFRSF1B (also known as TNFR2) was significantly associated with cardiovascular disease and mortality in FHS. Zaghlool et al 41 . performed a stepwise epigenome-wide association analysis with plasma proteomics measured by SOMAscan in two cohorts (Cooperative Health Research in the Region of Augsburg and Qatar Metabolomics Study on Diabetes), and identified nine association pathways. One of these was identified as a chronic inflammatory pathway represented by NLRC5 and included LAG3, B2M, CD48, CXCL10, FCGR3B, CD163, and CXCL11. We have now identified a LAG3-association network with all-cause mortality that includes B2M (a component of the class I major histocompatibility complex molecule involved in antigen presentation 42 ) and CD48 (Supplementary Data 12). We also identified that LAG3 and B2M have significant trans chromatin interactions, which is consistent with the trans contacts between NLRC5 and LAG3 and B2M reported by Zaghlool et al 41 . Also of importance, these authors identified an inverse association of LAG3 with HDL-C, confirming our initial observation.
Pioneers in the field of 3D chromatin architecture have developed and continue to refine methodologies that evaluate the effect of cis and trans gene-gene interactions on downstream gene regulation [43][44][45][46] . We used Hi-C assays to evaluate unbiased global chromatin interactions (many-to-many) without immunoprecipitation followed by high depth NGS. We then used a number of bioinformatic programs to assess the quality of the technical and biological replicates in order to evaluate the chromatin interactions from the LAG3 locus [47][48][49] . In particular, HiCcompare allows statistical analysis of technical and biological replicates for cis interactions, while trans interactions can be challenging 50 . We observed that LAG3 has strong chromatin interactions with C1S, LRIG3, and TNFSRSF1A, all in close proximity to the LAG3 locus, which is consistent with observations that chromatin contact interactions are strongest with small differences in genomic distance. While we did observe significant LAG3 chromatin interactions with trans contacts, we want to proceed with caution as the absolute value of the interaction scores were low as well as the number of read pairs. It could be possible there are biological effects with the LAG3 trans chromatin contacts but we acknowledge there are post-translational processes that more likely influence LAG3 protein interactions. Nonetheless, the significant trans interactions lends support to the hypothesis that trans chromatin contacts could explain some of the variance in the LAG3-associated protein network.
In conclusion, we have identified a common LAG3 variant that is associated with plasma LAG3 protein levels, while adjacent genes in significant chromatin contact with the LAG3 locus were associated with HDL-C (LRIG3) and clinical outcomes such as all-cause mortality (TNFRSFA1 and B2M). This LAG3-associated network analysis identifies novel targets for further study in HDL metabolism, cardiovascular diseases, and all-cause mortality.

Methods
Trans-omics for precision medicine (TOPMed). This study was approved by the TOPMed Publications & Presentations Steering Committees with data access provided by an approved project (#21279) through a database of Genotypes and Phenotypes (dbGaP) application. The study cohorts that comprise this analysis include MESA and FHS.
Study participants. MESA is a longitudinal study of subclinical CVD and risk factors that predict progression to clinically overt CVD or progression of the subclinical disease. The first clinic visits (Exam 1) occurred from 2000 to 2002 in 6,814 participants recruited from 6 field centers across the United States, and all participants were free of clinical CVD at Exam 1. The self-reported ancestry distribution is approximately 38% White, 28% African-American, 22% Hispanic, and 12% Asian (predominantly of Chinese descent) 51 . MESA has been enhanced by many ancillary studies focused on specific phenotypic and exposure domains. One ancillary study (MESA Family Study) exclusively recruited African-American and Hispanic family members specifically for genetic studies. In contrast, the FHS is a single community-based cohort initiated in 1948. Three generations of participants have been recruited, and the majority of them were white individuals of European ancestry. Participants were invited to attend physical examinations every 2-8 years. The current study was restricted to the Offspring cohort participants who attended their fifth clinical examination cycle during 1991-1995 52 .
All MESA participants provided written informed consent for participation at the six field sites, and FHS participants provided written informed consent for participation at the single site in Framingham, MA. MESA and FHS study protocols were both reviewed and approved by the Institutional Review Boards (IRBs) at each of the participating study sites, as well as at the University of Virginia (AM), Boston University Medical Campus (HL) and the University of Connecticut Health (AR).
Genotype data. We used the TOPMed Freeze 8 whole genome sequencing (WGS) data, focused on the region within ±500 kb of the LAG3 gene for examination of our genotype data. TOPMed WGS was conducted at a mean depth of >30X using Illumina HiSeq X Ten instruments. Variant discovery and genotype calling was performed jointly across all studies by the TOPMed Informatics Research Center (IRC) using the GotCloud pipeline. Variant quality control (QC) was performed by the IRC using support vector machine filtering to identify high quality variants. WGS sample QC including sample identity and consent checks was performed by the TOPMed Data Coordinating Center.
Proteomic data. For MESA Exam 2 (n = 5623), LAG3 protein ELISA kits were purchased from RayBiotech, Inc. (Norcross, GA) and LAG3 was measured per the manufacturer's instructions 15 . Briefly, aliquots of fasting plasma samples stored at −80°C were thawed, diluted 3-fold, and then 100 µl were used for duplicates per sample for plasma LAG3 measurement. The intra-assay coefficient of variance was <10% and inter-assay CV < 12%. The R 2 for the standard curve, which was run for each plate, had mean values >0.98. This ELISA has not been validated for clinical use. The single-stranded DNA aptamer-based SOMAscan™ proteomics platform was used to assay plasma samples from MESA Exams 1 (years 2000-2002) and 5 (years 2010-2012), and from FHS Exam 5 (years 1991-1995). All MESA samples were profiled with Version 1.3k. FHS samples were either profiled with Version 1.1k with 1124 aptamers or with Version 1.3k with 1305 aptamers. All assays were performed using SOMAscan reagents according to the manufacturer's detailed protocol 53 . The strength of this platform is its ability to measure multiple proteins in a single small aliquot. As reported by SOMALogic, the assay measures >1300 proteins from a small biological sample (150 µl) with low limits of detection and 5% median coefficient of variation 53,54 . This technology uses DNA aptamers (short single-stranded oligonucleotides) that bind to proteins and is quantified by hybridization to custom DNA microarrays. The units are reported as Relative Fluorescent Units (RFU) that are directly proportional to the amount of target protein. data. Fasting blood samples were drawn and processed using a standardized protocol and sent for measurement of lipid levels 55 . LDL-C was calculated using the Friedewald formula. Triglycerides were measured using a glycerol-blanked enzymatic method with the Triglyceride GB reagent on the Roche COBAS FARA centrifugal analyzer. Measures of subclinical atherosclerosis examined included the presence (defined by a cutoff of coronary artery calcium [CAC] > 0) or absence of CAC, calcium Agatston score (phantom-adjusted) and ultrasound measurements of intima-media thickness (IMT) in millimeters (mm) for common and internal carotid IMT, and the ankle brachial index (ABI) 56 . Systolic blood pressure (SBP) measurements in the bilateral brachial, dorsalis pedis, and posterior tibial arteries were obtained in the supine position using a hand-held Doppler instrument with a 5-mHz probe. Cardiovascular events including stroke, MI, CHD and all-cause mortality were adjudicated by a MESA committee of neurologists, cardiologists, and physician epidemiologists, who provided a detailed description of the cardiovascular event adjudication process 57 . In FHS, the Friedewald formula was used to calculate LDL-C, and a modification of the Kessler-Lederer method was used to measure triglyceride levels 58,59 .
Phenotypes for genetic association. In MESA, eleven phenotypes were used for genetic association study analyses: plasma LAG3 protein levels as measured by ELISA (Exam 2) and SOMAscan proteomic assay (Exams 1 and 5), HDL-C, LDL-C, triglycerides, CAC presence/absence, Agatston calcium score (among those with presence of CAC), common carotid IMT, internal carotid IMT, ABI, stroke, and MI. All of the phenotypes were examined as quantitative traits, except for CAC (presence/absence), stroke and MI which were examined as dichotomous traits. We performed inverse normal transformations for LAG3 protein levels, and performed natural-log transformation for Agatston calcium score, common carotid IMT, and internal carotid IMT due to non-normal distribution of these outcome traits. Genetic analysis of clinical events in MESA was carried out based on MESA adjudication through the year 2017.
Genetic association analysis in MESA. We approached discovery for genetic association analyses in MESA in two ways (Fig. 1). First, we stratified data by race/ ethnic groups, and performed genetic association analyses in each race/ethnic group using EPACTS software for quantitative phenotypes (https://github.com/ statgen/EPACTS) and SNPTEST software for dichotomous outcome traits 60 . Quantitative trait analysis was conducted including all participants with available genotype and phenotype data, and family structure was accounted using linear mixed models in EPACTS, while the dichotomous trait analysis was conducted for a subset of participants with first-degree relatives removed. We then meta-analyzed the genetic association results across all four race/ethnic groups by fixed effects model using METAL 61 . Second, we combined all four MESA race/ethnic groups into one simple pooling group without weighting and performed genetic association analyses using EPACTS and SNPTEST. Analyses were conducted across MESA Exams 1, 2 and 5 due to the availability of LAG3 protein measurements at these examination times ( Table 1).
The association of genetic variants with eleven phenotypes was conducted with adjustment for some of the following covariates: age, sex, study site, principal components (PCs) of ancestry (2 PCs for White, 1 PC for Chinese, 1 for African-American, and 3 for Hispanic, and 5 PCs for race/ethnic pooled analyses), selfreported race/ethnicity (pooled-group analysis only), HDL-C, LDL-C, triglycerides, body mass index (BMI), fasting glucose, SBP, diastolic blood pressure (DBP), current smoking, former smoking, and lipid medication use. Covariate adjustment for all lipid variables (HDL-C, LDL-C, and triglycerides) was omitted in the analysis of these three traits as phenotypic outcomes.
In genetic association analysis, we excluded variants with heterozygosity count (HC) less than 30 overall (for quantitative traits) or HC less than 30 among cases (for dichotomous traits). We then applied the false discovery rate (FDR) method to control for inflation of the type I errors due to multiple comparisons. Variants with FDR < 0.05 were considered statistically significant in this study.
Validation study. We performed validation analyses for genetic association analysis of LAG3 protein levels in two ways (Fig. 1). First, we selected significant variants from genetic association analysis/meta-analysis results of MESA (discovery) and investigated the association of these variants in FHS (validation) that mainly focuses on white individuals of European ancestry. Second, we selected significant variants from FHS (discovery) and verified if they were also significant in MESA (validation). Genetic association analysis in FHS followed the same approach as that used for pooled analysis in MESA, with the exception that analysis in FHS did not require covariate adjustment for race/ethnicity and study site; as noted for MESA, the linear mixed models implemented in EPACTS do account for family relationships that are present in the FHS data.
LAG3 protein association analysis. To identify proteins associated with LAG3, we performed linear regression analysis using inverse normal transformed protein levels (in MESA) and log-transformed protein levels (in FHS) for LAG3 levels as an independent variable and the individual proteins as dependent variables.
These regression analyses were carried out separately in MESA Exam 1, MESA Exam 5, and FHS Offspring Exam 5 using SOMAscan proteomic measurements. In MESA, regression analyses were carried out for an unrelated subset of participants using linear regression, while in FHS related individuals were included and familial relationship adjusted using linear mixed models. LAG3 protein association analysis included covariate adjustment for age, sex, study site (in MESA), race/ethnicity (in MESA), PCs of ancestry, pack-years of smoking (in MESA), current smoking (in FHS), lipid medication, plate ID and batch (in FHS). In FHS, regression analyses did not require adjustment for race/ethnicity and study site as all of the FHS participants were white and from a single study site. In MESA, all proteomics tests were carried out in a single batch, so analyses did not require adjustment for batch. Independent discovery and validation of LAG3-associated proteins in MESA and FHS proceeded following the approach described for genetic association analysis of LAG3 protein levels (Fig. 1).
For a set of 183 LAG3-associated proteins identified by discovery and supported by validation in both MESA and FHS, we further carried out regression analyses to examine their association with HDL-C, CHD and all-cause mortality. Protein association was conducted using adjudicated events in MESA through the year 2017. In MESA, we examined associations with inverse normal transformed protein levels, while in FHS, we used log-transformed protein levels. Association of HDL-C with protein levels was examined by linear regression with covariate adjustment for age, sex, study site (in MESA), race/ethnicity (in MESA), PCs of ancestry, BMI, triglycerides, pack-years of smoking (in MESA), current smoking (in FHS), current alcohol use, LAG3 protein level, plate ID and batch (in FHS). Association of baseline protein levels with CHD and all-cause mortality was examined under a Cox proportional hazards model with covariate adjustment for age, sex, race/ethnicity (in MESA), PCs of ancestry, BMI, total cholesterol, HDL-C, SBP, DBP, pack-years of smoking (in MESA) and current smoking status (in FHS) and batch (in FHS).
Chromatin capture analysis. We performed in situ Hi-C analysis in EBVtransformed B lymphoblasts from two female (one African-American and one Hispanic) MESA carriers homozygous for the SCARB1 rs10846744 reference G and two female (one African-American and one Hispanic) MESA carriers homozygous for the effect C alleles. The in situ Hi-C analysis was performed as recommended by the 4D Nucleome Consortium using the four base pair cutter DpnII restriction enzyme and high read depth next gen sequencing (NGS) to maximize resolution of the high frequency interactions between chromatin contacts (https://www.4dnucleome.org/ protocols.html). Each cell library underwent deep NGS at read depths between 1.4-3.3 billion and this was done independently twice as technical replicates for each cell library. Bioinformatic analysis was conducted using Hi-C Pro software with binning of the matrix at different resolutions and iterative correction and eigenvector decompensation normalization of the matrix for each of the four libraries 62,63 . Readouts were all valid paired-end reads and corresponding high frequency contact interaction scores. We first used HiCcompare to evaluate the quality of the technical replicates of each library 50 . We then used data generated from the million binning resolution and filtered it based on the LAG3 chromosomal coordinates (chr12:677250-6778455) using human assembly GRCh38/hg38 (https://genome.ucsc.edu/cgi-bin/hgGateway), which yielded both cis (chr12) and trans interactions. We set interaction scores at the LAG3 locus arbitrarily at 1 and then compared interaction scores from direct and indirect cis and trans interactions. Student t-test was performed with P ≤ 0.05 considered statistically significant.
GeneAnalytics LAG3 protein network analysis. We inputted the protein list generated from the LAG3 protein association analyses into GeneAnalytics (GA) (https://ga.genecards.org/#input). Based on GA proprietary algorithms, scores that were identified as high (adjusted p ≤ 0.0001) were analyzed for the LAG3 protein network analysis.
Statistics and reproducibility. The sample sizes for the MESA population was 3867 and for FHS study it was 1913. Multivariate linear regression analyses were performed using fully adjusted models and FDR ≤ 0.05 was considered significant. For the HiC chromatin capture assays, two-sided Student's t-test was performed and P ≤ 0.05 was considered significant. Statistical software programs used were JMP v15 or R.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Data access for MESA (phs001416) and FHS (phs000974) was approved by the TOPMed Publications & Presentations Steering Committees with data access provided by an approved project (#21279) through a database of Genotypes and Phenotypes (dbGaP) application. Hi-C chromatin capture data are available from dbGAP by using MESA phs000209 as the identifier. Interested researchers would have to apply to TOPMed and dbGaP in order to obtain access to these datasets. Source data underlying Fig. 3a