Chromosome Xq23 is associated with lower atherogenic lipid concentrations and favorable cardiometabolic indices

Autosomal genetic analyses of blood lipids have yielded key insights for coronary heart disease (CHD). However, X chromosome genetic variation is understudied for blood lipids in large sample sizes. We now analyze genetic and blood lipid data in a high-coverage whole X chromosome sequencing study of 65,322 multi-ancestry participants and perform replication among 456,893 European participants. Common alleles on chromosome Xq23 are strongly associated with reduced total cholesterol, LDL cholesterol, and triglycerides (min P = 8.5 × 10−72), with similar effects for males and females. Chromosome Xq23 lipid-lowering alleles are associated with reduced odds for CHD among 42,545 cases and 591,247 controls (P = 1.7 × 10−4), and reduced odds for diabetes mellitus type 2 among 54,095 cases and 573,885 controls (P = 1.4 × 10−5). Although we observe an association with increased BMI, waist-to-hip ratio adjusted for BMI is reduced, bioimpedance analyses indicate increased gluteofemoral fat, and abdominal MRI analyses indicate reduced visceral adiposity. Co-localization analyses strongly correlate increased CHRDL1 gene expression, particularly in adipose tissue, with reduced concentrations of blood lipids.


Results
Baseline characteristics, blood lipids, and chromosome X genotypes. TOPMed sequences were aggregated and aligned, and variants were called by the TOPMed Informatics Research Center. A total of 65,367 out of 140,000 individuals in TOPMed freeze 8 with WGS data, including X chromosome sequence data, had harmonized lipid levels available (Supplementary Fig. 1). Forty-five individuals with anomalous X chromosome copy number were excluded, leaving 65,322 individuals for analysis. Of these, 40,577 (62.1%) were female, and the mean (standard deviation [SD]) age was 52.4 (14.9) years. Across all 21 included cohorts, 29,513 (45.2%) were white, 16,431 (25.2%) black, 13,432 (20.6%) Hispanic, 4714 (7.2%) Asian, 1182 (1.8%) Samoan, and 50 (0.1%) Native American (Supplementary Table 1; Supplementary Fig. 2). The included studies were largely observational cohorts with some variation in ascertainment schemes, as described in the Supplementary Note. Blood lipid distributions were generally similar across cohorts, with some differences attributable to study design and ancestry (Supplementary Table 2 and Supplementary Fig. 3). After adjusting for lipid-lowering medicines within each cohort and ancestry, we generated residuals within each cohort and race group adjusted for age, age², sex, 11 principal components of ancestry, and cohort-specific covariates. These residuals were inverse rank normalized and multiplied by the standard deviation within each cohort and race group to obtain effects in mg/dl units (see Methods) (Supplementary Fig. 4). Among 65,322 TOPMed participants with lipid levels and WGS, we identified 19,898,222 total variants on the X chromosome by WGS. Of these variants, 88,008 (0.4%) were nonsynonymous variants and 4632 (0.02%) were rare (MAF < 1%) predicted protein-truncating variants. As expected, participants of African ancestry had the most X chromosome variants (Fig. 1a). 
Likely due to sample size differences, there were overall more total variants observed in our dataset among white participants compared to other ancestries (Fig. 1b). Within the X chromosome, females had a greater average [SD] number of variants per individual (133,255 [22,455]) than males (90,117 [12,166]), as expected (Supplementary Table 3). Generally, most of the variation observed across individuals was uncommon (i.e., 98.8% of variants had MAF < 5%) (Supplementary Table 4).
X chromosome single-variant association with lipid levels. In single-variant discovery analyses in TOPMed, we performed X chromosome-wide association analyses for genetic variants with minor allele count >20 that are not in the pseudoautosomal region, yielding 2.2 million analyzed variants of the 19.8 million detected. To maximize power, all samples (i.e., males and females) were included in the linear mixed model association analyses with SD-adjusted residuals of lipid levels as the outcome, where adjustments included sex (Supplementary Fig. 5).
The three variants occurred on chrXq23 and were all in at least moderate linkage disequilibrium across all included TOPMed participants (Supplementary Fig. 7 and Supplementary Table 8). They were also in moderate linkage disequilibrium with a previously described nearby variant, rs5985471 12 (r² = 0.61-0.76). All three associated variants in our dataset have similar nonreference allele frequencies (0.34-0.43), which were also similar between males and females. We observed similar associations for males and females within TOPMed, except that male rs5985504-T carriers had a greater decrease in triglycerides than female rs5985504-T carriers (P interaction = 0.001) (Supplementary Table 9).
The minor alleles for these variants are common in all TOPMed ancestries except for Asian Americans (MAF 0.02) and Samoans (MAF 0.01). Nevertheless, effect estimates were largely of similar magnitude across ancestries in TOPMed for total cholesterol (Supplementary Table 10) with no evidence of heterogeneity (P heterogeneity > 0.05).
The three chrXq23 variants were associated with reduced atherogenic lipoproteins (i.e., total cholesterol, triglycerides, and LDL-C) (Table 1). The rs5942634-T allele is an intergenic variant 8 kb downstream from RTL9 (also referred to as RGAG1 in the literature) and was the top variant for total cholesterol, associated with 1.95 mg/dl lower concentration (P = 2 × 10−16). The rs5942648-A allele occurs 81 kb downstream, is intergenic between RTL9 and CHRDL1, and was the top variant for LDL-C, associated with 1.53 mg/dl lower concentration (P = 1 × 10−12). The rs5985504-T allele resides 60 kb further downstream, 68 kb from CHRDL1, and was the top variant for log(triglycerides), associated with 2% lower triglyceride concentration (P = 4 × 10−11). Overall, the associated variants reside within a ~0.22 Mb linkage disequilibrium block spanning RTL9 and CHRDL1 (Supplementary Fig. 7). Within this block, variants within predicted active adult liver enhancers are in proximity to both the RTL9 and CHRDL1 genes (Supplementary Fig. 8). Only two variants reside within both an adult liver enhancer and a DNase hypersensitivity site: rs2883091, in an intron of RTL9, and rs2143760, residing 4 kb from CHRDL1 but 214 kb from RTL9. These variants are in at least moderate linkage disequilibrium (r² > 0.60) with the top associated variants in the locus. Virtual 4C data additionally demonstrate a contact between the rs5985504 site and the region upstream of CHRDL1 (Supplementary Fig. 9).
To determine whether our signal was independent of previously reported variants in the region, we performed conditional analysis for the association between total cholesterol and rs5942634, conditioning on rs5943057 11, rs5985471 12, and rs5942937 13 (Supplementary Table 11). The previously reported SNPs were strongly associated with total cholesterol when the variants were individually modeled. However, after adjusting for our reported total cholesterol variant (rs5942634), the known variants had dramatically lower effect estimates and were no longer associated with total cholesterol. In contrast, rs5942634 remained marginally associated with total cholesterol, with a smaller change in effect size, after adjusting for the three known variants. Similar results were obtained when adjusting the association between total cholesterol and rs5942634 for the individual previously reported variants in the region (results not shown). This indicates that the rs5942634 association is only partially explained by the three reported variants.
Phenome-wide association of chrXq23 variants. Given prior genetic associations of LDL-C-lowering and triglyceride-lowering autosomal variants with lower risk for CHD, we hypothesized that sex chromosome variants lowering LDL-C or triglycerides would also lower risk for CHD. In HUNT, UK Biobank, and FinnGen (Supplementary Table 12), we observed that the top lipid-lowering alleles at this locus were associated with reduced risk for CHD (Fig. 2). We found an odds ratio for CHD of 0.98 (95% CI 0.96, 0.99; P = 1.7 × 10−4) for each rs5942634-T allele, the lead cholesterol-lowering variant (alpha = 0.05 for the single haplotype assessment), and a correlation between the effect sizes of variants on total cholesterol in the chrXq23 locus and the effect sizes of these variants on CAD (r = 0.25), T2D (r = 0.33), and BMI (r = −0.34) (Supplementary Fig. 10).
To explore the range of phenotypes associated with the chrXq23 locus, we evaluated the associations of each of these three variants with 80 manually curated diverse clinical traits and conditions in the UK Biobank (Supplementary Table 13). Given the high degree of correlation among these variants, phenome-wide association results were similar (Supplementary Tables 14-16). As expected, the strongest associations were for reduced odds of hypercholesterolemia. Associations reaching P < 6.3 × 10−4 (P < 0.05/80 traits) included reduced odds for diabetes mellitus type 2 (T2D), hypertension, and glaucoma, but increased odds for ever smoking as well as increased body-mass index (BMI) and body fat percentage. Notably, we observed lower odds of T2D for rs5942648 (OR = 0.97; 95% CI 0.96, 0.99; P = 1.4 × 10−5) (Fig. 2).
We additionally explored the association between each of these three variants with lipoprotein subspecies identified through nuclear magnetic resonance spectroscopy (NMR) within the Framingham Heart Study and Multi-Ethnic Study of Atherosclerosis cohorts (up to 6356 individuals). While we did not find any associations that passed a Bonferroni-corrected significance threshold (0.05/(3 SNPs × 16 lipoprotein subspecies) = 0.001; Supplementary Table 17), we found two lipoprotein subspecies  associated with suggestive evidence (p < 0.05), including greater concentration of medium HDL particles (but no effect on small or large HDL particles) and greater LDL size. We assessed for evidence of replication for indices related to LDL size (alpha 0.05) since the chrX variants associated with LDL-C. Among 6443 participants of the Atherosclerosis Risk in Communities cohort, we concordantly observed a −0.034 SD (P = 0.022) lower concentration of small dense LDL for rs5942648-A. Among 365,365 participants of the UK Biobank, when using LDL-C/ apolipoprotein B ratio as a proxy for LDL particle size, we observed a nominal increase in LDL size even with adjusting for both LDL-C and apolipoprotein B (Beta = 1.1 × 10 −5 , P = 0.048).
Rare pathogenic variants in CHRDL1 were previously linked to X-linked recessive megalocornea, a condition characterized by enlarged corneal diameters with associated complications, including reduced visual acuity. Given these prior observations, we asked whether common variants associated with cholesterol at the CHRDL1 locus were associated with differences in visual acuity. Among 112,842 UK Biobank participants (46.5% women; median age at assessment 58.5 years), we observed no association of lipid-associated chrXq23 alleles with altered visual acuity (P > 0.05; Supplementary Table 19). Given our sample size of 112,842 and SNP frequency of 34.4%, we had >99% power to detect effects >1/10th of a standard deviation unit of visual acuity at an alpha of 0.05.
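The power statement above can be illustrated with a rough calculation. The following is a minimal sketch, not the authors' procedure: it assumes a two-sided z-test, a standardized outcome with residual variance close to 1, and an autosomal-style additive genotype variance of 2p(1 − p) under Hardy-Weinberg equilibrium. The X chromosome coding used in this study would give a somewhat different genotype variance. The function name `power_additive` is hypothetical.

```python
from statistics import NormalDist


def power_additive(n, maf, beta_sd, alpha=0.05):
    """Approximate power of a two-sided z-test for a per-allele effect.

    beta_sd is the per-allele effect in SD units of the outcome.
    Assumption: genotype variance is 2 * maf * (1 - maf), i.e. an
    autosomal additive 0/1/2 coding under Hardy-Weinberg equilibrium.
    """
    var_g = 2.0 * maf * (1.0 - maf)
    # Expected z-statistic: effect divided by its approximate standard error,
    # where SE is about 1 / sqrt(n * var_g) for a standardized outcome.
    expected_z = abs(beta_sd) * (n * var_g) ** 0.5
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value.
    return nd.cdf(expected_z - z_crit) + nd.cdf(-expected_z - z_crit)


# Values taken from the text: N = 112,842, allele frequency 34.4%,
# effect of 1/10th of an SD, alpha = 0.05.
power = power_additive(n=112_842, maf=0.344, beta_sd=0.1)
print(f"approximate power: {power:.4f}")
```

Under these assumptions the expected z-statistic is far above the critical value, which agrees with the reported power of >99%.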
Interrogating eQTL data for a single variant may lead to biased interpretations for causal gene inference. Therefore, we colocalized eQTL results for 8 genes (i.e., ACSL4, TMEM164, AMMECR1, RTL9, CHRDL1, PAK3, CAPN6, DCX) within the chrXq23 region across prespecified lipid-related tissues (i.e., subcutaneous adipose, terminal ileum, visceral omentum adipose, whole blood, liver, and skeletal muscle) to relate aggregate blood lipid-association data with gene expression data. We observe that increased gene expression of CHRDL1 shows consistent colocalization with decreased cholesterol.
P values < 5.7 × 10−9 are considered significant. rs5942634 was the top association for total cholesterol, rs5985504 for triglycerides, and rs5942648 for LDL-C. Since the variants were in at least moderate linkage disequilibrium (minimum pairwise r² = 0.61), there was evidence of association across all three of the aforementioned lipid traits for these three variants.
Rare chromosome X variant association analyses. We performed SKAT gene-based tests of rare variants (MAF < 1%) across the X chromosome implemented by the GENESIS package within the TOPMed samples. We tested a maximum of 746 genes with more than 1 rare variant and at least 10 individuals carrying a minor allele with each lipid trait. No genes reached a Bonferroni-corrected significance threshold (0.05/746 = 6.7 × 10 −5 ; Supplementary Table 22).

Discussion
Using deep-coverage next-generation sequencing of the X chromosome in the NHLBI TOPMed program, we identified a locus that was previously associated with lipids in primarily European ancestry datasets, which we now extend to diverse ancestries along with strong replication in two independent studies. In addition to replicating the associations of chrXq23 lipid-lowering alleles with reduced odds for CHD, we also observe associations with reduced T2D odds and favorable adiposity indices. These observations allow us to draw several conclusions about X chromosome genetic variation with blood lipid levels, as well as related cardiometabolic effects. First, bioinformatic analyses implicate CHRDL1 as a candidate causal gene for the association of chrXq23 variants with lipids. Based on genomic proximity of the strongest signal, RTL9 was assigned as the likely causal gene in prior work 12 . However, colocalization analyses strongly prioritize increased CHRDL1 gene expression in lipid-related, particularly adipose, tissues with reduced lipoprotein measures over other genes in the region. CHRDL1 is not a previously known Mendelian lipid gene. In our study, disruptive rare coding variants in CHRDL1 were not significantly associated with lipids, nor was any gene in the region.
Second, despite observing an association with increased BMI, chrXq23 lipid-associated alleles may lead to favorable effects on adiposity. We observed that chrXq23 lipid-lowering alleles were associated with increased gluteofemoral adipose tissue. Autosomal alleles similarly linked to expansion of gluteofemoral adipose tissue are associated with favorable risk for CHD and T2D 28,29 . Autosomal alleles associated with body fat distribution are also associated with various functional adipose measures, including morphology, lipolysis, and lipogenesis 30 . Recent gene expression analyses of human adipose-derived stromal cells showed persistent upregulation of CHRDL1 after inducing adipogenesis 31 . CHRDL1 is believed to influence adipogenic differentiation in human isolated preadipocytes 32 . Comparative gene expression analyses suggest relatively greater CHRDL1 expression in subcutaneous versus visceral fat 32,33 . In our analyses, the chrXq23 lipid-lowering alleles were associated with an increase in abdominal subcutaneous adipose tissue but a decrease in visceral adipose tissue. A proteomic discovery analysis showed that increasing circulating CHRDL1 concentrations were associated with increased birth weight but decreased triglycerides and homeostatic model assessment of insulin resistance 34 .
Third, chrXq23 lipid-lowering alleles have favorable cardiometabolic effects that appear to reduce risk for CHD and T2D. Our results for CHD at chrXq23 are consistent with prior work at this locus 12 and extend prior observational epidemiology, genetic, functional, and clinical trial evidence implying a causal relationship between reduced LDL-C and reduced CHD risk 35. In aggregate, autosomal LDL-C-reducing alleles are associated with increased T2D risk, but the effects are inconsistent across individual variants 7,36,37. In meta-analyses of randomized controlled trials, statins are associated with a modestly increased risk of T2D 38,39. The effects of triglyceride-lowering alleles on T2D risk are generally inconsistent 40-42. Common triglyceride-lowering variants at LPL and ANGPTL4 p.E40K in the lipoprotein lipase pathway are associated with reduced triglyceride concentrations, CHD odds, and T2D risk 40,43,44. Rare loss-of-function variants in ANGPTL3, also implicated in the lipoprotein lipase pathway, are associated with reduced LDL-C and triglyceride concentrations as well as reduced CHD odds, but favorable effects on T2D risk have not yet been observed 40,45,46. However, we uniquely describe a genetic locus associated with reduced LDL-C concentrations, reduced triglyceride concentrations, lower CHD risk, and lower T2D risk. Similarly, we observed that lipid-lowering chrXq23 alleles were independently associated with increased LDL-C/apoB ratio, which has been independently associated with reduced CHD and T2D risk in observational epidemiologic studies 47-50. These data imply that therapeutic modulation of the causal pathway may lead to diverse favorable cardiometabolic indices. Whether implicated variants influence the lipoprotein lipase pathway or represent a novel lipid-related pathway for combined CHD and T2D should be addressed by future research. 
Fourth, common lipid-associated variants linked to increased CHRDL1 expression are not associated with visual acuity measures. Ventropin, the product of CHRDL1, was first described as a bone morphogenetic protein 4 inhibitor and a regulator of retinal development 51. Pathogenic disruptive variants in CHRDL1 are implicated in X-linked megalocornea 52. However, common lipid-associated variants linked to increased CHRDL1 expression in the present study do not associate with measures of visual acuity in the UK Biobank. These data imply that therapeutic modulation to recapitulate the protective effects associated with chrXq23 lipid-lowering alleles is not anticipated to lead to on-target adverse visual acuity effects.
Important limitations should be considered in the interpretation of our findings. First, our genetic association analyses of the X chromosome do not account for random X inactivation. Accounting for random X inactivation is expected to modestly improve power, and thus our approach biases our findings toward the null 53,54. We found slightly higher variance of total cholesterol in females heterozygous for rs5942634 (SD = 44.1 mg/dl) compared to homozygous females (homozygous reference SD = 43.0 mg/dl, homozygous alternate SD = 43.9 mg/dl) using Levene's test for homogeneity of variance (P = 0.002), indicating that this locus may be subject to random X inactivation. Second, while our in silico analyses and prior literature strongly implicate CHRDL1 as the causal gene, additional analyses including perturbational experiments in model systems are necessary to confirm our hypotheses. Furthermore, whether cis-acting or trans-acting effects on the expression of other genes are additionally relevant for the observed effects on blood lipids is currently unknown. Third, chrXq23 lipid-lowering alleles were associated with both increased truncal and gluteofemoral adiposity indices, and whether the former associations result in adverse clinical consequences is not known. Nevertheless, phenome-wide association analyses did not reveal concerning clinical phenotype associations related to modest increases in BMI. Notably, chrXq23 lipid-lowering alleles were associated with decreased visceral adipose tissue.
In conclusion, we observe a consistent association of chrXq23 alleles with reduced total cholesterol, LDL-C, triglycerides, CHD, and T2D. Despite an increase in BMI, these alleles were favorably associated with increased gluteofemoral and abdominal subcutaneous adiposity, decreased visceral adiposity, and increased LDL-C/apolipoprotein B ratio. Colocalization analyses strongly implicate increased CHRDL1 expression in adipose tissues with these favorable cardiometabolic indices, pointing to CHRDL1 as the leading candidate gene in the region.

Methods
Study participants. For discovery, 65,322 individuals from 21 studies in the freeze 8 release of the NHLBI TOPMed Program with WGS passing central quality control by the TOPMed Informatics Research Core and blood lipid data available were included for analysis (Supplementary Fig. 1).
The UK Biobank is a large, prospective cohort of ~500,000 United Kingdom residents aged 40-69 years 25,55. Participants provided answers to questions regarding socio-demographic, lifestyle, and health-related factors; additionally, participants provided blood, urine, and saliva for genetic and other future assays. Genotyping was performed on a custom Affymetrix array followed by imputation. Various additional measurements were performed on all recruited participants (e.g., electrocardiography) and some measurements in subsets (e.g., cardiac magnetic resonance imaging). Study participants provided consent per the UK Biobank's IRB-approved protocol. We excluded UK Biobank individuals who met any of the following criteria: (1) individuals whose submitted gender did not match their inferred gender; (2) individuals with putative sex chromosome aneuploidy; (3) individuals with second-degree or closer relatives; and (4) individuals who withdrew consent. We analyzed individuals who were British white separately from those who were not considered British white. These data were secondarily analyzed through a protocol approved by the Partners Healthcare IRB.
The effects of lipid-associated X chromosome variants on CHD risk were estimated in 69,635 participants of HUNT, 390,606 British white participants of UK Biobank, and 176,899 participants of FinnGen. The effects on risk of diabetes mellitus were estimated in 69,635 participants of HUNT, 390,606 participants of UK Biobank, and 171,087 participants of FinnGen. FinnGen (https://www.finngen.fi/en) is a large biobank study that aims to genotype 500,000 Finns on a FinnGen ThermoFisher Axiom custom array, with the current data freeze comprising 181,820 Finnish individuals 27. FinnGen includes prospective epidemiological and disease-based cohorts, and hospital biobank samples. The data were linked by the unique national personal identification numbers to national hospital discharge (available from 1968), death (1969-), cancer (1953-), and medication reimbursement (1995-) registries, and disease endpoints were defined by harmonizing the International Classification of Diseases (ICD) revisions 8, 9, and 10, cancer-specific ICD-O-3, and ATC codes. The FinnGen project is approved by the Finnish Institute for Health and Welfare (THL).
Sequencing, genotyping, and quality control of TOPMed. Whole-genome sequencing to at least 30× coverage was performed across six sequencing centers using PCR-free library preparation kits for TOPMed samples 24. In most cases, all samples for a given study were sequenced at the same center. Samples were excluded if estimated contamination by verifyBamId was >3% or if <95% of the genome attained >10× coverage. The reads were centrally realigned at the TOPMed Informatics Research Center (IRC) to human genome build GRCh38 using BWA-MEM 56,57. Joint variant discovery was subsequently performed with the 'GotCloud' pipeline by the IRC 58. The variant calling software tools are under active development; updated versions can be accessed at http://github.com/atks/vt, http://github.com/hyunminkang/apigenome, and https://github.com/statgen/topmed_variant_calling. Using sequencing quality metrics, a catalogue of previously discovered variants, and variants with Mendelian inconsistencies from included families, variant-level quality control was performed using a support vector machine algorithm. Sample-level quality control was performed by the TOPMed IRC and Data Coordinating Center (DCC) to remove samples with genotypic/reported inconsistencies for pedigree and sex, and those with substantial discordance with prior genome-wide array genotyping. Only variants and samples that passed quality control were included in the call set.
One individual from each duplicate pair identified by the DCC was removed, retaining the individual with lipid levels available when the other did not have lipid levels. If both individuals had lipid levels, one individual was randomly selected. Individuals were excluded when their genotype-determined sex did not match their phenotype-reported sex (n = 6), and individuals <18 years old were excluded (n = 865).
Blood lipid measurements and phenotypic modeling. Conventionally measured fasting blood lipids, including total cholesterol, LDL-C, HDL-C, and triglycerides, were included for analysis. Harmonization of the lipid values, lipid-lowering medication status, and fasting status at lipid blood draw was performed by the TOPMed Data Coordinating Center. LDL-C was either calculated by the Friedewald equation when triglycerides were <400 mg/dl or directly measured. Given the average effect of lipid-lowering medicines, when lipid-lowering medicines (largely statins) were present, total cholesterol was adjusted by dividing by 0.8 and LDL-C by dividing by 0.7, as previously done 2. Triglycerides were natural log transformed for analysis. Standard deviation scaled inverse-normalized residuals adjusting for age, age², sex, the first 11 PCs of ancestry (as recommended by the TOPMed DCC), as well as cohort-specific covariates (study site or known founder mutations), were created within each study by self-reported race. Effect sizes are reported in mg/dl, or log(mg/dl) for triglycerides.
In FinnGen, CHD cases were defined as subjects with an underlying or direct cause of death with ICD codes I20-I25, I46, R96 or R98 (ICD10) or 410-414 or 798 (ICD9/8), a hospital discharge diagnosis with ICD codes I200, I21-I22 (ICD10) or 410, 4110 (ICD9/8), and/or a coronary revascularization procedure (coronary artery bypass surgery procedure or coronary angioplasty), or an entry of invasive cardiac procedures in the country-wide register. Type 2 diabetes cases were defined as subjects with an underlying or direct cause of death, or a main or side diagnosis at hospital discharge, with ICD codes E11 (ICD10) or 250(0-9)A (ICD-8/9), or at least three prescription medicine purchases with ATC class A10B, or specially reimbursed medication for diabetes. Cases with both type 2 and type 1 diabetes mellitus were excluded. 
In these definitions, ICD10, ICD9, and ICD-8 refer to the Finnish versions of the ICD codes.
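The lipid phenotype steps described in this section (Friedewald calculation, medication adjustment, and SD-scaled inverse rank normalization) can be sketched as follows. This is an illustrative sketch rather than the study's pipeline: the function names are hypothetical, the Friedewald formula is the standard mg/dl form, and the rank offset of 0.375 (Blom) is an assumption, since the text does not state which offset was used. The regression step that produces the residuals is not shown.

```python
from statistics import NormalDist


def friedewald_ldl(total_chol, hdl, triglycerides):
    """LDL-C in mg/dl by the Friedewald equation.

    Returns None when triglycerides are >= 400 mg/dl, where the text
    relies on directly measured LDL-C instead.
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5.0


def adjust_for_medication(total_chol, ldl, on_lipid_lowering):
    """Approximate untreated values: divide TC by 0.8 and LDL-C by 0.7."""
    if on_lipid_lowering:
        return total_chol / 0.8, ldl / 0.7
    return total_chol, ldl


def inverse_rank_normalize(values, sd_scale=1.0, offset=0.375):
    """Map values to normal quantiles by rank, then rescale by sd_scale.

    Ties receive their average rank. offset=0.375 is the Blom offset,
    used here as an assumption.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    nd = NormalDist()
    denom = n - 2.0 * offset + 1.0
    return [sd_scale * nd.inv_cdf((r - offset) / denom) for r in ranks]


# Made-up example values in mg/dl, for illustration only.
ldl = friedewald_ldl(total_chol=200.0, hdl=50.0, triglycerides=150.0)
tc_adj, ldl_adj = adjust_for_medication(200.0, ldl, on_lipid_lowering=True)
scaled = inverse_rank_normalize([3.0, 1.0, 2.0], sd_scale=40.0)
```

Multiplying the normalized residuals by the within-group standard deviation (`sd_scale`) is what allows effect sizes to be read in mg/dl rather than in SD units.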
UK Biobank phenotype ascertainment. Our approach to phenome-wide association analyses is similar to prior efforts 59,60. Briefly, a phenome-wide association analysis was performed to evaluate the associations of X chromosome lipid-associated variants with a broad range of clinical phenotypes 61. A total of 80 manually curated traits were classified according to a combination of self-report and billing codes, except for the following, which were based on corresponding UK Biobank data fields: death (40000), ever smoked (20610), BMI (23104), and percentage body fat (23099) (Supplementary Table 13). Lipid-associated variants were tested for association with each trait, using linear regression for continuous traits and logistic regression for dichotomous traits, adjusting for age, sex, array type, and the first five PCs in the model.
For adiposity analyses, body composition values were obtained using bioelectrical impedance measurement (Category 100009) with a Tanita BC418MA body composition analyzer. Separate readings for fat percentage, mass, and free mass as well as predicted muscle mass are generated for the whole body, trunk, each leg, and each arm. In linear regression analyses, lipid-associated variants were associated with fat mass (in kilograms) for each of the aforementioned components adjusting for age, sex, and the first five PCs in the model. We also separately analyzed the association of lipid-associated variants with abdominal subcutaneous adipose fat (22408) and visceral adipose fat (22407) among unrelated UK Biobank participants with abdominal MRI measures available 62 . Each of these phenotypes was natural log-transformed and inverse rank standardized; linear regression models were then adjusted for age, sex, array type, and the first 11 PCs.
For visual acuity analyses, we included data where baseline visual acuity was available for at least one eye and genotyping data were available in the UKBB 63. Methods for visual acuity assessment in UKBB were previously described. Briefly, visual acuity was measured using the logarithm of the minimum angle of resolution ("LogMAR") chart (Precision Vision, LaSalle, Illinois, USA) at a distance of four meters. Using one eye at a time, participants were tasked with identifying the five letters displayed in the top row; they proceeded to read letters from successive rows, which had progressively smaller text. The test was terminated when two or more of the five letters on a given row were read incorrectly. Visual acuity was computed based on the number of successfully read rows; the visual acuity result was provided by the UK Biobank as Field ID 5208 (left eye) and Field ID 5201 (right eye). Data from the right eye were available in 60,421 women and 51,245 men, while data from the left eye were available in 60,402 women and 51,280 men. Linear mixed models from the lme4 package in R (v3.6.1) were used to estimate the association of each of the three variants with visual acuity, adjusting for age, sex, and the first five PCs in the model, as well as a random effect accounting for which eye was tested (left or right).
Single-variant association analyses. For discovery, each single variant on the X chromosome with at least 20 copies of the minor allele was analyzed for association with each adjusted blood lipid residual across all TOPMed samples with lipid levels available (see Blood lipid measurements and phenotypic modeling). Because a large proportion of TOPMed participants are related, we used a fast linear mixed model with kinship adjustment (SAIGE-QT, version 0.29.4.4 64) in ENCORE (https://encore.sph.umich.edu), additionally adjusting for the first 11 PCs in the model. SAIGE-QT was specifically used to maximize computational efficiency given hosting and kinship precomputation by the TOPMed Informatics Research Core. Heterozygous and homozygous women were coded as having 1 or 2 nonreference alleles, respectively. Hemizygous males carrying the nonreference allele were coded as having two nonreference alleles, and those carrying the reference allele as having two reference alleles. This modeling is consistent with random X inactivation of one of two X chromosomes in females yet consistent expression of the single X chromosome in males.
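The genotype coding for the X chromosome can be made concrete with a short sketch. This is illustrative only and assumes the usual coding that is consistent with random X inactivation: females are coded 0, 1, or 2 by their count of nonreference alleles, and hemizygous males are coded 0 or 2 according to the single allele they carry. The function name `x_dosage` and the sex labels are hypothetical.

```python
def x_dosage(sex, alt_allele_count):
    """Coded nonreference allele dosage for a non-pseudoautosomal X variant.

    sex: "F" or "M" (labels are an assumption of this sketch).
    alt_allele_count: nonreference alleles observed (0-2 in females,
    0-1 in hemizygous males).
    """
    if sex == "F":
        if alt_allele_count not in (0, 1, 2):
            raise ValueError("females carry 0, 1, or 2 nonreference alleles")
        return alt_allele_count
    if sex == "M":
        if alt_allele_count not in (0, 1):
            raise ValueError("hemizygous males carry 0 or 1 nonreference allele")
        # One active X in males is treated like two copies in females.
        return 2 * alt_allele_count
    raise ValueError("sex must be 'F' or 'M'")


# Dosages for every valid genotype class.
coded = {
    ("F", 0): x_dosage("F", 0),
    ("F", 1): x_dosage("F", 1),
    ("F", 2): x_dosage("F", 2),
    ("M", 0): x_dosage("M", 0),
    ("M", 1): x_dosage("M", 1),
}
```

With this coding, a male carrier and a homozygous female carrier have the same expected effect, so a single per-allele effect estimate applies to both sexes.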
For SNPs with suggestive evidence of association in TOPMed (P < 1 × 10−6), we sought replication in UKBB. For replication in UKBB unrelated individuals, we performed linear regression associations using R version 3.6.0. Covariates included age, age², sex, the first 10 PCs in the model, and genotyping array.
For replication in HUNT, a cohort within a founder population, plasma lipids were analyzed using efficient linear mixed models implemented by BOLT-LMM v2.3.1 65 with covariates for sex, age, age², batch, and principal components 1-4. CAD was analyzed using SAIGE with birth year, batch, sex, and principal components 1-4 as covariates.
Covariates included in the models of association for each contributing study were based on study characteristics and recommendations from study investigators. We took loci reaching a suggestive association with lipid levels (P < 1 × 10−6) in TOPMed forward to replication in UKBB and HUNT. For replication, we used a significance level of 0.002 (Bonferroni correction for the 21 loci that met a suggestive level of association in TOPMed). We used a fixed effects meta-analysis to combine the association results from TOPMed, UKBB, and HUNT. We set the statistical significance threshold for our meta-analysis at alpha = 5.7 × 10−9 (0.05/2.2 M variants/4 traits), which is more stringent than the standard genome-wide significance threshold of 5 × 10−8. Heterogeneity of effect sizes in the meta-analysis was determined through Cochran's Q, and I² is reported. Additionally, we tested for the interaction between rs5985504-T and sex on log triglycerides, adjusting for the same covariates as the main analysis.
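The significance thresholds and the fixed effects meta-analysis described here follow standard formulas, sketched below. This is a generic inverse-variance weighted implementation with Cochran's Q and I², not the software the authors used (which the text does not name). The function names and the example effect estimates are hypothetical.

```python
from math import erfc, sqrt


def bonferroni(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed effects meta-analysis.

    Returns the pooled effect, its standard error, a two-sided P value,
    Cochran's Q, and I-squared (as a percentage).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    z = beta / se
    # erfc keeps precision for very small P values.
    p = erfc(abs(z) / sqrt(2.0))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "i2": i2}


# Thresholds stated in the text.
meta_alpha = bonferroni(0.05, 2_200_000 * 4)   # about 5.7e-9
replication_alpha = bonferroni(0.05, 21)       # about 0.002

# Made-up per-study estimates (mg/dl), for illustration only.
result = fixed_effects_meta(betas=[-1.9, -2.1, -2.0], ses=[0.3, 0.1, 0.2])
print(meta_alpha, replication_alpha, result)
```

Each study is weighted by the inverse of its squared standard error, so the largest replication cohort dominates the pooled estimate, and I² summarizes how much of the spread in study estimates exceeds what sampling error alone would produce.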
To determine the correlation between the effect sizes of variants in the chrXq23 locus on total cholesterol and the effect sizes of these variants on CAD, T2D, and BMI, we analyzed the association of chrXq23 variation with these three outcomes in the UK Biobank, adjusting for age, age², sex, genotyping array, and PCs, using the main-effects model assuming X inactivation. We then correlated the total cholesterol effect sizes with the effect sizes on CAD, T2D, or BMI, limiting to total cholesterol-associated variants with MAF > 0.05 and P < 0.05.
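The filtering and correlation step might look like the following Python sketch. It assumes a Pearson correlation and assumes that the P < 0.05 filter refers to the total cholesterol association; neither detail is specified in the text. The record keys (`maf`, `p_tc`, `beta_tc`, `beta_outcome`) and the example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def effect_size_correlation(variants, maf_min=0.05, p_max=0.05):
    """Correlate total cholesterol effects with outcome effects.

    Keeps only variants with MAF > maf_min and total cholesterol P < p_max
    (assumed interpretation of the filter).
    """
    kept = [v for v in variants if v["maf"] > maf_min and v["p_tc"] < p_max]
    return pearson([v["beta_tc"] for v in kept],
                   [v["beta_outcome"] for v in kept])


# Made-up records; the last one is removed by the MAF filter
example = [
    {"maf": 0.20, "p_tc": 1e-8, "beta_tc": -1.0, "beta_outcome": -0.10},
    {"maf": 0.30, "p_tc": 1e-6, "beta_tc": -2.0, "beta_outcome": -0.20},
    {"maf": 0.10, "p_tc": 1e-3, "beta_tc": -3.0, "beta_outcome": -0.30},
    {"maf": 0.01, "p_tc": 1e-9, "beta_tc": -4.0, "beta_outcome": 0.90},
]
r = effect_size_correlation(example)
```

A positive correlation in this setup would indicate that variants lowering total cholesterol also tend to lower the outcome effect estimate.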
Expression quantitative trait analyses. We downloaded the v7 SNP-gene association results in tissue-specific files from the GTEx portal (https://gtexportal.org/home/datasets). We limited analyses to six tissues that have been implicated in lipid biology or CHD (Adipose_Subcutaneous, Small_Intestine_Terminal_Ileum, Adipose_Visceral_Omentum, Whole_Blood, Liver, Muscle_Skeletal) and examined expression of eight genes within the chrXq23 region (ACSL4, TMEM164, AMMECR1, RTL9, CHRDL1, PAK3, CAPN6, DCX). We set our significance threshold to 0.001 (0.05/[6 tissues × 8 genes]). First, we determined eQTLs of our top association with lipids. Second, we correlated our lipid-variant associations with the expression-variant associations for each of the eight genes, using variants within the gene transcripts ± 100 kb. Lastly, we used the lmekin function in the R package kinship2 to run linear mixed effects models and predict the lipid-variant test statistic (Z = beta/SE) from the expression-variant test statistic, adjusting for the correlation between the variants.
Lipid subfractions association analyses. Concentrations of lipoprotein particles were measured at LipoScience, Inc. (Raleigh, NC) using NMR spectroscopy on plasma EDTA specimens. LipoScience has developed validated software for analysis of NMR-measured LipoProfile spectra that uses an optimized deconvolution algorithm to quantify lipoprotein subspecies 66,67. MESA samples were measured with the LipoProfile-III assay, while FHS samples were measured with the LipoProfile-I assay, which provides less accuracy for some measurements but is similar to LipoProfile-III. We tested lipoprotein profiles for association with the top associated SNPs in up to 1802 FHS and 4551 MESA participants, adjusting for age, sex, and lipid-lowering therapy.
For individuals who participated in ARIC study visit 4 (1996-1998), a homogeneous assay method was used for the direct measurement of sdLDL-C in plasma (sd-LDL-EX "Seiken", Denka Seiken, Tokyo, Japan) on a Hitachi 917 automated chemistry analyzer 68. We tested the top associated SNPs for association with sdLDL-C among ARIC participants, adjusting for age, sex, lipid-lowering therapy, race, study center, and the first 11 principal components of ancestry.
Rare variant association analyses. We performed the SKAT test to associate aggregates of rare coding variants with blood lipid levels within TOPMed as implemented by GENESIS v2.14.4 in the CHARGE Analysis Commons 69,70. For this gene-based test, high confidence loss-of-function (HC LOF by LOFTEE 71) and missense metaSVM 72 damaging variants with MAF < 1% were collapsed into regions based on the gene annotations generated by snpEff 4.3t (http://snpeff.sourceforge.net/) using the GRCh38.86 database 73.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Controlled access of the individual-level TOPMed data is available through dbGaP, and the individual-level UK Biobank data are available upon application to the UK Biobank (https://www.ukbiobank.ac.uk/). FinnGen summary-level data are fully freely available at https://www.finngen.fi/en/access_results. Individual-level access to FinnGen and HUNT cohorts may be obtained through reasonable request and suitable institutional review board approvals. The dbGaP accessions for TOPMed cohorts are as follows: Atherosclerosis Risk in Communities (ARIC) phs001211 and phs000280; Old Order Amish phs000956 and phs000391; Mt Sinai BioMe Biobank phs001644 and phs000925; Coronary Artery Risk Development in Young Adults (CARDIA) phs001612 and phs000285; Cleveland Family Study (CFS) phs000954 and phs000284; Cardiovascular Health Study (CHS) phs001368; Diabetes Heart Study (DHS) phs001412 and phs001012; Framingham Heart Study (FHS) phs000974 and phs000007; Genetic Epidemiology Network of Arteriopathy (GENOA) phs001345 and phs001238; Genetics of Lipid-Lowering Drugs and Diet Network (GOLDN) phs001359 and phs000741; Genetic Epidemiology Network of Salt Sensitivity (GenSalt) phs001217 and phs000784; Genetic Studies of Atherosclerosis Risk (GeneSTAR) phs001218 and phs000375; Hispanic Community Health Study-Study of Latinos (HCHS/SOL) phs001395 and phs000810; Hypertension Genetic Epidemiology Network and Genetic Epidemiology Network of Arteriopathy (HyperGEN) phs001293; Jackson Heart Study (JHS) phs000964 and phs000286; Multi-Ethnic Study of Atherosclerosis (MESA) phs001416 and phs000209; Massachusetts General Hospital Atrial Fibrillation Study (MGH_AF) phs001062 and phs001001; San Antonio Family Study (SAFS) phs001215 and phs000462; Samoan Adiposity Study phs000972 and phs000914; Taiwan Study of Hypertension using Rare Variants (THRV) phs001387; Women's Health Initiative (WHI) phs001237 and phs000200. Source data are provided with this paper.