DNA methylation architecture of the ACE2 gene in nasal cells of children

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has led to the global coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV-2 enters cells via angiotensin-Converting Enzyme 2 (ACE2) receptors, highly expressed in nasal epithelium with parallel high infectivity.1,2 The nasal epigenome is in direct contact with the environment and could explain COVID-19 disparities by reflecting social and environmental influences on ACE2 regulation. We collected nasal swabs from anterior nares of 547 children, measured DNA methylation (DNAm), and tested differences at 15 ACE2 CpGs by sex, age, race/ethnicity and epigenetic age. ACE2 CpGs were differentially methylated by sex with 12 sites having lower DNAm (mean = 12.71%) and 3 sites greater DNAm (mean = 1.45%) among females relative to males. We observed differential DNAm at 5 CpGs for Hispanic females (mean absolute difference = 3.22%) and lower DNAm at 8 CpGs for Black males (mean absolute difference = 1.33%), relative to white participants. Longer DNAm telomere length was associated with greater ACE2 DNAm at 11 and 13 CpGs among males (mean absolute difference = 7.86%) and females (mean absolute difference = 8.21%), respectively. Nasal ACE2 DNAm differences could contribute to our understanding COVID-19 severity and disparities reflecting upstream environmental and social influences. Findings need to be confirmed among adults and patients with risk factors for COVID-19 severity.

We used the Illumina EPIC array that estimates DNAm at 16 CpG sites annotated to the ACE2 gene. These sites cover regions between 1,500-200 and 200-0 bases of the start site, exon, body and 3′ UTR regions of the ACE2 gene. Six CpGs were annotated to a Transcription Factor Binding Site (TFBS) with relatively high evidence from 4 to 6 experiments extracted from Encyclopedia of DNA Elements (ENCODE) supporting transcription binding at that region. After excluding one CpG due to low detection performance, DNAm levels of the other 15 CpGs ranged from 2.53% to 95.47% and 14 sites had DNAm levels above 60%, all with robust significant differences by sex (FDR < 0.003), Table 1 and Fig. 1A. After adjusting for age, 12 of the 15 CpGs in the ACE2 gene had significantly lower DNAm levels in females relative to males ranging from -31.56% at cg04013915 to − 0.45% at cg18877734, Table 1. Three CpGs (cg18458833, cg08559914 and cg05039749) showed greater DNAm in females relative to males; these were also the only sites negatively correlated with DNAm of other CpGs. Most CpGs were strongly positively correlated (r s > 0.5) except for cg05039749, annotated to the body of the gene towards the 3′ UTR, showing moderate negative correlations with all other sites, Fig. 1B. Given large differences by sex and ACE2 escaping X-inactivation, we analyzed age associations stratified by sex and observed that chronological age was associated with consistently lower but weak DNAm among females and only 4 CpGs reached unadjusted significance (p < 0.05); associations did not survive false discovery adjustment. No associations with age were observed among males, Table S2. However, power was limited by the narrow age range. Using an external data of 72 children aged 10-12 years of nasal DNA measured with the 450 K Illumina array by Yang and colleagues 20 we observed consistent strong differences by sex at cg18458833; cg21598868; cg08559914; cg20119767 and cg23232263. However, two sites had significant differences by sex (cg18877734 and cg05039749) in the opposite direction compared to our data. Of note is that these two CpGs showed the weakest differences by sex in our data, Table 1. This cohort was predominately Black not allowing us to test differences by race/ethnicity and half of the children had asthma which could have influenced the results at the two sites.
Hispanic females had significantly greater DNAm of 6.63% at cg03536816, 1.12% at cg08559914, 4.44% at cg16734967 and 2.21% at cg23232263 while cg10408040 showed a 1.69% lower DNAm relative to white females (FDR < 0.05), Table 2. A local ACE2 gene Manhattan plot for the association with Hispanic females with genomic annotation and correlation structure is shown in Fig. 1C. Among Black males, significantly lower DNAm at 8 CpG sites was present relative to white males. Specifically, lower DNAm of 0.75% at cg18458833, 0.77% at cg18877734, 1.76% at cg08559914, 3.44% at cg16734967, 1.36% at cg05241917, 1.30% at cg20119767, 0.64% at cg25176872 and 0.60% at cg23232263 for Black relative to white males was observed, adjusting for age (FDR < 0.05), Table 2. A local ACE2 gene Manhattan plot for the association among Black males with genomic Table 1. Genomic annotation of probes located within the Illumina EPIC array and nasal DNA methylation descriptive statistics for the angiotensin-converting enzyme (ACE2) gene located in the X-chromosome and sex differences. † Differences in DNAm for females relative to males adjusted by age from a multivariate regression models all results were significant at FDR < 0.003. www.nature.com/scientificreports/ www.nature.com/scientificreports/ annotation and correlation structure is presented in Fig. 1D. No differences were observed among Asian or participants of more than one race, Table S3. DNAm telomere length (DNAmTL) and the Skin and Blood Epigenetic clock were chosen as markers of epigenetic aging as they performed relatively well against chronological age with significant correlations of r = − 0.11 and r = 0.40, Figure S1. Longer DNAm telomere length adjusted for age (DNAmTLadjAge) was consistently and strongly associated with greater ACE2 DNAm for males with a mean increase of 7.86% increased across 11 CpGs. For females a mean of 8.21% greater DNAm across 13 CpGs was observed. Weaker positive relationships were observed for 5 CpGs with epigenetic age acceleration of the skin and blood clock for females, Table 3.
Hierarchical clustering of the DNAm of ACE2 CpGs confirmed the large influence of sex across all DNAm levels and a relatively weak influence of age and race/ethnicity, Figure S2.

Discussion
In summary, all ACE2 CpGs tested differed strongly and significantly by sex with most sites showing a decrease in DNAm among females relative to males. In sex stratified analyses, moderate greater DNAm for Hispanic females and consistently lower DNAm among Black males relative to white participants was observed. Longer telomere length adjusted for age was strongly associated with greater ACE2 DNAm and this was consistent among males and females across most CpGs. Our findings need to be confirmed among adults with a wider age range and associations should be tested for known risk factors of COVID-19 severity. The ACE2 gene is located in the X-chromosome and escapes X-inactivation in females with potential tissue specificity and heterogenous male expression bias, partially attributed to sex steroids 21 . This heterogeneity has been documented in data from an RNAseq Atlas identifying co-expression of both ACE2 and the accessory proteases among subsets of respiratory epithelial cells as putative targets of SARS-CoV-2 infection. Specifically, nasal goblet cells and ciliated cells comprised the highest fraction of cells expressing both ACE2 and the viral entry associated protease TMPRSS2 2,22 . The relatively high ACE2 expression in the nasal epithelium its coupled with high infectivity in nasal epithelium with progressively lower expression and infectivity in the bronchial and alveolar regions 1,23 . Increased ACE2 expression has been hypothesized to partially explain differences in COVID-19 severity 24 . For example, lung transcriptome data from patients with comorbid conditions associated with increased risk of severe COVID-19 like hypertension, diabetes, and chronic obstructive lung disease show increased ACE2 expression 25 .
ACE2 expression has been reported to be higher in Asians compared with white and African Americans from type II alveolar cells 26 . However, other reports have found no difference in ACE2 expression by race 27 . Data from lung tissue has shown lower DNAm in females relative to males at cg23232263 and cg16734967 consistent with our finding for females using nasal cells 28 . In addition, a strong negative association between chronological age and DNAm of cg08559914 from airway epithelial cells was reported. This was not confirmed in our data as associations with age were not observed in our sample. However, expression of ACE2 in the lung has been demonstrated to increase with age 27 . In nasal samples, lower ACE2 expression among children relative to older adults has been reported 29 . Age associated expression differences are hypothesized to partially explain the relatively mild COVID-19 documented cases in children. Yet, large racial inequalities in need for hospitalization among children have been reported. Data from the U.S. shows that hospitalizations for Hispanic and Black children are much more common relative to white children 30 . Our results support the hypothesis that ACE2 hypomethylation in nasal epithelium among black males could lead to increased SARS-CoV-2 infectivity and COVID-19 severity via greater abundance of ACE2 receptors. Results among females need to be cautiously interpreted given ACE2 escape from X-inactivation. ACE2 genetic variants are rare, not shown to differ by sex or populations 31,32 . Therefore, we hypothesize that observed DNA methylation variation originates from social and environmental exposures which include social and environmental racism.
Our study has some limitations. We do not have ACE2 expression which would inform the underlying hypothesis of epigenetic regulation. We hypothesize that the relationship between DNAm, expression and ACE2 receptors is likely complex and depends on the genomic context, cells and life-stage. The absence of age-related DNAm changes could be due to the narrow age range and young participant profile of our study. However, measures of epigenetic age acceleration might be more sensitive to age-related physiological changes among children as reflected in strong associations with DNAm telomere length. Young children have lower rates of COVID-19 infection compared to adults 33,34 . Therefore, the generalizability among adults and implication related to COVID-19 severity need to be further investigated. Study designs that incorporate functional genomics with known risk factors such as co-morbid conditions among adults and patients with varying degrees of COVID-19 severity could yield important insight among this population for which the risk of severe disease is greater 34 . Additionally the systematic investigation beyond nasal tissue in leukocytes across diverse cohorts could yield insights into severity and infection risk. Although most participants in our cohort were white, the relatively large sample allowed us to test differences stratifying by sex and test differences by race/ethnicity. Additionally, while the magnitude of DNA methylation differences by sex were moderate, smaller differences were observed by race/ ethnicity. Small to moderate differences in DNA methylation have been shown to be functionally relevant 35 . Even if absolute changes are small, they must be interpreted relative to mean levels. The accuracy of the array might vary by probe and experiment. However, the EPIC array has been demonstrated to be highly reproducible across technical and biological replicates across probes 36 .
The epigenome is at the interface of the environment and genomic regulation. Nasal cells which are directly in contact with the environment have been shown to play a key role in SARS-CoV-2 infection. Our results demonstrate that ACE2 nasal DNAm reflects differences by sex, race/ethnicity and biological aging. The nasal epigenetic architecture of the ACE2 gene could contribute to our understanding of COVID-19 and environmental disparities. These findings need to be confirmed in adults and patients with risk factors for severe COVID-19.

Materials and Methods
Study Population. We included participants from Project Viva, a prospective pre-birth cohort from eastern Massachusetts recruited between 1999 and 2002 during the initial prenatal visit at Atrius Harvard Vanguard Medical Associates 37 . Eligibility criteria for cohort recruitment included fluency in English, gestational age < 22 weeks at the first prenatal visit, and singleton pregnancy. At the early childhood visit, we asked the mother to choose one or more racial/ethnic groups for her child. We created five racial/ethnic groups based on the mother's response including white, Black, Hispanic, Asian, and other (> 1 race or other). We used mother's race to fill in missing child race values. Of the total 2128 live births, 547 children were re-contacted during an early-teen in-person visit at a mean age 12.9 years (range; 11.8 to 15.4 y) and provided consent for nasal swab sample collection and epigenetic studies. Mothers provided written informed consent at recruitment and at post- www.nature.com/scientificreports/ partum follow-up visits. Written informed consent was obtained from parent and/or legal guardian of minors and children provided written assent. Institutional Review Board of Harvard Pilgrim Health Care reviewed and approved all study protocols. All methods were carried out in accordance with relevant guidelines and regulations stipulated by the institutional review board.
Nasal DNA methylation analyses. Trained research staff collected nasal swabs from the anterior nares, previously shown to yield respiratory epithelial cells 38 . Nasal swabs were immediately stored in lysis buffer and frozen until processing. We isolated DNA using the Maxwell 16 Buccal Swab LEV DNA Purification Kit following the manufacturer's instructions (Promega, Madison, WI, USA). After DNA extraction from the nasal samples we measured DNA methylation with the Infinium MethylationEPIC BeadChip. All DNAm data were imported into the R statistical software and samples were excluded based quality control check as previously described 39 . We preprocessed all of the DNA methylation data using functional normalization to adjust for technical variability 40 and adjusted for probe-type variability using RCP 41 43 and the "skin and blood epigenetic clock" 44 . We evaluated their performance with scatterplots and Pearson's correlation coefficient (r). We modeled the acceleration of these measures resulting from regressing DNAmTL and the skin and blood epigenetic clock on chronological age. A positive value of DNAmTL adjusted for age (DNAmT-LadjAge) would indicate that DNAmTL is longer than expected based on chronological age (thus marker of "younger" cells), while positive residuals from the skin and blood clock would indicate epigenetic age acceleration (thus marker of "older" cells). We modeled the DNAmTLadjAge and epigenetic age acceleration for the skin and blood clock to estimate associations independent of chronological age.

External replication.
We identified an external dataset in the Gene Expression Omnibus (GEO) repository (GSE65205). The data contained 72 samples from mostly African American children (> 90%) with persistent atopic asthma (n = 36) versus healthy controls (n = 36) aged 10 to 12 years with half of the samples from each sex 20 . We extracted data from GEO and preprocessed DNA methylation data in the same fashion with the exception that this study used the older Illumina methylation 450 K array and therefore only 7 CpGs overlapped with the EPIC measurements. We tested associations by sex with the same approach as our data.

Statistical analyses.
We described our study sample using means, standard deviations (SDs), counts and proportions. We extracted the ACE2 gene genomic annotation from Bioconductor using the IlluminaHuman-MethylationEPICanno.ilm10b4.hg19 annotation package and calculated means and SDs of DNAm at the selected CpGs in the overall cohort and stratified by sex. We estimated the overall Spearman correlation structure among CpGs and used boxplots to visualize differences in DNAm by sex. We tested differences in DNAm by sex, age, race/ethnicity, DNAmTLadjAge, and the residuals of the skin and blood clock using robust linear regression models on beta values (0-1) to account for heteroskedasticity or influential points. We estimated differences by sex adjusting for chronological age and estimated differences by age stratified by sex. We estimated differences by race/ethnicity (each other category relative to white, the largest group) adjusted for age and stratified by sex. Differences in epigenetic aging biomarkers were adjusted for race/ethnicity and stratified by sex. We used localized Manhattan plots with the correlation matrix across CpGs to display differences by race/ethnicity using coMET 45 . We report findings as %-DNA methylation differences and adjusted for multiple comparisons for each model independently using a False Discovery Rate of 5% (FDR < 0.05). All statistical tests were two sided and analyses were carried out using R, version 4.0.2 (www.r-proje ct. org/).

Data availability
Datasets generated and analyzed during the current study are not publicly available because we did not obtain consent for such public release of epigenetic data from participants. Raw data to generate figures and tables are available from the corresponding author with the appropriate permission from the Project Viva study team and investigators (project_viva@hphc.org) upon reasonable request and institutional review board approval.