## Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel single-stranded RNA virus of the Coronaviridae family. This recently emerged virus causes a pandemic infection that can result in severe, life-threatening disease (coronavirus disease 2019; COVID-19) [1, 2]. As with SARS-CoV (the cause of SARS), SARS-CoV-2 entry into target host cells is mediated by binding of the viral spike glycoprotein to angiotensin-converting enzyme 2 (ACE2) on the cell surface [3, 4, 5, 6]. The interaction between coronaviral spike proteins and their attachment receptor ACE2 is believed to be key for viral transmissibility and for dissemination of infection to organs and tissues [7]. In addition, the host cell serine protease TMPRSS2 cleaves the SARS-CoV-2 spike protein to allow fusion of the viral and host cell membranes, an essential step in viral entry [3, 8]. Other molecules have been proposed to play alternative roles in SARS-CoV-2 entry. It has been suggested that cathepsins B and L may cleave the spike protein in the absence of TMPRSS2 [3]. Other studies proposed that SARS-CoV-2 entry is primarily mediated through endocytosis and that PIKfyve, TPC2, and cathepsin L, but not cathepsin B, are essential for viral entry [5, 8]. Furin has also been described as possibly acting together with TMPRSS2 and cathepsins to activate viral entry [8]. However, a previous study reported inconclusive results, as furin preactivation enhanced SARS-CoV-2 pseudovirus entry in some cell types and reduced it in others [6, 8]. These data suggest that SARS-CoV-2 may use alternative mechanisms to enter cells, but the roles of these mechanisms and of the molecules involved are less clear and need further investigation.

To gain insight into genetic determinants of SARS-CoV-2 transmissibility, potential viremia, and disseminated infection, we devised a quantitative measure of the cumulative effect of genetic variants on the expression of the two key molecules involved in SARS-CoV-2 viral entry, ACE2 and TMPRSS2, and applied this measure to 2504 individuals from five major populations around the world.

## Results and discussion

We calculated a cumulative genetic expression score (GES) for ACE2 and TMPRSS2 as a measure of host genetic determinants of SARS-CoV-2 viral entry. We evaluated this measure in 2504 individuals across the five major populations included in the 1000 Genomes Project (African, Admixed American, European, East Asian, and South Asian) [9]. Because ACE2 is located on the X chromosome, we analyzed female and male individuals separately. The cumulative GES of ACE2 differed significantly between populations (ANOVA P < 0.0001 for both male and female groups). Genetic determinants of the highest ACE2 expression were observed in South Asian and East Asian populations, whereas African populations showed genetic determinants of the lowest ACE2 expression levels (Fig. 1).

Similarly, the TMPRSS2 GES differed significantly across populations in both female and male groups (ANOVA P < 0.0001). East Asian populations had the highest values for genetic determinants of TMPRSS2 expression, and African populations showed a genetic predisposition to the lowest TMPRSS2 expression levels (Fig. 1).

As mentioned earlier, ACE2 is located on the X chromosome (outside the pseudoautosomal region). Therefore, females carry two copies of the gene, whereas males carry only one. X-chromosome genes are normally subject to random X-chromosome inactivation, which silences one gene copy in females and keeps gene expression balanced between females and males. However, a number of X-chromosome genes, including ACE2, are known to escape X-chromosome inactivation [10]. We did not observe differences in genetically determined ACE2 expression between male and female individuals. Previous reports showed higher ACE2 expression in male than in female tissues, a difference predominantly attributed to non-genetic factors, which is consistent with our findings [10, 11]. Likewise, we observed no difference in genetically determined TMPRSS2 expression between male and female individuals.

These data suggest that genetic determinants of ACE2 and TMPRSS2 expression might play a role in the variability of SARS-CoV-2 transmission and severity between populations. African populations showed a genetic predisposition to lower expression of both ACE2 and TMPRSS2, which are vital for SARS-CoV-2 entry into host cells. A genetic component might therefore contribute to the lower numbers of reported COVID-19 cases in Africa. However, non-genetic factors such as age and comorbidities likely play a more important role than host genetic elements, especially in determining disease severity and outcome in infected individuals. In addition, genome-wide association studies will be needed to characterize genetic susceptibility to a more severe disease course in patients infected with SARS-CoV-2. Additional studies are warranted to replicate and extend our findings and to examine ACE2 and TMPRSS2 expression levels in different cell types across populations and in patients infected with SARS-CoV-2.

## Methods

We devised a cumulative GES to estimate genetically determined expression of ACE2 and TMPRSS2. We used expression quantitative trait loci (eQTL) data for ACE2 and TMPRSS2 in tissues included in the Genotype-Tissue Expression project (GTEx, release V8) [12]. All eQTL variants affecting ACE2 and TMPRSS2 expression across all cell types and tissues were identified and then pruned to remove variants in linkage disequilibrium (LD). LD pruning was performed using PLINK v1.9 and the combined haplotypes of the 1000 Genomes Project populations [13]. For variants with eQTL effects in multiple tissues, the normalized effect size (NES) of the most significant eQTL was used. The genetic variants used to calculate GES values for ACE2 and TMPRSS2 are shown in Supplementary Dataset 1. A total of 21 genetic polymorphisms affecting ACE2 expression and 14 affecting TMPRSS2 expression were identified and used in subsequent analyses.

The cumulative GES was derived using the formula

$$\mathrm{GES} = \sum_{i=1}^{x} n_i \times \mathrm{NES}_i,$$

where $n_i$ is the number of alternative alleles (0, 1, or 2) at variant $i$, $x$ is the number of evaluated variants for the gene (ACE2 or TMPRSS2), and $\mathrm{NES}_i$ is the normalized effect size, which reflects expression associated with the alternative allele relative to the reference allele at variant $i$.

We calculated the GESs for ACE2 and TMPRSS2 in a total of 2504 individuals from the five major populations included in the 1000 Genomes Project phase 3 release: African, n = 661; Admixed American, n = 347; East Asian, n = 504; European, n = 503; and South Asian, n = 489 [9]. To determine whether the cumulative GES differed across populations, one-way ANOVA followed by Tukey’s multiple comparison test was performed using GraphPad Prism version 8.1.1 (GraphPad Software, La Jolla, California, USA). ANOVA P values < 0.05 and Tukey’s adjusted P values < 0.05 were considered significant.
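The score computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the variant IDs, tissues, P values, and NES values in `EQTL_RECORDS` are hypothetical; genotypes are assumed to be already LD-pruned and encoded as alternative-allele counts per variant; and `select_nes`, `ges`, and `one_way_anova_f` are illustrative helper names. The sketch also computes the one-way ANOVA F statistic by hand; it does not compute P values or Tukey's test, which the study obtained with GraphPad Prism.

```python
# Minimal sketch of the cumulative genetic expression score (GES).
# All variant IDs, tissues, P values and NES values below are HYPOTHETICAL.

# Hypothetical eQTL records: (variant_id, tissue, p_value, nes)
EQTL_RECORDS = [
    ("var1", "Lung", 1e-8, 0.40),
    ("var1", "Testis", 1e-12, 0.25),
    ("var2", "Colon", 1e-6, -0.30),
]


def select_nes(records):
    """Keep, for each variant, the NES of its most significant (lowest-P) eQTL."""
    best = {}  # variant_id -> (p_value, nes)
    for variant, _tissue, p_value, nes in records:
        if variant not in best or p_value < best[variant][0]:
            best[variant] = (p_value, nes)
    return {variant: nes for variant, (_p, nes) in best.items()}


def ges(alt_counts, nes_by_variant):
    """GES = sum_i n_i * NES_i, where n_i is the alternative-allele count.

    Variants missing from alt_counts are treated as n_i = 0 (an assumption).
    """
    total = 0.0
    for variant, nes in nes_by_variant.items():
        n = alt_counts.get(variant, 0)
        if n not in (0, 1, 2):
            raise ValueError(f"allele count must be 0, 1 or 2, got {n} for {variant}")
        total += n * nes
    return total


def one_way_anova_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    nes = select_nes(EQTL_RECORDS)
    print(nes)
    print(ges({"var1": 2, "var2": 1}, nes))
```

With these hypothetical records, `select_nes` keeps an NES of 0.25 for `var1` (its Testis eQTL has the lower P value), so an individual carrying two alternative alleles at `var1` and one at `var2` gets a GES of 2 × 0.25 + 1 × (−0.30) ≈ 0.2. If P values are needed outside Prism, SciPy's `scipy.stats.f_oneway` returns both the F statistic and the P value, and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) runs Tukey's test; these are alternatives, not the software used in the study.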