Main

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)1. The virus was first detected in Wuhan2, China, and has since spread worldwide, emerging as a global pandemic3.

In symptomatic patients, nasal swabs have yielded higher viral loads than throat swabs4. The same distribution was observed in an asymptomatic patient4, implicating the nasal epithelium as a portal for initial infection and transmission. Cellular entry of coronaviruses depends on the binding of the spike (S) protein to a specific cellular receptor and subsequent S protein priming by cellular proteases. Similarly to SARS-CoV5,6, SARS-CoV-2 employs ACE2 as a receptor for cellular entry. The binding affinity of the S protein and ACE2 was found to be a major determinant of SARS-CoV replication rate and disease severity4,7. Viral entry also depends on TMPRSS2 protease activity, and cathepsin B/L activity may be able to substitute for TMPRSS2 (ref. 7).

ACE2 and TMPRSS2 have been detected in both nasal and bronchial epithelium by immunohistochemistry8. Gene expression of ACE2 and TMPRSS2 has been reported to occur largely in alveolar epithelial type II cells9,10,11, which are central to SARS-CoV pathogenesis, whereas a different study reported the absence of ACE2 in the upper airway12. To clarify the expression patterns of ACE2 and TMPRSS2, we analyzed their expression and the expression of other genes potentially associated with SARS-CoV-2 pathogenesis at cellular resolution, using single-cell RNA sequencing (scRNA-seq) datasets from healthy donors generated by the Human Cell Atlas consortium and other resources to inform and prioritize the use of precious, limited clinical material that is becoming available from COVID-19 patients.

We investigated gene expression of ACE2 in multiple scRNA-seq datasets from different tissues, including those of the respiratory tree, cornea, retina, esophagus, ileum, colon, heart, skeletal muscle, spleen, liver, placenta/decidua, kidney, testis, pancreas, prostate gland, brain, skin and fetal tissues. We note that studies may lack specific cell types due to their sparsity, the challenges associated with isolation or analysis methodology. Moreover, expression may be under-detected due to technical dropout effects. Thus, while positive (presence) results are highly reliable, absence should be interpreted with care.

ACE2 expression was generally low in all analyzed datasets. Consistent with independent studies10,11, ACE2 was expressed in cells from multiple tissues, including airways, cornea, esophagus, ileum, colon, liver, gallbladder, heart, kidney and testis (Fig. 1a; first column). TMPRSS2 was highly expressed with a broader distribution (Fig. 1a; second column), suggesting that ACE2, rather than TMPRSS2, may be a limiting factor for viral entry at the initial infection stage. Cells from the respiratory tree, cornea, esophagus, ileum, colon, gallbladder and common bile duct expressed both genes in the same cell (Fig. 1a; third column). We also assessed ACE2 and TMPRSS2 expression in developmental datasets from fetal tissues, including liver, thymus, skin, bone marrow, yolk sac and lung, and found little to no expression of ACE2 in all but fetal liver and thymus (Fig. 1a), where there was no co-expression with TMPRSS2 (data not shown) except for a cluster of medullary thymic epithelial cells (Fig. 1a). ACE2 expression was noticeable in certain cell types in placenta/decidua without TMPRSS2 (Fig. 1a). Additional fetal data across relevant tissues and stages are needed to determine the generality of these findings.
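The co-expression measure in the third column can be illustrated with a minimal sketch, assuming that each cell is represented as a record with a cell-type label and per-gene counts. The data layout, the function name `coexpression_fraction` and the toy values are hypothetical and are not taken from the analysis notebooks.

```python
from collections import defaultdict


def coexpression_fraction(cells, gene_a="ACE2", gene_b="TMPRSS2"):
    """Per cell type, return the fraction of cells with non-zero counts for both genes."""
    totals = defaultdict(int)
    both = defaultdict(int)
    for cell in cells:
        totals[cell["cell_type"]] += 1
        counts = cell["counts"]
        # A cell is counted as co-expressing only if both genes are detected in it.
        if counts.get(gene_a, 0) > 0 and counts.get(gene_b, 0) > 0:
            both[cell["cell_type"]] += 1
    return {cell_type: both[cell_type] / totals[cell_type] for cell_type in totals}


# Toy records, made up for illustration only.
toy_cells = [
    {"cell_type": "nasal goblet", "counts": {"ACE2": 1, "TMPRSS2": 3}},
    {"cell_type": "nasal goblet", "counts": {"ACE2": 0, "TMPRSS2": 2}},
    {"cell_type": "fibroblast", "counts": {"ACE2": 0, "TMPRSS2": 0}},
    {"cell_type": "fibroblast", "counts": {"TMPRSS2": 1}},
]
fractions = coexpression_fraction(toy_cells)
```

Because dropout can hide either transcript, a fraction computed this way is best read as a lower bound on true co-expression.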

Fig. 1: Expression of ACE2 and TMPRSS2 across different tissues and its enrichment in nasal epithelial cells.

a, RNA expression of SARS-CoV-2 entry receptor ACE2 (first column), entry-associated protease TMPRSS2 (second column) and their co-expression (third column) from multiple scRNA-seq datasets across different tissues. DC, dendritic cells; mac, macrophages; RBC, red blood cells; TA, transit-amplifying cells; LSC, limbal stem cells; Epi, epithelial cells; IEL, intraepithelial lymphocytes; ILC, innate lymphoid cells; GC, germinal center B cells; MT, mitochondria; LSEC, liver sinusoidal endothelial cells; VSMC, vascular smooth muscle cells; cTEC, cortical thymic epithelial cells; mTEC, medullary thymic epithelial cells; mcTEC, medullary/cortical thymic epithelial cells; EC, endothelial cells; FB, fibroblasts; SMC, smooth muscle cells; aCM, atrial cardiomyocytes; vCM, ventricular cardiomyocytes; MNP, mononuclear phagocytes; EVT, extravillous trophoblast cells; HB, Hofbauer cells; MAIT, mucosal-associated invariant T cells; MO, monocytes; SCT, syncytiotrophoblast cells; VCT, villous cytotrophoblast cells; dM, decidual macrophages; dP, decidual perivascular cells; dS, decidual stromal cells. Raw expression values were normalized, log transformed and summarized by published cell clustering where available or reproduced clustering annotated using marker genes and cell type nomenclature from the respective studies. The size of the dots indicates the proportion of cells in the respective cell type having greater-than-zero expression of ACE2 (first column), TMPRSS2 (second column) or both (third column), while the color indicates the mean expression of ACE2 (first and third columns) or TMPRSS2 (second column). b, Schematic illustration depicts major anatomical regions in the human respiratory tree demonstrated in this study: nasal, lower airway and lung parenchyma (left). Expression of ACE2 is from airway epithelial cell datasets: Vieira Braga et al.26 (middle) and Deprez et al.27 (right).
Datasets were retrieved from existing sources and cell clustering and nomenclature were retained based on the respective studies. For gene expression results in the dot plots, the dot size represents the proportion of cells within the respective cell type expressing the gene and the dot color represents the average gene expression level within the particular cell type.
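The dot-plot encoding described in this legend reduces to two summary statistics per cell type. The following is a minimal sketch, assuming that log-normalized values for a single gene are already grouped by cell type and that the mean is taken over all cells of the type (not only expressing cells). The function name `dot_summary` and the toy values are hypothetical.

```python
from statistics import mean


def dot_summary(values_by_cell_type):
    """Map each cell type to the two quantities encoded by one dot of a dot plot.

    values_by_cell_type: dict of cell type -> list of log-normalized values for one gene.
    """
    summary = {}
    for cell_type, values in values_by_cell_type.items():
        expressing = sum(1 for value in values if value > 0)
        summary[cell_type] = {
            "fraction_expressing": expressing / len(values),  # encoded as dot size
            "mean_expression": mean(values),  # encoded as dot color
        }
    return summary


# Toy values, made up for illustration only.
toy_values = {
    "ciliated": [0.0, 1.0, 2.0, 0.0],
    "basal": [0.0, 0.0],
}
summary = dot_summary(toy_values)
```

Reporting both quantities separates how widely a gene is detected within a cell type from how strongly it is expressed on average, which matters for a sparsely detected gene such as ACE2.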

To further characterize specific epithelial cell types expressing ACE2, we evaluated ACE2 expression within the lung and airway epithelium. We found that, despite a low level of expression overall, ACE2 was expressed in multiple epithelial cell types across the airway, as well as in alveolar epithelial type II cells in the parenchyma, consistent with previous studies9,10,11. Notably, nasal epithelial cells, including two previously described clusters of goblet cells and one cluster of ciliated cells, showed the highest expression among all investigated cells in the respiratory tree (Fig. 1b). We confirmed enriched ACE2 expression in nasal epithelial cells in an independent scRNA-seq study that includes nasal brushings and biopsies. The results were consistent; we found the highest expression of ACE2 in nasal secretory cells (equivalent to the two goblet cell clusters in the previous dataset) and ciliated cells (Fig. 1b).

In addition, scRNA-seq data from an in vitro epithelial regeneration system from nasal epithelial cells corroborated the expression of ACE2 in goblet/secretory cells and ciliated cells in air–liquid interface cultures (Extended Data Fig. 1). Notably, the differentiating cells in the air–liquid interface acquire progressively more ACE2 (Extended Data Fig. 1). The results also suggest that this in vitro culture system may be biologically relevant for the study of SARS-CoV-2 pathogenesis.

It is worth noting that TMPRSS2 was only expressed in a subset of ACE2+ cells (Extended Data Fig. 2), suggesting that the virus might use alternative pathways. It was previously shown that SARS-CoV-2 could enter TMPRSS2− cells using cathepsin B/L (ref. 7). Indeed, other proteases were more promiscuously expressed than TMPRSS2, especially cathepsin B, which was expressed in more than 70–90% of ACE2+ cells (Extended Data Fig. 2). However, while TMPRSS2 activity is documented to be important for viral transmission13,14, the potential of cathepsin B/L or other proteases to functionally replace TMPRSS2 has not been determined.
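The proportion of ACE2+ cells that also express a given protease is a conditional fraction. A minimal sketch follows, assuming that each cell is a mapping from gene symbol to count and using the gene symbol CTSB for cathepsin B. The function name and the toy values are hypothetical and do not reproduce the percentages reported here.

```python
def protease_fraction_in_receptor_positive(cells, receptor, proteases):
    """Among cells with non-zero receptor counts, return the fraction expressing each protease."""
    positive = [cell for cell in cells if cell.get(receptor, 0) > 0]
    if not positive:
        # No receptor-positive cells: the conditional fraction is undefined.
        return {protease: float("nan") for protease in proteases}
    return {
        protease: sum(1 for cell in positive if cell.get(protease, 0) > 0) / len(positive)
        for protease in proteases
    }


# Toy counts, made up for illustration only.
toy_cells = [
    {"ACE2": 1, "TMPRSS2": 0, "CTSB": 4},
    {"ACE2": 2, "TMPRSS2": 1, "CTSB": 2},
    {"ACE2": 0, "TMPRSS2": 5, "CTSB": 0},
    {"ACE2": 1, "CTSB": 1},
]
result = protease_fraction_in_receptor_positive(toy_cells, "ACE2", ["TMPRSS2", "CTSB"])
```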

We next asked whether enriched expression of viral receptors and entry-associated molecules in the nasal region/upper airway might be relevant for viral transmissibility. Here, we assessed the expression of viral receptor genes that are used by other coronaviruses and influenza viruses in our datasets. We looked for ANPEP, used by HCoV-229E (ref. 15) and DPP4, used by MERS-CoV (ref. 16), as well as enzymes ST6GAL1 and ST3GAL4, which are important for the synthesis of α(2,6)-linked and α(2,3)-linked sialic acids recognized by influenza viruses17. Notably, their expression distribution coincided with viral transmissibility patterns based on a comparison to the basic reproduction number (R0), which estimates the number of people who can become infected from a single infected person. The skewed distribution of the receptors/enzymes toward the upper airway is observed for viruses with higher R0/infectivity, including SARS-CoV/SARS-CoV-2 (R0 ~1.4–5.0 (refs. 18,19,20)), influenza (mean R0 ~1.3 (ref. 21)) and HCoV-229E (unidentified R0; associated with common cold). This distribution is in distinct contrast with that of DPP4, the receptor for MERS-CoV (R0 ~0.3–0.8 (ref. 22)), a coronavirus with limited human-to-human transmission23, in which expression skews toward lower airway/lung parenchyma (Fig. 2a). Therefore, our data highlight the possibility that viral transmissibility is dependent on the spatial distribution of receptor accessibility along the respiratory tract.

Fig. 2: Respiratory expression of viral receptor/entry-associated genes and implications for viral transmissibility and genes associated with ACE2 expression.

a, Expression of ACE2 (an entry receptor for SARS-CoV and SARS-CoV-2), ANPEP (an entry receptor for HCoV-229E), ST6GAL1/ST3GAL4 (enzymes important for synthesis of influenza entry receptors) and DPP4 (an entry receptor for MERS-CoV) from the airway epithelial datasets: Vieira Braga et al.26 (left) and Deprez et al.27 (right). The basic reproductive number (R0) for respective viruses, if available, is shown. b, Respiratory epithelial expression of the top 50 genes correlated with ACE2 expression based on Spearman’s correlation analysis (with Benjamini–Hochberg-adjusted P values) performed on all cells within the Vieira Braga et al.26 airway epithelial dataset. The colored gene names represent genes that are immune-associated (GO:0002376, immune system process or GO:0002526, acute inflammatory response). For gene expression results in the dot plots, the dot size represents the proportion of cells within the respective cell type expressing the gene and the color represents the average gene expression level within the particular cell type.

To gain more insight into the expression patterns of genes associated with ACE2, we performed Spearman’s correlation analysis with Benjamini–Hochberg-adjusted P values to identify genes associated with ACE2 across all cells within the lung epithelial cell datasets. While the correlation coefficients are relatively low (<0.12), likely due to low expression of ACE2, technical noise and dropout effects, the expression pattern of the top 50 ACE2-correlated genes across the respiratory tree is consistent with that of ACE2, with a skewed expression toward upper airway cells (Fig. 2b and Extended Data Fig. 3a,b). Notably, while some of the genes are associated with carbohydrate metabolism, possibly due to their role in goblet cell mucin synthesis, a number of genes associated with immune functions, including innate and antiviral immune functions, are over-represented in the rank list; these include IDO1, IRAK3, NOS2, TNFSF10, OAS1 and MX1 (Fig. 2b and Supplementary Table 1). Expression of these genes is highest in nasal goblet 2 cells (Fig. 2b), consistent with the phenotype previously described. Nonetheless, nasal goblet 1 and nasal ciliated 2 cells also significantly express these genes (Fig. 2b). Given their environmental exposure and high expression of receptor/receptor-associated enzymes (Fig. 2a), it is plausible that nasal epithelial cells are conditioned to express these immune-associated genes to reduce viral susceptibility.
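For readers unfamiliar with the statistic, Spearman’s coefficient is Pearson’s correlation computed on ranks, with tied values sharing an average rank. The following is a minimal pure-Python sketch under that definition; it is not the Hmisc routine used in this study, and the function names and toy vectors are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x, mean_y = mean(rank_x), mean(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return covariance / sqrt(var_x * var_y)


# Toy vectors, made up for illustration only; the zeros mimic dropout-induced ties.
rho_tied = spearman_rho([0, 0, 1, 2], [0, 1, 1, 3])
rho_monotone = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
```

In sparse single-cell data most cells have zero counts for a lowly expressed gene, so many ranks are tied, which is one reason coefficients can stay low even for genes with similar expression patterns.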

In this study, we explored multiple scRNA-seq datasets generated within the Human Cell Atlas (HCA) consortium and other resources and found that the SARS-CoV-2 entry receptor ACE2 and viral entry-associated protease TMPRSS2 are highly expressed in nasal goblet and ciliated cells. This finding implicates these cells as loci of original infection and possible reservoirs for dissemination within and between individuals. Co-expression in other barrier surface tissues could also prompt further investigation into alternative transmission routes. For example, the co-expression in esophagus, ileum and colon could explain viral fecal shedding observed clinically24, with implications for potential fecal–oral transmission, whereas the co-expression in superficial conjunctival cells could explain an ocular phenotype observed in a small portion of COVID-19 patients25 with the potential of spread through the nasolacrimal duct.

The results confirmed the expression of ACE2 in multiple tissues shown in previous studies10,11, with added information on tissues not previously investigated, including nasal epithelium and cornea, and on its co-expression with TMPRSS2. We clearly detected nasal ACE2 mRNA expression, for which protein confirmation is needed to resolve conflicting results in the literature8,12. Our findings may have important implications for understanding viral transmissibility, considering that the primary viral transmission is through infectious droplets. Moreover, as SARS-CoV-2 is an enveloped virus, its release does not require cell lysis. Thus, the virus might exploit existing secretory pathways in nasal goblet cells sustained at a presymptomatic stage. These discoveries could have translational implications. For example, given that nasal carriage is likely to be a key feature of transmission, drugs/vaccines administered intranasally could be highly effective in limiting spread.

This collaborative effort by HCA Biological Network (the lung) illustrates the opportunities from integrative analyses of HCA data, with future examples of consortium work expected soon.

Methods

Data were retrieved from published and unpublished datasets in multiple human tissues, including airways26,27, cornea (personal communication; Lako laboratory, Newcastle, UK), skeletal muscle (personal communication; Teichmann laboratory, Wellcome Sanger Institute and Zhang laboratory, Sun-Yat-Sen University, Guangzhou, China), ileum28, colon29, pancreas30, liver31, gallbladder (personal communication; Vallier laboratory, University of Cambridge, UK), heart (Teichmann laboratory, Hubner laboratory/Berlin, Seidmanns/Harvard and Noseda laboratory/Imperial College London, UK), kidney32, placenta/decidua33, testis34, prostate gland35, brain36, skin37, retina38, spleen39, esophagus39 and fetal tissues40,41. Raw expression values were normalized and log transformed. We retained cell clustering based on the original studies when available.
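The normalization and log transformation can be sketched for a single cell as follows. This is a minimal illustration, assuming library-size scaling to 10,000 counts per cell followed by a natural-log transform of the value plus one; the scaling target, the function name `normalize_log_transform` and the toy counts are assumptions, as the exact parameters are not stated here.

```python
from math import log1p


def normalize_log_transform(counts, target_sum=10_000):
    """Scale one cell's raw counts to a common total, then apply log(1 + x).

    counts: dict of gene symbol -> raw count for a single cell.
    target_sum: assumed per-cell total after scaling (not specified in the text).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no library size to scale by; keep all values at zero.
        return {gene: 0.0 for gene in counts}
    return {gene: log1p(count / total * target_sum) for gene, count in counts.items()}


# Toy counts, made up for illustration only.
toy_cell = {"ACE2": 1, "TMPRSS2": 3, "ACTB": 96, "DPP4": 0}
normalized = normalize_log_transform(toy_cell)
```

Zero counts remain exactly zero after this transform, so the greater-than-zero criterion used for the dot sizes is unaffected by normalization.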

For each dataset where per-cell annotation was not available, we re-processed the data from a raw or normalized (whichever was deposited alongside the original publication) quantification matrix. The standard scanpy (v.1.4.3) clustering procedure was followed. When batch information was available, the harmony package was used to correct batch effects in the principal component space and the corrected principal components were used for computing nearest-neighbor graphs. To re-annotate the cells, multiple clusterings of different resolutions were generated, among which the one best matching the published clustering was picked, and manual annotation was undertaken using marker genes described in the original publication. Full details can be found in analysis notebooks available at github.com/Teichlab/covid19_MS1.
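One way to make the choice of clustering resolution concrete is to score each candidate clustering against the published labels. The sketch below uses the adjusted Rand index as the agreement score; this metric is an assumption for illustration, since the criterion for "best matching" is not specified here, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same cells."""
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree
    # Pairs of cells placed together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (together_both - expected) / (maximum - expected)


def best_matching_resolution(candidates, published_labels):
    """Return the resolution whose clustering agrees best with the published labels."""
    return max(
        candidates,
        key=lambda resolution: adjusted_rand_index(candidates[resolution], published_labels),
    )


# Toy labels, made up for illustration only.
published = ["a", "a", "b", "b", "c", "c"]
candidates = {
    0.2: [0, 0, 0, 0, 1, 1],  # under-clustered
    0.6: [0, 0, 1, 1, 2, 2],  # matches the published grouping
    1.0: [0, 1, 2, 3, 4, 5],  # over-clustered
}
chosen = best_matching_resolution(candidates, published)
```

A score of this kind only narrows the choice; the manual annotation with marker genes described above remains the deciding step.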

Illustration of the results was generated using scanpy and Seurat (v.3.1). For correlation analysis with ACE2, we performed Spearman’s correlation with statistical tests using the R Hmisc package (v.4.3-1) and P values were adjusted with the Benjamini–Hochberg method with the R stats package (v.3.6.1) on the Vieira Braga et al.26 airway epithelial dataset and the Deprez et al.27 airway dataset. We also tested multiple additional approaches, including Kendall’s correlation, data transformation by the sctransform function in the Seurat package and data imputation by the Markov affinity-based graph imputation of cells algorithm, to compare correlation results. While imputation significantly improved correlations, the top genes correlated with ACE2 were largely the same as in the analysis performed on un-imputed data. Given the uncertainty about the extent to which imputation artificially distorted the data, we reported results with no imputation, even though correlations were low. The correlation coefficients for all genes are included as Supplementary Data 1. The top 50 genes in each dataset were characterized based on gene ontology classes from the Gene Ontology database and associated pathways in PathCards from the Pathway Unification database.
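The Benjamini–Hochberg adjustment can be sketched in a few lines. This is a minimal pure-Python illustration of the step-up procedure, written to follow the same logic as the BH option of `p.adjust` in the R stats package; it is not the code used in this study, and the function name and toy P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P values in the original input order."""
    n = len(p_values)
    # Visit P values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank of this P value in ascending order
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy P values, made up for illustration only.
toy_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(toy_p)
```

With one test per gene against ACE2, the adjustment controls the expected proportion of false discoveries among the genes reported as correlated.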

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.