Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent for coronavirus disease 2019 (COVID-19), which continues to threaten millions of lives worldwide1,2,3,4. Infected individuals without apparent clinical symptoms may also transmit the viruses5. Most patients infected by SARS-CoV-2 displayed symptoms of fever, dry cough, headache, dyspnea, and pneumonia5. Before the emergence of SARS-CoV-2 in December 2019, six coronaviruses (CoVs) are able to infect humans. These include four epidemic CoVs causing mild respiratory symptoms in human (i.e., HCoV-NL63, HCoV-229E, HCoV-OC43, and HCoV-HKU1) and two CoVs related to animal-to-human spillover events, including SARS-CoV and Middle East respiratory syndrome CoV. These six CoVs all originate from bats or rodents6,7,8,9,10,11. Currently, SARS-CoV-2 shares highest sequence similarity with a bat β-CoVs (BatCoV-RaTG13)12, indicating a probable bat origin. Another mammalian animal, pangolins, has also been suggested as a potential host for SARS-CoV-213,14,15. It is also reported that SARS-CoV-2 was capable of infecting cats, dogs, etc.16,17. However, the precise intermediate animals involved in the bat-to-human transmission of SARS-CoV-2 remain controversial and require continuous investigation.

Angiotensin-converting enzyme 2 (ACE2) has been recognized as the receptor for the spike protein of SARS-CoV, SARS-CoV-2, and HCOV-NL6312,18,19. Because of the diverse types of receptors and potential hosts of different viruses, understanding the tissue tropism and host range of a novel virus remains challenging. Regarding the high biosafety level required for the operation of highly pathogenic live viruses such as SARS-CoV-2, experiments involving cellular or animal models are mostly restricted to a small number of qualified laboratories, which hinders the large-scale investigation for the preferential tissues and hosts for these pathogens.

Single-cell sequencing has been applied to construct the single-cell atlas for a wide variety of species20,21,22,23,24,25,26. Previous studies suggested that animal tissues show high heterogeneity in terms of cellular composition and gene expression profiles, and ACE2 is only expressed in a small proportion of specific cell populations27, thus revealing the potential application of single-cell analysis in investigating SARS-CoV-2 tropism. In this study, we applied single-nucleus RNA sequencing to determine the potential target cells and hosts for SARS-CoV-2. A comprehensive single-cell atlas was constructed for 11 species, comprising ~300,000 cells derived from a wide variety of anatomical locations, thus representing the broadest single-cell atlas to date.

We first systematically screened for putative SARS-CoV-2 target cells (indicated by the co-expression patterns of SARS-CoV-2 entry receptor ACE2 and SARS-CoV-2 entry activator TMPRSS2) to assess the potential tissue tropism and host range for the virus. In addition to screening virus entry factors, single-cell atlases for multiple species have been utilized to identify highly conserved regulomes and connectomes within evolutionarily distant species28,29,30. However, despite encouraging progress in state-of-the-art single-cell sequencing in traditional evolutionary biology and neural science, single-cell studies on non-neural tissues over a broad spectrum of non-model species are lacking.

Here we show extensive cellular cross-talk mediated by ligands and receptors, as well as dynamic intracellular regulatory circuits of critical pulmonary cell types in mammals, reptiles, and birds. Our findings could help narrow down suspected animal hosts of newly emerging viruses and accelerate the identification of animal species involved in virus amplification and interspecies transmission. Furthermore, our study could be employed to explore fundamental cellular and molecular networks among evolutionarily distant species.


Generation of single-cell atlas for cat, tiger, and pangolin

Although the intermediate animal host involved in the emergence of SARS-CoV-2 remains obscure, cats, tigers, and pangolin were found permissive for CoV infection13,17,31,32. To establish a comprehensive transcriptome atlas for these three susceptible species, we generated single-nucleus libraries using 10× Genomics for various tissues (cat lung, kidney, liver, heart, eyelid, esophagus, and rectum; tiger lung, kidney, liver, spleen, and heart; and pangolin lung, kidney, liver, spleen, heart, esophagus, stomach, duodenum, and large intestine) (Fig. 1a and Supplementary Data 1). In total, 34,173, 80,608, and 92,863 single-cell transcriptomes passing quality control (see “Methods”) were obtained for the cat (Fig. 1b), tiger (Fig. 1c), and pangolin (Fig. 1d), respectively (Supplementary Data 1). Cell clustering analysis was performed using Seurat33,34 and cell type annotation was conducted according to the expression of canonical cell type markers (Fig. 1e–m, Supplementary Figs. 14, and Supplementary Data 1). Unsupervised clustering analysis revealed 30, 25, and 30 major cell types for the cat, tiger, and pangolin, respectively (Fig. 1b–d). The heart atlas for the three species mainly consisted of endothelial cells, fibroblasts, cardiomyocytes, and macrophages (Fig. 1e, h, k and Supplementary Figs. 13). The liver atlas for the three species primarily contained hepatocytes, Kupffer cells, hepatic stellate cells, endothelial cells, and liver sinusoidal endothelial cells (Fig. 1f, i and Supplementary Figs. 13). The spleen atlas for the tiger and pangolin included B cells, T cells, macrophages, and endothelial cells (Fig. 1j and Supplementary Figs. 23). The kidney atlas for the three species included proximal tubular cells, collecting duct cells, podocytes, Henle’s loop cells, and endothelial cells (Fig. 1g and Supplementary Figs. 13). The cat and pangolin esophagus primarily contained endothelial, epithelial, and smooth muscle cells (Fig. 1m and Supplementary Figs. 1 and 3). The cat eyelid tissue consisted of Wolfring’s gland, endothelial, epithelial, and immune cells (Supplementary Fig. 1), whereas the rectum tissue contained goblet, Paneth, enteroendocrine, and progenitor cells (Supplementary Fig. 1). In the pangolin digestive tract, we identified endothelial cells, epithelial cells, macrophages, secretory cells, and smooth muscle cells in the stomach (Supplementary Fig. 3); endothelial cells, enteroendocrine cells, goblet cells, Paneth cells, progenitor cells, stem cells, transit-amplifying (TA) cells, and tuft cells in the duodenum (Supplementary Fig. 3); and enterocyte cells, enteroendocrine cells, goblet cells, macrophages, progenitor cells, stem cells, and TA cells in the large intestine (Supplementary Fig. 3). Alveolar type 1 cells (AT1), alveolar type 2 cells (AT2), ciliated cells, secretory cells, endothelial cells, fibroblasts, T cells, B cells, and macrophages were identified in the lung atlas (Supplementary Fig. 4 and Supplementary Data 1). Taken holistically, we constructed the single-cell atlas for three important, yet poorly characterized, non-model species, which revealed their cellular taxonomies and thus laid the foundation for in-depth study regarding the cellular biology and development of these animals.

Fig. 1: Single-cell atlas for cat, tiger, and pangolin.
figure 1

a Illustration of the overall project design. Single-nucleus RNA-seq was performed with multi-tissues of cat (lung, kidney, liver, heart, eyelid, esophagus, and rectum), tiger (lung, kidney, liver, spleen, and heart), and pangolin (lung, kidney, liver, spleen, heart, esophagus, stomach, duodenum, and large intestine), with tissue types indicated by corresponding images. Lung tissues were used from other animals (deer, goat, rabbit, dog, hamster, lizard, duck, and pigeon). bd tSNE plot showing single-cell atlas of cat (b), tiger (c), and pangolin (d). Dots with colors represent different cell types, which were indicated below. LSECs, liver sinusoidal endothelial cells. TA cells, Transit-amplifying cells. eg Plot (upper panel) showing different cell types in cat heart, liver, and kidney. Feature plot (lower panel) showing expression of marker genes (above) of indicated cell types (below). hj Plot (upper panel) showing different cell types in tiger heart, liver, and spleen. Feature plot (lower panel) showing expression of marker genes (above) of indicated cell types (below). km Plot (upper panel) showing different cell types in pangolin heart, stomach, and esophagus. Feature plot (lower panel) showing expression of marker genes (above) of indicated cell types (below). In feature plot, red dots representing cells expressed the marker gene and corresponding cell types were highlighted with colored lines.

Investigation of SARS-CoV-2 tissue tropism

Although the susceptibility of cats, tigers, and pangolins to CoV has been reported13,17,31,32, little information is available regarding the distribution of putative SARS-CoV-2 target cells in these animals. Utilizing the high-quality and comprehensive single-cell atlas for distinct tissues of the three species, we systematically evaluated the cellular distribution of SARS-CoV-2 entry factors.

In the cat, ACE2-expressing cells and TMPRSS2-expressing cells were observed in all the tissues sampled (Figs. 2a, c and 3a). ACE2 and TMPRSS2 co-expressing cells were detected in the kidney (collecting duct cells, endothelial cells, Henle’s loop cells, proximal tubule cells, and stromal cells), heart (epicardial cells), eyelid (endothelial cells, epithelial cells, and Wolfring’s glands cells), rectum (enteroendocrine cells) (Fig. 2b, d), and lung (AT2 cells, ciliated cells, endothelial cells, fibroblasts, and secretory cells) (Fig. 3b–d). Notably, we observed the co-expression of ACE2 and TMPRSS2 in over 40% of kidney proximal tubule cells (Fig. 2b and Supplementary Data 2). In addition, co-expression of ACE2 and TMPRSS2 was detected in more than 50% of Wolfring’s gland cells in the cat eyelid and epicardial cells in the cat heart (Fig. 2b and Supplementary Data 2), suggesting the possibility that such tissue could act as a putative portal for the initial infection and transmission of SARS-CoV-2. Thus, our data showed that cats may be prone to multiple organ (lung, kidney, heart, rectum, and eyelid) infection by SAR-CoV-2 when exposed.

Fig. 2: Screening of putative SARS-CoV-2 target cells at single-cell resolution in various tissues of cat, tiger, and pangolin.
figure 2

a, e, i Dot plot showing the expression of ACE2 and TMPRSS2 in various cell types of cat (a), tiger (e), and pangolin (i). Each dot represents gene expression within indicated cell type, of which the color represents average normalized expression level and the size indicates the percentage of cells that expressed the gene within each cell type. b, f, j Bar plot showing percentage of cells that co-express ACE2 and TMPRSS2 within each cell type of cat (b), tiger (f), and pangolin (j). c, g, k Feature plot showing the expression pattern of ACE2 and TMPRSS2, respectively, in indicated cell types in kidney of cat (c), tiger (e), and pangolin (i). d, h, l Feature plot showing the co-expression of ACE2 and TMPRSS2 in indicated cell types in kidney of cat (d), tiger (h), and pangolin (l). Putative SARS-CoV-2 target cells were highlighted with red-colored labels.

Fig. 3: Expression of ACE2 and TMPRSS2 in lung cells of wildlife, pets, livestock, and poultry.
figure 3

a Dot plot showing expression level and percentage of ACE2 and TMPRSS2 (indicated on the right) in each cell type (indicated on the left) of lungs across a variety of species (indicated above). Dot color represents scaled average expression level and dot size represent percentage of cells that expressed indicated gene within each cell type. b Bar plot showing percentage of cells that co-expressed ACE2 and TMPRSS2 within each cell type (indicated on the left) of different species (indicated above). c, e, g, i Feature plot showing the expression pattern of ACE2 and TMPRSS2, respectively, in indicated cell types in lung of cat (c), tiger (e), hamster (g), and goat (i). d, f, h, j Feature plot showing the co-expression of ACE2 and TMPRSS2 in indicated cell types in lung of cat (d), tiger (f), hamster (h), and goat (j). Putative SARS-CoV-2 target cells were highlighted with red-colored labels.

In the tiger, ACE2-expressing cells and TMPRSS2-expressing cells were observed in all the tissues sampled (Fig. 2e, g). ACE2 and TMPRSS2 co-expression was detected in five out of eight kidney cell types (collecting duct cells, Henle’s loop cells, mesangial cells, podocytes, and T cells), one out of six heart cell types (endothelial cells) (Fig. 2f, h and Supplementary Data 2), and six out of nine lung cell types (AT1 cells, AT2 cells, ciliated cells, endothelial cells, fibroblasts, and secretory cells) (Fig. 3b, e, f), thus indicating infection potential of the lung, kidney, and heart.

In the pangolin, ACE2-expressing cells and TMPRSS2-expressing cells were observed in all the tissues sampled (Fig. 2i, k). Potential SARS-CoV-2 target cells were mainly found in the kidney (collecting duct cells, endothelial cells, podocytes, and proximal tubule cells), liver (hepatocytes), spleen (macrophages), and stomach (secretory cells) (Fig. 2j, l and Supplementary Data 2). Unlike the tiger and cat, however, SARS-CoV-2 target cells were only present in a small proportion of pangolin lung endothelial cells, with other lung cell types absent of ACE2 and TMPRSS2 co-expression (Fig. 3b and Supplementary Data 2). Similar to the cat, exposure to SARS-CoV-2 may lead to systemic infection in the pangolin.

Collectively, we evaluated SARS-CoV-2 tissue tropism in the cat, tiger, and pangolin. Distributions of putative SARS-CoV-2 target cells were restricted in specific cell types of certain tissues, with skewed expression towards the respiratory and urinary systems in the cat, respiratory system in the tiger, and urinary system in the pangolin.

Screening of SARS-CoV-2 entry factors in lung atlases

Lung is one of the main target tissue of various respiratory viruses35 and pneumonia is a typical symptom of COVID-1936. To characterize lung cell compositions in a broad range of species, we generated single-nucleus libraries of lung cells for 11 species covering livestock, poultry, pets, and wildlife (Fig. 3a and Supplementary Fig. 4), resulting in a total of 114,015 pulmonary cells passing quality control (Supplementary Data 1).

Recent comparative and structural analysis of ACE2 predicted a broad SARS-CoV-2 host range in vertebrates37. Consistently, functional and genetic analysis of ACE2 orthologs among mammals suggested a broad potential host tropism of SARS-CoV-238. To evaluate the expression patterns of SARS-CoV-2 entry factors in lung cells for various species, we screened eight mammalian species (cat, tiger, pangolin, dog, hamster, deer, rabbit, and goat) (Fig. 3a and Supplementary Data 3). Results suggested that, in addition to the three previously mentioned species (cat, tiger, and pangolin) (Fig. 2), four other mammalian species (dog, hamster, deer, and goat) also demonstrated co-expression of ACE2 and TMPRSS239 in specific cell types (Fig. 3b and Supplementary Data 3). Putative SARS-CoV-2 target cells were detected in three cell types in the dog (AT1, AT2, and ciliated cells). In the hamster lung, putative SARS-CoV-2 target cells were abundant in ciliated cells and found in a small proportion of AT1 cells (Fig. 3g, h). Four deer cell types (AT1 cells, AT2 cells, fibroblasts, and endothelial cells) co-expressed ACE2 and TMPRSS2. In the goat lung, ACE2 and TMPRSS2 co-expression was detected in AT1, ciliated, and endothelial cells (Fig. 3i, j). The proportions of SARS-CoV-2 target cells in the cat lung ciliated cells were higher than the proportions in corresponding cell type of the other species studied (Fig. 3b and Supplementary Data 21). Meanwhile, the proportion of cells co-expressing ACE2 and TMPRSS2 in the cat lung ciliated cells was higher than macrophage cells (Fig. 3b and Supplementary Data 22). Besides, we investigated the expression of SARS-CoV-2 entry factors in human lung30 and detected co-expression of ACE2 and TMPRSS2 in AT2 and ciliated cells (Supplementary Fig. 5).

Next, we screened the expression patterns of orthologs of ACE2 and TMPRSS2 in the lung cells of two avian species (duck and pigeon) and one reptile species (lizard). Consistent with previous research that SARS-CoV-2 replicates poorly in ducks40, no putative SARS-CoV-2 target cells were found in the lung cells of the poultry species (Fig. 3a, b). Intriguingly, ACE2 and TMPRSS2 co-expression was detected in the AT2 and ciliated cells of the lizard lung (Fig. 3a, b). As the binding affinity of ACE2 to SARS-CoV-2 has not been evaluated in lizards, this result should be interpreted with caution.

Cross-talk landscape of cat cells and viruses

To reveal the putative cellular tropism of feline viruses or viruses that may potentially infect cats at single-cell resolution, we investigated the expression patterns of 13 virus entry factors (Nectin1, Ncam1, Anpep, Ngfr, Grm2, Tnfrsf4, Flvcr1, Cxcr4, Nectin2, F11r, Slc20a1, Tfrc, and Slc20a2) (Supplementary Data 4). Overall, the 13 viral receptors could be classified into three clusters (Fig. 4a, b). Briefly, cluster A receptors (Ncam1 and Anpep) demonstrated a highly specific target cell spectrum, while cluster C receptors (Flvcr1, Cxcr4, Nectin1, Nectin2, F11r, Slc20a1, Tfrc, and Slc20a2) showed pan-expression patterns (Fig. 4a, b). In contrast, cluster B receptors (Ngfr, Grm2, and Tnfrsf4) were expressed at very low levels in all 31 feline cell types included in this study.

Fig. 4: Cross-talk landscape of cat cells and representative feline viruses.
figure 4

a Dot plot showing expression percentages and average expression levels of virus receptors (indicated below) in each cell types (indicated on the right) of cat cells. b Feature plot showing the expression patterns of virus receptors (F11r, Ncam1, Nectin2, Slc20a1, Slc20a2, Tfrc, Cxcr4, Anpep, Flvcr1) in cat cells. Corresponding virus species and image were illustrated approximate to the feature plot. CCov, canine coronavirus; FA27, feline leukemia virus strain C; FCV, feline calicivirus; RABV, Rabies lyssavirus; FCoV, feline coronavirus; FeLV-B1, feline leukemia virus strain B; FPLV, feline panleukopenia virus; FIV, feline immunodeficiency virus; HCoV-229E, human coronavirus 229E; IBV, infectious bronchitis virus; TGEV, transmissible gastroenteritis virus. c Bubble plot showing the immune-related GO terms of eight representative virus receptors in cat. The dot color was positively correlated with enrichment levels, whereas dot size was determined by count of genes in that GO term. P-values were calculated using hypergeometric test. Multiple comparisons adjustment was performed using Benjamini and Hochberg method. Exact P-value and source data were included in the Source Data file.

Different receptors for the same virus show high variability in terms of expression abundance and spatial distribution. For example, rabies lyssavirus (RABV) was reported to infect the kidney, lung, and heart of cats41,42. So far, three well-known receptors for RABV have been reported, i.e., Ncam1, Ngfr, and Grm243. Here, Ncam1 was enriched in kidney stromal cells, lung ciliated cells, and heart cardiomyocytes and fibroblasts, whereas Ngfr and Grm2 were weakly expressed, indicating the diverse cellular tropism of RABV in cats. Both Cxcr4 and Tnfrsf4 are receptors of feline immunodeficiency virus (FIV). Here, Cxcr4 was expressed in T and B cells of the heart and liver, in line with previous study showing that FIV can infect T and B lymphocytes44. In contrast, Tnfrsf4 was expressed poorly in feline cells, implying a receptor usage preference of FIV.

Receptors for different strains of the same virus show a high degree of heterogeneity. For instance, Slc20a1 and Slc20a2 are the receptors for feline leukemia virus (FeLV) strain B/lambda-B145, whereas Flvcr1 is the receptor for the FeLV strain C/FA2746. These three receptors were all present in the cluster C receptors (pan-expression cluster). Infection of FeLV in the myocardium, thymus, mesentery, liver, kidney, and lung has been reported in cats47,48, consistent with our observation that Flvcr1, Slc20a1, and Slc20a2 were highly expressed in stromal cells and distal convoluted tubule cells of the kidney, epicardial cells of the heart, and ciliated cells of the lung. Notably, the expression proportions of Slc20a1 and Slc20a2 in corresponding cell types were much higher than that of Flvcr1. In addition to the commonly expressed cell types, we observed abundant expression of Slc20a2 in heart cardiomyocytes and liver hepatocytes, whereas the expression of the other two receptors (Slc20a1 and Flvcr1) was much less significant.

Furthermore, our study indicated that potential target tissue could be predicted based on viral receptor expression patterns. As a cell adhesion molecule that can mediate pseudorabies virus (PRV) entry, Nectin1 colocalizes with E-cadherin at the adhesion junction of epithelial cells49, consistent with the enrichment of Nectin1 in the epithelial cells of the eyelid. Interestingly, although PRV infection has been detected in the hearts of dogs, rats, and pigs50,51,52, no PRV infection has been reported in cats heart previously. Here we observed that Nectin1 was enriched in epicardial cells of feline heart, indicating cat heart could probably be permissive to PRV infection. In addition, Anpep is a receptor commonly used by feline infectious peritonitis virus, transmissible gastroenteritis virus, and infectious bronchitis virus. These viruses have been detected in kidney of cats53,54. Consistently, Anpep was found to be highly expressed in stromal cells and proximal tubule cells of kidney. Besides, we detected the enrichment of Anpep in hepatocytes and hepatic stellate cells of liver, indicating the possibility that these liver cells could be targeted by viruses using Anpep as entry factor (Fig. 4a).

SARS-CoV-2 entry factors are reportedly co-expressed with immunity genes in human and monkey cells27,55, suggesting that SARS-CoV-2 target cells are conditioned to express immunity genes to reduce their susceptibility to viruses27. To explore whether this trend holds true for other viruses and non-model species, we applied Pearson’s correlation coefficient analysis for all 13 feline viral receptors of all 31 cat cell types. Briefly, we identified the top 100 genes positively correlated with each virus receptor, followed by Gene Ontology (GO) enrichment analysis (Supplementary Data 4). We noticed that viral receptors co-expression analysis could reveal novel gene signatures that is not simply reflected in the cell type differentially expressed genes (DEGs) (Supplementary Fig. 6). Intriguingly, we observed co-expression of humoral immune response genes (A2m, Fga, and C9) with Anpep and Nectin1, co-expression of adaptive immune response genes (Ptprc, Skap1, and Samsn1) with Cxcr4 and Tfrc, and co-expression of type 2 immune response genes (Anxa1 and Il33) with Nectin2 (Fig. 4c and Supplementary Data 4), suggesting the existence of extensive cell-virus cross-talk in the host–pathogen interaction interface. In summary, our data indicated that the co-expression of viral receptors and immunity genes may be more common than previously recognized. This phenomenon could be the result of the long-term “arms race” between viruses and hosts, which collectively shape the immune landscape of animals and host tropism of viruses56.

Conservation of pulmonary cellular connectomes

In addition to exploring cell-virus cross-talk, a pan-species single-cell atlas can be applied to investigate divergent and conserved cell–cell interactions among multiple species. To identify putative cellular communications, we constructed a ligand-receptor-mediated interaction network for lung cells within each species, except the rabbit (due to limited cell numbers) and pigeon (due to limited number of ortholog genes), which revealed extensive and dynamic cellular cross-talk in pulmonary cells (Fig. 5a and Supplementary Data 5 and 6). Overall, the source connectome topology of distinct species was quite similar, with fibroblasts have dominant source weight and hub scores. As for target analysis of network centrality, endothelial cells function as signaling authority across most species including dog, hamster, lizard, goat and tiger. The immune cells were relatively less active, in both source and target analysis of connectome topology (Fig. 5b). This tendency was quite robust regardless of cutoff threshold values (Fig. 5c). In general, communication pairs were classified into 44 signaling modalities (Supplementary Data 6). Consistent with previous study30, the AT1 cells dominated the vascular endothelial growth factor (VEGF) family signaling networks in most tested species (Fig. 5d).

Fig. 5: Conserved pulmonary cellular connectomes.
figure 5

a Communication network of receptor–ligand pairs between lung cell types of indicated species. Cell type was represented by colored node, of which the size was directly proportional to sum of receptor–ligand pairs between this node and all other nodes. Edge width was directly proportional to number of receptor–ligand pairs between two connected nodes. b Plot (upper) showing source weight and hub score of various cell types in indicated nine species. Plot (lower) showing target weight and authority score of nine cell types. c Comparison of degree rankings and the effect of thresholding of lung cell types in indicated species. The x-axis (“percent.source”) is percent of cell type expressing the marker, and the y-axis is the degree of each node plotted on a logarithmic scale. d Plot showing source weight of VEGF signaling family in nine cell types. e Violin plots showing expression level of indicated ligand and receptor genes within each cell type of lung in different species. f Circos plot of mammalian-specific conserved connectome in cat. Receptors and ligands were displayed near the upper and lower half circle, respectively.

We next identified pan-conserved cellular connectivity, which may correspond to ancient signaling vectors inherited from common ancestors of mammals, reptiles, and birds. In total, we detected 642 pairs of cell–cell interactions conserved among all nine species (Supplementary Data 6), most of which were associated with the extracellular matrix (collagens and fibronectin) and the development (VEGF) and morphogenesis (Semaphorin, SEMA) of pulmonary endothelial cells. Neuropilin-1, a receptor critical for vascular growth and remodeling, can convey SEMA and VEGF signaling57. Here we detected a fibroblast-Sema3c-Nrp1-endothelial and AT1-Vegfa-Nrp1-endothelial signaling axis in all nine species (Fig. 5e). Intriguingly, SEMA signaling plays a pivotal role in vascular patterning, and signal disruption can lead to anomalous pulmonary venous connections58. Here, cellular cross-talk mediated by other SEMA signaling components (e.g., Sema5a, 6a, and 6d) was also highly conserved (Supplementary Data 6).

In addition to pan-conserved cellular cross-talk, we explored why cell–cell cross-talk was conserved in all mammalian lung cells but was absent in the pulmonary cells of reptiles (lizard) and birds (duck). In total, we found 147 pairs of cellular interactions that were specifically conserved in mammals (Fig. 5f and Supplementary Data 6). The mammalian-specific conserved cellular connectivity included ADAM (Adam9-Itga3, Adam9-Itgav, and Adam9-Itgb5), ephrins (Efnb2-Pecam1 and Efna1-Epha7), fibronectins (Fn1-Itga3 and Fn1-Robo4), intracellular trafficking (Calr-Itga3 and Calr-Itgav), laminin signaling (Lama2, Lama4, Lamc1, and Lamc2 to Itga3), matrix metalloproteinases (MMPs) (Timp2-Itga3), NOTCH (Jag1-Notch2), SLIT (Slit2-Robo4), VEGF (Vegfc-Lyve1), and TGFB (Tgfb2-Eng). Jag1-Notch2 signaling is reported to play a major role in orchestrating alternative cell fate choices of ciliated cells59. Indeed, the Jag1 ligand broadcast from AT1 and AT2 cells is predicted to connect to the Notch2 receptor in ciliated cells. In addition, MMPs play critical roles in lung organogenesis60. In our study, Timp2 signaling from the fibroblasts and endothelial cells may target the Itga3 receptor of the AT1, AT2, ciliated, and endothelial cells. Endothelial ephrinB2 function was reported to be essential for lung alveolar formation61; here we detected endothelial cell-Efnb2-Pecam1-AT2 cell signaling was conserved in all mammalian species (pangolin, dog, hamster, deer, goat, cat, and tiger) but was absent in lizard and duck. Transforming growth factor-β plays crucial roles in epithelial–mesenchymal interactions for proper alveolarization and lung branching morphogenesis, and is thus critical for lung organogenesis and homeostasis62. Our data suggested that mammalian AT1 cells, AT2 cells, ciliated cells, and fibroblasts may regulate endothelial cells via Tgfb2-Eng signaling, but such signaling was not observed in the non-mammalian species. In summary, this study systematically revealed those highly conserved and lineage-specific cell-to-cell signaling within vertebrate lungs.

Conservation of regulomes in pulmonary cells

To reveal the regulatory mechanisms underlying alveolar development from the perspective of evolutionary biology, we predicted the regulomes in pulmonary cells for nine species (rabbit and pigeon were excluded for the same reasons mentioned above), resulting in an average of 469,796 Transcription Factor (TF)-target interactions for nine cell types (GENIE3 score > 0.01) (Supplementary Data 715). The number of TF-target interactions conserved in at least two species ranged from 5049 in AT2 cells to 44,009 in T cells (Fig. 6a and Supplementary Data 16). Encouragingly, several regulators for AT1 cells (Cux1 and Gata6)63, AT2 cells (Etv5)63, ciliated cells (Rfx3 and Glis3)64,65, secretory cells (Nfib)66, fibroblasts (Zeb2 and Foxp1)67,68, endothelial cells (Tcf12 and Dach1)69,70, macrophages (Mef2a and PPARg)71,72, T cells (Sox5 and Ikzf1)73,74, and B cells (Mef2c and Ets1)75,76 were active in the genetic regulatory network of the corresponding cell types (Supplementary Data 16), consistent with their expected regulatory functions. Of particular interest, we found a variety of regulatory circuits that were highly conserved among multiple species. Briefly, 37, 29, 533, 84, 151, 221, 1 097, 217 and 126 pairs of TF-target interactions were deeply conserved in AT1 cells, AT2 cells, ciliated cells, secretory cells, fibroblasts, endothelial cells, macrophages, T cells, and B cells, respectively (present in at least four species) (Supplementary Data 17). In the AT1 cells, 37 interactions associated with 12 TFs (Nfia, Dach1, Foxp1, Tox, Erg, Meis1 Cux1, Epas1, Mecom, Tcf12, Tead1, and Zeb2) were conserved in at least four of the nine species investigated. Notably, Cux1 contributed to 49% of the highly conserved interactions in AT1 cells (Fig. 6b). Cux1 is a known transcription factor regulating AT1 cell differentiation63. Rtkn2, the predicted target of Cux1, is reported to be an AT1 cell-specific gene77. The deeply conserved regulatory relationships between Cux1 and other functional genes (Lmo7, Cped1, Rtkn2, Ap1s3, Arhgef26, Cadm1, Col4a3, Ctdspl, Ctnna1, Lama3, Nrg1, Pdzd2, Unc13b, and Wwc2) suggest that Cux1 may manipulate the expression patterns of those developmental genes to instruct the differentiation, development, and morphogenesis of AT1 cells in a species-conserved manner.

Fig. 6: Conservation of regulomes in pulmonary cells.
figure 6

a Genetic regulatory network constructed with top 30 TFs with largest number of interactions and predicted target genes in AT1 cells, ciliated cells, and macrophages. b Highly conserved genetic regulatory networks in AT1 cells within each indicated species. Light blue nodes represent regulators, green nodes represent corresponding target genes. Edge width is proportional to weight of regulation and node size is proportional to number of target genes of regulator. c GO enrichment related to cellular functions of predicted target genes of top 30 TFs in AT1 cells, ciliated cells, and macrophages, respectively. Dot color represents significance level of enrichment analysis and dot size is count of target genes classified in GO terms. P-values were calculated using hypergeometric test. Multiple comparisons adjustment was performed using Benjamini and Hochberg method. Exact P-value and source data were included in the Source Data file.

In addition to well-known regulators, we also identified a variety of putative novel regulators in each cell type, e.g., Epas1, Klf6, Myb, Nfia, Tcf7l2, Tox, and Zeb2 in AT1 cells; Zbtb16Sox5, and Tox in ciliated cells, and Mitf, Arid1b, Trps1, Etv6, Foxn3, Tcf12, Jarid2, Lrrfip1, Arid5b, Kmt2c, Elf1, and Mllt10 in macrophages. The regulatory functions of these transcription factors were inferred based on the enriched GO terms of their predicted target genes (Supplementary Data 18), which supported the proposed functions of each regulator in the corresponding cell type. For example, targets for AT1 regulators were closely related to GO terms covering epithelial cell development (Arhgef26, Cdk6, and Tmod1), lung epithelium development (Foxp2, Gata6, and Ncor2), morphogenesis of a branching epithelium (Greb1l, Cd44, and Dlg1), and epithelial tube morphogenesis (Tgfbr2, Dlc1, and Efnb2). Additionally, targets under the control of ciliated cell regulators were enriched in GO terms associated with cilium organization (Bbs9, Cfap43, and Rfx3), cilium assembly (Lrguk, Mak, and Rp1), cilium movement (Ccdc114, Ccdc39, and Dnah10), cilium-dependent cell motility (Cfap57, Spag16, and Tekt1), motile cilium assembly (Intu, Kif3a, and Lrrc46), epithelial cilium movement involved in determination of left, right asymmetry (Dnah11, Lrrc6, and Ccdc40), and regulation of cilium beat frequency (Armc4, Dnaaf1, and Bbs4). Targets for macrophage regulators were associated with macrophage chemotaxis (Mapk14, Lgals3, and Ptprj), macrophage migration (Rpl13a and Nup85), and macrophage cytokine production (Cd36, Cd74, and Sema7a) (Fig. 6c). Overall, our study systematically identified conserved regulators for pulmonary cells, including both well-recognized and novel regulators.

Conserved gene modules of AT1 and AT2 cells

Conserved core expression programs have been identified by cross-species comparison of microglia single‐cell RNA sequencing data across evolutionary timescales29. Here we compared the transcriptomic landscapes of nine species (cat, tiger, pangolin, dog, hamster, deer, goat, lizard, and duck) spanning more than 312 million years of evolution78. Briefly, we selected 5442 genes showing expression in the lung cells of the nine species, followed by hierarchical clustering (see “Methods” and Supplementary Data 19). Cluster 4, 5 (1432 genes) from AT1 cells (Fig. 7a and Supplementary Data 19) and cluster 3, 5 (1429 genes) from AT2 cells (Fig. 7b and Supplementary Data 19) showed conserved expression patterns across all nine species and were thus considered as conserved core gene expression programs for the AT1 and AT2 cells, respectively. Those cell type conserved genes showed largely non-overlapping molecular signature with cell type highly expressed genes (Supplementary Fig. 7). The AT1 cell core gene expression program was related to epithelial cell morphogenesis, proliferation, migration, and respiratory system development (Fig. 7c and Supplementary Data 20). In contrast, the AT2 cell-conserved genes were enriched in stem cell differentiation, stem cell population maintenance, activation of innate immune response, and lung alveolus development (Fig. 7d and Supplementary Data 20). For example, Cav1, a critical regulator of lung injury that is highly expressed in mouse AT1 cells79,80, was conserved in the AT1 cells in multiple species (Fig. 7e), and Etv5, which is essential for the maintenance of mouse AT2 cells81, was deeply conserved in the AT2 cells (Fig. 7f).

Fig. 7: Conserved core expression programs in AT1 and AT2 cells.
figure 7

a, b Heatmap showing clustering of orthologs in AT1 (a) and AT2 cells (b) across different species, with cluster index displayed on left. Each row represents one ortholog and each column represents one species. Colors represent expression levels of orthologs. c, d GO terms enriched in AT1 and AT2 cell-conserved orthologs. e, f Violin plot showing cell type-specific expression patterns of AT1 cell-conserved gene (Cav1) and AT2 cell-conserved gene (Etv5). P-values were calculated using hypergeometric test. Multiple comparisons adjustment was performed using Benjamini and Hochberg method. Exact P-value and source data were included in the Source Data file.

Interestingly, a set of lung disease genes overlapped in the core gene expression programs of the AT1 and AT2 cells, thus linking evolutionary biology with pulmonary pathology. For example, multiple genes in the AT1 cell program were closely related to pulmonary veno-occlusive disease (PVOD) (Bmpr2 and Eif2ak4), lung cancer (Itga9 and Braf), pulmonary fibrosis (Cadm1 and Itgb6), pulmonary hypertension (Bmpr2 and Cav1), and pulmonary emphysema (Itgb6). Several genes of AT2 core program were strongly associated with lung disease such as lung cancer (Mvp and Braf), pulmonary fibrosis (Parn), and PVOD (Eif2ak4) (Fig. 7a and Supplementary Data 20). Here we presented several examples that evolutionarily conserved genes are developmentally important, and interruptions in those genes can lead to disorders of well-balanced developmental programs, thus leading to a variety of pathological consequences. Collectively, we revealed core gene expression programs of alveolar cells that are highly conserved in mammals, reptiles, and birds, thus presenting gene modules of fundamental importance to alveolar cellular identity maintenance.

PANDORA: a website for non-model specie single-cell atlas

To fully exploit single-cell resources, we developed an online platform, named PANDORA (, composed of five main functional modules: i.e., panAtlas, panScan, panConnectome, panRegulome, and panCoreProgram. The panAtlas module allows users to explore gene expression patterns of any genes in any tissue included in this study. In the panScan module, visualization of the expression patterns of virus receptors in cell populations of each tissue can be calculated, giving clues about cell tropism of each virus. In the panConnectome module, cellular interactions mediated by ligands and receptors can be depicted. In the panRegulome module, putative TF–target regulations within each cell type can be predicted. In the panCoreProgram module, the core expression program for each cell type can be presented. In summary, we developed a comprehensive and integrated online platform, which can hopefully maximize the functionality and applicability of our resources.


The SARS-CoV-2 pandemic has caused a catastrophic global health crisis. Although drugs and vaccines will help reduce loss in life and economic impact caused by COVID-19, their large-scale application is time-consuming and labor-intensive. Thus, the identification of intermediate hosts or animal reservoirs is crucial for the prevention and control of SARS-CoV-2 and other zoonotic viruses. So far, functional and genetic analyses of ACE2 orthologs in 48 species have found that 44 can mediate cell entry of SARS-CoV-2, implicating a broad range of host tropism for this virus27. With the advancement of single-cell sequencing technologies, the single-cell atlas for human, mouse, zebrafish, fruit fly, frog, and nematode have been successfully constructed20,21,22,23,82,83. However, studies on the cell composition of non-model species are lacking. To expand single-cell resources for non-model species, we generated a single-cell atlas for 11 species that are in close contact with humans (livestock, poultry, and pets) or possibly harbor enzootic viruses, thus producing a transcriptome of 272,148 high-quality cells derived from 29 tissues. Based on these resources, we screened putative virus target cells, revealed critical regulators of cell fate commitment, dissected genetic regulatory networks of pulmonary cells, and identified core gene expression programs deeply conserved among mammals, reptiles, and birds.

SARS-CoV-2 target cells were widely distributed among tissues within the digestive system (esophagus, rectum), respiratory system (lung), and urinatory system (kidney) of the cat. Regarding the licking habits of felines, cats could be infected by SARS-CoV-2 via the fecal-oral route or the airborne transmission route. Here, our results provide some explanation for the observation that cats are highly permissive to SARS-CoV-240 and highlight the necessity to monitor and evaluate the possible roles of cats during the COVID-19 pandemic.

Cellular signaling is crucial for the emergence of tissue properties and the maintenance of tissue homeostasis30. In this study, we painted a system-level portrait of pulmonary cell–cell cross-talk. In addition, we identified a variety of conserved cellular connectivity exclusively present in mammals. The evolution of vertebrate lungs suggests that the emergence of new pulmonary structures is usually derived from modifications of preceding ones in order to adapt to new functional requirements84. Yet, there remains a huge gap between lung morphological innovations and pulmonary cellular communication modifications. Thus, our study could provide some clues regarding the underlying mechanisms of evolution innovation from primitive lungs to advanced mammalian lungs at the cellular and molecular levels.

Several well-known master regulators of pulmonary cells were found to be active in corresponding cell types in a species-conserved manner, in line with their expected functions, thus providing a repertoire of regulatory circuits that deserve further investigation. In addition, we identified a range of novel regulators that may work cooperatively with canonical TFs to regulate pulmonary cell development. However, the functions of TFs in lung development of non-model species remain poorly understood, and follow-up functional studies are needed to reveal their previously underappreciated regulatory functions. Our study also showed that developmentally important regulators were evolutionarily conserved, indicating that conserved TF–target pairs may represent critical regulatory circuits, which are worthy of experimental verification. Thus, cross-species comparison of genetic regulatory networks among evolutionarily distant species offers a strategy to pinpoint regulatory circuits that are fundamental for lineage commitment and cell identity maintenance.

Taken together, our study provides a unique resource and framework to study putative virus target cells, making it possible to screen the entry factors of a wide range of viruses in various species in an unbiased manner. Evolutionarily conserved connectomes and regulomes are of fundamental importance for organism development and cell fate commitment. Considering the high diversity and heterogenicity of animal tissues, it is necessary to elucidate tissue evolution at single-cell resolution. With the development of single-cell sequencing and advancement of international collaborative projects, atlases for more species will be generated at an accelerated speed. We anticipate that the information gained from the present study could augment future research and provide insights into prevention and control strategies against enzootic viruses such as SARS-CoV-2. Furthermore, our research should allow further investigations on the regulatory networks, cellular cross-talk, and core gene expression programs across evolutionary timescales. Hopefully, our resources can be utilized by evolutionary biology researchers to elucidate the cellular and molecular mechanisms underlying the origin and evolution of lungs, an important interface for pathogen–host interactions and a crucial respiratory organ that plays an important role in vertebrate adaptation to unique ecological niches.

There are several limitations in this study. First, the ACE2/TMPRSS2-expressing cells might be under-detected due to intrinsic feature of single-cell sequencing. Second, there may exist inter-individual heterogeneity of each species. Third, differences in innate and adaptive immune responses could putatively affect viral replication, assembly, and/or release85. Therefore, the results of this study should be interpreted cautiously and the exact host range for SARS-CoV-2 should be determined by experimental confirmation.


Ethics statement

All experimental procedures and sample collection protocols were performed with the approval of the Institutional Review Board on Ethics Committee of BGI (NO. BGI-IRB A20008). All sampling procedures strictly followed the “Guidelines on the Ethical Treatment of Experimental Animals” established by the Ministry of Science and Technology, China.

Sample collection

This study was fully endorsed and approved by Institutional Review Board on Ethics Committee of BGI. All sampling activities were conducted in collaboration with technical staff. All the samples were stored carefully and used for research purposes only. We declare that this study complied with the Convention on Biological Diversity and the Convention on Trade in Endangered Species of Wild Fauna and Flora. Tissue samples were obtained from 11 animals, including: (1) four pet species: cat (Felis catus, domestic short-haired cat), dog (Canis lupus familiaris, German Shepherd), hamster (Mesocricetus auratus, golden hamster), and lizard (Anolis carolinensis, green anoles); (2) two livestock species: goat (Capra aegagrus hircus, domestic goat) and rabbit (Oryctolagus cuniculus domesticus, domestic rabbit); (3) two poultry species: duck (Anas platyrhynchos domesticus, domestic duck) and pigeon (Columba livia domestica, domestic pigeon); and (4) three wild animal species: tiger (Panthera tigris altaica, Siberian tiger), pangolin (Manis javanica, Sunda pangolin), and deer (Cervus nippon, Sika deer). The pangolin samples were collected from an individual that died of natural causes in Guangdong Provincial Wildlife Rescue Center, China, and was immediately stored in a −80 °C freezer after dissection. The tiger samples were obtained from an individual that died of natural causes in the Siberian Tiger Park in Heilongjiang Province, China. Tiger samples were immediately stored in a −80 °C freezer after dissection. Deer samples and corresponding genome assembly were kindly provided by the Institute of Special Animal and Plant Sciences (ISAPS) of the Chinese Academy of Agricultural Sciences. Other animals were obtained from markets with permission from the BGI Ethics Committee. Animals were kept in a pathogen-free environment and provided with sufficient living space and adequate food and water. After animal euthanasia in accordance with the animal experiment guidelines issued by the Chinese Ministry of Science and Technology, dissection was carried out to separate each tissue. The collected tissues were rinsed using 1× phosphate-buffered saline, then quick-frozen, and stored in liquid nitrogen. Nuclei of each tissue were separated by mechanical extraction. Briefly, the tissues were first thawed, infiltrated by 1× homogenization buffer (containing 30 mM CaCl2, 18 mM Mg(Ac)2, 60 mM Tris-HCl pH 7.8, 320 mM sucrose, 0.1% NP-40, 0.1 mM EDTA, and 0.2 U/µl RNase inhibitor), and cut into smaller pieces, with the single nucleus then isolated by 2 ml of Dounce homogenizer. After filtration with a 30 µm strainer, the nuclear extraction was resuspended in 1% bovine serum albumin (BSA) containing 0.2 U/µl RNase inhibitor and centrifuged at 500 × g for 10 min at 4 °C to discard cellular impurities within the supernatant. This step was repeated twice, after which the nuclei were recollected with 0.1% BSA containing 0.2 U/µl RNase inhibitor. Subsequently, 4′,6-diamidino-2-phenylindole was used to stain the nuclei and nucleus density was calculated under a fluorescence microscope for subsequent library construction.

Single-nucleus library construction and sequencing

The separated single nuclei of different tissues (e.g., lungs of cat, tiger, pangolin, dog, hamster, deer, rabbit, goat, lizard, duck, and pigeon; cat kidney, liver, heart, eyelid, esophagus, and rectum; tiger kidney, liver, spleen, and heart; pangolin kidney, liver, spleen, heart, esophagus, stomach, duodenum, and large intestine) underwent library construction using a Chromium Single-cell 3’ GEM, Library & Gel Bead Kit v3 (PN-1000075) following the guidelines provided by the manufacturer. After conversion using the MGI Easy Universal DNA Library Preparation Reagent Kit, the libraries were sequenced using a compatible BGISEQ-500 sequencing platform.

Cross-species homolog gene conversion

Gene models were downloaded from the National Center for Biotechnology Information (NCBI) database (Supplementary Data 1). To facilitate integration of the cross-species single-cell lung data sets, we converted genes from other species to the mouse homologs. First, we downloaded the homologs of eight species (cat, tiger, dog, hamster, goat, rabbit, lizard, and duck) and the mouse using BioMart86. For the pangolin, pigeon, and deer, which lack homolog records on Ensemble. Single-copy orthologs were identified from two species genomes by cluster analysis of gene families using OrthoFinder87 v2.3.3 with default parameters. Single-copy genes were extracted from the OrthoFinder output file. If a 1 : 1 match existed between a non-mouse and mouse gene, the non-mouse gene name was converted to the mouse gene name.

Single-cell transcriptomic data processing

Sequencing data were filtered using a custom script and the gene expression matrix was obtained using Cell Ranger v3.0.2 (10× Genomics). The genomes used for read alignment were downloaded from the NCBI Assembly (Supplementary Data 1). Single-cell analysis was conducted using Seurat33,34. Briefly, quality control was performed based on the following criteria: cells with mapped number of genes <200 or with mitochondrial percentage higher than 10% were removed. Variable genes were determined using Seurat’s FindVariableGenes function with default parameters (selection.method = “vst”, nfeatures = 2000). Clusters were identified via the FindClusters function (resolution = 1) in Seurat using principal components with a P-value < 0.01 and subsequently visualized using the RunTSNE function (reduction = “pca”). All DEGs for each Seurat object were identified using the FindAllMarkers function (only.pos = T, min.pct = 0.1, logfc.threshold = 0.25).

Collection of cell type markers

Cell type markers for each cell type were collected from previously published literature30,88,89,90,91 and the CellMarker database92.

Cell type identity inference

In general, lung cells were annotated based on the overall transcriptomic similarity with the reference data set and were combined with canonic cell type markers. Each lung atlas data set from the 11 species was integrated with the reference lung atlas (generated using 10× Genomics for human, pig, mouse, and rat)30 to infer their putative identity. Briefly, data sets of lungs from different species were integrated with the reference data set using the Seurat FindIntegrationAnchors and IntegrateData functions with features after homolog conversion33,34. Subsequently, cells in each cluster were marked according to their origin (from custom or reference data). The cell type identity tag of the reference cells was obtained from the metadata of the reference lung data set and the proportion of each reference cell type (AT1 cells, AT2 cells, ciliated cells, secretory cells, fibroblasts, endothelial cells, macrophages, T cells, and B cells) in every cluster was calculated. The cell type contributing to the highest proportion of the corresponding cluster was adopted as the potential cell type of that cluster. Cell type identity was further confirmed according to the expression of canonical cell type markers. Regarding other tissues (heart, liver, spleen, and kidney), cells were annotated according to canonical markers.

GO term enrichment analysis

The gene list converted to mouse homologs was subjected to GO analysis using the hypergeometric test implemented in the clusterProfiler93 package. GO enrichment level was evaluated by adjusted P-values and multiple test adjustment was conducted using the Benjamini–Hochberg (BH) method.

Screening of SARS-CoV-2 target cells

A cell was considered a SARS-CoV-2 target cell if ACE2 and TMPRSS2 expression levels were detected (unique molecular identifier, UMI > 0) simultaneously. The percentage of ACE2&TMPRSS2-positive cells was calculated for each cell type.

Pearson correlation coefficients

Pearson correlation analysis was conducted using the corr.test function in the psych package with the parameter: method = “pearson”. Multiple test adjustment was conducted using the p.adjust function with parameter: method = “BH”.

Screening of viral receptors in cat cells

Viruses that may infect cats were collected from previously published literature41,45,46,53 and reviewed by virology experts. A receptor list of viruses was downloaded from the viralReceptor database ( First, the percentage of receptor-positive cells was calculated for each cell type, resulting in a receptor percentage matrix (receptors in columns, cell type in rows). We next performed principal component analysis on all cell types using the receptor percentage matrix. Pearson’s correlation coefficients between receptors and all protein-coding genes were calculated, and the top 100 correlated genes of each receptor were subjected to GO term enrichment analysis.

Statistics analysis for ACE2 and TMPRSS2 co-expression enrichment

To test the significance of SARS-CoV-2 putative target cells in cat compared to other species, we randomly sampled 100 times of ciliated cells in each species, with 50% cells selected each time. After calculation of ACE2 and TMPRSS2 co-expression ratio in each sampling, Wilcoxon’s rank sum test was conducted using the 100 co-expression ratios between cat and that of the other 10 species respectively. Similarly, we tested the significance of ACE2 and TMPRSS2 co-expression in ciliated cells, compared to macrophages.

Cellular communication analysis

Ligands and receptors were downloaded from the FANTOM5 database94, a widely used database for inferring connectome in various species30,95,96,97, and included all literature-supported mouse ligands and receptors. Connectome networks were constructed according to the expression of ligands and receptors and both degree and centrality were calculated based on interactions passing different percentage thresholds using methods described by Raredon30. Nodal degree, hub authority, and centrality were calculated using the R package igraph98. The threshold of cognate ligand-receptor pair expression was greater than 5% of cells in source and target cell types. We next applied network centrality analysis and mode dominate analysis to connectomes using the CompareCentrality functions in the R package Connectome ( CircosPlot function was used to visualize mammalian-specific conserved interactions.

Identification of pan-conserved and mammalian-specific conserved cellular interactions

A signaling axis was defined as a combination of the following four components: source cells, ligands, receptors, target cells. If a signaling axis was present in all nine species investigated (cat, tiger, pangolin, dog, hamster, deer, goat, lizard, and duck), then it was considered as a pan-conserved cellular interaction. In addition, if a signaling axis was present in all seven mammalian species (cat, tiger, pangolin, dog, hamster, deer, and goat) but absent in the lizard and duck, it was defined as a mammalian-specific conserved cellular interaction.

TF–target interaction inference

The TF list for mouse (Mus musculus) was downloaded from the animalTFDB3.099. TFs and other genes were filtered out if expressed in <5% of corresponding cell types. Only protein-coding genes were maintained for subsequent analysis. Briefly, GENIE3100 was employed to predict putative regulatory circuits, with TF–target interactions with a weight value ≥0.01 retained for further regulatory network construction. Regulomes of representative TFs were visualized using Cytoscape and the R package igraph98,101.

Conservation analysis of TF–target pairs

The total frequency of each TF–target pair in a specific cell type in the nine species was counted to evaluate its conservation level. If a given TF–target pair was present in at least two of the same cell types in the nine species, it was regarded as “conserved.” If the count was ≥4, then the TF–target pair was considered as “highly conserved.”

Identification of putative cell type regulators

Cell type regulators were identified as follows: (1) TF–target interactions needed to be present in a specific cell type in at least two species. (2) TFs in conserved TF interactions (in at least two species) in each cell type were ranked in descending order according to the number of targets under their regulation. (3) Targets for each of the top 30 TFs in each cell type were subjected to GO term enrichment. (4) TFs with targets enriched in GO terms closely associated with expected biological processes of corresponding cell types were considered as putative regulators. (5) Extensive literature mining was performed for the putative regulators identified in the last step. If any report supported a clear link between the investigated TF and corresponding cell type, then it was considered as a known regulator. Otherwise, it was annotated as a putative novel regulator.

Conserved core expression programs identification

A gene was considered as expressed in a species if it was expressed in at least one cell type (UMI ≥ 1) in the lung atlas of that species. We intersected the expressed gene list of the nine species and obtained a set of 5442 commonly expressed genes for subsequent analysis. We first calculated the average gene expression of each cell type among the nine species. To obtain the percentile values of genes in a certain cell type of a particular species, we sorted genes according to their average expression levels from low to high, and then divided the gene ranking by the total number of genes. Using the same strategy, we generated a percentile matrix for each cell type of all nine species. Hierarchical clustering was applied to the percentile matrices in the AT1 and AT2 cells in all nine species. The clustered genes were ranked by average percentile values, and the top two clusters were considered the core gene expression programs for the AT1 and AT2 cells.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.