Functional pathways in respiratory tract microbiome separate COVID-19 from community-acquired pneumonia patients

In response to the global pandemic of the last four months, some progress has been made in understanding the molecular-level host interactions of the new coronavirus SARS-CoV-2 responsible for COVID-19. However, when the virus enters the body it interacts not only with the host but also with the micro-organisms already inhabiting the host. Understanding the virus-hostmicrobiome interactions can yield additional insights into the biological processes perturbed by the viral invasion. We carry out a comparative functional analysis of bronchoalveolar lavage fluid of eight COVID-19, twenty-five community-acquired pneumonia (CAP) patients and twenty healthy controls. The resulting functional profiles clearly separate the cohorts, even more sharply than just their corresponding taxonomic profiles. We also detect distinct pathway signatures in the respiratory tract microbiome that consistently distinguish COVID-19 patients from both the CAP and healthy cohorts. These include increased vitamin, drug, nucleotide, and energy metabolism during SARS-CoV-2 infection, contrasted with decreased amino acid and carbohydrate metabolism. This comparative analysis indicates consistent differences in COVID-19 respiratory tract metatranscriptomes compared to CAP and healthy samples.


Introduction
An impressive number of scientific studies have rapidly been published on the genomics and molecular-level host interactions of the new coronavirus SARS-Cov-2 1 of reported bat origin, 2 responsible for the COVID-19 disease pandemic. Additionally, understanding changes in the microenvironment within the host can yield insights into related biological processes perturbed by the virus. The respiratory tract microbiome community composition during pathogen infections has been studied previously, along with its predictivity of clinical outcomes and associated potential probiotic interventions. [3][4][5][6][7] Studies profiling the respiratory microbiome community in COVID-19 patients are also beginning to appear. 1,[8][9][10] To better understand the role of the microbiome in COVID-19, we introduce a functional analysis of recently published metatranscriptomes 8 from bronchoalveolar lavage fluid (BALF) of COVID-19 patients, healthy subjects, and communityacquired pneumonia (CAP) cases. While Shen et al. 8 focused on the SARS-Cov-2 genomes and taxonomic profiling of the microbiomes, here we perform functional profiling to characterize active biological processes.
The sequenced reads were classified with PRROMenade 11 using a vast amino acid sequence collection of 12 million bacterial and viral protein domains from the IBM Functional Genomics Platform, 12 annotated with KEGG enzyme codes from a corresponding functional hierarchy. 13 Post-processing and robust rank-based RoDEO 14 projection of the annotations onto a unified scale made the resulting profiles comparable. The overall analysis workflow is depicted in Fig. 1A.
The functional profiling separated samples by cohort, and the more abundant enzymes codes in COVID-19 mapped to vitamin, drug, nucleotide, and energy metabolism pathways, while the less abundant ones mapped mostly to amino acid and carbohydrate pathways. In related literature, similarly altered pathways were indicated in the lower respiratory tract microbiomes of mice infected with the influenza virus. 15 The results from this robust comparative analysis highlight consistent differences in microbial pathways in COVID-19 compared to community-acquired pneumonia and healthy control samples.

Microbiome functional profiles cluster by cohort
The functional profiles were first visualized with Krona 16 to examine their average distribution per cohort (Fig. 1B, see Supplemental File 1 for interactive version including all samples). While there are certain differences detectable in these coarsely averaged profiles, a subsequent robust comparative analysis reveals specific functions that are consistently altered between cohorts.
Multi-dimensional scaling of pairwise Spearman distances between samples clearly separates the cohorts, with just a few co-locating healthy control and CAP samples (Fig. 1C). A significant difference in functional profiles was observed between the COVID-19, CAP, and healthy control groups according to PERMANOVA test (p ≤ 0.0001). Functional profiling separated the groups more sharply than taxonomic profiling (R 2 = 0.12 from functional profiling vs. R 2 = 0.07 from taxonomic profiling by Shen et al 8 ). Nearly identical results were also obtained when utilizing only the bacterial subset of the database for read matching; the separation is not merely due to COVID-19 sample reads matching SARS-CoV-2 protein domains (Supplemental Fig. S1).

Differentially abundant functions distinguish COVID-19 samples
The sequencing data had varying total number of reads and human content per sample (Supplemental Fig. S2). Therefore we used RoDEO 14 to project the samples onto a robust comparable scale. To examine the features that differentiate COVID-19 from others, we extracted the top-ranked features from the COVID vs. CAP and COVID vs. healthy control comparisons, and considered the union of top 30 features from each comparison, resulting in 51 EC numbers. Their clustering shows three main feature clusters that correspond to the sample clusters of i) 20 healthy controls (with 8 CAP samples included), ii) 15 CAP, and iii) 8 COVID-19 (with 2 CAP samples included) ( Fig. 2A, three outlined rectangles).
The CAP patient samples were collected from different hospital sources, yet these samples cluster together and separate from the COVID-19 patient samples. While the features were selected as those differentiating COVID-19 from CAP and healthy controls, they also tend to separate CAP from the controls. Although two CAP samples cluster here with COVID-19, note that the CAP samples were collected prior to the current pandemic and represent distinctly different pneumonia cases.
The bottom five COVID-19 samples in Fig. 2A

Altered lung microbiome pathways indicated in COVID-19
We further checked the 51 top differential functions shown in Fig. 2A for extract those whose average abundance changed in a consistent direction in the COVID-19 cohort compared to both the CAP and the healthy cohort. The average difference between COVID-19 vs. CAP and COVID-19 vs. healthy controls was visualized against their corresponding pathways from the KEGG pathway mapping (Fig. 2B).
Functions with increased relative abundance are linked to energy metabolism (methane, nitrogen, sulfur), metabolism of cofactors and vitamins (B6, folate, lipoic acid) and purine -related to one-carbon metabolism, nucleotide metabolism (purine, pyrimidine), and drug metabolism. Several functions with decreased relative abundance are related to amino acid and carbohydrate metabolism. Decreased amino acid and carbohydrate metabolism with increased metabolism of cofactors and vitamins and nucleotide metabolism were also indicated among the changes in lower respiratory tract microbiomes at the acute phase (day 7) of mice infected with influenza virus, with increased energy metabolism in the early days of infection (day 3). 15

Discussion
Analyzing COVID-19 respiratory tract metatranscriptomes from a functional perspective offers an additional view into their differences from healthy controls and subjects with community-acquired pneumonia (CAP). Querying translated reads against amino acid sequences allows for more flexibility in sequence matching, as synonymous mutations will not affect the matching. This also supports the detection of functions performed by unknown organisms yet to be characterized. 17 The resulting functional profiles clearly distinguish COVID-19 from CAP and healthy controls, even more so than the original taxonomy-based analysis of the community members. 8 From this profiling, we detected consistent differences in COVID-19 samples compared to both the healthy control and CAP samples.
Features with increased relative abundance mapped to metabolism of vitamins, drugs, purine, pyrimidine, and energy metabolism. Several features with decreased relative abundance were related to amino acid and carbohydrate metabolism. Similar observations were recently made in a comprehensive study of microbiomes in mice infected with the influenza virus. 15 With limited clinical data available for these patients it is unclear if administered therapies had an effect on the detected microbiome differences and the detected subclusters of COVID-19 samples. The comparative approach taken here carefully avoids possible experimental variation and resulting biases within individual samples, and focuses on detecting robust and consistent differences between cohorts.

2/6
These preliminary observations call for further in-depth analysis of microbial functions and host-microbiome interactions from studies coupling sequencing and clinical data, as viewing them with the lens of functional annotation can yield additional insights into the altered biological processes.

Sequence data and functional database
The recently published bronchoalveolar lavage fluid (BALF) metatranscriptomic sequencing data of 8 COVID-19 patients, 20 healthy controls, and 25 samples of community-acquired pneumonia (CAP) were obtained from the National Genomics Data Center. 8,18 Pre-processing was performed as described by Utro et al. 11 including Trim Galore 19 trimming and bowtie2 20 human read filtering, resulting in 40k to 32M input reads for microbial functional annotation per sample (Supplemental Fig. S2). Human reads were already filtered by Shen et al. 8 , still we removed additional reads that mapped to human with the bowtie 2 local alignment mode.
The KEGG Enzyme Nomenclature code (EC) reference hierarchy 13 was used as the functional annotation tree. EC numbers define a four-level hierarchy. For example, 1.5.1.3. = "Dihydrofolate reductase" is a fourth (leaf) level code linked to top level code 1 = "Oxidoreductases", via 1.5. = "Acting on the CH-NH group of donors" and 1.5.1 = "With NAD+ or NADP+ as acceptor".
A PRROMenade 11 database was constructed from a total of 11.88M bacterial and 51k viral annotated protein domain sequences, obtained from the IBM Functional Genomics Platform 12 (formerly known as OMXWare). The virus domain sequences (including from SARS-Cov-2) were added to the previously described bacterial sequence collection, 11 totaling 3.72 billion amino acids.

Functional annotation and downstream analysis
Metatranscriptomic sequencing reads were annotated with PRROMenade by locating the maximal length exact match for each read and processed as described by Utro et al. 11 Minimum match length cutoff of 9 AA (27 nt) was employed. Classified read counts (13k to 5.0M per sample, see Supplemental Fig. S2-S3) were post-processed to summarized the counts at the leaf level of the functional hierarchy. Leaf nodes contributing ≥ 0.1% of total annotated reads in at least one sample were retained, 595 nodes. Functional profiles were processed with RoDEO 14 for robust comparability (parameters P=10, I=100, R = 10 7 ). A two-sample Kolmogorov-Smirnov test was applied to identify differentially abundant features between COVID-19 samples and CAP, healthy control samples.