We live in a microbial world. We always have, and our mucosal immune system has co-evolved with it, making use of nutrients and critically important molecules that are products of microbial metabolism as well as recognizing a multitude of cues from the microbes that inhabit us. These cues are critical to our own development. Alterations in the microbial composition and/or microbial function in our mucosae, commonly referred to as dysbiosis, lead to compromised mucosal function, altered immune responses, and local and/or distal damage. We have known these facts for several decades now, and they can be convincingly defended on the basis of the knowledge gained from exploring the microbial world using culture-based methods. But more recently, our understanding of the microbial world within us has been revolutionized by the use of culture-independent techniques that have allowed us to characterize both the presence of microbes and their function based on the identification of microbial products (mainly DNA or RNA). Equipped with such an armamentarium, we have greatly expanded our understanding of the role different microbes play in areas where microbial colonization is well-documented, such as the gut and upper airways.

The story, however, is different in the lower airways. While the presence of respiratory pathogens is well-documented in the lower airways of patients suffering from various respiratory diseases, the lungs were long considered to be sterile. More recently, the use of culture-independent techniques has allowed us to uncover a complex and dynamic microbial community structure in the lower airways, where pathogens do not exist in isolation but rather are immersed in complex polymicrobial communities. Furthermore, multi-omic approaches can characterize in detail the complexity of microbial-host interactions at mucosal sites.

In this review, we will focus on how this approach has advanced our understanding of the microbial environment and its effects on the host immune tone in the lungs. Most notably, we acknowledge that there are myriad ways in which such multi-omic approaches can be utilized. Rather than stipulating a definitive set of guidelines about “best” practices, this review will attempt to provide “conceptual” guidance in a rapidly evolving field that allows plenty of room for optimization and that should be approached with an open technical and analytical mind.

Glossary of lung “omic” nomenclature

Here we summarize the meaning and specific aspects of different omic approaches as they pertain to the study of the lung environment. Figure 1 illustrates the challenges encountered with different omic technologies that are currently in use for lower airway microbiome studies. We illustrate both sequencing-based bioinformatic tools (16S ribosomal RNA (rRNA) gene amplicon sequencing, 18S rRNA gene amplicon sequencing, metagenomics, metatranscriptomics and host transcriptomics) and mass spectrometry-based tools (metabolomics and proteomics), highlighting technology-specific pros and cons. On the right in the figure, we show several key challenges that are shared, in varying degrees, between the different omic approaches when studying the lower airways.

Fig. 1: Pros and cons of multi-omic approaches applied to lower airway studies.

This figure summarizes the challenges encountered with different omic technologies that are currently in use for lower airway microbiome studies. Here we illustrate both sequencing-based bioinformatic tools (16S ribosomal RNA (rRNA) gene amplicon sequencing, 18S rRNA gene amplicon sequencing, metagenomics, metatranscriptomics and host transcriptomics) and mass spectrometry-based tools (metabolomics and proteomics), highlighting technology-specific pros and cons. On the right in the figure, we show several key challenges that are shared, in varying degrees, between the different omic approaches when studying the lower airways.

Lung microbial burden

There are a few methods in which targeted measurement of a conserved region of a microbial gene can be used to estimate total microbial burden. Examples include qPCR or droplet digital PCR targeting the 16S rRNA gene for total bacterial load and similar PCR methods targeting the 18S rRNA gene for total fungal load. These methods offer quantitative data and have been very important in identifying how an increase in bacterial load can be associated with increased lower airway inflammation and an overall poorer prognosis in lung diseases such as asthma and pulmonary fibrosis, and in critically ill patients with SARS-CoV-2 infection1,2,3,4,5,6.

Lung microbiota

This commonly refers to the taxonomic composition of the microbial communities in the lung, frequently explored using short read sequencing that targets variable regions of the 16S rRNA gene. The use of this methodology has led to the recognition that microbial products are found in the lower airways of healthy individuals, challenging a preconception of lung sterility that has dominated the scientific literature for a long time. Early studies involving healthy or diseased individuals used 16S rRNA gene sequencing approaches to taxonomically profile the lungs and identified the frequent presence of DNA belonging to oral commensals7,8,9,10,11. However, these earlier investigations uncovered the challenging reality of studying samples with a low microbial biomass: the constant intrusion of background noise in the data. This is predominantly driven by the presence of microbial DNA reads originating from two main potential sources: (a) DNA contamination from kits and reagents (true contamination)12; and (b) sequencing noise13. The first can be distinguished by the inverse relationship between the relative abundance of the contaminant taxa and the microbial DNA load of the sample, while the second can be characterized by its stochastic nature (randomness across technical replicates). In both cases, the lower the microbial biomass of the sample, the greater the chances of background noise; a situation commonly present when studying the lungs. Moreover, the premise of background contamination applies in varying degrees to all microbial omic assessments of the lower airways and should be considered in every analytical design. Overall, targeted microbial gene sequencing can be considered a cost-effective approach to taxonomically characterize the lung microbiome. However, it lacks strain resolution, and microbial function can only be inferred using analytical pipelines that assume the presence of certain genes based on taxonomic composition (e.g., Phylogenetic Investigation of Communities by Reconstruction of Unobserved States, or “PICRUSt”)14. Here, it is important to mention that a few studies have compared the results of inferred microbial genomic potential with measured microbial genomics (done by shotgun sequencing), but good correlation between the two datasets could not be demonstrated, raising concerns about the assumptions inherent in this approach15.
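The inverse relationship between a contaminant's relative abundance and the microbial DNA load of the sample, described above, can be screened for with a simple rank correlation. The following is a minimal sketch under stated assumptions rather than the method used in the cited studies: the function names, the example data and the correlation threshold of -0.7 are all hypothetical and would need to be tuned and validated (for example against negative controls) in any real analysis.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no trend
    return cov / (var_x * var_y) ** 0.5


def flag_possible_contaminants(rel_abundance, dna_load, threshold=-0.7):
    """Flag taxa whose relative abundance falls as total DNA load rises.

    The threshold is an illustrative assumption, not a published cut-off.
    """
    return [
        taxon
        for taxon, abundances in rel_abundance.items()
        if spearman(abundances, dna_load) <= threshold
    ]


# Hypothetical data: total 16S rRNA gene copies per sample (e.g., from qPCR)
# and the relative abundance of two taxa in the same five samples.
dna_load = [1e2, 5e2, 1e3, 1e4, 1e5]
rel_abundance = {
    "taxon_A": [0.60, 0.40, 0.30, 0.05, 0.01],  # declines as load increases
    "taxon_B": [0.10, 0.30, 0.20, 0.40, 0.35],  # no inverse trend
}
print(flag_possible_contaminants(rel_abundance, dna_load))
```

A screen of this kind only addresses reagent-type contamination; stochastic sequencing noise would instead need to be assessed through technical replicates.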

Lung mycobiome

This term refers to the fungal communities of the lung, which can be profiled by sequencing amplicons of the 18S rRNA gene or the internal transcribed spacer (ITS). As with bacteria profiled by 16S rRNA gene sequencing, fungi are frequently present in the lower airways, and although the overall community composition of the lung mycobiome is distinct from that of the upper airways, micro-aspiration seems to be the main source of fungi in the lower airways16. Challenges in the study of the mycobiome include difficulties with DNA isolation, limitations inherent to the different targeted sequences and less refined reference libraries17.

Lung metagenome

This refers to the assessment of the composition of all microbial genes present in the lung environment, usually done through whole genome sequencing (WGS). This method allows us to characterize microbes with a much more precise strain resolution than targeted sequencing and also describes the genomic potential, including potential microbial metabolic functions and antibiotic resistance genes (resistome)18. There is a paucity of studies that have used this approach to evaluate the lung microbiome due to significant challenges such as insufficient capture of microbial annotated reads. Since there is no amplification of microbial signals prior to sequencing, these methods are also challenged by a lack of sequencing depth required to capture a reasonable number of microbial reads to fully profile the lower airway microbiome. Thus, it is not uncommon to see lung microbiome studies that use these techniques aiming to sequence at high depth (e.g., 100 million reads per sample) with the hope of obtaining information from a small percentage of the data that can be assigned to microbial origin. For example, studying the lung microbiome of patients infected with SARS-CoV-2 with metagenomics and metatranscriptomics yielded 1.2% and 0.6% of the total reads assigned to microbes (bacteria, fungi or viruses), respectively1. One major advantage of these approaches is that they are not restricted to one kingdom, thus allowing for the exploration of inter-kingdom relationships. For example, studies of the lung metagenome have simultaneously explored the presence of bacteria, fungi and viruses in patients with sarcoidosis19 and SARS-CoV-2 infection1. Ultimately, such multi-kingdom assessments are often challenging due to discrepancies in reference databases for several microbial components, particularly those of viruses and fungi.
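The depth requirement described above can be made concrete with a back-of-the-envelope calculation. The sketch below combines the example depth of 100 million reads per sample with the microbial read fractions reported above (1.2% and 0.6%); pairing these numbers is an illustrative assumption, since the sequencing depth of the cited study is not stated here, and the function names are hypothetical.

```python
def expected_microbial_reads(total_reads, microbial_fraction):
    """Expected number of reads assigned to microbes at a given depth."""
    return total_reads * microbial_fraction


def depth_needed(target_microbial_reads, microbial_fraction):
    """Total depth needed to expect a target number of microbial reads."""
    return target_microbial_reads / microbial_fraction


total = 100_000_000  # example depth per sample
for label, fraction in [("metagenome", 0.012), ("metatranscriptome", 0.006)]:
    reads = expected_microbial_reads(total, fraction)
    print(f"{label}: about {reads:,.0f} microbial reads out of {total:,}")

# Depth required to expect one million microbial reads at a 0.6% yield
print(f"{depth_needed(1_000_000, 0.006):,.0f} total reads")
```

Under these assumptions, roughly 99% of the sequencing effort is spent on non-microbial (largely host) reads, which is why depth and cost are recurring constraints for lower airway metagenomics.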

Lung metatranscriptome

This can be assessed through RNA sequencing, which provides the taxonomy and genomic function of microbes that are actively transcribing. Because cell-free RNA is rapidly degraded, this approach can reveal information about viable bacteria in the lower airways. Indeed, studies comparing microbial profiling using metatranscriptome, metagenome and 16S rRNA gene sequencing in the lungs have suggested that the former provides a more accurate representation of viable bacteria in the lower airways by identifying those that actively produce bacterial products such as short chain fatty acids15. Very few studies have utilized this approach to study the lower airways, probably because of challenges similar to those encountered in metagenome studies, which are further exacerbated by the instability of RNA.

Lung metabolome and proteome

This refers to the study of metabolites and proteins and is commonly performed through nuclear magnetic resonance or mass spectrometry (MRS and MS, respectively). Both metabolomic and proteomic approaches can differ vastly and have ‘less targeted’ and ‘more targeted’ approaches that dictate the range of molecules detected. Both technologies have immense untapped potential and can add invaluable insight into disease pathogenesis, biomarker discovery, “deep” endotyping and response to therapy in various heterogeneous lung disease states20. Several investigations using lower airway samples have described distinct metabolic signatures in multiple pulmonary conditions such as pulmonary infections21, HIV22, acute respiratory distress syndrome23 and chronic obstructive lung disease24. While these methods frequently reveal distinct features across different disease states, data interpretation is often difficult due to a lack of standardized measurements in the lower airway samples and the inability to distinguish the relative contributions of microbial and host metabolism to the measured value of the metabolite or protein in question.

A detailed review of such methods pertaining to the lung environment was recently published by the American Thoracic Society25. Briefly, strategies to improve standardization include plate-based platforms and the use of internal standards and quality control pools to overcome batch effects26,27. Major challenges include, but are not limited to, data processing and visualization, as no one platform can capture the entire metabolome25. As metabolomic datasets become increasingly complex, tools for biological pathway/network mapping and enrichment analysis of large datasets are being developed; ConceptMetab, Metscape and metabolite set enrichment analysis are a few examples28,29,30.

Lung host transcriptomics

This is the study of mammalian RNA in the lower airways performed either through targeted approaches (e.g., microarrays, NanoString) or untargeted approaches (e.g., RNA sequencing). A common challenge for studying the lung transcriptome relates to difficulties in obtaining high-quality RNA, especially in samples that are contaminated with saliva, due to high levels of salivary RNase31,32. Another caveat that warrants special consideration is the cellular composition of the sample obtained, given significant topographical variations (e.g., airway brushings vs. bronchoalveolar lavage fluid (BALF)) and innate sample variations (e.g., neutrophilic-predominant BALF vs. macrophage-predominant BALF).

Novel insights about microbial-host interface from lung multi-omic studies

Here we will discuss studies that exemplify how multi-omic approaches can be used to delineate the microbial-host interface in both health and disease. We will mainly focus on studies that have used lower airway samples. It is important to mention that most lower airway microbiome investigations have focused on studying samples from patients with different disease processes. Due to the invasive nature of sampling, there is a paucity of investigations that have been conducted on healthy volunteers3,8,33, and this is a shortcoming that may limit our understanding of abnormal microbial host interactions.

Figure 2 illustrates how various multi-omic studies have helped to unravel the mechanisms underlying pathological conditions such as micro-aspiration, lung transplantation, idiopathic pulmonary fibrosis (IPF), lung cancer and chronic airway diseases. Key discoveries that have been made using multi-omics on lower airway samples are described here.

Fig. 2: Approach to multi-omic analyses in lower airways.

Here we exemplify how various multi-omic technologies have helped to unravel the mechanisms underlying pathological conditions such as micro-aspiration, lung transplantation, idiopathic pulmonary fibrosis (IPF), lung cancer and chronic airway diseases. The dysbiotic signals are depicted on the top (luminal side) of the mucosa and the underlying host immune endotypes are depicted below the mucosa. References for the studies combining multi-omic datasets in each of these conditions are also displayed.

Multi-omic analyses to uncover lung microbial functions

While the preconception of sterility of the lower airways has been debunked since the identification of microbial signals through culture-independent methods, there is still much debate about whether these microbial signals are representative of true live bacteria or are remnants of cleared microbes. Using lower airway samples to perform a combination of 16S rRNA gene sequencing, WGS and metatranscriptome sequencing paired with short chain fatty acid (SCFA) metabolomics, investigators have found that a subset of individuals have viable and metabolically active microbes that produce microbial products with significant immunomodulatory properties, such as SCFAs15. Further, evaluation of the metabolome in BALF from HIV patients revealed specific metabolites such as glycerophospholipids, fatty acids and linoleate that were associated with certain lower airway microbiota that may confer a greater risk for pneumonia22. These studies demonstrate, at least as a proof-of-concept, how we can start recognizing the presence of viable and metabolically active microbes in the lower airways.

Multi-omic analyses to uncover biomarkers in lung disease

The lack of reliable biomarkers in chronic lung diseases acutely highlights the need for large data integration. Multi-omic-derived biomarkers can also help risk-stratify patients via the integration of genomic, proteomic, transcriptomic and microbiomic datasets34. Combining multiple omic datasets may generate biomarkers with a high accuracy and positive predictive value. Across several lung diseases, prior studies applying multi-omics have investigated whether diagnostic and prognostic biomarkers can be derived from the lower airways. For example, in patients with lower respiratory tract infections, Langelier et al.35 utilized a combined metagenome, metatranscriptome and host transcriptome approach to identify distinct omic-specific features in lower airway samples associated with respiratory tract infections.

Role of multi-omic studies at the interface of microbial-host immunity in the characterization of chronic lung diseases

Lower airway dysbiosis is involved in several chronic airway diseases such as asthma, COPD and cystic fibrosis. When specific “potentially pathogenic bacteria” colonize the airways, the total burden and composition of the microbiota change and this can trigger the activation of inflammatory pathways such as the Th17-neutrophilic pathway36. The Th17 pathway and lower airway dysbiosis have a dysregulatory effect on the development of asthma, the severity of asthma and the corticosteroid responsiveness of asthma. This association is particularly strong in the case of neutrophilic asthma, which is characteristically more resistant to steroids36. Studies comparing the composition of the airway microbiome between steroid-resistant and steroid-sensitive patients have identified differences at the genus level, with gram-negative lipopolysaccharide-producing bacteria having high endotoxic activity enriched in the BAL of those with steroid-resistant asthma37. Combining 16S rRNA gene sequencing and microarray data, investigators have found that in severe asthma, enrichment with Proteobacteria in airway brushings is correlated with the upregulation of Th17-associated genes, whereas Actinobacteria correlated with the bronchial epithelial gene expression of FK506 binding protein, a marker of steroid responsiveness38. In another study comparing atopic asthma subjects with non-atopic controls, the diagnosis of asthma was associated with differences in the predicted bacterial functions involving amino acid and short-chain fatty acid metabolism, suggesting that alterations in the levels of bacterial metabolites with immunomodulatory effects such as SCFAs can affect the clinical phenotype6. Among patients with cystic fibrosis, there is often an abundance of SCFA-producing anaerobic bacterial species present in the lower airways39. These SCFAs promote the release of various cytokines and chemokines and their concentration positively correlates with sputum neutrophil counts39.

The lung microbiome in COPD is constantly in flux, particularly during periods of exacerbation, when dramatic changes have been noted in both bacterial strains and species40. Excessive and persistent inflammation is a major driver of pathogenesis in COPD and a decline in the richness and diversity of the respiratory microbiome is known to be associated with increased immune cell infiltration41. In one of the earlier studies in this field, Sethi et al.42 demonstrated that COPD subjects colonized with potentially pathogenic bacteria in the distal airways had greater neutrophil counts and increased concentrations of chemokines and matrix metalloproteinases. Using 16S rRNA gene sequencing and metatranscriptome sequencing of BALF samples, three distinct microbial compositions were shown to significantly correlate with lymphocyte proportion, human Th17 immune response and COPD exacerbation frequency43. In another study using advanced stage emphysematous lung tissue samples, a decline in microbial diversity was found to be associated with emphysematous destruction, remodeling and CD4+ T cell infiltration, and specific OTUs were associated with neutrophil, eosinophil and B cell infiltration44. In a secondary analysis of BALF microbiome and metabolomic data from patients with early stage COPD (GOLD 0-2), lower lung function and severity of symptoms were positively associated with enrichment with oral commensals, Streptococcus, Neisseria and Veillonella, and with several metabolites, including glycosphingolipids, glycerophospholipids, polyamines and xanthines45. These observations suggest that the microbiome and its active functional elements play a significant role in immune homeostasis in chronic airway diseases. It is plausible that a lower airway dysbiotic signature stimulates airway cells and triggers Th17 responses, higher chemokine levels and recruitment of neutrophils into the airway lumen. The cross-talk between the microbiome and host immune compartments remains relatively untapped territory, and future discoveries could have huge implications for optimizing existing treatments in asthma and COPD.

Studies have also clearly linked dysbiosis with fibrotic lung diseases such as IPF46. In a seminal study from the UK, Molyneaux et al.2 evaluated BALF and peripheral blood gene expression in IPF patients and matched healthy controls. Over-expression of one gene module was associated with death and disease progression as well as increased BALF bacterial burden. This particular module was also enriched with genes associated with host defense, over-expression of which was associated with worse survival. In the COMET-IPF study, evaluation of BALF from 60 IPF patients showed that relative inhibition of 11 gene signaling pathways was associated with reduced progression-free survival time; 8 pathways involved pathogen infection and 3 involved innate immune response receptors47. Greater relative abundance of a Streptococcal OTU correlated negatively with poor progression-free survival. The composition of the lower airway microbiome in patients with IPF was associated with higher levels of pro-fibrotic cytokines48 and pro-inflammatory cytokines such as IL-17 and TNF-α49. Such cytokines can both damage airway epithelial cells and trigger the expression of the α-smooth muscle actin gene. The high bacterial burden seen in IPF patients with poorer prognosis is especially pronounced among patients with a minor allele at the promoter of the mucin gene50,51. In a study using BALF from 68 patients with IPF and a germ-free murine model of pulmonary fibrosis, the loss of microbial diversity correlated with IPF symptoms and spirometric measurements, high serum surfactant protein-D and lactate dehydrogenase levels and elevated pro-inflammatory cytokine levels. These findings portray the mechanistic impact of lower airway dysbiosis on the pathogenesis of this disease52. Taken together, these data support a possible link between the lung microbiome, an aberrant host response and disease progression in IPF, although elucidating the exact directionality of these associations needs more experimental investigation.

Multi-omic approaches have also identified lower airway microbial signals of possible importance in lung transplantation. In a study that combined microbiome and host transcriptome from lower airway samples following lung transplantation, investigators showed that the host transcriptome in the allograft presented distinct remodeling profiles, characterized by the expression of genes differentially involved in matrix synthesis (anabolic) or matrix degradation (catabolic)53. While catabolic remodeling aligned with a microbiota dominated by Staphylococcus, Pseudomonas and Haemophilus, anabolic remodeling was linked to the relative abundance of Prevotella, Streptococcus and Veillonella. A study that combined amplicon sequencing, bacterial culturing and gene expression assessment to characterize 234 longitudinal BALF samples from 64 lung transplant recipients established links between viral loads, host gene expression, lung function and clinical stability54. What the investigators called a “balanced” pneumotype, characterized by a diverse bacterial community, was associated with an immune tolerant host gene signature. The other three pneumotypes, characterized by depletion of taxa from the “balanced” pneumotype and/or dominated by potential pathogens, were linked to increased immune activity, lower respiratory function and increased risk of infection and rejection. These studies suggest a complex host-microbe interplay affecting inflammatory and remodeling processes in the transplanted lung.

Multi-omic approaches may reveal a microbial-dependent mechanism of the anti-inflammatory effects of macrolides in lung diseases

Multiple studies have suggested that macrolides have direct anti-inflammatory effects in various airway diseases such as cystic fibrosis, asthma and COPD. Additionally, whenever the effect of macrolides on microbial communities has been evaluated, investigators have found that even among patients with mild COPD without overt infection with a respiratory pathogen, there are substantial macrolide-driven effects on the microbial composition. In a placebo-controlled trial evaluating the microbiome and metabolome in the lower airways, macrolides led to increased levels of microbially-derived anti-inflammatory metabolites, probably due to an antibiotic-driven pressure on the lower airway microbes55. This increase in anti-inflammatory metabolites also impacted the inflammatory tone of lower airway macrophages. Chronic azithromycin therapy decreased the frequency of exacerbations in COPD, potentially mediated by a reduction in dysbiosis via selective pressure on the lung microbiota56. In another study, patients with asthma who displayed improved bronchial reactivity after 6 weeks of macrolide treatment had higher baseline bacterial diversity, thus implicating the role of resident microbiota in modulating the outcome of therapeutic interventions57. Therefore, it is possible that multi-omics can uncover the mechanisms via which existing therapies rely upon the lower airway microbiome for providing therapeutic effects.

Integrative microbiomics can help identify exacerbation risk clusters in bronchiectasis

Studies have shown minimal change in dominant taxa during exacerbation of bronchiectasis, thereby questioning the simplistic model in which a single kingdom overgrowth is held culpable58. Using an integrated multi-biome analysis of the bronchiectasis airway profiles from patients, Mac Aogain et al.59 showed that patients at highest risk of exacerbations have a multi-biome dominated by antagonistic interactions between microbial kingdoms, in which microbes compete rather than cooperate with one another. The study used a subject-to-subject similarity matrix with predictors that depended on the number of subjects rather than molecular features59, and therefore had superior cluster precision and accuracy in the highly heterogeneous omic datasets of the bronchiectasis cohorts60. Although lung function and disease severity were similar between the clusters, the frequent exacerbators had lower alpha diversity. This multi-kingdom interactome provided new and previously unrecognized targets for antimicrobial therapy that could be used as an adjunct to or in combination with more established antibiotic-based regimens. The interactome approach may also be used to monitor the outcome of therapy and to understand the effects of host-directed therapy.

Multi-omic approaches have uncovered mechanisms of microbial host interactions in lung cancer

Multiple investigations have looked at whether lung cancer is associated with either differences in lung microbial communities or host immune defects. The use of multi-omics has allowed for the co-evaluation of these two features by facilitating comparisons between patients with malignant and benign neoplasms and between lung cancer patients at different stages. For example, one study demonstrated an association between lung cancer diagnosis and enrichment of the lower airways with oral commensals. Concurrently, there was an upregulation of host transcripts related to inflammatory pathways (or oncogenesis) such as PI3K, Kras and the Th17 cascade61. As proof-of-concept, exposure of Kras-mutated epithelial cell lines to similar dysbiotic signals led to an upregulation of similar transcriptomic processes. Among patients with lung cancer, those with lower airway microbiota enriched with oral commensals were more likely to present with advanced-stage disease and had a poorer prognosis62. Parallel evaluation of the host transcriptome identified that this dysbiotic signature was associated with an upregulation of the Th17 inflammatory pathway and checkpoint inhibition62. Extension of these observations with a mouse preclinical lung cancer model showed that lung dysbiosis played a significant role in tumor progression through the Th17 pathway62. These studies exemplify how the application of multi-omic “agnostic” approaches can be used to dissect important associations that can then be explored mechanistically through ex vivo or in vivo experimental models.

How can big data be integrated to uncover microbial-host interactions in the lung?

The previous section highlighted some inherent challenges with the various omic approaches available to study the lower airway environment. All omic-centered approaches share some technical challenges such as the risk of contamination, topographical differences and having to measure a large number of variables for a relatively small number of samples. Despite these challenges, omic approaches have blossomed over the last several years and have consistently demonstrated that the lung environment is highly variable between individuals. The next big leap in the omics multiverse would require an integrated model that can investigate the organization and behavior of all the different layers of the system. The integration of disparate datasets such as microbiomics, genomics, proteomics, metabolomics, metatranscriptomics and metagenomics, even though computationally challenging, makes this possible. The overarching question is therefore this: how do we unleash the power of combining omic approaches to shed light on the interaction of microbes with the host in the lung? Whenever using multiple omics (over and above the specific challenges outlined above), investigators need to be cognizant of some common pitfalls that will be encountered during the design of such studies. The research field is still in its relative infancy and lacks methodological standardization across studies. Here, we discuss some recurrent concerns and potential solutions for frequently encountered issues when combining different omic approaches. In Fig. 3, we illustrate our conceptualization of how the comparison of data from different omics technologies can be used to understand the lung microbiome. First, we consider three main conceptual reasons for the integration of multiple omic methods, summarized via concepts/approaches that we have called “redundancy”, “relatedness” and “associations”. For example, under “redundancy”, integration of omics helps to assess consistency of data, whilst the lack of consistent signals can help identify platform-specific background contaminants. Under “relatedness”, combining different omic methods can uncover differences between the genomic potential and the transcription of genes, thus revealing functional activation (or lack thereof). As an example, high gene transcription may be recognized as a higher relative abundance of a particular gene in metatranscriptomics than in metagenomics. Under “associations”, the integration of different omics can help find meaningful networks through which microbes and host interact; this can help uncover novel associations and mechanisms. Finally, it is important to remember that the confirmation of causality requires experimental studies rather than correlative investigations. Having said that, evaluation of the strength and significance of correlative data might be the first step to identify the most promising signals. In order to use multi-omics to generate new knowledge, there are several challenges that should be considered and are discussed below.
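The “relatedness” comparison described above, in which a gene's relative abundance in the metatranscriptome is set against its relative abundance in the metagenome, can be sketched as a log ratio per gene. This is an illustrative example only: the gene names, read counts and pseudocount are hypothetical, and a real analysis would also need to account for sequencing depth, sparsity and background contamination.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per gene into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: n / total for gene, n in counts.items()}


def log2_rna_dna_ratio(rna_counts, dna_counts, pseudocount=1e-6):
    """log2 of metatranscriptome vs. metagenome relative abundance per gene.

    Positive values suggest a gene is transcribed more than its genomic
    abundance alone would predict; negative values suggest the opposite.
    The pseudocount (an assumption) avoids division by zero for genes
    missing from one dataset.
    """
    rna = relative_abundance(rna_counts)
    dna = relative_abundance(dna_counts)
    genes = set(rna) | set(dna)
    return {
        g: math.log2((rna.get(g, 0.0) + pseudocount)
                     / (dna.get(g, 0.0) + pseudocount))
        for g in genes
    }


# Hypothetical read counts for three genes in paired datasets
dna_counts = {"gene_X": 100, "gene_Y": 100, "gene_Z": 800}
rna_counts = {"gene_X": 400, "gene_Y": 100, "gene_Z": 500}

ratios = log2_rna_dna_ratio(rna_counts, dna_counts)
for gene in sorted(ratios):
    print(f"{gene}: log2(RNA/DNA) = {ratios[gene]:+.2f}")
```

In this toy example gene_X is over-represented in the transcript data relative to its genomic abundance, gene_Y is unchanged and gene_Z is under-represented; because both inputs are relative abundances, such ratios remain compositional and should be interpreted with that in mind.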

  1. (a)

    Should different omic approaches be used in topographically distinct samples?

    Topographical variation in microbes and host signals is well documented. Differences in sample processing across various omic approaches require specific considerations, such as how to process and aliquot fresh samples, how to maximize the yield of human cells and how to account for the potential impact of various preservation media on microbial signatures. Data derived from culture-independent methods have established that micro-aspiration leads to episodic seeding of oral commensals in the lower airways, but topographical differences exist even within each individual. When investigating the lower airways of healthy individuals with BALs, enrichment of the lower airway microbiota with oral commensals such as Prevotella, Streptococcus, Fusobacterium, Rothia and Veillonella is associated with subclinical inflammation and a lower airway inflammatory tone with a Th17 endotype33. However, topographical microbial differences could affect such assessments and warrant careful planning. Also, it is unclear if the inflammatory signal is due to viable and metabolically active bacteria, non-viable bacteria or byproducts of bacterial metabolism63. Differences in topographical environmental conditions, such as oxygen tension, pH and mucus, may affect microbial viability and metabolic activity and are therefore another important variable to consider.

  2. (b)

    Is the data generated within each omic method quantitative, semiquantitative or compositional?

    Quite commonly, the nature of the data across different omic datasets varies. For example, the sequencing data for microbial composition (16S rRNA gene, WGS or metatranscriptome sequencing) is semiquantitative and compositional (no absolute concentrations but rather relative abundance), as well as sparse (frequent null values for many of the taxa assessed). In contrast, metabolomic data is not compositional and can be quantitative when internally labeled standards are added. Integrating these different kinds of datasets into a single analytical pipeline may lead to false assumptions about the behavior of the data and needs to be carefully considered.

  3. (c)

    Are the statistical methods used for analyses similar across different omics?

    Some of the multi-omic methods rely on integrating only statistically significant data from the different omic datasets. However, the criteria for statistical significance may not be uniform across the various omic methods. In other words, because each omic output can be considered unique in terms of sparsity, data distribution, compositionality, depth, number of features and the availability of updated databases, these omic datasets are often not interrogated using the exact same approaches. Thus, even though only statistically significant data are pooled from different omics, the merged dataset depends on how each dataset was processed prior to analysis and data integration. Wherever possible, multi-omics research warrants special consideration of how to deploy consistent analytical approaches across different omic datasets prior to integration of the data.

  4. (d)

    How do we choose the most appropriate method required for an integrative analysis that combines two or more omic datasets?

Fig. 3: Inter-relatedness of various -omic approaches.

In this figure, we illustrate three different concepts/approaches to gain knowledge when combining multiple omic data: “redundancy”, “relatedness” and “associations”. a Integration of omics helps to address “redundancy” by identifying consistency of data. In particular, replication of findings increases the likelihood of accuracy and decreases that of noise. b The “relatedness” of the different omic methods has the potential to uncover differences related to function. As an example, high rates of gene transcription may be recognized as higher relative abundance in metatranscriptomics than in metagenomics for that specific gene, indicating possible functional activation. c Finally, the integration of different omics can help discover the molecular mechanisms via which microbes affect the host immune response and vice versa.

The array of multi-omic analytical methods is growing at a pace that is difficult to keep track of. Overall, these methods are all based on correlative analyses between different components of each omic dataset, and although they are all individually valuable, they can be even more useful if integrated appropriately and applied to the dynamic/temporal aspects of disease pathogenesis. Different analytical approaches underlie these multi-omic methods, such as dimensionality reduction methods, correlation-based approaches, regression-based approaches and network-based approaches. In-depth knowledge of each of these analytical approaches is needed in order to apply them correctly. Practical examples of the use of such multi-omic methods in lung studies include SparCC33, Mummichog22, ComPLS61, mixOmics45,64, similarity network fusion59,60,65, and MMVEC62. Importantly, these methods differ in how each individual omic dataset is treated and in which statistical manipulation and visualization strategies are used. Whenever integrating multi-omic datasets, it is important not just to obtain expert informatic support but also to consider the real difficulties in establishing a statistically robust analytical approach. One such challenge comes from recognizing that some of what is seen as a statistically significant association may be driven by “outliers”66. Another major challenge is the variation in scale, complexity and correlation structure between datasets that are inherently different65. It is therefore paramount that investigators do not simply focus on statistical strength but also consider applying different methods for the integration of disparate datasets and testing the robustness of signals across them. Bayesian data interpretation based on prior knowledge and post-hoc experimental investigations seeking to corroborate and extend associations that have been previously identified are key to selecting “the right” analytical approach.
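Two of the concerns raised above, the compositional nature of sequencing data and the influence of outliers on correlative signals, can be made concrete with a small sketch. It is not an implementation of SparCC, MMVEC or any other tool cited here; it simply applies a centered log-ratio (CLR) transform to hypothetical taxon counts and then compares Pearson, Spearman and leave-one-out correlations against a hypothetical metabolite measurement. The pseudocount of 0.5 and all data values are assumptions made for illustration.

```python
from math import log, sqrt


def clr(counts, pseudo=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount (an assumption) replaces zeros, which are frequent in
    sparse sequencing data.
    """
    logs = [log(c + pseudo) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def leave_one_out(x, y, corr=pearson):
    """Recompute the correlation with each sample removed in turn."""
    return [corr(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Hypothetical data: six samples, three taxa each, plus one metabolite level.
taxon_counts = [
    [10, 80, 10], [12, 78, 10], [9, 81, 10],
    [11, 79, 10], [10, 80, 10], [60, 30, 10],  # the last sample is atypical
]
metabolite = [1.0, 0.8, 1.2, 0.9, 1.1, 9.0]

taxon0 = [clr(sample)[0] for sample in taxon_counts]  # CLR value of the first taxon
r_pearson = pearson(taxon0, metabolite)
r_spearman = spearman(taxon0, metabolite)
r_loo = leave_one_out(taxon0, metabolite)
print(round(r_pearson, 2), round(r_spearman, 2), [round(r, 2) for r in r_loo])
```

With these illustrative values, the Pearson correlation across all six samples is strongly positive, whereas the rank-based correlation is close to zero and the Pearson correlation becomes negative once the sixth sample is left out. A signal that changes this much across methods or under removal of a single sample is the kind of outlier-driven association that warrants caution.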

Future directions

Current multi-omic approaches that are deployed to study cross-sectional data are based on statistical associations and are unable to consider the plausibility of those associations. In other words, assessing the statistical significance and strength of the associations between ‘markers/endophenotypes’ within two different omic datasets often entails the simultaneous consideration of multiple associations. Given the high dimensionality of such datasets, very few associations pass a stringent statistical threshold.

However, we do have prior knowledge about potential associations between different components of these omic datasets from epidemiological or immunological studies. For example, we know about the microbes and microbial genes that contribute to the production of pathogen-associated molecular patterns (PAMPs). We also know about the microbial genes associated with the production/degradation of metabolites that have immunoregulatory properties and the signals that they may elicit or perpetuate in the host immune response. Thus, there is a need for a “Bayesian” integrative analytical approach for multi-omic datasets. Multi-omic approaches may uncover statistically significant associations, but directionality and causality need to be explored with other experimental approaches.
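One simple way to see how prior knowledge could temper multiple-testing stringency is sketched below. It is not a Bayesian model and is not drawn from the studies cited here: it is a frequentist device (the Benjamini-Hochberg false discovery rate procedure with p-value weights) in which associations judged more plausible a priori receive larger weights. The function name, the p-values and the weights are hypothetical, and the weights are assumed to be chosen from prior knowledge before looking at the data.

```python
def weighted_bh(pvalues, weights=None, alpha=0.05):
    """Benjamini-Hochberg procedure with optional prior weights.

    weights: non-negative numbers expressing prior plausibility. They are
    rescaled to average 1 and each p-value is divided by its weight.
    With weights=None this is the standard (unweighted) procedure.
    Returns a list of booleans, True where an association is retained.
    """
    m = len(pvalues)
    if weights is None:
        weights = [1.0] * m
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = [w / mean_w for w in weights]
    # A zero weight means the association can never be retained.
    adjusted = [
        p / w if w > 0 else float("inf") for p, w in zip(pvalues, scaled)
    ]
    order = sorted(range(m), key=lambda i: adjusted[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank meeting its threshold.
        if adjusted[i] <= alpha * rank / m:
            cutoff = rank
    keep = [False] * m
    for i in order[:cutoff]:
        keep[i] = True
    return keep


# Hypothetical p-values for eight microbe-metabolite associations.
pvals = [0.001, 0.012, 0.028, 0.04, 0.2, 0.5, 0.7, 0.9]

# Assumption: prior biological knowledge supports the third association.
prior = [1, 1, 3, 1, 1, 1, 1, 1]

unweighted = weighted_bh(pvals)
weighted = weighted_bh(pvals, prior)
print(unweighted)
print(weighted)
```

In this toy example, the third association does not pass the unweighted procedure but is retained once its prior weight is taken into account, while the two strongest associations are retained either way. The trade-off is that up-weighting some hypotheses necessarily down-weights the rest, so poorly chosen weights can cost power.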

As the field develops, we will be better able to characterize the endophenotypes of various chronic lung conditions, and this will inevitably lead to new avenues for their diagnosis, management and prevention. First, however, multi-omic studies need to move from cross-sectional descriptive profiling to longitudinal studies during disease progression and treatment. Second, we need a better understanding of the effects of different treatments on the microbiome. While probiotics, prebiotics and mixed bacterial transfers (such as fecal material transfer) have enormous promise as new therapies that can modify the gut microbiome, these have not been studied in any significant depth in the lung. In particular, the combination of metabolomics with other omics such as transcriptomics can specifically help us identify the interaction between microbial metabolites and the host immune system and therefore offer the opportunity for a more personalized approach to diagnosis and treatment. Finally, multi-omics can help generate hypotheses for more focused mechanistic studies to help understand disease pathophysiology at a molecular level.


Advances in techniques and bioinformatics have enabled high-throughput and in-depth analyses of transcripts, proteins and metabolites and have enormously expanded our understanding of the human microbiome and the role it plays in our well-being. Chronic lung disease is a consequence of multiple layers of regulation and dysregulation, many of which are driven or affected by microbes present in the lungs. Large omic studies are able to expeditiously identify the stages that are important in any given disease process. The advent of multi-omics has made possible the generation of datasets that can integrate a broad range of molecular, cellular, pathophysiological and clinical states and thereby shed light on the hitherto unknown relationships between the lung microbiome, host endotype and phenotype. As our approaches evolve, it is likely that multi-omics will enter a new phase of clinical application where targeting the lung microbiome will be possible and personalized.