Introduction

The rapidly emerging field of extracellular vesicles (EVs) has led to paradigm shifts in many different areas of biology and biomedicine. The release of EVs, originally thought to serve only to remove harmful substances from cells, has been shown to have many more functional consequences and a wide range of implications for biomedicine. To understand the structure and function of EVs, the initial targeted biochemical approaches rapidly progressed to bias-free large-scale analyses using systems biology and bioinformatics. In 2009, the first manually curated database of EV proteins, RNA and lipids, ExoCarta1 (http://www.exocarta.org/), was launched. It was followed by two additional databases, Vesiclepedia2,3 (http://www.microvesicles.org/) and EVpedia4,5 (http://student4.postech.ac.kr/evpedia2_xe/xe/). These are repositories of RNA, protein, lipid, and metabolite datasets. Because preanalytical parameters may strongly affect the quality of EV preparations, database entries should be interpreted with caution, with special attention paid to preanalytical conditions. Recently, gene ontology has been extended to the context of EV communication, owing to increased recognition of the importance of the EV field6. Furthermore, bioinformatic tools that can be used to analyze EV datasets have become available7,8. Future directions may include the following: (i) systems biology analyses after more standardized EV preanalytics, (ii) multiomics analyses of EV samples (combinations of different -omic groups used for the analysis), and (iii) the determination of disease-specific EV molecular patterns/networks composed of different molecule types. Additionally, systems biology approaches may be extended to novel fields such as image-based systems biology.

Advancements in the analysis of complex biological systems such as EVs will help to reveal the biological significance of these recently discovered structures and exploit their diagnostic and/or therapeutic potential.

EV proteomics

To date, the best-characterized EV cargo consists of EV-associated proteins. Proteomic analysis of EVs is performed on mass spectrometry (MS)-based platforms. Proteomic analyses of EVs have been reviewed extensively elsewhere9,10 and are not the focus of the present article. Of note, thousands of proteins have been identified in various EV subtypes, and disease-specific proteome alterations have also been identified11,12,13,14. The potential for EV proteins to be used as monitoring tools for disease progression has also been successfully studied15. In addition, unconventional membrane protein orientation has been described in EVs16. The topology of EV-associated proteins remains an important topic because it influences target cell recognition by different EV subtypes and the signal transduction pathways induced by EVs.

EV transcriptomics

Numerous studies have confirmed the feasibility of applying high-throughput transcriptomic methods (such as microarrays and next-generation sequencing; see Table 1) to EVs17,18,19, and these approaches have been used successfully to characterize the RNA cargo of EVs from the circulation20,21, urine20,22, cerebrospinal fluid23, and saliva24,25 of healthy individuals. In 2008, the first study exploring the physiological miRNA pattern of circulating EVs was published26. In the following years, the heterogeneity of circulating EV transcriptional landscapes was analyzed, revealing the presence of many different RNA types, including tRNA, miRNA, Y-RNA, mRNA, SRP-RNA, rRNA, lncRNA, piRNA, snRNA, snoRNA, and scaRNA17,20,21. In vitro studies further suggested that various types of RNA molecules identified in EVs were specifically shuttled into EV subsets27. A reference dataset for miRNA profiling in whole blood, peripheral blood cells, serum, and EVs was also published28. EV transcriptomics is particularly useful in the study of complex diseases because it assists in the identification of novel biomarkers (Table 1). The biomarker potential of EVs has been highlighted by high-throughput studies; however, the analysis of a single subtype of EVs29 instead of bulk EV ‘omics’ analyses may yield more targeted results and suggest novel therapeutic strategies.

Table 1 Transcriptomics of pathological condition-derived EVs

EV metabolomics

Metabolomics involves the simultaneous detection and analysis of a large number of small molecules (<2000 Da) from biological samples30. The relatively low sensitivity of NMR for detecting metabolites in EV samples (which are usually available in low amounts) does not allow detailed analysis of the EV metabolome. However, with advances in the available methodological platforms (e.g., Ultra-Performance Liquid Chromatography-Mass Spectrometry, UPLC-MS), numerous studies have performed detailed analyses of the EV metabolome31,32,33,34. Interestingly, EVs have been shown to function as independent metabolic units35 and to modify the metabolome of their body fluid environment36,37 or to induce metabolic changes in recipient cells38.

EV lipidomics

EV lipidomics (see Table 2) is a relatively new field, mainly because the amount of an EV sample is usually very limited, and novel techniques with increased sensitivity have only recently become available to EV researchers. In the twentieth century and in the first decade of the twenty-first century, thin layer chromatography (TLC) was widely used, and it was essentially the only technique available that enabled the study of the lipid composition of EV membranes. TLC is an easy and straightforward method and does not require expensive equipment. However, the data collected in TLC experiments are very limited. Only a few lipid forms (main classes) can be separated with the help of external lipid standards. Since 2004, the application of different liquid chromatography technologies has been reported. The sensitivity and reproducibility of these experiments were significantly improved compared to those of the TLC methods, but the number of detectable lipid species was still very limited. EV lipidomics in the full sense began in the early 2010s with the introduction of MS-based methods. The different MS-based techniques made it possible to determine the different acyl chains of membrane lipids (not just the major lipid types based on the head groups). The number of complex lipidomic studies started to increase significantly in 2016, and exponential growth of the field is expected in the next few years.

Table 2 Lipidomic analyses of EVs

EV glycomics

Glycomics in general lags behind other omics fields, such as genomics or proteomics (see Table 3). This delay is possibly due to the complexity of carbohydrate structures and the lack of sensitive and simple high-throughput methods for glycan analysis. For glycosylation analyses of EVs, lectin-based microarrays and high-resolution MS analyses have been used, and these approaches provide evidence of EV-specific glycosylation patterns.

Table 3 Glycomic technologies used for EV analysis

EV genomics

Some of the EVs carry DNA that may range in size from 100 base pairs to several kilobase pairs39 or even fragments up to 2 million base pairs long40. EV-associated DNA may be single-stranded DNA, mitochondrial DNA, or double-stranded DNA39,41,42. The DNA content associated with EVs (termed EV-DNA) may be transported within the lumen of EVs39,40; however, recent studies have shown that, depending on the biological context, EV-DNA can also be found attached to the outer surface of EVs43,44,45,46.

Several studies have shown that EV-DNA spans sequences across all chromosomes of genomic DNA (gDNA)39,40,47. Sequences of mitochondrial DNA (mtDNA) may or may not be present depending on the context and/or cell line39. Other studies have shown that selective sorting of specific DNA sequences may occur. For example, a study investigating different prostate cancer cell-derived EV subpopulations showed that different EVs carried different gDNA contents48. Another study that investigated the EVs of healthy individuals provided evidence of an uneven representation of the human genome and even detected EV-DNA of bacterial origin46. Nevertheless, very little is currently known about the mechanisms of DNA packaging or selective sorting of DNA into EVs.

At present, the functional significance of EV-DNA is largely unknown. A recent study has shown that surface-bound EV-DNA plays a significant role in the binding of EVs to fibronectin45, an extracellular matrix glycoprotein that is of vital importance in processes associated with tumor progression49. Generally, surface-bound molecules are responsible for the binding of EVs to target cells or to the extracellular matrix50. Therefore, it is likely that exofacial EV-DNA has some physiological significance for the recipient cells. Additionally, it has been shown that oncogenes can be transferred from donor to recipient cells; however, contradictory results have been reported regarding whether cancer cell-derived EV-DNA is functional in the recipient cells. In one study, the EV-mediated spread of oncogenes was shown to promote disease progression in mice51. Another study showed that EVs containing oncogenic H-ras failed to produce a permanent tumorigenic conversion of primary and immortalized fibroblasts52.

Several studies have shown that EV-DNA reflects the parental cell gDNA both qualitatively39,47,53,54,55,56,57,58 and quantitatively40,42. Therefore, the analysis of circulating EV-DNA may have substantial diagnostic potential. Moreover, the analysis of genomic mutations may prove to be superior to the analysis of the RNA transported by EVs, as DNA is intrinsically more stable than RNA.

Systems biology approaches show relationships between genes involved in EV biogenesis and diseases

Finally, it is possible to gain information about the role of EVs through a systems biology analysis of public transcriptomic and genomic data, as well as different types of biomedical data. Our goal was to determine the relationships between key genes involved in EV biogenesis and diseases using systems biology approaches. We investigated whether selected proteins reported to play a role in the biogenesis or secretion of EVs were associated with phenotypes and whether the corresponding genes were enriched in publicly available transcriptomic datasets.

Based on the literature, without a claim of completeness, we have compiled lists of proteins that have been reported to play roles in the biogenesis (see Tables S1 and S3) and secretion of EVs (see Table S2). We defined five partly overlapping gene sets from among these lists, namely, genes involved in EV biogenesis and secretion, EV biogenesis, exosome biogenesis, microvesicle biogenesis, and exosome secretion. We used these sets as inputs for the different analyses. Of note, the term “exosomes” refers here to small (50–150 nm in diameter) EVs that originate from the multivesicular body, whereas the designation “microvesicles” is used for EVs shed from the plasma membrane that are usually of medium size (100–1000 nm in diameter).
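As a minimal sketch of how five partly overlapping gene sets can be represented and checked for overlap, the following Python fragment uses hypothetical placeholder identifiers (GENE_A to GENE_E); the actual gene lists are those in Tables S1 to S3 and are not reproduced here. The assumption that the two broader sets are unions of the narrower ones is ours and is made only for illustration.

```python
from itertools import combinations

# Hypothetical placeholder identifiers; the real lists are in Tables S1-S3.
exosome_biogenesis = {"GENE_A", "GENE_B", "GENE_C"}
microvesicle_biogenesis = {"GENE_C", "GENE_D"}
exosome_secretion = {"GENE_B", "GENE_E"}

# Assumption (illustration only): the broader sets are unions of the narrower ones.
ev_biogenesis = exosome_biogenesis | microvesicle_biogenesis
ev_biogenesis_and_secretion = ev_biogenesis | exosome_secretion

gene_sets = {
    "EV biogenesis and secretion": ev_biogenesis_and_secretion,
    "EV biogenesis": ev_biogenesis,
    "exosome biogenesis": exosome_biogenesis,
    "microvesicle biogenesis": microvesicle_biogenesis,
    "exosome secretion": exosome_secretion,
}


def pairwise_overlaps(sets):
    """Return the shared genes for every pair of sets that overlaps."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(sets), 2):
        shared = sets[name_a] & sets[name_b]
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


for pair, shared in pairwise_overlaps(gene_sets).items():
    print(pair, sorted(shared))
```

Each set can then be passed separately to a downstream analysis, which is how the five sets are used as inputs in the text.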

The Quantitative Semantic Fusion (QSF) System59 is an extensible framework that incorporates distinct annotated semantic types (also called entities) and links between them by integrating different data sources from the Linked Open Data world. The QSF System then enables the users to quantitatively prioritize a freely chosen entity based on evidence propagated from any other entity or possibly multiple entities through the connecting links (see Figure S1). Currently, the system contains genes, taxa, diseases, phenotypes, disease categories (UMLS semantic types and MeSH disease classes), pathways, substances, assays, cell lines, and the targets of the compounds. Links define associations between entities. For example, genes and pathways are connected with a link that represents gene-pathway associations. To enable cross-species information fusion, we also added gene orthologue links.

The most important gene-disease associations identified in this research are from the DisGeNet60 database. This database integrates many other sources of information (e.g., OMIM, GWAS Catalog, OrphaNet, Mouse Genome Database, and Rat Genome Database).

We constructed three different computation graphs that were used to detect known and predicted disease and phenotype associations (see Fig. 1). All three models can be used to answer the question of whether the genes involved in the biogenesis or secretion of EVs are functionally altered (for example, due to significant polymorphisms, mutations, or changes in the gene expression or the amount of protein produced), and, if so, which diseases are associated with these changes. This can also elucidate the pathomechanisms underlying the association between diseases or phenotypes and EVs.

Fig. 1: Three different models used for prioritizing the associations of key EV genes with diseases.

Top: A model that prioritizes diseases and phenotypes based on gene-disease associations known in the literature. Middle: A model that predicts the associated diseases and phenotypes using molecular pathway associations. Bottom: A model that predicts the associated diseases and phenotypes using orthologue molecular pathway associations in other species.

The first model relies on gene-disease associations known in the literature (drawn from the data sources in the DisGeNet database) and prioritizes the diseases and related phenotypes that can be linked to important genes relevant to EVs. In the second model, molecular pathway associations were used to expand the range of genes to include disease-associated genes that are in the same molecular pathways as the genes that are important for EVs. In the third model, we used molecular pathway information from other species to predict diseases associated with human genes whose orthologues in those species are in the same molecular pathways as the orthologues of the human genes important for EVs.
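The logic of the three models can be illustrated with a toy link traversal. The Python sketch below is not the QSF System and does not reproduce its quantitative weighting; it simply counts how many genes reached by each route are linked to a disease. All gene, pathway, orthologue, and disease identifiers are hypothetical placeholders, and the function names are ours.

```python
from collections import Counter

# Hypothetical toy links; none of these entries come from DisGeNet
# or from any real pathway or orthology database.
GENE_DISEASE = {"EV1": {"disease_x"}, "H2": {"disease_y"}, "H3": {"disease_z"}}
GENE_PATHWAY = {"EV1": {"pw1"}, "H2": {"pw1"}}           # human gene -> pathways
ORTHOLOGUE = {"EV1": "m_ev1", "H3": "m_h3"}              # human gene -> orthologue
ORTHO_PATHWAY = {"m_ev1": {"m_pw"}, "m_h3": {"m_pw"}}    # orthologue -> pathways


def diseases_of(genes):
    """Count, per disease, how many of the given genes are linked to it."""
    counts = Counter()
    for gene in genes:
        counts.update(GENE_DISEASE.get(gene, ()))
    return counts


def expand_by_pathway(genes, membership):
    """Return every gene that shares a pathway with a seed gene (seeds with a pathway included)."""
    seed_pathways = set().union(*(membership.get(g, set()) for g in genes))
    return {g for g, pathways in membership.items() if pathways & seed_pathways}


def model_literature(seeds):
    """Model 1: direct gene-disease links."""
    return diseases_of(seeds)


def model_pathway(seeds):
    """Model 2: seeds plus human genes in the same pathways."""
    return diseases_of(set(seeds) | expand_by_pathway(seeds, GENE_PATHWAY))


def model_cross_species(seeds):
    """Model 3: human genes whose orthologues share a pathway with seed orthologues."""
    seed_orthologues = {ORTHOLOGUE[g] for g in seeds if g in ORTHOLOGUE}
    reached = expand_by_pathway(seed_orthologues, ORTHO_PATHWAY)
    human_genes = {h for h, o in ORTHOLOGUE.items() if o in reached}
    return diseases_of(human_genes)


seeds = {"EV1"}
print(dict(model_literature(seeds)))
print(dict(model_pathway(seeds)))
print(dict(model_cross_species(seeds)))
```

In this toy graph, the pathway route reaches a disease that has no direct literature link to the seed gene, and the cross-species route reaches a further disease through an orthologue pathway, which is the sense in which the second and third models predict rather than retrieve associations.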

We used the QSF System to quantitatively prioritize diseases and phenotypes that are associated with the five sets of genes known to be involved in the biogenesis and/or secretion of different types of EVs. First, we used a model that exploited the gene-disease associations already known in the literature. The top 20 diseases that are associated with genes that are involved either in the biogenesis or the secretion of EVs are shown in Table 4. The top 20 phenotypes are shown in Table S4. EV biogenesis genes are significantly associated with several diseases, including several tumors, such as mammary neoplasms (microvesicle biogenesis: p = 0.03; exosome secretion: p < 0.01) and melanoma (microvesicle biogenesis: p = 0.02); pathologic functions, such as neoplasm invasiveness (EV biogenesis and secretion: p < 0.01) and neoplasm metastasis (EV biogenesis: p = 0.03); and cardiovascular diseases, such as myocardial reperfusion injury (microvesicle biogenesis: p < 0.01). The most relevant phenotypes include frontotemporal dementia (exosome biogenesis: p < 0.01), lack of insight (exosome biogenesis: p < 0.01), and autoimmune neutropenia (exosome secretion: p = 0.01).

Table 4 Diseases associated with different sets of key EV genes based on gene-disease associations known in the literature

Pathway-mediated analysis (i.e., determining which diseases are associated with genes that participate in the same pathway as EV biogenesis genes) indicated possible associations of EVs with many common diseases (see Tables S5 and S6), such as diabetes (microvesicle biogenesis: p < 0.01), Alzheimer’s disease (EV biogenesis and secretion: p < 0.01), and obesity (microvesicle biogenesis: p = 0.02). Cross-species pathway-mediated analysis indicated the possible association of EVs with several tumors (see Tables S7 and S8), such as mouth neoplasms (exosome secretion: p < 0.01) and tongue neoplasms (exosome secretion: p < 0.01), as well as several other diseases and conditions.
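The text does not spell out how the p-values for these associations are derived within the QSF System. For orientation only, a conventional way to ask whether a gene set shares more genes with a disease than expected by chance is a one-sided hypergeometric (over-representation) test. The Python sketch below shows that generic test; it is an assumption on our part and should not be read as the method used to obtain the reported values. The function name, parameter names, and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_sf(overlap, set_size, annotated, universe):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe  -- total number of genes considered
    annotated -- genes in the universe linked to the disease
    set_size  -- number of genes in the query gene set
    overlap   -- query genes that are linked to the disease
    """
    upper = min(set_size, annotated)
    total = comb(universe, set_size)
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 3 of 20 query genes fall among 50 disease-linked
# genes in a universe of 20000 genes.
print(hypergeom_sf(3, 20, 50, 20000))
```

When many diseases are tested against the same gene set, such p-values would normally also be corrected for multiple testing.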

Next, we downloaded and reanalyzed five large publicly available microarray data sets from the Gene Expression Omnibus (GEO) that represent various diseases (accessions: GSE13576, GSE6919, GSE4115, GSE54514, and GSE43696). Then, we computed the enrichment of the five key EV gene sets and all KEGG pathways in the various contrasts of the differential expression analyses. The statistical analyses were performed in the R statistical language61. We used the limma62 and EGSEA63 packages for the microarray and enrichment analyses, respectively.

The Ensemble of Gene Set Enrichment Analysis (EGSEA) utilizes and combines the analysis results of many prominent gene set enrichment algorithms to calculate the collective significance score for a given gene set in the generally long lists of genes that arise from a differential expression analysis.
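To illustrate the idea of a collective significance score, the sketch below combines the p-values that several enrichment methods assign to one gene set into a single value. EGSEA itself is an R/Bioconductor package and is not called here; Fisher's method is used only as a familiar combining rule, and the text does not state which combining option was used in this study. The function names and the three input p-values are hypothetical.

```python
from math import exp, log


def chi2_sf_even_df(x, k):
    """Survival function of a chi-square variable with 2*k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total


def fisher_combine(p_values):
    """Combine p-values with Fisher's method: -2 * sum(ln p) ~ chi-square(2k)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    statistic = -2.0 * sum(log(p) for p in p_values)
    return chi2_sf_even_df(statistic, len(p_values))


# Hypothetical p-values for one gene set from three enrichment methods.
print(fisher_combine([0.04, 0.10, 0.02]))
```

Fisher's method assumes independent tests, which does not strictly hold when several enrichment algorithms are applied to the same expression data, so a combined value of this kind is best treated as a ranking score rather than an exact p-value.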

We reanalyzed five publicly available gene expression experiments using contrasts defined by the authors of these experiments, and then we computed the enrichment of the five key EV gene lists using the EGSEA method based on these contrasts (i.e., gene expression signatures relevant for a specific biological process).

The key EV gene sets were statistically significantly enriched in many of the analyzed contrasts (see Table 5).

Table 5 Enrichment of different sets of key EV genes in various gene expression experiments

Meyer et al. investigated the engraftment properties and impact on outcomes of 50 pediatric acute lymphoblastic leukemia samples transplanted into NOD/SCID mice64. They found that the time to the development of leukemia (i.e., weeks from transplant to overt leukemia) was strongly associated with the risk of early relapse. We found that the differentially expressed genes between the no relapse and the early relapse groups were significantly enriched for key EV genes as well.

Yu et al. performed a comprehensive gene expression analysis on 152 human samples, including prostate cancer tissues, prostate tissues adjacent to tumor, and organ donor prostate tissues, obtained from men of various ages65. The differentially expressed genes between the nonmetastatic tumor samples and the metastatic tumor samples were significantly enriched for all key EV gene sets.

Spira et al. compared gene expression data from smokers with lung cancer with samples from smokers without lung cancer66. This allowed them to generate a diagnostic gene expression profile that could distinguish between the two classes. We found that all EV gene sets were significantly enriched in the gene expression profile comparing smokers with and without lung cancer.

Parnell et al. performed gene expression profiling of whole blood to monitor immune dysfunction in critically ill septic patients67. We found that all gene expression signatures comparing healthy controls with sepsis survivors, healthy controls with nonsurvivors, and nonsurvivors with survivors were significantly enriched for EV genes.

Voraphani et al. compared the gene expression profiles of airway epithelial and bronchoalveolar lavage cells from healthy controls, mild-moderate asthmatic patients, and severe refractory asthmatic patients68. We found no enrichment of the key EV gene sets in the gene expression signatures from this experiment.

Genes that have been reported to participate in the biogenesis or secretion of EVs are significantly associated with numerous common diseases, including different types of tumors and cardiovascular diseases, which further emphasizes the key role of EVs in human health and disease.