The human gut microbiome is a complex and dynamic ecosystem that performs several essential functions for human metabolism, nutrition, physiology, and immunity. Due to its high density and diversity of microorganisms, this environment is prone to constant genetic exchange between resident and transient bacteria1,2,3,4. Genetic transfers by plasmids in the gut have the potential to impact the ecology and evolution of the gut microbes. The acquisition of a plasmid and its cargo genes by gut bacteria or archaea may assist the microorganism's survival, fitness, and adaptation to the environment4,5,6,7. Some traits provided by plasmids in the gut have been associated with antibiotic (AR) and metal resistance (MR)7,8,9,10, salt tolerance11,12, and interbacterial competition13. Therefore, the set of plasmids in the human gut has the potential to modulate the gut microbiome community and introduce new traits, such as antibiotic resistance and virulence, to both resident and transient gut microbes4,5,6,7,8,9,10,11,12,13,14.

Many studies have assessed the microbial profile present in human gut microbiomes15,16,17,18,19. Although these studies have improved our knowledge on the subject, the set of plasmids in the gut microbiome is still poorly investigated despite its potential to modulate the microbiome physiology and, consequently, human health. Many human life aspects modulate the gut microbiome and its plasmids, including diet, lifestyle, therapy, consumption of pre- and probiotics, and diseases7,15,16,17,18,19,20,21. These aspects have been dramatically transformed during human history, which is evidenced by the current plethora of diets and lifestyles around the globe. The gut microbiome composition and functionality are products of a long coevolutionary host-microbe relationship, and the human gut plasmids also suggestively reflect this coevolution. As an example of the diet impacting mobile elements from the human gut, the seaweed in the Japanese daily diet was proposed to have transferred algal polysaccharide degradation genes to the Japanese microbiome20. The different diets of Fijians and Americans have been linked to the different glycoside hydrolase families encoded by mobile genes in the gut microbiome of these groups6. Moreover, the use of antimicrobials has been associated with the increase in the prevalence of mobile elements carrying functional resistance genes associated with synthetic antibiotics7,8,9,10,22,23,24,25,26.

Most of the microbiome studies have been focusing on urban-industrialized groups, and few have considered semi-isolated human groups15,16,17,18,19,27. However, the study of gut plasmids from semi-isolated groups has the potential to unravel genetic elements related to their unique lifestyle and environment. These groups retain valuable information about the microbiome before urbanization and industrialization impacted human diet and lifestyle, and also clues that can expand the field of prebiotics and probiotics for modern disorders prevention and treatment, as well as biomarkers that can form the basis for health and prognostic disease tests. So here, based on the gut microbiome survey from semi-isolated human groups, we investigated their plasmid content, including the set of accessory genes associated with these mobile elements that can modulate human health and physiology. We analyzed the set of plasmids and its accessory genes from the gut microbiome of four semi-isolated human groups: the Yanomami, the largest indigenous group from the Brazilian Amazon Forest19, the Matses, a remote hunter-gatherer group from the Peruvian Amazon Forest, the Tunapuco, a traditional agricultural community from the Peruvian Andean highlands16, and the Hadza, an indigenous ethnic group from northwestern Tanzania17. The geographical locations of these groups are presented in Fig. 1 and a summary of their diets and lifestyles is presented in Supplementary Note S1.

Figure 1
figure 1

Geographic locations of the four semi-isolated groups. Map generated in R, version 4.0.3. ( with the packages rnaturalearth, brazilmaps, sf and ggplot2.


We identified a total of 290 putative plasmids in the gut microbiome of the Yanomami, Matses, Tunapuco, and Hadza. These putative plasmids were identified based on the nucleotide similarity with sequences of replication initiation protein (rep), mobilization protein (relaxase), mate-pair formation (MPF), and origin of transfer (oriT). The Yanomami was the group with significantly more plasmids identified (n = 184, FDR-corrected Kruskal–Wallis multiple comparisons: P < 0.05), followed by the Hadza (n = 52), the Matses (n = 36), and the Tunapuco (n = 18). The number of plasmids per metagenome varied from zero to 17 in the Yanomami, Matses, and Hadza, but in the Tunapuco it varied from zero to seven plasmids. However, one Yanomami metagenome stands out for harboring 90 plasmids (STable 1).

The plasmids were classified as conjugative, mobilizable, and non-mobilizable based on three plasmid markers: relaxase, MPF, and oriT. Thus, among the 290 plasmids identified in this study, 8 (~ 2.7%) were classified as conjugative, for harboring both relaxase and MPF; 89 (~ 30.6%) as mobilizable, for harboring relaxase and/or oriT without MPF; and the majority (n = 193, ~ 66.5%) as non-mobilizable, for not presenting relaxase and oriT. Considering the plasmid length, most of them (79.6%) is less than 10 Kb long, with 20% having between 10 and 90 Kb, and the longest plasmid has 115 Kb. These are the lengths of the contigs characterized as plasmids and they may not represent the length of the original plasmid. Information regarding each plasmid identified can be found in STable 1.

Moreover, mobilization (MOB) typing was used to classify the relaxase and oriT sequences of the conjugative and mobilizable plasmids into six MOB types. MOBP was the most prevalent type, followed by MOBQ, and MOBF (Fig. 2). Two other MOB families (MOBC and MOBV) were assigned to relaxases from the Hadza and Matses, respectively. MOBH was the only MOB family not identified. Besides, 16 oriT sequences did not match any of the six known MOB families and were classified as unknown. The MPF was also classified into classes and the most prevalent was MPFF, followed by MPFI and MPFT (Fig. 2).

Figure 2
figure 2

Stacked bar chart representing the relative abundance (y-axis) of the MOB families and MPF types of the relaxase, oriT, and MPF identified in the gut plasmidomes of each group. Raw counts are also shown.

We explored the protein's functions encoded by the gut plasmids of the groups. The most abundant protein categories were Signaling/Cellular Processes (28–53%) and Genetic Information Processing (13–23%) (Stables 2, 3). The majority of the proteins associated with Signaling/Cellular Processes are part of Secretion (IV and VI) and Prokaryotic Defense Systems. The majority of proteins categorized as associated with Genetic Information Processing are related to Replication and Repair (Stables 2, 3). Some relevant gene functions identified in the gut plasmids differed by group. Considering the four human groups, functions related to lipopolysaccharide biosynthesis were exclusive to the Hadza, plant-pathogen interaction, immune system, and siderophore biosynthesis were exclusive to the Matses, flagellar assembly and drug and xenobiotics metabolism were exclusive to the Yanomami. Moreover, genes associated with Carbohydrate Metabolism and Other Amino Acids were only identified in the Matses and Yanomami. We also observed that the genes that generate energy metabolize different nutrients according to each group: Sulfur, Nitrogen and Methane in the Hadza, Matses and Yanomami, respectively.

Considering Prokaryotic Defense Systems, we performed focused analyses on identifying antimicrobial and metal resistance, and virulence, which in some cases, can directly impact the host's health. AR and MR genes and virulence factors (VFs) were found in 27 plasmids identified in 19/72 metagenomes (Stable 1). The Hadza was the group in which the gut set of plasmids harbored the highest number of AR, MR, or VFs (n = 25), followed by the Matses (n = 18), the Yanomami (n = 10), and the Tunapuco (n = 4). Considering all AR genes identified, they are associated with six antibiotic classes: aminoglycosides, β-lactams, fluoroquinolones, fosfomycin, tetracyclines, and trimethoprim. The most abundant AR genes found were aph (aminoglycoside resistance), tet (tetracycline resistance), and tem (beta-lactamase). The MR genes are associated with copper, iron, mercury, and multiple metals resistance. Concerning the VFs, one plasmid presents a heat-stable enterotoxin (astA), but the other VFs are mainly yersiniabactin, which besides being a VF, is also associated with iron resistance. The gene ybt, which is a siderophore yersiniabactin, was identified in a cluster of eight ybt genes in one plasmid recovered from one metagenome from the Matses (Stable 1).

Moreover, we also identified 119 transposase sequences in 49 plasmids. Therefore, some plasmids have more than one transposase. The vast majority have one to three transposases, but a plasmid recovered from Tunapuco had seven. The transposases were classified into families and the most prevalent was IS3 (n = 61).

Based on the clustering classification, the plasmids were split into three groups: 1, 2, and 3. Group 1 comprises plasmids that were assigned to plasmid clusters of the MOB-suite program database, therefore known plasmids. While plasmids that exceeded the distance threshold with all references in the database were classified in Groups 2 and 3, and therefore novel plasmids could be part of these groups. The former comprises plasmids shared among the metagenomes and were clustered, and the latter comprises unique plasmids. The plasmids of Group 1 (n = 96) were grouped into 63 clusters, Group 2 (n = 65) into 22 clusters, while the remaining 129 were singletons (Group 3).

To infer the plasmids shared among the studied human groups, we generated a network based on the plasmid clusters characterized by MOB-suite (Fig. 2). The network revealed that 12/85 clusters (Group 1, n = 9; and Group 2, n = 3) are shared among the human groups, mostly between the Hadza and the Yanomami. Moreover, to expand our scenario, we also performed BLAST analysis on the plasmids shared among the groups. In this way, we found that the set of plasmids in the semi-isolated groups’ gut microbiome has a high similarity with plasmids spread worldwide in bacteria from humans, animals, environments, and foods (Table 1, Stable 4). Even Group 2 clusters, potentially comprising novel plasmids, presented similarities with known plasmids. However, the BLAST analysis of the plasmid cluster “R” returned no matches, indicating the presence of novel plasmids in this cluster.

Table1 Mobility prediction and antibiotic resistance genes of the plasmid clusters A-T from the network.

Surprisingly, the plasmids from the clusters “A”, “E”, and “C” (Fig. 3), previously characterized as non-mobilizable (Stable 1), were the clusters identified in a higher number of metagenomes. The cluster “A” was the most prevalent in the metagenomes analyzed, present in the Hadza (n = 3), Matses (n = 3), and Tunapuco (n = 1). The same rep and TolA sequences characterized the plasmids from cluster “A”, including the Hadza plasmids (n = 3), which also harbor a dfrf (dihydrofolate reductase) trimethoprim resistance gene. Besides that, this cluster was inferred by the MOB-suite as potentially hosted by bacteria of the genus Campylobacter. The clusters “E” and “C” harbor known plasmids, with no relevant functions so far assigned. The mobilizable cluster “F'' shared between the Hadza and Yanomami carries beta-lactamase resistance genes, besides some Hadza's plasmids also carry aph (aminoglycoside resistance). Conjugative plasmids carrying AR genes were identified in cluster “M” (fosA gene), cluster “O” (aph and tet genes), and cluster “P” (ctx gene) (Fig. 3).

Figure 3
figure 3

Network based on the plasmid clusters (smaller-sized nodes) harbored and shared between the human groups (bigger-sized nodes). Non-mobilizable plasmids that were identified in only one metagenome were removed from this analysis. Identification of the plasmid clusters A-T is also shown. Network generated in R, version 4.0.3. ( with the packages igraph and ggplot2.

Besides the plasmids shared among the human groups, we also observed plasmids with the potential to be human group-specific as some from Group 2 clusters (n = 8) identified only in metagenomes from Hadza (n = 2) and Yanomami (n = 6). The Yanomami-specific clusters were classified as mobilizable (n = 4), but the others and Hadza's plasmids were non-mobilizable. These “human group-specific” plasmid clusters carry genes associated with Metabolism, mostly related to Glycolysis/Gluconeogenesis, and also genes that encode Transport Proteins and Transposases.


Diet and lifestyle have changed dramatically throughout human evolution, first with the introduction of farming (~ 10,000 years ago), and more recently, with the introduction of industrially processed foods, hygienic conditions, antimicrobials, pollution, and other factors27,28. These transformations in human lifestyle have been widely associated with the selection and modulation of the gut microbes and with a substantial loss of gut bacterial diversity15,16,17,18,19,27. However, this microbial profile restructuration not only involves the microbial taxa but, responding to selective pressures, also the mobilome4. Moreover, modern diseases, such as diabetes, obesity, and allergies, partially driven by the lifestyle, have been associated with the loss and variation of bacterial taxa and mobile elements diversity in the gut microbiome28,29.

In the present study, we analyzed the set of plasmids and their accessory genes from the gut microbiome of four semi-isolated human groups: the Yanomami from the Brazilian Amazon Forest19, the Matses from the Peruvian Amazon Forest, the Tunapuco from the Peruvian Andean highlands16, and the Hadza from northwestern Tanzania17. These datasets are some of the few high-quality shotgun metagenomes available from hunter-gatherers, human groups whose gut microbiomes represent a link between the ancestral and modern human groups. The bacterial composition and functionality of these four human groups have been previously analyzed, and although they have a similar contrasting composition compared to urban-industrialized groups, each of them has particular characteristics related to their different ecological niches, genetics, and diets15,16,17,18,19. The gut plasmids analysis showed a similar pattern, in which specific plasmid clusters and gene functions are unique for each human group, yet they also share significant parts of their set of plasmids. Moreover, a dozen of plasmid clusters are shared among these semi-isolated human groups living in distant and distinct geographical niches. In addition, some of these shared plasmid clusters have also been detected worldwide in humans, animals, environments, and foods. This could indicate that these plasmids are part of our ancestral microbiome and that they remain in the human population to this day, but could also reveal that the global inter-niche exchange of plasmids eventually carried out by modern practices could be reaching distinct semi-isolated human groups and their environments.

In general, plasmids can be classified by their relaxase and oriT, which comprise six MOB families, and MPF, which comprises four families23,30. Based on a database of plasmid markers, the gut plasmids of our study were mostly characterized by the families MOBP, MOBQ, and MOBF. The MOBP, a cluster of actively evolving relaxases, is the most prevalent family in plasmids databases, featuring a considerable proportion of short plasmids30,31. Indeed, most of the plasmids that we identified are short plasmids (< 10 Kb). Moreover, the MOBC and MOBV families, low prevalent families in plasmid databases, were identified in two novel plasmids of the Hadza and Matses. This highlights the yet underexplored scenario that modulates the plasmids and diversity in the gut microbiome30,31. Concerning the quantitative aspect, the Yanomami presented the largest set of plasmids in their gut and the Tunapuco the smallest, likely reflecting their remarkably different diet, lifestyle, and environment. Although both groups inhabit the Amazon Region, the Yanomami’s subsistence is based on gathering and hunting wild plants and animals in the Amazon Forest, the world's largest and most biodiverse tropical rainforest19,32,33. In contrast, Tunapuco’s subsistence is based on local agricultural produce and homegrown small animals located in the Peruvian Andean highlands, at an elevation between 2500 and 3100 m above sea level16,32,33.

The gut microbes usually process undigested carbohydrates, proteins, and fat, producing additional energy from the diet and also metabolites that can impact human physiology34. These metabolic functions, encoded by chromosomes and plasmids of the gut microbes, are selected/modulated according to the components of the human’s host diet7,20,34,35. In these groups, the food seasonality and their eventual nomadic behavior are selection pressures that can drive a variety of changes in the gut throughout the year, including in the set of plasmids. So, to rapidly adapt to the new/seasonal dietary components, specific enzymes must be selected and spread among the gut microbes34,35. This may explain why only the groups that inhabit the Amazonian Forest (Yanomami and Matses) have plasmid accessory genes associated with Carbohydrate and Other Amino Acids Metabolism. These groups live in a hotspot of diversity, therefore they are always coming into contact with various dietary components, which must be metabolized32,33,34,35. Moreover, their niches and consequently, the distinct selective pressures, contribute to the selection of plasmids’ content that participates in the metabolism of molecules from their diet. The functions identified as group-exclusive follow this context. For example, only in the Hadza set of plasmids was observed the lipopolysaccharide (LPS) biosynthesis pathway. This pathway is known for inducing strong inflammatory responses, but contrastingly, it also has anti-inflammatory actions when produced by certain gut bacteria, such as Bacteroidetes. Indeed, the Hadza harbor this phylum in their gut microbiomes15,17. LPS has been proposed to be used in the prevention or treatment of chronic inflammatory diseases that are induced by pro-inflammatory LPS36. The gut accessory genome, represented by the gut plasmids, constitutes an underexplored plethora of genes and functions and an important source of information with potential biotechnological or pharmaceutical value37,38.

The prevalent gene functions that were identified in the gut plasmids of all groups, such as Prokaryotic Defense Systems, Secretion Systems, and Replication and Repair, are associated with core plasmid genes39,40. However, the type VI Secretion Systems (T6SS) is not plasmid ubiquitous, moreover, it is associated with interbacterial competition, bacteria-host interactions, and virulence41,42,43. Therefore, if the T6SS is harbored by a commensal microbe, it can prevent the microbiome disassembly by an invading bacteria. In contrast, if T6SS is carried by an invading bacteria, the community assembly can be disestablished, resulting in dysbiosis41,42,43.

Interestingly, we observed a resistome in the gut plasmids of these human groups that have no or limited exposure to synthetic antibiotics, as well as reported in other isolated or semi-isolated groups/locations7,24,44. Most of these AR genes are associated with β-lactam and tetracycline resistance, which are commonly seen in soil and water environments45. In fact, Hadza and Yanomami gut plasmids harbor accessory genes with the function of biosynthesis of Penicillin and Cephalosporin. Therefore, the AR genes in these semi-isolated groups may have originated from an environment where antibiotic-producing microbes naturally occur, and also may have reached these groups via contact with antibiotic-exposed populations22,24,46,47. Since the Hadza and Matses presented the highest numbers of resistance and virulence genes in their gut plasmids, we speculate that these groups have been more impacted by urban-industrialized lifestyles, in a way that may be favoring the selection and expansion of these genes. A previous study with the Hadza, in fact, showed the existence of antibiotic resistance genes, in this population that has little to no antibiotic exposure, even though they have a lower amount of AR genes in comparison with an urban-industrialized population17.

In the same way, metal resistance genes present in the gut plasmids can reflect bacterial adaptations to environmental conditions. Indeed, the Yanomami niche is likely to be contaminated with cadmium due to the continuous discharge of batteries by them over decades19, however, the natural or artificial presence of metal in the other group’s environment is unknown. Moreover, MR genes can impact the community assembly of the gut microbiomes and consequently affect host health48. Concerning the virulome, there is no prevalence of bacterial virulence-associated factors in the gut plasmids, with the expectation of T6SS. In fact, these factors are not prevalent within environmental bacteria but occur more often in pathogens in clinical settings49. In addition, the plasmids recovered also harbor transposases, mainly the IS3 family, which is widely distributed and one of the largest and most abundant Insertion Sequence families among plasmids50.

The gut plasmids recovered and characterized here represent an achievement in the study of the gut microbiome of semi-isolated groups, providing a snapshot of the forces that modulate the gut microbiome given the unique as well as global scenarios to which these human groups are exposed. Although we have focused only on plasmids from the human gut, other mobile elements, such as bacteriophages, must also be relevant in this context. Beyond that, we are aware that our study has biases and limitations, for example, the plasmid identification that is based on the current plasmid markers in the databases. However, different studies have successfully carried out bioinformatic analysis of sequences and addressed them to plasmids, mainly when the plasmids were in vitro isolated from the strains51,52. The semi-isolated groups may harbor plasmid markers that have not yet been characterized and this may have resulted in the general lack of relaxase, MPF, and oriT sequences identified. As a result, most plasmids were classified as novel and non-mobilizable. Indeed, new MOB families have often been reported30,53,54,55,56,57. Besides, the complexity of metagenomic assembly tends to produce shorter contigs and may have hampered the identification of the whole genetic content of the plasmids58. In addition, the largest number of genes found in the plasmids have no known functionality. Indeed, the set of plasmids from the gut microbiome is a recognized source of novel genes and gene products, so the presence of unknown or novel elements is expected38. However, these results highlight the knowledge gap in this field and in databases, which is often attributed to the difficulties in studying plasmids from complex environments, such as microbiomes40,59.


For this study, we analyzed shotgun metagenomic data from previously published gut microbiome studies of semi-isolated human groups: semi-isolated hunter-gatherer communities from South America and Africa (Yanomami from Brazilian Amazon19, n = 15; Matses from Peruvian Amazon16, n = 24; and Hadza from Tanzania17, n = 27), and a rural agriculturalist community from the Andean highlands in Peru (Tunapuco16, n = 12). These datasets comprise paired-end sequences generated on Illumina sequencers, and bioinformatic processing was performed in parallel. All methods were performed in accordance with the relevant guidelines and regulations. The diet of the hunter-gatherer communities is mainly based on the consumption of highly fibrous tubers and vegetal foods, while the diet of the rural agriculturalist community is based on the local agricultural produce of stem and root tubers, and homegrown small animals.

Each metagenome was independently subjected to de novo assembly through metaSPAdes v.3.1360 using default parameters. For plasmid identification, MOB-recon and MOB-typer modules from the MOB-suite program were used61. These modules identified putative plasmids from the contigs of each metagenome and searched for replication gene (rep), mobilization protein (relaxase), mate-pair formation (MPF), and the origin of transfer (oriT). Based on the presence/absence of these plasmid markers, the plasmids’ mobility was predicted61. Optional tools for the classification of potential circular or linear plasmids were not performed and it is reasonable to expect the recovery of both types of plasmids. To avoid false positives we removed from the analysis the plasmids with no markers identified by the MOB-suite. For characterization, pairwise genomic distances between each plasmid identified and each reference plasmid cluster was calculated. Thus, each plasmid identified was assigned to their closest reference plasmid cluster or as a novel plasmid depending on the predicted genomic distance, using the MOB-suite database as reference61. Following the MOB-suite analysis, the function “dist” of the Mash program62 was used to calculate the pairwise genomic distances between all novel plasmids identified in the groups. If the distance between the novel plasmids was lower than 0.05, they were assigned to the same novel cluster. This distance metric allows the clustering of plasmids with considerable differences in length, which is useful in this case, since plasmids can undergo diverse changes in their sequence content61. Based on the clustering classification, the plasmids were split into three groups. Group 1 comprises plasmids that were assigned to plasmid clusters from the reference database. Group 2 comprises plasmids assigned as novel and clustered into novel clusters. Group 3 comprises unique plasmids, with no significant similarity/coverage to the plasmids in the database or to other plasmids found in the metagenomes. Moreover, only plasmid clusters identified in at least three metagenomes from one same human group were considered with the potential to be group-specific.

We also used FragGeneScan v.1.3163 to predict Open Reading Frames (ORFs) in the contigs assigned as plasmids. Thereafter, Abricate ( was used to screen the ORFs against the MegaRes v.264 and VFDB65 databases. Abricate results were filtered for a minimum DNA identity of 75% and minimum coverage of 50%66. Functional annotation and assignment of KEGG functional categories to the ORFs were performed using the KOfamKOALA tool67, considering a threshold of 0.01. The plasmids were also annotated using Prokka68 for transposase identification.

Network analysis of the plasmids identified in the four groups was performed based on the plasmid clusters characterized by MOB-suite. The network was constructed with the igraph R package ( In this network, the smaller-sized nodes represent the plasmids, which are linked to the bigger-sized nodes that represent the groups that harbor these plasmids. We removed from this analysis non-mobilizable plasmids that were identified in only one metagenome.

In order to investigate which other hosts and niches these plasmids have been identified, we performed a search analysis on each plasmid using the online BLAST tool69 with the nt/nr database. We removed results with similarity and coverage less than 70% and e-value less than 1e-5. To recover metadata information from the accession numbers of the plasmids that matched our putative plasmids, we used a custom python script based on the Entrez module of the Biopython package.