Abstract
Melioidosis is an often-fatal neglected tropical disease caused by an environmental bacterium Burkholderia pseudomallei. However, our understanding of the disease-causing bacterial lineages, their dissemination, and adaptive mechanisms remains limited. To address this, we conduct a comprehensive genomic analysis of 1,391 B. pseudomallei isolates collected from nine hospitals in northeast Thailand between 2015 and 2018, and contemporaneous isolates from neighbouring countries, representing the most densely sampled collection to date. Our study identifies three dominant lineages, each with unique gene sets potentially enhancing bacterial fitness in the environment. We find that recombination drives lineage-specific gene flow. Transcriptome analyses of representative clinical isolates from each dominant lineage reveal increased expression of lineage-specific genes under environmental conditions in two out of three lineages. This underscores the potential importance of environmental persistence for these dominant lineages. The study also highlights the influence of environmental factors such as terrain slope, altitude, and river direction on the geographical dispersal of B. pseudomallei. Collectively, our findings suggest that environmental persistence may play a role in facilitating the spread of B. pseudomallei, and as a prerequisite for exposure and infection, thereby providing useful insights for informing melioidosis prevention and control strategies.
Similar content being viewed by others
Introduction
Melioidosis, a severe infectious disease, affects an estimated 165,000 cases globally each year, of which 89,000 are fatal1. The disease is caused by Burkholderia pseudomallei, a Gram-negative bacillus found in soil and contaminated water across tropical and sub-tropical regions. Historically, limited access to microbiology laboratories for culture-confirmed diagnosis led to underreporting, particularly in lower- and middle-income countries2. However, improved infrastructure and awareness have led to increases in reported cases across South Asia, Southeast Asia, East Asia3,4,5,6,7,8 and Australia9,10 In Southeast Asia, the disease incidence is often linked to agriculture practice, particularly during the rainy seasons when rice paddy fields are flooded for planting11. The flooded terrain enables the bacterium in the soil to surface, potentially exposing farmers to B. pseudomallei and subsequently leading to melioidosis. Additionally, many cases of melioidosis have been associated with severe weather events12,13,14. While climate likely influences human encounters with B. pseudomallei, further investigation is needed to fully understand the mechanisms linking environmental factors to melioidosis epidemiology.
Understanding the population structure, dissemination and adaptation of B. pseudomallei in these climatically-challenged endemic regions requires a large-scale, geographically and chronologically densely-sampled, genetic dataset. Previous studies, albeit limited in sample size, have demonstrated that B. pseudomallei dissemination is driven by both anthropogenic and environmental factors15,16,17,18. Streams19,20, monsoons, typhoons and cyclones13,14,21,22 were identified as significant contributors to bacterial dissemination, highlighting the importance of bacterial persistence across a range of environmental conditions. B. pseudomallei exhibits remarkable survival capabilities across diverse environments, spanning from wet to dry, nutrient-depleted soil23,24,25,26,27,28 thereby enabling the bacterium to thrive in various ecological niches. Previous studies have noted the temporal and geographical co-existence of multiple B. pseudomallei lineages15,16,29. However, little is known about their distinct genetic content and adaptive strategies. Identification of lineage-specific genes associated with bacterial persistence and disease escalation will be essential to develop disease control strategies.
In this study, we conducted a population genomics analysis using combined B. pseudomallei isolates from melioidosis patients across nine provinces in the northeast Thailand including Buriram, Khon Kaen, Mahasarakham, Mukdahan, Nakhon Phanom, Roi Et, Sisaket, Surin and Udon Thani8; totaling 1265 isolates collected from July 2015 to December 2018. Additionally, we incorporated contemporary environmental and clinical collections from Thailand and neighbouring countries29,30,31,32,33,34, consisting of 15 clinical isolates and 111 environmental isolates (Fig. 1a, Supplementary data 1). Our comprehensive analysis, including a total of 1391 isolates, revealed the population structure, dissemination patterns, and genetic diversity of this bacterium. We identified genetic determinants associated with dominant lineages and investigated their biological functions and expression conditions in three representative isolates, each representing a dominant lineage. This provides insights into the strategies employed by B. pseudomallei lineages for successful persistence in the environment, ultimately leading to human exposure and infection.
Results
Population structure analysis identified successful B. pseudomallei lineages and the intermixing of clinical and environmental isolates
To define the population structure of clinical and environmental B. pseudomallei in a hyperendemic area of northeast Thailand and neighbouring regions (n = 1391), we performed four independent approaches. PopPUNK35 analysis was performed on genome assemblies (Supplementary data 2). Additionally, we constructed three maximum-likelihood (ML) phylogenies36, each based on different sets of single nucleotide polymorphisms (SNPs): core genomes (n = 77,156 SNPs), core gene multilocus typing37 (cgMLST, n = 46,945 SNPs), and seven-gene multilocus typing genes38 (MLST, n = 31 SNPs). These approaches facilitated the grouping of isolates with close genetic similarity into distinct lineages. Notably, both PopPUNK analysis and core genome SNPs phylogeny yielded consistent results (Supplementary Fig. 1) clustering the population into three dominant lineages (Fig. 1b). The average pairwise core SNP distance within each dominant lineage was 549, 351, and 517 SNPs for lineage 1, 2, and 3, respectively, in contrast to the average pairwise core SNP distance of 1087 SNPs within the total population (Fig. 1c). This lower pairwise core SNP distance across lineages confirmed the genetic relatedness as defined by PopPUNK and core genome SNP phylogenetic analysis. While cgMLST displayed conservation for two out of three dominant lineages, MLST exhibited inconsistencies across dominant lineages with lower phylogenetic resolution and poorer bootstrap support compared to other methodologies (Supplementary Fig. 1). Consequently, we relied on the population delineated by PopPUNK and core genome SNP phylogeny for subsequent investigations.
The three predominant lineages (denoted as lineage 1 to 3) comprised 312, 297, and 125 isolates, respectively. They accounted for 52.8% of the studied population and persisted throughout the sampling period. Interestingly, each lineage peaked during the wet season, correlating agricultural practices at the onset of rainfall with increased environmental exposure and subsequent melioidosis infections (Fig. 1d). Despite the small sample size of environmental isolates, we observed a clustering of these isolates with clinical isolates within each dominant lineage, indicating their core genetic similarities and shared origin. The ratio of environmental to clinical isolates varied across lineages (Chi-square test with Monte Carlo resampling p-value 5.00 × 10−4, Supplementary Fig. 2). Due to the substantially lower number of environmental isolates used and incomplete geographical distribution matching between clinical and environmental isolates, caution is warranted in interpreting these results. Nevertheless, our findings highlight a mixing of environmental and clinical samples, suggesting that clinical isolates could serve as a surrogate for tracking the dissemination of an environmental bacterium, especially in the absence of equally comprehensive environmental samples.
Genetic evidence identifies patterns of B. pseudomallei dissemination in Northeast Thailand
We examined the dissemination patterns of B. pseudomallei in northeast Thailand at the provincial level. Except for Mahasarakham where samples were limited, all three dominant lineages were present across the rest of the eight studied provinces (Fig. 1a). This prompted us to focus the analysis on these dominant lineages to identify consistent geographical distributions underlying their spread in the region. We generated lineage-specific phylogenies to improve genetic resolution for transmission analysis. Additionally, we reconstructed ancestral histories of provincial origins, quantified the number of inter-provincial transmissions to examine transmission patterns, and estimated the time of the most recent common ancestor of each dominant lineage and its sub-lineages (Supplementary Figs. 3–6, see “Methods”). This allowed us to link the emergence of these lineages to historical events that might have affected their transmission dynamics. While transmission signals could reflect distinct dissemination patterns in each dominant lineage or its sub-lineages, we also considered the possibility of shared factors leading to a uniform geographical distribution. Notably, we observed consistent dissemination patterns in 14 out of 28 provincial pairs across the three dominant lineages (Fig. 2a, Supplementary Fig. 6). Eight out of these 14 provincial pairs showed potential correlation with the slope of terrain altitude between provinces or the natural flow of rivers in the region (Fig. 2b). These pairs included “Udon Thani-to-Khon Kaen”, “Udon Thani-to-Buriram”,, “Udon Thani-to-Surin”, “Udon Thani-to-Mukdahan”, “Khon Kaen-to-Buriram”, “Nakhon Phanom-to-Mukdahan”, “Roi Et-to-Surin” and “Surin-to-Sisaket”. Northeast Thailand is described as a saucer-shaped plateau, with elevations ranging from over 200 meters above sea level in the northwestern corner (parts of Udon-Thani, Khon-Kaen and Buriram) to less than 140 meters in the southeast (parts of Mukdahan and Sisaket), and gradually descending toward the Mekong River in the east39,40. The Mun River originates from elevated hills in central Thailand, streaming eastward through Buriram, Surin, Sisaket before merging with the Mekong River. Similarly, the Chi River, a tributary to the Mun, also originates from the central Thailand mountains. The Chi flows eastward through Khon Kaen, Mahasarakham, Roi Et and converges with the Mun in Sisaket40. These geographical features may, at least in part, explain the observed dissemination patterns.
Additionally, another set of three out of 14 conserved patterns coincided with the wind direction of the northeastern monsoon during the dry season (“Nakhon Phanom-to-Buriram”, “Mukdahan-to-Buriram”, and “Mukdahan-to-Surin”). Thailand experiences two predominant monsoon seasons: the southwest monsoon from approximately May to October and the northeast monsoon from approximately November to April. The southwest monsoon brings heavy rainfall from the southwest to the northeast, often marking the start of the agriculture season across Thailand (Fig. 1d)41,42. Conversely, the northeast monsoon brings dry winds from the northeast to the southwest. Our observation implies that dry winds may have the potential to transport aerosolised soil contaminated with B. pseudomallei southwestward. Despite that previous air sampling during Thailand’s wet season did not detect B. pseudomallei, exploring the impact of dry winds during the northeastern monsoon is essential. This consideration becomes even more pertinent given reports of the long-range transport of particles and small organic matters, such as PM2.5 and PM10, via the northeast monsoon elsewhere in Southeast Asia43,44. Furthermore, we successfully estimated the time of most recent of ancestry for a sub-lineage 1.3, a descendant of lineage 1. Our finding revealed that this sub-lineage emerged around 2011 (95% HPD of 2000–2014, Supplementary Fig. 3c). The age of this sub-lineage implies that older lineages, such as its parental lineage 1 likely experienced multiple monsoon seasons, which possibly resulted in the observed patterns. Although our observation suggested terrain slope, inland rivers, canal systems45 and regular monsoons41,42 as the potential factors that may contribute to B. pseudomallei dissemination in northeast Thailand, the result should be interpreted with caution since anthropogenic activities such as human migration between the provinces may also contribute to the observed pattern. However, without access to comprehensive human movement data, this aspect remains challenging to investigate.
Genetic markers present in successful B. pseudomallei lineages
The co-existence of multiple B. pseudomallei lineages within the same geographical areas and timeframe implies the presence of diverse adaptive strategies that enable them to thrive in a shared ecological niche. While some smaller lineages may be sporadically detected, the persistence of the three dominant lineages throughout the sampling period supports their fitness and successful adaptive strategies in this niche. We next sought to identify genes that were present in isolates that form each dominant lineage, or its sub-lineage; but absent in non-dominant lineages (see “Methods”, Supplementary Fig. 7). Out of a total of 15,237 genes in the pan-genome outlined from this population (see “Methods”), 5577 genes were conserved across the entire population while 9660 genes were variably present (accessory genes). Dominant lineage-specific genes were defined as accessory genes present in ≥ 95% of isolates within any of the dominant lineages or their sub-lineages, but present in ≤ 15% of isolates outside these lineages. Among these, 247 genes were identified as lineage-specific with their specificity to each dominant lineage and sub-lineage tabulated in Supplementary Data 3. The majority of dominant lineage-specific genes were poorly characterised and annotated as hypothetical proteins (Fig. 3). To gain insights into the potential functions, we annotated them using Gene Ontology (GO terms)46 which classify them by Biological process, Molecular function, and Cellular component (Fig. 3b, see “Methods”). Of the 247 dominant lineage genes, GO terms could be assigned to 42 genes for Biological Process, 68 for Molecular Function, and 12 for Cellular component. For genes that could be assigned GO terms, functions involved in “DNA integration”, “DNA recombination”, and “DNA methylation” might indicate their potential roles in horizontal gene acquisition and protection against incoming foreign DNA through site-specific DNA methylation. Furthermore, GO terms associated with “DNA binding” and “Regulation of DNA-templated transcription” may suggest lineage-specific regulation of the expression of these genes.
Lineage-specific genes were selectively expressed
To delve deeper into the functionality of dominant lineage-specific genes, we explored their expression patterns across both environmental and infection conditions. We selected representative strains - “K96243” (lineage 1 – sub-lineage 1.1), “UKMD286” (lineage 2 – sub-lineage 2.1), and “UKMH10” (lineage 3 – sub-lineage 3.2) - all are clinical isolates with pre-existing gene expression profiles under infection and environmental conditions47,48,49. For environmental conditions, K96243 was exposed to water47, while UKMD286 and UKMH10 were cultivated in a soil extract medium48,49 to mimic B. pseudomallei in the environment. For infection conditions, K96243 and UKMD286 were used in murine challenges47,48 and isolated from mice organs, while UKMH10 was subjected to human plasma49 to simulate host infection. This approach facilitated the comparison of differentially expressed lineage-specific genes between environmental and infection conditions. Although each representative strain carried a complete set of lineage-specific genes for their respective sub-lineages (K96243 with 47 genes, UKMD286 with 27 genes, and UKMH10 with 14 genes), their collective representation accounted for 69 out of 247 total lineage-specific genes (27.9%). This limitation was due to observed genetic diversity within the dominant lineage, which should be acknowledged. Interestingly, 11 out of 47 lineage-specific genes in K96243 and 6 out of 27 lineage-specific genes in UKMD286 were up-regulated in the environmental conditions (Figs. 4a–c, Supplementary Data 4). However, none of the lineage-specific genes showed up-regulation during the infection conditions. The remaining lineage-specific genes did not exhibit preferential expression in either environmental or infection conditions.
The elevated expression level of lineage-specific genes in the environmental condition was unexpected considering that all representative strains were clinical isolates. Our analysis of a broader spectrum of environmental conditions (Fig. 4d) for K9624347, a lineage 1 representative strain, revealed that lineage-1-specific genes exhibited higher expression levels under nutrient deprivation compared to other genes in K96243 genome (Benjamini Hochberg adjusted p-value = 5.72 × 10−3). This finding suggests that the ability to survive in nutrient-depleted soil, which is not uncommon in melioidosis endemic areas27,28,50 before being acquired by a human host and subsequently causing the disease, maybe one of the adaptive strategies of lineage 1. However, caution is needed in interpreting this finding due to the limited number of characterised lineage-specific genes and the use of a representative strain, which may not fully capture the genetic diversity within the lineage.
Example of lineage-specific genes
The majority of lineage-specific genes were located within genomic islands (GI)30,51,52. These regions are characterised by anomalies in %G + C content or dinucleotide frequency signatures, or the presence of genes associated with mobile genetic elements such as insertion sequence (IS) elements and bacteriophages. Notably, we observed that a cluster of genes specific to lineage 1 (BPSS2060 to BPSS2072) formed a mosaic structure within a putative metabolic island known as GI 1630. Although several variations of GI 16 have been reported (GI16, GI16.1, GI16.2, GI16a, GI16b, and GI16b.1)51, it typically spans 60 kb (BPSS2051 to BPSS2090) and carries several known virulence determinants and genes that enhance metabolic versatility. While certain virulence factors, such as the filamentous haemagglutinin (BPSS2053) required for host cell adhesion and its processing protein were conserved across multiple lineages observed in our study, genes encoding functions that potentially expand the metabolic repertoire were specific to dominant lineages. For example, the mosaic structure of GI 16 (BPSS2060 to BPSS2072), specific to lineage 1 (Supplementary data 3), contains genes involved in alternative nutrient catabolism and anabolism (BPSS2060, BPSS2065, BPSS2067, BPSS2068, and BPSS2072), transcriptional regulation (BPSS2061), and substrate transport (BPSS2064, BPSS2071). Out of 11 lineage-specific genes located in the mosaic structure of GI 16, eight were found to be upregulated during the early phase of nutrient starvation while remaining silent during infections. This finding reflects the functional division of GI16 where its lineage-specific mosaic structure contains genes that contribute to metabolic versatility, while its core structure encodes virulent determinants associated with disease implications. It is important to note that the structure of GI 16 may vary across different regions due to the plasticity of genomic islands and changes in selection pressures.
Roles of homologous recombination
Homologous recombination has been shown to play a significant role in facilitating the gain and loss of genes, and the generation of mosaic structures within the GI of B. pseudomallei53. To better understand its association with lineage-specific genes, we identified recombination events and quantified the rates of recombination in the dominant lineages (Supplementary Fig. 8). The ratio of polymorphisms introduced through recombination compared to those introduced by mutation (r/m) was 3.7, 4.6 and 2.2 for lineages 1, 2, and 3 respectively (Table 1). A very high proportion of genes underwent recombination at least once: 99.5% of genes in lineage 1, 99.9% in lineage 2, and 96.6% in lineage 3. Furthermore, every lineage-specific gene within each dominant lineage underwent recombination (Table 1). The bacterial restriction modification (RM) systems prevent the invasion of foreign DNA and restrict gene flow between B. pseudomallei lineages31. Notably, components of this system including a type I restriction system and modification methylase were among dominant lineage-specific genes (BPSL0947-BPSL0948 in lineage 1, and their homologues in lineage 2 and 3). They may act as a barrier for homologous recombination and potentially modulate lineage-specific genetic diversity. This highlights the intricate interplay between recombination, lineage-specific genes, and the RM system in shaping the genetic landscape of B. pseudomallei in northeast Thailand.
Discussion
Our analysis of B. pseudomallei population genomics enhances our understanding of the evolution and adaptive strategies employed by the dominant lineages in the melioidosis hyperendemic region of northeast Thailand and neighbouring countries. Through an unprecedentedly dense sampling effort between 2015 and 2018, we were able to determine the co-existence of three dominant lineages, characterise their dissemination patterns, and identify lineage-specific genes possibly contributing to their success during the studied period. By analysing transcriptome data from representative strains of each dominant lineage, we gained insights into some of their adaptive strategies, highlighting bacterial persistence in the environment as crucial for subsequent host acquisition and infection, among other strategies. Nevertheless, our study has a few limitations.
While a complete transmission landscape of an environmental pathogen would ideally incorporate both environmental and clinical isolates, our study primarily utilised clinical isolates. This choice was necessitated by the limited environmental surveillance and the scarcity of environmental isolates in Southeast Asia. Although this approach is not ideal, it represents the best available scenario. Importantly, all clinical isolates were most likely acquired from environmental exposures54, and human-to-human transmission is known to be very rare for melioidosis55. Furthermore, the co-occurrence of clinical and environmental isolates within the same lineage suggests that our findings based on clinical isolates may have broader implications for environmental isolates. Nevertheless, integrating environmental studies is needed to enhance our understanding of B. pseudomallei persistence in the environment and its transmission. Therefore, continued surveillance efforts including both clinical and environmental samples will be essential. Additionally, the lineage classification report in our study is subjected to potential alterations over time with the introduction of new data. As B. pseudomallei continuously adapts to environmental pressures, the composition of lineages 1, 2, and 3, along with their respective lineage-specific genes may shift. New lineages with superior fitness or selective advantages, of different adaptive strategies could potentially outcompete the existing dominant lineages. Consequently, this may lead to emergence of new lineage classifications in the future. This ongoing monitoring will be pivotal in identifying alterations within the bacterial population, uncovering new adaptive strategies, and evaluating their impact on disease dynamics over time.
The up-regulation of dominant lineage-specific genes in environmental conditions, coupled with the increased expression of lineage-1-specific genes during nutrient deprivation, suggests that bacterial persistence in the environment may be an adaptive strategy for certain B. pseudomallei lineages. However, it is essential to acknowledge the genetic diversity within each dominant lineage. The representative strains in our study carried only a subset of the lineage-specific genes. As a result, lineage-specific genes absent in these strains, often annotated as hypothetical proteins, remained unexplored. To gain a more comprehensive understanding, we need a fuller spectrum of conditions reflecting different transcriptome profiles mirroring seasonal fluctuations. Similar efforts have been successful in elucidating bacterial transitions through host cells during infection56, resulting in a comprehensive list of genes and their functional during different infection stages. Therefore, a broader range of experimental conditions is needed to uncover additional adaptive strategies overlooked in this study and pave the way for future exploration.
Despite these limitations, our dataset and analysis represent one of the most comprehensive efforts to date. Our findings suggest that environmental persistence may be one of the adaptive strategies for dominant lineages 1 and 2, with lineage-1-specific genes potentially mediating bacterial survival under nutrient depletion. It remains uncertain whether lineage 3 adopts a similar strategy. Nevertheless, our results align with previous soil sampling studies, which consistently observed a higher prevalence of B. pseudomallei in nutrient-depleted compared to nutrient-rich soil28,50. Additionally, a molecular evolutionary study also supports the species’ long-term adaptation to survive nutrient scarcity27. The northeast region of Thailand, where our samples were primarily collected, has inherently low-fertility soil. This is exacerbated by intensified agriculture, monoculture, excessive synthetic fertiliser use, and poor land management, resulting in depleted soil nutrients and organic matter57. This may present a challenging environment for B. pseudomallei to thrive, thereby potentially selecting for successful lineages with persistent traits as observed in our study.
Our analyses also highlight various factors such as differences in terrain altitude58,59, river flow dynamics40, and the northeast monsoon41 as the drivers that may shape the dissemination patterns of B. pseudomallei. These drivers of dissemination may be influenced by both natural and human activities60. For instance, strong winds can carry dried soil particles, potentially containing B. pseudomallei over distances. Climate change-induced alterations in vegetation cover might expose soil to rainfall and winds61, impacting the bacterial spread. Additionally, deforestation can disrupt natural barriers like trees and shrubs, accelerating water runoff60,61 and potentially facilitating the wide-ranging dissemination of B. pseudomallei during flood. Considering these dynamics, the strategy of bacterial persistence likely plays a pivotal role in its widespread dissemination within the region, thereby influencing disease prevalence. Therefore, an effective disease control strategy should integrate both environmental and clinical public health measures to effectively mitigate the impact of melioidosis.
Methods
Ethical statement and data collection
Our research received approval from the ethics committees of each of the nine study hospitals in northeast Thailand where B. pseudomallei samples were obtained8, as well as from the Mahidol University Faculty of Tropical Medicine (MUTM 2015-002-001 and MUTM 2021-055-01). The B. pseudomallei samples were collected between July 2015 to December 2018, comprising 1265 clinical isolates from northeast Thailand. Additionally, we incorporated a contemporaneous dataset from Thailand and neighbouring regions15,29,30,31,32,33,34, comprising 15 clinical and 111 environmental isolates sourced from previous publications. In total, 1391 B. pseudomlalei genomes were used in this study. Their metadata and accession numbers were documented in Supplementary Data 1.
The northeast Thailand collection was obtained from patients participated in our longitudinal cohort study8. The patients were from nine provinces including Udon Thani (n = 230), Mukdahan (n = 198), Roi Et (n = 195), Surin (n = 170), Nakhon Phanom (n = 135), Buriram (n = 123), Sisaket (n = 107), Khon Kaen (n = 96) and Mahasarakham (n = 11), who were admitted to nine hospitals included in our cohort. The isolates were obtained from various clinical samples, including blood (71.8%), pus (12.5%), sputum (9.9%), body fluid (3.5%), urine (1.9%) and tissue (0.4%). As latent infection accounts for < 5% of the cases, the majority of clinical cases likely directly acquired from the environment8. The numbers of enrolled cases was lower at the beginning of the study due to the delayed sample collection in some study sites, resulting in inconsistent number of bacterial isolates used across different sites. When applicable, a permutation test was performed to ensure that an unequal number of isolates did not impact the temporal or spatial analysis.
Culture confirmation of B. pseudomallei, DNA extraction and whole genome sequencing
All 1265 B. pseudomallei samples from the northeast Thailand collection were cultured on Ashdown, selective agar plates and confirmed the species using latex agglutination test and matrix-laser absorption ionisation mass spectrometry (MALDI-TOF MS). A single colony from Ashdown agar plate was subjected to culture in Luria-Bertani (LB) broth (catalogue number 1551, Condalab, Spain) and subsequently used for DNA extraction. Genomic DNA was extracted using QIAamp DNA Mini Kit (catalogue number 51304, Qiagen, Germany). All genomic DNA were processed for the 150-base-read library preparation and sequencing using Illumina HiSeq2000 system with 100-cycle paired-end runs at Wellcome Sanger Institute, Cambridge UK. An average of 71X read depth was achieved. To control the potential contamination in each sample with other closely related species, we assigned taxonomic identity using Kraken62 v.1.1.1. We then estimated the genome completeness and species confirmation using CheckM63 v.1.2.2 and FastANI64 v.1.31, respectively. The quality control data of the 1265 genomes were listed in Supplementary Data 2.
Genome assembly and mapping from short read data
Short reads were de novo assembled using Velvet v.1.2.1065, followed by optimisation and scaffolding to generate scaffolded contigs (see full protocol29). Data quality control is reported in Supplementary Data 2. Short reads were also mapped against several reference genomes, including a strain K9624330 (accession numbers BX571965 and BX571966) to determine the whole population structure, and lineage-specific references to improve the resolution for lineage-specific analyses. We selected K9624330 genome as a population-wide reference due to its origin in northeast Thailand, aligning with the geographical focus of our study. Additionally, its well-characterised and complete genome further streamlined subsequent analyses. For all mapping, variants were called using Snippy v.4.6.0 (https://github.com/tseemann/snippy). To avoid mapping errors and false SNPs, we filtered out SNPs covered by less than 10 reads and found in a frequency of less than 0.9.
Defining whole population structure
PopPUNK clustering
PopPUNK v.2.6.035 was run on 1391 assembled genomes. To define the core and accessory distance between each pair of isolates, the assemblies were hashed at different k-mers. Several kmer comparison options (--k-step) and model fit options (--K) were tested to identify the parameters with the optimal fit based on density, transitivity and network scores. The optimal population model was fit using command line “poppunk --fit-model ----output < database > --min-k 15 --max-kmer 31 --max-a-dist 0.53 --K 4 --k-step 2”. The model had a density of 0.028, a transitivity of 0.992, and a network score of 0.8961.
Maximum likelihood phylogenies from core genome SNP, cgMLST and MLST
An alignment of full genome was created by mapping whole genome sequences of each B. pseudomallei against a complete genome of K9624330 strain. From this alignment, 4221 cgMLST loci based on a scheme described in37 were extracted and concatenated to form cgMLST alignment. Additionally, seven MLST loci, as per scheme described in38 were extracted from the same alignment and concatenated to create the MLST alignment. Core genome SNP alignment was identified from a full genome alignment using snp-sites66 v.2.5.1, with genomic islands51 masked. Separate maximum likelihood phylogenies were constructed for core genome SNP alignment, cgMLST alignment, and MLST alignment using IQ-TREE36 v.2.0.3. Standard model selection in IQ-TREE determined the best-fit model as TVM + F + ASC + R6 for all three phylogenies. To access the robustness of the phylogenetic trees, a 1000 bootstrap support was performed for each tree.
Comparison of population structure outlined phylogeny constructed from core genome SNPs, cgMLST, MLST and PopPUNK
To test for consistency between phylogenetic trees constructed from core genome SNPs, cgMLST and MLST alignment, we used the R package treespace67 v. 1.1.4.3 to explore the tree tip distributions. We compared pairwise tree distances within the first 100 bootstraps within each alignment category (indicative of bootstrap support strength), and across trees generated from different alignment categories (indicating proximity between tree categories). The tree pairwise distances were computed, and principal components (PCs) were derived with eigenvalues calculated for different PCs. The similarity among phylogenies from each alignment category was assessed using two PCs dimensions, which jointly accounted for > 90% of variability in pairwise distance (Supplementary Fig. 1a). The scatter plot of PCs revealed a close clustering of bootstrap trees from core genome SNP and cgMLST, while the bootstrap trees from MLST alignment showed greater dispersion, highlighting less consistency in the trees generated by the MLST approach. We further compared the consistency between the median phylogenetic tree of each alignment category and PopPUNK classification was visually compared using iTOL68 (Supplementary Fig. 1b).
Specific lineage analysis
To investigate the dissemination and genetic diversity within each dominant lineage, we conducted individual genome alignments, recombination removal, and maximum likelihood phylogeny. To enhance the sensitivity of variant, we selected closely related genomes as references for each lineage. Specifically, the complete genome B. pseudomallei strain K96243 (accession numbers BX571965 and BX571966) served as the mapping reference for lineage 1, while new reference genomes were created for lineages 2 and 3.
For lineage 2, we chose a representative isolate 27035_8#57 and subjected it to long-read sequencing on a local MinION sequencer following the manufacturer’s standard protocol (Oxford Nanopore Technologies, Oxford, United Kingdom). A complete hybrid assembly of the long-read and short-read sequence data of this strain was performed using Unicycler69 v.0.8.4. The resulting hybrid assembly of the 27035_8#57 genome was employed as the mapping reference for this lineage.
In the absence of representative complete genomes for lineage 3, we selected the best quality de novo assembly of isolate 27035_8#119 and orientated its contigs according to strain K96243 using ABACAS70 v.1.3.1. This genome was used as a mapping reference for lineage 3.
For lineage-specific mapping, Snippy v.4.6.0 was employed as in the whole population analysis. All genome alignments were subjected to Gubbins71 v.3.1.3, a recombination identification tool, to detect and remove recombination fragments. This process determined the genetic diversity introduced by horizontally acquired elements and vertically inherited SNPs, thereby producing recombination-free SNP alignments for phylogenetic reconstruction. Maximum-likelihood phylogenies were constructed using recombination-free SNP alignment of each dominant lineage using IQ-TREE36 v.2.0.3 with TVM + F + ASC + R6 and 1000 replicates of bootstrap support. The overall proportion of nodes with ≥ 80% bootstrap support of lineage-specific phylogenies reached 83.5%.
Dating the timeline for lineages and sub-lineages
To enable dating analysis, we further divided each lineage into sub-lineages using R package rhierbaps72 v.1.1.4. We inferred evolutionary timeline and estimated the age of each lineage and sub-lineage based on the isolate’s collection date. A Bayesian molecular dating provided in R package BactDating73 v.1.1.1. was employed to assess the temporal signals by examining a positive correlation between the isolate’s collection date and the root-to-tip distance. Recombination removed phylogenies were used in this analysis. A date-randomisation test, consisting of 100 permutations, was performed to assess the robustness of the temporal signal compared to noise.
Notably, the temporal signals were discernable at the sub-lineage level rather than the broader lineage level. Among the 10 sub-lineages, only one sub-lineage (lineage 1.3) exhibited a positive correlation in their clock signals. Given the limited sample size of lineage 1.3, we employed a strict clock model to prevent parameter over-fitting. We ran three independent Markov chain Monte Carlo (MCMC) chains, each spanning at least 100 million iterations, and sampled every 10,000 steps. The prior mutation rate derived from Pearson and colleagues53 was used. Visual inspection of the trace from each MCMC chain confirmed signal convergence, with effective sampling size values > 200 for key parameters. Visualisation of results was performed using the R package ggtree74 v.3.10.0 to generate credibility time-calibrated phylogeny for sub-lineage 1.3 (Supplementary Figs. 3c).
Ancestral state reconstruction analysis
Ancestral trait reconstruction was conducted to discern the dissemination patterns of B. pseudomallei among provinces in northeast Thailand, focusing on dominant lineages. Due to varying number of isolates among provinces, the analysis excluded Mahasarakham, which had a limited dataset (n = 11), resulting in the analysis of eight provinces: Buriram, Khon Kaen, Mukdahan, Nakhon Phanom, Roi Et, Sisaket, Surin, and Udon Thani. This approach yielded 28 potential province-to-province transmission combinations. To mitigate sampling biases, we sub-sampled the phylogeny of each dominant lineage to have an equal number of isolates per province (n = 15 isolates) and permuted 1,000 times. Using the stochastic character mapping function (make.simmap) from the R package phytools75 v.1.9.16, we conducted 100 simulations (nsim = 100) to reconstruct the provincial origins at each node in the sub-sampled phylogeny (1000 phylogenies per lineage). This allowed us to quantify transition events (Markov jumps) between province pairs and determine the cumulative branch length associated with each province (Markov rewards). A Mann–Whitney U test, with Bonferroni correction for multiple comparisons was applied to compare the transition event counts among provinces (Supplementary Fig. 6).
Pan-genome analysis
All the study genomes were annotated using Prokka76 v.1.14.5, and further used in the pan-genome analysis. Each genome has a median of 5845 coding sequences (CDS) predicted onto each genome with a range of 5642 to 6142 CDS per genome. Panaroo77 v.1.3.3 was employed to estimate the pan-genome with a sensitive option and a cut-off sequence identity of 92% derived from the previous study15. The number of estimated genes falls within a comparable range to previous studies from a single population29,53.
Identification of dominant lineage-specific genes
We determined lineage-specific genes by assessing their prevalence within dominant lineage or any of their sub-lineages, requiring a high occurrence (95%) within these specific groups while maintaining a low presence in non-dominant lineages. To achieve this, three thresholds were employed: strict (95% occurrence in dominants vs 5% occurrence in non-dominants), intermediate (95% in dominants vs 10% in non-dominants), and relaxed (95% in dominants vs 15% in non-dominants). Based on a visual examination of gene distribution patterns (Fig. 3a, Supplementary Fig. 7), the relaxed threshold was used to maximise the number of genes included in subsequent analysis.
Identification of gene ontology (GO: terms)
Amino acid sequences of lineage-specific genes were submitted to InterPro database46 (https://www.ebi.ac.uk/interpro/) which characterised the function of lineage-specific genes based on biological processes, molecular functions, and cellular compartments (Fig. 3b; Supplementary data 3).
Transcriptomic analysis of dominant lineage-specific genes
Lineage 1 transcriptome analysis
The analysis focused on an expression profile of a strain K96243, which serves as a representative of lineage 1. Data was sourced from microarray experiment generated by Ooi et al. 47 and accessed through the Gene Expression Omnibus (GEO) under accession number GSE43205.
To understand the difference between environmental and infection conditions, we compared the expression profile of K96243 being exposed to water and K96243 recovered from infected mice. To simulate environmental conditions, K96243 was cultured to log phase in LB medium, subsequently washed with sterile deionising water, and suspended in water. To emulate infection conditions, BALB mice were infected with 1000 CFU of B. pseudomallei, and bacteria were harvested from the lungs three days post-infection. Two replicates were performed for each condition. We retrieved microarray data from GEO using the R package GEOquery78 v.2.58.0, and differential gene expression analysis was performed using the R package limma79 v.3.58.1.
We used binary expression patterns reported in Ooi et al. 47 to compare the expression profiles of lineage-specific genes against the remaining genes in K96243 across 62 conditions. This enabled the comparison of the count of expressed genes within the lineage-specific category against the remainder of the genes for each condition using Fisher’s exact test, with multiple testing adjustments via Benjamini-Hochberg corrections.
Lineage 2 transcriptome analysis
This analysis focused on the expression profile of a strain UKMD286, representative of lineage 2. RNAseq data was obtained from an experiment conducted by Ghazali et al. 48 and accessed through the European Nucleotide Archive (ENA) (E-MTAB-11200).
To simulate environmental conditions, UKMD286 was cultured in BHIB medium overnight, resuspended, and inoculated into soil extract medium. For infection conditions, BALB mice were infected with UKMD286, and the bacteria were harvested from spleens five days post-infection. Each experiment was conducted with two replicates. FastQC v.0.11.9 and FastXtool v.0.0.14 were used to pre-processed sequenced reads. Raw reads were aligned to UKDM286 genome using Hisat280 v. 2.2.1 with differential gene expression performed using the R package DESeq281 v.1.40.2.
Lineage 3 transcriptome analysis
We used a strain UKMH10 to represent the expression profile of lineage 3. Data was originated from an RNAseq experiment conducted by Kong et al. 49 and was accessed through the European Nucleotide Archive (ENA) (PRJEB53338).
To replicate environmental conditions, UKMH10 was cultured in LB medium overnight and subcultured into soil extract medium. To simulate infection conditions, UKMH10 was cultured in LB medium overnight and inoculated into human plasma, then incubated at 37 °C to mimic the human body temperature. Bacterial cells were harvested once the absorbance reading of the bacterial cultures at 600 nm (OD600) reached 0.5. Each experiment was performed with two replicates. FastQC v.0.11.9 and FastXtool v.0.0.14 were used to pre-processed sequenced reads. Raw reads were aligned to UKMH10 genome using Hisat280 v.2.2.1 with differential gene expression performed using the R package DESeq281 v.1.40.2.
Statistics and reproducibility
We did not use statistical methods to predetermine the sample size, as the number of isolates used depended on B. pseudomallei isolates obtained from melioidosis patients admitted to nine study hospitals in northeast Thailand during the study period. All newly sequenced B. pseudomallei genomes from northeast Thailand were combined with existing sequenced data from Southeast Asia to contextualise our findings. We used independent approaches to outline population structure into groups, ensuring reproducibility. Subgroup analyses focused on three dominant lineages, excluding smaller groups from the subsequent analyses. These analyses included reconstructing the geographical and temporal spread of each dominant lineage, identifying lineage-specific genes and their recombination patterns, and comparing the transcriptome profiles of lineage-specific genes in environmental and infection conditions. Date-randomisation permutation tests were performed to ensure that the detected temporal signals were not random. Additionally, permutations were conducted to reduce bias from the varying number of isolates from each province during the reconstruction of province-to-province dissemination history. The investigators were not blinded to allocation as the outcomes were objectively measured and did not require subjective interpretation.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The newly sequenced 1265 B. pseusomallei genomes from northeast Thailand generated in this study have been deposited in the European Nucleotide Archive (ENA) under study accession number PRJEB25606 and PRJEB35787. The accession numbers for individual genomes are provided in Supplementary Data 1. We sourced existing RNA data used to compare gene expression during infection and in the environment for representative strains of lineages 1, 2, and 3 from the following repositories: NCBI Gene Expression Omnibus (GEO) under accession number GSE43205 (lineage 1) [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE43205], and the ENA under accession numbers E-MTAB-11200 (lineage 2) [https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-11200] and PRJEB53338 (lineage 3)[[https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB53338]. Source data are provided with this paper. There is no restriction on data availability. Source data are provided with this paper.
Code availability
All software utilised for data analysis is open source.
Change history
25 July 2024
In the original version of this article, the given and family names of T. Eoin West were incorrectly structured. The name was displayed correctly in all versions at the time of publication. The original article has been corrected.
References
Limmathurotsakul, D. et al. Predicted global distribution of Burkholderia pseudomallei and burden of melioidosis. Nat. Microbiol. 1, 15008 (2016).
Birnie, E. et al. Global burden of melioidosis in 2015: a systematic review and data synthesis. Lancet Infect. Dis. 19, 892–902 (2019).
Mohapatra, P. R. & Mishra, B. Burden of melioidosis in India and South Asia: challenges and ways forward. Lancet Reg. Health Southeast Asia 2, 100004 (2022).
Zheng, X., Xia, Q., Xia, L. & Li, W. Endemic melioidosis in Southern China: past and present. TropicalMed 4, 39 (2019).
Tran, Q. T. L. et al. Child melioidosis deaths caused by Burkholderia pseudomallei—contaminated borehole water, Vietnam, 2019. Emerg. Infect. Dis. 28, 1689–1693 (2022).
Win, M. M. et al. Enhanced melioidosis surveillance in patients attending four tertiary hospitals in Yangon, Myanmar. Epidemiol. Infect. 149, e154 (2021).
Thabit, A. M. et al. Etiologies of tropical acute febrile illness in West Pahang, Malaysia: a prospective observational study. Asian Pac. J. Trop. Med. 13, 115 (2020).
Chantratita, N. et al. Characteristics and one year outcomes of melioidosis patients in Northeastern Thailand: a prospective, multicenter cohort study. Lancet Reg. Health Southeast Asia 9, 100118 (2023).
Hodgetts, K. et al. Melioidosis in the remote Katherine region of northern Australia. PLoS Negl. Trop. Dis. 16, e0010486 (2022).
Gassiep, I., Ganeshalingam, V., Chatfield, M. D., Harris, P. N. A. & Norton, R. E. The epidemiology of melioidosis in Townsville, Australia. Trans. R. Soc. Trop. Med. Hyg. 116, 328–335 (2022).
Hinjoy, S. et al. Melioidosis in Thailand: present and future. TropicalMed 3, 38 (2018).
Bulterys, P. L. et al. Climatic drivers of melioidosis in Laos and Cambodia: a 16-year case series analysis. Lancet Planet Health 2, e334–e343 (2018).
Chai, L. Y. A. & Fisher, D. Earth, wind, rain, and melioidosis. Lancet Planet. Health 2, e329–e330 (2018).
Liu, X. et al. Association of melioidosis incidence with rainfall and humidity, Singapore, 2003–2012. Emerg. Infect. Dis. 21, 159–162 (2015).
Chewapreecha, C. et al. Global and regional dissemination and evolution of Burkholderia pseudomallei. Nat. Microbiol. 2, 16263 (2017).
Zheng, H. et al. Genetic diversity and transmission patterns of Burkholderia pseudomallei on Hainan island, China, revealed by a population genomics analysis. Microbial. Genom. 7, 000659 (2021).
Sarovich, D. S. et al. Phylogenomic analysis reveals an Asian origin for African Burkholderia pseudomallei and further supports melioidosis endemicity in Africa. mSphere 1, e00089–15 (2016).
Price, E. P., Currie, B. J. & Sarovich, D. S. Genomic insights into the melioidosis pathogen, Burkholderia pseudomallei. Curr. Trop. Med. Rep. 4, 95–102 (2017).
Mayo, M. et al. A cluster of melioidosis cases from an endemic region is clonal and is linked to the water supply using molecular typing of Burkholderia pseudomallei isolates. Am. J. Tropical Med. Hyg. 65, 177–179 (2001).
Draper, A. D. K. et al. Association of the melioidosis agent Burkholderia pseudomallei with water parameters in rural water supplies in Northern Australia. Appl Environ. Microbiol 76, 5305–5307 (2010).
Chen, Y.-L. et al. The concentrations of ambient Burkholderia Pseudomallei during Typhoon season in endemic area of melioidosis in Taiwan. PLoS Negl. Trop. Dis. 8, e2877 (2014).
Chen, P.-S. et al. Airborne transmission of melioidosis to humans from environmental aerosols contaminated with B. pseudomallei. PLoS Negl. Trop. Dis. 9, e0003834 (2015).
Limmathurotsakul, D. et al. Melioidosis caused by Burkholderia pseudomallei in drinking water, Thailand, 2012. Emerg. Infect. Dis. 20, 265–268 (2014).
Yip, T.-W. et al. Endemic melioidosis in residents of desert region after atypically intense rainfall in Central Australia, 2011. Emerg. Infect. Dis. 21, 1038–1040 (2015).
Baker, A. L., Ezzahir, J., Gardiner, C., Shipton, W. & Warner, J. M. Environmental attributes influencing the distribution of Burkholderia pseudomallei in Northern Australia. PLoS ONE 10, e0138953 (2015).
Pumpuang, A. et al. Survival of Burkholderia pseudomallei in distilled water for 16 years. Trans. R. Soc. Tropical Med. Hyg. 105, 598–600 (2011).
Chewapreecha, C. et al. Co-evolutionary signals identify Burkholderia pseudomallei survival strategies in a hostile environment. Mol. Biol. Evol. 39, msab306 (2022).
Hantrakun, V. et al. Soil nutrient depletion is associated with the presence of Burkholderia pseudomallei. Appl. Environ. Microbiol. 82, 7086–7092 (2016).
Chewapreecha, C. et al. Genetic variation associated with infection and the environment in the accidental pathogen Burkholderia pseudomallei. Commun. Biol. 2, 428 (2019).
Holden, M. T. G. et al. Genomic plasticity of the causative agent of melioidosis, Burkholderia pseudomallei. Proc. Natl Acad. Sci. 101, 14240–14245 (2004).
Nandi, T. et al. Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles. Genome Res. 25, 129–141 (2015).
Liechti, N. et al. Whole-genome assemblies of 16 Burkholderia pseudomallei isolates from rivers in Laos. Microbiol. Resour. Announc. 10, e01226–20 (2021).
Rachlin, A. et al. Whole-genome sequencing of Burkholderia pseudomallei from an urban melioidosis hot spot reveals a fine-scale population structure and localised spatial clustering in the environment. Sci. Rep. 10, 5443 (2020).
Webb, J. R. et al. Myanmar Burkholderia pseudomallei strains are genetically diverse and originate from Asia with phylogenetic evidence of reintroductions from neighbouring countries. Sci. Rep. 10, 16260 (2020).
Lees, J. A. et al. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Res. 29, 304–316 (2019).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Lichtenegger, S. et al. Development and validation of a Burkholderia pseudomallei core genome multilocus sequence typing scheme to facilitate molecular surveillance. J. Clin. Microbiol. 59, e00093–21 (2021).
Godoy, D. et al. Multilocus sequence typing and evolutionary relationships among the causative agents of melioidosis and glanders, Burkholderia pseudomallei and Burkholderia mallei. J. Clin. Microbiol. 41, 2068–2079 (2003).
Lofjle, E. & Kubiniok, J. Landform development and bioturbation of the Khorat Plateau, Northeast Thailand. Nat. Hist. Bull. Siam. Soc. 44, 199–216 (1996).
Gupta, A. Geology and landforms of the Mekong basin. in The Mekong 29–51 (Elsevier, 2009).
Gadgil, S. & Rupa Kumar, K. The Asian monsoon—agriculture and economy. in The Asian Monsoon 651–683 (Springer Berlin Heidelberg, 2006).
Arpornrat, T., Ratjiranukool, S., Ratjiranukool, P. & Sasaki, H. Evaluation of southwest monsoon change over Thailand by high-resolution regional climate model under high RCP emission scenario. J. Phys.: Conf. Ser. 1144, 012112 (2018).
Hien, P. D., Bac, V. T. & Thinh, N. T. H. PMF receptor modelling of fine and coarse PM10 in air masses governing monsoon conditions in Hanoi, northern Vietnam. Atmos. Environ. 38, 189–201 (2004).
Hien, T. T. et al. Current status of fine particulate matter (PM2.5) in Vietnam’s most populous city, Ho Chi Minh City. Aerosol Air Qual. Res. 19, 2239–2251 (2019).
Floch, P. & Molle, F. Marshalling water resources: a chronology of irrigation development in the Chi-Mun River basin Northeast Thailand. (CGIAR Challenge Programme on water and food, 2007).
The Gene Ontology Consortium The gene ontology resource: 20 years and still going strong. Nucleic Acids Res. 47, D330–D338 (2019).
Ooi, W. F. et al. The condition-dependent transcriptional landscape of Burkholderia pseudomallei. PLoS Genet. 9, e1003795 (2013).
Ghazali, A.-K. et al. Transitioning from Soil to Host: comparative transcriptome analysis reveals the Burkholderia pseudomallei response to different niches. Microbiol. Spectr. 11, e03835–22 (2023).
Kong, C. et al. Transcriptional landscape of Burkholderia pseudomallei cultured under environmental and clinical conditions. Microbial. Genom. 9, mgen000982 (2023).
Manivanh, L. et al. Burkholderia pseudomallei in a lowland rice paddy: seasonal changes and influence of soil depth and physico-chemical properties. Sci. Rep. 7, 3031 (2017).
Tuanyok, A. et al. Genomic islands from five strains of Burkholderia pseudomallei. BMC Genom. 9, 566 (2008).
Bertelli, C. et al. Enabling genomic island prediction and comparison in multiple genomes to investigate bacterial evolution and outbreaks. Microbial. Genom. 8, mgen000818 (2022).
Spring-Pearson, S. M. et al. Pangenome analysis of Burkholderia pseudomallei: genome evolution preserves gene order despite high recombination rates. PLoS ONE 10, e0140274 (2015).
Limmathurotsakul, D. et al. Activities of daily living associated with acquisition of melioidosis in northeast Thailand: a matched case-control study. PLoS Negl. Trop. Dis. 7, e2072 (2013).
Aziz, A., Currie, B. J., Mayo, M., Sarovich, D. S. & Price, E. P. Comparative genomics confirms a rare melioidosis human-to-human transmission event and reveals incorrect phylogenomic reconstruction due to polyclonality. Microb. Genom. 6, e000326 (2020).
Heacock-Kang, Y. et al. The Burkholderia pseudomallei intracellular ‘TRANSITome. Nat. Commun. 12, 1907 (2021).
Pierret, A. et al. Reshaping upland farming policies to support nature and livelihoods: Lessons from soil erosion in Southeast Asia with emphasis on Lao PDR (Institut de Recherche pour le Developpement & International Water Management Institute, 2011).
Keyes, C. F. Isan: Regionalism in Northeastern Thailand. (Southeast Asia Program, Department of Asian Studies, Cornell University, 1967).
Charusiri, P., Daorerk, V., Archibald, D., Hisada, K. & Ampaiwan, T. Geotectonic evolution of Thailand: a new synthesis. J. Geol. Soc. Thailand 1, 1–20 (2002).
Borrelli, P. et al. An assessment of the global impact of 21st century land use change on soil erosion. Nat. Commun. 8, 2013 (2017).
Singh, B. K. et al. Climate change impacts on plant pathogens, food security and paths forward. Nat. Rev. Microbiol. 21, 640–656 (2023).
Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
Page, A. J. et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microbial. Genom. 2, e000056 (2016).
Jombart, T., Kendall, M., Almagro‐Garcia, J. & Colijn, C. Treespace: statistical exploration of landscapes of phylogenetic trees. Mol. Ecol. Resour. 17, 1385–1392 (2017).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595 (2017).
Assefa, S., Keane, T. M., Otto, T. D., Newbold, C. & Berriman, M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25, 1968–1969 (2009).
Croucher, N. J. et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 43, e15–e15 (2015).
Tonkin-Hill, G., Lees, J. A., Bentley, S. D., Frost, S. D. W. & Corander, J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res. 47, 5539–5549 (2019).
Didelot, X., Croucher, N. J., Bentley, S. D., Harris, S. R. & Wilson, D. J. Bayesian inference of ancestral dates on bacterial phylogenetic trees. Nucleic Acids Res. 46, e134–e134 (2018).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Revell, L. J. Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180 (2020).
Davis, S. & Meltzer, P. S. GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847 (2007).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47–e47 (2015).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Acknowledgements
We would like to thank physician and microbiology staff at the Udon Thani Hospital, Khon Kaen Hospital, Srinakarin Hospital, Nakhon Phanom Hospital, Mukdahan Hospital, Roi Et Hospital, Surin Hospital, Sisaket Hospital and Buriram Hospital for reporting the case and providing the facilities for bacterial collection. We sincerely appreciate the transcriptomic data of B. pseudomallei UKMD286 and UKMH10 provided by Professor Sheila Nathan and colleagues. We are grateful to Dr. Soe Htet Aung for supporting geographical map plot. This work was funded by Mahidol University (MU-KMUTT Biomedical engineering and Biomaterials Consortium) and Royal Golden Jubilee Ph.D. Programme (RGJ-ASEAN) (http://rgj.trf.or.th) through RS and NC. CCho was funded by Wellcome International Master Fellowship (221418/Z/20/Z) and Prince Mahidol PhD studentship. CChe was funded by Wellcome International Intermediate Fellowship (216457/Z/19/Z), Sanger International Fellowship, and Thailand Health System Research Institute (67-138). NC and TEW were supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (NIAID/NIH) (https://www.nih.gov) under Award Number U01AI115520. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This research was funded in part, by the Wellcome Trust [220211 and 206194]. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
NC and CChe conceived and designed the study, as well as administered and supervised the project. NC secured funding to collect isolates, while CChe obtained funding for sequencing and downstream analysis. NC, RS, TEW, and NS collected and identified bacterial isolates, with NC, RS, ST, and RP gathering and curating clinical data. CChe, JP, NRT, performed short-read sequencing while NC, RS, JT, EB and WC performed long-read sequencing. NPJD and NC provided reagents. NRT and JP contributed software tools. RS, CCho, and CChe performed bioinformatics analyses. CChe, NC, NRT and JP interpret the analyses. The original draft was written by RS, NC, and CChe. CChe reviewed, edited, and wrote revision drafts. All authors read and approved the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Jessica Webb, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Seng, R., Chomkatekaew, C., Tandhavanant, S. et al. Genetic diversity, determinants, and dissemination of Burkholderia pseudomallei lineages implicated in melioidosis in Northeast Thailand. Nat Commun 15, 5699 (2024). https://doi.org/10.1038/s41467-024-50067-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-024-50067-9
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.