Dissemination of Mycobacterium abscessus via global transmission networks

Mycobacterium abscessus, a multidrug-resistant nontuberculous mycobacterium, has emerged as a major pathogen affecting people with cystic fibrosis (CF). Although originally thought to be acquired independently from the environment, most individuals are infected with one of several dominant circulating clones (DCCs), indicating the presence of global transmission networks of M. abscessus. How and when these clones emerged and spread globally is unclear. Here, we use evolutionary analyses of isolates from individuals both with and without CF to reconstruct the population history, spatiotemporal spread and recent transmission networks of the DCCs. We demonstrate synchronous expansion of six unrelated DCCs in the 1960s, a period associated with major changes in CF care and survival. Each of these clones has spread globally as a result of rare intercontinental transmission events. We show that the DCCs, but not environmentally acquired isolates, exhibit a specific smoking-associated mutational signature and that current transmission networks include individuals both with and without CF. We therefore propose that the DCCs initially emerged in non-CF populations but were then amplified and spread through the CF community. While individuals with CF are probably the most permissive host, non-CF individuals continue to play a key role in transmission networks and may facilitate long-distance transmission.

Mycobacterium abscessus is a multidrug-resistant species of nontuberculous mycobacteria (NTM) that has recently emerged as a major threat to individuals with CF, with increasing rates of infection seen in CF cohorts around the world 1 . This rapidly growing NTM is divided into three subspecies: M. abscessus subspecies abscessus (M. a. abscessus), M. a. massiliense and M. a. bolletii 2 . Infections with M. abscessus lead to accelerated inflammatory lung damage 3 , are often difficult or impossible to treat despite prolonged courses of combination antibiotics 4-6 and may prevent safe lung transplantation 1,7 .
Although M. abscessus was originally thought to only be independently acquired from the environment, we 8 and others 9 have shown that individuals with CF can become infected through hospital-based person-to-person transmission (probably through the generation of long-lived infectious aerosols or via fomite spread 2 ). Indeed, large scale whole genome sequencing of M. abscessus isolates from CF centres around the world has revealed that most individuals with CF are infected with one of three highly prevalent globally dispersed clones, referred to as DCCs 1-3 (ref. 2 ). Assigning the DCC classification onto whole genome sequences from other studies indicates that the DCCs can also infect non-CF individuals 10,11 . DCC isolates are associated with worse clinical outcomes, have greater antibiotic resistance and are more virulent in in vitro and in vivo infection models 2 , suggesting that multiple rounds of within-host evolution have promoted increased pathogenic potential.
The high levels of genetic relatedness within the DCCs suggest that they have emerged recently and have rapidly spread within and between countries, as well as across continents 2,10,12,13 . There are many examples of isolates from individuals in different CF centres or in different countries whose sequences differ by fewer mutations than have been seen in a single individual during chronic infection 2,14 , suggesting individuals are linked by recent and widespread transmission networks. However, despite multiple studies combining whole genome sequencing with epidemiological data 2,8,9,13-18 , the relative importance of different routes of acquisition remains unclear. In addition, the mechanism of transcontinental spread of DCCs 6 (given the lack of movement of CF individuals) is unknown. Several studies have found near-identical isolates in CF individuals with no obvious epidemiological links 15-18 , suggesting that transmission chains include additional unknown links, potentially implicating environmental or human intermediates.
To understand how the M. abscessus DCCs emerged and the routes through which they have spread globally, we have applied evolutionary phylogenetic analyses to whole genome sequences of clinical isolates from 1,178 individuals on five continents. We show that the DCCs expanded synchronously around the time of increases in CF life expectancy, spread globally (particularly from Europe to North America) and contain a mutagenic signature of exposure to smoking-related mutagens during their expansion, implicating smokers and more generally non-CF individuals, as an important second human niche for M. abscessus.

Results
Synchronous clonal expansions coincident with the emergence of the CF lung niche. We first reconstructed the phylogenetic relationships between 2,045 M. abscessus whole genome sequences from 1,178 individuals to identify recently emerged clones whose ancestry could be dated. We identified seven clonal clusters, including the three previously identified DCCs 2 , that contained highly related isolates from at least 20 individuals (Fig. 1a and Extended Data Fig. 1) and were found on multiple continents (Fig. 1b and Supplementary Table 1). We classified these clones as DCC1-7.
Bayesian temporal reconstruction 19 indicated that each of the DCCs emerged relatively recently, with their most recent common ancestors dating between 1902 and 1999 (Fig. 1c). The dated phylogenetic tree of each DCC is characterized by a section with multiple concurrent short branches, indicating a historical period of rapid expansion from a small number of lineages into multiple independent transmission chains that have persisted to the present day (Fig. 1c). In support of this conclusion, we found strong statistical evidence (posterior probability of one for each DCC, Methods) that each DCC had undergone at least one historical population expansion, the first of which occurred for six of the seven DCCs in the 1960s and for DCC4 much later (Fig. 1d).
We proposed that the expansion in DCCs could have been driven by changes within the CF population since M. abscessus has become a main cause of lung infection in individuals with CF 1 , can transmit onwards from CF individuals 2,8 and is more prevalent in this cohort than in other patient groups 20 . We found that the DCC expansion occurred shortly after life expectancy of CF individuals began to increase from infancy to over 10 years of age in the 1950s and 1960s (Extended Data Fig. 2) 21,22 . This period also saw the establishment of CF treatment centres and the widespread use of antipseudomonal and antistaphylococcal antibiotics 21 (to which M. abscessus is intrinsically resistant) (Extended Data Fig. 2), suggesting that the synchronous expansion of DCCs in the 1960s may have been driven both by increases in the number and density of susceptible individuals (promoting person-to-person transmission 2,8 ) and also the creation of a more permissive lung niche for M. abscessus, through suppression of other lung pathogens 1,23 . The more recent expansion of DCC4 in the mid-2000s (Fig. 1d) suggests that suitable conditions for the emergence of new clones have persisted within present day patient cohorts.
Rare intercontinental transmission events drive global transmission networks. We next investigated the global transmission network of M. abscessus by reconstructing the broad-scale geographical distribution of the three most prevalent clones, DCCs 1-3. In each case, isolates from different continents are interspersed within the phylogenetic tree (Fig. 2a), demonstrating historical intercontinental transmission events that are relatively uncommon, with approximately one out of every 50 lineages moving between continents each year (Fig. 2b). Furthermore, we found a significant correlation between the age of a clade and the number of continents it has been detected in, with most newly emerged clades only present in one continent (Extended Data Fig. 3). However, despite these low overall rates of intercontinental transmission, we found strong evidence for migration of M. abscessus between specific pairs of continents and, in particular, of migration of all three main DCCs from Europe to North America (Fig. 2c), potentially driven by the high density of CF individuals in Europe 24,25 .
Mutational spectrum analysis implicates smokers in DCC transmission networks. While our results indicate a major role for individuals with CF in the expansion of DCCs, it is clear from our temporal reconstruction that several of the DCCs emerged within the human population before CF survival increased beyond early childhood (Fig. 1c), suggesting an alternative pre-existing niche. Similarly, the transcontinental spread of DCCs is difficult to explain without an alternative human vector for transmission, given the historically limited travel of CF individuals.
We therefore proposed that individuals who smoke, given their recognized predisposition to mycobacterial infection 26-29 , might have provided the early niche for DCC emergence and facilitated their global dissemination. To test this hypothesis, we performed a mutational spectrum analysis of the M. abscessus genomes. The mutational spectrum is the pattern of different types of DNA mutation arising as a consequence of the action of various mutagenic processes and DNA repair mechanisms over time 30,31 . Since bacteria are likely to be exposed to different mutagens during pulmonary infection than when in the environment, we reasoned that we could use mutational spectrum analysis to define the historical exposure of DCCs to these different niches.
We first calculated the spectrum of mutations in M. abscessus attributable to the environment, by analysing mutations occurring along the internal phylogenetic branches of the non-DCC clades, which constitute long periods of predominantly or exclusively environmental replication (Fig. 3a,b). We were then able to define the specific mutational profile arising only during chronic infection of CF individuals (by examining within-patient longitudinal samples), and the profile generated during the expansion and dissemination of DCCs (by looking at the internal branches of the DCC phylogenies), which were both notably different from the spectrum attributable to the environment and from each other (Fig. 3c).
Decomposition analysis 32 (Fig. 3d) suggests that the environmental mutational spectrum consists of contributions from spontaneous mutation events (arising from cytosine deamination and inefficient homologous recombination repair) and mutations caused by alkylating agents, some of which have previously been reported to be present in soil 33,34 . In contrast, the within-CF patient spectrum has contributions from mutations associated with exposure to reactive oxygen species (ROS) 30 , thought to be generated at high levels within the CF lung 35 .
We found that the DCC-specific spectrum, in addition to having some contribution from ROS-related mutagenesis, was dominated by a general (context-independent) elevation in C-to-A/G-to-T mutations, a pattern seen in lung cancers from smokers 36 and in human cells exposed to tobacco smoke in vitro 30 (Fig. 3d). We can therefore infer that the DCC expansion and transmission networks involve smokers as well as individuals with CF.

Individuals with CF and without CF (non-CF) are involved in DCC transmission networks.
Given the mutational evidence of involvement of smokers in the initial emergence and continued transmission of the DCCs, we sought to quantify whether current transmission networks included both those with and without CF, or whether they form discrete transmission chains. We found that isolates from the lungs of individuals with and without CF could both be found in all seven DCCs (Fig. 4a, Extended Data Fig. 4a and Supplementary Table 2) and in non-DCCs (Extended Data Fig. 4b), showing that all of the emergent clones are capable of infecting members of either group. We then linked individuals on the basis of the genetic distance of their isolates in terms of single nucleotide polymorphisms (SNPs), using a range of linkage cut-offs that capture possible person-to-person transmission 2 . This demonstrated many close acquisition linkages between individuals within both DCC and non-DCC clades (Fig. 4b and Extended Data Fig. 5), showing that M. abscessus transmission is not limited to just the DCCs and raising the possibility of the emergence or discovery of more circulating clones in the future. We showed that local linkages dominate at smaller genetic distances, with national and then international linkages becoming more frequent at higher SNP distances (Fig. 4c and Extended Data Fig. 5); the expected signature of geographical spread of transmissible clones. We were also able to show that, at all genetic distances, linkages could be identified between individuals with CF, between individuals with and without CF and directly between individuals without CF (Fig. 4c and Extended Data Fig. 5) (although the relative proportions of these will be confounded by the different levels of sampling in the two groups). To confirm this apparent linkage between individuals with and without CF, we reconstructed transmission networks on the basis of SNP cut-offs of 10 and 38 SNPs (representing likely and possible transmission based on within-host diversity 2 , Fig. 4d).
At both cut-off levels, transmission networks exist that include both individuals with and without CF, with each group exhibiting strong connectivity within the network (Extended Data Fig. 6). We further investigated the largest network (which consists of isolates from DCC1 from both individuals with and without CF, Fig. 4e) and found that this network had a global reach, with multiple international and national connections evident.

Discussion
Halting the growing rate of infections with M. abscessus will require a comprehensive understanding of how individuals acquire these infections and how the bacteria spread at local and global scales. We have applied evolutionary analyses of whole genome sequences to reconstruct the population history, spatiotemporal spread and mutational history of M. abscessus. Our analysis demonstrates a key role for person-to-person transmission in M. abscessus epidemiology. The synchronized expansion of six of the seven DCCs (apparent despite their different sample numbers, sample density and tree depths) was tightly associated with a period of multiple changes in the CF population (Fig. 1d and Extended Data Fig. 2), indicating that the expansion was probably driven by the establishment of person-to-person transmission chains on the emergence of the CF lung niche. It is possible that these expansions may have additionally been promoted by increasing tobacco smoking rates in developed countries seen from the early 1900s to a peak around 1980 (Extended Data Fig. 2).
The rapid diversification in the DCC trees in the 1960s (Fig. 1c) can only be explained by multiple independent transmission chains that were established at this time, have persisted to the present day, and which involve both CF and non-CF individuals (Fig. 4). The age of these clones, the large number of individuals linked at very small SNP distances (Fig. 4), the high frequency of long range national and international linkages, and the correlation between genetic and geographical distance (Fig. 4c) are all patterns expected of a human lung pathogen and effectively rule out the hypothesis that a broadly spread genetically homogeneous environmental clone can explain the infection of CF individuals with related isolates 18 . While the main niche for the DCCs appears to be within human lungs, transmission is likely to be indirect and involve environmental and fomite intermediates 2,8 , which may lead to extrapulmonary DCC infections.
The smoking-related mutational signature that we identified within the DCCs (Fig. 3c,d) occurred along the internal phylogenetic branches within these clones, indicating that smokers have contributed to the expansion and continued transmission chains of M. abscessus DCCs. Smokers exhibit increased susceptibility to many infections, including NTMs 26,27,29,37,38 and Mycobacterium tuberculosis 28 , due to structural lung changes and impaired immune responses 39-41 . Correspondingly, previous studies have frequently identified M. abscessus in smokers 6,29,38 . It therefore seems likely that tobacco smoke contributes to M. abscessus mutational burden, although overall substitution rates may be similar across different patient groups due to the action of other mutagens, such as high ROS levels in CF individuals 35 .
Our data therefore support a model where individuals with CF are the most permissive host for M. abscessus infection (as rapid expansion only occurred when this niche became available) but where smokers and other individuals without CF can act as an inefficient reservoir for infection and long-distance transmission. This supports recent evidence that emerging M. abscessus lineages are evolving to become human adapted 42 . Strong evidence of M. abscessus transmission from individuals with CF 2,8 and higher rates of infection in this group compared with other susceptible groups 20 support the importance of individuals with CF as key hosts for M. abscessus. Involvement of individuals without CF in transmission chains (Fig. 4) efficiently explains two previously puzzling observations: the missing epidemiological links between individuals with CF infected with highly genetically similar isolates identified in previous studies 15-17 , and the evidence for international transmission when individuals with CF have historically had limited intercontinental travel. Furthermore, while infections do occur within hospitals 8,9 , our data indicate that many transmission events occur in other settings. Future studies incorporating a greater number of isolates from individuals without CF will be needed to determine the relative transmission rates between the different groups.
The population expansion that we describe in the DCCs only represents the initial expansion of these clones and epidemiological data indicate that the DCCs have continued to increase in prevalence to the present day 1 . While additional mechanisms have probably contributed, our data indicate that increasing person-to-person transmission is the key driver of the increased prevalence of M. abscessus. Our previous work 2 demonstrated phenotypic differences between clustered and unclustered M. abscessus isolates and comparison with the analysis here showed that that work included isolates from DCCs 1, 2, 3 and also from the newly identified 4 and 6 (but not DCCs 5 or 7). This shows that phenotypes associated with the clustered isolates are not limited to DCCs 1-3. However, further phenotyping comparing DCCs will be useful in future work. The seven DCCs are currently the only clones within the species whose expansion has been detected in our dataset. The more recent expansion of DCC4 (Fig. 1d) potentially occurred when this clone entered the population or had the opportunity to establish onward transmission. This coupled with the evidence for linkage between individuals carrying non-DCC strains (Fig. 4b), suggests that, as a species, M. abscessus is capable of human infection and transmission and that additional clones will continue to expand in the future when they have the opportunity to do so. Phylogeographic reconstruction (Fig. 2a) did not identify the regions within which early circulation occurred and it is therefore difficult to forecast where additional clones may expand in the future. However, our data indicate that it is reasonable to expect the early stages of expansion to occur within CF populations. Continued sequencing of clinical isolates and comparison with global sequences will be important to detect any additional clones that emerge in the future.
[Fig. 3 caption, partial] Results are shown as the difference from the environmental spectrum. Asterisks indicate significantly different proportions relative to the environmental mutational spectrum, calculated through permutation analysis. d, Deconvolution analysis reveals relative contributions of specific mutational processes to each spectrum: unassigned (grey), spontaneous (yellow), alkylating agent-associated (purple), ROS-associated (red) and smoking-related (blue).
Further studies to understand how individuals without CF might transmit M. abscessus (both to individuals with and without CF) are vital and may provide information on how to reduce transmission risk. However, from previous studies 2, 8,9 it is reasonable to expect transmission to be indirect through the environment, and the role of individuals without CF in contaminating such environments requires particular attention. Comprehensive surveillance of potential environmental intermediates and isolates from individuals with and without CF may better define these transmission pathways.
In conclusion, reconstruction of the evolutionary history of the M. abscessus DCCs supports early emergence into non-CF populations, including smokers, followed by expansion due to person-to-person transmission that began with the emergence of the CF lung niche and has continued with contributions from individuals with and without CF. It also suggests that long-distance transmission may be due to low-level infection of individuals without CF. Our study demonstrates the use of combining whole genome sequencing with evolutionary analyses to disentangle mechanisms of pathogen transmission.

Methods
Sequencing reads from each sample were mapped against the corresponding subspecies reference sequence using the multiple_mappings_to_bam pipeline (https://github.com/sanger-pathogens/bact-gen-scripts) with BWA-MEM as the aligner. ATCC19977 (accession no. CU458896.1) was used as the reference for M. a. abscessus and CIP_108297 (accession no. GCF_001792625.1) for M. a. massiliense. SNPs were called from this alignment using the multiple_mappings_to_bam pipeline. Subspecies phylogenetic trees were reconstructed using RAxML v.8.2.12 (ref. 48 ) with the general time reversible (GTR) model of nucleotide substitution and gamma rate heterogeneity with four gamma classes. To enable extraction of maximal genetic variation, clusters of samples were identified in the subspecies trees using FastBAPS 49 . This clustering identified 19 clusters in M. a. abscessus and 17 clusters in M. a. massiliense (Extended Data Fig. 1). A summary of each FastBAPS cluster is provided in Supplementary Table 1.
All subsequent analyses were carried out on each FastBAPS cluster independently. De novo assembly was carried out on each sample as previously described 50 and the best assembly was identified for each FastBAPS cluster on the basis of number of contigs and N50-N90 values (Supplementary Table 1). Samples from each FastBAPS cluster were mapped against their respective best assembly as above to maximize the captured SNP diversity. Recombination was removed from FastBAPS cluster alignments using Gubbins v.2.4.1 (ref. 51 ) and phylogenetic trees were reconstructed for each FastBAPS cluster as above. Phylogenetic trees were viewed and figures constructed using FigTree 52 and GGTREE 53 .
DCCs were identified as clusters of highly related sequences collected from at least 20 patients on multiple continents. DCCs 1, 2, 3, 5 and 7 are FastBAPS clusters while DCCs 4 and 6 are subclusters within a FastBAPS cluster (Supplementary Table 1). No DCCs were identified in M. a. bolletii.
Phylogenetic analyses, temporal, spatiotemporal and population reconstruction. Temporal phylogenetic reconstruction was carried out on DCCs 1-7. DCC-specific datasets were constructed containing the earliest sequenced sample from each patient that clusters within the DCC. These samples were mapped against the respective DCC reference sequence (Supplementary Table 1) as above and a maximum likelihood phylogenetic tree reconstructed with RAxML 48 as above. Methods to infer substitution rates and ancestral dates are only valid if there is a temporal signal within the dataset 54 . We initially assessed temporal signal within each dataset using root-to-tip randomization. In each case, the maximum likelihood tree was rooted to minimize the heuristic residual mean square score using TempEst 54 . We examined the root-to-tip correlation visually (Extended Data Fig. 7a) and through comparison of the R² correlation between sample collection date and root-to-tip distance with 1,000 randomizations of the tip dates to identify significance of the correlation. A significant positive correlation was observed for DCCs 1-4 (P < 0.001). We therefore initially reconstructed the temporal history of these DCCs using BEAST v.2.4.2 (ref. 19 ). We used the Hasegawa-Kishino-Yano (HKY) model of nucleotide substitution. We used the relaxed log-normal clock model with a log-normal prior on the substitution rate with mean set to the estimated slope in TempEst and standard deviation 0.5. We modelled the population history using the coalescent Bayesian skyline population prior. At least three independent runs were carried out for 100,000,000 steps for each dataset. Convergence was assessed using Tracer v.1.7 (ref. 55 ).
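The root-to-tip randomization described above can be illustrated with a minimal sketch. It assumes that collection dates and root-to-tip distances have already been exported (for example from TempEst) as two parallel lists. The function names, the fixed random seed and the (hits + 1)/(n + 1) form of the P value are illustrative assumptions, not details taken from the study.

```python
import random


def fit(dates, distances):
    """Return (slope, R^2) of a least-squares line of distance on date."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def tip_date_permutation_test(dates, distances, n_perm=1000, seed=0):
    """Compare the observed R^2 with R^2 values from shuffled tip dates.

    Returns (slope, r2, p). The slope is returned so that the caller can
    check that the correlation is positive, since R^2 alone has no sign.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    slope, observed_r2 = fit(dates, distances)
    shuffled = list(dates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit(shuffled, distances)[1] >= observed_r2:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return slope, observed_r2, (hits + 1) / (n_perm + 1)
```

In this sketch the slope of the fitted line is also the quantity that would be used as the mean of the informed substitution rate prior.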
As a more thorough test of temporal signal within each DCC, we carried out the date randomization test 56 . BEAST v.2.4.2 was run on each DCC dataset using a uniform substitution rate prior between 1 × 10⁻⁹ and 1 × 10⁻⁵, with these bounds chosen to encompass the likely substitution rates for Mycobacteria based on previous work 2, 56 . Other priors were as described above. The results from these uniform prior runs were highly similar to those with the informed substitution rate prior in each case (Extended Data Fig. 8). Ten date randomizations were performed where the sequence collection dates were randomly assigned to tips. BEAST was run on each of these randomized datasets independently using the same uniform substitution rate prior (1 × 10⁻⁹ to 1 × 10⁻⁵). All four DCCs passed the date randomization test, defined here as the median posterior substitution rate and most recent common ancestor dates with the real sample collection dates not overlapping with that of any of the ten date randomizations (Extended Data Fig. 8). We did not attempt these analyses with DCCs 5-7 as they did not pass the correlation test.
The inferred substitution rates of DCCs 1-3 were highly similar (Extended Data Fig. 7b). The substitution rate of DCC4 is higher (Extended Data Fig. 7b), probably due to this clade having a far more recent common ancestor date than DCCs 1-3 (Fig. 1c) and therefore less opportunity to remove deleterious substitutions. As DCCs 5-7 contain similar levels of diversity to DCCs 1-3 (Extended Data Fig. 7c), we reconstructed their temporal history as above but using a uniform substitution rate prior with boundaries of 8.76 × 10⁻⁸ to 2.41 × 10⁻⁷, chosen as the upper and lower 95% highest probability density (HPD) substitution rate estimates for DCCs 1-3 (Extended Data Fig. 7b).
We determined whether each DCC has undergone a historical population expansion by using the Bayesian skyline plot estimates (Extended Data Fig. 9) of relative genetic diversity in the posterior distribution. We used all samples in the posterior distribution and found that all samples in all DCCs exhibited an increase in relative genetic diversity of more than tenfold relative to the value at the root of the tree, thereby strongly supporting a historical population expansion in each case. We identified the date of the expansion in each DCC by calculating the earliest date at which the relative genetic diversity increased by more than tenfold relative to the root of the tree and combined these values into a single distribution, from which the median and 95% HPD was calculated in each case.
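The tenfold criterion can be sketched as follows. The sketch assumes that each posterior sample has been reduced to a pair of lists, dates (oldest first, starting at the root) and the relative genetic diversity at each date. The input layout, the function names and the use of the shortest interval as the HPD are assumptions made for illustration.

```python
import math
from statistics import median


def expansion_date(times, diversity, fold=10.0):
    """Earliest date at which diversity exceeds fold x the value at the root.

    times must be ascending and diversity[0] is taken as the root value.
    Returns None if the threshold is never exceeded.
    """
    threshold = fold * diversity[0]
    for t, d in zip(times, diversity):
        if d > threshold:
            return t
    return None


def hpd_interval(values, mass=0.95):
    """Shortest interval that contains at least `mass` of the values."""
    xs = sorted(values)
    k = max(1, math.ceil(mass * len(xs)))
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarise_expansion(posterior_samples, fold=10.0):
    """Median and HPD of expansion dates across posterior samples."""
    dates = [expansion_date(t, d, fold) for t, d in posterior_samples]
    dates = [x for x in dates if x is not None]
    return median(dates), hpd_interval(dates)
```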
Before carrying out spatiotemporal reconstruction, we calculated the association index 57 of the distribution of collection continents across the maximum likelihood tree of each DCC. This was significant in each case on the basis of 1,000 location randomizations (P < 0.001 in each DCC), indicating a correlation between phylogeny and continent of collection. We carried out asymmetric phylogeographic reconstructions of DCCs 1-3 using the BEAST_CLASSIC package v.1.3.2 in BEAST v.2.4.2 (ref. 19 ). Each sequence was labelled with the continent of collection. We used an informed log-normal substitution rate prior and Bayesian skyline population prior as above. We used an exponential prior on the overall rate of lineage movements with mean 1.0. The relative rates of migration between different continent pairs were modelled using a gamma distribution with alpha and beta both set to 1.0. As the number of sequences collected from each continent is unequal for each DCC, we assessed the robustness of our inferences by randomly subsampling the sequences from overrepresented continents and rerunning the spatiotemporal reconstruction. We carried out the subsampling five times and found that the results were highly similar in all subsamples (for example, Fig. 2b). Supported migration routes were identified using SPREAD v.0.9.6 (ref. 58 ) as directed continent pairs that had Bayes factor support greater than three in the dataset without subsampling and in at least four of the five subsampled datasets.
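The rule for calling a supported migration route can be written as a short filter. The sketch assumes that the Bayes factors have already been extracted into dictionaries keyed by (source, destination) continent pairs; the data layout and the function name are hypothetical, and the sketch does not reproduce any SPREAD output format.

```python
def supported_routes(full_bf, subsample_bfs, threshold=3.0, min_subsamples=4):
    """Directed routes supported in the full and the subsampled datasets.

    full_bf: {(source, destination): Bayes factor} without subsampling.
    subsample_bfs: list of dictionaries with the same layout, one per
    subsampled dataset. A route missing from a subsample counts as
    unsupported there (an assumption).
    """
    routes = []
    for route, bf in full_bf.items():
        if bf <= threshold:
            continue
        n_supported = sum(
            1 for sub in subsample_bfs if sub.get(route, 0.0) > threshold
        )
        if n_supported >= min_subsamples:
            routes.append(route)
    return sorted(routes)
```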
Mutational spectrum analysis. The mutational spectrum consists of all of the mutations that have occurred within the history of a sample set in their surrounding nucleotide context 31 . It is necessary to identify the direction of each mutation, that is, the parental nucleotide and the descendent nucleotide. To identify the mutations that have occurred and their direction, we carried out ancestral reconstruction on each FastBAPS cluster phylogenetic tree. Recombination was removed and phylogenetic trees reconstructed as above.
Ancestral reconstruction was carried out on all variable alignment positions using the phylogenetic analysis by maximum likelihood (PAML) package v.4.9 (ref. 59 ). We compared the fit of HKY, HKY + GAMMA, GTR and GTR + GAMMA models of nucleotide substitution. Results were highly similar with all models and in all cases either the GTR or GTR + GAMMA model was supported. The mutations that occurred along each branch in the phylogeny were extracted from the PAML output. The surrounding nucleotide context of each mutation was identified from the reference sequence that was mapped against. The number of polymorphic sites contributing to each mutational spectrum is shown in Extended Data Fig. 10.
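As an illustration of how contextual mutations can be tallied, the sketch below takes a reference sequence and a list of (position, ancestral base, derived base) tuples, which are assumed to have been parsed from the ancestral reconstruction output beforehand. Folding purine mutations onto the pyrimidine strand follows the usual convention for single base substitution spectra; the input format and the function names are assumptions.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def contextual_mutation(reference, pos, ancestral, derived):
    """Label a mutation with its flanking bases, for example 'A[C>A]G'.

    pos is a 0-based index into the reference. The flanking bases are read
    from the reference and the mutated base from the ancestral state.
    """
    context = reference[pos - 1] + ancestral + reference[pos + 1]
    if ancestral in "AG":
        # Fold onto the pyrimidine strand so that G>T is counted as C>A.
        context = reverse_complement(context)
        ancestral = reverse_complement(ancestral)
        derived = reverse_complement(derived)
    return f"{context[0]}[{ancestral}>{derived}]{context[2]}"


def mutational_spectrum(reference, mutations):
    """Count contextual mutations, skipping sites without both neighbours."""
    counts = Counter()
    for pos, ancestral, derived in mutations:
        if 0 < pos < len(reference) - 1:
            counts[contextual_mutation(reference, pos, ancestral, derived)] += 1
    return counts
```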
To compare the mutational spectrum in different niches, the branches in the phylogenetic tree were divided into categories (Fig. 3a). We reasoned that the internal branches within the non-DCC clusters will have been environmental as these branches probably often span hundreds to thousands of years during which time prolonged human infection will have been unlikely. We therefore calculated the environmental mutational spectrum by combining contextual mutations inferred along all internal branches of non-DCC clusters.
The phylogenetic branches within clades containing sequences from a single patient represent within-patient evolution. We therefore calculated the within-patient mutational spectrum by combining contextual mutations inferred along branches within monophyletic patient clades.
The relative contributions of environmental and within-patient evolution along tip branches and branches leading to patient ancestors is unclear as the patient may have acquired the infection at any point along this branch. We therefore did not include these branches in the environmental mutational spectrum or the within-patient spectrum.
To examine the mutations acquired during DCC transmission chains, we combined the contextual mutations that occurred along the internal branches within the seven DCC trees (Fig. 3a). We excluded the deep branches in these clades that precede population expansion (Fig. 3a) to only examine mutations that have occurred since emergence and therefore in more recent transmission chains.
We compared mutational spectra between niches by subtracting the inferred environmental spectrum from the DCC internal branch and within CF patient spectra. Significance of observed differences was assessed through 1,000 independent down-samplings of the inferred environmental mutations to the number identified along DCC internal branches or within CF patients. Contextual mutations were identified as significant if their calculated proportion fell outside two standard deviations of the mean proportion in the 1,000 replicates. This process was repeated ten times and all reported significant mutations were significant in all ten runs.
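The down-sampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: each input is assumed to be a list with one contextual mutation label per inferred mutation, sampling is assumed to be without replacement, and the function name and `seed` parameter are hypothetical. The reported difference is taken against the mean of the down-sampled environmental proportions.

```python
import random
from collections import Counter
from statistics import mean, stdev


def significant_contexts(environmental, niche, replicates=1000, seed=0):
    """Return {context: proportion difference} for contexts whose proportion
    in `niche` lies more than two standard deviations from the mean of the
    down-sampled environmental proportions."""
    n = len(niche)
    if n > len(environmental):
        raise ValueError("cannot down-sample to more mutations than available")
    rng = random.Random(seed)
    contexts = sorted(set(environmental) | set(niche))
    sampled = {context: [] for context in contexts}
    for _ in range(replicates):
        # Down-sample the environmental mutations to the niche sample size.
        counts = Counter(rng.sample(environmental, n))
        for context in contexts:
            sampled[context].append(counts[context] / n)
    observed = Counter(niche)
    significant = {}
    for context in contexts:
        mu = mean(sampled[context])
        sd = stdev(sampled[context])
        proportion = observed[context] / n
        if abs(proportion - mu) > 2 * sd:
            significant[context] = proportion - mu
    return significant
```

Repeating the call with ten different seeds and keeping only the contexts returned every time would mirror the ten repeat runs described in the text.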
Decomposition of the mutational spectrum into input signatures was carried out using Signal 32 (https://signal.mutationalsignatures.com/, date last accessed 15 November 2020). The contextual mutations that were elevated in the DCCs relative to the environment were combined into a 10,000 mutation catalogue with their relative frequencies representing their relative enrichment above the environment. This catalogue was used as input for Signal, specifying lung as the originating organ. Mutational drivers were assigned from the respective COSMIC mutational signature (https://cancer.sanger.ac.uk/cosmic/signatures/SBS/index.tt, date last accessed 15 November 2020).
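One way to build such a catalogue is sketched below. It assumes that "relative enrichment" means the excess proportion of each context in the DCC spectrum over the environmental spectrum, and that counts are rounded by the largest-remainder method so that they sum to exactly 10,000. Both choices, and the function name, are assumptions of this example rather than details given in the text.

```python
def enrichment_catalogue(enrichment, total=10000):
    """Convert {context: excess proportion} into integer mutation counts.

    Only contexts with a positive excess are kept; their counts are
    proportional to the excess and sum to `total`.
    """
    elevated = {c: v for c, v in enrichment.items() if v > 0}
    scale = sum(elevated.values())
    if scale == 0:
        return {}
    exact = {c: total * v / scale for c, v in elevated.items()}
    counts = {c: int(x) for c, x in exact.items()}
    # Largest-remainder rounding: hand the leftover mutations to the
    # contexts with the largest fractional parts.
    leftover = total - sum(counts.values())
    by_fraction = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for context in by_fraction[:leftover]:
        counts[context] += 1
    return counts
```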
Transmission network reconstruction. SNP distances were calculated between all pairs of samples within each FastBAPS cluster using PairSNP 60 and the minimum SNP distance between each pair of patients was extracted. Patients were classified as being linked at a given SNP cut-off if their closest pair of samples differed by that number of SNPs or fewer. Localized linkages were classified on the basis of available metadata if the patients were in the same hospital, CF Trust, city or state.
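The reduction from sample-level to patient-level distances can be sketched as follows. The data structures (a sample-to-patient dictionary and pairwise distances keyed by a `frozenset` of two sample names) and the function names are assumptions of this example; they do not reflect the PairSNP output format.

```python
from itertools import combinations


def min_patient_distances(sample_patient, snp_distance):
    """Return {frozenset({patient1, patient2}): minimum SNP distance}.

    sample_patient: dict mapping sample name -> patient identifier.
    snp_distance:   dict mapping frozenset({sample1, sample2}) -> SNP count.
    """
    closest = {}
    for s1, s2 in combinations(sorted(sample_patient), 2):
        p1, p2 = sample_patient[s1], sample_patient[s2]
        if p1 == p2:
            continue  # within-patient pairs do not define a linkage
        pair = frozenset((p1, p2))
        distance = snp_distance[frozenset((s1, s2))]
        if pair not in closest or distance < closest[pair]:
            closest[pair] = distance
    return closest


def linked_pairs(closest, cutoff):
    """Patient pairs whose closest samples differ by `cutoff` SNPs or fewer."""
    return {pair for pair, distance in closest.items() if distance <= cutoff}
```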
Transmission networks were reconstructed on the basis of minimum SNP distance between patient isolates. SNP distances of 20 and 38 SNPs were previously shown to represent 'probable' and 'possible' transmission, respectively, on the basis of the number of SNPs observed in within patient infections 2 . We therefore plotted the transmission network at ten SNPs to represent very likely transmission and 38 SNPs to represent possible transmission.
Transmission network connectivity measures (Extended Data Fig. 6) were calculated using 38 SNPs as a cut-off to include linkages, with 38 chosen to include possible person-to-person transmission events 2 . Therefore, any patient linkages of 39 SNPs or more were not included. The total number of linkages involving patients with or without CF was identified and divided by the total number of patients within the respective group to calculate the average number of transmission linkages per patient with and without CF. To calculate the weighted connectivity measures, each edge in the transmission network was given a weighting of 39 minus the SNP distance. Therefore, edges linking patient pairs whose isolates differ by zero SNPs were given a weighting of 39 and edges had zero weighting if they connect patient pairs whose isolates differ by 39 or more SNPs. The total weighting of all edges involving patients with or without CF was identified and divided by the total number of patients within the respective group to calculate the average weighted connectivity. To identify the average weighting of CF-CF, CF-non-CF and non-CF-non-CF linkages, the total weighted connectivity of each edge type was calculated by summing the weights of all respective edges and this was divided by the total number of potential linkages of that type.
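The connectivity measures can be illustrated with the sketch below, in which an edge is weighted as 39 minus the SNP distance, floored at zero. Two points are assumptions of this example rather than details stated in the text: a linkage is counted once for each of its two patients (so the per-patient figure is an average degree), and the number of potential linkages is n(n-1)/2 within a group and n × m between groups. Function and variable names are hypothetical.

```python
from collections import defaultdict


def edge_weight(distance, cutoff=38):
    """Weight of a linkage: (cutoff + 1) minus the SNP distance, never below 0."""
    return max(0, cutoff + 1 - distance)


def connectivity(closest, has_cf, cutoff=38):
    """closest: {frozenset({p1, p2}): minimum SNP distance between patients}.
    has_cf:  {patient: True if the patient has CF, else False}.

    Returns (per_patient, per_type) where
      per_patient[group] = (average linkages, average weighted connectivity)
      per_type[edge type] = total weight / number of potential linkages.
    """
    links = defaultdict(int)
    weights = defaultdict(int)
    type_weight = defaultdict(int)
    for pair, distance in closest.items():
        weight = edge_weight(distance, cutoff)
        if weight == 0:
            continue  # 39 SNPs or more: not a linkage
        p1, p2 = tuple(pair)
        for patient in (p1, p2):
            links[has_cf[patient]] += 1
            weights[has_cf[patient]] += weight
        type_weight[tuple(sorted((has_cf[p1], has_cf[p2])))] += weight
    n_cf = sum(1 for status in has_cf.values() if status)
    n_non = len(has_cf) - n_cf
    sizes = {True: n_cf, False: n_non}
    potential = {
        (True, True): n_cf * (n_cf - 1) // 2,
        (False, True): n_cf * n_non,
        (False, False): n_non * (n_non - 1) // 2,
    }
    per_patient = {
        group: (links[group] / size, weights[group] / size)
        for group, size in sizes.items() if size
    }
    per_type = {
        edge_type: type_weight[edge_type] / count
        for edge_type, count in potential.items() if count
    }
    return per_patient, per_type
```

In the returned dictionaries, the key `True` denotes patients with CF and `(False, True)` denotes CF-non-CF linkages.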
Trends in tobacco smoking. Annual estimates of the number of cigarette sales per adult per day were obtained from https://ourworldindata.org/smoking (last accessed 18 March 2021). Thirty countries with data available from before 1960 were included.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All source data files are available at https://doi.org/10.5281/zenodo.5116229. Accession numbers of all samples used in analyses are included in this source data and in Supplementary Table 3. Source data are provided with this paper.

Code availability
Code used for data analysis is available at https://doi.org/10.5281/zenodo.5116229.

Fig. 2. The temporal evolutionary history of each DCC was reconstructed with an informed prior on the substitution rate based on root-to-tip analysis and with a uniform prior (see Methods). The results from these priors matched closely in all cases. We compared the tree root date and substitution rate inferences from these priors with ten date randomization runs where the sample collection dates were randomly assigned to sequences and the temporal history reconstructed. In all cases, the posterior distributions inferred with the real dates were highly different from those inferred with the date randomizations, suggesting the existence of a significant temporal signal within each DCC.

nature research | reporting summary
Corresponding author(s): R. Andres Floto, Julian Parkhill. Last updated by author(s): Jul 20, 2021.

Software and code
Data collection
No software was used in collection of the data.

Data analysis
Phylodynamic and phylogeographic analyses were performed with BEAST v2.4.2. Convergence of log files was examined with Tracer v1.7. Examination of temporal signal was carried out with TempEst v1.5.1. Read mapping was performed using the multiple_mappings_to_bam pipeline v1.6. Cluster assignment was carried out using FastBAPS v1.0.0. Initial assignment of samples to subspecies was carried out using mash v2.1.1. Recombination was removed using gubbins v2.4.1. Reconstruction of non-temporal trees was carried out with RAxML v8.2.12. Trees were viewed with FigTree v1.4.4 and GGTREE v3. Supported inter-continental movements were identified with SPREAD v0.9.6. Nucleotide substitutions were reconstructed onto phylogenetic trees using PAML v4.9. SNP distances between sequence pairs were calculated using PairSNP v0.0.1. Additional custom scripts, including those used to calculate mutational spectra, are available in GitHub at DOI https://doi.org/10.5281/zenodo.4279139.

Data
The accession codes for all sequences used in this study are provided in Table S3. All source data, including alignments, phylogenetic trees, BEAST XML files and mutational spectra, have been deposited in https://doi.org/10.5281/zenodo.5116229.

Study description
Phylogenetic clustering of available whole genome sequences followed by phylodynamic and phylogeographic analysis of selected clones. Mutational spectrum analysis comparing between clusters. Transmission network analysis of whole genome sequences.

Research sample
The raw data comprised 2,045 whole genome sequencing samples of Mycobacterium abscessus available on the European Nucleotide Archive. These data represent the global population structure of Mycobacterium abscessus.

Sampling strategy
We included all available whole genome sequencing samples, which capture the diversity from over 1,100 patients on five continents.

Data collection
Data were downloaded from the European Nucleotide Archive; sample metadata were obtained from the authors of previous studies or from data tables in previous studies.

Timing and spatial scale
We used all available whole genome sequencing samples, which were collected between 1998 and 2017. These samples were collected from 16 countries on five continents. We retained all available samples to maximise sequence, temporal and spatial diversity.

Data exclusions
No data were excluded from the study.

Reproducibility
All Bayesian analyses involved at least three independent runs and results were highly similar between runs. Phylogeographic analyses were carried out with all data and with five independent down-samplings of over-represented locations. All repeats were successful and all repeat analyses showed the same results.

Randomization
Samples were divided into clusters based on phylogenetic relationships. Cluster cut-offs were assigned using hierarchical Bayesian analysis of population structure