Mycobacterium abscessus pathogenesis identified by phenogenomic analyses

The medical and scientific response to emerging and established pathogens is often severely hampered by ignorance of the genetic determinants of virulence, drug resistance and clinical outcomes that could be used to identify therapeutic drug targets and forecast patient trajectories. Taking the newly emergent multidrug-resistant bacteria Mycobacterium abscessus as an example, we show that combining high-dimensional phenotyping with whole-genome sequencing in a phenogenomic analysis can rapidly reveal actionable systems-level insights into bacterial pathobiology. Through phenotyping of 331 clinical isolates, we discovered three distinct clusters of isolates, each with different virulence traits and associated with a different clinical outcome. We combined genome-wide association studies with proteome-wide computational structural modelling to define likely causal variants, and employed direct coupling analysis to identify co-evolving, and therefore potentially epistatic, gene networks. We then used in vivo CRISPR-based silencing to validate our findings and discover clinically relevant M. abscessus virulence factors including a secretion system, thus illustrating how phenogenomics can reveal critical pathways within emerging pathogenic bacteria.

We examined the relationship between phenotypes, finding correlations within, and sometimes between, pathogenic traits ( Fig. 1b and Supplementary Fig. 3). To explore whether there were distinct patterns of bacterial behaviours, we used experimentally derived data to plot individual isolates in phenotypic space, identifying three discrete groups, each associated with different clinical outcomes (Fig. 2a-c and Supplementary Fig. 3). Specific phenotypic groups were overrepresented in particular clades and among phylogenetic nearest neighbours, indicating that these phenotypic groups represent distinct heritable traits (Fig. 2d,e).
Isolates from Group 3 demonstrated the fastest growth in liquid culture and quickest replication within macrophages, caused higher mortality in infected macrophages and Drosophila, and the greatest antimicrobial and inflammatory responses in flies, whereas Group 1 isolates had the opposite characteristics. Group 2 isolates had phenotypic behaviours that were intermediate compared with the other two groups and were associated with the most favourable clinical outcome, potentially related to their macrolide susceptibility (a key determinant of treatment response 15,16 ) explained by known erm41 and 23S ribosomal RNA genotypes ( Supplementary  Fig. 3). By contrast, we found that, despite having similar levels of macrolide resistance, Group 1 and Group 3 isolates were associated with very different clinical outcomes in infected patients, highlighting the importance of phenotypic characteristics other than antimicrobial susceptibility in determining prognosis, and suggesting that immunogenic isolates might be cleared more easily by patients (as reported previously for other pathogenic bacteria [17][18][19][20]. We next examined the contribution of different colony morphotypes and M. abscessus subspecies to the phenotypic analysis. Although morphotype transition from smooth to rough, caused by disrupted glycopeptidolipid production, has previously been linked to increased in vitro and in vivo virulence 11,21 , the 18% of our isolates that were of the rough morphotype were not associated with worse patient outcomes, or changes in outcome during macrophage or Drosophila infection ( Supplementary Fig. 4). Similarly, stratifying by M. abscessus subspecies revealed no differences in clinical outcome and only limited differences in phenotypic behaviour (apart from the expected difference in clarithromycin resistance due to recognized erm41 truncation in M. abscessus subspecies massiliense; Supplementary Fig. 4). Phenotypic clustering and resultant group composition were not affected by considering only isolates with a smooth morphotype or from the M. a. abscessus subspecies, indicating that our analysis has uncovered unexpected phenotypic relationships.

Structure-guided GWAS.
To understand the genetic basis for these important variations in M. abscessus behaviour, we used whole-genome sequence data to perform a GWAS for each phenotype (Fig. 3a), evaluating approximately 270,000 genetic variants comprising single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs). We used mixed models corrected for population structure 22 to identify locus effects, as well as uncorrected linear models to ensure we captured lineage effects 23 . In total, we identified 1,926 hits (involving 1,000 genes) across 46 phenotypes (Supplementary Data). These included previously known genetic determinants, such as the 16S and 23S rRNA mutations associated with constitutive aminoglycoside and macrolide resistance (P = 1.3 × 10 −75 and P = 1.5 × 10 −54 respectively; Supplementary  Fig. 5), thereby confirming the effectiveness of our approach.
Current GWAS approaches are limited in their ability to accurately identify causal variants by both the presence of linkage disequilibrium, which in the case of M. abscessus (as with other bacteria 24,25 ) is extensive and genome-wide ( Fig. 3a and Supplementary  Fig. 6), and by a failure to consider the impact of mutations on protein function 26,27 .
We therefore applied proteome-wide computational structural modelling to evaluate the probable functional impact of all non-synonymous SNPs across the genome, by applying our graph-based machine learning method mutation cut-off scanning matrix (mCSM) 28 to our comprehensive M. abscessus structural database Mabellini 29 (Fig. 3b) to identify probably causal mutations.
As an example, the GWAS for intracellular replication of M. abscessus within macrophages identified a number of hits at genome-wide significance including a cluster of variants within mycobactin synthesis genes (Fig. 3c). Mycobactins are mycobacterially produced iron chelators that efficiently scavenge iron during intracellular growth within macrophages, providing the iron essential for mycobacterial protein synthesis and other critical cell processes 30,31 . Structural modelling predicted that one variant, a missense mutation (Ile256Thr) in the mycobactin polyketide synthetase (mbtD) gene, was most likely to result in loss of protein function and therefore be causally related to the phenotypic change, probably through altering the ability of intracellular M. abscessus to access iron. To experimentally validate this structural modelling, we created an MbtD knockout mutant that demonstrated impaired intracellular growth in macrophages, and was able to be complemented by episomal expression of MbtD with the Thr410Ala mutation (predicted by mCSM to be tolerated), but not by the Ile256Thr mutation (predicted to be deleterious; Fig. 3d).
Analysis of genome-wide epistasis through mutational coevolution. To understand whether mutations across the genome might have co-evolved, indicating potential epistatic interactions between genes, we deployed correlation-compressed direct coupling analysis (CC-DCA 32 ) on whole-genome sequences from 2,366 clinical isolates of M. abscessus to identify whether variant co-occurrence deviated from the expected frequencies based on linkage disequilibrium 33,34 , and thus indicates evolutionary co-selection. We evaluated 10 12 potential couplings (resulting from approximately 10 6 genetic variants) and identified 1,168,913 that were significantly enriched (accepting a false discovery rate (FDR) of 10 −6 ; Fig. 4a and Supplementary Fig. 6). We found many enriched couplings between known or predicted virulence genes ( Fig. 4b and Supplementary Data), indicating pathogenic evolution of M. abscessus (as identified previously 5,35 ). We used the ranked outputs from the CC-DCA analysis to establish discrete networks of genes that have co-evolved, and thus probably interact functionally (Fig. 4c). Many of these putative interactions could be recapitulated using orthogonal Cla, clarithromycin; Clo, clofazimine; FeV1, forced expiratory volume; Fox, cefoxitin; Lin, linezolid; OD-AUC, area under the OD curve; qPCR, quantitative polymerase chain reaction. b, Pearson correlation coefficients within and across phenotypic groups shown as a matrix, with two-sided non-significant (unadjusted P > 0.05) associations shown in white.   36 . As examples, we find highly connected clusters of mammalian cell entry genes, implicated in controlling adhesion, uptake and intracellular survival within macrophages 37,38 , and genes involved in bacterial secretion systems. In addition, we discovered a network of mycobactin synthesis genes (Fig. 4d), including some identified through our GWAS analysis (Fig. 3c,d) that, when silenced by CRISPR interference (CRISPRi) knockdown, led to similar impairment of intracellular bacterial growth (Fig. 4e), supporting a functional basis for these CC-DCA-derived gene networks.
Defining genetic determinants of in vivo virulence in M. abscessus. Finally, we sought to integrate outputs from our detailed multidimensional phenotyping, structure-guided GWAS analysis and DCA-based epistatic mapping, to achieve a systems-level understanding of the genetic basis for important pathological processes in M. abscessus.
We focused on in vivo infection in Drosophila, a model that replicates some features of human mycobacterial infection (particularly innate and cell-autonomous immune responses) (Fig. 5a) [39][40][41][42] . Among the top hits from our structure-guided GWAS analysis ( Fig. 5b and Supplementary Fig. 8) were a deletion in a component of a putative Type II secretion system (MAB_0471) and a deleterious mutation in a non-ribosomal peptide synthetase (MAB_3317c). Both variants had independently arisen as homoplastic mutations across the M. abscessus phylogenetic tree (Fig. 5c), including within the ancestor of one of the dominant circulating clones (DCC2) of M. a. abscessus, responsible for several transmission networks among CF patients 3,5 . We found that isolates with either of the two genetic variants were associated with prolonged survival of infected Drosophila and more persistent clinical infection of CF patients ( Fig. 5d and Supplementary Fig. 8).
We sought to experimentally validate both these GWAS hits through CRISPRi-based transcriptional silencing as described previously 43 . Although we found no effect of gene silencing on growth in liquid media, silencing of either MAB_0471 or MAB_3317c during in vivo infection significantly increased Drosophila survival ( Fig. 5e and Supplementary Figs. 8 and 9), indicating that these genes regulate M. abscessus virulence.
Our DCA analysis revealed that both these GWAS hits were part of a discrete network of likely epistatic genes involved in bacterial secretion, cell wall biosynthesis, metabolism and transcriptional groups. e, Nearest phylogenetic neighbours most commonly belong to the same phenotypic group. P values were calculated using a two-tailed chi-squared test or one-way analysis of variance, as appropriate. regulation ( Fig. 5f and Supplementary Fig. 8). To experimentally test this predicted epistasis, we selected another gene from the same network (MAB_0472) and transcriptionally silenced it during in vivo infection. We found that Drosophila survival was also increased by its CRISPRi knockdown (Fig. 5g), suggesting that all three genes are functionally interacting.

Discussion
We have shown that phenogenomic analysis can accurately identify critical gene networks responsible for virulence and other characteristics in poorly understood bacterial pathogens, such as M. abscessus. Our approach of integrating computational structural modelling with conventional GWAS analyses and DCA-driven mapping of gene interaction networks has revealed key determinants of M. abscessus antibiotic resistance and virulence.
We have discovered three phenotypic clusters, independent of colony morphotype and subspecies, with distinct virulence characteristics and clinical outcomes (not attributable to the known influence of macrolide resistance), that could represent distinct evolutionary trajectories or different points on a single pathoadaptive journey.
To gain systems-level understanding of M. abscessus pathobiology, we deployed GWAS analysis, informed by proteome-wide computational structural modelling, to a wide spectrum of in vivo, in vitro and clinical traits, confirming known genetic associations for antibiotic resistance and discovering a large number of unknown genotypephenotype associations, several of which we validated experimentally. For example, we identified MbtD, a polyketide synthase involved in mycobactin synthesis, that regulates intracellular survival of M. abscessus and therefore could be targeted therapeutically. b, To identify causal variants and overcome genome-wide linkage, the functional impacts of genetic variants were classified as having high effects (large deletions, frameshifts, start/stop alterations; red), moderate effects (inframe insertions/deletions; blue and green) and low effects (synonymous and intergenic variants; grey). The impacts of missense mutations were estimated using proteome-wide computational structural modelling with variants considered as having high (red), moderate (blue) or low (green) functional effects based on terciles of the change in protein stability, estimated using mCSM. c, Manhattan plot of the mixed model GWAS analysis of 264,122 genetic variants for intracellular M. abscessus replication (two-sided Wald test) with the threshold for multiple hypothesis testing (black line). Several loci in mbtD, including four missense mutations, were identified as potential mechanisms relevant for intracellular M. abscessus survival. (Inset) Three-dimensional structural model of MbtD with the high-effect missense mutation Ile256Thr shown in red. d, MbtD knockout mutants complemented with wild-type or identified MbtD variants had similar growth rates in broth culture but replicated differently within THP-1 cells. experiments were performed in triplicate on at least three separate occasions and are presented as mean ± s.e.m. Conditions were compared with a two-sided unpaired t-test.   We successfully explored potential epistatic interactions by applying DCA to discover co-evolved proteins and thus inferring networks of potentially functionally linked genes. We confirmed the ability of DCA to reveal gene-gene interactions by comparing outputs with orthogonally derived gene networks created from prior knowledge by the STRING database and experimentally validated the functional relatedness of some of the DCA networks by evaluating CRISPR knockdown of linked genes in both in vitro and in vivo infection assays.
Combining these approaches, we were able to discover several clinically relevant mycobacterial virulence factors. For example, by using a Drosophila infection model and structure-guided genomic mapping, we revealed two genes, a putative secretion system protein (MAB_0471) and a non-ribosomal peptide synthetase (MAB_3317c), that were linked within a DCA-discovered functional network. We validated both genes experimentally and found that both were associated with clinical outcomes in patients. Our approach capturing and mapping multidimensional phenotypes to genotypes using structural-guided GWAS and defining epistatic interactions through mutational co-evolution can identify clinical relevant phenotypes, virulence-associated mutations and important pathobiological pathways that could be readily applicable to any pathogen, permitting rapid identification of prognostic indicators and potential drug targets.

Methods
Sample collection. Samples were obtained from patients with chronic pulmonary disease and respiratory M. abscessus infection (baseline characteristics are given in Supplementary Table 1

Analysis of bacterial growth on different media.
Single M. abscessus colonies were picked for phenotypic analysis. Bacterial growth in nutrient-rich medium (Middlebrook 7H9 supplemented with 0.4% glycerol and 10% albumin dextrose catalase enrichment) or carbon source limited media (Middlebrook 7H9 plus carbon source) was assessed in 96-well plates and quantified by measuring the optical density at 600 nm (OD 600 ) every 12 or 24 h for 10 d. An OD 600 above 0.15 assessed in 96-well plates correlated well with log(colony-forming units) (c.f.u.; initial R 2 , 0.96; R 2 after 1 d mycobacterial growth in plates, 0.97). The carbon sources tested were acetate (10 mM), glucose (2.5 mM), lactate (10 mM) and pyruvate (10 mM). Growth of each isolate across all conditions was assessed in quadruplicate. For each well, a logistic function was fitted using the R package growthcurver 44 . OD values on day (d)1 were used for early growth and the area under the logistic curve for up to d10 were used to assess general growth. The median of the quadruplicates was used as the representative phenotype. If the readout was highly variable (coefficient of variation >20%) the measurement was considered missing. For assessing potential growth differences of M. abscessus mutants, mutants were grown in glass tubes in Middlebrook 7H9 supplemented with 0.4% glycerol and 10% ADC, and assessed daily with a McFarland reader. CRISPRi mutants were additionally supplemented with 100 ng ml −1 anhydrotetracycline.

Drug resistance. Drug resistance was quantified with minimal inhibitory concentrations (MIC) according to the Clinical and Laboratory Standards
Institute guidelines 45 . In brief, ~5 × 10 4 c.f.u. of each isolate were inoculated in increasing antibiotic concentrations in Mueller Hinton broth (amikacin, cefoxitin, clarithromycin and linezolid) or Middlebrook 7H9 supplemented with 0.4% glycerol and 10% ADC (clofazimine) per well. Experiments, including a growth control, were carried out in duplicate for every isolate. The reference strain ATCC 19977 was evaluated once per experimental batch. The MIC was recorded as the lowest drug concentration inhibiting visible growth at d3, d5, d11 and d14. The mean of both experiments (that is, the antibiotic concentration), was recorded and log 2 transformed. Experiments in which a single MIC could not be obtained (for example, because of visible growth at higher drug concentrations) were excluded.
Transformation of clinical isolates. An expression plasmid carrying tdTomato (obtained from L. Kremer) was used to transform clinical isolates, grown in 10 ml of Middlebrook 7H9 supplemented with 0.4% glycerol, 10% ADC and 0.05% Tween 80 at 37°C in a shaking incubator. Competent log-phase bacteria were washed with 10% glycerol containing 0.05% Tween 80. Then 200 μl of the pellet together with 1 μg of DNA was transferred to a cuvette and electroporated (2,500 V, 1,000 Ω, 25 μF). Transformed bacteria were recovered for 24 h in antibiotic-free medium and then transferred to a selective agar plate (7H11 complemented with 10% oleic albumin dextrose catalase enrichment and 1 mg ml −1 hygromycin). Red colonies were picked and cultured in media containing 1 mg ml −1 hygromycin.

Generation of single cell suspensions.
The isolates were obtained from frozen stocks and grown in Middlebrook 7H9 (supplemented with 0.4% glycerol, 10% OADC and 0.05% Tween 80). Exponentially growing isolates were centrifuged at 200g for 5 min and the supernatant passed multiple times through a 27-gauge needle before filtrating with a 5 μm filter (Acrodisc syringe filter). Single cell suspensions were standardized to a McFarland turbidity of 0.5 and frozen at −80°C. High-content image acquisition and analysis. After paraformaldehyde fixation plates were stored at 4°C and imaged within 24 h on the high-content screening platform Opera Phenix (Perkin Elmer). Spinning disc confocal images of 37 fields per well and three fluorescence channels (blue 405/456, red 561/599, far-red 640/706) were acquired with a ×63 water immersion objective (NA 1.15). Automated image analysis was performed with Columbus software (v.2.9.0, Perkin Elmer). The 37 fields were pooled to single wells. Blue (4,6-diamidino-2-phenylindole) and far-red (CellMask DR) fluorescence channels were used to define cells and their borders. To evaluate the viability of individual macrophages, a supervised machine learning approach (Columbus; Perkin Elmer) based on nuclear, cytosolic and cell features was used to train a linear classifier, which was then applied to all images to classify macrophages as dead or alive. Intra-and extracellular mycobacteria were defined using a spot assay on the red fluorescence channel. For each cell, as well as the extracellular space, the spot area and mean fluorescence intensity were documented. Both measures were used to quantify the mycobacterial load (intracellular load = total sum of (spot area per cell × mean spot intensity per cell); extracellular load = extracellular spot area × extracellular mean spot intensity; total mycobacterial load = intracellular load + extracellular load). Wells with a cell number below 800 were removed; the median of the remaining wells was used. As the most meaningful outputs we reported the fraction of total cells infected (number of M. abscessus infected cells/total number of cells), the intracellular and total M. abscessus load as well as the fraction of cells alive (number of cells alive/total number of cells). Mycobacterial load or cell kinetics are reflected in the ratio d2/d0 (delta). Quantitative PCR with reverse transcription of Drosophila antimicrobial peptides and cytokines. At least five flies were infected with each isolate to assess the immune response to infection. At 28 h after infection, flies were homogenized in 100 μl of TRIzol (Invitrogen) and stored at −20°C. RNA was then extracted and complementary DNA synthesis was carried out with the RevertAid Reverse Transcriptase (200 U µl −1 , Thermo Fisher Scientific). Quantitative PCR analyses were performed in duplicate using the Sensimix SYBR no-ROX kit (Bioline) 46,47 using the primers given in Supplementary Table 2.
Patient outcomes. Clinical outcome data were available for 300 CF patients (as reported previously 3,5 ). Patients were classified as having cleared M. abscessus infection (defined as documented culture conversion or a sustained clinical improvement where further cultures were unavailable) or as having persistent infection (if cultures remained positive or the clinical state worsened where no cultures were available) 5 . Lung function decline was estimated as the percentage change in the forced expiratory volume from the available lung function assessment over a period of 12 months from baseline (before infection).

Phenotype association.
To assess relatedness of phenotypes and phenotypic groups, all phenotype pairs were correlated (Pearson correlation) and a correlation matrix plotted. To identify characteristic phenotypic signatures of clinical isolates, isolates were clustered using representative experimental phenotypes (amikacin MIC d11, clarithromycin MIC d11, growth d10, change in intracellular MAB load, macrophage cell death d2, Drosophila attacin level, mean Drosophila survival). Some 199 isolates with at most one missing value (52 isolates had one missing value) were correlated using pairwise Pearson correlation. The resulting correlation matrix was used as a distance measure to cluster isolates with t-SNE 48 using the R package Rtsne. Clustering was validated with k-means clustering with a predefined set of three clusters. Phenotypic groups were compared using one-way analysis of variance or chi-squared test, as appropriate, and mapped onto the phylogeny. For each isolate a nearest phylogenetic neighbour was identified, thereby assessing whether neighbours are more likely to belong to the identical phenotypic group (chi-squared of each phenotypic group comparing neighbour pairs versus non-neighbour pairs).
Genome-wide association analysis. Two statistical genome-wide association approaches were employed to assess the effect of individual variants (SNPs, INDELs, large deletions) on phenotypes. A linear mixed model controlling for population structure, where the phenotype is modelled on the fixed locus effect and the random effect of the relatedness matrix, was used. However, controlling for population structure considerably reduces power for population-stratified variants 23 . Because population-stratified variants are common in bacteria, genome-wide associations were also analysed with a linear model. Both analyses were performed in GEMMA 22 . Hits were defined as the top 50 significant associations within a phenotype. Manhattan plots were generated using LocusZoom 49 .
Genome-wide protein structure prediction. Because the structures of most proteins in the M. abscessus proteome have not been resolved experimentally, it was necessary to model them computationally. We therefore extended our M. abscessus structural proteome database, Mabellini 29 , which provides only high-confidence, well-annotated structural data, to aim for comprehensive coverage of the entire proteome. Therefore, additional proteins were modelled with lower-confidence templates aided with extensive macromolecular modelling and refinement protocols. The multiple sequence alignments were converted into profile hidden Markov models (HMMs) using HH-suite3 (ref. 50 ), which were then used to search against a pdb70 (Protein Data Bank chains clustered at 70% sequence identity) database using Hhsearch 50 . The identified templates were used for comparative modelling, using the modified, MODELLER-based 51 , multi-template structure modelling pipeline of Larsson et al. 52 . In addition to structural consensus and a machine learning-based single-model quality assessment protocol, we also incorporated a rapid method for annotating the quality of protein models through comparison of their distance matrices 53 . As a result, for each of the modelled protein sequences, we obtained a set of theoretical models, ranked by predicted model quality.
Machine learning for assessing effects of missense mutations. To evaluate the effect of polymorphisms on M. abscessus protein structures, we used the models generated in the previous step to estimate the effect of missense mutations. We applied mCSM 28 , which, through graph-based signatures, represents the structural environment of wild-type residues and learns which mutations are detrimental to protein structure. For each of the mutations, one or more modelled structures have been used.
Comparative modelling of MAB_2119c (MbtD). The model of putative polyketide synthase (mbtD, MAB_2119c) was produced as part of Mabellini using the following models: 2hg4, 3tzz and 2jgp 29 . The Mabellini-derived structure was then subjected to extensive relaxation using Rosetta 54 suite, in both a wild-type and mutated variants, where the lowest energy structure has been chosen for subsequent analysis.
Ranking of predicted functional impact of SNPs. Based on SNP annotation (intergenic, synonymous, inframe INDEL, frameshift) and structural modelling predictions of functional impact (above), variants were allocated to four groups: low-effect variants (intergenic and synonymous SNPs; grey), low-moderate-effect variants (inframe INDEL, missense mutations with lowest tertile mCSM scores; green), moderate-high-effect variants (missense mutations with middle tertile mCSM scores; blue) and high-effect variants (frameshift variant, large deletion, start/stop alteration and missense mutations with highest tertile mCSM scores; red).

Summary of GWAS hits.
To summarize the identified variants across all phenotypes, up to five significant, highest ranking hits were extracted from each genotype-phenotype association (a single high-or moderate-effect variant per gene). In total, 2 × 58 genotype-phenotype associations (linear mixed model and linear model) were performed. To assess genetic linkage between these variant hits, we calculated R 2 using PLINK 55 .
Identification of homologues and construction of multiple sequence alignments. For each of the proteins in the M. abscessus proteome, we have constructed a multiple sequence alignment of homologous proteins, which forms a basis for subsequent work. The alignments have been constructed using HHblits, a fast, highly sensitive, HMM-HMM-based sequence search method 56 and used the bundled nr30 database. In the interest of exploring a broader evolutionary landscape of proteins in question, we have decided to include proteins with an E-value ≤10 −4 in the alignment.
Genome-wide evolutionary coupling inference. Exponential models to understand co-evolution in biological sequences have been applied to protein structure prediction 57 , and more recently to bacterial genomic sequences. We have previously shown that the method genomeDCA 33 can be effectively employed to understand the co-evolution of Streptococcus pneumoniae 34 , and is extensible and applicable to other systems 32,34,58 . Here, we employ an approach that blends genomeDCA 33 and CC-DCA 32 to ensure unbiased sampling of evolutionary pressures onto individual positions and pairs of positions across genomic sequences. CC-DCA 32 permits genome-wide coupling inference without needing to resort to extensive sampling, as proposed in genomeDCA 33 . We modified this approach to elucidate the effects of low-frequency alleles across the entire M. abscessus genome. We conducted at least 60,000 runs, each subsampling 25% of positions in the genome. We defined variant-variant couplings as statistically significant based on the Gumbel distribution (as described previously 33 ) corresponding to an FDR of <10 −6 . Variant-variant pairs that spanned a distance of more than 100 bp were ranked by coupling strength and visualized on the M. abscessus genome using the Circos package 59 . Subsequently, we pooled the statistically significant couplings by gene-gene pairs, and ranked them by the number of couplings. Cytoscape was used to plot the network of the 1,000 strongest gene-gene couplings, highlighting the number of couplings (edge width), coupling strength (edge colour) and predicted gene function (node colour) 60 . For CC-DCA validation, we assessed the protein-protein interactions of putative functional clusters with STRING v.11.5 (nodes, observed and expected edges, protein-protein interaction enrichment P value) 36 .
Generation of CRISPRi mutants. Analogous to CRISPR-mediated gene silencing in Mycobacterium tuberculosis and Mycobacterium smegmatis, we established a CRISPRi platform in M. abscessus 35,43,61 . M. abscessus ATCC 19977 was transformed with pTetInt-dCas9 and a second vector (pGRNAz) containing the small-guide RNA cassette. For each gene, two oligonucleotides were synthesized (forward and reverse), annealed and cloned into pGRNAz. Oligonucleotide sequences are outlined in Supplementary Table 3. The strains were grown in Middlebrook 7H9 broth (supplemented with 0.4% glycerol, 10% ADC and 0.05% Tween 80) and selected with hygromycin (1 mg ml −1 ) and zeocin (300 μl ml −1 ). dCas9 and sgRNA expression were under the control of a tet-inducible promotor. To achieve maximal gene repression cultures were supplemented with 100 ng ml −1 anhydrotetracycline. As controls, an empty vector control and YidC (essential gene) knockdown were used. To validate CRISPR-induced transcriptional repression we complemented knockdown mutants with rescue vectors, in which MAB_0471 or MAB_472 containing silent mutations at the CRISPR-binding sites were cloned into pGRNAz under a strong promoter. In these mutants, CRISPR guides bind and repress chromosomal gene expression, but not the mutated gene expressed in the plasmid.
Generation of knockout and complemented mutants. To validate structural predictions, a MbtD knockout mutant was generated on the ATCC 19977 background via recombineering 62 . In brief, primers which amplified the 1,000-bp flanking regions up-and downstream of the respective gene were designed and a zeocin cassette was cloned between these fragments to synthetize an allelic exchange substrate. pJV53 was used to generate the recombineering strain ATCC19977-pJV53, which was grown to the exponential phase and induced with 0.2% acetamide 44 . The allelic exchange substrate was then electroporated into ATCC19977-pJV53 and plated on Middlebrook 7H11 agar supplemented with 10% OADC containing 300 μg ml −1 zeocin and then grown in broth culture to remove pJV53. To complement ΔMAB_2119, MAB_2119 was PCR-amplified, digested and ligated into pMV306-hsp60. To generate ΔMAB_2119 + Ile256Thr and ΔMAB_2119 + Thr410Ala complemented mutants, pMV306-MAB_2119 was PCR-amplified using oligonucleotides containing the chosen mutation (Supplementary Table 3). These plasmids were then electroporated into ΔMAB_2119 on Middlebrook 7H11 agar supplemented with 10% OADC and kanamycin (200 μg ml −1 ) and confirmed by PCR.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A description of any restrictions on data availability -For clinical datasets or third party data, please ensure that the statement adheres to our policy All sequencing data of this study is deposited in the European Nucleotide Archive with the respective accession codes provided in Supplementary Table 6. Source data are provided with this paper.
2 nature portfolio | reporting summary

March 2021
Field-specific reporting Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
The required GWAS sample sizes were based on assumed effect sizes of antimicrobial resistance and the number of available samples. With this sample sizes we could identify several unknown mechanisms; however it is likely that a much larger data set (n>1000) would have revealed even more information.
Data exclusions M. abscessus isolates were phenotyped in replicates. If replicate variation was too large (as outlined in the online supplement), the phenotypic information was removed from final analysis.

Replication
Mycobacterial phenotyping was done in replicates and all replicates were analysed, except those not meeting quality criteria (as outlined in the online Supplement).
Randomization Not applicable. Samples were not allocated to experimental groups.

Blinding
Not applicable. Samples were not allocated to experimental groups.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Mycoplasma contamination
Mycoplasma contamination was ruled out on a monthly base.
Commonly misidentified lines (See ICLAC register) No misidentified cell lines were used in the study.

Animals and other organisms
Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research

Wild animals
No wild animals were used in the study.
Field-collected samples No field-collected samples were used in the study.