Population genomics confirms acquisition of drug-resistant Aspergillus fumigatus infection by humans from the environment

Infections caused by the fungal pathogen Aspergillus fumigatus are increasingly resistant to first-line azole antifungal drugs. However, despite its clinical importance, little is known about how susceptible patients acquire infection from drug-resistant genotypes in the environment. Here, we present a population genomic analysis of 218 A. fumigatus isolates from across the UK and Ireland (comprising 153 clinical isolates from 143 patients and 65 environmental isolates). First, phylogenomic analysis shows strong genetic structuring into two clades (A and B) with little interclade recombination and the majority of environmental azole resistance found within clade A. Second, we show occurrences where azole-resistant isolates of near-identical genotypes were obtained from both environmental and clinical sources, indicating with high confidence the infection of patients with resistant isolates transmitted from the environment. Third, genome-wide scans identified selective sweeps across multiple regions indicating a polygenic basis to the trait in some genetic backgrounds. These signatures of positive selection are seen for loci containing the canonical genes encoding fungicide resistance in the ergosterol biosynthetic pathway, while other regions under selection have no defined function. Lastly, pan-genome analysis identified genes linked to azole resistance and previously unknown resistance mechanisms. Understanding the environmental drivers and genetic basis of evolving fungal drug resistance needs urgent attention, especially in light of increasing numbers of patients with severe viral respiratory tract infections who are susceptible to opportunistic fungal superinfections.

F ungal infections affect more than a billion people worldwide, with a mortality rate that matches that of malaria or tuberculosis 1 . A growing concern is Aspergillus fumigatus, a globally prevalent environmental mould that can cause multiple clinical diagnoses. Among these, invasive aspergillosis (IA) can occur in at-risk populations, such as patients with severe neutropenia, haematopoietic stem cell or solid organ transplants, patients receiving immunosuppressive drugs and, increasingly, patients with influenza and coronavirus disease 2019 (COVID-19) (as an associated infection) 2,3 . Patients with cystic fibrosis (CF) are also at risk of chronic infections, with 30% developing Aspergillus-related bronchitis and 19% developing allergic bronchopulmonary aspergillosis 4

. With over 2.25 million individuals suffering from infections caused by
A. fumigatus in the European Union alone 5 , this is a global concern. Unfortunately, recent studies have also reported an emerging worldwide resistance to azole antifungal drugs in both clinical and environmental isolates [6][7][8] , which have long proven effective against A. fumigatus 9 .
Azole drug resistance has serious clinical implications, with retrospective studies of patients with drug-resistant IA showing a 25% increase in mortality at day 90 compared to patients with wild-type (WT) infections 10 . While in vivo resistance emergence during extended azole therapy is well documented 11,12 , more recent studies postulate an ex vivo evolution of resistance in the environment as a result of exposure to agricultural chemicals-particularly sterol 14α-demethylation inhibitor fungicides developed in the 1970s 13,14 . Broadly, environmentally occurring azole resistance in A. fumigatus is characterized by signature mechanisms involving expression-upregulating tandem repeats (TRs) in the promoter region of cyp51A accompanied by point mutations within this gene, which decrease the affinity of azoles for the target protein; the most commonly occurring alleles are known as TR 34 / L98H and TR 46 /Y121F/T289A and are associated with high-level itraconazole and voriconazole resistance, respectively both inside and outside the clinic [15][16][17] . The spatially widespread occurrence of these alleles alongside increasing reports of more complex cyp51A resistance-associated polymorphisms [18][19][20] underpin the hypothesis that the broad application of agricultural azole fungicides is driving natural selection, amplification and ultimately acquisition of azole-resistant airborne A. fumigatus conidia by susceptible patients 21 . Furthermore, the potential for global spread of these resistance mechanisms through floriculture products, especially plant bulbs, has been demonstrated 22 , while the global dispersal of conidia on air currents is impossible to contain.

Infections caused by the fungal pathogen Aspergillus fumigatus are increasingly resistant to first-line azole antifungal drugs. However, despite its clinical importance, little is known about how susceptible patients acquire infection from drug-resistant genotypes in the environment. Here, we present a population genomic analysis of 218 A. fumigatus isolates from across the UK and Ireland (comprising 153 clinical isolates from 143 patients and 65 environmental isolates). First, phylogenomic analysis shows strong genetic structuring into two clades (A and B) with little interclade recombination and the majority of environmental azole resistance found within clade A. Second, we show occurrences where azole-resistant isolates of near-identical genotypes were obtained from both environmental and clinical sources, indicating with high confidence the infection of patients
Modern genomic epidemiological methods further indicate a potential link between the increasing clinical incidence of azole-resistant IA and the increasingly broad range of azole-resistant genotypes that are being reported in the environment 23 . The rate at which environmental resistance develops will be determined by natural selection on beneficial mutations. This is in turn influenced by recombination, gene flow and dispersal, which leave their characteristic signatures in the genome. Evidence for these expectations comes from a recent global study from our laboratory demonstrating the non-random distribution of azole resistance in multilocus microsatellite genotypes 24 .
In this study, we used whole-genome sequencing (WGS) of 218 A. fumigatus isolates (n = 65 environmental isolates and n = 153 clinical isolates) to interrogate the molecular epidemiology of this fungus and determine whether acquisition of drug-resistant isolates by at-risk patient groups occurs. We also leveraged the power of these data to perform genome-wide association studies (GWAS) and pan-genome analyses to identify the variation associated with itraconazole drug resistance, revealing potentially new mechanisms of resistance.

WGS of 218 A. fumigatus isolates.
In this study, we used reference-guided and de novo assembly methods to analyse the population genomics and pan-genome of 218 clinical and environmental A. fumigatus isolates, spanning the UK and Ireland (Supplementary Table 1). Of these 218 sequenced isolates, 153 (70%) were clinical in origin and the remaining 65 (30%) originated from environmental sources in the UK and Ireland. The genomic dataset can be accessed as a Microreact project 25 at https://microreact.org/ project/6KR8996ywtVRV5wm233YhP (Extended Data Fig. 1). A chi-squared test showed a significant bias towards the MAT1-2 idiomorph (P < 4.82983 × 10 −05 ), primarily seen in the environmental (P < 1.00183 × 10 −05 ) and to a lesser extent the clinical (P < 0.04396) populations (Supplementary Table 2). Only one isolate (ARAF005) displayed partial ploidy of chromosome 1.
Thirteen isolates reported raised MICs but displayed no known drug resistance polymorphisms (Supplementary Table 3).
A total of 329,261 sites were polymorphic (approximately 1.1%), a proportion that was in line with our previous study 21 . Pairwise identities showed that, on average, each isolate of A. fumigatus in this dataset differed from all others by 11,828 single-nucleotide polymorphisms (SNPs). Genotypes and their relative frequencies are shown in Supplementary Table 4.
Phylogenomics and signatures of selection associated with resistance. Phylogenetic analysis identified two broadly divergent clades ( Fig. 1 and Supplementary Fig. 2) with 100% bootstrap support; 'clade A' and 'clade B' contained 123 and 95 isolates, respectively (Supplementary Table 5). An additional 41 publicly available WGS of non-UK origin were added to the phylogeny to confirm that clade assignment (Extended Data Fig. 3 and Supplementary Table  6) was not an artefact of these data but also seen globally. Most (n = 99) resistance-associated genotypes and azole-resistant phenotypes clustered within clade A (Fig. 1, Supplementary Fig. 2 and Supplementary Table 5). Conversely, most isolates with no resistance polymorphisms and azole-susceptible phenotypes (n = 82) clustered into clade B. MICs for three azole drugs, as well as the occurrence of polymorphisms associated with the cyp51A gene, were significantly higher in clade A than in clade B (chi-squared test P = 3.27308 × 10 −14 , d.f. = 1). For TR-associated polymorphisms, 100% of the TR 34 and 71% (n = 5) of the TR 46 -associated alleles occurred within clade A.
Twenty-three per cent (n = 75,317 SNPs) of the total A. fumigatus diversity seen in the dataset occurred within clade A, despite comprising 56% of the isolates sampled. No SNPs were uniquely associated with the TR 34 /L98H or TR 46 polymorphisms. This mirrors earlier STRAf microsatellite analysis, illustrating that isolates harbouring drug resistance polymorphisms displayed reduced genetic diversity and were genetically depauperate compared to randomly selected WT isolates 24 . Multivariate methods were used to identify and describe clusters of genetically related isolates; principal component analysis (PCA) identified three optimal clusters, with no geographical or temporal clustering, which broadly corresponded to the phylogeny (Extended Data Fig. 4). Cluster 1 corresponded to a subset of clade A containing a broad selection of cyp51A polymorphisms; cluster 2 corresponded to clade B; and cluster 3 overlapped with isolates within clade A containing TR 34 /L98H only. Discriminant PCA and STRUCTURE confirmed the three clusters (Fig. 2b,c). Estimates of index of association and rBarD implied no significant linkage among the loci for all three clusters, indicating they are recombining with each other.
The fineStructure-linked coancestry model found the highest levels of shared genomic regions between isolates C4, C54 and C178 within clade B, providing evidence for strong haplotype sharing (Extended Data Fig. 5). These are all MAT1-2 idiomorphs. Overall, however, strong haplotype donation occurred mostly within clades; an exception to this observation was the strong haplotype donation between C178, C4, C54 (clade B isolates) and C365 (a clade A isolate).
The average fixation index (F ST ) value was 0.1312 (range: 0-0.944035; s.d. = 0.0823); average F ST values and ranges for each chromosome are detailed in Supplementary Table 7. Chromosome 1 displayed extremely variable F ST values (range: 0-0.9440) (Fig. 3b). In particular, a region of 590 kilobase pairs (kbp) on the right arm of this chromosome displayed an average F ST value of 0.2273 but with a range in F ST values from 0 to 0.944035, suggesting near panmixis in some parts of this region between clades A and B. Across this region, 184 genes were found (gene ID Afu1g15860 to Afu1g17640 (Supplementary Data 4)). Of these genes, 9 contained SNPs that were significantly associated with itraconazole resistance using treeWAS. Also within this region, three extremely high outlier F ST values were observed where the average F ST value was 0.9321 (range: 0.9238-0.9440) with the spanned regions containing 9 genes (Supplementary Table 8). Regions of higher than average F ST (approximately 0.5) were also observed in chromosomes 4 and 7 ( Fig. 3b), implying population subdivision. The average F ST value for the cyp51A region, found in chromosome 4, was 0.1193.
Highly variable Tajima's D values were observed for both clades (Fig. 3a), suggestive of differing patterns of demography and    Table 7). The region covering cyp51A appeared to have a lower than average D value (−1.02123; Fig. 3a) compared to the rest of clade A.
To determine the extent that variation in Tajima's D and F ST owe to intraclade population substructure, we subset the dataset into three clusters defined by PCA, discriminant PCA and STRUCTURE. Three-way F ST (Extended Data Fig. 5) between the three clusters showed regions in chromosome 1 approaching an F ST of 1, showing that these highly diverged alleles were present across clade A. Highly variable F ST values when comparing clusters 1 and 3 were observed (range: 0-0.786203; average: 0.1309), clusters 1 and 2 (range: 0-0.9322; average: 0.1577) and clusters 2 and 3 (range: 0-1; average: 0.1424). Estimating Tajima's D in clusters 1, 2 and 3 also found highly variable positive and negative values of D across all 3 clusters (Extended Data Fig. 6). The average value of D for all three clades was around zero, a marked departure from the previous analysis of solely clade A (cluster 1: −0.089152, range: −2.40694 to 4.68301; cluster 3: 0.01773, range: −2.62465 to 4.901) suggesting that the positive signature of selection in clade A was largely owed to population substructure.
Gene-phenotype associations mirror regions of high F ST . tree-WAS is a microbe-specific approach that utilizes a phylogeny-aware approach to performing genome-wide association while being robust to the confounding effects of clonality and genetic structure. The algorithm identified 2,179 significant loci using the subsequent test; of these 1,385 significant loci were located within chromosome 4 and 64% of significant loci (n = 1,391) were found to be intergenic. The 3 most significant loci (P < 2.03 × 10 −6 ) were all located on chromosome 4: 1 was intergenic (position 1,779,747), one was a non-synonymous SNP (S237F) in Afu4g07010 (position 1,816,171) and the third was the L98H substitution in cyp51A (position 1,784,968; Supplementary Data 1). Other significant loci mapped to genes involved in secondary metabolism, including fumitremorgin. There were also loci within Afu8g00230, encoding verruculogen synthase, which is associated with A. fumigatus hyphae and conidia modifying the properties of human nasal epithelial cells 26 . A single significant SNP was also identified in Aspf2 (Afu4g09580), which is involved in immune evasion and cell damage 27 .
Peaks of gene-phenotype associations on chromosomes 1, 4 and 7 mirrored regions of high F ST when comparing clades A and B (Fig. 3b) probably reflecting the impact of azole selection on these alleles. An exception to this was observed in chromosome 8, where F ST values did not peak above 0.370951 but treeWAS P values were significant. Significant loci within chromosome 8 are located in genes such as verruculogen synthase, polyketide synthase (PKS)-non-ribosomal peptide synthetase (NRPS) hybrid synthetase psoA (Afu8g00540) and brevianamide F prenyltransferase (Afu8g00210). Out of the 356 significant loci in chromosome 8, most (n = 354) were located within 500,000 bp of the start of the chromosome; 62% were intergenic.
To determine the link between significant loci identified by treeWAS and azole resistance, preliminary investigations were undertaken using 28 gene deletion mutants from the COFUN knockout collection where the corresponding loci were deleted. As expected, the null mutant ΔAFUB_063960 (cyp51A) was not able to fully grow in medium containing >0.06 mg l −1 itraconazole. The null mutant ΔAFUB_016810 (abcA) was unable to grow in medium containing >0.25 mg l −1 itraconazole (Fig. 3c).
The control strain and all other null mutants were unable to grow in medium containing >1 mg l −1 ( Fig. 3c and Table 1). Potentially, these genes may work in combination with others to confer drug resistance.
Clinical and environmental isolates are highly related. PCA and discriminant PCA showed a lack of genetic differentiation among clinical and environmental isolates, showing that clinical isolates are drawn from a wider environmental population (Fig. 2a). Phylogenetically, 6 pairs or groups of A. fumigatus contained both clinical and environmental isolates that were genetically very highly related (Supplementary  Table 2).
An additional cluster of very highly related A. fumigatus isolates consisting of 14 clinical and 14 environmental isolates all harboured the TR 34 /L98H cyp51A polymorphism and MAT1-2 idiomorph. This clonal clade sits within the larger clade A and within cluster 3 and will henceforth be referred to as clade A A (Fig. 1c). Isolates within clade A A were found to be broadly distributed across England, Wales, Scotland and Ireland, covering a spatial distance equivalent to the whole dataset (Fig. 1b). Clade A A appeared with high frequency, comprising 13% of the total dataset and 23% of the isolates found within clade A. The average number of SNPs separating the clade A A isolates was 294 SNPs (2.48% of total genetic  diversity) and had high bootstrap support (100%) within the phylogeny (Extended Data Fig. 2). Isolation by distance showed no significant correlation (observation = 0.0379) between genetic and geographical distances suggesting that the clade A A clone will have a broader extent than just the British Isles. Clade A A was significantly overrepresented (chi-squared test P = 2.75 × 10 −13 , d.f. = 1) for drug-resistant environmental isolates, compared to only 63% of environmental isolates in the rest of the dataset. In comparison, only 54% of clinical isolates within the dataset contained the TR 34 /L98H polymorphism. Two pairs of isolates from clinical and environmental sources in clade A A showed high genetic identity between clinical and environmental isolates-pairs 1 and 2 (Supplementary Table 9). Pair 1 showed the highest identity, with 227 SNPs separating environmental isolate C21 and clinical isolate C112. Nucleotide diversity (π) tests showed that the genetic diversity separating pair 1 isolates was significantly different to the mean nucleotide diversity within the clade A A isolate (one-tailed t-test P < 3.0473 × 10 −252 ; Supplementary Table 9), showing that pair 1 isolates were genealogically tightly linked and that this shared identity could not have occurred by chance occurrence alone.
Accessory gene content is associated with drug resistance. By utilizing the WGS performed in this study, we generated an A. fumigatus pan-genome of 218 isolates. De novo assemblies produced genome sizes between 27.3 and 31 Mb (average = 28.8 Mb) and had high contiguity, with a mean N50 of 12,843 bp and mean number of contigs of 5,652 (range: 5,039-8,754).
The pan-genome was validated against 55 essential Af293 genes 28 ; 53 of these were found in >99% of isolates. The rest were found in 156 (AFUA_2G09060) and 173 (AFUA_4G11280) isolates but all were discovered (Supplementary Table 10). In total, our pan-genome identified 13,206 genes, comprising 6,705 core genes and 6,501 accessory genes. Approximately 50% of genes were shared between all isolates and 1,592 genes were isolate-specific (approximately 12% of the pan-genome) with an average of 7 new genes per isolate. Tettelin's model 29  indicating that the total number of genes within the pan-genome had not reached saturation for the 218 genomes (Fig. 4a).
A total of 10,810 genes were found at least once in isolates from clades A and B, whereas 863 and 1,533 genes were only present in isolates from clades A and B, respectively (Fig. 4b); 140 genes were statistically associated with clade A and 170 with clade B (Supplementary Data 5). Thirteen genes were found within clade B itraconazole-resistant isolates with WT cyp51A, indicating the presence of clade-specific genes (Supplementary Data 6). In contrast, clade A included 37 genes that also associated themselves with either itraconazole resistance (n = 19), itraconazole resistance with cyp51A polymorphism (n = 2) or both (n = 16) (Supplementary Data 7). No genes affiliated with itraconazole-susceptible isolates were found within clade A (Fig. 4c). This finding indicates that enriched genes in clade A were strongly associated with polymorphisms in cyp51A compared to the gene content of clade B where enriched genes were associated with alternative unmapped resistance mechanisms.

Discussion
Breakthrough infections by azole-resistant A. fumigatus have been observed with striking increases across northern Europe, where the incidence has increased from negligible levels pre-1999 up to a Our genomic analysis of environmentally and clinically sourced A. fumigatus yielded four main findings. First, we identified strong genetic structuring into two clades A and B, with most environmentally occurring azole resistance alleles segregating inside clade A and showing signatures of selection at multiple loci, some of which are known to adapt in response to selection by fungicides. Phylogenomic analysis confirmed that the population of A. fumigatus was not panmictic; it is structured into a characteristic 'dumb-bell' phylogeny marked by two clades with limited interclade recombination. We used four hypothesis-free population genetic methods, PCA, discriminant PCA, STRUCTURE and fineStructure to independently confirm the existence of the two clades, as well as the occurrence of strong genetic subclustering within clade A and evidence of widely occurring clonal genotypes (for example, clade A A ). Clade A contained most isolates (88%) harbouring polymorphisms in cyp51A and associated with drug resistance compared to clade B, which was predominantly WT for cyp51A. Phenotypically defined resistance recovered a similar pattern with 78% of itraconazole MICs above clinical breakpoints being compartmentalized into clade A. Further analysis using PCA, discriminant PCA and STRUCTURE showed that clade A is divided into 2 subclusters, clusters 1 and 3. All TR 34 /L98H polymorphisms were found within cluster 3 while cluster 1 contained isolates with a variety of cyp51A polymorphisms. Interestingly, isolates containing the TR 46 /Y121F/T289A polymorphism were found in both clade B and clade A cluster 1 but not clade A cluster 3. These analyses suggest that TR-associated azole resistance has evolved a limited number of times and recombination has not yet had the impact of homogenizing these alleles across the wider A. fumigatus phylogeny. Similar conclusions were drawn by Camps et al. 31 based on microsatellite and cell surface protein marker analysis of European isolates, who also suggested that the TR resistance form had developed from a common ancestor or restricted set of genetically related isolates. Inclusion of non-UK publicly available data confirmed the two-clade structure, with most environmentally associated drug resistance alleles occurring within clade A. Additionally, other non-UK studies have assigned isolates to clades A and B, further confirming the two-clade structure globally 24,32 . However, future work on global collections of A. fumigatus that have been collected across longer periods of time than in the current study will be needed to trace and date the spatiotemporal origins of these alleles.
fineStructure enabled the confirmation of three subpopulations within this dataset, which corresponded to clades A, B and A A , and the presence of a subtle population substructure, which has been described in other studies 33 . Strong donation of haplotypes between the clade B isolates C4, C54 and C178 and the clade A isolate C365 is indicative of recent recombination between these two clades, highlighting the potential for recombination-driven introgression of resistance alleles to occur. Previous studies have confirmed that asexual reproduction facilitates the emergence of TR 34 /L98H within A. fumigatus 34 ; this mechanism may also facilitate the emergence of new resistance alleles across the population. However, the relative impact of sexual/asexual recombination as mechanisms responsible for facilitating the emergence and/or spread of resistance alleles is to be determined. The lack of a reliable molecular clock for A. fumigatus, combined with evidence of recombination between the clades, currently hinders the dating of the time of emergence of resistance alleles.
Second, we observed multiple exemplars of drug-resistant genotypes in the clinic that matched those in the environment with very high identity. Since patients have never convincingly been shown to transmit their A. fumigatus to the environment, this finding demonstrates that at-risk patients were infected by isolates that have pre-acquired their resistance to azoles in the environment. The statistically significant association between the azole-resistant isolates C354, C355 and C357, two clinical isolates and an environmental isolate, respectively, isolated within the same city, coupled with their very high genetic similarity suggests an environment-to-patient acquisition of azole-resistant A. fumigatus with very high confidence. The low number of SNPs separating the isolates and nucleotide diversity tests showed that these isolates are genealogically tightly linked and were strongly supported phylogenetically with 100% bootstrap support. This shared identity was statistically significant (P < 3.0473 × 10 −252 ) and could not have occurred by chance alone. Our PCA analysis showed that clinical isolates were drawn from a wider environmental diversity, with a lack of genetic differentiation among isolates in these populations (Fig. 2a). The presence of other significant shared identity between clinical and environmental isolates (Supplementary Table 8) demonstrates that this environmental-to-clinical acquisition via inhalation of fungal spores is not a rare occurrence.
Third, gene-phenotype associations mirrored regions of high F ST , suggesting that these itraconazole-resistant phenotypes are linked not only to regions of the genome under selection but also to the genetic structure that we observed. We found a striking congruence where significant peaks of gene-phenotype associations on chromosomes 1, 4 and 7I mirrored regions of high F ST when compared to clades A and B (Fig. 3a). It appears that fungicide-associated resistance is driving the patterns of evolution and is the most likely explanation for much of the observed genetic architecture. treeWAS also identified significant SNPs in genes involved in secondary metabolism. Recent research has shown that secondary metabolites combat the host immune system and aid growth in the host (human) environment 35 . Our results identified four SNPs in the gene encoding fumitremorgin C monooxygenase (Afu8g00240), which are part of the pathway for secondary metabolite fumitremorgin C, a mycotoxin that acts as a potent ATP-binding cassette sub-family G member 2/breast cancer resistance protein inhibitor that reverses multidrug resistance 36 . This pathway has also been suggested as regulating the brevianamide F gene cluster 37 ; two non-synonymous SNPs in the gene encoding brevianamide F prenyltransferase (Afu8g00210) were also identified as significant. Further complexity was observed in chromosome 8, where F ST values did not peak above 0.370951 but treeWAS P values were highly significant. Significant loci within chromosome 8 were located in genes such as verruculogen synthase, PKS-NRPS synthetase psoA (Afu8g00540) and brevianamide F prenyltransferase (Afu8g00210). Future work now needs to widen our search and focus on the use of reverse functional genomic approaches to interrogate the function of these genes within the context of their potentially epistatic interrelationships with the canonical ergosterol biosynthesis cyp51A azole resistance alleles on chromosome 4.
Finally, through pan-genome analysis, we showed that A. fumigatus has an extensive, variable accessory genome that expands with the addition of each new genome (Fig. 4a). These results are in contrast to previous studies 38,39 , possibly due to differing numbers of isolates occupying different geographies and habitats, as well as alternative bioinformatics pipelines 38,39 . From the analysis of the accessory gene content, we found that genes linked to itraconazole resistance and/or known resistance-associated cyp51a polymorphisms were found strictly within clade A (Fig. 4c). Within clade B, we found evidence for genes in itraconazole-resistant isolates that contained no known cyp51a drug resistance polymorphisms. However, a large proportion of these genes were also found in itraconazole-susceptible isolates (11 out of 13, 85%) (Fig. 4c). These findings point to the existence of new resistance mechanisms with an underlying polygenic basis and show that further understanding the role of accessory genes in generating drug-resistant phenotypes is needed. Such studies may also shed light into the extent that resistance in A. fumigatus may be influenced by additional factors, such as the epigenetic-mediated resistance observed in Mucor circinelloides 40 .
Therefore, this study supports the hypothesis that the widespread use of azole fungicides in agriculture is coupled to widespread isolation of azole-resistant A. fumigatus from environmental sources 14 . These isolates, in turn, bear hallmark multilocus genotypes that are indistinguishable to those recovered from patients, supporting our conclusion that adaptation to fungicides in the environment leads to transmission of A. fumigatus-bearing azole resistance genotypes without obvious detriment to their clinical fitness 41,42 . The identification of spatially widespread A. fumigatus clones that are not only resistant to azoles but are also highly represented in both the environment and clinic suggests that there are few fitness costs associated with this phenotype. Respiratory viruses such as H1N1 influenza are known to predispose critically ill patients to secondary mould infections 2 and case studies increasingly show that similar infections are experienced by patients with COVID-19 (ref. 3 ). Therefore, the growing numbers of susceptible individuals underscores the need to more fully understand the risk posed by environmental reservoirs of pathogenic fungi that, primarily through the use of agricultural antifungals, have evolved resistance to first-line clinical azoles.

Fungal isolates.
A total of 218 isolates were included in this study, spanning 12 years (2005-2017), covering a spatial range of 63,497 miles 2 in England, Wales, Scotland and Ireland (Fig. 1b); 153 clinical A. fumigatus isolates from 5 participating centres were included. Patients either had respiratory colonization with A. fumigatus or were suffering from different manifestations of aspergillosis and had the following underlying disorders (Supplementary Table 1): CF (64%); other conditions (11%); unknown (25%). Environmental isolates (n = 65) were collected from the following sources: soil (72%); plant bulbs (12%); air (3%); compost (2%); and unknown (11%). Briefly, plant bulbs were purchased from a garden centre in Dublin, Ireland and swabbed and plated onto Sabouraud dextrose agar 22 . Soil samples were collected from 16 sites across south England between May and July 2018; locations were selected to incorporate a range of habitat types. Dry soil was loosened and collected into a 5 ml Eppendorf tube (Eppendorf AG) 43 . Soil and air samples from south Wales were collected from June to November 2015 in urban and rural locations 44 .
Many isolates from both clinical and environmental sources were specifically selected for WGS because they displayed phenotypic azole resistance (raised MICs to at least one triazole drug using EUCAST or CLSI) and did not constitute a randomized sample. We report MICs for itraconazole (n = 200), voriconazole (n = 201) and posaconazole (n = 177) in Supplementary Table 7.
All referred isolates were identified by phenotypic morphology because the A. fumigatus species complex was based on colonial morphology and microscopic characteristics. Isolates were cultured on Sabouraud dextrose agar (Oxoid) and malt extract agar (Sigma-Aldrich) at 37 °C ± 2 °C for 5-7 d in the dark. The adhesive tape technique was used for microscopic examination of fungal cultures. Slides were prepared with lactophenol cotton blue as the mounting and staining fluid. The Atlas of Clinical Fungi (https://www.clinicalfungi.org) was consulted as an identification reference. In addition, growth at 45 °C was used to exclude most cryptic species within section Fumigati. Isolates with elevated azole MICs were confirmed to be A. fumigatus by matrix-assisted laser desorption ionization-time of flight mass spectrometry, performed with a Microflex LT system (Bruker) using the Biotyper 3.0 software with the additional fungi library (Bruker) according to the manufacturer's recommendations or as described by Dunne et al. 22 . Exact identification of azole-resistant A. fumigatus isolates from two participating centres in London, UK, was confirmed by sequencing the partial calmodulin gene (CaM locus) as described previously 45 . Antifungal susceptibility testing was completed as part of the original sampling studies or determined according to the standard EUCAST 46 or CLSI M38-A2 broth microdilution methods 47 . MICs for itraconazole and voriconazole were determined for 92% of isolates (n = 200 and 201, respectively). MICs for posaconazole were determined for 81% of isolates (n = 177). Seventeen isolates (8%) were not tested for susceptibility and therefore have no recorded MICs for any antifungal drug.

DNA preparation and WGS.
High-molecular-weight DNA was extracted from all 218 isolates and quantified. Briefly, high-molecular-weight DNA was extracted using the MasterPure Yeast DNA Purification Kit (Epicentre Biotechnologies) with bead beating with 1.0 mm zirconia/silica beads (BioSpec Products) in a FastPrep-24 system (MP Biomedicals) at 4.5 m s −1 for 45 s. Genomic DNA was quantified with a Qubit 2.0 fluorometer and dsDNA BR Assay Kit (Thermo Fisher Scientific) and quality-controlled with a TapeStation 2200 (Agilent Technologies) and gDNA ScreenTape assays (Agilent Technologies). Genomic DNA libraries were constructed with the Illumina TruSeq Nano Kit (Illumina) at the Natural Environmental Research Council (NERC) Biomolecular Analysis Facility (BAF), University of Edinburgh, UK (http://genomics.ed.ac.uk/). Prepared whole-genome libraries were sequenced on an Illumina HiSeq 2500 sequencer at BAF, generating 150 bp paired-end reads in high output mode.
Phylogenetic analysis. Whole-genome SNP data were converted into presence/ absence of an SNP with regard to the reference. SNPs identified as low confidence in the variant filtration step were converted to missing data. These data were converted into relaxed interleaved Phylip format and maximum-likelihood phylogenies were constructed to assess sequence similarity between isolates using rapid bootstrap analysis over 1,000 replicates using the GTRCAT model of rate heterogeneity in RAxML 56 v.8. Previous studies used nucleotide diversity (π) as a metric to test whether pairs of isolates are epidemiologically linked to infer transmission 58,59 . Nucleotide diversity (π) tests were implemented in VCFtools 60 v.0.1.13.
Genetic similarity and population allocation was investigated via hypothesis-free approaches. PCA and discriminant PCA 61 were performed to interrogate the relationship between clinical and environmental isolates based on SNP data using the R package adegenet 57 v.2.1.1. Genetic clusters were allocated based on identifying the optimal number of clusters (k) corresponding to the lowest Bayesian information criterion (BIC). STRUCTURE v.2.3.4 (ref. 62 ) was used to independently investigate the population structure assignments that were predicted by discriminant PCA and PCA.
We analysed a coancestry matrix based on whole-genome SNP data to assign individuals to populations via model-based clustering using fineStructure 63 v.2.0.7. fineStructure uses chromosome painting to identify important haplotypes and describe shared ancestry within a recombining population. These analyses were performed using a pan-clade genome-wide SNP matrix to infer recombination, population structure and assignment, and the ancestral relationships of all lineages. ChromoPainter reduced the SNP matrix to a pairwise similarity matrix under a linked model, using information on linkage decay and reducing the within-population variance of the coancestry matrix relative to the between-population variance.
Sliding window population genetic statistics (Tajima's D, nucleotide diversity (π) and F ST ) were calculated for non-overlapping windows of 10 kb using VCFtools 60 v.0.1.13, with resulting graphics produced in R v.3.5.3 using ggplot2. We measured the signatures of genome-wide population differentiation via the F ST analysis of non-overlapping 10 kb windows by comparing isolates within clade A against those isolates from clade B and for each cluster identified by PCA, discriminant PCA and STRUCTURE; the F ST is a measure of population differentiation due to genetic structure, with values ranging from 0 (implying complete panmixis) to 1 (no genetic diversity is shared between the two populations). Tajima's D 64 measures departures from neutral expectations and selection.
The index of association and rBarD are commonly used to estimate linkage disequilibrium. These two statistics were calculated using Poppr v.2.8.5 (ref. 65 ) in R v.3.5.3 using 999 resamplings of the data under the null hypothesis of no linkage disequilibrium and that no recombination would not be rejected.
Identifying loci associated with itraconazole resistance. Loci associated with itraconazole resistance were identified using treeWAS 66 , a method recently developed to address challenges specific to microbial GWAS. treeWAS uses phylogenetic information to correct for the microbial population structure; therefore, we used the phylogeny presented in this study along with a nucleotide alignment for all 218 isolates. treeWAS was performed for all isolates with itraconazole MIC information and a binary phenotype, categorized as itraconazole MIC defining susceptibility as MIC < 2 mg l −1 and resistant as MIC ≥ 2 mg l −1 (n = 200) with a P cut-off of 0.01 for 3 tests of association (subsequent, simultaneous and terminal) between azole susceptibility phenotype and genotype. However, only the subsequent test results were used because this test was previously defined by Collin and Didelot 66 as most effective at detecting subtle patterns of association. A phylogeny for these isolates was reconstructed as previously described with 281,874 SNPs common to 1 or more isolates with reported itraconazole MIC. The phenotype information was encoded as a discrete vector based on above the MIC breakpoint for itraconazole (and therefore resistant) or below (susceptible). Significant SNPs common to all three tests (Supplementary Data 1-3) were combined and were mapped to their respective genes via FungiDB release 50 beta 67 .
Drug sensitivity screening. Null mutants were obtained from the COFUN genome-wide knockout collection 68 and the COFUN transcription factor knockout library 69 . MFIG001 (A1160p+) was used as the parental isolate 70  GeneMark-EP and ProtHint 76 were combined into a semi-supervised gene calling pipeline that utilized a set of know proteins to predict potential orthologues within a genome. Orthologous proteins were obtained from OrthoDB 77 v.10.1 and were used to train ProtHint v.2.6.0; GeneMark-EP v.4.68 was used to predict genes on isolate genomes. Assembly statistics optimized the minimum contig size for prediction; assemblies whereby contigs >5 kb covered <40% of the genome had a minimum contig size set to 5 kb; otherwise, the minimum contig size was 1 kb (--max_intro 200 --max_intergenic 100000 --min_contig=1000 --min_ contig_in_predict = (5000/1000)). Proteins were subsequently extracted using the GeneMark-EP built-in script.
To uncover commonly shared genes and enable pan-genome construction, a protein BLAST 78 v.2.9.0 database was produced and an all-versus-all BLASTP search was performed in parallel. The BLAST results for all isolates were combined and used as input into the pan-genome builder PanOCT 79 v.3.23 (-S Y -L 1 -M Y -H Y -V Y -N Y -F 1.33 -G y -c 0,25,50,75,100 -T). The resulting pan-genome was interrogated for the total number of protein-coding genes; a presence-absence matrix was constructed for each gene in all isolates. Correct construction of the pan-genome was assessed by mapping 55 previously identified Af293 essential genes 28 to the pan-genome using BLAST. Extraction of these gene sequences was performed using the R package biomaRt 80 v.2.44.4. Pan-genome clusters were then allocated a protein identifier by first mapping to the Af293 proteome with BLAST (-max_hsps 1 -perc_identity 70 -qcov_hsp_perc 50) then mapping to the RefSeq non-redundant database using DIAMOND 81 v.2.0.13 (-e 0.00001--query-cover 50--subject-cover 50--max-hsps 1--max-target-seqs 1--id 70); finally, conversion of protein identifier to gene accession was performed using Entrez Direct 82 v.1.6.2.
A custom script was used to extract pan-genome statistics. The R package micropan 83 v.2.1 was used to model the openness of the pan-genome using Heaps' law as described by Tettelin et al. 29 , with the number of permutations set to 1,000. Data generated by this model were plotted with R v.4.0.2.
The metadata produced in this study were correlated with the gene variation matrix to identify genes associated with clade structure, itraconazole MIC values and cyp51a polymorphisms. Correlations were carried out using Scoary 84 v.1.6.16, supplied with the phylogenetic tree generated in this study. Associations were considered significant if the Bonferroni P < 0.05 and assigned to either binary trait by odds ratio (OR). Subsequent Venn diagrams were produced using Eulerr 85 v.6.1.0. All parameters for the prior pipeline were default ones unless stated otherwise.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All raw reads have been submitted to the European Nucleotide Archive under accession no. PRJEB27135. Six isolates (ARAF001-6) were sequenced as part of Abdolrasouli et al. 21 , with reads deposited under accession no. PRJEB8623. Source data are provided with this paper.

Code availability
Custom scripts for the pan-genome analysis can be found at https://github.com/ harrychown/asp_pan.

Corresponding author(s):
Last updated by author(s):

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size ( ) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. , , ) with confidence intervals, effect sizes, degrees of freedom and value noted The data were generated via the Illumina HiSeq sequencing platform For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability Source data are provided in the paper, its supplementary files, and a Source Data file for Figures. The data are deposited in the .

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences Ecological, evolutionary & environmental sciences