Increased power from conditional bacterial genome-wide association identifies macrolide resistance mutations in Neisseria gonorrhoeae

The emergence of resistance to azithromycin complicates treatment of Neisseria gonorrhoeae, the etiologic agent of gonorrhea. Substantial azithromycin resistance remains unexplained after accounting for known resistance mutations. Bacterial genome-wide association studies (GWAS) can identify novel resistance genes but must control for genetic confounders while maintaining power. Here, we show that compared to single-locus GWAS, conducting GWAS conditioned on known resistance mutations reduces the number of false positives and identifies a G70D mutation in the RplD 50S ribosomal protein L4 as significantly associated with increased azithromycin resistance (p-value = 1.08 × 10−11). We experimentally confirm our GWAS results and demonstrate that RplD G70D and other macrolide binding site mutations are prevalent (present in 5.42% of 4850 isolates) and widespread (identified in 21/65 countries across two decades). Overall, our findings demonstrate the utility of conditional associations for improving the performance of microbial GWAS and advance our understanding of the genetic basis of macrolide resistance.

. The overall proportion of variance explained by the model was 66.5%. The variance explained by a predictor was determined by the change in model R 2 after inclusion of that predictor. Three approaches were used to calculate this change: the "first" metric compares a model without any predictors to a model with just the predictor of interest, the "last" metric compares a model with all predictors except the one of interest to a model with all predictors, and the "lmg" method averages the change in R 2 over all possible model subsets.

Supplementary Figures
Supplementary Figure 1 -High genetic linkage between significant azithromycin MICassociated variants in the GWAS. The recombination-corrected phylogeny from Figure 1 was annotated with the presence and absence of significant variants from the GWAS corresponding to 23S rRNA, hprA, WHO_F.1254, and ydfG (outermost to innermost). Branch length represents total number of substitutions after removal of predicted recombination.

Supplementary Figure 2 -Distribution of r 2 values between significant variants (p-value < 2.97×10 -7 ) and 23S rRNA-associated unitigs in the single-locus GWAS.
Significant variants with high linkage to 23S rRNA are likely to be spurious associations. See methods for details on calculation of r 2 .
Supplementary Figure 3 -GWAS conditional on 23S rRNA mutations compared to unconditional GWAS results recovers similar results as in Figure 1 but does not control for dataset-specific confounders (spr and the intergenic region between genes WHO_F.2279 and 2280). As in Figure 1, genetic linkage measured by r 2 to 23S rRNA mutations A2059G and C2611T for significant variants is colored as indicated on the right. Variants associated with previously experimentally verified resistance mechanisms in the mtrR and mtrCDE promoters and coding regions are denoted in the legend. Bonferroni thresholds, calculated using the number of unique patterns, for both GWASes are depicted using a dashed line at 3.38×10 -7 . Plot axes are limited to highlight variants associated with lower-level resistance; as a result, the highly significant 23S rRNA substitutions and mtrC indel mutations are not shown.
Supplementary Figure 4 -RplD G70 is part of the azithromycin binding pocket in the 50S ribosome from Thermus thermophilus (PDB ID: 4v7y). N. gonorrhoeae RplD is relatively similar to its T. thermophilus homolog (28.4% identical, 49.1% similarity using a BLOSUM62 matrix over 218 amino acids with 20 insertions/deletions). PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC) was used to depict azithromycin in orange and RplD in blue (with the G70 amino acid highlighted as blue spheres) and to hide the 23S rRNA for clarity.

Supplementary Figure 5 -Growth curve experiments for RplD G70D isogenic strains
show no in vitro fitness cost. Data are presented as mean values +/-SD calculated across the three technical replicates. Top -Calculated CFUs / mL from the full growth experiment for the two strains graphed on a logarithmic axis. Bottom -estimation of exponential phase best fit lines using GraphPad Prism following removal of lag phase data points and log-transformation of CFUs / mL; see Supplementary Table 2 for estimated parameters.