Distribution of the type III DNA methyltransferases modA, modB and modD among Neisseria meningitidis genotypes: implications for gene regulation and virulence

Neisseria meningitidis is a human-specific bacterium that varies in invasive potential. All meningococci are carried in the nasopharynx, and most genotypes are very infrequently associated with invasive meningococcal disease; however, those belonging to the ‘hyperinvasive lineages’ are more frequently associated with sepsis or meningitis. Genome content is highly conserved between carriage and disease isolates, and differential gene expression has been proposed as a major determinant of the hyperinvasive phenotype. Three phase variable DNA methyltransferases (ModA, ModB and ModD), which mediate epigenetic regulation of distinct phase variable regulons (phasevarions), have been identified in N. meningitidis. Each mod gene has distinct alleles, defined by their Mod DNA recognition domain, and these target and methylate different DNA sequences, thereby regulating distinct gene sets. Here 211 meningococcal carriage and >1,400 disease isolates were surveyed for the distribution of meningococcal mod alleles. While modA11-12 and modB1-2 were found in most isolates, rarer alleles (e.g., modA15, modB4, modD1-6) were specific to particular genotypes as defined by clonal complex. This suggests that phase variable Mod proteins may be associated with distinct phenotypes and hence invasive potential of N. meningitidis strains.


Results
Distribution of mod genes and alleles. To investigate the distribution of mod genes and alleles in N. meningitidis (Fig. 1), 1,689 isolates were surveyed, comprising 211 carriage and 1,478 disease isolates from four collections originating in diverse geographic locations (including the USA, UK, Czech Republic, and Australia) and time periods . These analyses determined that a modA gene was present in all isolates examined (although two isolates possessed only fragments of modA). A modB gene was identified in 78% (1,298) of isolates, while a modD gene was present in only 25% (423) of isolates (Fig. 2). Overall, 364 isolates contained modA only, 900 isolates contained both modA and modB, and 398 isolates contained modA, modB and modD genes. The number of DNA repeat units among isolates ranged from 2-34 tetranucleotide repeats in modA; 2-28 pentanucleotide repeats in modB; and 2-15 pentanucleotide repeats in modD (Fig. 1). Phase variation of tetranucleotide repeat tracts containing ≥ 3 repeat units occurs at a high frequency 37 , and the majority of these mod genes are predicted to be phase variable (> 98% of fully assembled alleles contain ≥ 3 repeat units).
For each of the three mod genes, common and rare mod alleles (DRD variants) were found (Figs 1 and 2). The majority of modA positive isolates contained modA12 (1,159 isolates, 70%) or modA11 (456 isolates, 27.5%), with modA15 comprising 2% of modA positive isolates (38 isolates) (Figs 1b and 2b). All other modA alleles were found at low frequency in the dataset, two of which, modA2 and modA6, had not been reported in N. meningitidis before. In addition, several minor variations in the conserved regions of the modA11 and modA12 were seen among the isolates. The most frequent of these was a 15-nucleotide deletion (encoding [S(A/V)KNQ]) in the region encoding the C-terminus of ModA (Fig. 1b), found in 67% of modA11 alleles and 59% of modA12 alleles Scientific RepoRts | 6:21015 | DOI: 10.1038/srep21015 (Fig. 2b). This deletion removed 5 amino acids from the full length of the protein. In addition, the N-terminal, phase variable DNA repeat sequences were altered in some isolates. The typical modA repeat unit in N. meningitidis was 5′ -AGCC-3′ , however modA11 in 27 isolates (6% of modA11 isolates; 1.6% of total modA positive isolates) had 5′ -AGTC-3′ repeats (Fig. 2b).
The majority of modD positive isolates contained the modD1 allele (316 isolates, 75%), or modD6 (72 isolates, 17%; Figs 1d and 2d). All other modD alleles were found at low frequency, including modD3 that has not been reported in N. meningitidis before, modD6 that was previously identified in N. meningitidis 6938 but mis-categorized as modD2 38,39 , and modD7 that has not previously been identified. The novel modD7 allele was present in 5 isolates (1.2% of modD positive isolates), and shared 7-15% identity with other modD alleles over the length of the DRD (Fig. 1d). In addition, this allele had an extended DRD, with an additional 47 amino acids at the C-terminal end of the DRD, compared to other modD alleles (Fig. 1d).
Associations among different mod alleles were also identified. For example, of those meningococci containing only modA and modB, 71% of isolates with modB1 also contained modA11, while 96% of isolates with modB2, and 89% of isolates with other modB alleles, were associated with modA12. Furthermore, 86% of isolates with a modD gene also contained modA12 and modB2, which rose to 97% in isolates containing the modD1 allele.
Specific mod genes and combinations are associated with invasive or carriage meningococci. Given that different mod alleles regulate distinct sets of genes (phasevarions) that affect the phenotype of the isolate, the distribution of mod alleles relative to the isolate's disease outcome (i.e., invasive disease or carriage) was considered. Several associations were observed among mod alleles and invasive or carried meningococci, and these associations were particularly strong for atypical alleles and allelic combinations. For example, modA11 and modB2 were more commonly associated with invasive, rather than carriage, isolates (respectively, 29% vs. 13% for modA11 (p < 0.0001); 40% vs. 28% for modB2 (p = 0.002); Table 2). The modA12-modB2-modD1 combination was associated with cc41/44 invasive isolates (20% invasive vs. 0% carriage, p < 0.0001), while carriage isolates from this clonal complex possessed modA12-modB1 but lacked modD1 (87% carriage vs. 9% invasive with modA12-modB1, p < 0.0001) (Fig. 3). The modA12-modB4 combination was associated with cc213 invasive isolates and was only found in two carriage isolates of different sequence types (6.5% of invasive isolates vs. 1% carriage, p = 0.0004). Also, the modA11-modB1 combination was associated with invasive isolates, and present in hyperinvasive lineages cc32 and cc269 (22% of invasive vs. 2.4% carriage isolates, p < 0.0001). The atypical modA15 allele was usually found alone and in cc92 carriage isolates (12% of total carriage isolates vs. 0.2% of invasive isolates, p < 0.0001).

Discussion
The phase variable type III DNA methyltransferases encoded by the modA, modB and modD genes are a global control mechanism by which N. meningitidis can randomly alter the expression of distinct sets of genes, known as phasevarions 29 . Several alleles of each of the mod genes have been characterized, based on differences in the region encoding the DNA recognition domain (DRD), each of which methylates a different sequence and mediates the epigenetic regulation of different sets of genes 28,30 . Many isolates contain multiple mod genes (i.e., have multiple phasevarions) which can phase vary independently. This enables the reversible induction of polygenetic phenotypes, which increases bacterial adaptability to distinct ecological environments and may affect invasive capacity. For example, the phase variable ON/OFF switching of meningococcal ModD1 alters resistance to oxidative stress 27 , while phase variation of ModA11 and ModA12 results in altered antibiotic resistance 40 . Similarly, previous studies of the gonococcal ModA13 phasevarion have shown that ON/OFF variants have distinct phenotypes for biofilm formation, resistance to antimicrobials, and survival in primary human cervical epithelial cells 28 . Our analysis of the distribution of mod genes and alleles in meningococci isolated from carriage and invasive disease revealed high levels of diversity in type III DNA methyltransferases and their respective DRD regions, and identified associations between mod alleles and clonal complexes. These clonal complexes included both hyperinvasive lineages, responsible for the majority of disease worldwide (e.g., cc11, cc32, cc41/44, and cc269), as well as those that are rarely associated with invasive disease (e.g., cc35, cc92, cc106 and cc116) 7-10,41 . While some alleles, such as modA12, were found in both hyperinvasive (cc11 and cc41/44) and non-invasive (cc106) lineages, some alleles were more commonly associated with one or the other. For example, in terms of hyperinvasive lineages, modA11 was associated with cc32 isolates, modB1 with cc269 and cc32 isolates, and modD1 with cc41/44 isolates. This is consistent with previous studies conducted in smaller datasets 27,28 . The modA11-modB1 combination was associated with hyperinvasive lineages cc32 and cc269; and the modA12-modB2-modD1 combination with the hyperinvasive ST-41 cluster. On the other hand, the modA15 allele was predominantly found in isolates of the cc92 lineage, which is a non-invasive lineage. Furthermore, the distribution of mod alleles relative to the disease outcome or phenotype of each isolate (i.e., invasive or carriage) showed associations of atypical alleles and allelic combinations with invasive or non-invasive meningococci. For example, modA11-modB1, modA12-modB2-modD1 and modA12-modB4 combinations were associated with invasive isolates, while modA15 or the modA12-modB1 combination were associated with isolates from carriage collections. While the mod alleles could be considered a marker of clonal complex rather than infection outcome, it is important to note that there is not always a strict correlation between clonal complex and infection outcome. This is highlighted within the cc41/44 lineage that contains two central STs, where ST-41 meningococci are commonly associated with invasive disease, while ST-44 are typically associated with carriage 10 . The distribution of modD1 in this clonal complex is specifically associated with invasive disease; modD1 is present in 84% of cc41/ 44

Table 2. Distribution of individual alleles and allele combinations in carriage and invasive meningococci.
a Total number in the dataset with each of the most frequent modA, modB and modD alleles, or allele combinations. (− ) indicates the mod gene is absent. b % (total number) of carriage or invasive isolates with the mod allele or allele combination, calculated as a proportion of total carriage or invasive isolates, respectively. For these analyses, alleles/isolates were included if full allele data was available, and if the allele was considered to encode a potentially functional methyltransferases (i.e., contained no point mutations or insertions/ deletions (other than insertions/deletions in the phase variable repeat region); e.g., an allele profile of modA11 -modB1::IS1301 was considered to only contain a putatively functional modA11). Total number of isolates included in analyses: 211 carriage and 1478 invasive isolates for allele distribution; 211 carriage and 1448 invasive isolates for allele combinations (30 invasive isolates removed due to incomplete allele combination data). c Predominant clonal complexes containing the allele or allele combination. Clonal complexes are only included if they contain > 10 isolates in the database. Sequence types not assigned to a clonal complex are excluded. d p-value for the association of the given the allele or allele combination with either carriage or invasive isolates, calculated using the Fisher's Exact test (two-tailed). experimental work will clarify whether the observed associations directly correspond to a difference in the phenotype and virulence of strains. An increased focus should also be placed on collecting and sequencing carriage isolates, as to date these are underrepresented in the available meningococcal isolate panels. Several questions remain regarding the evolution of mod diversity, the frequency of mod allele mobilization, and whether the repertoire is being neutrally inherited, or selectively maintained. These data suggest that the more commonly found alleles (e. g., modA11, modA12, modB1 and modB2), along with their allelic combinations, may have been acquired early in the evolution of N. meningitidis, and have been successively inherited by vertical transmission from progenitor cells that have differentiated into multiple clonal complexes. The rarer alleles, and differences in mod allele combinations, may have arisen from more recent horizontal gene transfer (HGT) and recombination events common to Neisseria 13,38,39,[41][42][43] . For example, the discovery of single isolates containing the modA2, modA4, modA6, and two isolates with modD3 alleles is consistent with HGT in N. meningitidis. Transfer and replacement of alleles is possibly enhanced by the structure of the mod genes, as the conserved flanking regions may facilitate homologous recombination and replacement of the central variable DRD. This has previously been reported for the modA gene, with sequence analysis suggesting some DRDs originated from other bacterial species 33 . Similarly, the distribution of several alleles (e.g., modA2, modA6, modD3 identified in N. meningitidis for the first time) suggests horizontal mobilization of DNA from outside the species [44][45][46] , as these alleles have been previously identified in H. influenzae, N. polysaccharea and N. lactamica 27,29,33 . In order to investigate the temporal and geographical nature of the associations seen, more diverse and long-term isolate collections are needed. However, it can be noted that the some associations, such as that seen between cc11 isolates and modA12 and modB1, are consistent in all the collections over the time period, while others are more specific to certain locations but may reflect the invasive or carriage focus of the collections, for example, the cc92 carriage isolates associated with the modA15 allele are largely isolates from 1993 from the Czech carriage study: whereas modD1 is associated with invasive isolates from the UK and Australian collections.
The evolution and biological significance of the 15-nucleotide C-terminal deletion and the DNA repeat tract variants are unknown. The C-terminal deletion does not affect the well-described conserved motifs of type III Mods, such as the catalytic region (DPPY) and the S-adenosyl-L-methionine methyl donor-binding pocket (FXGXG) (See Fig. 1a) 29,47,48 , and ModA12 has been shown by methylome analysis to be functional in strains with (M0579) or without (B6116/77) this deletion 30 . It is noteworthy, however, that the typical 5′ -AGCC-3′ modA repeat tract is recognized by the ModD methyltransferase DNA recognition domain. Previous studies propose that methylation within gene coding regions may alter transcription 45 : this may suggest that variations arise as a mechanism to circumvent methylation of the repeat tract.
Previous studies posit that meningococcal restriction-modification systems, such as the type III systems that the mod genes are part of, are clade associated, and maintain barriers to gene transfer between groups 15 ; however, other studies suggest that genome recombination is more frequent than expected 38 , and that restriction-modification systems do not necessarily prevent genetic transfer, either because they are being mobilized themselves, or due to transient or inefficient function 39 . This latter point may be particularly true for the mod alleles, given that Mod is phase variable and the restriction enzyme is dependent on the presence of Mod in type III restriction-modification systems, and that inactivating mutations can be found in the cognate restriction enzymes in some systems 28,49 . If this is indeed the case, then the selective maintenance of the mod allele repertoire may be attributable to the epigenetic regulatory functions of Mod, as has previously been suggested to explain the dominance of the modA12 allele in N. meningitidis 33 . To support this hypothesis, full characterization of strains, and the phasevarions regulated by Mod proteins is needed. These studies will clarify the significance of mod allele and clonal complex associations, and how ON/OFF switching affects the phenotype of N. meningitidis, especially in light of the fact that isolates can contain up to three distinct mod alleles and phasevarions. Given that each mod allele can phase vary independently, each isolate can give rise to eight possible distinct sub-populations. These variants would appear genetically identical, differing only in the loss or gain of a few repeat units in the mod gene, but would have profound differences in gene expression and potentially in their invasive phenotype. It is also important to note that the potential fluidity of allelic exchange of mod alleles means that with a single gene recombination event, N. meningitidis strains may acquire a new mod allele and the ability to regulate genes in a different phasevarion, and consequently display a different phenotypic profile.
In terms of mod allele associations with invasive meningococci, it is important to note that disease does not increase the fitness of meningococci as there is no corresponding benefit to transmission 50 . However, if certain Mod proteins do increase the invasiveness of certain clonal complexes, the fact that Mod expression is phase variable would allow the Mod regulatory system to be maintained. For example, even under conditions where Mod ON is deleterious or increases the probability of invasion, cells with Mod in phase OFF may persist, in which case, selective pressure against mod is removed without affecting the future function of the Mod regulatory system or bacterial fitness. Accordingly, these alleles provide a mechanism that enables cells to variably express complex phenotypes, which may facilitate a transition from carriage to invasion and vice versa depending on environmental changes. If so, this may help elucidate how transmissibility and virulence are linked in meningococcal lineages 43 . Hence, the study of these regulators, and how they change over time, may provide critical insights into how and why N. meningitidis cells generate distinct phenotypes without overt changes in gene content, and how this may mediate transition from carriage to invasive disease. While it is tempting to speculate on the ON vs. OFF status of the Mods from the available genome sequences, it is important to note that the samples used for sequencing were not collected for this purpose and are therefore not an accurate reflection of the natural ON/OFF status and ratio of the in vivo bacterial population. Therefore, unlike past epidemiological studies that typically isolate and characterize single meningococcal colonies from patient samples, future studies will require direct and unbiased sequencing, or the isolation of representative populations, of bacteria from blood, cerebrospinal fluid and the nasopharynx. The characterization of the presence and expression state of individual mod alleles in these samples will help determine the significance of mod allele distribution relative to meningococcal carriage and disease.
Screening for mod genes. The mod genes were identified by bioinformatic analysis of WGS data using the BIGSdb platform hosted on pubmlst.org/neisseria 55 . Locus records for modA (NEIS1310), modB (NEIS1194) and modD (NEIS2364) were generated in the PubMLST database (http://pubmlst.org/neisseria) 55 using previously identified reference modA and modB nucleotide sequences from N. meningitidis MC58 28,53 and the modD sequence from N. meningitidis M0579 15,27 . The BIGSdb 'Scan Tag' tool 56 was implemented for the identification of mod loci within WGS data of each isolate, returning BLAST matches greater than 30% alignment and 50% identity to the sequences stored in locus records. The nucleotide diversity of mod loci required that hits be exported to MEGA6 57 for manual alignment with previously identified (see reference alleles below) full length mod coding sequences and DNA recognition domain (DRD) based alleles. For example, alignment gaps were inserted in the phase variable repeat region of phase OFF sequences to allow comparison with the full-length translated amino acid sequence. Trimmed sequences were uploaded to the appropriate locus record for storage where they were assigned unique numeric identifiers (allele id numbers), and grouped into variants based on the DRD (corresponding to the mod alleles described below). Database alleles were flagged with information such as phase-variation status and where the open reading frame was interrupted due to insertions, deletions or point mutations (other than changes to the number of phase variable repeat units), and database alleles were assigned the flags 'internal stop codon' and 'frameshift' where necessary. These interrupted alleles were not included in the analysis of invasive versus carriage meningococci. The BIGSdb 'autotagger' and 'autodefiner' tools 56 were then used to identify mod genes within genomic data stored for each isolate, allowing tagging of nucleotide positions and database allele id numbers.
The mod loci were frequently present on different contiguous sequences of a genome assembly due to break points within phase variable regions or insertion sequences: this permitted identification of the presence of a mod gene and its allele, but not assignment of a database allele id number. In these cases, a database flag was inserted at these genomic positions to indicate a partial assembly, and isolates were considered to contain mod genes but were not included in subsequent allele analyses. This process was performed iteratively until no new alleles of the three mod genes could be identified in the genomes; at this stage, genomes without tagged mod genes were investigated using implementations of the BIGSdb BLAST tool (parameters: word size 11; reward 2; penalty -3; gap open 5; gap extend 2). For each locus variant, six hits were investigated per isolate, regardless of E-value significance, and 1000 bp of flanking sequence were extracted with the hit for investigation in MEGA6; new alleles were uploaded to the database as before, otherwise genomes were tagged as 'missing' mod loci.
To screen for mod genes and alleles in the Australian disease isolates (no WGS available), PCR and sequencing analyses was performed as previously described 27,28 . MLST analysis was performed in accordance with the scheme guidelines (http://pubmlst.org/neisseria/info/) 55 .
Sequence alignments were performed using ClustalW 58 and were exported into Jalview 59 to generate alignment Figures. The Neighbor-Net graph was generated in SplitsTree4 60 from allelic distances among the 49 rMLST loci of unique ribosomal Sequence Types (rSTs) (n = 639) 61 among clonal complexes comprising > 10 isolates in the dataset. rSTs (available at http://pubmlst.org) were annotated with the combination of mod alleles present in isolates, and clusters were labeled according to the clonal complexes of these isolates. 186 isolates were not assigned to a known clonal complex.
Statistical analyses were carried out using 2-tailed Fisher's Exact tests (Graphpad Software Inc., San Diego, CA, USA), with p-values of p < 0.05 taken to indicate statistical significance. mod gene/allele reference sequences. The modA, modB, or modD genes were defined by > 90% sequence identity along the length of the deduced amino acid sequence, excluding the variable DRD, to the modA11 and modB1 genes of N. meningitidis MC58 28 , or the modD1 gene of M0579 27 . The alleles of each mod gene were defined by ≥ 95% amino acid identity across the DRD to the reference sequences listed below.
For modD, reference sequences were as previously described 27