Introduction

DNA methylation was discovered in bacteria more than a half century ago1. It is now known that modification of the four canonical DNA bases by methylation can act as an epigenetic regulator — that is, it can impart distinct and reversible regulatory states to identical genetic sequences. In eukaryotes, epigenetic regulation can occur at multiple levels: DNA methylation, nucleosome positioning, histone variants and histone modifications. By contrast, bacteria lack histones and nucleosomes; therefore, DNA methylation is their primary means of epigenetic gene regulation.

Three different forms of DNA methylation exist in bacterial genomes: N6-methyladenine (6mA), N4-methylcytosine (4mC) and 5-methylcytosine (5mC). Although 5mC is the dominant form in eukaryotes, 6mA is the most prevalent form in prokaryotes. DNA is methylated by methyltransferase (MTase) enzymes, which transfer a methyl group from S-adenosyl-L-methionine (SAM) to the appropriate position on target bases (Fig. 1). Importantly, only a select few sequence motifs in each bacterial genome are targeted by MTases; for example, in Escherichia coli, 5ʹ-GATC-3ʹ is targeted by DNA adenine methylase (Dam) and 5ʹ-CCWGG-3ʹ by DNA-cytosine methyltransferase (Dcm). However, nearly every occurrence of the target motifs is methylated2. The MTase specificity domain that determines the target motif varies widely across species, resulting in a large diversity of methylated motifs across the bacterial kingdom.
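To make the idea of motif-targeted methylation concrete, the following minimal Python sketch locates every occurrence of a degenerate motif, such as the Dam and Dcm targets named above, in a DNA string. The function names and the toy sequence are hypothetical and chosen only for illustration; the sketch scans a single strand and is not drawn from any published pipeline.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Translate an IUPAC motif such as CCWGG into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif_sites(sequence, motif):
    """Return 0-based start positions of every motif occurrence on one strand."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Toy sequence (hypothetical, not taken from a real genome).
toy = "TTGATCAACCAGGTTCCTGGGATC"
print(find_motif_sites(toy, "GATC"))   # [2, 20]
print(find_motif_sites(toy, "CCWGG"))  # [8, 15]
```

Because both example motifs are palindromic, a single-strand scan finds the sites on both strands; a non-palindromic motif would also need a scan of the reverse complement.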

Fig. 1: Primary types of DNA methylation in bacteria.
Chemical structures are shown for the most common forms of DNA methylation in bacteria, including N4-methylcytosine (4mC), 5-methylcytosine (5mC) and N6-methyladenine (6mA). In each instance, a methyltransferase (MTase) transfers a methyl group (CH3) from S-adenosyl-L-methionine to the unmodified nucleotide, producing a methylated nucleotide and S-adenosyl-homocysteine.

MTases function either alongside a cognate restriction enzyme as part of a restriction–modification (RM) system or as ‘orphans’ that lack a cognate restriction enzyme. DNA methylation mediated by both types of MTases has been found to play important regulatory roles in bacteria2,3,4,5,6,7,8,9,10,11,12. RM systems protect cells from invading DNA by methylating endogenous DNA and cleaving non-methylated foreign DNA2,4. RM systems are divided into three main categories based on the subunits involved and the precise site of DNA restriction13,14,15,16 (Fig. 2). Orphan MTases, such as Dam in Gammaproteobacteria, are thought to regulate DNA replication and gene expression, among other functions2. There is also emerging evidence that heterogeneity in methylation patterns within bacterial populations (often caused by phase variation of MTases) can promote adaptive selection by generating heterogeneity in gene expression and cellular phenotypes beyond those provided by genetic variation alone17,18. The vast majority of the >6,000 bacterial genomes sequenced to date have been found to encode MTases and are, therefore, likely to be subject to DNA methylation19,20. Nonetheless, the precise sequence targets and biological roles of most MTases remain unknown21, largely owing to the historical lack of high-throughput tools for detecting 6mA and 4mC. Indeed, while method development for detecting eukaryotic 5mC has flourished over the past few decades22,23,24,25, only modest methodological advances for detecting the principal forms of DNA methylation in bacterial genomes have been made over the same period.

Fig. 2: Types of restriction–modification systems.
Restriction–modification (RM) systems are divided into three categories based on the subunits involved and the precise site of DNA restriction. Type I systems are composed of a single enzyme containing restriction, modification and specificity subunits, and multisubunit complexes are necessary for both modification and restriction. They target bipartite motifs, that is, two short sub-motifs of a specific sequence that are separated by a fixed number of nonspecific nucleotides. Cleavage can occur up to several kilobases away from the non-methylated motif site14. Type II RM systems are composed of distinct methyltransferase (MTase) and restriction enzymes that target short palindromic motifs. The restriction enzyme cleaves DNA close to or within the non-methylated motif sites15. Type III RM systems consist of complexes of multiple modification and restriction subunits, with the specificity element contained in the MTase. Short, non-palindromic sequences are targeted for methylation, and non-methylated motif sites are targeted by a separate restriction enzyme, which must bind with the MTase to achieve sequence specificity and cut at a location roughly 25 bp from the motif16. In each panel, the nucleotide targeted for methylation within a motif is indicated in bold text.

The recent introduction of new sequencing technologies is beginning to address this problem (Table 1). In particular, single-molecule, real-time (SMRT) sequencing has enabled all three major forms of bacterial DNA methylation to be detected simultaneously for the first time, and this technology has been used to generate most of the >2,000 mapped bacterial and archaeal methylomes3,20,26,27,28,29,30,31 that are currently available in the centralized REBASE database19. These methylomes represent a wide variety of isolates from more than 750 distinct species, including common human pathogens such as Salmonella enterica (n = 150), E. coli (n = 123), Klebsiella pneumoniae (n = 93) and Staphylococcus aureus (n = 47).

Table 1: Methods currently used to detect methylated DNA in prokaryotes at single-nucleotide resolution

Here, we review the currently available methods for mapping bacterial methylomes, with a focus on cutting-edge technologies such as SMRT sequencing and Oxford Nanopore sequencing32. We discuss their potential to provide us with fully characterized methylomes that contain not only the full set of methylated positions and targeted motifs but also a complete map of the MTases and RM systems responsible for each methylated motif. We provide an overview of the insights into bacterial epigenomes that these new technologies have afforded and discuss how they might be used in the future to obtain a more complete understanding of bacterial epigenomes and the complex roles they play in defining interactions between bacteria and their host organisms. We do not attempt to review the rich history and foundations of bacterial epigenetics, which have been thoroughly reviewed elsewhere2,4,21,33,34.

Early methods for mapping methylomes

The bulk of methodological development for DNA methylation detection has historically been devoted to characterizing 5mC in higher eukaryotes, largely because the biological importance of 5mC in mammalian cells has been recognized for more than half a century35,36,37. However, characterization of bacterial methylomes requires alternative methods that can detect the more prevalent 6mA and 4mC in addition to 5mC. A number of different approaches have traditionally been used to characterize DNA methylation in bacterial genomes (Table 1).

Restriction enzyme-based mapping

Prokaryotic MTases are known to primarily target specific sequence motifs for methylation. The genome-wide methylation status of these motifs can often be deduced by digesting genomic DNA with one or more methyl-sensitive restriction enzymes of known specificities and analysing the pattern of cut and uncut restriction sites by next-generation sequencing (NGS)38,39. This approach is robust, reliable and accurate but is limited to the study of methylation motifs that perfectly or partially match the known specificities of available restriction enzymes. Thus, although still useful for assessing methylation events within known sequence motifs, it is generally not suitable for discovering new motifs.

Sanger sequencing-based mapping

Theoretically, the most common forms of bacterial DNA methylation can be detected as a by-product of Sanger sequencing because the presence of 4mC, 5mC and 6mA in the DNA template affects the amplitude of peaks in the sequencing trace. Although several studies have used this method to investigate the methylomes of pathogenic bacteria40,41,42,43,44, technical limitations, including subtle peak signatures and the low throughput of Sanger sequencing, have prevented it from achieving wider usage32.

Bisulfite sequencing-based mapping

Despite its reputation as the gold standard for characterizing 5mC in eukaryotic genomes (owing to its high sensitivity, accuracy and compatibility with NGS technologies), bisulfite sequencing has only quite recently been applied to the study of 5mC in bacteria45,46. More recently, it has been shown that treating genomic DNA with ten–eleven translocation (TET) enzymes before bisulfite treatment makes it possible to characterize both 5mC and 4mC bacterial methylomes47. Before bisulfite treatment, TET enzymes oxidize 5mC to 5-carboxylcytosine (5caC), which is subsequently read as a thymine in bisulfite sequencing data. By using a combination of standard bisulfite sequencing and 4mC-TET-assisted bisulfite sequencing (4mC-TAB-seq), it becomes possible to distinguish some fraction of 4mC positions from the 5mC positions; ideally, a sufficient number are detected to permit identification of the 4mC motif47. However, 6mA remains undetectable with this approach.
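The logic of combining the two assays can be sketched as a small decision rule. The Python example below is only an illustration: it assumes that each cytosine position has been summarized as the fraction of reads still reporting C in each assay, that both 5mC and 4mC are at least partly protected from conversion in standard bisulfite sequencing, and that only 4mC remains protected after TET oxidation. The function name and the 0.5 threshold are hypothetical, not values from the cited studies.

```python
def classify_cytosine(frac_c_bisulfite, frac_c_tab, threshold=0.5):
    """Classify one cytosine from the fraction of reads that still read as C.

    frac_c_bisulfite: fraction of C calls after standard bisulfite treatment.
    frac_c_tab: fraction of C calls after TET oxidation followed by bisulfite.
    threshold: hypothetical cutoff for calling a position 'protected'.
    """
    protected_bs = frac_c_bisulfite >= threshold
    protected_tab = frac_c_tab >= threshold
    if protected_bs and protected_tab:
        # Survives both assays: consistent with 4mC under the stated assumptions.
        return "4mC candidate"
    if protected_bs:
        # Protected only without TET: 5mC was oxidized to 5caC and read as T.
        return "5mC candidate"
    if not protected_tab:
        # Converted in both assays: consistent with an unmethylated cytosine.
        return "unmethylated"
    # Protected only after TET treatment is not expected under this model.
    return "inconsistent"

print(classify_cytosine(0.9, 0.8))
print(classify_cytosine(0.9, 0.1))
print(classify_cytosine(0.05, 0.02))
```

Because 4mC is only partially resistant to conversion, a real analysis would work with read counts and a statistical model instead of a fixed cutoff, which is one reason only a fraction of 4mC positions is recovered.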

Newer methods for mapping methylomes

Recent innovations in so-called third-generation sequencing technologies, including SMRT sequencing and nanopore sequencing (Fig. 3), make it possible to directly interrogate native DNA molecules without PCR amplification. Importantly, these approaches retain chemical modifications in the DNA and enable many of them to be detected for the first time at a genome-wide scale.

Fig. 3: Technologies for detecting DNA methylation through direct sequencing of native DNA molecules.
a | Libraries for single-molecule, real-time (SMRT) sequencing from Pacific Biosciences consist of double-stranded DNA fragments flanked by hairpin SMRTbell adaptors that permit the polymerase to process both strands of the template. The libraries can be configured to accommodate the requirements of the specific application: short-insert libraries generate multiple subreads from both strands of the template molecule, which is useful for generating higher accuracy consensus subreads, and long-insert libraries are used to generate the longest possible subreads, which is critical for de novo assembly and detection of structural variants. b | SMRT sequencing relies on a sequencing-by-synthesis approach. A DNA polymerase is bound within a zeptolitre-scale observation chamber, called a zero-mode waveguide (ZMW), and uses a strand from the native sequencing library as a template for the read, incorporating fluorescently labelled deoxyribonucleoside triphosphates (dNTPs) as they diffuse into the ZMW. Each incorporated dNTP is briefly immobilized at the polymerase active site, emitting a fluorescent pulse in the corresponding colour channel. c | When observing the fluorescent traces produced by each ZMW, which are highly multiplexed on a chip, the order of pulses provides the read sequence, and pauses between pulses indicate the presence of a covalent modification in the template DNA. d | The 1D library preparation from Oxford Nanopore Technologies (ONT) uses a lead adaptor (loaded with a motor protein) and a tethering adaptor, which helps co-locate the molecule near the nanopore, to enable the sequencing of a single DNA strand from the molecule58. e | ONT sequencing instruments rely on engineered biological nanopores embedded in a lipid membrane to sequence single-stranded DNA (ssDNA). A voltage potential is applied across the membrane, and ssDNA is ratcheted through the nanopore by a motor protein bound to the DNA library molecule. 
f | The ionic current flowing through the nanopore depends on the precise set of nucleotides occupying the constriction point. Methylated nucleotides in the ssDNA introduce distinct current patterns, making it possible to detect the existence of modified bases relative to non-methylated DNA or precomputed models. For clarity, only two changes in current levels are shown in each box.

Direct detection of DNA methylation with single-molecule, real-time sequencing

SMRT sequencing, which is available in the commercialized RS II and Sequel instruments manufactured by Pacific Biosciences, is the first third-generation sequencing technology with a record of successfully characterizing bacterial methylomes. SMRT sequencing can simultaneously report nucleotide sequence and all three major types of DNA methylation in bacteria, albeit at different sensitivities (high for 6mA, moderate for 4mC and low for 5mC) owing to the signal-to-noise ratios specific to each modification type3,18,20,27,28,48.

In SMRT sequencing, each template molecule consists of a double-stranded native DNA fragment that has been circularized by ligating hairpin adaptors to each end26 (Fig. 3a). A DNA polymerase enzyme is bound to the hairpin-adapted template molecule, and a specially designed surface chemistry immobilizes the polymerase–template complex at the base of a zeptolitre-scale observation chamber, called a zero-mode waveguide (ZMW), which limits background fluorescence originating outside this small observation chamber26,49 (Fig. 3b). Sequencing by synthesis is initiated, and the DNA polymerase proceeds around the circularized DNA template multiple times, generating multiple subreads. During each base incorporation event, a fluorescently labelled deoxyribonucleoside triphosphate (dNTP) complementary to the template base is briefly immobilized in the ZMW observation window by the polymerase. A camera captures the resulting fluorescent pulse, and because each of the four canonical bases has a different fluorescent label, the series of pulses observed as the complementary DNA strand is synthesized can be used to construct the sequence read. Although base calling is accomplished by monitoring the order of dNTP incorporation events, DNA modifications are detected by identifying changes in the kinetics of the polymerase as it translocates along the DNA template (Fig. 3c). Specifically, the polymerase kinetics is described by the inter-pulse duration (IPD), which is the time interval between pulses that indicate nucleotide incorporation events. It has been shown that the IPD can be perturbed by the primary and secondary structures of the DNA template molecules26 and by covalent DNA modifications, including 4mC, 6mA, 5mC, 5-hydroxymethylcytosine (5hmC) and other types of DNA methylation and damage27,48,50,51. The signal-to-noise ratio for 4mC and 6mA is sufficiently high that they can be directly detected in native DNA3,18,48. 
However, detection of 5mC and 5hmC requires either high sequencing depth or additional steps to convert them to 5-formylcytosine and 5caC, both of which have higher signal-to-noise ratios48,51.

In order to detect modified nucleotides, IPD values from native DNA sequencing data can be compared with control IPD values from either methylation-free whole-genome amplified (WGA) DNA or precomputed in silico IPD models. The in silico model is trained using large amounts of sequencing data from unmodified DNA and consists of predicted IPD values for a given local sequence context28. The local sequence context surrounding the site of nucleotide incorporation strongly affects the processivity of the polymerase, and resulting fluctuations in IPD values must, therefore, be accounted for when looking for IPD deviations caused by a base modification event27,48. Owing to the extent of contact between the polymerase and the template DNA molecule, a modified base can affect the IPD values both upstream and downstream of the modified position. The resulting IPD signatures differ between 6mA, 4mC and 5mC, so it is usually possible to assign a methylation type to an observed methylation motif20,27. The vast majority of SMRT methylome studies have used a consensus approach to assess IPD values in aligned native reads from data of high sequencing depth. By comparing the native IPD values at a genomic position with either the predicted unmodified IPD value (using the in silico model) or a distribution of control IPD values obtained from WGA DNA, a simple statistical test can be used to identify methylated positions where the native and control IPD values diverge3,27,52. Alternative methods and statistical models have been proposed for methylation detection in conditions where the sequencing depth is expected to be low. For example, in metagenomic sequencing, low-abundance species would not be expected to be well covered, and for bacterial populations exhibiting heterogeneous methylation, one would not expect to find uniform methylation patterns in a given sample18,53.
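The consensus comparison described above can be illustrated with a short Python sketch. It assumes that the IPD values for one strand-specific genomic position have already been collected from aligned native reads and from WGA control reads. The choice of Welch's t statistic, the function names and the cutoffs are assumptions made for this example; published tools may use different tests and thresholds.

```python
from math import sqrt
from statistics import mean, variance

def ipd_ratio(native_ipds, control_ipds):
    """Ratio of mean native IPD to mean control IPD at one position."""
    return mean(native_ipds) / mean(control_ipds)

def welch_t(native_ipds, control_ipds):
    """Welch's t statistic (native minus control); needs >= 2 values per group."""
    n1, n2 = len(native_ipds), len(control_ipds)
    se = sqrt(variance(native_ipds) / n1 + variance(control_ipds) / n2)
    if se == 0:
        return 0.0  # no spread in either group, so no evidence either way
    return (mean(native_ipds) - mean(control_ipds)) / se

def call_modified(native_ipds, control_ipds, min_ratio=2.0, min_t=4.0):
    """Flag a position when native IPDs are clearly longer than control IPDs.

    min_ratio and min_t are hypothetical cutoffs used only for illustration.
    """
    return (ipd_ratio(native_ipds, control_ipds) >= min_ratio
            and welch_t(native_ipds, control_ipds) >= min_t)

# Toy IPD values in seconds (invented for the example).
control = [0.50, 0.60, 0.40, 0.55, 0.45, 0.50]
slowed = [1.9, 2.4, 2.1, 2.6, 2.0, 2.3]
print(call_modified(slowed, control))   # True
print(call_modified(control, control))  # False
```

When an in silico model replaces the WGA control, the control distribution collapses to a single predicted value per sequence context, and a one-sample test would take the place of the two-sample comparison shown here.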

Direct detection of DNA methylation with nanopore sequencing

Nanopore sequencing has been under active development for decades, but recent progress has led to the release in 2014 of MinION, the first commercially available sequencing platform by Oxford Nanopore Technologies54,55,56,57,58,59,60,61 and, more recently, the release of the GridION and PromethION. In this technology, genetically engineered protein nanopores are placed in a lipid membrane, across which a voltage is applied to drive the negatively charged single-stranded DNA (ssDNA) through the nanopores for sequencing. Multiple protocols are available for constructing libraries for nanopore sequencing, which all involve the ligation of adaptor sequences coupled with a motor protein to double-stranded DNA (dsDNA) fragments (Fig. 3d). The tethering adaptor sequences help to concentrate the DNA fragments near the nanopore-containing lipid membrane, while the motor protein facilitates the processive ratcheting of ssDNA through the protein nanopore at a fixed rate during sequencing. Sensors monitor the current through each nanopore during this process and detect variations caused by the translocation of the polynucleotide strand obstructing the channel (Fig. 3e). These current fluctuations are a function of the roughly six specific nucleotides occupying the constricted part of the nanopore channel at a given moment (Fig. 3f) and are processed by a proprietary recurrent neural network to construct the sequence of nucleotides in the read61.

Although the vast majority of research applications to date have focused on using MinION to call the four canonical bases, current signals have been shown to differ between canonical bases and covalently modified nucleotides, which provides the possibility of detecting chemical DNA modifications62. However, the presence of modified bases can potentially complicate the base calling process, which relies on characteristic current levels produced as each k-mer (a combination of nucleotides of length k) passes through the nanopore. The presence of multiple types of base modifications greatly expands the set of possible k-mers beyond those constructed exclusively from the four canonical bases, which introduces substantial computational challenges.
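The scale of this expansion is easy to quantify. The short Python sketch below assumes a 6-mer signal model (in line with the roughly six nucleotides in the pore constriction) and treats 6mA, 4mC and 5mC as three extra symbols in the alphabet; both choices are simplifying assumptions made for illustration.

```python
def kmer_count(alphabet_size, k):
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k

K = 6  # assumed k-mer length
canonical = kmer_count(4, K)  # A, C, G, T only
expanded = kmer_count(7, K)   # A, C, G, T plus 6mA, 4mC and 5mC

print(canonical)                       # 4096
print(expanded)                        # 117649
print(round(expanded / canonical, 1))  # 28.7
```

Under these assumptions, a model that must learn a characteristic current level for every k-mer faces almost 29 times as many states once the three bacterial methylation types are included.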

Early attempts to detect methylation during nanopore sequencing, using a variety of protein nanopore configurations and experimental conditions, focused on eukaryotic applications and, therefore, were limited to 5mC and 5hmC detection54,55,61,63. However, the introduction of the MinION device has recently broadened the development focus towards characterizing prokaryotic methylomes64,65,66,67. For instance, a variable order hidden Markov model (HMM) was trained to identify methylation events in bacterial genomes64. When paired with a hierarchical Dirichlet process (HDP) to learn current distributions from the MinION device, it can detect both 5mC and 6mA at the specific motifs included in the training data. This HMM–HDP approach can also detect these modifications in individual reads, albeit at substantially lower sensitivities than when it is applied to consensus current signals from multiple aligned reads. However, the model is constrained by the contents of its training data, which limits its ability to identify novel modification types or novel methylated motifs. Although encouraging, such model-based approaches remain limited in their ability to identify DNA methylation at various sequence contexts. A preprint article has described an alternative method for nanopore-based methylation detection that uses a statistical comparison of current signals from native and methylation-free WGA DNA66 and that builds upon the design first proposed for detecting base modifications in SMRT sequencing27. This approach is not limited to the detection of DNA methylation at specific sequence motifs and has detected several expected 4mC, 5mC and 6mA motifs in bacterial genomes carrying MTases of known specificity, although the detection accuracy fluctuates with different methylation types and motif specificities66. 
Despite this promise, detection is not yet possible at the level of single molecules, and methods such as this one, which do not require a priori knowledge, may not be able to distinguish between diverse forms of DNA modification events, especially in eukaryotic genomes68.

These studies have demonstrated the feasibility of nanopore-based methylation detection; however, some challenges remain. For instance, none of these nanopore-sequencing-based methods has been applied for the biological characterization of an unknown bacterial methylome. Nonetheless, the rapid pace of method development in this field and the ongoing technological advancements in the underlying sequencing technology make nanopore sequencing a promising field to watch.

Methylation motifs and methyltransferases

Comprehensive mapping of a bacterial methylome requires more than just the detection of methylated nucleotides; it also requires identification of methylation motifs and the MTases responsible for the observed methylation patterns (Fig. 4).

Fig. 4: Steps for comprehensive characterization of a bacterial methylome.
Methylome characterization is increasingly becoming a standard component of bacterial genomic research. The detection of methylated positions can lead to the identification of precise methylated sequence motifs. A methylated motif can then be assigned to the responsible methyltransferase (MTase) based on either querying a database of MTases with known target motifs or through experimental means that involve comparing wild-type strains with strains where the MTase is inactivated. Multiple lines of functional investigation can lead from this basic characterization of the primary features of a bacterial methylome. SNPs, single-nucleotide polymorphisms.

Identifying methylation motifs

DNA methylation events in prokaryotic genomes are highly motif-driven for all three of the primary methylation types. If a methylation motif is targeted by an MTase, typically >95% of motif occurrences are methylated2,18,20,53. Historically, methylation motifs for novel type II RM systems have been identified through restriction digest approaches, as in these systems, restriction occurs precisely within the specific sequence motifs. However, the restriction site cannot serve as a proxy for the methylation motif in type I and type III RM systems, in which restriction occurs at a variable distance from the site of methylation69 (Fig. 2). As a result, there was until recently a notable paucity of known type I and type III RM systems contained in REBASE. However, the introduction of SMRT sequencing and the resulting accumulation of methylome surveys have contributed to a rapidly growing catalogue of known bacterial RM systems19,29,30,70,71,72 (Fig. 5). The initial output of a methylation survey is a list of genomic positions that are likely modified. In order to infer the methylation motif from this list, tools such as MEME73 can be used to build a consensus motif from the sequence context immediately surrounding the modified position.
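Two steps of this workflow can be sketched in Python: extracting the sequence context around each modified position (the kind of input a motif-discovery tool would receive) and checking what fraction of a candidate motif's occurrences are methylated. All function names and the toy genome are hypothetical; the sketch handles one strand and a non-degenerate motif only, and it does not reproduce how MEME itself works.

```python
def context_windows(sequence, positions, flank):
    """Return the sequence window of +/- flank bases around each position."""
    return [sequence[max(0, p - flank): p + flank + 1] for p in positions]

def motif_sites(sequence, motif, mod_offset):
    """0-based positions of the modified base in every motif occurrence."""
    sites = []
    start = sequence.find(motif)
    while start != -1:
        sites.append(start + mod_offset)
        start = sequence.find(motif, start + 1)
    return sites

def methylated_fraction(sequence, motif, mod_offset, modified_positions):
    """Fraction of motif occurrences whose target base was called modified."""
    sites = motif_sites(sequence, motif, mod_offset)
    if not sites:
        return 0.0
    modified = set(modified_positions)
    return sum(1 for s in sites if s in modified) / len(sites)

# Toy genome with three GATC sites; the adenine sits at offset 1 in the motif.
genome = "AAGATCTTGATCCCGATCAA"
print(motif_sites(genome, "GATC", 1))                       # [3, 9, 15]
print(context_windows(genome, [3], 2))                      # ['AGATC']
print(methylated_fraction(genome, "GATC", 1, [3, 9, 15]))   # 1.0
```

A fraction well below the >95% typically seen for an active MTase could point to an incorrectly specified motif, limited sequencing depth or genuinely heterogeneous methylation.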

Fig. 5: The accelerating pace of methyltransferase discovery.
Historically, type II methyltransferases (MTases) have been the most amenable to discovery, primarily through restriction enzyme digest and fragment analysis. By contrast, restriction enzyme digest is not well suited to de novo discovery of methylated motifs in type I and type III restriction–modification systems because the cut sites of cognate restriction enzymes are typically located at a variable distance from the methylated motif site. The introduction of methylation detection using single-molecule, real-time (SMRT) sequencing in 2010–2012 resulted in a surge of newly discovered MTases belonging to these systems.

Identifying methyltransferases responsible for motif methylation

Gene prediction tools and homology search tools, such as SEQWARE74, are often used to identify genes likely to encode components of an RM system, including subunits responsible for restriction, specificity and MTase activity. These components are typically encoded by genes proximal to each other in the genome and can be classified by RM system type (Fig. 2) on the basis of type-specific functional domains. Once classified by type, the characteristic methylation properties of the different RM system types can be leveraged to narrow down the list of putative MTases responsible for an observed methylation motif. For instance, type I MTases target complementary bipartite motifs on both strands, while type III MTases target contiguous, non-palindromic motifs on a single strand. After narrowing the set of candidate MTase genes, the sequences of these candidates can be queried against MTase sequences with known motif specificities in REBASE19, and a high-quality sequence match is often sufficient for a confident mapping20,28. In the absence of a high-quality MTase match in REBASE, two experimental approaches can be used to identify the MTase responsible for an observed methylated motif. The first relies on heterologous expression of the putative MTase gene in an otherwise non-methylated host, such as E. coli ER2796 (refs3,28,52,75,76). Alternatively, the putative MTase can be subjected to an inactivating mutation, where the mutation is either introduced experimentally29,77 or occurs naturally in a related strain78,79. If heterologous expression of the MTase results in methylation of the motif in question or if inactivation of the MTase abolishes methylation at that motif, the causal role of that MTase is confirmed.
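The way motif structure narrows the list of candidate MTases can be illustrated with a crude heuristic. The Python sketch below labels a motif as bipartite, palindromic or contiguous and non-palindromic, following the RM system properties summarized in Fig. 2. The function names, the minimum gap of three N positions and the mapping from structure to RM type are assumptions made for illustration; real assignments also depend on gene context, conserved domains and REBASE matches.

```python
import re

# Complement table covering the IUPAC nucleotide codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(motif):
    """Reverse complement of an IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(motif):
    """True when the motif reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)

def is_bipartite(motif, min_gap=3):
    """True for two specific half-sites separated by a run of N positions."""
    pattern = r"[^N]+N{%d,}[^N]+" % min_gap
    return re.fullmatch(pattern, motif.upper()) is not None

def suggest_rm_type(motif):
    """Heuristic label only; it does not replace experimental confirmation."""
    if is_bipartite(motif):
        return "type I-like (bipartite)"
    if is_palindromic(motif):
        return "type II-like (palindromic)"
    return "type III-like (contiguous, non-palindromic)"

for motif in ("GATC", "CCWGG", "AACNNNNNNGTGC", "CAGCAG"):
    print(motif, "->", suggest_rm_type(motif))
```

Orphan MTases such as Dam also target palindromic motifs, so a palindromic label says nothing about whether a cognate restriction enzyme is present.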

Insights into restriction–modification systems from methylome analysis

RM systems often represent an important obstacle to genetic manipulation of an organism by leading to low transformation efficiencies. The design of effective shuttle vectors must, therefore, either include compatible methylation patterns to provide protective methylation or limit the number of motif sites in the vector that are subject to restriction by the host RM system80,81. Both of these approaches require a thorough understanding of the host RM repertoire and benefit from a comprehensive catalogue of known RM systems and specificities.

Perhaps the most surprising observation to come from the multitude of prokaryotic methylome studies is the remarkable diversity of MTase genes and target specificities. A recent survey of 230 diverse bacterial and archaeal epigenomes, which was enabled by SMRT sequencing, found DNA methylation in 93% of genomes across a wide diversity of methylated motifs (834 distinct motifs; averaging three motifs per organism)20. The primary driver behind this diversity is the spread of MTase-containing mobile genetic elements through horizontal gene transfer (HGT)20,82,83. Mutation events can also occur in the target recognition domain of MTase genes and thereby modify the sequence motif targeted for methylation, providing a route to further methylome diversification30. As a consequence of such diversification, researchers commonly find substantially divergent methylomes not only among species but also among different strains of the same species29,72,84,85,86,87,88,89.

Insights into methylation types

The recent surge of studies devoted to the characterization and functional examination of bacterial methylomes has built upon decades of previous work, most of which has relied on experimental approaches focused on a handful of loci in a relatively small number of well-characterized genomes4,10,38,90,91,92. The hard-won insights from these foundational studies have long hinted at an unappreciated level of complexity and regulatory potential present in modifications to the four canonical bases. Genome-wide mapping by modern methylation detection technologies is shedding new light on the distribution and roles of the three primary forms of DNA methylation in bacteria.

N4-methylcytosine

The extent of 4mC in bacterial genomes is not well understood, and its function largely remains a mystery, although it is known to be involved in multiple RM systems. 4mC occurs less frequently than 6mA in all bacteria but has been observed more often in thermophilic bacteria than in non-thermophilic bacteria, possibly because it is substantially more resistant to heat-induced deamination than 5mC, the other form of cytosine methylation found in bacteria93,94. RM-based analysis and modified bisulfite methods have been used to map 4mC in a number of bacterial genomes, including Caldicellulosiruptor kristjanssonii and Enterococcus faecalis47,81,95,96. However, SMRT sequencing is currently the most broadly applied method for 4mC detection. A variety of 4mC motif specificities have been identified in a range of species, including Bacillus cereus28, Helicobacter pylori29,30,70, Campylobacter jejuni71 and S. enterica72. Despite this progress in mapping the distribution of 4mC, its biological functions remain largely unclear. Only a single published study on the gastric carcinogenic bacterium H. pylori provides new insight into its potential functions. Deletion of the 4mC MTase M2.HpyAII altered the expression of 102 genes, resulting in decreased adherence to a human gastric adenocarcinoma cell line, reduced potential to induce inflammation and a diminished capacity for natural transformation97.

5-methylcytosine

Dcm, the orphan cytosine MTase of E. coli, has been the subject of study for several decades, and its target specificity of 5ʹ-CCWGG-3ʹ has long been known98. The EcoRII RM system is known to protect bacteria against parasitism99, and methylation by Dcm has been associated with Tn3 transposition100, lambda phage recombination101 and the expression of ribosomal proteins during stationary phase102. However, more general insights into the biological role of 5mC in bacteria have remained somewhat elusive.

Two recent studies have taken advantage of the genome-wide and single-nucleotide resolution of bisulfite sequencing to thoroughly investigate 5mC function in Gammaproteobacteria. In the first study, deletion of Dcm and the resulting suppression of methylation at 5ʹ-CCWGG-3ʹ in E. coli resulted in increased expression of the RNA polymerase sigma factor rpoS gene and many of its target genes during stationary phase45. The second study revealed that methylation of 5ʹ-RCCGGY-3ʹ by the cytosine MTase VchM is required for optimal growth in Vibrio cholerae and affects the cell envelope stress response, potentially by downregulating genes required for modifying the lipopolysaccharide inner core of the cell envelope46. While it is tempting to conclude from these studies that 5mC in bacteria is a suppressor of gene expression, more work will be needed to confirm this role — particularly as neither study demonstrated direct regulation of transcription by 5mC methylation.

SMRT sequencing has revealed the 5mC motif specificities of active cytosine MTases in a variety of bacterial species and strains29,80,84,88,103. Identification of 5mC methylated positions in isolates of Neisseria meningitidis showed them to be mutational hot spots, indicating that 5mC methylation may play a role in genome plasticity and evolution84. An improved understanding of 5mC motif specificities has also facilitated the design of plasmids capable of overcoming barriers to transformation in an important strain of Bifidobacterium animalis80, thereby enabling the molecular mechanisms underlying the observed correlations between bifidobacteria and gut health to be studied104.

N6-methyladenine

Although traditional RM digestion-based approaches were used in a recent genome-wide mapping study of 6mA105, the majority of bacterial studies have adopted SMRT sequencing. The abundance of 6mA MTases in the bacterial world and the robust IPD signature generated by 6mA during SMRT sequencing have led to the discovery of a vast diversity of 6mA MTases and methylated motifs in bacteria, which include many previously unknown orphan MTases and a multitude of previously uncharacterized type I, II and III RM systems19,20 (Fig. 5). The elucidated 6mA methylated motifs span a wide variety of organisms across multiple phyla, including Bacteroidetes87, Firmicutes81,89,106,107, Actinobacteria31,80,85 and Proteobacteria28,29,30,70,71,72,76,77,78,84,86,88,103,108,109. Functional knockout studies in many of these organisms highlight the ability of certain 6mA MTases to induce widespread transcriptional changes3,18,31,106,110,111, while other work has revealed differentially methylated 6mA positions in response to varied growth stages and environmental conditions31,77,103.

Researchers have also taken advantage of SMRT sequencing to explore mechanisms of bacteriophage invasion and host defence. For instance, 936-type bacteriophages, which commonly infect Lactococcus lactis starters used in cheese production, have been shown to encode multiple 6mA MTases75, which likely provide the bacteriophages with protective methylation that allows them to circumvent host RM systems. Conversely, the bacteriophage exclusion system is a gene cassette that confers bacteriophage resistance in a wide range of host bacteria. Interestingly, although activity of a 6mA MTase in the cassette is required for successful host defence, phage DNA does not appear to be targeted for restriction, which suggests a novel mechanism of methylation-based host defence112.

Insights into epigenetic regulation

In addition to providing a better understanding of the modifications and modifiers involved in DNA methylation, advances in methylation detection methods are also starting to reveal the biological functions of these modifications and, in some cases, the mechanisms by which they exert their biological effects.

Methylation as a cellular regulatory signal

Several MTases have been shown to be capable of inducing consequential shifts in gene expression3,45,46,110,113. For instance, in the local competition model, competitive binding between an MTase and other DNA-binding proteins (such as transcription factors) at specific motif sites affects transcription of a nearby gene, leading to phenotypic variation within the bacterial population6,114,115,116 (Fig. 6a). In effect, the MTase methylates specific targets in some fraction of the population, thereby inducing or repressing local transcription in this fraction of the population. Canonical examples of this model in E. coli include the transcriptional regulation of the pyelonephritis-associated pili (pap) operon and the outer membrane protein-encoding gene agn43 by Dam methylation at nearby 5ʹ-GATC-3ʹ sites114,115,117. In the case of pap, which is required for adherence of uropathogenic E. coli to the host urinary tract, the local processivity of Dam is hindered by the sequence context of the 5ʹ-GATC-3ʹ in the pap promoter, which provides more time for the DNA-binding proteins to access their target sites and thereby skews the competition in their favour118. In both examples, methylation provides a means for modulating the antigenic profile of the population, thereby playing a role in immune evasion of host-adapted pathogenic bacteria.

Fig. 6: Epigenetic mechanisms of gene regulation and their consequences.

a | The methylation status at motif sites within the upstream regulatory sequence of a gene can affect its expression. The presence of methylated bases in this region can interfere with the binding of regulatory proteins, leading to either upregulation or downregulation of the gene. For instance, methylation can prevent a transcription factor (TF) from binding to its TF-binding site (TFBS), thereby preventing transcription of the downstream gene. b | Phase-variable methyltransferases (MTases) are capable of inducing genome-wide changes in methylation status and gene expression. Spontaneous and reversible frameshift mutations, often caused by slipped-strand mispairing in tandem repeat sequences during replication, induce inactivating premature stop codons in the gene encoding the modification (M) subunit. Cells containing the inactivated form of the M subunit lack methylation at the target motif, thereby providing a clonally expanded bacterial population with divergent methylation activity and distinct gene expression regimes69. In type I and type III restriction–modification (RM) systems where ON/OFF phase variation is most common, restriction activity requires both restriction (R) subunits and full-length M subunits. Therefore, both the methylation and restriction functions of these RM systems are toggled ON and OFF by these frameshift mutations. c | Genetic rearrangements, such as inversion events, within the gene encoding the specificity (S) subunit can result in the expression of multiple specificity alleles. When paired with an active M subunit, these diversified S subunits target multiple motifs for methylation106. d | If a gene affected by methylation status encodes a TF or another protein with promiscuous DNA-binding specificity, the local methylation status can potentially trigger a cascade of downstream changes in gene expression. e | DNA methylation is likely to be involved in alternative mechanisms of gene regulation. For example, methylation is known to affect the curvature of DNA molecules, which could potentially control which regions of a chromosome are exposed to the transcriptional machinery of the cell, thereby affecting gene expression.

Traditional methylation detection approaches have identified examples of antigenic variation in other Gammaproteobacteria that are generated by competition between Dam and DNA-binding proteins, including the leucine-responsive regulatory protein (Lrp) and the oxidative stress response protein (OxyR)2,4,119,120,121,122. However, the prevalence of this type of epigenetic regulation was revealed only when it became possible to systematically map the frequency and distribution of non-methylated sites with SMRT sequencing. Studies have reported several hundred non-methylated motif sites across various bacteria31,85,103,110,123; these sites are enriched at gene regulatory regions2,4,20, which suggests they are involved in competitive regulation of gene expression. Although detailed mechanisms remain to be identified in most cases, it has recently been shown through SMRT sequencing that variable site-specific Dam methylation at a 5ʹ-GATC-3ʹ motif in the regulatory region of the opvAB operon of S. enterica is responsible for determining the O-antigen chain length, which is a major determinant of phage resistance in S. enterica124,125.
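A scan of this kind can be sketched in a few lines of Python. The example is illustrative only: it assumes that methylation calls have already been reduced to a set of 0-based start positions of methylated motif sites, and the function names, the toy sequence and the 'regulatory' interval are hypothetical rather than taken from any of the cited studies.

```python
def find_motif_sites(seq, motif):
    """Return the 0-based start of every (possibly overlapping) motif occurrence."""
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


def non_methylated_sites(seq, motif, methylated_starts):
    """Motif sites that are absent from the set of methylated start positions."""
    return [s for s in find_motif_sites(seq, motif) if s not in methylated_starts]


def fraction_in_regions(sites, regions):
    """Fraction of sites that fall inside any half-open [start, end) interval."""
    if not sites:
        return 0.0
    hits = sum(any(a <= s < b for a, b in regions) for s in sites)
    return hits / len(sites)


# Toy data (hypothetical): three GATC sites, of which the middle one is unmethylated.
toy_seq = "GATC" + "A" * 6 + "GATC" + "A" * 6 + "GATC"
toy_methylated = {0, 20}
toy_regulatory = [(5, 15)]  # assumed upstream regulatory interval

unmethylated = non_methylated_sites(toy_seq, "GATC", toy_methylated)
print(unmethylated, fraction_in_regions(unmethylated, toy_regulatory))
```

In a real analysis, the fraction obtained for non-methylated sites would be compared with the same fraction computed over all motif sites, so that enrichment at regulatory regions is judged against the genomic background.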

DNA methylation also provides critical regulatory signals during DNA replication. For example, specific motif sites in the replication origins of E. coli and Caulobacter crescentus (5ʹ-GATC-3ʹ and 5ʹ-GANTC-3ʹ, respectively) must be methylated for replication to occur6. Furthermore, Dam methylation of a hemimethylated 5ʹ-GATC-3ʹ site in the promoter of the transposase gene of IS10 represses its transcription during replication, presumably to ensure that potential transposition occurs only when a cell contains more than one copy of the chromosome10. Clues to the biological function of an MTase can occasionally be found by identifying genomic regions that are enriched for the methylation target motifs. For instance, enrichment of the Dam 5ʹ-GATC-3ʹ target motif near the origin of replication in E. coli and other Gammaproteobacteria has been well documented and is linked to the role of Dam in the initiation of replication77,92. SMRT sequencing has revealed further examples of enrichment of 6mA motif sites near origins of replication in Arthrobacter and Nocardia, which indicates that this phenomenon may not be limited to Gammaproteobacteria20. Examples of over-enriched and under-enriched motif sites at other regions of the genome have been identified by SMRT sequencing in a wide variety of bacteria3,20,31,46,103, which could provide important clues about the biological purpose of the MTases responsible for their methylation.
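Regional enrichment of a target motif can be explored with a simple windowed count. The following is a minimal sketch under stated assumptions: the window size, the helper names and the toy sequence are hypothetical, the chromosome is treated as linear rather than circular, and only one strand is scanned, which is sufficient for palindromic motifs such as 5ʹ-GATC-3ʹ and 5ʹ-GANTC-3ʹ.

```python
import re

# Subset of IUPAC nucleotide codes, expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "W": "[AT]", "R": "[AG]", "Y": "[CT]",
}


def motif_regex(motif):
    """Compile a motif into a lookahead pattern so overlapping sites are counted."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def window_counts(seq, motif, window):
    """Count motif start positions in consecutive, non-overlapping windows."""
    pattern = motif_regex(motif)
    counts = [0] * ((len(seq) + window - 1) // window)
    for match in pattern.finditer(seq):
        counts[match.start() // window] += 1
    return counts


def enrichment(counts):
    """Express each window count relative to the mean count over all windows."""
    mean = sum(counts) / len(counts)
    return [c / mean if mean else 0.0 for c in counts]


# Toy sequence (hypothetical): the first 10-bp window is rich in GATC sites.
toy_seq = "GATCGATCGA" + "AAAAAAAAAA" + "AAAGATCAAA"
counts = window_counts(toy_seq, "GATC", window=10)
print(counts, enrichment(counts))
```

Windows with a ratio well above 1 would be candidates for follow-up, for example by checking whether they overlap an annotated origin of replication.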

Phase variation and epigenetic heterogeneity

Phase variation of bacterial surface proteins, caused by reversible mutations at a hypervariable locus126,127,128, has long been recognized as a mediator of antigenic variation and immune evasion. However, the importance and extent of phase-variable MTases have only more recently become apparent. Hypervariable mutations in the regions regulating and encoding these MTases can result in cell-to-cell differences in their expression (which affects the methylation status of their targets; Fig. 6b) or in their target specificity (which results in methylation of a different set of targets; Fig. 6c). Consequently, heterogeneous methylation patterns can develop within a clonally expanded population, which often has dramatic and genome-wide regulatory consequences69,129,130,131,132. The set of genes affected by a particular phase-variable MTase is called a phase-variable regulon or a phasevarion69,130. Note that this phenomenon is different from the example of phase variation of surface proteins described in the previous section, in which population-level variation in methylation at specific motif sites is caused by competition between MTases and DNA-binding proteins and not by phase-variable MTases.
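The ON/OFF switching depicted in Fig. 6b can be made concrete with a toy model of a homopolymer tract inside an MTase gene. This is a sketch, not a model of any real locus: the flanking sequences, the tract lengths and the function names are invented for illustration, and the only point is that gaining or losing repeat units shifts the reading frame and introduces a premature stop codon unless the change is a multiple of three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical flanking sequences; the downstream part ends with a stop codon.
UPSTREAM = "ATGGCA"
DOWNSTREAM = "CTGAATAGTTAA"


def build_gene(tract_len):
    """Assemble a toy gene with a poly-G tract of the given length."""
    return UPSTREAM + "G" * tract_len + DOWNSTREAM


def first_stop_index(gene):
    """Return the codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(gene) - 2, 3):
        if gene[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_on(tract_len):
    """ON only if the first in-frame stop is the final codon of the gene."""
    gene = build_gene(tract_len)
    return len(gene) % 3 == 0 and first_stop_index(gene) == len(gene) // 3 - 1


for n in range(8, 13):
    print(n, "ON" if is_on(n) else "OFF")
```

In this toy gene, tract lengths of 9 and 12 keep the reading frame intact, whereas lengths of 8, 10 and 11 produce a truncated product, which mirrors how slipped-strand mispairing can toggle MTase expression between cell generations.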

Phase-variable MTases were first observed almost two decades ago as hypervariable inversion events in the hsd genes of Mycoplasma pulmonis133 and Streptococcus pneumoniae134, which encode the type I RM system of these bacteria. Further examples were subsequently uncovered in Pasteurella haemolytica135, Moraxella catarrhalis136, Haemophilus influenzae130,137,138,139, H. pylori140,141,142 and N. meningitidis79,139,143,144. Their biological importance was quickly appreciated owing to their effects on multiple genes throughout the genome, but in the absence of techniques to determine the underlying motif-specific methylation events, their phase-variable behaviour could be inferred only indirectly, and the precise mechanisms by which they affect gene expression remained unknown. SMRT sequencing has since been used to characterize the target motifs and methylation sites for a number of previously identified phase-variable MTases from a range of bacteria, including ModA and ModD in N. meningitidis109,144, ModM2 in the human respiratory pathogen M. catarrhalis78 and ModA in H. influenzae76,130,137,138,139. It has also provided insights into the mechanisms that give rise to phase-variable MTases and how they regulate phasevarions. Studies aiming to characterize how phase-variable MTases in H. pylori contribute to its highly complex methylome identified multiple phase-variable MTases generated by slipped-strand mispairing in homopolymer tracts as well as an unusual type I MTase that targets multiple bipartite motifs by interacting with several target recognition domain elements; this process can generate methylome diversification through recombination within the specificity subunit (S subunit)29,30. Although it had been previously shown that phase-variable MTases in H. pylori are capable of regulating phasevarions142, SMRT sequencing demonstrated the importance of the ModH5 allele of the phase-variable ModH MTase in regulating virulence genes in H. pylori145. A study in S. pneumoniae found that a previously observed phase variation of the type I hsd system134 is capable of inducing dramatic changes in the bacterial methylome. This example is one of the most complex phasevarions characterized to date: reconfiguration of five target recognition domains in the S subunit leads to six possible MTase alleles, each with its own target specificity106. Taken together, these findings have deepened our understanding of previously identified phase-variable MTases.
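The arithmetic behind 'five target recognition domains, six alleles' can be illustrated by enumeration. The sketch below assumes that the five domains are split into two alternatives for the first half of the S subunit and three alternatives for the second half, with one domain from each group present in any given allele; this split and the domain labels are assumptions made for illustration and are not stated in the text above.

```python
from itertools import product

# Hypothetical labels: two alternatives for the first TRD position and
# three for the second (2 + 3 = 5 domains in total).
TRD1_VARIANTS = ["TRD1.1", "TRD1.2"]
TRD2_VARIANTS = ["TRD2.1", "TRD2.2", "TRD2.3"]


def enumerate_alleles(first_position, second_position):
    """List every pairing of one first-position and one second-position TRD."""
    return list(product(first_position, second_position))


alleles = enumerate_alleles(TRD1_VARIANTS, TRD2_VARIANTS)
for number, (trd1, trd2) in enumerate(alleles, start=1):
    print(f"allele {number}: {trd1} + {trd2}")
```

Because each target recognition domain in a type I system contributes one half of a bipartite motif, every pairing would be expected to specify a different methylated motif, which is consistent with each allele having its own target specificity.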

Other studies have taken advantage of the hypothesis-free nature of analysing methylomes with SMRT sequencing to uncover novel phase-variable MTases in other pathogenic bacteria. For example, SMRT sequencing led to the recent discovery of MTase phase variation in the human gastric pathogen C. jejuni108 and the bovine respiratory pathogen Bibersteinia trehalosi88. The phase variation in C. jejuni was shown to affect cell adherence, invasion and biofilm formation, but additional study is required to determine the functional consequences of MTase phase variation in B. trehalosi. Use of a software package named SMALR, which was developed to extract single-molecule-level methylation status from SMRT sequencing data, revealed a new type of epigenetic heterogeneity in the marine bacterium Chromohalobacter salexigens18, in which methylation is dispersed across some, but not all, instances of a target motif. The biological reason for this observed pattern of incomplete methylation is unknown.
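The idea of summarizing methylation at the level of single molecules can be sketched as follows. This is not the SMALR interface: the input format (a mapping from read identifier to per-site boolean calls), the thresholds and the function names are all hypothetical, and real single-molecule calls would come from kinetic signals rather than ready-made booleans.

```python
from collections import defaultdict


def per_site_fraction(read_calls):
    """Fraction of covering molecules on which each motif site is called methylated."""
    covered = defaultdict(int)
    methylated = defaultdict(int)
    for calls in read_calls.values():
        for site, is_methylated in calls.items():
            covered[site] += 1
            methylated[site] += int(is_methylated)
    return {site: methylated[site] / covered[site] for site in covered}


def classify_site(fraction, low=0.2, high=0.8):
    """Label a site using arbitrary, illustrative thresholds."""
    if fraction >= high:
        return "methylated"
    if fraction <= low:
        return "non-methylated"
    return "mixed"


# Toy calls (hypothetical): keys are read identifiers, inner keys are site positions.
toy_reads = {
    "read1": {100: True, 250: True},
    "read2": {100: True, 250: False},
    "read3": {100: True, 250: False, 400: False},
    "read4": {100: True, 250: True, 400: False},
}

fractions = per_site_fraction(toy_reads)
labels = {site: classify_site(f) for site, f in fractions.items()}
print(fractions, labels)
```

Sites labelled 'mixed' are those at which molecules from the same sample disagree, which is the kind of heterogeneity that aggregate, population-level calls can obscure.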

There is now a wealth of evidence indicating that MTase phase variation is a crucial survival mechanism for host-adapted bacteria. Variability in methylation patterns has been observed to affect gene expression and phenotypes, but future work will be required to clarify the precise mechanisms through which methylation regulates gene expression.

Epigenetic regulation of clinically important phenotypes

Of the many molecular and cellular phenotypes regulated by DNA methylation, clinically important phenotypes are of particular interest. Previous studies using traditional methods hinted at the clinical relevance of bacterial methylation. For instance, methylation of 5ʹ-GATC-3ʹ by Dam in Salmonella enterica subsp. enterica serovar Typhimurium was shown to be essential for virulence146,147. More recent studies have linked additional clinically important phenotypes to bacterial DNA methylation, and many have used SMRT sequencing to associate specific methylation motifs targeted by phase-variable MTases with particular phenotypes.

For instance, the ModA11 and ModA12 alleles of the phase-variable ModA MTase in N. meningitidis have been linked to sensitivity to several antibiotics that are typically prescribed for meningococcal disease. The phase-variable ModD MTase has also been linked to hypervirulent strains of the same pathogen79,144. The phase-variable MTase ModM in M. catarrhalis has potential roles in colonization, infection and immune evasion in humans. Specifically, a recent study observed a significant enrichment of the ModM3 allele over the more common ModM2 allele in middle ear isolates from individuals with otitis media, which suggests that genes regulated by ModM3 methylation play a part in colonization and infection78. Specific ModA alleles of H. influenzae were selected for in vivo during progression of otitis media in chinchillas, suggesting a role for DNA methylation in H. influenzae colonization and infection76. Additionally, experiments using locked variants of these phase-variable ModA alleles demonstrated regulation of a variety of clinically important pathways, such as immune evasion, biofilm formation, antibiotic susceptibility, virulence and niche adaptation76,148. These results corroborate orthogonal studies that implicate ModA phase variation as an important regulator of virulence and immune evasion149,150. In S. pneumoniae, the MTase of the SpnD39III RM system possesses six specificity alleles that are generated through genetic rearrangement and that target different motifs for methylation. These alleles have different virulence phenotypes and are selected at various stages of colonization and infection106.

Collectively, these studies imply that many bacterial pathogens exploit epigenetic switches as a flexible mechanism to regulate gene expression during host colonization and infection. Some of these mechanisms may serve as targets of potential therapeutic intervention strategies.

Towards deeper mechanistic insights

The first step in studying the functional impacts of bacterial DNA methylation is to compare global gene expression between wild-type strains and MTase mutant strains. A number of studies that used RNA sequencing for such comparisons have shown that perturbation of a single DNA MTase often results in differential expression of tens or hundreds of genes, reaching as many as a thousand in some cases3,18,31,106,110,111. These data highlight that the effects of DNA methylation in the regulation of bacterial gene expression have been underestimated but also reveal some unexpected findings. In some cases, the regulatory effects of MTases can be conclusively traced to methylation at the promoters of target genes. For instance, the ModH5 MTase in H. pylori has been shown to regulate the activity of the flagellin A gene (flaA) via methylation in the flaA promoter145. However, generally, only a small proportion (<10%) of the differentially expressed genes have methylated sites in their promoter regions3,45,46,110, which implies that the local competition model, in which a DNA MTase and other DNA-binding proteins compete for binding at the promoter of a gene, does not apply to most differentially expressed genes (Fig. 6a). Another possibility is that the methylation status at individual motif sites might regulate the expression of a transcription factor, causing a broad downstream shift in the expression of its target genes (Fig. 6d). In order to determine which mechanisms are at work, specific methylation sites must be individually mutated using genetic tools such as site-directed mutagenesis114,115,116. Multiple studies have observed a positive correlation between the number of methylation sites in a gene and the fold change of expression between wild type and MTase mutants3,46, suggesting that epigenetic regulation of expression may in fact be driven by multiple methylation sites in both the promoter region and the gene body. Another intriguing hypothesis relates to the effect of DNA methylation on the chromosome topology151,152,153, whereby methylation induces structural changes that alter the repertoire of genes exposed to the cellular transcriptional machinery (Fig. 6e).
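The two summary statistics discussed in this section, the proportion of differentially expressed genes with a methylated promoter site and the rank correlation between site count and fold change, can be computed as sketched here. The gene names and numbers are invented toy values rather than results from any study, and the helper functions are hypothetical; a Spearman correlation is used as one reasonable choice, not necessarily the statistic used in the cited work.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy table (hypothetical): gene -> (promoter sites, total sites, |log2 fold change|).
toy_genes = {
    "geneA": (1, 4, 3.0),
    "geneB": (0, 1, 1.0),
    "geneC": (0, 3, 2.7),
    "geneD": (0, 2, 1.5),
}

promoter_fraction = sum(p > 0 for p, _, _ in toy_genes.values()) / len(toy_genes)
site_counts = [total for _, total, _ in toy_genes.values()]
fold_changes = [fc for _, _, fc in toy_genes.values()]
print(promoter_fraction, spearman(site_counts, fold_changes))
```

A low promoter fraction together with a positive rank correlation would be consistent with the pattern described above, although neither statistic by itself distinguishes direct from indirect regulation.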

Comparisons with eukaryotic methylomes

Analyses of DNA methylation in eukaryotic genomes have focused on 5mC. However, even with the advent of second-generation and third-generation sequencing technologies, functional studies of 5mC in the bacterial kingdom have been rare because it is less prevalent than 6mA. Thus, comparisons between bacterial and eukaryotic methylomes have not been feasible until the recent discovery of 6mA in a number of eukaryotes154, including algae155, fungi156, worms157, insects158 and mammals159,160. These studies have revealed diverse functions for eukaryotic 6mA, including the regulation of gene expression157,158,160, regulation of transposon mobility158,160 and crosstalk with histone variants and histone modifications157,160.

The genomic distribution of 6mA differs between prokaryotes and eukaryotes. The frequency of 6mA (as a fraction of the total number of adenine residues in the genome) is orders of magnitude lower in most eukaryotes than in prokaryotes68,154. Furthermore, eukaryotic 6mA events are much less motif-driven than those in prokaryotes. For example, very few occurrences (often < 3%) of 6mA motifs identified in the genomes of Chlamydomonas reinhardtii, Caenorhabditis elegans, Plasmodium falciparum and mouse embryonic stem cells (mESCs) are methylated68,155,157,160,161. One likely explanation for these observations is that modified 6mA sites are not targeted by cognate restriction enzymes in eukaryotes and, therefore, do not need to be located at specific sequence motifs. Another possible reason is that MTases have limited access to eukaryotic DNA because it is packaged in nucleosomes, and thus only exposed motifs will be methylated.

Despite these important differences, some commonalities do exist. For example, 6mA events are known to inhibit the transcription of a form of transposon called insertion elements in bacteria10,162, which is analogous to the observed enrichment of 6mA events at transposons in both C. elegans and mESCs157,160. More fundamentally, the intrinsic properties of 6mA and its effect on DNA conformation are expected to be consistent between bacteria and eukaryotes151, although different organisms may exploit these properties in different molecular and cellular contexts. Complete high-resolution maps will be the foundation for future comparisons of bacterial and eukaryotic 6mA methylomes. Although SMRT sequencing and Oxford Nanopore sequencing hold great promise for mapping DNA methylation in bacteria, their successful application to eukaryotic genomes faces critical challenges stemming from the scarcity of modified sites and the lack of clear target motifs. As recent work has suggested, 6mA detection in eukaryotes requires cross-validation by integrating complementary sequencing technologies with molecular technologies based on restriction enzymes and 6mA antibodies68.

Future perspectives

Integration with orthogonal assays for mechanistic insights

Technological breakthroughs have made it easier than ever to map bacterial methylomes. However, comprehensive studies will be necessary to fully characterize the precise mechanisms by which DNA methylation modulates gene expression and alters bacterial phenotypes. Such studies would benefit from a richer collection of functional genomics data (such as transcription factor binding assays) from many bacterial species, across different genetic backgrounds (wild type and MTase mutants) and in conditions of growth or stress. These experiments must be followed by genetic experiments that mutate and characterize specific methylation sites. In addition, future studies could test the hypothesis that the thermodynamic effect of DNA methylation induces conformational changes to a bacterial chromosome, rendering previously inaccessible genes accessible to the transcriptional machinery151,163 (Fig. 6e). Chromatin conformation capture analyses, such as Hi-C, can be used to elucidate the effects of bacterial DNA methylation on DNA conformation and, consequently, on gene transcription1,54.

Phasevarions in vaccine development

The ability of phase-variable MTases to activate antigenic diversity in host-adapted pathogens (Fig. 7) makes them very relevant for vaccine development. Highly diverse and variable antigens do not make good vaccine candidates and are typically avoided. However, genes for outer membrane proteins or other antigens that lack simple tandem repeats (which are common indicators of phase variation) might still be subject to variable expression if they are part of a phasevarion164. Indeed, it has been shown that multiple vaccine candidates are likely subject to this epigenetic means of antigenic variation. Thus, identifying phase-variable MTases and their phasevarions in host-adapted pathogens76 is likely to facilitate more effective vaccine development.

Fig. 7: Phenotypic consequences of epigenetic heterogeneity.

a | The presence of phase-variable methyltransferases (MTases) can introduce heterogeneous methylation patterns in a clonally expanded bacterial population, leading to subpopulations with distinct gene expression regimes and phenotypes. b | Phenotypically distinct subpopulations can emerge within a colony as a result not only of genetic variation but also of epigenetic variation, that is, variation in DNA methylation status. These subpopulations serve as units of adaptive selection and provide a means of population-level flexibility in response to rapidly changing environments.

Methylation detection using nanopore sequencing

SMRT sequencing has been instrumental in enabling the study of bacterial methylomes, but other sequencing technologies, such as those commercialized by Oxford Nanopore Technologies, have the potential to make important contributions to the field of bacterial epigenetics in the near future. Assuming continued maturation of the technology and improvements in the modification detection algorithms, the very long read lengths offered by nanopore sequencing devices may provide single-molecule phased detection of bacterial DNA methylation in samples from a variety of environments. This ability will be helpful for the epigenetic study of heterogeneous bacterial samples, including metagenomic populations, where the study of methylation has so far been limited18,87. The recent use of methylation signatures as discriminative features for metagenomic binning suggests that the applications of methylation detection in long reads extend beyond identifying methylated motifs in bacteria53.

These advances come at a time when the presence and importance of DNA methylation types that have traditionally been recognized only in prokaryotes, such as 6mA, are being investigated in eukaryotes156,157,160. As these epigenetic marks become better understood, it will be interesting to see whether eukaryotic modifications share any functional traits with those found in their prokaryotic ancestors.

Conclusions

The study of bacterial methylomes has been revolutionized by the introduction of technologies capable of detecting 4mC, 5mC and 6mA at a genome-wide scale and single-nucleotide resolution. Application of these new technologies has led to a greater appreciation for the sheer quantity and diversity of methylation systems and their target specificities in bacteria. Deposition of newly discovered MTase genes and their target motifs to community databases such as REBASE19 has created a powerful resource for researchers, providing a catalogue of the RM systems that can act as barriers to efficient transformation. Technological advances have also highlighted hypervariable MTases and their consequences on genome-wide methylation, gene expression and phenotypic plasticity. Given the modern sequencing-based tools at their disposal, researchers are better equipped than ever before to probe the previously hidden epigenetic mechanisms of the bacterial realm.