Transcriptome-wide reprogramming of N6-methyladenosine modification by the mouse microbiome

Dear Editor, Microbiome affects many aspects of human health and disease and elicits a wide range of host responses including remarkable epigenetic changes such as DNA methylation, histone modification and non-coding RNA expression. A still poorly explored area of microbiome-host interaction is the response of host RNA modification. N-methyladenosine (mA) is the most abundant mRNA modification in mammalian cells, occurring at ~3 modified adenosine residues per transcript. The mA mapping and biology have been extensively studied recently. At the physiological level, mA affects embryonic development, circadian clock, immunoresponse, and others. At the cellular and molecular level, mA affects all key aspects of mRNA processing, translation and decay. Importantly, mA is a predominant, transcriptome-wide mark that is responsive to environmental changes; this dynamic mA pattern is maintained by the writer enzyme complex containing the METTL3 and METTL14 proteins, and two eraser enzymes of FTO and ALKBH5. We investigated the host response marked by mA in the transcriptome to the presence of microbiome in mice (Fig. 1a). We employed one group of germ-free (GF) mice to identify the host response to the absence, and the other group of specific pathogen-free (SPF) mice to identify the host response to the presence of microbiome. We validated the absence of gut microbiota in our GF mice by PCR of the representative 16S genes (Supplementary information, Fig. S1a). 16S rRNA gene amplicon sequencing of the SPF mice showed that all three mice in this group had similar bacterial compositions at the genus level, which were mainly blautia and roseburia (Supplementary information, Fig. S1b). We harvested three tissues of GF and SPF mice of the same genetic background at 4 weeks of age, brain, intestine, and liver, and performed mA analysis in polyA-selected RNA by liquid chromatography/mass spectrometry (LC/MS) to determine the total mA/A ratios and by the mA-MeRIP sequencing to determine the transcriptomic mA pattern and distribution. These three tissues were selected based on their pervasive studies in the literature on the GF and SPF mouse physiology. The mA/A ratios of the polyA-selected RNA are in the expected range of 0.2%–0.6%; brain showed the highest mA content for both GF and SPF mice, and brain and intestine showed higher mA content in the GF mice (Fig. 1b). The polyA-selected RNA in kidney also showed higher mA content in the GF mice (Supplementary information, Fig. S2a). The higher mA content in the brain tissue was also observed in GF and SPF mice that were 10 weeks old (Supplementary information, Fig. S2b) and even 2 years old (Supplementary information, Fig. S2c). Our mA-MeRIP results of all three tissues (Supplementary information, Table S1) showed the well-known mA pattern across the mRNA transcripts such as the strong enrichment of mA peaks at the junction of coding region (CDS) and 3′ UTR (Fig. 1c). We identified the mAcontaining transcripts that were present in all three GF or SPF mouse groups as “high confidence” data and used only these for further analysis (Supplementary information, Fig. S3). We recovered the known mA installation consensus sequence, RRACH (R = A/G, H= A/C/U) among the mA peaks with a preference of guanosine 5′ to the mA site (Fig. 1d). We validated our sequencing results by quantitative RT-PCR of specific transcripts (Supplementary information, Fig. S4a). Our sequencing result was also consistent with the expected mRNA expression difference of GF versus SPF mouse reported in the literature (Supplementary information, Fig. S4b). These results validated the high-quality nature of our mA-MeRIP data. We identified several differences in mA patterns between the GF and SPF tissues. First, the mA peak distributions had distinct shapes among these tissues (Fig. 1c). When benchmarked against the mA cluster near the stop codon, the GF brain mA occurrence was higher in the CDS region compared to the SPF brain. Second, the mA installation consensus sequences in GF and SPF tissues were deviated in brain, but identical in intestine and liver (Fig. 1d). Third, only 25% of the mA peaks in the GF brain overlapped with those in SPF brain, whereas > 59% of the mA peaks overlapped in GF and SPF intestine and liver (Fig. 1e). The large brain mA peak differences was also shown by principal component analysis and their statistical significance in gene ontology analysis across all tissues (Fig. 1f and Supplementary information, Fig. S5), and was not derived from global transcript expression differences (Supplementary information, Fig. S6). On the other hand, SPF brain may have higher mA modification fraction for some common mA peaks (Supplementary information, Fig. S7). All together, these results indicate that the presence of microbiome has a profound influence on the cellular mRNA mA patterns in a tissuedependent manner. Alteration of the mA pattern is most pronounced in the brain, where the mA methylome is substantially reduced in the presence of microbiome. We performed in-depth analysis for the GF and SPF brain tissues to further elucidate the mA alteration in the mRNA (a representative read coverage plot shown in Fig. 1g). Among the 8643 and 2750mA-containing transcripts, 67 and 80% had only one mA peak in the GF and SPF brains, respectively, and the GF/ SPF transcript ratio was 2.6 (Fig. 1h). However, this GF/SPF ratio for the transcripts containing two or more mA peaks steadily increased to 4.4 for two, 6.7 for three, and 9.2 for more than three mA peaks. GF brain transcripts also had a broader distribution of mA peak/exon ratios (Fig. 1i). These results indicate that more mA clusters are present in individual GF than SPF brain transcripts. The abundance of the mA-containing transcripts was lower in GF than SPF brain (Fig. 1j), which might be associated with a major known role of mA in accelerating mRNA decay. More mA peaks were present in all three mRNA regions in GF than SPF brain (Fig. 1k). The mA location in different mRNA regions has been associated with different functions. For example,

addition of 200 μl of 5 M NaCl to the fecal mixture and incubation on ice for 10 min, the samples were centrifuged at 16,000× g for 15 min. Fecal DNAs were extracted from the supernatants by phenol/chloroform/isoamyl alcohol (25:24:1) followed by ethanol precipitation.
The genomic DNA was resuspended in 100 μl nuclease free water, the concentration of each DNA extract was determined spectrophotometrically at 260 nm using a NanoDrop 2000 Spectrophotometer (Thermo Scientific) and stored at -80 °C until used. The condition of GF mice was confirmed by 16S rRNA PCR amplification using two sets of universal bacterial paired-end amplicon sequences. We removed low-quality reads from the raw sequencing results using illumina-utils 1 (available from https://github.com/merenlab/illumina-utils). We then used the program 'iu-merge-pairs' with default parameters, which merged partially overlapping paired-end Illumina reads while simultaneously removing any pair with more than 3 mismatches at the overlapped region. For pairs with fewer mismatches, we picked the base to be used in the final merged sequence from the read with the higher Q-score. We used Minimum Entropy Decomposition 2 with default parameters to cluster high-quality 16S rRNA gene amplicons at single base resolution, and GAST 3 to assign taxonomy to each read individually.
RNA isolation and purification. Tissues were homogenized in 2 ml TRIzol with glass beads using homogenizer. Total RNA was isolated from frozen tissues using TRIzol reagent (Ambion) following the manufacturer's protocol. RNA samples were dissolved in RNase-free water. The concentration and purity of total RNA in each sample were quantified spectrophotometrically at 260 nm using a NanoDrop 2000 Spectrophotometer (Thermo Scientific). RNA samples were kept at -80 °C until used. To obtain mRNA, total RNA was purified using PolyATtract® mRNA Isolation System IV (Promega) followed by RiboMinus transcriptome isolation kit for human/mouse (Invitrogen).  (i) RNA-seq analysis: MeRIP-seq reads of each sample were aligned to the mouse reference genome (mm10) using HISAT2 (version 2.1.0) software. For the paired-end 100 nt sequences, default parameter was set for strand-specific mapping mode. The differential expression analysis was based on data from input sample, which can be treated as a traditional RNA-seq dataset. Quantification of the expression of protein-coding genes was calculated by FeatureCount (version 1.6.0) with default parameters. DESeq2 and edgeR package were applied to identify differentially expressed genes, then their common results were considered as the high-confidential DEgenes. Q-value ≤ 0.05 and log2fold change ≥ 1 were set as the thresholds of significance.

LC-MS
(ii) m 6 A peak calling: m 6 A peaks were identified by comparing the reads count between IP and input data in each continuous region in transcript. Generally, mapping results from IP and input were fed into the R package exomePeak (version 2.13.2) to identify the regions enriched of reads from IP sample comparing to input sample. A graphical model-based method is used to calculate the enrichment score, and then normalized to the sequencing depth. Default parameters of exomePeak were used and cutoff P-value was set to 0.05. The differentially methylated m 6 A peaks (diffPeaks) between GF and SPF sample were identified by the R package MeTDiff. P-value was set to 0.05 as the threshold of significance.
The algorism we used above to identify m 6 A peaks considered all 3 replicates for the input and IP samples simultaneously. m 6 A peaks were called only for the significant peaks that passed our statistical definition above. An alternative method is to first consider the input and IP sequencing data for each sample separately, followed by overlapping the three samples to obtain common m 6 A peaks. We also performed this alternative analysis; the result for the brain samples is shown in Fig. S3. In this case, the GF samples showed a 3.6-fold higher number of m 6 A peaks than the SPF samples, which is very similar to the GF/SPF m 6 A peak ratio of 3.8 in the first analysis method (Fig. 1e).
(iii) m 6 A motif search: Bedtools(version 2.26.0) was used to get the sequences of the peak regions; then the Dreme (from the software suite Homer, version 4.12.0) algorism was used to search for the sequence motif which concurrently appeared in multiple regions; the ratio above random distributions was calculated.
(iv) Metagene plot: The metagene plot exhibits the m 6 A enrichment on mRNA. The midpoint position of each peak was extracted to represent the m 6 A site; then the m 6 A sites were mapped to transcripts, and their relative distances to the 5'UTR, CDS, and 3'UTR region of the mRNA were calculated. The cumulated densities of m 6 A sites belonging to each category were plot by a curve with smoothing process.
(v) Gene ontology and Reactome analysis: Functional annotation and enrichment analysis were performed by KOBAS (version 3.0). First, specific gene list such as DEgenes or m 6 A containing genes were generated from other procedures; then the corresponding gene IDs were mapped to Gene Ontology/Reactome Pathway databases. P-value or a probability of enrichment and enrichment score were calculated based on results of multiple statistic methods. P-value < 0.05 was set as the cutoff to obtain significant enrichment results.
(vi) Pairwise comparison of the m 6 A atlas in GF mouse brain to prenatal mouse: The m 6 A epitranscriptome data was downloaded from public database (GSE99017) 6 . The transcripts containing m 6 A peaks were cross-compared between each two samples (GF vs SPF, GF vs Prenatal, SPF vs Prenatal), and the overlap peak counts and specific peak counts in each sample were depicted by Venn diagrams. Statistical Analysis. Data are presented as mean ± SEM. Asterisks (*) represent significant differences between SPF mice and GF mice, as determined by the Student's t-test (P < 0.05).

Data availability:
The RNA-seq and m 6 A-seq data generated by this study have been deposited in the NCBI GEO database under the accession number GSE120262.  Total m 6 A/A ratio of polyA-selected and ribo-minus treated RNA from brain of 10-week-old mice, and of 2-year-old mice (c). SPF brain samples. The algorism we used to identify m 6 A peaks in Fig. 1e considered all 3 replicates for the input and IP samples simultaneously. Shown here is the m 6 A peak overlap using another method that first considers the input and IP data for each sample separately, followed by overlapping the three samples to obtain common m 6 A peaks. The number of common m 6 A peaks according to the simultaneous method was 13,323 for GF and 3,502 for SPF mice (Fig. 1e). The number of common m 6 A peaks according to the individual sample method was 12,054 for GF and 3,368 for SPF mice as shown here. identifies differentially methylated m 6 A peaks, which was calculated by the R package MeTDiff.

Supplemental
The score represents the enrichment of each GO term by statistical calculation.

Brain
Intestine Liver