Excess folic acid intake increases DNA de novo point mutations

It is well established that folic acid (FA) supplementation can signi ﬁ cantly reduce the risk of birth defects, including neural tube defects (NTDs) 1 and congenital heart defects (CHDs) 2 . More than 80 nations have adop-ted mandatory FA food forti ﬁ cation programs 3 . With additional FA intake from different dietary supplements, a portion of the population is exposed to FA concentrations over the 0.4 mg recommended daily allowance (RDA) 4 . These people include women who had a prior NTD complicated pregnancy and are planning to start a new pregnancy 5 , and men with fertility issues who are treated with high-dose FA (12.5 times of RDA) supplementation to improve sperm counts 6 . However, there is a lack of research on whether excessive FA intake has the potential to harm human beings. Recently, a case with “ pseudo-MTHFR ” syndrome was reported, a wild-type MTHFR Caucasian woman with a history of infertility received a high FA exposure (5 mg/day, 12.5 times of RDA), which resulted in her homocysteine levels being elevated to 17.2 μ mol/L, more than double the reference homocysteine level (7.8 μ mol/L). After treatment with 5-methyltetrahydrofolate

Blood folate levels were measured using a folate competitive analysis as previously reported 1 .
The following day, unbound protein was removed by washing three times with tris-buffered saline (pH 7.6). Ligand in the form of FA is used to compete with labeled (e.g., horse radish peroxidase) folic acid (FA-HRP) for competitive binding to FOLR1. Standard dilution (1:2) of FA (6 to 0.8125ng/mL, 0ng/mL) results in FOLR1-folate binding curves.

DNA extraction
Mouse embryos and parental tail tissues were collected and maintained at -20˚C. Qiagen QIAamp DNA micro-Kit (Cat. No. 56304) was used for DNA extraction with RNase treatment. Qubit assays were performed to validate that all DNA samples met the quality requirements for following sequencing projects.

WGS Data Processing
Extracted genomic DNA from mouse tails or embryos were subjected to WGS on an Illumina NovaSeq. WGS raw FASTQ data (sequencing projects were conducted by Admera Health, LLC) were processed using a pipeline bas ed on GATK Best Practice. Sequence reads were mapped to the reference genome (mm10) with BWA-MEM. After sorting by Samtools, BAM files are further processed with GATK Best Practice workflows, including marking duplicates, realignment of indels, base quality recalibration, as described 2 . Single nucleotide variants and small indels were called with GATK haplotypeCaller.

DNM Calling and Variant Filtering
DNMs were called by TrioDenovo program 3 . An in-software pipeline for filtering DNSNV was used (https://genome.sph.umich.edu/wiki/Triodenovo). We followed the previously welldescribed filtering pipeline for the DNM filtering 4 . The hard filters include: (i) an in-cohort MAF = 0; (ii) a minimum 20 total reads, 5 alternate allele reads, and a minimum 20% alternate allele ratio in the embryo if alternate allele reads are greater than 10. If alternate reads are less than 10, a minimum 30% ratio is required. (iii) a minimum depth of 15 reference reads and alternative allele < 3.5% in the parents. All the DNMs were annotated by VEP (Ensembl Variant Effect Predictor).

Quality control and validation
All the DNSNVs are then filtered from 'Bamfile' confirmation and filtered with a lowest genotype quality (Qual) > 50. We extracted all of the 'Bamfile' containing the region of each DNSNV and its surrounding 2000bp. We examined each mutation in the three lanes of 'Bamfile' checking process via using Integrative Genomics Viewer. The filters are (i) minimum depth of 15 reference reads and alternative allele < 3.5% in the parents; (ii) mutations in the same read are excluded if it is in the same haplotype. To estimate the true positive rate of our DNSNV dataset, we successfully sequenced 72 DNSNVs, and 70 out of 72 were confirmed by Sanger sequencing (True Positive Rate = 97.2%). Finally, we performed a validation test on our mutation calling pipeline using 'Genome in a bottle' as a reference. Using our pipeline, we obtained a sensitivity of 1039 / (1039 + 5) = 0.9952107, and a specificity of (4121168 -1044 -1) / (4121168 -1044) = 0.9999998.

WGBS data processing
Raw bisulfite sequencing data of mouse embryos is mapped to mm10 mouse reference genome by BSMAP (2.90) with default setting. The contaminated adapters and low-quality bases are removed with the '-q' and '-A' parameters during alignment. We subsequently used the methratio.py script provided by BSMAP to calculate the methylation levels of CpG sites. CpG sites that are covered less than 4 reads are discarded before the downstream analysis.
De novo differentially methylated regions (DMRs) and DNA methylation canyon calling We identify de novo DMRs between two conditions with the 'de novo' mode of Metilene (0.2-7) 5 .
DMRs with mean methylation change greater than 0.2 and q value less than 0.05 are selected for downstream analysis. The identified DMRs are annotated to the nearest gene. KEGG pathway enrichment and gene ontology analyses were performed by using DAVID. For DNA methylation canyon identification, we used the method described by Mira Jeong 6 . Boxplots were used to depict the mean methylation level of the gene regions for FA-deficient, normal, and FA-high groups. The horizontal line of the box refers to the 25th percentiles, median and 75th percentiles, respectively.
Whiskers represents the 1.5X interquartile range. Student's t-test was performed to compare the mean methylation level of gene regions for different FA treatment groups.

Western blotting
Proteins were extracted from NE-4C cells cultured for 72 h in FA control (0.5μΜ) or FA high

Statistical analysis
Differences in DNSNV counts among groups were examined with Wilcoxon signed-rank test or Student's t-test. Fisher's exact test was used to examine the enrichment among different groups.
Statistical analyses were conducted using R Studio 3.5.2. A two-tailed P-value of <0.05 was considered statistical significance. P-adj values were all calculated by FDR correction.

Data access
The raw sequencing data that support the findings of this study were submitted to NIH SRA database (BioProject ID: PRJNA793863 for WGS data and PRJNA791955 for WGBS data).