FecalSeq: methylation-based enrichment for noninvasive population genomics from feces

Obtaining high-quality samples from wild animals is a major obstacle for genomic studies of many taxa, particular at the population level, as collection methods for such samples are typically invasive. DNA from feces is easy to obtain noninvasively, but is dominated by a preponderance of bacterial and other non-host DNA. Because next-generation sequencing technology sequences DNA largely indiscriminately, the high proportion of exogenous DNA drastically reduces the efficiency of high-throughput sequencing for host animal genomics. In order to address this issue, we developed an inexpensive methylation-based capture method for enriching host DNA from noninvasively obtained fecal DNA samples. Our method exploits natural differences in CpG-methylation density between vertebrate and bacterial genomes to preferentially bind and isolate host DNA from majority-bacterial fecal DNA samples. We demonstrate that the enrichment is robust, efficient, and compatible with downstream library preparation methods useful for population studies (e.g., RADseq). Compared to other enrichment strategies, our method is quick and inexpensive, adding only a negligible cost to sample preparation for research that is often severely constrained by budgetary limitations. In combination with downstream methods such as RADseq, our approach allows for cost-effective and customizable genomic-scale genotyping that was previously feasible in practice only with invasive samples. Because feces are widely available and convenient to collect, our method empowers researchers to explore genomic-scale population-level questions in organisms for which invasive sampling is challenging or undesirable.


INTRODUCTION
The past decade has witnessed a rapid transformation of biological studies with the continuing development and implementation of massively parallel sequencing technology. This sequencing revolution, however, has thus far had a relatively muted impact on studies of wild nonmodel organisms due largely to the difficulty of obtaining * These authors contributed equally to this work. Corresponding authors: kenneth.chiou@wustl.edu and cxb585@psu.edu. high-quality samples. This problem is particularly salient for endangered animals, cryptic animals, or animals for which it is otherwise difficult, undesirable, or unethical to obtain samples invasively.
Field researchers working with nonmodel animals have explored several noninvasive sample types for DNA analysis including feces, hair, urine, saliva, feathers, skin, and nails (Kohn and Wayne 1997). Of these, feces may be the most readily available in many taxa (Putman 1984). Indeed, since PCR amplification of DNA from feces was first demonstrated in the 1990s (Höss et al. 1992), noninvasive genetic studies from feces have revolutionized our understanding of the evolution, population structure, phylogeography, and behavior of nonmodel organisms.
PCR amplification, however, is effective only for short sequences of DNA. The ability to generate cost-effective genomic-scale data of animals from feces using massively parallel sequencing would therefore constitute an important methodological advance towards bringing nonmodel organism studies into the genomic age.
Feces presents significant challenges for genetic analysis. DNA in feces is often fragmented and low in quantity. Fecal DNA extractions are further characterized by a frequent presence of co-extracted PCR inhibitors, sometimes complicating PCR detection of genotypes (Kohn and Wayne 1997), particularly with long amplicons. Finally, endogenous (host) DNA in feces constitutes a very low proportion, typically less than 5% (Qin et al. 2010;Perry et al. 2010;Snyder-Mackler et al. 2016), of total fecal DNA. Instead, fecal DNA contains a preponderance of DNA from exogenous (non-host) sources such as gut microbes, digesta, intestinal parasites, coprophagous animals, and other environmental organisms. Gut bacteria pose a particular challenge as they account for the highest proportion of DNA in feces (Perry et al. 2010;Qin et al. 2010).
Because of the high representation of exogenous DNA in feces, shotgun sequencing of fecal DNA would yield only a small proportion of reads matching the host genome. For genomic studies of host organisms, particularly those targeting populations, this represents a crippling obstacle in the presence of typical financial constraints. Without an effective enrichment procedure, sequencing of fecal DNA would be less efficient than that of invasively obtained "high-quality" DNA by at least one order of magnitude regardless of improvements in sequencing throughput or cost.
Attempts to enrich host DNA from feces for genomic analysis (Perry et al. 2010;Snyder-Mackler et al. 2016) have thus far employed targeted sequence capture methodologies. Sequence capture, like PCR, enriches DNA based on sequence specificity but unlike traditional PCR can work at any scale from a single locus (Whitney et al. 2004) to a whole genome (Melnikov et al. 2011;Carpenter et al. 2013;Snyder-Mackler et al. 2016).
This method involves hybridizing DNA or RNA "baits," either affixed to an array (Albert et al. 2007;Okou et al. 2007) or to magnetic beads in solution (Gnirke et al. 2009), to a mixture of target and nontarget sequences, thereby capturing targeted DNA from the mixture. Sequence capture has been used for instance to enrich human exomes (Ng et al. 2009), reduced-representation genomes (Suchan et al. 2016;Ali et al. 2016;Hoffberg et al. 2016), host DNA from ancient or museum specimens (Maricic et al. 2010;Carpenter et al. 2013;Mason et al. 2011;Bi et al. 2013), and pathogen genomes from human clinical samples (Melnikov et al. 2011). While the cost of custom oligonucleotide bait synthesis remains high, methods for transcribing custom baits from existing DNA templates (Melnikov et al. 2011;Carpenter et al. 2013) have driven costs significantly down, increasing sequence capture's appeal. Perry et al. (2010) first successfully enriched host DNA from feces at the genomic scale. Using a modified sequence capture employing custom-synthesized baits, they were able to highly enrich 1.5 megabases of chromosome 21, the X chromosome, and the mitochondrial genome from fecal samples of 6 captive chimpanzees.
Their protocol, however, remains prohibitively expensive for population-level analysis due to the high cost of bait synthesis. More recently, Snyder-Mackler et al. (2016) performed whole-genome capture on fecal DNA, using RNA baits transcribed in vitro from high quality baboon samples to enrich host genomes from 62 wild baboons. Resulting libraries were sequenced to low coverage (mean 0.49×), but nevertheless provided sufficient information for reconstructing pedigree relationships.
Despite these methodological advances, targeted sequence capture has distinct drawbacks. To avoid the high cost of bait synthesis, RNA baits must first be transcribed from high-quality genomic DNA that is consumed by the process, limiting its appeal when working with species for which high-quality DNA is difficult to obtain or in short supply. The processes of both bait generation and hybridization with fecal DNA are labor-intensive and time-consuming, with the hybridization including an incubation step that alone takes 1 -3 days (Snyder-Mackler et al. 2016). Because both RNA baits and the gDNA used to transcribe them are eventually depleted, the composition of RNA baits varies between bait sets, potentially impeding comparison of samples sequenced using different RNA baits and gDNA templates. Trans genomic captures (i.e. capturing DNA using baits from a different species) may complicate enrichment and introduce at least some capture biases (George et al. 2011), which will be a particular impediment for genomic studies for which high-quality DNA from related taxa is not accessible. Sequence capture may also introduce biases toward the capture of of low-complexity, highly repetitive genomic regions, as well as an excess of fragments from the mitochondrial genome (Samuels et al. 2013;Carpenter et al. 2013;Snyder-Mackler et al. 2016).
The present study exploits natural, evolutionarily ancient differences in CpG-methylation densities between vertebrate and bacterial genomes to enrich the host genome from feces, making noninvasive population genomics economically and practically feasible for the first time. This method, which we call FecalSeq, uses methyl-CpGbinding domain (MBD) proteins to selectively bind and isolate DNA with high CpG-methylation density. This enrichment method is inexpensive, de novo, and, crucially, captures target DNA without modification, thereby enabling downstream library preparation techniques including complexity reduction-based sequencing methods such as RADseq. Because of these properties, our method is well-suited for population genomic studies requiring high sequencing coverage, including those of nonmodel organisms for which few resources (e.g., high-quality samples or reference genomes) exist.

RESULTS
Our method is a modification of a previously described technique for enriching the microbiome from vertebrate samples containing a majority of DNA from the host organism (Feehery et al. 2013 While Feehery et al. (2013) developed this method in order to remove contaminating host DNA for analysis of the microbiome, our strategy was to remove the contaminating fecal microbiome for analysis of host DNA. Therefore, after combining DNA with MBD baits, we retained the bound fraction with the goal of optimizing the selective recovery of host DNA (Fig. 1). Because our aim is to genotype populations with high coverage, we used the enriched host DNA to prepare double-digest RADseq libraries (Peterson et al. 2012) though with greater sequencing investment, the method in principle should work equally well for sequencing whole genomes.
To evaluate our approach, we enriched DNA extractions from the feces of 6 captive and 46 wild baboons, which we then used to prepare and sequence ddRADseq libraries. We also prepared ddRADseq libraries from blood-derived genomic DNA of all six captive baboons to facilitate controlled (same-individual) comparisons of blood and fecal libraries. All libraries were sequenced using Illumina sequencing.
Quantitative PCR estimates of starting host DNA proportions in fecal DNA extracts ranged widely, but were substantially lower in samples obtained from the wild (captive samples: mean 5.3%, range <0.01% -17.4%;  Table S5 and Supplemental Table S6).
Using a revised protocol based on our optimization experiments (Supplemental Protocol), we created and sequenced a third library from MBD-enriched fecal DNA. After noting substantial improvements in enrichment, we finally sequenced a fourth library with MBD-enriched fecal DNA from 40 wild baboons.
Despite having similar or even lower starting host DNA proportions, read mapping proportions in the third library were substantially higher than the prior two (mean 49.1%, range 8.9% -75.3%; Fig. S3; Supplemental Overall, the revised protocol produced substantially higher enrichment, measured as fold increases in the proportion of host DNA, particularly for samples with very low starting proportions of host DNA (Fig. 2).While we sometimes were forced to use multiple rounds of extraction, thereby introducing variation in starting host proportions across same-individual trials, the revised protocol nonetheless exhibited robust improvement in read mapping proportions even when starting host proportions were substantially lower.
MBD binding may in principle select for genomic regions with relatively high CpG-methylation density, leading to dropout of other loci. Assessment of the concordance between blood-and feces-derived reads from the same individual was complicated by the correlation in ddRADseq between total reads and expected RADtags recovered and thereby SNPs discovered: a given RADtag is sequenced at a frequency inversely proportional to the deviation of its length from the mean of the size selection. Thus, we had to discern between dropout due to coverage-related stochasticity inherent in ddRADseq (Peterson et al. 2012) and that due to MBD enrichment.
To perform this comparison, we computed the proportion of unique alleles between blood-and feces-derived RADtags from the same individual. For this test, we controlled for variation in sequencing coverage by randomly sampling reads as necessary in order to equalize total coverage among same-individual samples. Allelic dropout due to MBD enrichment would result in a higher proportion of alleles unique to blood-derived libraries relative to feces-derived libraries. We did not find a significant discrepancy (multi-sample-called SNPs: mean proportion unique alleles in blood = 2.3%, mean proportion unique alleles in feces = 2.3%; Wilcoxon signed rank test, p = 0.97; Fig. 3a).
Dropout of entire RADtags is easily detectable given a reference genome or sufficient samples for comparison; dropout of a single allele at heterozygous sites is a more insidious potential bias. Allelic dropout due to MBD enrichment would result in a decrease in heterozygosity in MBD-enriched fecal libraries. Inbreeding co-efficients (F) computed from same-individual RADtags exhibited in some cases higher values for feces-derived samples (Fig. 3b). This difference, however, was not statistically significant (mean F blood = 0.63; mean F feces = 0.71; Wilcoxon signed rank test, p = 0.47), indicating low allelic dropout attributable to the MBD enrichment.
For this test, we also controlled for variation in sequencing coverage as described above.
As investigations of population structure are one potential application of our method, we visualized the wild and captive baboons' identity-by-state via multidimensional scaling (MDS) using PLINK (Purcell et al. 2007;Chang et al. 2015), and confirmed that individuals clustered by their known species or ancestry and that bloodand feces-derived reads from the same individual were close together in the MDS space. 0 out of 5,602, or 0%). Though more work is needed to quantify more exactly the extent and causal factors that lead to missingness, many population genetic analyses are robust to the low level of dropout our analyses reveal in addition to that which is inherent in the RADseq family of techniques (Gautier et al. 2013).

DISCUSSION
Our methylation-based capture method achieves substantial enrichment of host DNA from fecal samples.
Using our revised protocol developed through experimentation, we produced a mean 195-fold enrichment on our final library consisting entirely of fecal DNA obtained noninvasively under remote field conditions, with most samples nearly a decade old. A mean 28.8% of reads mapped to the baboon genome, despite starting with only a mean 0.34% of host DNA. Using fecal and blood DNA obtained from captive animals, we further demonstrate that feces-derived genotyping data following our method are concordant with corresponding data obtained from blood.
Feces are among the most readily accessible sources of information on wild animals (Kohn and Wayne 1997), and are particularly useful for population-level studies or studies of endangered or elusive species for which obtaining high-quality samples is difficult or undesirable. By exploiting methylation differences rather than sequence differences between host and bacterial DNA, FecalSeq is a de novo enrichment strategy that requires neither prior genome sequence knowledge nor the use of high-quality DNA for preparation of capture  Fig. 3: Concordance between blood-and feces-derived genotyping data from the same individuals. Colors symbolize the six captive individuals included in our study. Within these individuals, we did not find significant differences in (a) the proportion of unique alleles or (b) inbreeding coefficients from blood-and feces-derived libraries. The multidimensional scaling plot of identity-by-state shows (c) population structuring concordant with the known ancestry of animals (Supplemental Table S1). Distances between feces-and blood-derived sets of genotypes from the same individual are minimal, indicating that noise added by the enrichment method is dwarfed by the population structure signal in this baboon population dataset.
baits. This results in enrichment which is both inexpensive-we estimate a per-sample enrichment cost of $0.70 (Supplemental Note)-and replicable. The enrichment procedure is also relatively rapid and uncomplicated.
Using a 96-well plate, we performed two sequential rounds of enrichment on all forty samples in our final library within a day (see Supplemental Protocol).
Importantly, FecalSeq is to our knowledge the first genomic-scale fecal DNA enrichment method that is compatible with most downstream library preparation methods for massively parallel sequencing. Through our use of ddRADseq, we demonstrate that our method facilitates low-cost high-capacity genotyping of wild populations without introducing significant bias. Further, because ddRADseq is customizable (Peterson et al. 2012), there is substantial flexibility for researchers to optimize the number of samples and the fraction of the genome sequenced for particular research questions. This is not possible for libraries prepared using targeted sequence capture, which are therefore currently limited mainly to low-coverage analyses at the population level Double-digest RADseq is possible for genotyping species with or without a reference genome. We aligned our sequencing reads to the baboon reference genome for this study, but our approach is likely also applicable to species without a reference genome. In these cases, an additional pre-screening step would be necessary, in which exogenous reads are filtered out through comparison to the nearest available genome, before proceeding to clustering and variant identification as per normal reference-free ddRAD methods We robustly found that sequencing efficiency (percentage of reads assigned to target genome) of MBD- Low starting proportions of host DNA present a challenge not only because they result in lower sequencing efficiency, but also because they correlate with low absolute quantities of DNA belonging to the host organism.
In some cases, particularly in samples collected from wild animals under field conditions, starting proportions of host DNA were so low that only approximately 0.1 ng of target DNA was available in a 1 µg fecal DNA extract. Given the large genome sizes of baboons (approximately 3 Gb) and many other vertebrates, substantial allelic dropout is expected in these cases. Significantly, this challenge cannot be fully addressed by this or any other enrichment method and remains an important consideration for researchers working with feces. It can be minimized, however, by optimizing the enrichment procedures to maximize the recovery of target DNA present in a fecal DNA sample, as well as by increasing the total amount of starting fecal DNA.
Because MBD enrichment partitions DNA based on CpG-methylation density, FecalSeq does not enrich hypomethylated host mitochondrial DNA (Rebelo et al. 2009). While this may be undesirable for studies requiring the matrilineally inherited marker, it also precludes the disproportionately high representation of mitochondrial DNA that is typical in libraries prepared using the targeted sequence capture approach (Perry et al. 2010;Samuels et al. 2013;Carpenter et al. 2013;Snyder-Mackler et al. 2016). FecalSeq may, however, co-enrich nuclear DNA from exogenous eukaryotes such as from plant or animal digesta. Care should therefore be taken to minimize the presence of exogenous eukaryotic tissues or cells, although the degree to which this is a problem in practice is currently unknown. As cell-wall-bound plant cells may be more likely to pass through the digestive tract intact, extraction methods that minimize lysis of cell walls should be preferred. We speculate that prey DNA from carnivorous animals may be more difficult to partition from host DNA.
Since PCR amplification of DNA from feces was first achieved in the 1990s (Höss et al. 1992;Constable et al. 1995;Gerloff et al. 1995), noninvasive genetic studies have revolutionized our understanding of the evolution, ecology, and behavior of nonmodel organisms. By facilitating low-cost genomic-scale sequencing from feces, our method connects a community of field researchers with the benefits of massively parallel sequencing, ushering noninvasive organism studies into the genomic age.

Samples
Blood and fecal samples were collected from six captive baboons (genus Papio) housed at the Southwest National Primate Research Center (SNPRC) at the Texas Biomedical Research Institute. The individuals were of either P. anubis or hybrid ancestry (Supplemental Table S1). All six baboons were fed a diet manufactured by Purina LabDiet ("Monkey Diet 15%") containing 15% minimum crude protein, 4% minimum crude fat, and 10% maximum crude fiber. In separate sedation events, blood and feces were collected from the same individual who was isolated for the duration of the sedation. Following centrifugation, the buffy coat was isolated from blood samples and stored at -80°C. 2 ml of feces were also collected into 8 ml tubes containing 4 ml of RNALater (Ambion). All procedures were conducted under the Texas Biomedical Research Institute IACUC protocol #1403 PC 0. Sedation and blood draws were performed under the supervision of a veterinarian and animals were returned immediately to their enclosures following recovery.
In addition, we collected or obtained fecal samples from 46 wild baboons in Zambia. Samples were collected between 2006 and 2015 from the Luangwa Valley, the Lower Zambezi National Park, Choma, or Kafue National Park and are of P. kindae × P. cynocephalus, P. griseipes, or P. kindae × P. griseipes ancestry (Supplemental Table S1). As with the SNPRC samples, 2 ml of feces were collected into 8 ml tubes containing 4 ml of RNALater.
In contrast to the SNPRC samples, however, these samples were collected noninvasively from unhabituated animals in remote field conditions. Samples therefore could not be attributed to particular animals, although samples were selected to avoid duplication using either field observations or geographic distance. Following collection, samples were stored without refrigeration for 1 -6 months before being frozen at -20°C for longterm storage.
Buffy coat extractions were performed using the QIAamp DNA Blood Mini Kit (Qiagen), following manufacturer's instructions. Fecal extractions were performed using the QIAamp DNA Stool Mini Kit (Qiagen) following manufacturer's instructions for optimizing host DNA yields. DNA concentration and yield were measured using a Qubit dsDNA BR Assay (Life Technologies). In some cases, multiple DNA extractions from the same individuals were necessary when DNA was depleted over the course of this study.
We estimated the proportion of host DNA for each fecal DNA extraction using quantitative PCR (qPCR) by comparing estimates of host DNA concentration obtained by qPCR to estimates of total fecal DNA concentration obtained by Qubit. Amplification was conducted using universal mammalian MYCBP primers ) and evaluated against a standard curve constructed from the liver DNA of an individual baboon. Samples and standards were run in duplicate alongside positive and negative controls (see Supplemental Protocol for full details). MBD2-Fc-bound magnetic beads were prepared according to manufacturer instructions in batches ranging from 40 to 160 µl. For each n µl batch, we prebound 0.1 × n µl MBD2-Fc protein to n µl protein A magnetic beads by incubating the mixture with rotation for 10 min at room temperature. The bound MBD2-Fc magnetic beads were then collected by magnet and washed twice with 1 ml ice-cold 1X bind/wash buffer before being resuspended in n µl ice-cold 1X bind/wash buffer.

DNA enrichment
As a pilot experiment, we prepared two successive libraries, library A and library B, following manufac-  (2) universal bacterial 16S rRNA (16S) (Corless et al. 2000) primers along with standards created from the same respective organisms (experiments and results are described in detail in Supplemental Table   S2).
We We prepared a final library, library E, from independently extracted blood DNA from five SNPRC baboons in order to quantify the stochasticity associated with independent library preparation from independent extracts.
The composition of libraries A-E are described in detail in Supplemental Table S2 and Supplemental Table S3.

Library preparation and sequencing
Library preparation followed standard double-digest restriction site-associated DNA sequencing (ddRADseq) procedures (Peterson et al. 2012) with modifications to accommodate low input as described below.
For all samples, including blood DNA and MBD-enriched fecal DNA, we digested DNA with SphI and MluCI (New England Biolabs), following a ratio of 1 unit of each enzyme per 20 ng of DNA. Enzymes were diluted up to 10X using compatible diluents (New England Biolabs) to facilitate pipetting of small quantities, using an excess of enzyme if necessary to avoid pipetting less than 1 µl of the diluted enzyme mix. As the total amount of post-enrichment fecal DNA is by nature low, we adjusted adapter concentrations in the ligation reaction to~0.1 µM for barcoded P1 and~3 µM for P2, which correspond to excesses of adapters between 1 -2 orders of magnitude. Since adapter-ligated samples are multiplexed into pools in equimolar amounts, we made efforts to combine samples with similar concentrations and enrichment when known. We used the BluePippin (Sage Science) with a 1.5% agarose gel cassette for automated size selection of pooled individuals, with a target of 300 bp (including adapters) and extraction of a "tight" collection range. For PCR amplification, we ran all reactions in quadruplicate to minimize PCR biases and attempted to limit the number of PCR cycles. As the concentration of post-size-selection pools was below the limits of detection without loss of a considerable fraction of the sample, estimation of the required number of PCR cycles was difficult. We therefore iteratively quantified products post-PCR and added cycles as necessary. The total number of PCR cycles per pool is reported in Supplemental

Analysis
We demultiplexed reads by sample and mapped them to the baboon reference genome (papAnu2; Baylor We digested the baboon reference genome in silico, tallied reads within each predicted RADtag, and gathered the following information about each region: length, GC percentage, and CpG count in region ± 5 kb. We also calculated read depth in these simulated RADtags. Distributions of blood and fecal RADtags' length, GC percentage, and local CpG density (Supplemental Fig. S2) were visually inspected for gross distortion due to widespread dropout.
If the fecal enrichment procedure caused widespread allelic dropout, the proportion of alleles unique to the blood samples would be higher than that to the fecal sample. We tallied these unique alleles with VCFtools (Danecek et al. 2011) and tested for an excess with a Wilcoxon signed rank test.
To quantify loss of heterozygosity due to allelic dropout, we computed the inbreeding coefficient, F for all blood-feces pairs with equalized coverage, using both the individually called and multi-sample called SNP sets.
The presence of dropout is expected to inflate F. We tested for differences in paired samples' estimates of F via a Wilcoxon signed rank test. The dataset is not filtered for missingness, so sequencing errors inferred to be true variants may inflate heterozygosity estimates, thus deflating F.
To create a stringently filtered dataset with high genotyping rate, we filtered the multi-sample called SNPs in PLINK (Purcell et al. 2007;Chang et al. 2015), retaining only those genotyped in at least 90% of samples and removing samples with genotypes at fewer than 10% of sites. This filtered set was further pruned for linkage disequilibrium by sliding a window of 50 SNPs across the chromosome and removing one random SNP in each pair with r 2 > 0.5. Using all samples, we performed multidimensional scaling to visualize identity by state (IBS).
Using just the samples that were part of the same-individual blood-feces pairs, we then performed an association test and missingness chi-square test to detect allele frequencies or missingness that correlated with sample type.
We did the same with the unfiltered dataset as well. Though we had few pairs of fecal samples from the same individual, we computed distance between pairs of samples from the same individual using the stringently filtered dataset to compare distance between and within sample types via a Wilcoxon signed rank test.      Artificial "fecal" DNA was prepared by manually mixing DNA samples in controlled proportions.

Extract and prepare DNA samples
While any fecal DNA (fDNA) extraction method should in principle be compatible with the MBD enrichment, methods that maximize the recovery of host DNA are preferable. Bead-beating methods that increase total DNA yield from feces, for example, should be avoided because the mechanical disruption increases the yield of cell-wall-bound DNA (i.e., from bacteria or plants) while fragmenting host DNA.
We suggest aiming for a total yield of 1 µg of DNA for all samples in a maximum volume of 30 µl each, although we have had success with as little as 500 ng (the yield of host DNA is likely more important than the yield of total fDNA). If the volume is greater than 30 µl, the DNA can be concentrated via a bead cleanup (Auxiliary protocol A).
The amount of 1X bind/wash buffer depends on the total volume of MBD beads and the total number of enrichment reactions. MBD beads can be prepared with a maximum volume of 160 µl in a single reaction. As very small volumes (1 -8 µl) of beads are needed for our enrichment method, a single bead preparation reaction is nearly always sufficient. If more beads are needed, increase the number of bead preparation reactions and adjust the volume of 1X bind/wash buffer accordingly. Alternatively, for volumes up to 320 µl, prepare an additional 1 ml of 1X bind/wash buffer per bead preparation reaction and add an extra wash step (see step 14).
2.5 ml of 1X bind/wash buffer are required for a single bead preparation reaction up to 160 µl. Prepare an additional 1.2 ml of 1X bind/wash buffer per enrichment reaction. This number takes into account the volume needed to prepare 2 M NaCl elution buffer in the following step.
Keep 1X bind/wash buffer on ice throughout the MBD bead preparation. For the wash steps following the capture reaction, 1X bind/wash buffer can be at room temperature.
5. Prepare 2 M NaCl elution buffer by diluting 5 M NaCl with 1X bind/wash buffer. 100 µl of 2 M NaCl elution buffer are needed per enrichment reaction.
1X bind/wash buffer has a NaCl concentration of 150 mM. 1 ml of 2 M NaCl elution buffer can be prepared by adding 370 µl of 5 M NaCl with 630 µl of 1X bind/wash buffer.

Preparing MBD beads
6. If preparing 40 µl of MBD beads, add 4 µl of MBD2-Fc protein to 40 µl of protein A magnetic beads in a 1.5 ml microcentrifuge tube. For preparing other volumes (n µl) of MBD beads, add n/10 µl MBD2-Fc protein to n µl of protein A magnetic beads.
As a rule, we do not prepare less than 40 µl of MBD beads due to diminished efficiency of both rotational mixing and magnetic separation at low volumes.
7. Mix the bead-protein mixture by rotating the tube in a rotating mixer for 10 minutes at room temperature.
8. Briefly spin the tube and place on the magnetic rack for 2 -5 minutes until the beads have collected to the wall of the tube and the solution is clear. 9. Carefully remove and discard the supernatant with a pipette without disturbing the beads.
10. Add 1 ml of 1X bind/wash buffer (kept on ice) to the tube to wash the beads. Pipette up and down a few times to mix.
11. Mix the beads by rotating the tube in a rotating mixer for 3 minutes at room temperature.
12. Briefly spin the tube and place on the magnetic rack for 2 -5 minutes until the beads have collected to the wall of the tube and the solution is clear.
13. Carefully remove and discard the supernatant with a pipette without disturbing the beads.
If preparing between 160 µl and 320 µl of beads, repeat steps 10 -13 twice for a total of three washes to ensure the removal of unbound MBD2-Fc protein.
15. Remove the tube from the rack and add n µl (determined in step 6) of 1X bind/wash buffer to resuspend the beads. Mix by pipetting the mixture up and down until the suspension is homogenous.

Capture methylated host DNA
Since reaction volumes are well under 100 µl, multiple enrichment reactions can be processed together in a microplate, with pipetting steps conducted using a multichannel pipettor. Compatible rotating mixers and magnetic separators would also be required. Here, we proceed to describe the capture procedure using a 1.5 ml tube.
The total volume of the capture reaction is an important consideration. We have observed decreased DNA binding efficiency when the concentration of MBD beads or DNA in the capture reaction is low. We therefore recommend maintaining a total reaction volume of approximately 40 µl, as we have experienced consistent success with this volume even when adding as little as 1 µl of MBD beads. Decreasing the reaction volume may result in decreased efficacy of rotational mixing. It is a good idea to keep the volume of all reactions consistent as this facilitates processing of many samples and, if DNA amounts and bead volumes are kept consistent, serves as a control for the effects of bead or DNA concentration on enrichment efficiency. Our subsequent procedures assume a reaction volume of 40 µl (not including MBD beads). If using other reaction volumes, pay particular attention to notes following each step in this section.
16. Aliquot 8 µl of 5X bind/wash buffer to a 1.5 ml microcentrifuge tube For reaction volumes other than 40 µl, tune the volume of 5X bind/wash buffer to maintain 1X concentration and adjust accordingly the volume of DNase-free water added in step 17. The volume of MBD beads should be excluded from this calculation as prepared MBD beads are already at 1X concentration.
We recommend equilibrating 5X bind/wash buffer to room temperature prior to aliquoting for more accurate pipetting.
17. Add up to 30 µl of DNA (prepared in step 1) to the tube. Bring the total volume to 40 µl with DNase-free water.
For reaction volumes other than 40 µl, adjust the volume of DNase-free water added to reach the target volume. Be sure to maintain 1X bind/wash concentration.
18. Add MBD beads to the tube using the volume determined in step 2. Pipette the mixture up and down or swirl a few times to mix.
As an approximate rule and as stated above, add 1 µl of MBD beads for every 6.25 ng of target host DNA in each enrichment reaction. If samples contain less than 6.25 ng of host DNA or if the amount of host DNA is not quantified, add 1 µl of MBD beads.
19. Incubate the reaction for 15 minutes at room temperature with rotation. 20. Following incubation at room temperature, briefly spin the tube and place on the magnetic rack for 5 minutes until the beads have collected to the wall and the solution is clear.

21.
Carefully remove the supernatant with a pipette without disturbing the beads. The supernatant is enriched for microbial DNA and may be saved and purified by bead cleanup (Auxiliary protocol A). Otherwise, discard the supernatant.
22. Add 1 ml of 1 bind/wash buffer (kept at room temperature) to wash the beads.
If processing in a microplate, decrease the volume of wash buffer to 100 µl.
23. Carefully remove and discard the wash buffer with a pipette without disturbing the beads.

24.
Optional. Add 100 µl of 1X bind/wash buffer (kept at room temperature) to the beads. Pipette the mixture up and down a few times to mix.
We have found that an additional wash with 100 µl of 1X bind/wash buffer followed by rotation (steps 24 -27) substantially improved enrichment. To skip this wash, proceed to step 28.
25. Mix the beads by rotating the tube in a rotating mixer for 3 minutes at room temperature.
26. Briefly spin the tube and place on the magnetic rack for 2 -5 minutes until the beads have collected to the wall of the tube and the solution is clear.
27. Carefully remove and discard the supernatant with a pipette without disturbing the beads.

Eluting captured host DNA
The NEBNext Microbiome Enrichment Kit includes an elution protocol for captured DNA that includes digestion of DNA-bound MBD beads with proteinase K and elution with TE buffer. We have found that elution with 2 M NaCl is just as effective, is less time consuming, and conserves proteinase K. Most importantly, we have found that DNA samples eluted with 2 M NaCl and purified by bead cleanup can be further enriched in a repeat enrichment reaction. DNA samples eluted with proteinase K and TE buffer and purified by bead cleanup in contrast produced miniscule yields following a repeat enrichment reaction.
28. Add 100 µl of 2 M NaCl (prepared in step 5 and kept at room temperature) to the beads. Pipette the mixture up and down a few times to mix.
If large numbers of samples are being processed, considering lowering the elution volume such that the combined volume of DNA and SPRI beads (see Auxiliary protocol A; step 1) does not exceed the capacity of microplate wells and thereby preclude the ability to parallelize bead cleanups.
29. Mix the beads by rotating the tube in a rotating mixer for 3 minutes at room temperature.
30. Briefly spin the tube and place on the magnetic rack for 2 -5 minutes until the beads have collected to the wall of the tube and the solution is clear.
31. Carefully remove the supernatant to a fresh microcentrifuge tube and discard beads.
32. Proceed to bead cleanup to purify sample (Auxiliary protocol A).

Auxiliary protocols Auxiliary protocol A: Bead cleanup
Portions of this protocol are modified from Pacific Biosciences protocol # 001-252-177-03.

Materials and reagents
• Pre-washed magnetic SPRI beads, prepared following  • 70% ethanol, freshly prepared • 1X TE buffer • Magnetic stand 1. Run samples and standards at least in duplicate. We also recommend running a positive and negative control with each set of quantifications.
2. Use primers specific to the analysis a. The proportion of host DNA can be quantified by comparing qPCR results using host-specific primers to the absolute quantification estimated by some independent means (e.g., fluorometer or spectrophotometer). For our baboon DNA quantifications, we use universal mammal primers for the MYCBP (c-myc) gene ): b. Enrichment of DNA captured with MBD beads can be quantified as above using host-specific primers with enriched methylated host DNA. Alternatively, enrichment can be estimated by observing the n-fold decrease in quantified levels from unenriched to enriched samples using the universal 16S rRNA primer (Corless et al. 2000). 1 µl of unenriched DNA can be diluted to the concentration of the enriched sample prior to qPCR to standardize concentrations. Because MBD enrichment can in principle be biased towards densely methylated areas of the host genome, we prefer the latter method for estimating enrichment success.