DNA immunoprecipitation semiconductor sequencing (DIP-SC-seq) as a rapid method to generate genome wide epigenetic signatures

Modification of DNA resulting in 5-methylcytosine (5 mC) or 5-hydroxymethylcytosine (5hmC) has been shown to influence the local chromatin environment and affect transcription. Although recent advances in next generation sequencing technology allow researchers to map epigenetic modifications across the genome, such experiments are often time-consuming and cost prohibitive. Here we present a rapid and cost effective method of generating genome wide DNA modification maps utilising commercially available semiconductor based technology (DNA immunoprecipitation semiconductor sequencing; “DIP-SC-seq”) on the Ion Proton sequencer. Focussing on the 5hmC mark we demonstrate, by directly comparing with alternative sequencing strategies, that this platform can successfully generate genome wide 5hmC patterns from as little as 500 ng of genomic DNA in less than 4 days. Such a method can therefore facilitate the rapid generation of multiple genome wide epigenetic datasets.


Supplemental information
Single stranded DNA fragments with specific sequencing adapters are hybridised to a microwell late ("P1 chip") prior to sequencing to create a "template strand". The reaction chamber is then sequentially flooded with a single deoxyribonucleotide triphosphate (dNTP) in the presence of a DNA polymerase. If this dNTP is complimentary to the leading base on the template DNA strand it is incorporated on the complimentary strand and a single hydrogen ion is released which is read by an ion-sensitive field-effect transistor (ISFET) sensor and the reaction continues. If several dNTP molecules are incorporated in a single cycle (i.e if there is a series of particular bases on the template stand) this will lead to the release of multiple hydrogen ions which is read as a proportionally higher electronic signal by the ISFET. Typically 5µg of starting material is used, however lower amounts ~500ng can also be employed.
Depending on the amount of DNA used the sonication conditions must be optimised. These will also differ depending on the device used to fragment the DNA. In our hands we have used the Diagenode Bioruptor to sonicate 20 µg of DNA in 400 µl TE. Genomic DNA (gDNA) is sheared to produced fragments ranging from 150bp-500bp with the majority of fragments 250-300bp. This is confirmed by electrophoresing 500ng out on a 1.5% agarose gel and post-staining with ethidium bromide.
5µg of fragmented gDNA is then taken and denatured at 100 ○ C for 10 minutes (to subsequently allow the antibody to bind). At this stage 10% of the sample is then removed, cleaned up using DNA Clean & Concentrator™ kit (Zymo Research) and stored to be sequenced as the input. The DNA is now ready to be used for the analysis of either 5mC, or 5hmC profiles. For 5hmC analysis, the remaining sample is enriched for 5hmC through antibody enrichment following protocols outlined by [1]. In short, this relies on the binding of a specific 5hmC antibody (Active motif rabbit polyclonal against hydroxymethylation cat#39769) prior to enrichment through magnetic IgG beads (Dynabeads protein G (Invitrogen #100-03D). Following a series of buffer washes the DNA is then released through proteinase K digestion of the antibody and the DNA is subsequently cleaned up using Qiagen QIAquick PCR purification kits (Qiagen) and eluting in 20ul purified water.

ii. Amplification
Following enrichment, 10µl of the purified DNA is taken for whole genome amplification (WGA) using an enhanced amplification kit optimised for next generational sequencing (Sigma-Aldrich SeqPlex DNA Amplification Kit). The remaining 10µl is stored for later qPCR validation. WGA is carried out as per manufacturer's instructions with the exception of the number of cycles applied during amplification. As we wish to limit overall amplification but to obtain sufficient double stranded DNA from the single stranded fragments enriched during hmeDIP, we limit the amplification to only 10 cycles.

iv. Bioinformatic processing
Reads generated by semiconductor sequencing were mapped to the mouse reference genome (mm10) using the Ion Torrent Suite software version 4.0.2, which utilizes the integrated TMAP aligner to produce a binary alignment map (BAM) file. BAM files were then converted first to BED files ("BAM to BED") and then into wiggle files ("In2Wig") using on our local GALAXY server.
Following this we removed any small reads from the dataset as these may be incorrectly annotated due to the lower mapping accuracy scores (filter all reads <50bp in length). In order to directly compare multiple datasets and to smooth out spurious reads we then carry out a sliding window analysis across the entire genome, calculating the average DNA modification scores in a nonoverlapping set window size (Fig 4d). For the majority of epigenetic studies windows of 200bp are sufficient, however for higher density plots windows of 100bp can be applied, but will lead to a significant increase in the final size of the dataset. The end result is a series of genomic coordinates, each 200bp long containing levels of a given DNA modification across the genome. Due to the identical genomic coordinates they are therefore able to be directly interrogated by both Intra-chip (compare IP to input) and inter-chip (compare two IPs on separate chips) analyses.
Normalisation of the datasets can then be carried out. In our study we carried out read length normalisation to account for sequencing reactions which differed in efficiency, although other methods (such as quantile normalisation approaches), may also be used. As the IP and input are both sequenced on the same chip, internal normalisation can then be applied to each dataset. Much like a microarray this allows the user to subtract the values sequenced in the input from the IP, thus removing the background sequencing/DIP noise which allows for more accurate intra-array studies.
As such positive read scores represent a true enrichment over input reads.
Resulting wiggle files were then visualised using software such as the Integrative Genomics Viewer (IGV) found at http://www.broadinstitute.org/software/igv/download. Mm10 derived datasets were also lifted over to mm9 builds to compare directly to published datasets.

v. Bioinformatic analysis
Following preparation and normalisation of the datasets we set out to analyse the files with respect to other sets of published mouse liver 5hmc data -both high resolution quarter genome hmeDIP microarrays produced by ourselves [2] as well as affinity purified and Illumina Hi-seq sequenced 5hmC datasets produced by others [3]. These datasets were first processed so that they were in a similar 200bp window format as our hmeDIP-SC-seq data (see above). In studies whereby we compared the hmeDIP-SC-seq data to microarrays the datasets were restricted to the regions covered by the array. Scatter plots of 5hmC levels at 500,000 random 200bp windows were drawn using the plot & smooth scatter functions in R for biological replicate hmeDIP microarray or hmeDIP-SC-seq whereby R2 values were calculated (Fig 3a). A Pearson's correlation matrix with clustering was then carried out between the microarray datasets (n=2), affinity hmeDIP Illumina seq dataset (n=1), a hmeDIP Hiseq dataset (n=1), the hmeDIP-SC-seq datasets (n=2) and matched input SC-seq datasets (n=2).
Average patterns of 5hmC were plotted across a length normalised gene set for the affinity Illumina seq dataset and the hmeDIP-Hiseq dataset as well as the two hmeDIP-SC-seq datasets. In short gene coordinates were normalised to a percentage length (TSS =0%, TES=100%) and then regions extended to cover 25% upstream and downstream. Genes <2kb in length were excluded from the analysis as were regions which overlapped with a nearby gene. Processed 5hmC patterns (average levels of reads across 200bp windows) were then plotted using the "sliding window over length normalised features" on our local GALAXY server, essentially plotting average patterns across and around genic portions of the genome. Following this analysis we identify "peaks " of 5hmC in the three 5hmC datasets by selecting windows which are above the 95th percentile of 5hmC values (the top 5% of the values) in the dataset for at least 3 consecutive probes (3x200bp windows=600bp). We then map these peaks to one of 5 genomic compartments (promoter core: TSS+/-100bp, promoter proximal: TSS +1kb, promoter distal: TSS+1kb to +2kb, Intra-genic: windows found uniquely within genic regions, intra-genic: windows found uniquely out with the aforementioned 4 regions) to test for similarities in genomic distributions of 5hmC between the two techniques.
Comparisons of the 5hmC patterns generated by hmeDIP-SC-seq were carried out to published Nimblegen deluxe 2.1M promoter microarrays for a mouse liver in which the mouse had been exposed to the non-genotoxic carcinogen , phenobarbital (PB) for 91days (for more info see 5 ) where data was taken from GEO GSE40540.

Glucosylation mediated restriction enzyme sensitive qPCR (gRES-qPCR)
The EpiMark kit (NEB) was used to quantify relative levels of 5hmC and 5mC at select loci in mouse brain and liver DNA. All data was scaled at each locus so that the total % of marks = 100 and as such only relative and not absolute levels of each mark to be calculated. For the full protocol see the manufacturer's instructions. Typically, 10µg of genomic DNA was taken and half treated with T4phage β-glucosyltransferase (BGT) for 12-16 hours at 37°C. Both the BGT treated and untreated samples were then divided into three PCR tubes and digested with either MspI, HpaII or left uncut for a further 12-16 hours at 37°C. Samples were proteinase K treated for 10 minutes at 40°C prior to dilution to 100µl final volume in H20 and heating to 95°C for 5 minutes. qPCR was carried out on 5 µl (~0.8µg DNA) of each sample on a Roche LightCycler 480 PCR machine. Relative enrichments of the modifications were then calculated following formulae provided by NEB.

Animal treatment and sample preparation
All animal procedures and Phenobarbital treatment were according to Thomson et al 6 .

Reprogramming of MEFS
Conversion to induced pluripotent stem cells was carried out on mouse embryonic fibroblasts by established procedures 7 .