Palaeogenomics of the Hydrocarbon Producing Microalga Botryococcus braunii

Botryococcus braunii is a colonial microalga that appears early in the fossil record and is a sensitive proxy of environmental and hydroclimatic conditions. Palaeozoic Botryococcus fossils which contribute up to 90% of oil shales and approximately 1% of crude oil, co-localise with diagnostic geolipids from the degradation of source-signature hydrocarbons. However more recent Holocene sediments demonstrate no such association. Consequently, Botryococcus are identified in younger sediments by morphology alone, where potential misclassifications could lead to inaccurate paleoenvironmental reconstructions. Here we show that a combination of flow cytometry and ancient DNA (aDNA) sequencing can unambiguously identify Botryococcus microfossils in Holocene sediments with hitherto unparalleled accuracy and rapidity. The application of aDNA sequencing to microfossils offers a far-reaching opportunity for understanding environmental change in the recent geological record. When allied with other high-resolution palaeoenvironmental information such as aDNA sequencing of humans and megafauna, aDNA from microfossils may allow a deeper and more precise understanding of past environments, ecologies and migrations.

The purification and analysis of ancient DNA (aDNA) from preserved materials has recently transformed our understanding of the phylogenies and migration patterns of extinct megafauna [22][23][24] and humans 25,26 . Microfossils are much more common in sediment cores and widely used as proxies of environmental change but have not yet been subject to targeted genomic analysis. The application of aDNA sequencing to selected microfossils, in this case Botryococcus, offers considerable potential for enhanced analysis of the Holocene record. Moreover, the unambiguous identification of genetic material from Botryococcus fossils within Holocene-age sediments will allow detailed examination of the phylogenetic relationship between extinct and extant Botyrococcus, potentially providing insights to understand why Botryococcus synthesise and excrete their characteristic long-chain hydrocarbons [6][7][8] .
Here we investigate the potential for purifying Botyrococcus microfossils and aDNA from sediments known to contain the microalga. Conventional palaeoenvironmental analysis was used to determine the composition of the sediment in combination with palynological techniques to identify and quantify the putative B. braunii microfossils throughout a sediment core extracted from Boswell Lake, British Columbia, Canada. We first performed a two-dimensional gas chromatography analysis of the hydrocarbons present in the sediment to verify that the Botryococcus microfossils identified using these conventional techniques are co-localised with their associated source signature geolipids, as seen in oil shales. However, no such co-localisation was observed due to the migration and degradation of the diagnostic Botryococcus derived geolipids 27 . As a result, a key identifier of preserved Botryococcus in the Holocene record is unreliable and alternative corroboration is required.
We therefore used flow cytometry (FC) to rapidly distinguish, sort and collect the putative Botryococcus microfossils from defined sediment horizons, as previously performed for pollen grains 28 and diatom frustules 29 . Preserved DNA was extracted from this material and sequenced using next generation Illumina sequencing. Sequences were aligned against a reference B. braunii genome 30 , providing unequivocal, molecular evidence of the identity of the microfossils. The combination of FC purification and DNA sequencing has wider applications to other microfossil species and the interpretation of their fossil record.

site selection
A complete Holocene sediment record was recovered from Boswell Lake (Fig. 1), a carbonate lake located in British Columbia, Canada (52°32′24.72″N 121°27′5.23″W). Boswell Lake is situated in the Cariboo region of the interior of British Columbia and neighbours Quesnel Lake, the third deepest lake in North America 31 . The site was selected because Botryococcus had previously been morphologically identified throughout the sediment record at Boswell Lake, predominantly from 8,400 cal yrs BP. While pristine environments are arguably rare on Earth 32 , Boswell Lake is extremely remote with minimal human influence on the lake or sediments, making it an ideal site for aDNA analysis.

Hydrocarbons
The total concentration of the C 24 -C 34 hydrocarbons produced by B. braunii were normalised to one gramme of sediment from each horizon and used to generate a hydrocarbon profile of the entire sediment core (Fig. 2).
The concentration of C 24 -C 34 hydrocarbons measured in defined horizons through the sediment core of Boswell Lake (Fig. 2) did not correlate to the concentration of B. braunii colonies visually identified within the matching sediment but show a rapid decrease in concentration after the top metre of the core. The geolipids therefore do not correlate with the counts of visually identified B. braunii, which could be attributed to the compression of sediments over time 27 . This confirms that while geolipids may be used to identify the presence of specific organisms or taxonomic groupings within Holocene sediments generally, potential upwards migration and microbial degradation of hydrocarbons suggests that these biomarkers cannot be used to emphatically prove that the visually identified algal subfossils at a specific horizon are indeed B. braunii.

Purification of putative B. braunii colonies using flow cytometry
Flow cytometry was used to sort 10,000 putative Botrycoccus colonies (Fig. 3) from three horizons at, 0 cm (BL 0 ), 35 cm (BL 35 ) and 125 cm (BL 125 ) from the surface. Morphology of the sorted microfossils were verified by light (Fig. 3A,B) and scanning electron microscopy ( Fig. 3C,D) and compared to those of the modern culture of B. braunii culture and published images of fossil Botryococcus (Fig. 3) 33 . The sorted samples from Boswell Lake were morphologically similar to the images of extinct Botroycoccus and extant B. braunii. However, the characteristic geolipid signature and observed microfossils were not present in the same strata as they are in more ancient sediments, therefore calling into question the precise nature of these microfossils. In addition to the microfossils sorted BL 0 , BL 35 and BL 125 sediment samples, control samples comprising 10,000 Pinus pollen grains from the same horizon as BL 0 , 10,000 and 100,000 10 µm calibration beads, were sorted using flow cytometry, to identify any nucleic acid contamination from the sorting procedure.

phylogenetic Assessment
To determine the relationship between the B. braunii identified in the recently incorporated (BL 0 ) Boswell Lake sediment, extant B. braunii and a proven set of selected algal species 2 , a phylogenetic classification was performed using the 18S rRNA gene coding sequence (Fig. 4). The phylogenetic comparison assigned the sequences from the BL 0 horizon to the same branch as that of extant B. braunii and supports the likelihood that the B. braunii identified in the Boswell Lake sediment is indeed B. braunii, and not another colonial microalga. Although DNA was purified and sequenced from BL 35 and BL 125 , 18S rRNA gene sequences could not be identified in the DNA sequence reads from these horizons, potentially due to the sediments' age and DNA degradation. Consequently, we analysed further the aDNA sequence reads from purified microfossils from BL 0 , BL 35 and BL 125 respective to the draft B. braunii genome.

Ancient DNA Analysis
While the morphology of the putative subfossil Botryococcus is comparable to that of extant B. braunii, in the absence of a reliable geolipid profile, molecular verification is necessary to confirm the genus unequivocally. This is the first time a subfossil microalga has been characterised using aDNA at genomic level. DNA was purified from each of the FACS sorted and control samples, DNA sequencing libraries were prepared and sequenced using an Illumina MiSeq (Supplementary Table 1). DNA sequence reads were quality assured before alignment to the genome of B. braunii 30 Table 1). The cytosine to thymine deamination of the 5′ and 3′ termini of the aligned reads was visualised using the mapDamage 35 package within PALEOMIX (Supplementary Fig. 1). There is no deamination observed in BL 0 which is expected for modern samples. The profiles for samples BL 35 and BL 125 display evidence of 5′ and 3′ cytosine to thymine deamination, indicating that the purified DNA originated from ancient samples. The DNA sequence reads from samples BL 0 , BL 35 and BL 125 were aligned using PALEOMIX to genomes of selected Chlorophyta taxa (B. braunii Showa., Chlorella sp., Chlamydomonas reinhardtii, Tetradesmus obliquus, Pediastrum angulosum, Pediastrum duplex and Ostreococcus tauri), to determine the validity of the assignments (Fig. 4). The maximum number of hits for BL 0 , BL 35 and BL 125 was against the B. braunii reference genome, while the number of alignments for BL 35 and BL 125 is inconclusive when compared to the number of alignments against other taxa this analysis is strongly suggestive that these algae microfossils are most likely B. braunii. The DNA sequence reads from BL 0 , BL 35 and BL 125 were also independently aligned to a custom-built database of the same Chlorophyta taxa using Kraken2 36 , which corroborated the analysis performed by PALEOMIX (Fig. 4) In addition to the alignments to the nuclear genome, the DNA sequence reads were compared to the organelle chloroplast (NC_025545.1) and mitochondria (NC_027722.1) genomes of B. braunii as they are available in the NCBI reference database. The increased copy number per cell of these organelle genomes may result in a greater proportion of positive alignments. There were 18 reads from BL 0 that aligned to the mitochondrion genome and 5  reads which aligned to the chloroplast genome. There were no reads which aligned to either organelle for samples BL 35 and BL 125 or any of the controls ( Table 1). The combination of morphological, chemical, genomic and phylogenetic analysis provides compelling evidence that the microfossils identified in Boswell Lake sediment are indeed B. braunii. However, more aDNA is Phylogenetic tree of the 18S rRNA gene coding sequences from Boswell Lake B. braunii and algal taxa generated using PhyML. The bootstrap percentage values are displayed in red for branches that have credibility values above 50%. The number of DNA sequence reads from samples BL 0 , BL 35 and BL 125 which align to these taxa using PALEOMIX and Kraken2 is displayed. required to confirm the phylogenetic assessment of the B. braunii microfossils at Boswell Lake, necessitating fresh sediment samples and a higher number of B. braunii microfossils to be purified.
Here we demonstrate for the first time that it is possible to obtain B. braunii microfossils for aDNA analysis from sediments dating back to at least 8,400 cal yrs BP. Conventional techniques were unable to obtain a single type of microfossil from lacustrine sediments in sufficient quantity and purity for aDNA sequencing. However, using a combination of two high throughput and sensitive techniques we extracted sufficient microfossils from the sediment to purify enough high-quality aDNA for next-generation sequencing analysis. This technological combination provides evidence that the morphologically identified Botryococous are correctly classified as B. braunii, even in the absence of congruent geolipid signatures. Samples of this age are subject to DNA degradation which may explain the decreasing number of DNA sequence reads that align to the modern reference genome (Fig. 4). An alternative interpretation is differences between the DNA sequence reads and the only available reference genome, B. braunii race B, Showa 30 , could account for the observed reduction in reads aligned to the reference genome for samples extracted from BL 35 and BL 125 . Future genomic assembly of the four reported races of extant B. braunii [37][38][39][40] , in addition to a more robust and diverse reference database, would enable an insight into the success of B. braunii hydrocarbon synthesis and secretion over many millennia.
Our results provide supporting evidence for palaeoenvironmental reconstruction. For instance, following the last glacial period, summer temperatures in British Columbia reached a thermal maximum some 10,500 to 8,400 cal yrs BP [41][42][43] . Intriguingly, B. braunii is thought to prevail in equable environments, especially after a wet period 44 . The unequivocal classification of the microfossils as B. braunii therefore confirms that stratigraphy and enables confident paleoenvironmental analysis. Unfortunately, the DNA sequence data acquired in this study cannot be used to perform significant comparisons between the extant B. braunii genome and that of the microfossils which exceed simple identification of the species. However, as more data become available, understanding the molecular evolution of B. braunii, and the divergence of the four extant B. braunii races from a putative ancestor becomes a real possibility. Moreover, the combination of high throughput, purification of targeted microfossils and DNA sequencing developed here can be applied to other microalgal species and subfossils, such as pollen and spores, and enables unprecedented sensitivity for resolving and understanding the species and environments during periods of abrupt and extreme change which prevailed through the Quaternary 45-47 .

Methods
Core extraction and sub sampling. A complete 3.20 m sequence of deposits were taken using a 5 cm diameter Livingstone corer to give minimal sediment disturbance 48 . Cores were recovered in 1 m sections and the corer cleaned between samples. Cores were sealed in the field and transported back to the Quesnel River Research Centre for processing. The sediment core was subsequently split into 1 cm contiguous subsamples for analysis. Core Reconstruction. Samples were prepared for multi-proxy analysis using the standard preparation method for sub-fossil extraction 49 . B. Braunii, Pediastrum sp. and charcoal concentrations were determined by the addition of tablets containing a known concentration of Lycopodium spores to each sample prior to counting and calculated following standard protocols 50 . Organic, Carbonate and Silicate content of the cores was analysed by loss-on-ignition 51,52 .
Radiocarbon Dating. Pollen was concentrated from defined horizons of sediment by flow cytometry 28 .
Plant and animal macrofossils were also dated from horizons which exhibited suitable samples. Samples were dated at either the University of Waikato or the University of Oxford Radiocarbon Accelerator Unit (ORAU) and dates calibrated using Northern Hemisphere IntCal13, which are expressed in calendar years Before Present (CE 1950) 53  Sediment Hydrocarbon Analysis by Two-Dimensional Gas Chromatography. Hydrocarbons were extracted from freeze-dried sediment horizons using a Dionex ASE-200 (California, U.S.A.). A known mass of sediment was placed into an Accelerated Solvent Extractor (ASE) cartridge and processed using 10 ml of dichloromethane at 1500 psi and 150 °C. Solvent extracts were dried under argon to a volume between 1 and 2 ml. All dried extracts were corrected to volume of 2 ml using dichloromethane in a volumetric flask. Prepared samples were analysed using gas chromatography coupled with a flame ionisation detector (Agilent 7890 GC FID, California, U.S.A.). 1 µl of sample was splitless injected. Samples were also analysed of known standards and each sediment preparation for Botryococcus Microfossil extraction. The conventional chemical preparation of sediment samples for the visual identification of subfossil material involves hydrochloric acid (HCl), hydrofluoric acid (HF), acetic acid (CH 3 COOH), sulphuric acid (H 2 SO 4 ), sodium hydroxide (NaOH) and heating the sample to 95 °C. It was considered likely that the exposure of samples to this chemical preparation would have a detrimental effect on the quantity and quality of any DNA preserved within them. Therefore the following preparation method was devised which minimised both the chemical treatment and removed the necessity for heating the sample. 0.5 g of freeze dried sediment was added to a 50 ml centrifuge tube. Approximately 20 ml of 1 M HCl was slowly added until effervescence ceased. The suspension was left loosely capped for 15 min so any residual CO 2 from the reaction could escape. Samples were centrifuged at 3,000 r.c.f. for 10 min. The supernatant was removed and 20 ml of sterile, milli-Q H 2 O was added to the pellet, which was re-suspended by vortex-mixing for 1 min. The centrifugation and wash process was repeated once more. Samples were passed through a 106 µm steel sieve and collected on a 10 µm nylon sieve cloth. The remaining 10 < 106 µm fraction was passed through a 100 µm nylon cell strainer to ensure all particles that were greater than 100 µm (i.e. larger than the nozzle of the flow cytometer) were removed. Samples were deflocculated in a sonicating water bath for 10 min to remove any aggregates.
High-throughput Extraction of Microfossils from Sediments. Prepared samples were analysed by fluorescence using a BD FACS Aria II (Becton Dickenson, USA) equipped with a 100 µm nozzle. Particle forward scatter and side scatter were obtained using a 488 nm laser and the appropriate detectors. Particle fluorescence was excited at 405 nm and at 488 nm, and fluorescence intensity recorded at 530 ± 30 nm and at 585 ± 42 nm respectively. Samples were sorted into autoclaved glass 5 ml vials, which were capped and stored at 4 °C.
Light and scanning electron Microscopy. Micrographs were acquired using either a Leica DCF300FX digital camera coupled to a Leica MZ16F dissecting microscope (436/20 nm excitation 480/40 nm emission, 470/40 nm excitation, 510 nm long pass emission and 525/50 nm emission, 10x zoom) and controlled with Leica FireCam Software or a Leica DM2500 compound microscope coupled with a Q-Imaging Micropublisher 3.3 camera, controlled with Q-Capture software. All prepared samples were imaged prior to analysis by flow cytometry. Selected sorted samples which were to be subjected to SEM imaging were placed onto a Nuclepore TM polycarbonate filter and washed three times with 10 ml milli-Q H 2 O to remove any residue of PBS that would cause salt crystals to form on the Nuclepore TM filter. The resultant filter paper was placed onto an SEM stub and sputter coated (Q150T-ES, Quorum Technologies, UK). The prepared sample was visualised by SEM (JSM-6390LV SEM, JEOL, Japan) and images captured.
DNA Purification. DNA was purified using the FastDNA for Soil DNA Extraction Kit (MPBio, 116560200) following the manufacturer's instructions with a final elution volume of 100 µl. All manipulations were performed inside a class II laminar flow cabinet. Purified DNA was quantified using the Qubit assay (Thermo-Fisher Scientific, Horsham, UK). The quality of the purified DNA was assured using either the 2100 Bioanalyzer (Agilent, U.S.A.) with High Sensitivity DNA Chips (Agilent, U.S.A.), following the manufacturer's instructions or the D1000 Tapestation (Aglient, U.S.A.) with High Sensitivity ScreenTape (Agilent, U.S.A.) following the manufacturer's instructions. Purified DNA was stored at −20 °C until further analysis was performed. preparation of DNA sequencing Library. Purified DNA was fragmented by using a Covaris E220 focused-ultrasonicator (Covaris, Massachusetts, U.S.A.), which was programmed for a target fragment size of 500 bp; 105 W, 5% duty factor, 200 cycles per burst, 80 s treatment time. Fragmented DNA was purified by bead purification using Agencourt AMPure XP beads (Beckman Coulter, California, U.S.A.). DNA sequencing libraries were prepared using the NuGen Ovation Ultralow DR kit (0330-32 NuGen, California, U.S.A.). DNA sequencing. DNA sequencing of the prepared libraries was performed by the University of Exeter sequencing service, who sequenced the DNA libraries using an Illumina MiSeq, using either 250 bp paired end or 300 bp paired end sequencing. Bioinformatic Methods. Sequence data was accessed and all subsequent analysis was performed using a local server containing 32 3.1 GHz CPUs and 256 Gb RAM. The system was installed with Fedora v.21 Linux operating system. DNA sequence Quality Control and Validation. DNA sequence data quality was analysed and visualised using FastQC 54 http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. DNA sequences reads which had a Phred score below 20 were removed using Trim-Galore 55 .
phylogenetic Assessment. Quality controlled and verified DNA sequence reads from BL 0 were aligned to the 18S rRNA gene coding sequence from B. braunii Ayamé AJ581910 using bwa mem 56 . A consensus sequence was extracted using samtools 57 and aligned against 18S rRNA gene coding sequences from a robust and published phylogeny of 57 different algal species 2 in addition to Ostreococcus tauri GQ426346, Chlorella vulgaris KX618655, Tetradesmus obliquus KX618656, Pediastrum angulosum AY663032, Pediastrum duplex AY780662, Volvox tertius FJ610144 and Botryococcus braunii KR869723 using MUSCLE 58 . Maximum likelihood analysis was performed using PhyML 3.1 59 , a GTR substitution model with four substitution rates and a gamma shape parameter of 0.477. A total of 100 bootstrap replications were performed. The resultant tree was drawn using iToL v4 60 .