A chromosome-level genome assembly for the astaxanthin-producing microalga Haematococcus pluvialis

The green microalga Haematococcus pluvialis can synthesize high amounts of astaxanthin, which is a valuable antioxidant that has been utilized in human health, cosmetics, and aquaculture. To illustrate detailed molecular clues to astaxanthin yield, we performed PacBio HIFI along with Hi-C sequencing to construct an improved chromosome-level haplotypic genome assembly with 32 chromosomes and a genome size of 316.0 Mb. Its scaffold N50 (942.6 kb) and contig N50 (304.8 kb) have been upgraded remarkably from our previous genome draft, and a total of 32,416 protein-coding genes were predicted. We also established a high-evidence phylogenetic tree from seven representative algae species, with the main aim to calculate their divergence times and identify expanded/contracted gene families. We also characterized genome-wide localizations on chromosomes of some important genes such as five BKTs (encoding beta-carotene ketolases) that are putatively involved in astaxanthin production. In summary, we reported the first chromosome-scale map of H. pluvialis, which provides a valuable genetic resource for in-depth biomedical investigations on this momentous green alga and commercial astaxanthin bioproduction.

N50 (288.6 kb) and contig N50 (8.4 kb) 6 .Due to fragmental assembly and limited genome resources, details of molecular clues to astaxanthin biosynthesis in H. pluvialis remain elusive.Here, we performed long-read PacBio HIFI and high-resolution chromosome conformation capture (Hi-C) sequencing, with the main aim to obtain a high-quality and chromosome-level genome assembly of H. pluvialis.
Whole-genome sequencing, assembly, and annotation of this economically important microalga were fulfilled with a great improvement.In addition, carotene biosynthetic genes cooperate with β-carotene ketolase (CRTO) and hydroxylase (CRTR-B) to synthesize astaxanthin under high irradiation and salinity stress, which are the most common stressors during cultivation of H. pluvialis [3][4][5] .We therefore conducted additional transcriptome analysis on stressed cells to reveal differential astaxanthin production from certain critical genes, such as those encoding beta-carotene ketolases (BKTs) 6 .In the coming future, these valuable genomic resources will facilitate breeding of novel H. pluvialis strains to obtain higher astaxanthin yield.

Methods
Sample materials, whole genome sequencing and genome assembly.The H. pluvialis strain 192.80 was purchased from the SAG Culture Collection of Algae (Göttingen University, Göttingen, Germany).Microalga cells were cultivated in ESP Ag medium 7 .Genomic DNAs were extracted from cultivated cells using Qiagen GenomicTip100 (Qiagen, Germantown, MD, USA).A routine whole-genome shotgun sequencing strategy was applied.In brief, long SMRTbell libraries were constructed for the HiFi sequencing based on PacBio's standard protocol (Pacific Biosciences, Menlo Park, CA, USA).These libraries were sequenced through a PacBio Sequel II System (Pacific Biosciences).About 1.6 million of consensus reads (26.1 Gb) with a mean length of 16.6 kb were generated.For the Hi-C sequencing, genomic DNAs were fixed with formaldehyde, sheared by a restriction enzyme (MboI) to build a Hi-C library, and then sequenced on a HiSeq-Xten sequencing platform (Illumina Inc., San Diego, CA, USA).A total of 98.0 Gb of 150-bp paired-end Hi-C data were generated.
The sequenced HIFI reads were initially assembled to be contigs by hifiasm (version 0.14-r312) 8 .This primary genome assembly contains 315.1 Mb of contigs, with a N50 length of 304.8 kb (Table 1).Subsequently, we integrated Hi-C data to construct a high-quality de novo assembly at the chromosome level.In brief, quality control was performed to filter the Hi-C raw reads, and then valid Hi-C connected reads were generated by Juicer version 1.5 9 .The 3D de novo assembly (3D-DNA, version 180922) pipeline 10 was employed to link the contig sequences into chromosome-level sequences.Haplotypic 32 chromosomes (chr; Fig. 1b) with a total length of 285.4 Mb (Table 2) were built by using these Hi-C sequences; they account for about 90.3% of the whole genome assembly (Fig. 1c), individually ranging from 1.2 Mb (Chr32) to 21.5 Mb (Chr8) in length (Table 2).The completeness of this final genome assembly was evaluated by Benchmarking Universal Single-Copy Orthologs (BUSCO) 11 v5.2.2, revealing that 93.4% BUSCOs are complete.We also compared this assembly (~300 Mb) with our previous draft assembly 6 (~600 Mb), and found a lot of 1:2 blocks (Fig. 2a) by using the i-ADHoRe v3.0 software 12 .For example, two scaffolds (7 and 81) in the draft genome correspond to the Chr14 in part (see Fig. 2b).

Genome annotation.
Repeat sequences of the assembled H. pluvialis genome were identified by employing several programs including Tandem Repeats Finder 13 , LTR_FINDER 14 , RepeatProteinMask and RepeatMasker 15 .Tandem Repeats Finder 13 was employed to search for tandem repeats using optimized parameters (Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPerid = 2,000).A de novo repeat library was built by the LTR_FINDER (version 1.0.6;parameter: -w 2).RepeatMasker was then utilized to map the H. pluvialis assembly onto the Repbase TE library (version 3.2.9) 16so as to search for known repeat sequences as well as map it onto the de novo repeat libraries to identify novel types of repeat sequences.
We then performed annotation of the H. pluvialis genome assembly with two routine methods, including homology-based and transcriptome-based annotation.Eight representative species, including Chlamydomonas reinhardtii, Paramecium tetraurelia, Saccharomyces cerevisiae, Symbiodinium kawagutii, Symbiodinium minutum, Chlamydomonas eustigma, Chromochloris zofingiensis and Micromonas pusilla, were downloaded from NCBI and were selected for the homology annotation.Their protein sequences were mapped onto the H. pluvialis genome assembly utilizing TblastN 17 with an E-value ≤ 1e −5 .Genewise 2.2.0 18 was subsequently employed to predict gene structures based on these TblastN results.Total RNA was extracted from those control cells (sample ID: LLMT4, 5, and 6; see more details in the followed section on Total RNA Isolation) for subsequent transcriptome sequencing on an Illumina HiSeq4000 platform.These data were mapped onto the assembled genome using HISAT v2.0.4 19 .We then utilized Cufflinks (version 2.2.1) 20 to identify those preliminary genes.Finally, we applied Maker 21 to integrate predicted genes from both annotation procedures.
and NR databases using BlastP with an E-value ≤ 1e −5 .Finally, approximately 97.2% of the predicted genes have at least one functional assignment from these public databases.
Transcriptome analysis.The H. pluvialis strain 192.80 was purchased and cultured by using Bold Basal Medium in 250-mL Erlenmeyer flasks at 22 °C under continuous fluorescent lamps (20 μmol•m −2 •s −1 ) to the logarithmic phase (about 1 × 10 5 cells•mL −1 ) 6 .Algal cells were sub-cultured into 300 mL BBM medium and treated with salicylic acid (SA, 25 mg•L −1 ) and high light (HL, 350 μmol•m −2 •s −1 ).The treatment of salicylic acid and high light was named as SAHL for short.Algal cells after treatments with at 0 h (Control), 1 h (SAHL 1), 6 h (SAHL 6), 12 h (SAHL 12), 24 h (SAHL 24), and 48 h (SAHL 48) were collected and used for RNA-seq sampling, respectively.These transcriptome reads were dowanloaded from previous study 26 with the accession number PRJNA675306.Paired-end raw reads were then processed by removal of adapters and low-quality sequences using SOAPnuke (version 1.5.6) 27with default parameters.These clean data were then mapped onto the assembled genome using HISAT 19 .Transcript quantification of fragments per kilobase per million (FPKM) in each sample was calculated by using Cufflink 28 .Differentially expressed genes (DEGs) between the treatment and control groups were identified using "edgeR" (version 3.15) in the R package 29 with log2 (ratio) ≥ 1 and adjusted P-value ≤ 0.05 as the threshold.Finally, a pathway enrichment analysis was conducted on these up-and down-regulated DEGs according to the KEGG database 25 .
Evolutionary placement of H. pluvialis.A whole-genome phylogenetic analysis on H. pluvialis and other six related microalgae was performed to determine the evolutionary position of H. pluvialis.These examined species, including Chlorella sorokiniana, Chlamydomonas reinhardtii, Cladocopium goreaui, Breviolum minutum, Fugacium kawagutii and Symbiodinium microadriacticum, were downloaded from NCBI.The whole-genome gene sets from H. pluvialis and others were aligned by BLASTp (version 2.2.6) to check homology for generation of a sequence similarity matrix, which was then utilized to identify gene families by using OrthoMCL 30 and Markov Chain Clustering (MCL) with default parameters.We identified single-copy orthologues among the seven species, and then these orthologues were aligned with MUSCLE version 3.7 31 .All alignments were combined to obtain a super alignment sequence.
We first applied the maximum likelihood (ML) method to generate a phylogenetic tree (Fig. 1d), which was implemented in PhyML version 3.0 32 .To confirm this topology, we also utilized Bayesian inference (BI) to establish the same phylogenetic tree by using MrBayes version 3.2.2 33.Meanwhile, we calculated the divergence time by MCMCtree in the PAML package 34 , with calibration references from the TIMETREE 35 .Expanded and contracted gene families were predicted by employing CAFE v4.2.1 36 .In total, 41 single-copy gene families including 287 genes were identified from the seven representative microalgal species.These genes from each species were concatenated together and finally constituted a super-length nucleotide dataset to yield 165,027 aligned sites.Our phylogenetic analysis and divergence results discovered that H. pluvialis is close to C. reinhardtii, and their divergence time is at 520.4 million years ago (Mya).We found 4,592 expanded gene families and 26,300 contracted gene families in the H. pluvialis genome (Fig. 1d).
H. pluvialis is popular for its strong capacity to produce large amounts of astaxanthin that is a strong antioxidant for human health, cosmetics and aquaculture 1 .
Because about 1 kg of dry H. pluvialis cells can produce over 40 g astaxanthin 37 , this species is a great material for production of astaxanthin.Previous study has shown that CRTO and CRTR-b are two key enzymes for the astaxanthin biosynthesis pathway 38 .In this study, three BKT genes were identified to be up-regulated in H. pluvialis cells with diverse stress treatments 39 .However, in our current genome searching, we identified five BKT genes that were distributed in Chr10, Chr26 and scaffold206 (Fig. 3a).More excitingly, a tandem duplication of three BKT genes (BKT1, BKT2 and BKT3) were observed in the Chr26.These BKT genes have similar protein sequences (Figs.3b), 3D structures (Fig. 3c) and gene structures (Fig. 3d), while with potentially different astaxanthin-producing capacity.For example, we qualified BKT3 with the highest transcription level (data not shown) under the SAHLtreatment (see the transcriptome sequencing section and the previous report 26 ).We therefore propose that some of these duplicated BKT genes (BKT1, 2 and 3) are potetntially the major contributors to the rapid synthesis and accumulation of large amounts of astaxanthin.Conversely, the relative C. reinhardtii, without evidence of astaxanthin production, doesn't have any functional BKT gene in its genome.
On the other hand, we compared several other genes from the astaxanthin biosynthesis pathway, and observed that some in H. pluvialis are remarkably expanded than C. reinhardtii.For example, the H. pluvialis genome contains 8 CHYB, 3 PSY, 3 ZDS1, 2 LCYB and 1 PDS genes, corresponding to only one of each in the C. reinhardtii genome; these high copy numbers may provide additional support for the high astaxanthin production in H. pluvialis.
In summary, we reported the first chromosome-level whole-genome assembly for the attractive astaxanthin-producing green microalga H. pluvialis.This genome is a valuable material for deep understanding the molecular clue of algal astaxanthin yield.We found five BKT genes in H. pluvialis genome.These expanded genes may play a key role in the high astaxanthin production.Our genome and transcriptome data sets will facilitate molecular breeding or biosynthesis of novel strains with significantly improved astaxanthin contents.

Data records
The PacBio long reads, Hi-C sequencing reads, and the final genome assembly were deposited at NCBI with the accession number PRJNA964479 40 .The annotation data and protein sequences were deposited at Figshare with doi number https://doi.org/10.6084/m9.figshare.23047088 41.The raw reads were deposited at NCBI Sequence Read Archive with accession number SRR25425436 and SRR25425436 42 .

technical Validation
The extracted DNA quality was examined by using the agarose gel electrophoresis with over 1.8 of the DNA spectrophotometer ratios (260/280) and around 20 kb main band.The Nanodrop ND-1000 spectrophotometer (RIN > 8.0; LabTech, Corinth, MS, USA) was utilized to check the purified RNA quality.The completeness of this H. pluvialis genome was validated by BUSCO v5.2.2.The final BUSCO result showed 93.4% completeness.

Fig. 1 H
Fig. 1 H. pluvialis and its genome.(a) Images of the H. pluvialis life cycle, inclduing macrozooid, palmella, immature aplanospore and aplanospore (from left to right), respectively.(b) The heatmap view of Hi-C result.(c) The circos view of H. pluvialis genome.(d) Phylogeny and gene-family analysis of seven representative microalgae species.

Fig. 2
Fig. 2 Synteny blocks between our current chromosome-level assembly and the previous 600-Mb draft 6 of the H. pluvialis genome.(a) An overview of the total synteny blocks.Only partial blocks are visible due to the fragmental contigs in the previous draft assembly.(b) An example of the solid 1:2 blocks in Chr14.

Fig. 3
Fig. 3 Summary of the five bkt genes in Haematococcus pluvialis.(a) The chromosome location of BKT1-5.(b) Alignment of the five BKT protein sequences.(c) 3D structures of the five BKT proteins.(d) Distribution of exons and introns.

Table 1 .
25KEGG25Summary of the primary genome assembly and annotation.

Table 2 .
Chromosome length and astaxanthin biosynthesis related genes in the assembled H. pluvialis genome.