Transcriptome profiling in the spathe of Anthurium andraeanum ‘Albama’ and its anthocyanin-loss mutant ‘Xueyu’

Li, Zhiying; Wang, Jiabin; Fu, Yunliu; Gao, Yu; Lu, Hunzhen; Xu, Li

doi:10.1038/sdata.2018.247


Data Descriptor
Open access
Published: 13 November 2018


Scientific Data volume 5, Article number: 180247 (2018)


Abstract

Anthurium andraeanum is a popular tropical ornamental plant. Its spathes are brilliantly coloured due to variable anthocyanin contents. To examine the mechanisms that control anthocyanin biosynthesis, we sequenced the spathe transcriptomes of ‘Albama’, a red-spathed cultivar of A. andraeanum, and ‘Xueyu’, its anthocyanin-loss mutant. Both long reads and short reads were sequenced. Long read sequencing produced 805,869 raw reads, resulting in 83,073 high-quality transcripts. Short read sequencing produced 347.79 M reads, and the subsequent assembly resulted in 111,674 unigenes. High-quality transcripts and unigenes were quantified using the short reads, and differential expression analysis was performed between ‘Albama’ and ‘Xueyu’. Obtaining high-quality, full-length transcripts enabled the detection of long transcript structures and transcript variants. These data provide a foundation to elucidate the mechanisms regulating the biosynthesis of anthocyanin in A. andraeanum.

Design Type(s)	transcription profiling design • strain comparison design
Measurement Type(s)	transcription profiling assay
Technology Type(s)	RNA sequencing
Factor Type(s)	cultivar
Sample Characteristic(s)	Anthurium andraeanum • bract

Machine-accessible metadata file describing the reported data (ISA-Tab format)


Background & Summary

Anthurium andraeanum is a popular cut flower and potted plant with a distinctive shape and striking colours. It is a perennial, evergreen species that originated in Colombia and Ecuador. Its main attraction is the brilliantly coloured, heart-shaped spathe and the contrasting spadix. Common spathe colours of A. andraeanum include red, pink, orange, white, brown and green. Elibox and Umaharan postulated that three dominant genes, R, O and M, control spathe colour. Furthermore, a study of the white anthurium cultivar ‘Acropolis’ suggested that white phenotypes result from regulatory rather than structural mutations^1,2. A somaclonal variant called ‘Xueyu’ was generated during tissue culture of ‘Albama’; this mutant shows anthocyanin loss throughout the whole plant and has a white spathe³.

Anthocyanins are widely found in the flowers, seeds, fruits and vegetative tissues of vascular plants. These soluble flavonoid pigments are responsible for red, blue and orange hues. They also contribute to plant defence against a variety of biotic and abiotic stresses. In A. andraeanum, the major spathe pigments are anthocyanins, particularly cyanidin and pelargonidin derivatives, whose content and ratio determine the colour and its intensity⁴. The anthocyanin pathway has been extensively studied and is largely conserved across a wide range of plants. Anthocyanin biosynthesis is generally regulated by the MYB-bHLH-WD40 (MBW) complex⁵. In addition, a complex regulatory network of positive and negative feedback mechanisms controlling anthocyanin synthesis has been described in Arabidopsis⁶. The transport and accumulation of anthocyanins also affect plant colour phenotypes, but the mechanisms that control transport remain unclear. Several anthocyanin pathway genes have been isolated from A. andraeanum. In our previous study³, comparative transcriptome analysis was used to investigate the cause of anthocyanin loss in ‘Xueyu’. Transcriptome analysis has also been performed on leaf-colour mutants of the anthurium cultivar ‘Sonate’⁷. Although these studies provided transcriptome information, the mechanisms regulating anthocyanin biosynthesis and spathe colour require further study.

We sequenced 4 cDNA libraries on the Pacific Biosciences RS II platform and 6 libraries on the Illumina HiSeq 4000 to characterize the spathe transcriptomes of ‘Albama’ and ‘Xueyu’ (Table 1). Long read sequencing produced 805,869 reads of insert, which were filtered to obtain 83,073 high-quality transcripts. Short read sequencing produced 347.79 M raw reads, which were filtered and then assembled into 111,674 unigenes. Existing genome and transcriptome resources for A. andraeanum are limited, so these data provide a valuable additional transcriptome resource for two A. andraeanum cultivars. Moreover, we identified transcripts that are differentially expressed between ‘Albama’ and ‘Xueyu’, some of which may be involved in regulating anthocyanin biosynthesis.

Table 1 Metadata of samples submitted to the NCBI Sequence Read Archive.


Methods

The A. andraeanum plants were grown in the greenhouse of the Mid Tropical Crop Gene Bank of National Crop Resources located in Danzhou, China. The fully expanded spathes of the cultivars ‘Xueyu’ and ‘Albama’ were sampled. The sequencing work was performed by BGI Life Tech Co., Ltd (Shenzhen, China).

Total RNA was extracted using TRIzol (Promega, USA) and treated with DNase I (Takara Bio, Japan). mRNA was isolated using a Poly(A)Purist™ Kit (Ambion, now Life Technologies) and oligo-dT beads (Qiagen). The mRNA was then fragmented and used as a template for cDNA synthesis with a PrimeScript 1st Strand cDNA Synthesis Kit (Takara). The cDNA was purified and subjected to end repair, single adenine addition and adaptor ligation. Quality control was performed with an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System. The libraries were then sequenced on an Illumina HiSeq™ 4000.

For SMRT Cell library construction, first-strand cDNA was synthesized using a SMARTer PCR cDNA Synthesis Kit (Clontech). Second-strand cDNA was synthesized with Phusion High-Fidelity DNA Polymerase (NEB). The cDNA underwent BluePippin size selection (Sage Science). It was then normalized using the Trimmer-2 cDNA Normalization Kit (Evrogen) and amplified by large-scale PCR. Four normalized cDNA fractions with sizes of <1, 1–2, 2–3 and >3 kb were processed using the DNA Template Prep Kit (Pacific Biosciences of California, Inc.). V2 primers and SA-DNA polymerase were bound to the templates, and the resulting complexes were bound to magnetic beads for sequencing. Sequencing was performed on a Pacific Biosciences RS II (Pacific Biosciences of California, Inc.). The <1 kb and >3 kb libraries were each sequenced on two SMRT Cells, and the other two libraries on one SMRT Cell each.

Long reads were classified and filtered using the SMRT Analysis pipeline⁸. Raw reads were first converted to reads of insert. This step used a minimum of 0 full passes (a full pass is one complete read of the insert between SMRTbell adapters) and a minimum accuracy of 0.75. Reads of insert were then classified: reads of at least 300 bp were retained, and primers were detected with a minimum phmmer score of 10. The filtered reads were clustered with the ICE algorithm and polished with Quiver⁹. High-quality isoforms required a minimum Quiver accuracy of 0.99 for the libraries smaller than 3 kb and 0.98 for the library larger than 3 kb (Table 2). Finally, cd-hit-est was used to remove redundancy among the high-quality isoforms (Table 3).
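
The long-read thresholds above can be summarised as simple predicates. The following Python sketch is illustrative only and does not reproduce the SMRT Analysis software or its interfaces. It assumes that each read has already been reduced to a hypothetical `ReadOfInsert` record with pre-computed statistics. The labels `"<1"`, `"1-2"`, `"2-3"` and `">3"` are hypothetical names for the four size-fraction libraries.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_FULL_PASSES = 0       # 0 means no full pass is strictly required
MIN_ROI_ACCURACY = 0.75   # minimum accuracy of a read of insert
MIN_LENGTH_BP = 300       # minimum read length for classification
MIN_PHMMER_SCORE = 10     # minimum phmmer score for primer detection


@dataclass
class ReadOfInsert:
    """Hypothetical per-read summary; not a SMRT Analysis data type."""
    full_passes: int
    accuracy: float
    length: int
    primer_score: float


def passes_roi_filter(r: ReadOfInsert) -> bool:
    """Reads-of-insert step: full-pass and accuracy thresholds."""
    return r.full_passes >= MIN_FULL_PASSES and r.accuracy >= MIN_ROI_ACCURACY


def passes_classify_filter(r: ReadOfInsert) -> bool:
    """Classification step: length and primer-detection thresholds."""
    return r.length >= MIN_LENGTH_BP and r.primer_score >= MIN_PHMMER_SCORE


def hq_accuracy_cutoff(library: str) -> float:
    """Quiver accuracy needed for an HQ isoform in a given size fraction."""
    return 0.98 if library == ">3" else 0.99


def is_high_quality(quiver_accuracy: float, library: str) -> bool:
    """True if a polished isoform meets its library's HQ cut-off."""
    return quiver_accuracy >= hq_accuracy_cutoff(library)


if __name__ == "__main__":
    read = ReadOfInsert(full_passes=2, accuracy=0.82, length=1500, primer_score=25.0)
    print(passes_roi_filter(read) and passes_classify_filter(read))  # True
    print(is_high_quality(0.985, ">3"), is_high_quality(0.985, "1-2"))  # True False
```

The sketch keeps the library-dependent HQ cut-off in one function so that the looser threshold for the >3 kb fraction is explicit.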

Table 2 Summary of long read filtering.


Table 3 Cluster of long reads.


For the short reads, Trimmomatic¹⁰ was used to remove noisy reads (Table 4). Three kinds of reads were removed:
- reads containing adaptors;
- reads in which more than 5% of bases were unknown (N);
- reads in which more than 50% of bases had a quality score below 15.

Clean reads were then assembled into unigenes using Trinity¹¹ (Table 5). Gene abundance was estimated with RSEM¹² and expressed as fragments per kilobase of transcript per million mapped fragments (FPKM). Differentially expressed genes were detected with NOISeq¹³ using thresholds of FDR ≤ 0.001 and fold change ≥ 2.
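
The three read-level criteria can be expressed as a single predicate. This Python sketch applies the stated criteria directly rather than calling Trimmomatic, whose own trimming steps are configured differently. It rests on two assumptions not stated in the text: Phred+33 quality encoding, and adapter detection by exact substring match against an illustrative adapter prefix.

```python
# Filtering thresholds quoted in the text.
MAX_N_FRACTION = 0.05        # drop reads with >5% unknown (N) bases
LOW_QUAL_CUTOFF = 15         # a base is low quality if its Phred score is < 15
MAX_LOW_QUAL_FRACTION = 0.5  # drop reads with >50% low-quality bases

# Illustrative adapter sequence (a common Illumina adapter prefix); the actual
# adapter list used for these libraries is not given in the text.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - offset for c in qual]


def keep_read(seq: str, qual: str, adapter: str = ADAPTER) -> bool:
    """Return True if a read passes all three read-level filters."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if adapter in seq.upper():
        return False  # adapter contamination (exact match only)
    if seq.upper().count("N") / len(seq) > MAX_N_FRACTION:
        return False  # too many unknown bases
    scores = phred_scores(qual)
    low = sum(q < LOW_QUAL_CUTOFF for q in scores)
    return low / len(scores) <= MAX_LOW_QUAL_FRACTION


if __name__ == "__main__":
    good = "ACGT" * 25  # 100 bp, no N
    print(keep_read(good, "I" * 100))                # True  ('I' = Q40)
    print(keep_read("N" * 6 + "A" * 94, "I" * 100))  # False (6% N)
    print(keep_read(good, "#" * 51 + "I" * 49))      # False ('#' = Q2)
```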

Table 4 Summary of short read filtering.


Table 5 Summary of short read de novo assembly.


For functional annotation, the high-quality isoforms and unigenes were searched with BLAST against the NT, NR, KEGG, COG and Swiss-Prot databases and analysed with InterProScan 5¹⁴. For transcripts without a hit in any functional database, CDSs were predicted using ESTScan¹⁵, with the BLAST-predicted CDSs as the model.
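
One common way to turn BLAST output into annotations is to keep the best-scoring hit for each transcript. The transcripts left without a hit are then the candidates for ESTScan prediction. The Python sketch below makes two assumptions: results are in BLAST+ tabular format (`-outfmt 6`, 12 default columns), and a hypothetical E-value cut-off of 1e-5 is applied, because the cut-off used for this data set is not stated. The transcript and subject IDs in the demo are invented.

```python
import csv
import io

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EVALUE_CUTOFF = 1e-5  # hypothetical; the cut-off used here is not stated


def best_hits(handle, evalue_cutoff: float = EVALUE_CUTOFF) -> dict:
    """Map each query ID to its highest-bitscore hit that passes the cut-off."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > evalue_cutoff:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"sseqid": hit["sseqid"], "pident": float(hit["pident"]),
                           "evalue": evalue, "bitscore": bitscore}
    return best


def unannotated(all_ids: set, hits: dict) -> set:
    """Transcripts with no accepted hit (candidates for ESTScan CDS prediction)."""
    return set(all_ids) - set(hits)


if __name__ == "__main__":
    demo = io.StringIO(
        "t1\tsp|P1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-50\t550\n"
        "t1\tsp|P2\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-40\t480\n"
        "t2\tsp|P3\t40.0\t80\t48\t0\t1\t80\t1\t80\t0.01\t35\n"
    )
    hits = best_hits(demo)
    print(hits["t1"]["sseqid"])                           # sp|P1
    print(sorted(unannotated({"t1", "t2", "t3"}, hits)))  # ['t2', 't3']
```

In practice one such table would be produced per database and the per-database results merged.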

The methods above are expanded versions of the descriptions in our related work^3,16.

Code availability

Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic (version 0.38)

CD-HIT: http://www.bioinformatics.org/cd-hit/ (version 4.6.6)

Blast2GO: https://www.blast2go.com (version 2.5.0)

InterProScan: http://www.ebi.ac.uk/interpro (version 5.11)

Trinity: https://github.com/trinityrnaseq/trinityrnaseq (version 2.0.6)

Data Records

The raw sequencing data from this study and from our previous study³ were deposited in the NCBI Sequence Read Archive (Data Citation 1). The project includes the reads of insert from long read sequencing and the clean short reads, in FASTQ format. The four files with accession IDs SAMN09296224, SAMN09296225, SAMN09296226 and SAMN09296227 contain spathe transcriptome data from our previous study³. Possible vector and next-generation sequencing primer contamination was removed, after which 110,918 unigenes assembled from short reads were deposited in GenBank (Data Citation 2). The transcript annotation data were deposited in figshare (Data Citation 3).

Technical Validation

The integrity of the total RNA used to construct the RNA-seq libraries was assessed, and only samples with an RNA integrity number (RIN) greater than 9 were used. The 347.79 M raw reads were filtered to 267.71 M clean reads, a mean clean-read ratio of 77.1%. The short reads were de novo assembled into 384,791 unigenes in total; after removing redundancy, we obtained 111,674 unigenes.

Four long read libraries produced a total of 805,869 reads of insert, 387,845 full-length non-chimeric reads and 123,430 reads containing poly-A tails. All reads were clustered into 83,073 high-quality (HQ) transcripts. The length distributions of the HQ transcripts and unigenes are shown in Fig. 1a. The HQ transcripts were also mapped to the unigenes: 53,018 HQ transcripts and 38,348 unigenes shared high similarity (identity > 95%); 27,296 HQ transcripts and 28,991 unigenes showed low similarity; and 2,759 HQ transcripts and 44,335 unigenes had no similarity (Fig. 2b).
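
Each sequence falls into exactly one of the three similarity classes, so the bins for each set should sum to its total. The Python sketch below assumes the best-hit percent identity for each sequence has already been computed, with `None` meaning no hit. It treats every hit at or below 95% identity as "low similarity"; that reading of the categories is an assumption. The sketch also checks the reported counts: 53,018 + 27,296 + 2,759 = 83,073 HQ transcripts, and 38,348 + 28,991 + 44,335 = 111,674 unigenes.

```python
from collections import Counter
from typing import Iterable, Optional

HIGH_IDENTITY = 95.0  # "high similarity" means identity > 95% (from the text)


def similarity_class(identity: Optional[float]) -> str:
    """Classify one sequence by its best-hit identity (None = no hit)."""
    if identity is None:
        return "none"
    return "high" if identity > HIGH_IDENTITY else "low"


def tally(identities: Iterable[Optional[float]]) -> Counter:
    """Count sequences in each similarity class."""
    return Counter(similarity_class(i) for i in identities)


# Counts reported in the text, with the total each set should sum to.
REPORTED = {
    "HQ transcripts": ({"high": 53_018, "low": 27_296, "none": 2_759}, 83_073),
    "unigenes": ({"high": 38_348, "low": 28_991, "none": 44_335}, 111_674),
}

if __name__ == "__main__":
    for name, (bins, total) in REPORTED.items():
        print(name, sum(bins.values()) == total)  # True for both sets
    print(tally([99.1, 96.0, 95.0, 80.2, None]))
```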

**Figure 1: Length distributions of transcripts and CDS.**

**Figure 2: Annotation and Blast results for the HQ transcripts and unigenes.**

The transcripts, including HQ transcripts and unigenes, were mapped to the NR, KEGG, InterPro, COG and Swiss-Prot databases, and 35,744 transcripts could be mapped to all five databases (Fig. 2a). According to the annotations and predictions, 70,603 HQ transcripts and 55,031 de novo-assembled sequences were predicted to contain CDS; the distribution of CDS lengths is shown in Fig. 1b.
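
The number of transcripts annotated in all five databases is the intersection of the per-database hit sets, the central region of a Venn diagram such as Fig. 2a. Below is a minimal Python sketch, assuming the hit sets are already available. The transcript IDs are hypothetical toy data.

```python
from functools import reduce


def annotated_in_all(hits_by_db: dict[str, set[str]]) -> set[str]:
    """IDs with a hit in every database (the Venn-diagram core)."""
    if not hits_by_db:
        return set()
    return set(reduce(set.intersection, hits_by_db.values()))


def annotated_in_any(hits_by_db: dict[str, set[str]]) -> set[str]:
    """IDs with a hit in at least one database."""
    return set().union(*hits_by_db.values())


if __name__ == "__main__":
    hits = {  # hypothetical toy data
        "NR": {"t1", "t2", "t3"},
        "KEGG": {"t1", "t2"},
        "InterPro": {"t1", "t2", "t4"},
        "COG": {"t1"},
        "Swiss-Prot": {"t1", "t2"},
    }
    print(sorted(annotated_in_all(hits)))  # ['t1']
    print(sorted(annotated_in_any(hits)))  # ['t1', 't2', 't3', 't4']
```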

We performed differential expression analysis between the ‘Xueyu’ and ‘Albama’ samples for both the HQ transcripts and the unigenes (Fig. 3). The analysis identified 1,461 downregulated and 3,671 upregulated HQ transcripts, and 199 downregulated and 435 upregulated unigenes. The expression and annotation information was deposited in figshare (Data Citation 3).
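
The FPKM measure and the two differential-expression cut-offs can be illustrated directly. This Python sketch does not reimplement NOISeq's statistics. It assumes the per-gene significance value (the FDR in the Methods) has already been produced by the differential-expression tool, and it only applies the stated thresholds. Treating ‘Xueyu’ as the numerator of the fold change, so that "up" means higher in ‘Xueyu’, is also an assumption, because the text does not state the direction of comparison.

```python
def fpkm(fragments: float, length_bp: float, total_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def call_deg(fpkm_xueyu: float, fpkm_albama: float, fdr: float,
             min_fc: float = 2.0, max_fdr: float = 0.001):
    """Return 'up', 'down' or None for 'Xueyu' relative to 'Albama' (assumed)."""
    if fdr > max_fdr:
        return None  # not significant
    if fpkm_xueyu == 0 and fpkm_albama == 0:
        return None  # not expressed in either cultivar
    if fpkm_albama == 0:
        return "up"
    if fpkm_xueyu == 0:
        return "down"
    ratio = fpkm_xueyu / fpkm_albama
    if ratio >= min_fc:
        return "up"
    if ratio <= 1 / min_fc:
        return "down"
    return None


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript, 20 M mapped fragments in the library:
    print(fpkm(500, 2000, 20_000_000))  # 12.5
    print(call_deg(10.0, 5.0, 1e-4), call_deg(2.5, 5.0, 1e-4), call_deg(9.0, 5.0, 1e-4))
    # up down None
```

Zero-expression cases are handled explicitly instead of with a pseudocount, so that a gene exactly at the two-fold threshold is classified as the cut-off specifies.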

**Figure 3: Volcano plot of differentially expressed genes between ‘Xueyu’ and ‘Albama’.**

Usage Notes

Because no reference genome is available for A. andraeanum, the raw long reads were error-corrected by clustering with the ICE algorithm. Alternatively, high-coverage short reads can be used to correct errors in the long reads.

In our previous study³, we used Illumina short-read sequencing to compare the spathe transcriptomes of ‘Xueyu’ and ‘Albama’ at stage 3 (the flower protrudes from the sheath) and stage 6 (the spathe is fully expanded). Full-length transcripts enable the detection of long transcript structures and transcript variants. To obtain high-quality, full-length transcripts, we performed both isoform sequencing and Illumina short-read sequencing in this study. The data from this study supplement the transcript and expression data for the stage 6 spathe.

Additional information

How to cite this article: Li, Z. et al. Transcriptome profiling in the spathe of Anthurium andraeanum ‘Albama’ and its anthocyanin-loss mutant ‘Xueyu’. Sci. Data. 5:180247 doi: 10.1038/sdata.2018.247 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Elibox, W. & Umaharan, P. Inheritance of major spathe colors in Anthurium andraeanum Hort. is determined by three major genes. HortScience 43, 787–791 (2008).
2. Collette, V. E. A molecular analysis of flower color development in an ornamental monocot (Anthurium andraeanum). PhD thesis, Massey University, Palmerston North, New Zealand (2002).
3. Li, Z. Y., Wang, J. B., Zhang, X. Q. & Xu, L. Comparative transcriptome analysis of Anthurium “Albama” and its anthocyanin-loss mutant. PLoS ONE 10, e0119027 (2015).
4. Williams, C. A., Harborne, J. B. & Mayo, S. J. Anthocyanin pigments and leaf flavonoids in the family Araceae. Phytochemistry 20, 217–234 (1981).
5. Baudry, A. et al. TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. The Plant Journal 39, 366 (2004).
6. Petroni, K. & Tonelli, C. Recent advances on the regulation of anthocyanin synthesis in reproductive organs. Plant Science 181, 219 (2011).
7. Yuxia, Y. et al. Phenotype and transcriptome analysis reveals chloroplast development and pigment biosynthesis together influenced the leaf color formation in mutants of Anthurium andraeanum ‘Sonate’. Front Plant Sci 6, 139 (2015).
8. Gordon, S. P. et al. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PLoS ONE 10, e0132628 (2015).
9. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133 (2009).
10. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
11. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494–1512 (2013).
12. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
13. Tarazona, S., García-Alcalde, F., Dopazo, J., Ferrer, A. & Conesa, A. Differential expression in RNA-seq: a matter of depth. Genome Research 21, 2213 (2011).
14. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Research 33, W116 (2005).
15. Iseli, C., Jongeneel, C. V. & Bucher, P. ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol 99, 138–148 (1999).
16. Wang, J. et al. Integrated DNA methylome and transcriptome analysis reveals the ethylene-induced flowering pathway genes in pineapple. Sci Rep 7 (2017).

Data Citations

1. NCBI Sequence Read Archive SRP128296 (2018)
2. GenBank GGPS00000000 (2018)
3. Li, Z. Y. et al. figshare https://doi.org/10.6084/m9.figshare.7012238.v2 (2018)


Acknowledgements

This work was funded by the Ministry of Agriculture Tropical Species Resource Protection Project (17RZZY-101 and B650) and Innovative Project Funds for CATAS-TCGRI (1630032018010).

Author information

Zhiying Li, Jiabin Wang and Li Xu: These authors contributed equally to this work.

Authors and Affiliations

Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
Zhiying Li, Jiabin Wang, Yunliu Fu, Yu Gao, Hunzhen Lu & Li Xu
Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
Zhiying Li, Jiabin Wang, Yunliu Fu & Li Xu
Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
Zhiying Li, Jiabin Wang, Yunliu Fu & Li Xu
Mid Tropical Crop Gene Bank of National Crop Resources, Danzhou, 571700, Hainan, China
Zhiying Li, Jiabin Wang, Yunliu Fu & Li Xu
Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
Yu Gao & Hunzhen Lu


Contributions

Z.Y.L., L.X. and J.B.W. conceived and designed the experiments and wrote the paper; J.B.W., Y.G., H.Z.L. and Y.L.F. performed the experiments and analysed the data; Y.L.F., Y.G. and H.Z.L. contributed reagents/materials/analysis tools. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Li Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.


About this article

Cite this article

Li, Z., Wang, J., Fu, Y. et al. Transcriptome profiling in the spathe of Anthurium andraeanum ‘Albama’ and its anthocyanin-loss mutant ‘Xueyu’. Sci Data 5, 180247 (2018). https://doi.org/10.1038/sdata.2018.247


Received: 15 June 2018
Accepted: 18 September 2018
Published: 13 November 2018
DOI: https://doi.org/10.1038/sdata.2018.247