Abstract
Tea is one of the world’s oldest crops and is cultivated to produce beverages with various flavours. Despite advances in sequencing technologies, the genetic mechanisms underlying key agronomic traits of tea remain unclear. In this study, we present a high-quality pangenome of 22 elite cultivars, representing broad genetic diversity in the species. Our analysis reveals that a recent long terminal repeat burst contributed nearly 20% of gene copies, introducing functional genetic variants that affect phenotypes such as leaf colour. Our graphical pangenome improves the efficiency of genome-wide association studies and allows the identification of key genes controlling bud flush timing. We also identified strong correlations between allelic variants and flavour-related chemistries. These findings deepen our understanding of the genetic basis of tea quality and provide valuable genomic resources to facilitate its genomics-assisted breeding.
Data availability
The raw data, genome assemblies and annotation have been submitted to the National Genomics Data Center (https://ngdc.cncb.ac.cn/) under accession number PRJCA013847 for the pangenome samples. All the assemblies, annotations, variant VCF files and graph-based genome are also available at the Tea Graph Pangenome Database (https://www.tea-pangenome.cn/).
Code availability
The PanMarker and SRI approaches are freely available on GitHub (https://github.com/chaiyuangungun/PanMarker; https://github.com/Yujiaxin419/SRI-pipeline).
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province (grant no. 2020B020220004), the Shenzhen Science and Technology Program (grant no. RCYX20210706092103024) and a National Natural Science Foundation of China grant (no. 32222019).
Author information
Contributions
X.Z. designed this project and coordinated the research activities. P.W., S.C., N.Y., H.W., K.F., Q.Z., M.G., C.M. and W.S. collected and provided the plant materials. N.Y., H.W. and K.F. participated in the genome sequencing and resequencing. X.Z., S.C., S.Z., J.Y. and Yibin Wang assembled the genomes. S.C., K.C., W.W., M.J., W.L., S.Q., F.W. and Y.G. performed the gene annotation. S.C., P.W., K.C., Yinghao Wang and W.K. analysed the RNA-seq data. S.C. constructed the sequence- and gene-based pangenome. K.C., S.C. and P.W. analysed the metabolomic data. S.C. and W.K. contributed to the population GWAS analysis. S.C., K.C. and S.Z. developed the PanMarker approach. X.C. constructed the database for the pangenome. X.Z., S.C., P.W. and W.K. interpreted the data and contributed to the manuscript writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Geographical distribution and sample selection based on phylogeny of 736 tea accessions.
a) Geographic locations of the 736 re-sequenced Camellia samples used to infer phylogenetic relationships. These samples were mainly collected from eight tea-cultivating countries, namely China, India, Korea, Japan, Laos, Sri Lanka, Georgia and Kenya. The world map was constructed using a Python script with the Natural Earth dataset (http://www.naturalearthdata.com). b) Population structure analysis of the 736 re-sequenced Camellia accessions. The highlighted clades on the phylogenetic tree are the individuals included in the pangenome. The admixture was estimated using standard error with 2000 bootstrap replicates. ‘SEKJ’ represents the tea cultivation areas in the southeastern provinces of China (including Zhejiang, Anhui, Henan, Fujian and Taiwan) and in countries such as South Korea and Japan. ‘CPC’ represents the tea cultivation areas in the central provinces of China.
Extended Data Fig. 2 Characteristics of tea genome assemblies.
a-c) The contig N50 values are influenced by different factors, including (a) sequencing coverage, (b) genome heterozygosity and (c) length of sequencing reads. The correlation was assessed by calculating the Pearson correlation coefficient. Each dot represents one assembled sample (n = 20; ‘HD’ and ‘SCZ’ were excluded from this statistical analysis because ‘HD’ was sequenced with 20× coverage HiFi reads and raw reads were not available for ‘SCZ’). The P-value was calculated from the two-sided t-test. d-f) Genome-wide analysis of chromatin interactions at 150-kbp resolution in the ZJ (d), LJ43 (e) and ZYQ (f) genomes. g) Syntenic plot among the 22 tea assemblies. The chromosomes are represented by colored boxes, and collinear regions among these genomes are shown as grey blocks.
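As a reading aid for panels a-c, the following is a minimal sketch of the conventional contig N50 definition: the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are hypothetical and are not taken from the study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = contig_n50(example_lengths)
```

Longer and more contiguous assemblies give a larger N50, which is why the caption relates it to coverage, heterozygosity and read length.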
Extended Data Fig. 3 Recent duplicated genes derived by TEs.
a-b) Whole genome duplication (WGD) events detected in the tea plant, shown by the Ks values between gene pairs. Species names are abbreviated as follows: Nn, Nelumbo nucifera; Vv, Vitis vinifera L.; Ach, Actinidia chinensis; Cl, C. lanceoleosa; Cs, C. sinensis. c) Recent long terminal repeat (LTR) burst detected in the tea plant genome. d) A schematic diagram of transposable genes carried by LTRs. e) Classification of TE-derived genes. f) Two examples of gene duplication originating from LTR transposition events.
Extended Data Fig. 4 Genome-wide sequence variation of the 22 tea genomes.
a) Assessment of the synteny relationship using the Synteny Relationship Index (SRI) among 7 plant pangenomes. These species include Camellia sinensis (Cs; TE ratio, 78.2%; n = 22), Solanum tuberosum L. (St; 63.0%; n = 44), Solanum lycopersicum (Sly; 60.7%; n = 31), Arabidopsis thaliana (At; 17.6%; n = 8), Sorghum bicolor L. (Sb; 61.0%; n = 13), Zea mays (Zm; 83.2%; n = 10) and Oryza sativa L. (Os; 52.9%; n = 33). Each box plot shows the distribution of the data, with the median value represented by the bold line at the center of the box. The box itself represents the first (25%) and third (75%) quartiles. The minimum and maximum values are illustrated by the lower and upper whiskers, respectively. b) The proportion of genetic variations between any two samples out of the 22 genomes, calculated based on pairwise genomic alignments. The white dot in the center of the violin plot represents the median value, and the bounds of each black box indicate the first (25%) and third (75%) quartiles. The lower and upper bounds of the whiskers are the minima and maxima, respectively. c) Distribution of SNPs/indels identified from the 736 re-sequenced samples and SVs identified from the 22 genomes along the 15 pseudo-chromosomes. The density of SNPs/indels is represented by the colored bands on these pseudo-chromosomes, while the red lines alongside the pseudo-chromosomes indicate the distribution of SVs. d-e) Pearson correlation coefficients comparing the LTR (d) or TIR (e) count per window (10 Mbp window size, 5 Mbp step) with the SV count per window. A total of 572 and 598 windows were plotted for d and e, respectively. The P-value was calculated by two-sided t-test.
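The window-based comparison in panels d-e can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, features are assigned to windows by their start coordinate, and trailing partial windows are kept, all of which are assumptions. The t statistic is the standard transformation of a Pearson coefficient, t = r·sqrt((n − 2)/(1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom to obtain a two-sided P-value.

```python
import math

WINDOW = 10_000_000  # 10 Mbp window size, as stated in the caption
STEP = 5_000_000     # 5 Mbp step, as stated in the caption


def window_counts(positions, chrom_length, window=WINDOW, step=STEP):
    """Count features whose start coordinate falls in each sliding window."""
    counts = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)  # last windows may be partial
        counts.append(sum(1 for p in positions if start <= p < end))
        start += step
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| >= 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical feature start positions on a 20 Mbp chromosome.
ltr_counts = window_counts([1_000_000, 6_000_000, 12_000_000], 20_000_000)
```

Because the step is half the window size, neighbouring windows overlap by 5 Mbp, so the per-window counts are not independent observations.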
Extended Data Fig. 5 Concentration of metabolites.
a) Concentration of chlorophyll (Chl) a, Chl b and Chl a + b in five tea plant cultivars (HJY, AJBC, FDDB, JMZ and ZJ). b) Concentration of carotenoids in the same five cultivars. c) Concentration of anthocyanins in the same five cultivars. The asterisk represents statistical significance (two-sided Student’s t-test). Data are presented as mean ± SEM, n = 3. Three independent experiments were carried out.
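The summary statistics named in this caption can be sketched as below. This is a minimal illustration under stated assumptions: the function names and the measurement values are hypothetical, and the pooled-variance (equal-variance) form of Student's t-test is assumed because the caption does not say which variant was used.

```python
import math
from statistics import mean, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    n = len(values)
    return mean(values), math.sqrt(variance(values) / n)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). A two-sided P-value would be read
    from a t distribution with that many degrees of freedom.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate measurements (n = 3) for two cultivars.
cultivar_a = [1.0, 2.0, 3.0]
cultivar_b = [4.0, 5.0, 6.0]
t_value, dof = student_t(cultivar_a, cultivar_b)
```

With n = 3 per group the test has only 4 degrees of freedom, so only fairly large differences between cultivars reach significance.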
Extended Data Fig. 6 Identification of SVs in six key genes in chlorophyll and carotenoid biosynthesis, namely CYP97A3, CAO, ChlD, ChlP, NOL and GluTR.
The rectangles illustrate the physical position of SVs. ‘Others’ represents the accessions ‘ZJ’, ‘FDDB’, and ‘JMZ’. These variations are shown in the sequence alignments.
Extended Data Fig. 7 Expression profiles of genes encoding enzymes involved in anthocyanin biosynthesis.
Enzyme abbreviations for each step are highlighted in bold. The black boxes contain the metabolites involved in anthocyanin production. The color gradient of the heatmap represents gene expression levels across different cultivars.
Extended Data Fig. 8 Read coverage along the promoter region of CsMYB114 gene.
The x-axis indicates the genomic coordinates, and the y-axis represents the sequencing depth.
Extended Data Fig. 9 3D protein modeling and haplotype analysis of AFS1.
a) Diagrams illustrating 3D protein modeling of the single amino acid mutation (G > A) in the CsAFS1 protein. The substrate binding center of the GG-AFS1 type is represented by orange spheres, while the substrate binding center of the AA-AFS1 type is depicted by red spheres. b) Distribution of allelic variants (AA, AG and GG) of the CsAFS1 gene in 587 re-sequenced tea accessions. c) Distribution of allelic variants (AG and GG) of the AFS1 gene in 17 wild tea relatives.
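The genotype distributions in panels b-c amount to a tally of AA, AG and GG calls. A minimal sketch of such a tally, together with the implied A-allele frequency for a diploid locus, is given below; the function name and the genotype list are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def genotype_summary(genotypes):
    """Tally diploid genotypes and return (counts, A-allele frequency)."""
    # Treat "GA" and "AG" as the same heterozygous call.
    normalised = ["".join(sorted(g.upper())) for g in genotypes]
    counts = Counter(normalised)
    n = counts["AA"] + counts["AG"] + counts["GG"]
    if n == 0:
        raise ValueError("no AA/AG/GG genotypes found")
    # Each AA carries two A alleles and each AG carries one.
    freq_a = (2 * counts["AA"] + counts["AG"]) / (2 * n)
    return counts, freq_a


# Hypothetical genotype calls for four accessions.
example_counts, example_freq_a = genotype_summary(["AA", "AG", "GG", "GG"])
```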
Supplementary information
Supplementary Information
Supplementary Figs. 1–18, Tables 1–15, Note and references.
Supplementary Data 1
Supplementary Tables 16–23.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, S., Wang, P., Kong, W. et al. Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis. Nat. Plants 9, 1986–1999 (2023). https://doi.org/10.1038/s41477-023-01565-z
DOI: https://doi.org/10.1038/s41477-023-01565-z