Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis

Chen, Shuai; Wang, Pengjie; Kong, Weilong; Chai, Kun; Zhang, Shengcheng; Yu, Jiaxin; Wang, Yibin; Jiang, Mengwei; Lei, Wenlong; Chen, Xiao; Wang, Wenling; Gao, Yingying; Qu, Shenyang; Wang, Fang; Wang, Yinghao; Zhang, Qing; Gu, Mengya; Fang, Kaixing; Ma, Chunlei; Sun, Weijiang; Ye, Naixing; Wu, Hualing; Zhang, Xingtan

doi:10.1038/s41477-023-01565-z

Article
Published: 27 November 2023

Nature Plants volume 9, pages 1986–1999 (2023)

Abstract

Tea is one of the world’s oldest crops and is cultivated to produce beverages with various flavours. Despite advances in sequencing technologies, the genetic mechanisms underlying key agronomic traits of tea remain unclear. In this study, we present a high-quality pangenome of 22 elite cultivars, representing broad genetic diversity in the species. Our analysis reveals that a recent long terminal repeat burst contributed nearly 20% of gene copies, introducing functional genetic variants that affect phenotypes such as leaf colour. Our graphical pangenome improves the efficiency of genome-wide association studies and allows the identification of key genes controlling bud flush timing. We also identified strong correlations between allelic variants and flavour-related chemistries. These findings deepen our understanding of the genetic basis of tea quality and provide valuable genomic resources to facilitate its genomics-assisted breeding.

**Fig. 1: A total of 22 representative cultivars of tea plant were selected for the pangenome analysis.**

**Fig. 2: Detection of SVs and construction of the pangenome based on the 22 de novo assembled tea genomes.**

**Fig. 3: Functional impact of SVs associated with the regulation of tea leaf colour.**

**Fig. 4: Application of graphical pangenome-based GWAS in mining of key genes associated with the TBF trait in tea.**

**Fig. 5: The pangenome empowers genomics-assisted breeding.**

**Fig. 6: Pangenomic basis of the effect of CsAFS1 expression on α-farnesene content.**

Data availability

The raw data, genome assemblies and annotation have been submitted to the National Genomics Data Center (https://ngdc.cncb.ac.cn/) under accession number PRJCA013847 for the pangenome samples. All the assemblies, annotations, variant VCF files and graph-based genome are also available at the Tea Graph Pangenome Database (https://www.tea-pangenome.cn/).

Code availability

The PanMarker and SRI approaches are freely available on GitHub (https://github.com/chaiyuangungun/PanMarker; https://github.com/Yujiaxin419/SRI-pipeline).

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (grant no. 2020B020220004), the Shenzhen Science and Technology Program (grant no. RCYX20210706092103024) and a National Natural Science Foundation of China grant (no. 32222019).

Author information

These authors contributed equally: Shuai Chen, Pengjie Wang, Weilong Kong, Kun Chai.

Authors and Affiliations

National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
Shuai Chen, Pengjie Wang, Weilong Kong, Shengcheng Zhang, Jiaxin Yu, Yibin Wang, Xiao Chen, Yingying Gao, Shenyang Qu, Fang Wang, Qing Zhang & Xingtan Zhang
Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
Kun Chai, Mengwei Jiang, Wenlong Lei, Wenling Wang & Yinghao Wang
College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou, China
Mengya Gu, Weijiang Sun & Naixing Ye
Tea Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Provincial Key Laboratory of Tea Plant Resources Innovation and Utilization, Guangzhou, China
Kaixing Fang & Hualing Wu
Key Laboratory of Biology, Genetics and Breeding of Special Economic Animals and Plants, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, Hangzhou, China
Chunlei Ma

Contributions

X.Z. designed this project and coordinated the research activities. P.W., S.C., N.Y., H.W., K.F., Q.Z., M.G., C.M. and W.S. collected and provided the plant materials. N.Y., H.W. and K.F. participated in the genome sequencing and resequencing. X.Z., S.C., S.Z., J.Y. and Yibin Wang assembled the genomes. S.C., K.C., W.W., M.J., W.L., S.Q., F.W. and Y.G. performed the gene annotation. S.C., P.W., K.C., Yinghao Wang and W.K. analysed the RNA-seq data. S.C. constructed the sequence- and gene-based pangenome. K.C., S.C. and P.W. analysed the metabolomic data. S.C. and W.K. contributed to the population GWAS analysis. S.C., K.C. and S.Z. developed the PanMarker approach. X.C. constructed the database for the pangenome. X.Z., S.C., P.W. and W.K. interpreted the data and contributed to the manuscript writing.

Corresponding authors

Correspondence to Naixing Ye, Hualing Wu or Xingtan Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Geographical distribution and sample selection based on phylogeny of 736 tea accessions.

a) Geographic locations of the 736 re-sequenced Camellia samples used to infer phylogenetic relationships. These samples were mainly collected from eight tea-cultivating countries, namely China, India, Korea, Japan, Laos, Sri Lanka, Georgia and Kenya. The world map was constructed using a Python script with the Natural Earth dataset (http://www.naturalearthdata.com). b) Population structure analysis of the 736 re-sequenced Camellia accessions. The highlighted clades on the phylogenetic tree are the individuals included in the pangenome. The admixture was estimated using standard error with 2000 bootstrap replicates. ‘SEKJ’ represents the tea cultivation areas in the southeastern provinces of China, including Zhejiang, Anhui, Henan, Fujian and Taiwan, together with South Korea and Japan, whereas ‘CPC’ represents the tea cultivation areas in the central provinces of China.

Extended Data Fig. 2 Characteristics of tea genome assemblies.

a-c) Factors influencing contig N50 values: (a) sequencing coverage, (b) genome heterozygosity and (c) length of sequencing reads. Correlations were assessed using the Pearson correlation coefficient. Each dot represents one assembled sample (n = 20; ‘HD’, owing to its 20× coverage HiFi reads, and ‘SCZ’, owing to the lack of raw reads, were not included in this statistical analysis). The P-value was calculated using a two-sided t-test. d-f) Genome-wide analysis of chromatin interactions at 150-kbp resolution in the ZJ (d), LJ43 (e) and ZYQ (f) genomes. g) Syntenic plot among the 22 tea assemblies. The chromosomes are represented by colored boxes, and collinear regions among these genomes are shown as grey blocks.
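The correlation test described in panels a-c can be sketched as follows. This is a minimal illustration, not the authors' script: the function names and the sample values are hypothetical. It computes the Pearson coefficient and the t statistic on n - 2 degrees of freedom. Converting that statistic into a two-sided P-value additionally requires the Student's t distribution, which is left to a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (df = n - 2)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values for illustration only, NOT data from the paper.
coverage = [30.0, 45.0, 60.0, 80.0, 100.0]
contig_n50_mb = [0.4, 0.7, 0.9, 1.6, 2.1]

r = pearson_r(coverage, contig_n50_mb)
t = correlation_t_statistic(r, len(coverage))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(coverage) - 2}")
```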

Extended Data Fig. 3 Recent duplicated genes derived by TEs.

a-b) Whole-genome duplication (WGD) events detected in the tea plant, shown by the Ks values between gene pairs. Species names are abbreviated as follows: Nn, Nelumbo nucifera; Vv, Vitis vinifera L.; Ach, Actinidia chinensis; Cl, C. lanceoleosa; Cs, C. sinensis. c) Recent LTR burst detected in the tea plant genome. d) A schematic diagram of transposable genes carried by long terminal repeats (LTRs). e) Classification of TE-derived genes. f) Two examples of gene duplication originating from LTR transposition events.

Extended Data Fig. 4 Genome-wide sequence variation of the 22 tea genomes.

a) Assessment of synteny using the Synteny Relationship Index (SRI) among seven plant pangenomes. These species include Camellia sinensis (Cs, TE ratio, 78.2%, n = 22), Solanum tuberosum L. (St, 63.0%, n = 44), Solanum lycopersicum (Sly, 60.7%, n = 31), Arabidopsis thaliana (At, 17.6%, n = 8), Sorghum bicolor L. (Sb, 61.0%, n = 13), Zea mays (Zm, 83.2%, n = 10) and Oryza sativa L. (Os, 52.9%, n = 33). Each box plot shows the distribution of the data, with the median represented by the bold line at the center of the box. The box itself spans the first (25%) and third (75%) quartiles. The minimum and maximum values are shown by the lower and upper whiskers, respectively. b) The proportion of genetic variation between any two of the 22 genomes, calculated from pairwise genomic alignments. The white dot in the center of each violin plot represents the median, and the bounds of each black box indicate the first (25%) and third (75%) quartiles. The lower and upper bounds of the whiskers are the minima and maxima, respectively. c) Distribution of SNPs/indels identified from the 736 re-sequenced samples and SVs identified from the 22 genomes along the 15 pseudo-chromosomes. The density of SNPs/indels is represented by the colored bands on the pseudo-chromosomes, while the red lines alongside the pseudo-chromosomes indicate the distribution of SVs. d-e) Pearson correlation coefficients comparing LTR or TIR count per window (10-Mbp window size, 5-Mbp step) with SV count per window. A total of 572 and 598 windows were plotted for d and e, respectively. The P-value was calculated using a two-sided t-test.
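The per-window counting behind panels d-e (10-Mbp windows advanced in 5-Mbp steps) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the pipeline used in the paper: the function name is hypothetical, coordinates are assumed to be 0-based, windows are half-open, and the final windows are truncated at the chromosome end.

```python
from bisect import bisect_left


def window_counts(positions, chrom_length, window=10_000_000, step=5_000_000):
    """Count features falling in each half-open window [start, end).

    Assumptions (not from the paper): 0-based positions, windows starting
    at 0 and advancing by `step`, trailing windows clipped to chrom_length.
    """
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Number of sorted positions p with start <= p < end.
        n = bisect_left(pos, end) - bisect_left(pos, start)
        counts.append((start, end, n))
    return counts


# Hypothetical SV positions on a 20-Mbp pseudo-chromosome.
sv_positions = [1, 4_999_999, 5_000_000, 12_000_000]
for start, end, n in window_counts(sv_positions, 20_000_000):
    print(start, end, n)
```

Running the same function on LTR (or TIR) positions gives a second count per window, and the two count vectors can then be compared with a Pearson correlation.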

Extended Data Fig. 5 Concentration of metabolites.

a) Concentration of chlorophyll (Chl) a, Chl b and Chl a + b in five tea plant cultivars (HJY, AJBC, FDDB, JMZ and ZJ). b) Concentration of carotenoids in the same five cultivars. c) Concentration of anthocyanins in the same five cultivars. The asterisk represents statistical significance (two-sided Student’s t-test). Data are presented as mean ± SEM, n = 3. Three independent experiments were carried out.
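The summary statistics named in this caption (mean ± SEM with n = 3, compared by Student's t-test) can be sketched as follows. The function names and measurements are hypothetical, and the pooled-variance form of the t statistic is an assumption, since the caption does not state which variant was used. Obtaining the two-sided P-value from the statistic requires the t distribution with n_a + n_b - 2 degrees of freedom, which is not computed here.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample of size >= 2."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two observations")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("t statistic is undefined when both samples are constant")
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical triplicate measurements, NOT values from the paper.
cultivar_a = [1.10, 1.25, 1.18]
cultivar_b = [0.42, 0.39, 0.47]

for name, sample in (("A", cultivar_a), ("B", cultivar_b)):
    m, sem = mean_sem(sample)
    print(f"cultivar {name}: {m:.3f} ± {sem:.3f}")
print(f"t = {student_t(cultivar_a, cultivar_b):.2f}, df = 4")
```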

Extended Data Fig. 6 Identification of SVs in six key genes in chlorophyll and carotenoid biosynthesis, namely CYP97A3, CAO, ChlD, ChlP, NOL and GluTR.

The rectangles illustrate the physical position of SVs. ‘Others’ represents the accessions ‘ZJ’, ‘FDDB’, and ‘JMZ’. These variations are shown in the sequence alignments.

Extended Data Fig. 7 Expression profiles of genes encoding enzymes involved in anthocyanin biosynthesis.

Enzyme abbreviations in each step are highlighted in bold. The black boxes contain the metabolites responsible for anthocyanin production. The color gradient of the heatmap represents gene expression levels across different cultivars.

Extended Data Fig. 8 Read coverage along the promoter region of CsMYB114 gene.

The x-axis indicates the genomic coordinates, and y-axis represents the sequencing depth.

Extended Data Fig. 9 3D protein modeling and haplotype analysis of AFS1.

a) Diagrams illustrating 3D protein modeling of the single amino acid mutation (G→A) in the CsAFS1 protein. The substrate-binding center of the GG-AFS1 type is represented by orange spheres, while the substrate-binding center of the AA-AFS1 type is depicted by red spheres. b) Distribution of allelic variants (AA, AG and GG) of the CsAFS1 gene in 587 re-sequenced tea accessions. c) Distribution of allelic variants (AG and GG) of the AFS1 gene in 17 wild tea relatives.
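The genotype distributions summarised in panels b and c amount to tallying AA, AG and GG calls across accessions. A minimal sketch follows, assuming a diploid biallelic site and a plain list of genotype strings. The function name and the input data are hypothetical and do not reproduce the paper's counts.

```python
from collections import Counter


def genotype_summary(genotypes):
    """Return (genotype counts, allele frequencies) for diploid calls.

    Each call is a two-letter string such as "AG"; allele order is
    ignored, so "GA" is counted as "AG".
    """
    counts = Counter("".join(sorted(g.upper())) for g in genotypes)
    allele_counts = Counter()
    for genotype, count in counts.items():
        for allele in genotype:
            allele_counts[allele] += count
    total_alleles = 2 * sum(counts.values())
    freqs = {allele: c / total_alleles for allele, c in allele_counts.items()}
    return dict(counts), freqs


# Hypothetical calls for illustration only.
calls = ["AA", "AG", "GA", "GG", "GG", "AG"]
counts, freqs = genotype_summary(calls)
print(counts)
print({allele: round(f, 3) for allele, f in sorted(freqs.items())})
```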

Supplementary information

Supplementary Information

Supplementary Figs. 1–18, Tables 1–15, Note and references.

Reporting Summary

Supplementary Data 1

Supplementary Tables 16–23.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, S., Wang, P., Kong, W. et al. Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis. Nat. Plants 9, 1986–1999 (2023). https://doi.org/10.1038/s41477-023-01565-z

Received: 24 April 2023
Accepted: 20 October 2023
Published: 27 November 2023
Issue Date: December 2023
DOI: https://doi.org/10.1038/s41477-023-01565-z

This article is cited by

Technology-enabled great leap in deciphering plant genomes
- Lingjuan Xie
- Xiaojiao Gong
- Longjiang Fan
Nature Plants (2024)
Teatime for pangenomics
- David Edwards
- Jacqueline Batley
Nature Plants (2023)