Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis

Abstract

Tea is one of the world’s oldest crops and is cultivated to produce beverages with various flavours. Despite advances in sequencing technologies, the genetic mechanisms underlying key agronomic traits of tea remain unclear. In this study, we present a high-quality pangenome of 22 elite cultivars, representing broad genetic diversity in the species. Our analysis reveals that a recent long terminal repeat burst contributed nearly 20% of gene copies, introducing functional genetic variants that affect phenotypes such as leaf colour. Our graphical pangenome improves the efficiency of genome-wide association studies and allows the identification of key genes controlling bud flush timing. We also identified strong correlations between allelic variants and flavour-related chemistries. These findings deepen our understanding of the genetic basis of tea quality and provide valuable genomic resources to facilitate its genomics-assisted breeding.

Fig. 1: A total of 22 representative cultivars of tea plant were selected for the pangenome analysis.
Fig. 2: Detection of SVs and construction of the pangenome based on the 22 de novo assembled tea genomes.
Fig. 3: Functional impact of SVs associated with the regulation of tea leaf colour.
Fig. 4: Application of graphical pangenome-based GWAS in mining of key genes associated with the TBF trait in tea.
Fig. 5: The pangenome empowers genomics-assisted breeding.
Fig. 6: Pangenomic basis of the effect of CsAFS1 expression on α-farnesene content.

Data availability

The raw data, genome assemblies and annotation have been submitted to the National Genomics Data Center (https://ngdc.cncb.ac.cn/) under accession number PRJCA013847 for the pangenome samples. All the assemblies, annotations, variant VCF files and graph-based genome are also available at the Tea Graph Pangenome Database (https://www.tea-pangenome.cn/).

Code availability

The PanMarker and SRI approaches are freely available at GitHub (https://github.com/chaiyuangungun/PanMarker; https://github.com/Yujiaxin419/SRI-pipeline).

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (grant no. 2020B020220004), the Shenzhen Science and Technology Program (grant no. RCYX20210706092103024) and a National Natural Science Foundation of China grant (no. 32222019).

Author information

Contributions

X.Z. designed this project and coordinated the research activities. P.W., S.C., N.Y., H.W., K.F., Q.Z., M.G., C.M. and W.S. collected and provided the plant materials. N.Y., H.W. and K.F. participated in the genome sequencing and resequencing. X.Z., S.C., S.Z., J.Y. and Yibin Wang assembled the genomes. S.C., K.C., W.W., M.J., W.L., S.Q., F.W. and Y.G. performed the gene annotation. S.C., P.W., K.C., Yinghao Wang and W.K. analysed the RNA-seq data. S.C. constructed the sequence- and gene-based pangenome. K.C., S.C. and P.W. analysed the metabolomic data. S.C. and W.K. contributed to the population GWAS analysis. S.C., K.C. and S.Z. developed the PanMarker approach. X.C. constructed the database for the pangenome. X.Z., S.C., P.W. and W.K. interpreted the data and contributed to the manuscript writing.

Corresponding authors

Correspondence to Naixing Ye, Hualing Wu or Xingtan Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Geographical distribution and sample selection based on phylogeny of 736 tea accessions.

a) Geographic locations of the 736 re-sequenced Camellia samples used to infer phylogenetic relationships. These samples were mainly collected from eight tea-cultivating countries, namely China, India, Korea, Japan, Laos, Sri Lanka, Georgia and Kenya. The world map was constructed using a Python script with the Natural Earth dataset (http://www.naturalearthdata.com). b) Population structure analysis of the 736 re-sequenced Camellia accessions. The highlighted clades on the phylogenetic tree are the individuals included in the pangenome. The admixture was estimated using standard error with 2,000 bootstrap replicates. ‘SEKJ’ represents the tea cultivation areas distributed in the southeastern provinces of China, including Zhejiang, Anhui, Henan, Fujian and Taiwan, and countries such as South Korea and Japan. ‘CPC’ represents the tea cultivation areas in the central provinces of China.

Extended Data Fig. 2 Characteristics of tea genome assemblies.

a-c) The contig N50 values are influenced by different factors, including (a) sequencing coverage, (b) genome heterozygosity and (c) length of sequencing reads. The correlation was assessed by calculating the Pearson correlation coefficient. Each dot represents one assembled sample (n = 20; ‘HD’, which has 20× coverage HiFi reads, and ‘SCZ’, for which raw reads are lacking, were not included in this statistical analysis). The P-value was calculated from the two-sided t-test. d-f) Genome-wide analysis of chromatin interactions at 150-kbp resolution in the ZJ (d), LJ43 (e) and ZYQ (f) genomes. g) Syntenic plot among the 22 tea assemblies. The chromosomes are represented by coloured boxes, and collinear regions among these genomes are shown as grey blocks.
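For readers unfamiliar with the metric plotted in panels a-c, contig N50 is the contig length at which contigs of that length or longer together cover at least half of the assembly. The sketch below is only an illustration of that definition; the function name and the example lengths are hypothetical and are not data or code from this study.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the running total of
    contigs, sorted from longest to shortest, first reaches half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (for example in kbp), not taken from the paper.
example_lengths = [100, 80, 50, 30, 20, 10, 10]
example_n50 = contig_n50(example_lengths)
```

Because N50 depends only on the sorted length distribution, a few very long contigs can raise it sharply, which is one reason it is usually read alongside coverage and read-length information as in these panels.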

Extended Data Fig. 3 Recent duplicated genes derived by TEs.

a-b) Whole-genome duplication (WGD) events detected in the tea plant, shown by the Ks values between gene pairs. Species names are abbreviated as follows: Nn, Nelumbo nucifera; Vv, Vitis vinifera L.; Ach, Actinidia chinensis; Cl, C. lanceoleosa; Cs, C. sinensis. c) Recent LTR burst detected in the tea plant genome. d) A schematic diagram of transposable genes carried by LTRs (long terminal repeats). e) Classification of TE-derived genes. f) Two examples of gene duplication originating from LTR transposition events.

Extended Data Fig. 4 Genome-wide sequence variation of the 22 tea genomes.

a) Assessment of the synteny relationship using the Synteny Relationship Index (SRI) among seven plant pan-genomes. These species include Camellia sinensis (Cs, TE ratio, 78.2%, n = 22), Solanum tuberosum L. (St, 63.0%, n = 44), Solanum lycopersicum (Sly, 60.7%, n = 31), Arabidopsis thaliana (At, 17.6%, n = 8), Sorghum bicolor L. (Sb, 61.0%, n = 13), Zea mays (Zm, 83.2%, n = 10) and Oryza sativa L. (Os, 52.9%, n = 33). Each box plot shows the distribution of data, with the median value represented by the bold line at the centre of the box. The box itself represents the first (25%) and third (75%) quartiles. The minimum and maximum values are illustrated by the lower and upper whiskers, respectively. b) The proportion of genetic variations between any two samples out of the 22 genomes, calculated based on pairwise genomic alignments. The white dot in the centre of the violin plot represents the median value, and the bounds of each black box indicate the first (25%) and third (75%) quartiles. The lower and upper bounds of the whiskers are the minima and maxima, respectively. c) Distribution of SNPs/indels identified from the 736 re-sequenced samples and SVs identified from the 22 genomes along 15 pseudo-chromosomes. The density of SNPs/indels is represented by the coloured bands on these pseudo-chromosomes, while the red lines alongside the pseudo-chromosomes indicate the distribution of SVs. d-e) Pearson correlation coefficients comparing LTR (d) or TIR (e) count per window (10-Mbp window size, 5-Mbp step) with SV count per window. A total of 572 and 598 windows were plotted for d and e, respectively. The P-value was calculated by two-sided t-test.
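The window-based comparison in panels d-e can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, windows are taken as half-open intervals starting every step, and partial windows at the chromosome end are kept. The t statistic uses the standard relation t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom; converting it to a two-sided P-value needs a Student's t distribution, which is left out here.

```python
import math


def window_counts(positions, chrom_length, window=10_000_000, step=5_000_000):
    """Count features falling in overlapping windows [start, start + window)."""
    counts = []
    start = 0
    while start < chrom_length:
        end = start + window
        counts.append(sum(1 for p in positions if start <= p < end))
        start += step
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0 with n paired observations (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy coordinates on a 20-unit "chromosome", purely illustrative.
toy_counts = window_counts([1, 6, 12], 20, window=10, step=5)
```

In practice one would call `window_counts` once for the TE coordinates and once for the SV coordinates of each pseudo-chromosome, then pass the two count lists to `pearson_r`.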

Extended Data Fig. 5 Concentration of metabolites.

a) Concentration of chlorophyll (Chl) a, Chl b and Chl a + b in five tea plant cultivars (HJY, AJBC, FDDB, JMZ and ZJ). b) Concentration of carotenoids in five tea plant cultivars (HJY, AJBC, FDDB, JMZ and ZJ). c) Concentration of anthocyanins in five tea plant cultivars (HJY, AJBC, FDDB, JMZ and ZJ). The asterisk represents statistical significance (two-sided Student’s t-test). Data are presented as mean ± SEM, n = 3. Three independent experiments were carried out.
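As a reminder of what "mean ± SEM, n = 3" summarises, the sketch below computes the mean and the standard error of the mean (sample standard deviation divided by the square root of n) for one group of replicates. The function name and the replicate values are hypothetical and are not measurements from this study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate SEM")
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Three made-up replicate concentrations for a single cultivar.
replicates = [1.0, 2.0, 3.0]
example_mean, example_sem = mean_and_sem(replicates)
```

With only three replicates the SEM is itself a rough estimate, so it is best read together with the significance test reported in the caption.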

Extended Data Fig. 6 Identification of SVs in six key genes in chlorophyll and carotenoid biosynthesis, namely CYP97A3, CAO, ChID, ChIP, NOL and GluTR.

The rectangles illustrate the physical position of SVs. ‘Others’ represents the accessions ‘ZJ’, ‘FDDB’, and ‘JMZ’. These variations are shown in the sequence alignments.

Extended Data Fig. 7 Expression profiles of genes encoding enzymes involved in anthocyanin biosynthesis.

Enzyme abbreviations at each step are highlighted in bold. The black boxes contain the metabolites responsible for anthocyanin production. The colour gradient of the heatmap represents gene expression levels across different cultivars.

Extended Data Fig. 8 Read coverage along the promoter region of CsMYB114 gene.

The x-axis indicates the genomic coordinates, and the y-axis represents the sequencing depth.

Extended Data Fig. 9 3D protein modeling and haplotype analysis of AFS1.

a) Diagrams illustrating 3D protein modelling of the single amino acid mutation (G > A) in the CsAFS1 protein. The substrate binding centre of the GG-AFS1 type is represented by orange spheres, while the substrate binding centre of the AA-AFS1 type is depicted by red spheres. b) Distribution of allelic variants (AA, AG and GG) of the CsAFS1 gene in 587 re-sequenced tea accessions. c) Distribution of allelic variants (AG and GG) of the AFS1 gene in 17 wild tea relatives.

Supplementary information

Supplementary Information

Supplementary Figs. 1–18, Tables 1–15, Note and references.

Reporting Summary

Supplementary Data 1

Supplementary Tables 16–23.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, S., Wang, P., Kong, W. et al. Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis. Nat. Plants 9, 1986–1999 (2023). https://doi.org/10.1038/s41477-023-01565-z

  • DOI: https://doi.org/10.1038/s41477-023-01565-z
