The draft genome of sweet orange (Citrus sinensis)

Journal name:
Nature Genetics

Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.
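The backcross model mentioned above can be compared with the SSR marker counts given in the Figure 3c caption. The sketch below is a rough illustration only, not the authors' analysis. It assumes that a (pummelo × mandarin) × mandarin backcross contributes on average one quarter pummelo and three quarters mandarin alleles, and that the parent-informative markers sample the genome evenly. The function names are hypothetical.

```python
from fractions import Fraction


def expected_backcross_fractions():
    """Expected average parental contributions for (P x M) x M.

    The F1 hybrid is half pummelo; backcrossing to mandarin halves
    that share again, so the expectation is 1/4 pummelo, 3/4 mandarin.
    """
    f1_pummelo = Fraction(1, 2)
    bc_pummelo = f1_pummelo * Fraction(1, 2)
    return {"pummelo": bc_pummelo, "mandarin": 1 - bc_pummelo}


def observed_fractions(pummelo_markers, mandarin_markers):
    """Share of parent-informative markers assigned to each parent."""
    informative = pummelo_markers + mandarin_markers
    return {
        "pummelo": pummelo_markers / informative,
        "mandarin": mandarin_markers / informative,
    }


# Counts from the Figure 3c caption: 55 pummelo-inherited and
# 147 mandarin-inherited SSR markers. The 105 shared markers do not
# distinguish the parents and are left out (an assumption of this sketch).
expected = expected_backcross_fractions()
observed = observed_fractions(55, 147)

for parent in ("pummelo", "mandarin"):
    print(f"{parent}: expected {float(expected[parent]):.2f}, "
          f"observed {observed[parent]:.2f}")
```

With these counts the observed split is roughly 27% pummelo to 73% mandarin, which is close to the 1:3 expectation under the stated assumptions.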



    Figure 1: Alignment of the genome sequence assembly with the genetic map of C. sinensis.

    Assembled scaffolds (blue; 239 Mb, or 75% of the assembled genome sequence) were anchored in the nine linkage groups (LG1–LG9, yellow) with the corresponding genetic markers (black bars). The pseudochromosome numbers are assigned on the basis of the estimated length of the genetic linkage groups (ref. 14).

    Figure 2: Genome characterization.

    (a) Circular diagram depicting the genomic landscape of the nine sweet orange pseudochromosomes (Chr1 to Chr9 on a Mb scale). Each track is identified on the right. RPKM, reads per kilobase of exon model per million mapped reads. (b) Experimental data support for gene model predictions. Left, box plot of RNA-Seq reads aligned over gene features. The top and bottom of the boxes indicate the upper and lower quartiles, respectively. The middle red bars represent the median, and the red dots denote the mean value for each feature. CDS, coding sequence. Right, Venn diagram of the predicted RNA splice junctions supported by different types of evidence (EST, protein and RNA-Seq data). Numbers in parentheses are the number of splice junctions in each category. (c) Demarcation of gene model boundaries by RNA-PET data. Histogram of RNA-PET data aligned to gene models, plotted relative to the 5′ (putative transcription start site, TSS) and 3′ (putative poly(A) site, PAS) boundaries.

    Figure 3: Heterozygosity and hybrid origin of sweet orange.

    (a) The heterozygosity rate of the sweet orange genome. Homozygous and heterozygous states of genic regions were detected using SNP data from the parental diploid genome; repeat regions were evaluated by comparison of SSR marker patterns between the diploid and dihaploid. The number in the top pie chart indicates the gene number, and the number in the bottom pie chart is the number of SSR markers. (b) Chromosome karyotyping of dihaploid (left) and parental diploid (right). The 18 chromosomes (chrs) in each nucleus were readily grouped into nine matched pairs in the dihaploid line, whereas the six chromosomes in the diploid nucleus have no clear counterpart, indicating high heterozygosity in the diploid state. The cytological types of chromosomes were traditionally named Bf, B, C, Df, D and F according to the 4′,6-diamidino-2-phenylindole (blue) and chromomycin A3 (yellow) banding regions, as well as the fragile site (circle), as previously described (ref. 41; Supplementary Fig. 3). (c) SSR marker genotyping of two pummelo cultivars (P1 and P2), sweet orange (SO) and two mandarin cultivars (M1 and M2). Of the 307 SSR markers, 105 were shared among pummelo, orange and mandarin (middle), 55 were inherited from pummelo (left) and 147 were inherited from mandarin (right). One marker showed an orange-specific pattern and was excluded from this analysis. (d) The genetic origin of the sweet orange dihaploid genome determined by high-density SNP markers of pummelo and mandarin. OP, origin pattern (pummelo origin in yellow, mandarin origin in orange and undetermined in gray); SC, assembled sequence scaffold; LG, linkage group map. Pseudochromosome 1 is shown here as an example, and the other chromosomes are shown in Supplementary Figure 11. (e) A model of the sweet orange origin. Pummelo (PP) as the female parent was crossed with mandarin (MM); the interspecific hybrid was then backcrossed with mandarin (MM), producing the ancient sweet orange.

    Figure 4: Evolutionary analysis of the sweet orange genome.

    (a) Schematic representation of major interchromosomal relationships within the 1,294 paralogous gene groups in the sweet orange genome. Syntenic blocks derived from the seven ancestral protochromosomes A1, A4, A7, A10, A13, A16 and A19 (ref. 28) are color coded as indicated. (b) Distribution of synonymous substitution rates (Ks) for homologous gene groups for intrachromosome and interchromosome comparisons. Gene duplication analysis in comparison to apple, Arabidopsis and cacao indicates that no recent WGDs occurred in the sweet orange genome. (c) Schematic representation of the syntenic relationship of sweet orange chromosome 9 and the corresponding chromosomes in Arabidopsis, Theobroma cacao, Malus × domestica, Fragaria vesca and Vitis vinifera. Syntenic regions derived from the seven ancestral protochromosomes are color coded as in a. Among the 59 syntenic blocks between sweet orange chromosome 9 and the Arabidopsis chromosomes, 22 (37%) show primary correspondence to four Arabidopsis blocks, 10 to three blocks, 24 to two blocks and 3 to one block; among the 79 syntenic blocks between sweet orange chromosome 9 and the apple chromosomes, 48 (61%) show primary correspondence to two apple blocks, 20 to three blocks, 6 to four blocks and 2 to one block. Therefore, on the basis of these percentages, one-to-four and one-to-two relationships were determined to be the most probable for the orange-to-Arabidopsis and orange-to-apple comparisons, respectively. (d) Citrus phylogeny on the basis of 103 single-copy genes shared among Arabidopsis, cacao, poplar, grape, apple, strawberry, papaya and castor bean from nuclear genome data. MYA, million years ago.

    Figure 5: Genes involved in vitamin C metabolism.

    (a) Heat map of the normalized RNA-Seq data for genes involved in AsA metabolism. Only genes above a minimum expression level (RNA-Seq RPKM > 1 (ref. 42)) are shown. (b) Phylogenetic analysis of the GalUR gene family in C. sinensis, Arabidopsis thaliana, Carica papaya and T. cacao (top); two recent expansions are marked with ovals (one cluster with GalUR-6, GalUR-11, GalUR-12 and GalUR-13 (orange oval) and the other cluster with GalUR-1, GalUR-2, GalUR-3, GalUR-4, GalUR-8, GalUR-10 and GalUR-14 (gray oval)). Note that GalUR-12 is a recent member of the family and is highly upregulated in orange fruits. The genomic organization of the 18 GalUR genes (black arrows) in the sweet orange genome is shown (bottom). Sc, scaffold.
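The expression threshold in Figure 5a relies on RPKM, defined in the Figure 2 caption as reads per kilobase of exon model per million mapped reads (ref. 42). A minimal sketch of that calculation and of an RPKM > 1 filter is given below. The gene names, read counts, exon lengths and library size are invented for illustration and are not data from the paper; the function names are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def expressed_genes(counts, lengths, total_mapped_reads, threshold=1.0):
    """Return {gene: RPKM} for genes whose RPKM exceeds the threshold."""
    values = {
        gene: rpkm(n, lengths[gene], total_mapped_reads)
        for gene, n in counts.items()
    }
    return {gene: v for gene, v in values.items() if v > threshold}


# Hypothetical toy library: 20 million mapped reads, three made-up genes.
counts = {"geneA": 400, "geneB": 10, "geneC": 0}
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 1500}
kept = expressed_genes(counts, lengths, total_mapped_reads=20_000_000)
print(kept)
```

In this toy input only geneA passes the filter; geneB falls below the threshold and geneC has no reads.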

Accession codes

Primary accessions


NCBI Reference Sequence


References

  1. Roose, M.L. & Close, T.J. Genomics of citrus, a major fruit crop of tropical and subtropical regions. in Genomics of Tropical Crop Plants (eds. Moore, P.H. & Ming, R.) 187–201 (Springer Press, 2008).
  2. Gmitter, F.G. et al. Citrus genomics. Tree Genet. Genomes 8, 611–626 (2012).
  3. Webber, H.J. History and development of the Citrus industry. in The Citrus Industry (eds. Reuther, W. et al.) Chap. 1, 1–39 (University of California Press, 1967).
  4. Scora, R.W. On the history and origin of Citrus. Bull. Torrey Bot. Club 102, 369–375 (1975).
  5. Gmitter, F. & Hu, X. The possible role of Yunnan province, China, in the origin of contemporary Citrus species (Rutaceae). Econ. Bot. 44, 267–277 (1990).
  6. Legge, J. The shoo king and The tribute of Yu. in The Chinese Classics, Vol. 3, Pt. 1 and Pt. 3, Bk. 1, Chap. 6, 111–112 (Trubner & Co., London, 1865).
  7. Moore, G.A. Oranges and lemons: clues to the taxonomy of Citrus from molecular markers. Trends Genet. 17, 536–540 (2001).
  8. Nicolosi, E. et al. Citrus phylogeny and genetic origin of important species as investigated by molecular markers. Theor. Appl. Genet. 100, 1155–1166 (2000).
  9. Arumuganathan, K. & Earle, E.D. Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9, 208–218 (1991).
  10. Cao, H. et al. Doubled haploid callus lines of Valencia sweet orange recovered from anther culture. Plant Cell Tissue Organ Cult. 104, 415–423 (2011).
  11. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
  12. Gao, S., Sung, W.K. & Nagarajan, N. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J. Comput. Biol. 18, 1681–1691 (2011).
  13. Baig, M.N., Yu, A., Guo, W. & Deng, X. Construction and characterization of two Citrus BAC libraries and identification of clones containing the phytoene synthase gene. Genome 52, 484–489 (2009).
  14. Lyon, M.P. A Genomic Genetic Map of the Common Sweet Orange and Poncirus trifoliata. PhD dissertation, University of California, Riverside (2008).
  15. Iwamasa, M. Reciprocal translocation in the Valencia orange. Chromosome Inform. Service 4, 9–10 (1963).
  16. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
  17. Goff, S.A. et al. A draft sequence of the rice genome (Oryza sativa L. ssp japonica). Science 296, 92–100 (2002).
  18. Du, J. et al. Evolutionary conservation, diversity and specificity of LTR-retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison. Plant J. 63, 584–598 (2010).
  19. Gao, L., McCarthy, E.M., Ganko, E.W. & McDonald, J.F. Evolutionary history of Oryza sativa LTR retrotransposons: a preliminary survey of the rice genome sequences. BMC Genomics 5, 18 (2004).
  20. Velasco, R. et al. The genome of the domesticated apple (Malus x domestica Borkh.). Nat. Genet. 42, 833–839 (2010).
  21. Ming, R. et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452, 991–996 (2008).
  22. Dimmer, E.C. et al. The UniProt-GO Annotation database in 2011. Nucleic Acids Res. 40, D565–D570 (2012).
  23. Xu, Q. et al. Discovery and comparative profiling of microRNAs in a sweet orange red-flesh mutant and its wild type. BMC Genomics 11, 246 (2010).
  24. Jiao, Y. et al. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100 (2011).
  25. Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
  26. Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109–116 (2011).
  27. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007).
  28. Abrouk, M. et al. Palaeogenomics of plants: synteny-based modelling of extinct ancestors. Trends Plant Sci. 15, 479–487 (2010).
  29. Tuskan, G.A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
  30. Chan, A.P. et al. Draft genome sequence of the oilseed species Ricinus communis. Nat. Biotechnol. 28, 951–956 (2010).
  31. The Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).
  32. Vrebalov, J. et al. A MADS-box gene necessary for fruit ripening at the tomato ripening-inhibitor (rin) locus. Science 296, 343–346 (2002).
  33. Seymour, G.B. et al. A SEPALLATA gene is involved in the development and ripening of strawberry (Fragaria × ananassa Duch.) fruit, a non-climacteric tissue. J. Exp. Bot. 62, 1179–1188 (2011).
  34. Cruz-Rus, E., Botella, M.A., Valpuesta, V. & Gomez-Jimenez, M.C. Analysis of genes involved in l-ascorbic acid biosynthesis during growth and ripening of grape berries. J. Plant Physiol. 167, 739–748 (2010).
  35. Bulley, S.M. et al. Gene expression studies in kiwifruit and gene over-expression in Arabidopsis indicates that GDP-l-galactose guanyltransferase is a major control point of vitamin C biosynthesis. J. Exp. Bot. 60, 765–778 (2009).
  36. Agius, F. et al. Engineering increased vitamin C levels in plants by overexpression of a D-galacturonic acid reductase. Nat. Biotechnol. 21, 177–181 (2003).
  37. Lippman, Z.B. & Zamir, D. Heterosis: revisiting the magic. Trends Genet. 23, 60–66 (2007).
  38. Chinese Society of Citriculture. Citrus Industry in China. (China Agriculture Press, Beijing, China, 2008).
  39. Barrett, H.C. & Rhodes, A.M. A numerical taxonomic study of affinity relationships in cultivated Citrus and its close relatives. Syst. Bot. 1, 105–136 (1976).
  40. Mendel, K. Bud mutations in citrus and their potential commercial value. Proc. Int. Soc. Citricult. 1, 86–89 (1981).
  41. Guerra, M. Cytogenetics of Rutaceae. V. High chromosomal variability in Citrus species revealed by CMA/DAPI staining. Heredity 71, 234–241 (1993).
  42. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  43. Kelley, D.R., Schatz, M.C. & Salzberg, S.L. Quake: quality-aware detection and correction of sequencing errors. Genome Biol. 11, R116 (2010).
  44. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  45. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
  46. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
  47. Haas, B.J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
  48. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
  49. de Hoon, M.J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software. Bioinformatics 20, 1453–1454 (2004).
  50. Ruan, X. & Ruan, Y. Genome wide full-length transcript analysis using 5′ and 3′ paired-end-tag next generation sequencing (RNA-PET). Methods Mol. Biol. 809, 535–562 (2012).
  51. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
  52. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  53. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  54. Chen, C. et al. EST-SSR genetic maps for Citrus sinensis and Poncirus trifoliata. Tree Genet. Genomes 4, 1–10 (2008).
  55. Lai, J. et al. Genome-wide patterns of genetic variation among elite maize inbred lines. Nat. Genet. 42, 1027–1030 (2010).
  56. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
  57. Stamatakis, A., Ludwig, T. & Meier, H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463 (2005).

Author information

  1. These authors contributed equally to this work.

    • Qiang Xu,
    • Ling-Ling Chen &
    • Xiaoan Ruan


  1. Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, China.

    • Qiang Xu,
    • Andan Zhu,
    • Chunli Chen,
    • Jiongjiong Chen,
    • Hong Lan,
    • Qun Hu,
    • Lun Wang,
    • Shixin Xiao,
    • Manosh Kumar Biswas,
    • Wenfang Zeng,
    • Fei Guo,
    • Hongbo Cao,
    • Xiaoming Yang,
    • Yun-Jiang Cheng,
    • Juan Xu,
    • Ji-Hong Liu,
    • Wen-Wu Guo,
    • Hanhui Kuang &
    • Xiu-Xin Deng
  2. College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China.

    • Ling-Ling Chen,
    • Dijun Chen,
    • Chunli Chen,
    • Wen-Biao Jiao,
    • Bao-Hai Hao,
    • Feng Xing,
    • Ji-Wei Chang,
    • Yang Lei,
    • Yin Miao,
    • Xi-Wen Xu,
    • Zhonghui Tang,
    • Hong-Yu Zhang &
    • Yijun Ruan
  3. Genome Institute of Singapore, Singapore.

    • Xiaoan Ruan,
    • Denis Bertrand,
    • Song Gao,
    • Oscar Junhong Luo,
    • Niranjan Nagarajan &
    • Yijun Ruan
  4. Botany and Plant Sciences, University of California, Riverside, California, USA.

    • Matthew P Lyon &
    • Mikeal L Roose
  5. College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China.

    • Xianhong Ge


X.-X.D. and Y.R. conceived the project and the strategy. Q.X. coordinated the overall project. X.R. directed sequencing data generation. L.-L.C. led the genome annotation analysis. N.N., D.B. and S.G. assembled the draft genome. D.C., W.-B.J., B.-H.H., J.-W.C., Z.T., F.X., O.J.L., Y.L., X.-W.X. and H.-Y.Z. performed gene annotation, SNP analysis, transcriptome analysis and database management. A.Z., Q.X., H.K., J.C., L.W., Q.H., M.K.B., W.Z., J.-H.L., F.G., H.C., Y.-J.C., J.X., X.Y. and W.-W.G. performed the repetitive element, genome anchoring, gene synteny and evolutionary analyses. C.C., H.L., X.G., Y.M. and S.X. performed cytological analyses. M.P.L. and M.L.R. developed the sweet orange linkage map. Q.X. and Y.R. wrote the manuscript with contributions from X.R., L.-L.C., D.C., A.Z., N.N., D.B., C.C., W.-B.J., F.X., H.K., M.L.R. and X.-X.D.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (3M)

    Supplementary Note, Supplementary Tables 1–8, 11–15, 17 and 18 and Supplementary Figures 1–20

Excel files

  1. Supplementary Tables (877K)

    Supplementary Tables 9, 10, 16 and 19–24
