De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera)

Journal name: Nature Biotechnology


Date palm is one of the most economically important woody crops cultivated in the Middle East and North Africa and is a good candidate for improving agricultural yields in arid environments. Nonetheless, long generation times (5–8 years) and dioecy (separate male and female trees) have complicated its cultivation and genetic analysis. To address these issues, we assembled a draft genome for a Khalas variety female date palm, the first publicly available resource of its type for a member of the order Arecales. The ~380 Mb sequence, spanning mainly gene-rich regions, includes >25,000 gene models and is predicted to cover ~90% of genes and ~60% of the genome. Sequencing of eight other cultivars, including females of the Deglet Noor and Medjool varieties and their backcrossed males, identified >3.5 million polymorphic sites, including >10,000 genic copy number variations. A small subset of these polymorphisms can distinguish multiple varieties. We identified a region of the genome linked to gender and found evidence that date palm employs an XY system of gender inheritance.

At a glance


  1. Figure 1: Taxonomic tree of selected crops for which genome sequences are available.

    Date palm is the first member of the order Arecales and the family Arecaceae for which a draft genome sequence is available. Other monocotyledonous plants (class Liliopsida) for which genome sequences are available are mainly grasses (order Poales). The tree was constructed in the Interactive Tree Of Life from taxonomy numbers in NCBI.

  2. Figure 2: Date palm SNP analysis.

    SNPs were compared between parental alleles of the Khalas reference genome and different varieties. (a) The distance between parental allele SNPs in Khalas is not normally distributed. The skewed distribution of adjacent SNP distances demonstrates the occurrence of high and low polymorphism islands in the genome. About 49% of SNPs occur within 50 bp of another SNP. This trend was maintained even after removing SNPs likely to be in repetitive regions (KhlsFilter). (b) Backcrossed varieties of date palm on average show high levels of similarity to their recurrent parent, with the number of generations of backcrossing (ranging from 1 to 5) having little effect on similarity levels (error bars are quite small). Intervariety comparisons show significantly more sites with different genotypes. (c) Principal component analysis (PCA) of sequenced genomes based on 3.5 million polymorphic sites. Khalas and backcrossed variants are essentially on top of each other. DN, Deglet Noor; Mdjl, Medjool; BC, backcross; AlrF, AlrijalF; Khls, Khalas; Khlt, Khalt. (d) PCA of sequenced genomes based on 32 decision tree–selected polymorphic sites reveals little loss of discrimination quality with much reduced genotyping required. KhFx, Khalas x Khalas F1.

  3. Figure 3: Analysis of imbalanced sequence count regions (ISCRs) among date palm genomes.

    Numbers of unique ISCRs remaining in each genome after comparison with other genomes are shown. Only nonbackcrossed genomes were considered to avoid bias from inbreeding. Approximately 7% of ISCRs were unique to any single genome, whereas the majority were observed in at least one other genome.

  4. Figure 4: Enrichment of Gene Ontology categories for genes covered by imbalanced sequence count regions (ISCRs).

    Gene Ontology categories from genes covered by ISCRs in at least two genomes were analyzed for enrichment. Gene counts in each category were normalized to total gene counts in either the genome or ISCRs. A false discovery rate (FDR) of 0.2 was applied and only categories showing at least twofold enrichment in the ISCRs are reported.

  5. Figure 5: Pedigree and genotype information for gender-discriminating regions.

    Date palms of known genealogy were genotyped at multiple gender-discriminating regions. (a) A section of the full pedigree used for linkage analysis showing the complex relationship of the trees. DN, Deglet Noor; Dy, Dayri; Mj, Medjool; BC, backcross; DnPr, initial donor parents. Gray boxes indicate an unknown but theoretically determined genotype. The genotype in each individual is the genotype found at the first gender-discriminating SNP that was genotyped. Segregation of heterozygosity with the male phenotype is clear. (b) Genotypes from four scaffolds (scales with exons annotated as blue ticks and repeats as red rectangles) with the largest number of male-specific SNPs (MS-SNPs). Genotypes from selected regions (tan rectangles) are presented with their scaffold base pair location above each genotype. F and R indicate on which strand (forward or reverse) primers were designed to amplify the selected region. The number observed (both empirically and theoretically) for each gender in each genotype is included. Fem, female; herm., hermaphrodite. Heterozygous SNP calls are shaded gray whereas homozygous calls are shaded blue.
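The Figure 5 caption describes heterozygosity segregating with the male phenotype. A minimal sketch of how candidate male-specific SNPs might be screened is given here. It assumes that females are homozygous at such sites, as an XY system would predict. The data layout, the tree names and the function names are hypothetical and are not taken from the authors' pipeline.

```python
def is_heterozygous(genotype):
    """True if a diploid call such as 'AG' carries two different alleles."""
    return len(set(genotype)) > 1


def male_specific_snps(genotypes, sex):
    """Return SNP ids that are heterozygous in every male and homozygous in every female.

    genotypes: {snp_id: {tree_name: 'AG'}}  (hypothetical layout)
    sex:       {tree_name: 'male' or 'female'}
    """
    candidates = []
    for snp_id, calls in genotypes.items():
        males = [g for tree, g in calls.items() if sex.get(tree) == "male"]
        females = [g for tree, g in calls.items() if sex.get(tree) == "female"]
        if not males or not females:
            continue  # cannot judge a site without calls from both sexes
        if all(is_heterozygous(g) for g in males) and not any(
            is_heterozygous(g) for g in females
        ):
            candidates.append(snp_id)
    return candidates


# Toy data, invented for illustration only.
toy_sex = {"tree1": "male", "tree2": "male", "tree3": "female", "tree4": "female"}
toy_genotypes = {
    "snp1": {"tree1": "AG", "tree2": "AG", "tree3": "AA", "tree4": "AA"},
    "snp2": {"tree1": "CT", "tree2": "CC", "tree3": "CC", "tree4": "CT"},
    "snp3": {"tree1": "GG", "tree2": "GG", "tree3": "GG", "tree4": "GG"},
}
toy_candidates = male_specific_snps(toy_genotypes, toy_sex)
```

With the toy data only `snp1` passes the filter. A real screen would also have to tolerate genotyping errors and missing calls, which this sketch does not attempt.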
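The adjacent-SNP statistic in Figure 2a (the share of SNPs lying within 50 bp of another SNP) can be illustrated with a short sketch. It assumes SNP positions from a single scaffold are already available as integers and treats "within 50 bp" as a distance of at most 50. The function names and the toy positions are hypothetical and are not the authors' code or data.

```python
def adjacent_snp_distances(positions):
    """Distances in bp between consecutive SNPs on one scaffold."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fraction_near_neighbor(positions, window=50):
    """Fraction of SNPs whose nearest neighboring SNP is at most `window` bp away."""
    ordered = sorted(positions)
    if len(ordered) < 2:
        return 0.0
    near = 0
    for i, pos in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(pos - ordered[i - 1])  # distance to the SNP on the left
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1] - pos)  # distance to the SNP on the right
        if min(gaps) <= window:
            near += 1
    return near / len(ordered)


# Toy positions, invented for illustration only.
toy_positions = [100, 120, 135, 5000, 9000, 9040, 20000]
toy_distances = adjacent_snp_distances(toy_positions)
toy_fraction = fraction_near_neighbor(toy_positions, window=50)
```

In the toy example five of the seven SNPs have a neighbor within 50 bp. On a genome-wide SNP list the same calculation would be run per scaffold, so that distances are never measured across scaffold boundaries.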



Author information

  1. These authors contributed equally to this work.

    • Eman K Al-Dous &
    • Binu George


  1. Genomics Core, Weill Cornell Medical College in Qatar, Doha, Qatar.

    • Eman K Al-Dous,
    • Binu George,
    • Maryam E Al-Mahmoud,
    • Moneera Y Al-Jaber,
    • Yasmeen M Salameh,
    • Eman K Al-Azwani &
    • Joel A Malek
  2. Department of Genetics, University of Georgia, Athens, Georgia, USA.

    • Hao Wang,
    • Srinivasa Chaluvadi,
    • Ana C Pontaroli,
    • Jeremy DeBarry &
    • Jeffrey L Bennetzen
  3. Laboratoire de Biogenèse Membranaire, CNRS UMR, Université V. Segalen Bordeaux, Bordeaux, France.

    • Vincent Arondel
  4. Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA.

    • John Ohlrogge
  5. Agricultural and Water Research, Ministry of Environment, Doha, Qatar.

    • Imad J Saie
  6. Biotechnology Centre, Ministry of Environment, Doha, Qatar.

    • Khaled M Suliman-Elmeer
  7. USDA-ARS National Clonal Germplasm Repository for Citrus & Dates, University of California, Riverside, California, USA.

    • Robert R Krueger
  8. Department of Genetic Medicine, Weill Cornell Medical College in Qatar, Doha, Qatar.

    • Joel A Malek
  9. Present address: EEA Balcarce, Instituto Nacional de Tecnología Agropecuaria, Balcarce, Argentina.

    • Ana C Pontaroli


E.K.A.-D. extracted genomic DNA, created libraries, sequenced the genome and assisted with the manuscript writing. B.G. conducted SNP, CNV and annotation analysis. M.E.A.-M. genotyped gender-discriminating regions. E.K.A.-A. and Y.M.S. assisted in genome sequencing, conducted qPCR validation of CNVs and helped write the manuscript. M.Y.A.-J. cloned, sequenced and analyzed sequences from standard sequencing technology for comparison to the next generation data. I.J.S. and K.M.S.-E. maintained the tree tissue culture and cultivar data on the sequenced trees. H.W., S.C., A.C.P., J.D. and J.L.B. constructed the fosmid library, sequenced date palm fosmids, provided transposable element annotation and generated comparative analyses and genome size predictions. J.O. and V.A. constructed EST libraries and provided DNA sequence from ESTs. H.W. and J.L.B. also helped write the manuscript. R.R.K. maintained and provided the date palm genetic resource including the pedigree information and assisted in phenotyping of the date palms. J.A.M. conceived and planned the project, created libraries, analyzed for gender-specific regions, assembled and annotated the genome and wrote the manuscript.

Competing financial interests

J.A.M. has been named on a patent application by Weill Cornell Medical College in Qatar with regard to date palm gender-specific markers.

Supplementary information

PDF files

  1. Supplementary Text and Figures (807K)

    Supplementary Tables 1–4,6,7, Supplementary Methods, Supplementary Notes and Supplementary Figs. 1–5

Excel files

  1. Supplementary Table 5 (2M)

    ISCRs overlapping genes in 4 genomes compared to Khalas
