Introduction

Horticultural plants mostly comprise vegetable-producing, fruit-bearing, ornamental, and beverage-producing plants and herbal medicinal plants. These plants have played important economic and social roles in human lives and health by providing basic food needs, beautifying urban and rural landscapes, and improving personal esthetics. For example, the Food and Agriculture Organization of the United Nations reported that, while worldwide cereal foods together are valued at 125 points (normalized value), vegetables and fruits together are valued at 137 points (http://faostat.fao.org). Horticultural plants also contribute to ecological balance by providing oxygen and moderating urban temperatures.

Horticultural plants are distributed among a wide variety of taxonomic plant spectra, which include a large number of flowering plants and a few early-divergent land plants. The sizes of their genomes vary greatly. For example, the vegetable garlic (Allium sativum) has a diploid genome (2n = 16) with an estimated genome size of >30 Gb1, and onion (Allium cepa) has a similar genome size2. In addition, most horticultural plants are domesticated, and their genome sequences have experienced strong artificial selection. For example, grape was found to have been cultivated (via viticulture) for >6000 years3; citrus, >4000 years4. In addition, some horticultural plants are intermediates of domesticated and wild plants, such as medicinal plants including ginseng (Panax ginseng), noto ginseng (Panax notoginseng), and Artemisia (Artemisia annua). Many domesticated horticultural plants have high levels of genetic diversity and heterozygosity, such as sunflower (10% of bases differ between homologous chromosomes)5, grape (7%)6, and potato (4.8%)7.

De novo sequencing of horticultural plant genomes

As of December 31, 2018, the genomes of 181 horticultural species have been sequenced (Table 1). These include 4 beverage, 47 fruit, 44 medicinal, 44 ornamental, and 42 vegetable plants (Fig. 1a). In terms of taxonomic distribution, these plants include 175 angiosperms, 2 gymnosperms, 3 lycophytes, and 1 moss (Fig. 1b). As shown in Fig. 1c, the number of sequenced genomes of horticultural plants completed each year has significantly increased from 1 in 2007 to 40 in 2018. Although most of the horticultural plants are angiosperms, the genome sequencing of non-angiosperm species has also demonstrated steady growth (Fig. 1c). Vegetables and fruits have been a focus of plant research in the past few years. However, only two vegetables and seven fruits had their genomes sequenced in 2018 (Fig. 1d). This is probably because many economically important vegetables and fruits were already sequenced prior to 2018.

Table 1 List of genome-sequenced horticultural plant species and their close relatives
Fig. 1: Statistics of genome-sequenced horticultural plant species.

a Distribution of genome-sequenced horticultural plants. b Botanical distribution of genome-sequenced horticultural plants. c Annual increase in the genome-sequenced horticultural plants by botanical taxonomy. d Annual increase in the genome-sequenced horticultural plants by horticultural category. e The reported 181 horticultural plant species fall into 30 angiosperm orders. f List of the released but not reported horticultural plant species

Some angiosperms have a significant role in the economy8. The 181 horticultural plants with sequenced genomes are distributed in 30 of the 64 angiosperm orders. Among these 30 orders, 7 (Fabales, Rosales, Cucurbitales, Brassicales, Sapindales, Solanales, and Lamiales) have >10 species whose genomes have been sequenced (Fig. 1e), suggesting their vital importance to humans.

Most of the genome-sequenced plants fall into the Rosaceae family, which is a medium-sized family with approximately 4800 species (http://www.theplantlist.org), including many popular fruit-bearing and ornamental plants. The genome-decoded fruit-producing species include breadnut (Artocarpus camansi)9, ficus (Ficus carica)10, jujube (Ziziphus jujuba)11, strawberry and its close relatives (Fragaria × ananassa, Fragaria iinumae, Fragaria nipponica, Fragaria nubicola, Fragaria orientalis, Fragaria vesca)12,13,14, apple (Malus domestica)15, morus (Morus notabilis)16, sweet cherry (Prunus avium)17, peach (Prunus persica)18, Chinese pear (Pyrus bretschneideri)19, European pear (Pyrus communis)20, and black raspberry (Rubus occidentalis)21. The genome-decoded ornamentals include mei (Prunus mume)22, sakura (Prunus yedoensis)23, and rose and its close relatives (Rosa × damascena, Rosa chinensis, Rosa multiflora, and Rosa roxburghii)24,25,26. However, the genomes of many important fruit-bearing Rosales plants, such as Crataegus pinnatifida, Malus prunifolia, Eriobotrya japonica, Armeniaca vulgaris, and Prunus salicina, and of Rosales ornamentals, such as Photinia serrulata, Spiraea thunbergii, Cotoneaster multiflorus, and Rubus japonicus, have not yet been sequenced. The available genome sequences of Rosales species have greatly improved our understanding of the biology of fruits and flowers. For example, the high-quality apple genome sequence showed that a single allele is responsible for red fruit peel coloration27, and the reference genome of rose has provided insights into the floral color and scent pathways25.

The Solanaceae family consists of ~2700 species (http://www.theplantlist.org) that include a number of vegetable, medicinal, and ornamental species. The genomes of several important Solanaceae vegetable species have been sequenced, such as tomato (Solanum lycopersicum, Solanum pimpinellifolium)28,29, potato (Solanum tuberosum)30, pepper (Capsicum annuum, Capsicum baccatum, Capsicum chinense)31,32,33, and eggplant (Solanum melongena)34. Solanaceae ornamentals include ivy morning glory (Ipomoea nil)35, ornamental tobacco (Nicotiana sylvestris)36, and petunia (Petunia axillaris, Petunia inflata)37. Although these genomes have helped to understand the evolution of Solanaceae plants, additional Solanaceae horticultural genomes need to be sequenced. These include the medicinal plants Datura arborea, Datura metel, and Datura innoxia and the ornamentals Petunia spp., Nicotiana spp., Lycium spp., Solanum spp., Cestrum spp., Calibrachoa spp., and Solandra spp. The genome sequences already available have helped to decipher the evolution and genomic basis of metabolites such as vitamin C (or ascorbic acid)38 in tomato and alkaloids in tobacco39.

The Fabaceae family, consisting of ~19,000 known species, is the third largest angiosperm family by species richness, after the Orchidaceae and Asteraceae families. Although only dozens of Fabaceae genomes have been sequenced8, many of them are from horticultural species. The genome-decoded Fabaceae vegetable plants include pigeon pea (Cajanus cajan)40, chickpea and its relative (Cicer arietinum, Cicer reticulatum)41,42, soybean (Glycine max)43, barrelclover (Medicago truncatula)44, common bean (Phaseolus vulgaris)45, faba bean (Vicia faba)46, adzuki bean (Vigna angularis)47, and mung bean (Vigna radiata)48. The genome-sequenced Fabaceae ornamentals include eastern redbud (Cercis canadensis)49, narrowleaf lupin (Lupinus angustifolius)50, and mimosa (Mimosa pudica). The Fabaceae medicinal plants with sequenced genomes include Chinese licorice (Glycyrrhiza uralensis)51 and red clover (Trifolium pratense)52. Legumes are considered a valuable future source of food53; thus the sequencing of their genomes would be valuable. Determining the genomic basis of legume–rhizobium interactions would help not only to solve a classic fundamental problem in biology but also to improve nitrogen utilization in horticultural plants.

The Brassicaceae family is a medium-sized family with ~4000 species, including many horticultural plant species. The Brassicaceae vegetable plants with sequenced genomes include Zhacai (Brassica juncea)54, cabbage (Brassica oleracea)55, napa cabbage (Brassica rapa)56, Capsella (Capsella bursa-pastoris and Capsella rubella)57,58, radish (Raphanus sativus)59, and field pennycress (Thlaspi arvense)60. The genomes of the Brassicaceae medicinal plants Eutrema yunnanense61 and maca (Lepidium meyenii)62 have also been sequenced. With these genome sequences at hand, the genomic features of common ancestors and the subsequent evolution of the Brassicaceae can be clarified, such as the intron evolution within the Brassicaceae63, and gene and genome duplication events within the Brassicaceae64,65. These genomes would also shed light on the evolution of the hypocotyl, as has been reported in maca62 and radish59. Within the Brassicaceae family, we could foresee a growing demand for the genome sequencing of horticultural Brassicaceae plants, both for evolutionary research and for decoding the molecular basis of economically important traits.

The Cucurbitaceae family includes >3700 species belonging to 134 genera (www.theplantlist.org). Within this family, the genome-decoded vegetable plants include silver-seed gourd (Cucurbita argyrosperma)66, winter squash (Cucurbita maxima)67, pumpkin (Cucurbita moschata)67, summer squash (Cucurbita pepo)68, bottle gourd (Lagenaria siceraria)69, and bitter melon (Momordica charantia)70. The genome-decoded fruit species include muskmelon (Cucumis melo)71 and watermelon (Citrullus lanatus)72. The only genome-decoded medicinal plant is monk fruit (Siraitia grosvenorii)73,74. Via analysis of these available genome sequences, it was found that a tetraploid-inducing event occurred in the last common ancestor of the Cucurbitaceae species75. These genome sequences can also help to better understand the domestication history76 and fruit development77. Increasing numbers of the wild relatives of these economically important crop species, as well as those of thousands of plant cultivars, will be sequenced in the near future, providing additional details and surprises.

The Rutaceae or citrus family consists of 158 genera and 6686 species (www.theplantlist.org). The Rutaceae fruit-bearing plants with sequenced genomes include clementine (Citrus clementina)78, pomelo (Citrus grandis)79, Ichang papeda (Citrus ichangensis)79, citrumelo (Citrus paradisi × Poncirus trifoliata)80, mandarin orange (Citrus reticulata)81, sweet orange (Citrus sinensis)82, and cold-hardy mandarin (Citrus unshiu)83. The Rutaceae medicinal plants with sequenced genomes include jiu bing le (Atalantia buxifolia)79 and citron (Citrus medica)79. Via analysis of these genome sequences, the evolutionary origin and evolutionary changes in the Citrus genus during domestication were mapped84. In the future, other Rutaceae fruit-bearing plants, including lemon (Citrus limon), calamansi (Citrofortunella microcarpa), lime (Citrus spp. hybrids), kumquat (Citrus japonica), and grapefruit (Citrus × paradisi), will require genome sequencing.

Genome resequencing and the pan-genome of horticultural plants

A single reference genome sequence is not sufficient for identifying the best candidate genes for molecular breeding or for understanding the genomic background of a population due to the prevalence of genomic structural variations. Compared to the construction of a reference genome, genome resequencing usually requires less sequencing coverage. It is feasible to obtain a high-quality resequenced genome via mapping to a reference genome. A pan-genome is the summary of genomes of a species obtained by comparing a large number of resequenced genomes of a species or, occasionally, a genus. A pan-genome can help to understand the size of a core genome (defined as the conserved part among the related genomes), the size of a pan-genome, and the amount and nature of variations within a species or a genus, which improves our understanding of the evolution of a species/genus, as well as of agronomic traits. Currently, a growing number of pan-genomes among horticultural plants have been constructed (Table 2).
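The relationship between the core genome and the pan-genome can be illustrated with a minimal sketch. The gene names, accession labels, and the presence/absence dictionary below are hypothetical toy data, not taken from any of the cited studies; the sketch only assumes that each gene has already been called present or absent in each accession.

```python
from typing import Dict, Set


def summarize_pan_genome(presence: Dict[str, Set[str]],
                         accessions: Set[str]) -> Dict[str, int]:
    """Count pan, core, and dispensable genes.

    presence maps a gene name to the set of accessions in which it was found.
    A gene is 'core' if it is present in every accession, and 'dispensable'
    if it is present in at least one accession but not in all of them.
    """
    pan = {gene for gene, found_in in presence.items() if found_in & accessions}
    core = {gene for gene in pan if presence[gene] >= accessions}
    dispensable = pan - core
    return {"pan": len(pan), "core": len(core), "dispensable": len(dispensable)}


# Hypothetical toy data: four genes scored across three accessions.
presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc2", "acc3"},
    "geneC": {"acc1", "acc3"},
    "geneD": {"acc2"},
}
summary = summarize_pan_genome(presence, {"acc1", "acc2", "acc3"})
print(summary)
```

In this toy case, two genes are shared by all accessions (core) and two show presence/absence variation. Real pan-genome studies derive such calls from read mapping or assembly comparison, which this sketch does not attempt.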

Table 2 Pan-genome information of horticultural plants

Soybean is an economically important vegetable crop; in addition to being a source of protein for humans, it is an important source of vegetable oil. Glycine soja is the closest wild relative to cultivated soybean (Glycine max). The G. soja pan-genome, released in 2014 and built from seven wild accessions85, was the first horticultural pan-genome (Table 2). This pan-genome revealed that, as more genomes were added, the number of shared genes decreased, whereas the total number of genes increased. In addition, this pan-genome confirmed that a single reference genome does not adequately represent the genomic and genetic diversity of a species. Because the reference genome of G. soja was not previously available, those researchers assembled all seven genomes with the de novo assembly method, but this method was not adopted by subsequent researchers.
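The trend described above, in which shared genes decrease and total genes increase as genomes are added, follows directly from set intersection and union. The following is a minimal sketch using hypothetical gene sets; it is not the procedure used in the G. soja study.

```python
from typing import List, Set, Tuple


def accumulation_curve(gene_sets: List[Set[str]]) -> List[Tuple[int, int]]:
    """Return (shared_gene_count, total_gene_count) after adding each genome in order."""
    if not gene_sets:
        return []
    shared = set(gene_sets[0])  # genes present in every genome seen so far
    total = set(gene_sets[0])   # genes present in at least one genome seen so far
    curve = [(len(shared), len(total))]
    for genes in gene_sets[1:]:
        shared &= genes         # intersection can only stay equal or shrink
        total |= genes          # union can only stay equal or grow
        curve.append((len(shared), len(total)))
    return curve


# Hypothetical gene content of three accessions.
genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g6"},
]
curve = accumulation_curve(genomes)
print(curve)
```

The result depends on the order in which genomes are added; published curves are often averaged over many orderings, which this sketch omits for brevity.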

Assembly of the B. oleracea pan-genome86 is another early trial in the genomic research of horticultural plants (Table 2). It is relatively small, created using nine morphologically diverse varieties (covering two cabbage, one broccoli, one brussels sprout, one kohlrabi, two cauliflowers, and one kale plant) and a wild relative, Brassica macrocarpa. Analyses of this pan-genome showed that 20% of genes are absent in some cultivar(s) and that presence–absence variations (PAVs) affect genes related to major agronomic traits. This is a pioneering study that provided assembled pan-genome contigs, pan-genome annotations, and the GBrowse tool, available at http://brassicagenome.net.

Pepper plants are important vegetable plants with distinct fruit morphologies. The pepper pan-genome has been generated for the pepper genus Capsicum87. This pan-genome consists of 5 species and 383 cultivars, all of which have 15 chromosomes. In addition to enabling the comparison of PAVs among this large number of pepper cultivars, the pan-genome is also useful for associating important agronomic traits with the corresponding genes. These valuable pan-genome data and JBrowse and other search tools are available (www.pepperpan.org:8012).

Sunflower plants provide seed that can be used for cooking oil and serve as popular ornamentals. The sunflower pan-genome was created by sequencing 493 accessions, including cultivars, landraces, and wild relatives5. A total of 61,205 genes have been identified within the gene set of the sunflower pan-genome. With the aid of this pan-genome, the understanding of the evolutionary history of sunflower species has significantly improved, and genes linked to biotic stress resistance have been identified5. Although pan-genome data can be found in the sunflower genome database (www.sunflowergenome.org), no publicly accessible tool has been built to date (accessed March 31, 2019).

Reference genome sequences are necessary to identify genes and to understand evolutionary trajectory. However, a pan-genome can help to uncover additional details. For example, relying on the tomato genome sequence, researchers mapped only several genes and pathways controlling fruit ripening28. These flesh- and flavor-related genes are the best targets in breeding. Moreover, genome sequences allow comprehensive and systematic analyses of fruit biology. Furthermore, via the sequencing of a tomato population and analysis of its pan-genome consisting of 725 accessions, the genes selected during domestication and quality improvement were identified88. Thus a pan-genome not only improves our understanding of crop evolution but also is useful for the discovery of novel genes and breeding.

Data storage and visualization

In addition to comprehensive plant-centric databases such as Phytozome (https://phytozome.jgi.doe.gov) and EnsemblPlants (http://plants.ensembl.org), 27 horticultural plant-specific genome databases have been constructed (Table 3). Among these, 22 provide data for downloading. Some databases are freely accessible to all users, while others provide only limited access to specific data or users. For example, the Genome Database for Rosaceae89 requires user registration and a login to access the breeding data.

Table 3 List of horticultural plant-centric genome databases

Visualization of genomic data of horticultural plants is challenging due to the heterogeneous nature of the different types of data. GBrowse90 and JBrowse91,92,93 are powerful tools that provide a visualization of various levels of genomic features. The availability of genomic analysis tools also varies greatly among databases. BLAST-related tools such as NCBI-BLAST94 and viroBLAST95 are provided by some databases for homologous sequence searches and sequence comparisons. Gene query tools can help to obtain details of genes such as their sequence, annotation, and expression. HMMER96 searches allow the inference and extraction of gene families from genomes in the database. Syntenic tools allow the identification and visualization of genome-wide syntenic relationships across genomes. The BioCyc tools (https://biocyc.org) allow users to navigate individual pathways or the whole metabolic map of a genome for functional analyses97.

The Genome Database for Rosaceae (GDR), which was developed by the main bioinformatics laboratory at Washington State University89, is well known among the Rosaceae research community and even the plant research community. It covers the genome sequences of 18 Rosaceae species (Fragaria vesca, F. ananassa, F. iinumae, F. nipponica, F. nubicola, F. orientalis, Malus domestica, Potentilla micrantha, Prunus avium, Prunus domestica, Prunus dulcis, Prunus persica, Prunus yedoensis, Pyrus bretschneideri, Pyrus communis, Rosa chinensis, Rosa multiflora, and Rubus occidentalis), which are categorized into seven genera: Fragaria, Malus, Potentilla, Prunus, Pyrus, Rosa, and Rubus. To facilitate online analyses, a series of tools are provided, including genomic tools (BLAST+, JBrowse, Primer3, Sequence Retrieval, MapViewer, Synteny Viewer), metabolomic tools (GDRcyc, Pathway Inspector), and breeding tools (Breeding information Management System (BMS), Breeders Toolbox). The same team at Washington State University also developed a series of horticultural plant-themed databases, including the Citrus Genome Database, Cool-Season Food Legume Crop Database resources, and Genome Database for Vaccinium (GRIN). All these databases share a similar data process standard and have built-in bioinformatics tools.

The Sol Genomics Network (SGN)98, a database of Solanaceae genomic and phenotypic data and tools, was developed by Mueller’s team from the Boyce Thompson Institute for Plant Research and Cornell University. The SGN includes 11 genomes: those of Solanum lycopersicum, S. lycopersicoides, S. pimpinellifolium, S. tuberosum, S. pennellii, Capsicum annuum, Nicotiana attenuata, N. benthamiana, N. tabacum, Petunia axillaris, and P. inflata. These species are categorized into four economically important genera: Solanum, Capsicum, Nicotiana, and Petunia. For online analyses of genomic sequences, BLAST, Alignment Analyzer, Tree Browser, and VIGS tools are available. For mapping of various data, JBrowse, Comparative Map Viewer, CAPS Designer, and solQTL are provided. Some tools have been developed for common molecular wet laboratory experiments, including In-Silico PCR, the Tomato Expression Atlas, and the Tomato Expression Database. Systems biology tools such as SolCyc Biochemical Pathways99, Coffee Interactome Data, and the SGN Ontology Browser are provided. The Breeders Toolbox was developed for breeders. The same team also developed a series of horticultural plant-themed databases, including the YamBase (https://yambase.org), CassavaBase (https://cassavabase.org), and MusaBase (https://musabase.org) databases. All these databases adhere to the release of genomic data before publication (the Toronto Agreement)100.

The Cucurbit Genomics Database (CuGenDB)101 currently hosts eight high-quality genome sequences corresponding to those of cucumber (Cucumis sativus), watermelon (Citrullus lanatus), winter squash (Cucurbita maxima), pumpkin (Cucurbita moschata), summer squash (Cucurbita pepo), muskmelon (Cucumis melo), bottle gourd (Lagenaria siceraria), and silver-seed gourd (Cucurbita argyrosperma). The search and batch query systems allow searching for sequences and annotations. To display genomic details, the JBrowse, BLAST, Gene Ontology (GO), Synteny Viewer, CAMP, and expression viewer tools are available. To display metabolic pathways, CucurbitCyc and Pathway enrichment tools are available.

The Brassica Database (BRAD)102, a database of important Brassica species, covers the vegetable species Brassica rapa and B. oleracea, as well as the model plant Arabidopsis and close Brassicaceae relatives. In addition to its genomic data, the BRAD database hosts a curated list of genes involved with anthocyanins, resistance, auxin, flowering, and glucosinolates and a full list of gene families that are of considerable importance in Brassica research. BLAST and JBrowse tools were built for visualization of genomic data, and syntenic tools are useful for comparative analyses.

The Herbal Medicine Omics Database103 includes genomic, transcriptomic, pathway, and metabolomic data for medicinal plants, although the medicinal properties of some plants are recognized only in some parts of the world. In this database, hundreds of medicinal plants are included. However, the database currently provides only the BLAST and GBrowse tools for the visualization of omics data. Other collected omic data can be downloaded but cannot be analyzed or visualized online.

There are other tool-specific databases that can be very useful for the visualization and online analyses of horticultural plant genome sequences. The Plant Genome Duplication Database (PGDD)104 offers online analyses of gene synteny and visualization of different results, such as dot plots (macrosynteny) and local genomic comparison plots (microsynteny). The built-in Map-View tool allows mapping of a given sequence to the genomes of 47 species from the PGDD (data accessed on March 31, 2019). The Plant Duplicate Gene Database105 is a collection of 141 plant species and offers online analysis and visualization of duplicated genes in select species.

Discussions and future perspectives

The horticultural plant genome project

It is challenging to determine the exact number of species or cultivars that exist for horticultural plants. In terms of fruit-bearing plants, at least 91 species are economically important and produce fruit that are consumed (https://simple.wikipedia.org/wiki/List_of_fruits). More than 200 vegetable plants are consumed (https://simple.wikipedia.org/wiki/List_of_vegetables). The exact number of ornamentals is also unclear, as novel cultivars are produced each year. However, it has been estimated that there are >6000 ornamental cultivars (https://www.rhs.org.uk/plants/pdfs/agm-lists/agm-ornamentals-(1).pdf), and many cultivars are created and disappear each year. Up to December 2018, genome sequences had been decoded for only 181 species, accounting for only a small proportion of the total horticultural plant species. Hence, there is a strong need to sequence additional genomes for more horticultural plants that would be valuable for comparative genomics, to better understand their evolutionary history, and to possibly make genetic modifications to better utilize these plant species.

Here we propose a horticultural plant genome project (HPGP) with three goals (Fig. 2). The first goal of the HPGP is to generate reference genome sequences for all horticultural plants, after which pan-genomes and core collections would be generated as genetic banks for horticultural plants. Two recently developed genome assembly methods could be applied to decode highly polyploid71 and highly heterozygous106,107,108 horticultural genomes. The second goal is to identify the various genomic variations within a pan-genome. In addition, the mechanistic signatures leading to the variations would be explored. The third goal is to link the phenotypes to the genomic regions. Two methods would be applied: quantitative trait locus methods to correlate genomic variations with a quantitative trait and genome-wide association study methods to associate phenotypic variation with many genomic variants across different individuals109,110. The good news is that the Earth Genome Project and the 1000-Plant Genome Project will accelerate the genome sequencing process of horticultural plants.
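The third goal, linking phenotypes to genomic regions, can be sketched as a single-marker scan in which each variant is scored by how strongly its genotype correlates with a quantitative trait. This is a deliberately simplified illustration under stated assumptions: the marker names, genotype dosages (0, 1, or 2 copies of an allele), and trait values are hypothetical, and the sketch omits the significance testing and population-structure correction that real QTL and GWAS analyses require.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_markers(genotypes: Dict[str, List[int]],
                 phenotype: List[float]) -> List[Tuple[str, float]]:
    """Rank markers by r^2, the fraction of trait variance each one explains alone."""
    scores = {marker: pearson_r(dosages, phenotype) ** 2
              for marker, dosages in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: three markers scored in six individuals.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 0, 1, 1, 2, 2],
    "snp3": [1, 1, 1, 1, 1, 1],  # monomorphic marker, carries no information
}
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
ranked = rank_markers(genotypes, phenotype)
for marker, r_squared in ranked:
    print(marker, round(r_squared, 3))
```

In this toy example, snp1 tracks the trait most closely and the monomorphic snp3 scores zero, which is why markers without variation are usually filtered out before an association scan.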

Fig. 2: The proposed roadmap to the horticultural plant genome project (HPGP).

The first goal of the HPGP is to generate reference genome sequences for all horticultural plants, after which pan-genomes and core collections will be generated as a gene bank for horticultural plants. Two recently developed methods would be applied to decode the highly polyploid and highly heterozygous horticultural genomes. The second goal is to detect the various genomic variations within a pan-genome. In addition, the mechanistic signatures leading to the variations would be explored. The third goal is to link the phenotypes with the genomic regions. Two methods would be applied: the quantitative trait locus (QTL) method to correlate genomic variations with a quantitative trait and the genome-wide association study (GWAS) method to associate phenotypic variation with many genomic variants across different individuals.

The timeline for obtaining the genome sequences of all horticultural plants at both draft and reference scales (goal one of the HPGP) would be short—within 3–5 years—because the cost for sequencing is dropping rapidly. However, collecting and sequencing the population definitely requires worldwide collaborations and would take >10 years. The second goal is to analyze the genomic variations to identify the mechanistic signatures within a population, which is also time consuming and would be gradually achieved. The third goal is an advanced step that occurs after or concurrently with the second goal. Although these last two goals appear to be enormous challenges, we are confident in the ability to achieve most of these two goals in model horticultural plants such as the tomato, cucumber, and strawberry in the coming years.

In addition, the quality of assembly and annotation of existing reference genomes of horticultural plants needs to be further improved. Although a few tools such as BUSCO111 and CEGMA112 have been widely used to evaluate the quality of genome annotations, a good standard is still not available for the systematic evaluation of the quality of genome assemblies. As a result, the quality of the genome assemblies is very uneven and is sometimes related to the complexity or heterozygosity of the taxa. This situation is changing as sequencing platforms are being upgraded. For example, after the first apple genome sequence was released in 2010 based on next-generation sequencing technology15, an improved version produced by next-generation sequencing (NGS) and PacBio technologies was released in 2016113. The third improved version of the apple genome, which was obtained using a combination of NGS, PacBio, and Bionano technologies, was released in 2017114. The fourth improved version was released in 2019, based on the utilization of NGS, PacBio, and Hi-C technologies27. In the future, the quality of the reference genome should reach certain minimal standards upon which the community can agree, similar to the proposal for bacteria and archaea115, thereby leading to more accurate pan-genome analyses and biotechnology.

Storage and access of genomic data constitute another problem concerning horticultural biologists and bioinformatics scientists. For access to genome sequences and raw sequencing data, a number of public databases are usually the first choice of researchers due to the nature of their stability, low cost, and ease of access. The well-known public databases include the NCBI (https://ncbi.nlm.nih.gov), EMBL (www.embl.org), CNGB (www.cngb.org), BIGD (bigd.big.ac.cn), DDBJ (www.ddbj.nig.ac.jp), GigaDB (gigadb.org), Dryad (www.datadryad.org), and Phytozome (https://phytozome.jgi.doe.gov) databases. To share these data with worldwide researchers, we encourage the release of data before publication, as was suggested by the Toronto Agreement in 2009100.

The need for a horticultural plant-centric database

Unlike agricultural plants, horticultural plants share multiple features. For example, plant growth requires controlled conditions with specific equipment or facilities; plants generally need grafting, postharvest treatment, and a long juvenile phase; and plants usually undergo asexual reproduction and have unique specialized metabolism. All of these features make the corresponding traits hard to study in model plants or via regular tools. The integration of the various omic data and the development of novel tools for horticultural plants are needed. Moreover, aside from the comprehensive plant databases and the 27 horticultural plant-specific databases mentioned above, there is a growing need to find and compare ever larger amounts of data for horticultural plants. Because horticultural biologists frequently work with breeders, a comprehensive horticultural database that serves the interests of both basic biologists and breeders is needed. Such a database should cover as many horticultural plant genomes as possible and should provide an integrated set of bioinformatics tools. We believe that, in the future, such a comprehensive database of all horticultural plants will serve a growing number of horticulture researchers and breeders.

Given the advancement of sequencing technologies and reduced costs, the genome sequencing data of horticultural plants are accumulating rapidly. The storage, analyses, and sharing of large collections of genome sequencing data are becoming even more laborious and time consuming. The integrative analysis of various omic data, such as genomic, transcriptomic, metabolomic, phenomic, and breeding data, has become a major challenge for many horticultural biologists and requires coordinated efforts of scientists from different fields. For data processing and visualization, we recommend using BioMart tools, which could be easily built into a database. For database construction, we suggest following the template of the Tripal series (www.tripal.info)8. Finally, we believe that, with a fostered collaboration of the horticultural community, the HPGP and subsequent knowledge and experiences will greatly benefit biology researchers and breeders.