The genome of Chenopodium quinoa

Jarvis, David E.; Ho, Yung Shwen; Lightfoot, Damien J.; Schmöckel, Sandra M.; Li, Bo; Borm, Theo J. A.; Ohyanagi, Hajime; Mineta, Katsuhiko; Michell, Craig T.; Saber, Noha; Kharbatia, Najeh M.; Rupper, Ryan R.; Sharp, Aaron R.; Dally, Nadine; Boughton, Berin A.; Woo, Yong H.; Gao, Ge; Schijlen, Elio G. W. M.; Guo, Xiujie; Momin, Afaque A.; Negrão, Sónia; Al-Babili, Salim; Gehring, Christoph; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T.; Gojobori, Takashi; Linden, C. Gerard van der; van Loo, Eibertus N.; Jellen, Eric N.; Maughan, Peter J.; Tester, Mark

doi:10.1038/nature21370

Download PDF

Article
Open access
Published: 08 February 2017

The genome of Chenopodium quinoa

David E. Jarvis¹^na1,
Yung Shwen Ho¹^na1,
Damien J. Lightfoot¹^na1,
Sandra M. Schmöckel¹^na1,
Bo Li¹^na1,
Theo J. A. Borm²,
Hajime Ohyanagi³,
Katsuhiko Mineta⁴,
Craig T. Michell⁵,
Noha Saber¹,
Najeh M. Kharbatia⁶,
Ryan R. Rupper⁷,
Aaron R. Sharp⁷,
Nadine Dally⁸,
Berin A. Boughton⁹,
Yong H. Woo¹,
Ge Gao¹,
Elio G. W. M. Schijlen¹⁰,
Xiujie Guo¹,
Afaque A. Momin³,
Sónia Negrão¹,
Salim Al-Babili¹,
Christoph Gehring¹,
Ute Roessner⁹,
Christian Jung⁸,
Kevin Murphy^na1,
Stefan T. Arold³,
Takashi Gojobori³,
C. Gerard van der Linden²,
Eibertus N. van Loo²,
Eric N. Jellen⁷,
Peter J. Maughan⁷ &
…
Mark Tester¹

Nature volume 542, pages 307–312 (2017)Cite this article

115k Accesses
466 Citations
762 Altmetric
Metrics details

Subjects

A Corrigendum to this article was published on 10 May 2017

This article has been updated

Abstract

Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Article Open access 15 April 2024

Genetic gains underpinning a little-known strawberry Green Revolution

Article Open access 19 March 2024

Elucidation of genes enhancing natural product biosynthesis through co-evolution analysis

Article 12 April 2024

Main

Quinoa (Chenopodium quinoa Willd., 2n = 4x = 36) is a highly nutritious crop that is adapted to thrive in a wide range of agroecosystems. It was presumably first domesticated more than 7,000 years ago by pre-Columbian cultures and was known as the ‘mother grain’ of the Incan Empire¹. Quinoa has adapted to the high plains of the Andean Altiplano (>3,500 m above sea level), where it has developed tolerance to several abiotic stresses^2,3,4. Quinoa has gained international attention because of the nutritional value of its seeds, which are gluten-free, have a low glycaemic index⁵, and contain an excellent balance of essential amino acids, fibre, lipids, carbohydrates, vitamins, and minerals⁶. Quinoa has the potential to provide a highly nutritious food source that can be grown on marginal lands not currently suitable for other major crops. This potential was recognized when the United Nations declared 2013 as the International Year of Quinoa, this being one of only three times a plant has received such a designation.

Despite its agronomic potential, quinoa is still an underutilized crop⁷, with relatively few active breeding programs⁸. Breeding efforts to improve the crop for important agronomic traits are needed to expand quinoa production worldwide. To accelerate the improvement of quinoa, we present here the allotetraploid quinoa genome. We demonstrate the utility of the genome sequence by identifying a gene that probably regulates the presence of seed triterpenoid saponin content. Moreover, we sequenced the genomes of additional diploid and tetraploid Chenopodium species to characterize genetic diversity within the primary germplasm pool for quinoa and to understand sub-genome evolution in quinoa. Together, these resources provide the foundation for accelerating the genetic improvement of the crop, with the objective of enhancing global food security for a growing world population.

Sequencing, assembly and annotation

We sequenced and assembled the genome of the coastal Chilean quinoa accession PI 614886 (BioSample accession code SAMN04338310) using single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) and optical and chromosome-contact maps from BioNano Genomics⁹ and Dovetail Genomics¹⁰. The assembly contains 3,486 scaffolds, with a scaffold N50 of 3.84 Mb and 90% of the assembled genome contained in 439 scaffolds (Table 1). The total assembly size of 1.39 gigabases (Gb) is similar to the reported size estimates of the quinoa genome (1.45–1.50 Gb (refs 11,12)). To combine scaffolds into pseudomolecules, an existing linkage map from quinoa¹³ was integrated with two new linkage maps. The resulting map (Extended Data Fig. 1) of 6,403 unique markers spans a total length of 2,034 centimorgans (cM) and consists of 18 linkage groups (Supplementary Table 7), corresponding to the haploid chromosome number of quinoa. Pseudomolecules (hereafter referred to as chromosomes, which are numbered according to a previously published single-nucleotide polymorphism (SNP) linkage map¹³) were created by anchoring 565 scaffolds to the linkage map, representing 1.18 Gb (85%) of the total assembly length (Table 1, Supplementary Data 1, Supplementary Data 2). This assembly represents a substantial improvement over the previously published quinoa draft genome sequence, which contained more than 24,000 scaffolds with 25% missing data¹⁴.

Table 1 Assembly statistics for quinoa, C. pallidicaule and C. suecicum

Full size table

Predicted protein-coding and microRNA genes (Supplementary Table 4) were annotated using a combination of ab initio prediction and transcript evidence gathered from RNA sequenced from multiple tissues using both RNA-seq and PacBio isoform sequencing (Iso-Seq) approaches (Extended Data Fig. 2a). The annotation contains 44,776 gene models (Supplementary Table 2, Extended Data Fig. 2b), which is in line with sequenced tetraploid species¹⁵, and includes 33,365 genes with annotation edit distance (AED)^16,17 values ≤ 0.3 (Extended Data Fig. 2c). Of the genome, 64% was found to be repetitive, including a large proportion of long terminal repeat (LTR) transposable elements (Supplementary Table 1). A majority (97.3%) of the 956 genes in the Plantae BUSCO dataset¹⁸ were identified in the annotation (Supplementary Table 3), which is suggestive of a complete assembly and annotation. The utility of the assembly, linkage maps, and annotation was demonstrated by mapping the betalain locus and identifying candidate genes underlying stem pigmentation (Supplementary Information 7.1.6), which is often used as a morphological marker in breeding programs.

Evolutionary history of quinoa

Quinoa resulted from the hybridization of ancestral A- and B-genome diploid species¹⁹. Single-gene sequencing studies previously identified pools of North American and Eurasian diploids as candidate sources of the A and B sub-genomes, respectively^20,21,22, with hybridization occurring somewhere in North America. To understand genome structure and evolution in quinoa further, we sequenced, assembled, and annotated the A-genome diploid C. pallidicaule (commonly called cañahua or kañiwa) and the B-genome diploid C. suecicum²¹ (Fig. 1a, Table 1). A high proportion of orthologous gene pairs in quinoa showed similar rates of synonymous substitutions per synonymous site (K_s), indicative of a whole-genome duplication event (Fig. 1b). This probably represents the hybridization of ancestral diploid species, because a similar peak was not observed in C. pallidicaule or C. suecicum (Fig. 1b). Using mutation rates calculated for Arabidopsis thaliana²³ and for core eukaryotes²⁴, we estimate the tetraploidization to have occurred 3.3–6.3 million years ago.

Multiple interfertile tetraploid species have arisen from the ancestral tetraploid following hybridization, including C. berlandieri and C. hircinum, although the evolutionary relationships among quinoa and its diploid and tetraploid relatives remain unclear²⁵. To begin to resolve these issues, we re-sequenced 15 additional quinoa samples representing the two major recognized groups of quinoa: highland and coastal (Supplementary Data 5). We also sequenced five accessions of C. berlandieri and one accession each of C. hircinum from the Pacific and Atlantic Andean watersheds (Supplementary Data 5). Phylogenetic analysis of these taxa indicates that North American C. berlandieri is the basal member of the species complex (Fig. 1c). Quinoa was thought to have been domesticated from C. hircinum in a single event, from which coastal quinoa was later derived (Fig. 1d, arrow 1); however, our sequencing data place a C. hircinum sample basal to coastal ecotypes (Fig. 1c), suggesting the possibility that quinoa was domesticated independently in highland and coastal environments (Fig. 1d, arrows 2a and 2b, respectively). Future analyses with deeper sampling of quinoa and C. hircinum will help clarify the relationship between C. hircinum and quinoa, as well as provide germplasm for breeding broadly adapted coastal quinoa cultivars for warm-season production. The SNPs identified between these accessions and the reference quinoa genome—a total of 7,809,381 (Extended Data Fig. 3, Supplementary Table 5), including 2,668,694 that are specific to quinoa—will be useful in assessing genetic diversity and identifying genomic regions associated with desirable traits.

Analysis of sub-genome structure

By mapping sequencing reads from C. pallidicaule and C. suecicum onto the quinoa scaffold assembly, and by performing BLASTN searches of each diploid against the quinoa assembly, 156 and 410 quinoa scaffolds (totalling 202.6 and 646.3 Mb) were assigned to the A and B sub-genomes, respectively (Fig. 2a, Supplementary Data 6). A mini-satellite repeat (18-24J) previously shown to be more abundant in the B sub-genome²⁶ is over-represented in scaffolds assigned to the B sub-genome (Supplementary Data 6). Nine chromosomes were assigned to each sub-genome (chromosomes hereafter designated as CqA or CqB, followed by the chromosome number), with the B sub-genome accounting for a larger percentage of both the genetic (1,087 cM) and physical (660 Mb) sizes than the A sub-genome (946 cM, 524 Mb). This result was not unexpected, given the differences in the estimated genome sizes of C. suecicum (815 Mb) and C. pallidicaule (452 Mb) based on k-mer analyses.

**Figure 2: Identification and characterization of quinoa sub-genomes.**

Visualization of the chromosomal locations of 5,807 homoeologous gene pairs revealed a high degree of synteny between the A and B sub-genomes (Fig. 2b). A small number of homoeologous pairs (3.1%) mapped within the same sub-genome, suggesting that recombination and chromosomal rearrangements have occurred between the A and B sub-genomes. For example, we identified homoeologous A and B sub-genome regions located in the B sub-genome chromosomes CqB05 and CqB03. The genes in the region of ~70–72 Mb of CqB03 are phylogenetically more similar to the A-genome diploid C. suecicum and therefore probably originated from the A sub-genome chromosome CqA12 (Fig. 2c). We also found evidence of large chromosomal rearrangements within sub-genomes, complicating the assignment of homoeologous chromosome pairs. For example, orthologue analysis clearly identifies CqB05 and CqA12 as homoeologous, although the same analysis is much more complicated with other chromosomes (Fig. 2b). To clarify these relationships, we identified syntenic regions between chromosomes of the diploid Beta vulgaris²⁷ (n = 9) and the A and B sub-genome chromosomes of quinoa (Fig. 2d). These results indicate that CqA02 and CqA04 are orthologous to B. vulgaris chromosomes 8 and 2 (Bvchr8 and Bvchr2), respectively, whereas CqB01 appears to be the result of a chromosome fusion. Likewise, CqA07 appears to be the result of a fusion between ancestral chromosomes orthologous to Bvchr3 and Bvchr7.

Analysis of sub-genome content

We used OrthoMCL²⁸ to identify clusters of orthologous genes in related species in Amaranthaceae (Extended Data Fig. 4), and specifically investigated the retention and loss of orthologous genes in quinoa and the three diploid species C. pallidicaule, C. suecicum and B. vulgaris (Fig. 3a, Supplementary Table 6). Using sets of genes for which only one copy could be found in quinoa and in each of the diploid species, we found a similar number (1,031 and 849) of genes lost from the A and B sub-genomes, respectively (Fig. 3b). The previously discussed set of 5,807 homoeologous gene pairs in quinoa are present as single copies in each of the diploids (Figs 2b, 3b) and therefore represent a core set of single-copy genes retained in each genome and sub-genome. We also investigated genes retained in multiple copies. We found that quinoa, like C. rubrum²⁹ and B. vulgaris³⁰, contains two genes that are orthologous to the A. thaliana gene FT, which regulates flowering time. Quinoa contains two homoeologous copies of FT2 and three of FT1, owing to a local tandem duplication (Fig. 3c, Extended Data Fig. 5). FT is known to promote flowering in A. thaliana, and functional orthologues have been found in other species³¹; however, B. vulgaris was found to contain a second FT gene that acts antagonistically by repressing flowering before vernalization. Future functional studies will help determine the function of these duplicated genes in quinoa, which, unlike B. vulgaris, does not require vernalization.

**Figure 3: Sub-genome gene loss and retention.**

Mechanisms underlying saponin production

Quinoa seeds contain a mixture of triterpene glycosides called saponins³². Although saponins may be beneficial for plant growth (for example, by deterring herbivory^33,34), they must be removed before human consumption as they are haemolytic and produce a bitter flavour. Because this process is costly, is often water-intensive, and can reduce the nutritional value of the seeds³⁵, the development of saponin-free lines is a major breeding objective in quinoa⁸. We found that saponins accumulate in the seed pericarp (Fig. 4a, Extended Data Fig. 6) between 20 and 24 days after anthesis (Fig. 4b), eventually accounting for 4% (w/w) of the mature seed mass (Supplementary Information 8.1). We identified and annotated 43 different saponins in the seeds of the reference sample (Supplementary Table 9), adding to the almost 100 different saponins that have been previously identified in different samples of quinoa^32,36.

**Figure 4: Candidate gene underlying saponin production.**

Naturally occurring sweet quinoa strains that contain very low levels of saponins are present within the quinoa germplasm³⁷, although the underlying genes regulating the absence of saponins in these lines are unknown. To identify these genes, we performed linkage mapping and bulk segregant analysis (BSA) using two populations segregating for the presence of saponins in the seeds: Kurmi (sweet) × 0654 (bitter), and Atlas (sweet) × Carina Red (bitter). Consistent with reports from other populations³⁸, segregation ratios in these populations indicate that the presence of seed saponins is controlled by a single gene, with the presence of saponins being dominant (71 bitter and 21 sweet in Kurmi × 0654; 567 bitter and 175 sweet in Atlas × Carina Red). We note that quantitative and qualitative differences exist in the saponins identified in bitter lines (Extended Data Fig. 7, Supplementary Table 8) and that the presence and absence of saponins was correlated with differences in seed coat thickness, with bitter lines having significantly thicker seed coats than sweet lines (Extended Data Fig. 8).

Linkage mapping and BSA in each population identified the same region on CqB16 that distinguishes the bitter and sweet lines (Fig. 4c). Frequencies of the sweet allele in both populations reached 100% for markers located in CqB16 on scaffold 3489. We investigated the genes in a 700-kb window surrounding this region of 100% sweet allele frequency. Of the 54 annotated genes in this region (Supplementary Data 7), two are similar to genes previously shown to play a role in saponin biosynthesis. Specifically, AUR62017204 and AUR62017206 are neighbouring genes annotated as basic helix–loop–helix (bHLH) transcription factors sharing homology (Extended Data Fig. 9a) with the class IVa bHLH genes that are known to regulate triterpenoid biosynthesis in Medicago trancatula³⁹. In M. truncatula, overexpression of the triterpene saponin biosynthesis activating regulator 1 (TSAR1) and TSAR2 bHLH transcription factors was recently shown to increase the expression of genes in the triterpenoid biosynthetic pathway, resulting in increased accumulation of triterpene saponins³⁹. TSAR1 and TSAR2 were also found to bind to the DNA motif 5′-CACGHG-3′ (where H can be A, C, or T)³⁹. In quinoa, we found that AUR62017206 (hereafter TSAR-like 2, TSARL2) was expressed in root tissue but not in flowers or immature seeds, whereas AUR62017204 (hereafter TSARL1) was almost exclusively expressed in seeds, with significantly lower expression levels in sweet lines (Supplementary Data 8). We identified the DNA motif bound by M. truncatula TSAR1 and TSAR2 within 2 kb upstream of the start codon in several saponin biosynthetic pathway genes in quinoa (Fig. 4d). Expression levels of these genes and several other genes in the saponin biosynthetic pathway were significantly downregulated in sweet lines (Fig. 4d, Supplementary Data 8). Together, these results suggest that TSARL1 might be a functional TSAR orthologue, although whether this is due to shared ancestry or convergent evolution is unclear.

The TSARL1 transcript was alternatively spliced in the sweet progeny of Kurmi and 0654. A SNP in the last position of exon 3 (G2078C) co-segregates with the presence of saponins in the Kurmi × 0654 progeny. The G2078C SNP alters the canonical intron/exon splice boundary (Fig. 4e), probably leading to the alternative splicing at an upstream cryptic splice site in the sweet lines (Fig. 4e). This alternative splicing of TSARL1 results in a premature stop codon (Extended Data Fig. 9b) and a truncated protein that modelling predicts to be compromised in its ability to form homodimers and/or to bind DNA (Extended Data Fig. 9b–d, Supplementary Information 8.8), which are both necessary for regulation of transcription. All bitter strains in our re-sequencing pool share the same allele (G) found in the bitter progeny of Kurmi and 0654, whereas all sweet strains (Chucapaca, G205-95DK, Salcedo INIA, as well as the mapping population parents Kurmi and Atlas) but one (Pasankalla) contain the same allele (G2078C) as the sweet progeny. Notably, however, although the G2078C allele is present in the sequenced Atlas line, none of the sweet progeny in the Atlas × Carina Red population were found to have the G2078C allele. Additional sequencing of individual plants of the Atlas variety revealed a low level of heterogeneity within the variety for the TSARL1 gene, with some plants containing the G2078C allele and others containing sequence insertions (Supplementary Information 7.2.4). Thus, it is likely that the Atlas plant used in the cross with Carina Red—which, importantly, was not the same plant used for sequencing—possessed a mutation other than the G2078C allele. Indeed, we found strong evidence of insertions in and around the TSARL1 gene in all the sweet progeny of the Atlas × Carina Red population (Fig. 4e, Extended Data Fig. 10). In particular, two exonic insertions in TSARL1 in the sweet progeny probably inactivate the gene and result in a sweet phenotype. The identification of multiple, independent mutations in TSARL1 that co-segregate with the sweet phenotype strongly suggests that this gene regulates the presence and absence of saponins in quinoa seeds. The early steps in the saponin biosynthetic pathway that are presumably regulated by TSARL1 are shared by other pathways, including sterol biosynthesis. Inactivation of these other pathways could be detrimental to normal plant growth, although we see no noticeable phenotypic differences between bitter and sweet siblings. It is possible that precursors needed for sterol biosynthesis are provided by the methylerythritol 4-phosphate (MEP) pathway in the plastid^40,41. Future functional studies will help clarify these issues.

Conclusions

As an emerging international crop, quinoa has great potential to enhance global food security. The high-quality reference genome assembly presented here will accelerate improvements of quinoa. Major breeding objectives for quinoa improvement include the development of shorter plants with fewer branches and more compact seed heads, increased heat and biotic stress tolerance, and the introgression of the sweet phenotype into commercial varieties. The identification of the likely causative mutation underlying the sweet phenotype not only provides insights into triterpenoid saponin biosynthesis, but also enables accelerated breeding of sweet commercial varieties using marker-assisted selection. The diversity present in the primary gene pool of quinoa, which we have begun to characterize, will also help direct future breeding strategies. The resources presented here also help to make quinoa a useful model for studying polyploid genome evolution and mechanisms of abiotic stress tolerance, in particular salinity tolerance.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Quinoa sequencing and assembly

We sequenced Chenopodium quinoa Willd. (quinoa) accession PI 614886 (BioSample accession code SAMN04338310; also known as NSL 106399 and QQ74). DNA was extracted from leaf and flower tissue of a single plant, as described in the “Preparing Arabidopsis Genomic DNA for Size-Selected ~20 kb SMRTbell Libraries” protocol (http://www.pacb.com/wp-content/uploads/2015/09/Shared-Protocol-Preparing-Arabidopsis-DNA-for-20-kb-SMRTbell-Libraries.pdf). DNA was purified twice with Beckman Coulter Genomics AMPure XP magnetic beads and assessed by standard agarose gel electrophoresis and Thermo Fisher Scientific Qubit Fluorometry. 100 Single-Molecule Real-Time (SMRT) cells were run on the PacBio RS II system with the P6-C4 chemistry by DNALink (Seoul). De novo assembly was conducted using the smrtmake assembly pipeline (https://github.com/PacificBiosciences/smrtmake) and the Celera Assembler, and the draft assembly was polished using the quiver algorithm.

DNA was also sequenced using an Illumina HiSeq 2000 machine. For this, DNA was extracted from leaf tissue of a single soil-grown plant using the Qiagen DNeasy Plant Mini Kit. 500-bp paired-end (PE) libraries were prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina. Sequencing reads were processed with Trimmomatic (v0.33)⁴², and reads <75 nucleotides in length after trimming were removed from further analysis. The remaining high-quality reads were assembled with Velvet (v1.2.10)⁴³ using a k-mer of 75.

Integrating BioNano optical maps with the PacBio assembly

High-molecular-weight DNA was isolated and labelled from leaf tissue of three-week old quinoa plants according to standard BioNano protocols, using the single-stranded nicking endonuclease Nt.BspQI. Labelled DNA was imaged automatically using the BioNano Irys system and de novo assembled into consensus physical maps using the BioNano IrysView analysis software. The final de novo assembly used only single molecules with a minimum length of 150 kb and eight labels per molecule. PacBio-BioNano hybrid scaffolds were identified using IrysView’s hybrid scaffold alignment subprogram.

Chicago library preparation and sequencing

Using the same DNA prepared for PacBio sequencing, a Chicago library was prepared as described previously¹⁰. The library was sequenced on an Illumina HiSeq 2500.

Scaffolding the PacBio and BioNano assemblies with HiRise

Chicago sequence data (in FASTQ format) was used to scaffold the PacBio-BioNano hybrid assembly using HiRise, a software pipeline designed specifically for using Chicago data to assemble genomes¹⁰. Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analysed by HiRise to produce a likelihood model, and the resulting likelihood model was used to identify putative mis-joins and score prospective joins.

Kurmi × 0654 population mapping and genetic marker analysis

A population was developed by crossing Kurmi (green, sweet) and 0654 (red, bitter). Homozygous high- and low-saponin F₂ lines were identified by planting 12 F₃ seeds derived from each F₂ line, harvesting F₄ seed from these F₃ plants, and then performing foam tests on the F₄ seed. Phenotyping was validated using gas chromatography/mass spectrometry (GC/MS). RNA was extracted from inflorescences containing a mixture of flowers and seeds at various stages of development from the parents and 45 individual F₃ progeny. RNA extraction and Illumina sequencing were performed as described above. Sequencing reads from all lines were trimmed using Trimmomatic and mapped to the reference assembly using TopHat⁴⁴, and SNPs were called using SAMtools mpileup (v1.1)⁴⁵.

For linkage mapping, markers were assigned to linkage groups on the basis of the grouping by JoinMap v4.1. Using the maximum likelihood algorithm of JoinMap, the order of the markers was determined; using this as start order and fixed order, regression mapping in JoinMap was used to determine the cM distances.

Genes differentially expressed between bitter and sweet lines and between green and red lines were identified using default parameters of the Cuffdiff function of the Cufflinks program⁴⁶.

Atlas × Carina Red population mapping and genetic marker analysis

A second mapping population was developed by crossing Atlas (sweet) and Carina Red (bitter). Bitter and sweet F₂ lines were identified by performing foam and taste tests on the F₃ seed. DNA sequencing was performed with DNA from the parents and 94 sweet F₂ lines, as described above, and sequencing reads were mapped to the reference assembly using BWA. SNPs were called in the parents and in a merged file containing all combined F₂ lines.

Genotype calls were generated for the 94 F₂ genotypes by summing up read counts over a sliding window of 500 variants, at all variant positions for which the parents were homozygous and polymorphic. Over each 500-variant stretch, all reads with Atlas alleles were summed, and all reads with the Carina Red allele were summed. Markers were assigned to linkage groups using JoinMap, with regression mapping used to obtain the genetic maps per linkage group.

Integrated linkage map

The Kurmi × 0654 and Atlas × Carina Red maps were integrated with the previously published quinoa linkage map¹³, with the Kurmi × 0654 map being used as the reference for the positions of anchor markers and scaling. We selected markers from the same scaffold that were in the same 10,000-bp bin in the assembly. The anchor markers on the alternative map received the position of the Kurmi × 0654 map anchor marker in the integrated map. This process was repeated with anchor markers at the 100,000-bp bin level. The assumption is that at the 100,000-bp bin level recombination should essentially be zero. On this level, a regression of cM position on both maps yielded R² values >0.85 and often >0.9, so the regression line can easily be used for interpolating the positions of the alternative map towards the corresponding position on the Kurmi × 0654 map. All Kurmi × 0654 markers went into the integrated map on their original position.

Chromosome pseudomolecules

Pseudomolecules were assembled by concatenating scaffolds based on their order and orientation as determined from the integrated linkage map. An AGP (‘A Golden Path’) file was made that describes the positions of the scaffold-based assembly in coordinates of the pseudomolecule assembly, with 100 ‘N’s inserted between consecutive scaffolds. Based on these coordinates, custom scripts were used to generate the pseudomolecule assembly and to recoordinate the annotation file.

Sequencing and assembly of C. pallidicaule and C. suecicum

DNA was extracted from C. pallidicaule (PI 478407) and C. suecicum (BYU 1480) and was sent to the Beijing Genomic Institute (BGI, Hong Kong) where one 180-bp PE library and two mate-pair libraries with insert sizes of 3 and 6 kb were prepared and sequenced on the Illumina HiSeq platform to obtain 2 × 100-bp reads for each library. The generated reads were trimmed using the quality-based trimming tool Sickle (https://github.com/najoshi/sickle). The trimmed reads were then assembled using the ALLPATHS-LG assembler⁴⁷, and GapCloser v1.12⁴⁸ was used to resolve N spacers and gap lengths produced by the ALLPATHS-LG assembler.

Genome annotation

Repeat families found in the genome assemblies of quinoa, C. pallidicaule and C. suecicum (see Supplementary Information 3) were first independently identified de novo and classified using the software package RepeatModeler⁴⁹. RepeatMasker⁵⁰ was used to discover and identify repeats within the respective genomes.

AUGUSTUS⁵¹ was used for ab initio gene prediction, using model training based on coding sequences from Amaranthus hypochondriacus, Beta vulgaris, Spinacia oleracea and Arabidopsis thaliana. RNA-seq and isoform sequencing reads generated from RNA of different tissues were mapped onto the reference genome using Bowtie 2 (ref. 52) and GMAP⁵³, respectively. Hints with locations of potential intron–exon boundaries were generated from the alignment files with the software package BAM2hints in the MAKER package⁵⁴. MAKER with AUGUSTUS (intron–exon boundary hints provided from RNA-seq and isoform sequencing) was then used to predict genes in the repeat-masked reference genome. To help guide the prediction process, peptide sequences from B. vulgaris and the original quinoa full-length transcript (provided as EST evidence) were used by MAKER during the prediction. Genes were characterized for their putative function by performing a BLAST search of the peptide sequences against the UniProt database. PFAM domains and InterProScan ID were added to the gene models using the scripts provided in the MAKER package.

Re-sequencing

The following quinoa accessions were chosen for DNA re-sequencing: 0654, Ollague, Real, Pasankalla (BYU 1202), Kurmi, CICA-17, Regalona (BYU 947), Salcedo INIA, G-205-95DK, Cherry Vanilla (BYU 1439), Chucapaca, Ku-2, PI 634921 (Ames 22157), Atlas and Carina Red. The following accessions of C. berlandieri were sequenced: var. boscianum (BYU 937), var. macrocalycium (BYU 803), var. zschackei (BYU 1314), var. sinuatum (BYU 14108), and subsp. nuttaliae (‘Huauzontle’). Two accessions of C. hircinum (BYU 566 and BYU 1101) were also sequenced. All sequencing was performed with an Illumina HiSeq 2000 machine, using either 125-bp (Atlas and Carina Red) or 100-bp (all other accessions) paired-end libraries. Reads were trimmed using Trimmomatic and mapped to the reference assembly using BWA (v0.7.10)⁵⁵. Read alignments were manipulated with SAMtools, and the mpileup function of SAMtools was used to call SNPs.

Identification of orthologous genes

Orthologous and paralogous gene clusters were identified using OrthoMCL²⁸. Recommended settings were used for all-against-all BLASTP comparisons (Blast+ v2.3.0⁵⁶) and OrthoMCL analyses. Custom Perl scripts were used to process OrthoMCL outputs for visualization with InteractiVenn⁵⁷.

Phylogenetic inference

Using OrthoMCL, orthologous gene sets containing two copies in quinoa and one copy each in C. pallidicaule, C. suecicum, and B. vulgaris were identified. In total, 7,433 gene sets were chosen, and their amino acid sequences were aligned individually for each set using MAFFT⁵⁸. The 7,433 alignments were converted into PHYLIP format files by the seqret command in the EMBOSS package⁵⁹. Individual gene trees were then constructed using the maximum likelihood method using proml in PHYLIP⁶⁰.

In addition, the genomic variants of all 25 sequenced taxa (Supplementary Data 5) relative to the reference sequence were called based on the mapped Illumina reads in 25 BAM files using SAMtools. To call variants in the reference genome (PI 614886), Illumina sequencing reads were mapped to the reference assembly. Variants were then filtered using VCFtools⁶¹ and SAMtools, and the qualified SNPs were combined into a single VCF file which was used as an input into SNPhylo⁶² to construct the phylogenetic relationship using maximum likelihood and 1,000 bootstrap iterations.

To identify FT homologues, the protein sequence from the A. thaliana flowering time gene FT was used as a BLAST query. Filtering for hits with an E value <1 × e⁻³ and with RNA-seq evidence resulted in the identification of four quinoa proteins. One quinoa protein (AUR62013052) appeared to be comprised of two tandem repeats which were separated for the purposes of phylogenetic analysis. For the construction of the phylogenetic tree, protein sequences from these five quinoa FT homologues were aligned using Clustal Omega⁶³ along with two B. vulgaris (gene models: BvFT1-miuf.t1, BvFT2-eewx.t1) and one A. thaliana (AT1G65480.1) homologue. Phylogenetic analysis was performed with MEGA⁶⁴ (v6.06). The JTT model was selected as the best fitting model. The initial phylogenetic tree was estimated using the neighbour joining method (bootstrap value = 50, Gaps/ Missing Data Treatment = Partial Deletion, Cutoff 95%), and the final tree was estimated using the maximum likelihood method with a bootstrap value of 1,000 replicates. The syntenic relationships between the coding sequences of the chromosomal regions surrounding these FT genes were visualized using the CoGE⁶⁵ GEvo tool and the Multi-Genome Synteny Viewer⁶⁶.

The alignment of bHLH domains was performed with Clustal Omega⁶³, using sequences from Mertens et al.³⁹. The phylogeny was inferred using the maximum likelihood method based on the JTT matrix-based model⁶⁷. Initial trees for the heuristic search were obtained automatically by applying Neighbour-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. All positions containing gaps and missing data were eliminated.

Distinguishing and analysing the quinoa sub-genomes

Trimmed PE Illumina sequencing reads that were used for the de novo assembly of C. suecicum and C. pallidicaule were mapped onto the reference quinoa genome using the default settings of BWA. For every base in the quinoa genome, the depth coverage of properly paired reads from the C. suecicum and C. pallidicaule mapping was calculated using the program GenomeCoverage in the BEDtools package⁶⁸. A custom Perl script was used to calculate the percentage of each scaffold with more than 5× coverage from both diploids. Scaffolds were assigned to the A or B sub-genome if >65% of the bases were covered by reads from one diploid and <25% of the bases were covered by reads from the other diploid. The relationship between the quinoa sub-genomes and the diploid species C. pallidicaule and C. suecicum was presented in a circle proportional to their sizes using Circos⁶⁹. Orthologous regions in the three species were identified using BLASTN searches of the quinoa genome against each diploid genome individually. Single top BLASTN hits longer than 8 kb were selected and presented as links between the quinoa genome assembly (arranged in chromosomes, see Supplementary Information 7.3) and the two diploid genome assemblies on the Circos plot (Fig. 2a).

Sub-genome synteny was analysed by plotting the positions of homoeologous pairs of A- and B-sub-genome pairs within the context of the 18 chromosomes using Circos. Synteny between the sub-genomes and B. vulgaris was assessed by first creating pseudomolecules by concatenating scaffolds which were known to be ordered and oriented within each of the nine chromosomes. Syntenic regions between these B. vulgaris chromosomes and those of quinoa were then identified using the recommended settings of the CoGe SynMap tool⁷⁰ and visualized using MCScanX⁷¹ and VGSC⁷². For the purposes of visualization, quinoa chromosomes CqB05, CqA08, CqB11, CqA15 and CqB16 were inverted.

Saponin analyses

Quinoa seeds were embedded in a 2% carboxymethylcellulose solution and frozen above liquid nitrogen. Sections of 50 μm thickness were obtained using a Reichert-Jung Frigocut 2800N, modified to use a Feather C35 blade holder and blades at −20 °C using a modified Kawamoto method⁷³. A 2,5-dihydroxybenzoic acid (Sigma-Aldrich) matrix (40 mg ml⁻¹ in 70% methanol) was applied using a HTX TM-Sprayer (HTX Technologies LLC) with attached LC20-AD HPLC pump (Shimadzu Scientific Instruments). Sections were vacuum dried in a desiccator before analysis. The optical image was generated using an Epson 4400 Flatbed Scanner at 4,800 d.p.i. For mass spectrometric analyses, a Bruker SolariX XR with 7T magnet was used. Images were generated using Bruker Compass FlexImaging 4.1. Data were normalized to the TIC, and brightness optimization was employed to enhance visualization of the distribution of selected compounds. Individual spectra were recalibrated using Bruker Compass DataAnalysis 4.4 to internally lock masses of known DHB clusters: C₁₄H₉O₆ = 273.039364 and C₂₁H₁₃O₉ = 409.055408 m/z. Accurate mass measurements for individual saponins and identified compounds were run using continuous accumulation of selected ions (CASI) using mass windows of 50–100 m/z and a transient of 4 megaword generating a transient of 2.93 s providing a mass resolving power of approximately 390,000 at 400 m/z. Lipids were putatively assigned by searching the LipidMaps database⁷⁴ (http://www.lipidmaps.org) and lipid class confirmed by collision-induced dissociation using a 10 m/z window centred around the monoisotopic peak with collision energy of between 15–20 V.

Quinoa flowers were marked at anthesis, and seeds were sampled at 12, 16, 20 and 24 days after anthesis. A pool of five seeds from each time point was analysed using GC/MS.

Quantification of saponins was performed indirectly by quantifying oleanolic acid (OA) derived from the hydrolysis of saponins extracted from quinoa seeds. Derivatized solution was analysed using single quadrupole GC/MS system (Agilent 7890 GC/5975C MSD) equipped with EI source at ionisation energy of 70 eV. Chromatography separation was performed using DB-5MS fused silica capillary column (30m × 0.25 mm I.D., 0.25 μm film thickness; Agilent J&W Scientific), chemically bonded with 5% phenyl 95% methylpolysiloxane cross-linked stationary phase. Helium was used as the carrier gas with constant flow rate of 1.0 ml min⁻¹. The quantification of OA in each sample was performed using a standard curve based on standards of OA.

Specific, individual saponins were identified in quinoa using a preparation of 20 mg of seeds performed according a modified protocol from Giavalisco et al.⁷⁵. Samples were measured with a Waters ACQUITY Reversed Phase Ultra Performance Liquid Chromatography (RP-UPLC) coupled to a Thermo-Fisher Exactive mass spectrometer, which consists of an electrospray ionisation source and an Orbitrap mass analyser. A C18 column was used for the hydrophilic measurements. Chromatograms were recorded in full-scan MS mode (mass range, 100 −1,500). Extraction of the LC/MS data was accomplished with the software REFINER MS 7.5 (GeneData).

SwissModel⁷⁶ was used to produce homology models for the bHLH region of AUR62017204, AUR62017206 and AUR62010677. RaptorX⁷⁷ was used for prediction of secondary structure and disorder. QUARK⁷⁸ was used for ab initio modelling of the C-terminal domain, and the DALI server⁷⁹ was used for 3D homology searches of this region. Models were manually inspected and evaluated using the PyMOL program (http://pymol.org).

Data availability statement

The genome assemblies and sequence data for C. quinoa, C. pallidicaule and C. suecicum were deposited at NCBI under BioProject codes PRJNA306026, PRJNA326220 and PRJNA326219, respectively. Additional accessions numbers for deposited data can be found in Supplementary Data 9. The quinoa genome can also be accessed at http://www.cbrc.kaust.edu.sa/chenopodiumdb/ and on the Phytozome database (http://www.phytozome.net/).

Change history

09 February 2017
The author affiliations were corrected.
13 February 2017
The Creative Commons license information was updated in the HTML.

References

Risi, C. & Galwey, N. W. The Chenopodium grains of the Andes: Inca crops for modern agriculture. Adv. Appl. Biol . 10, 145–216 (1984)
Google Scholar
Adolf, V. I., Shabala, S., Andersen, M. N., Razzaghi, F. & Jacobsen, S.-E. Varietal differences of quinoa’s tolerance to saline conditions. Plant Soil 357, 117–129 (2012)
Article CAS Google Scholar
Hariadi, Y., Marandon, K., Tian, Y., Jacobsen, S.-E. & Shabala, S. Ionic and osmotic relations in quinoa (Chenopodium quinoa Willd.) plants grown at various salinity levels. J. Exp. Bot. 62, 185–193 (2011)
Article CAS Google Scholar
Jacobsen, S.-E., Mujica, A. & Jensen, C. R. The resistance of quinoa (Chenopodium quinoa Willd.) to adverse abiotic factors. Food Rev. Int . 19, 99–109 (2003)
Article Google Scholar
Gordillo-Bastidas, E., Díaz-Rizzolo, D. A., Roura, E., Massanés, T. & Gomis, R. Quinoa (Chenopodium quinoa Willd.), from nutritional value to potential health benefits: an integrative review. J. Nutr. Food Sci. 6, 497 (2016)
Google Scholar
Vega-Gálvez, A. et al. Nutrition facts and functional potential of quinoa (Chenopodium quinoa Willd.), an ancient Andean grain: a review. J. Sci. Food Agric. 90, 2541–2547 (2010)
Article Google Scholar
Massawe, F., Mayes, S. & Cheng, A. Crop diversity: an unexploited treasure trove for food security. Trends Plant Sci . 21, 365–368 (2016)
Article CAS Google Scholar
Zurita-Silva, A., Fuentes, F., Zamora, P., Jacobsen, S.-E. & Schwember, A. R. Breeding quinoa (Chenopodium quinoa Willd.): potential and perspectives. Mol. Breed . 34, 13–30 (2014)
Article Google Scholar
Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol . 30, 771–776 (2012)
Article CAS Google Scholar
Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res . 26, 342–350 (2016)
Article CAS Google Scholar
Palomino, G., Hernández, L. T. & Torres, E. C. Nuclear genome size and chromosome analysis in Chenopodium quinoa and C. berlandieri subsp. nuttalliae. Euphytica 164, 221–230 (2008)
Article CAS Google Scholar
Kolano, B., Siwinska, D., Pando, L. G., Szymanowska-Pulka, J. & Maluszynska, J. Genome size variation in Chenopodium quinoa (Chenopodiaceae). Plant Syst. Evol . 298, 251–255 (2012)
Article CAS Google Scholar
Maughan, P. J. et al. Single nucleotide polymorphism identification, characterization, and linkage mapping in quinoa. Plant Genome 5, 114–125 (2012)
Article CAS Google Scholar
Yasui, Y. et al. Draft genome sequence of an inbred line of Chenopodium quinoa, an allotetraploid crop with great environmental adaptability and outstanding nutritional properties. DNA Res . 23, 535–546 (2016)
Article CAS Google Scholar
Li, F. et al. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat. Biotechnol . 33, 524–530 (2015)
Article Google Scholar
Eilbeck, K., Moore, B., Holt, C. & Yandell, M. Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67 (2009)
Article Google Scholar
Yandell, M. & Ence, D. A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet . 13, 329–342 (2012)
Article CAS Google Scholar
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
Article Google Scholar
Kolano, B. et al. Molecular and cytogenetic evidence for an allotetraploid origin of Chenopodium quinoa and C. berlandieri (Amaranthaceae). Mol. Phylogenet. Evol . 100, 109–123 (2016)
Article CAS Google Scholar
Brown, D. C., Cepeda-Cornejo, V., Maughan, P. J. & Jellen, E. N. Characterization of the granule-bound starch synthase I gene in Chenopodium. Plant Genome 8, 1 (2014)
Google Scholar
Štorchová, H., Drabešová, J., Cháb, D., Kolár˘, J. & Jellen, E. N. The introns in flowering locus T-like (FTL) genes are useful markers for tracking paternity in tetraploid Chenopodium quinoa Willd. Genet. Resour. Crop Evol . 62, 913–925 (2015)
Article Google Scholar
Walsh, B. M., Adhikary, D., Maughan, P. J., Emshwiller, E. & Jellen, E. N. Chenopodium polyploidy inferences from salt overly sensitive 1 (SOS1) data. Am. J. Bot . 102, 533–543 (2015)
Article Google Scholar
Koch, M. A., Haubold, B. & Mitchell-Olds, T. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol . 17, 1483–1498 (2000)
Article CAS Google Scholar
Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000)
Article CAS ADS Google Scholar
Wilson, H. D. Quinua biosystematics II: free-living populations. Econ. Bot . 42, 478–494 (1988)
Article MathSciNet Google Scholar
Kolano, B. et al. Chromosomal localization of two novel repetitive sequences isolated from the Chenopodium quinoa Willd. genome. Genome 54, 710–717 (2011)
Article CAS Google Scholar
Dohm, J. C. et al. The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505, 546–549 (2014)
Article CAS ADS Google Scholar
Li, L., Stoeckert, C. J. Jr & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res . 13, 2178–2189 (2003)
Article CAS Google Scholar
Cháb, D., Kolár, J., Olson, M. S. & Storchová, H. Two flowering locus T (FT) homologs in Chenopodium rubrum differ in expression patterns. Planta 228, 929–940 (2008)
Article Google Scholar
Pin, P. A. et al. An antagonistic pair of FT homologs mediates the control of flowering time in sugar beet. Science 330, 1397–1400 (2010)
Article CAS ADS Google Scholar
Pin, P. A. & Nilsson, O. The multifaceted roles of FLOWERING LOCUS T in plant development. Plant Cell Environ . 35, 1742–1755 (2012)
Article CAS Google Scholar
Kuljanabhagavad, T., Thongphasuk, P., Chamulitrat, W. & Wink, M. Triterpene saponins from Chenopodium quinoa Willd. Phytochemistry 69, 1919–1926 (2008)
Article CAS Google Scholar
de Geyter, E., Lambert, E., Geelen, D. & Smagghe, G. Novel advances with plant saponins as natural insecticides to control pest insects. Pers. Technol . 1, 96–105 (2007)
Google Scholar
Kuljanabhagavad, T. & Wink, M. Biological activities and chemistry of saponins from Chenopodium quinoa Willd. Phytochem. Rev . 8, 473–490 (2009)
Article CAS Google Scholar
Konishi, Y., Hirano, S., Tsuboi, H. & Wada, M. Distribution of minerals in quinoa (Chenopodium quinoa Willd.) seeds. Biosci. Biotechnol. Biochem . 68, 231–234 (2004)
Article CAS Google Scholar
Madl, T., Sterk, H., Mittelbach, M. & Rechberger, G. N. Tandem mass spectrometric analysis of a complex triterpene saponin mixture of Chenopodium quinoa. J. Am. Soc. Mass Spectrom. 17, 795–806 (2006)
Article CAS Google Scholar
Mastebroek, D. H., Limburg, H., Gilles, T. & Marvin, H. J. P. Occurrence of sapogenins in leaves and seeds of quinoa (Chenopodium quinoa Willd). J. Sci. Food Agric. 80, 152–156 (2000)
Article CAS Google Scholar
Ward, S. M. A recessive allele inhibiting saponin synthesis in two lines of Bolivian quinoa (Chenopodium quinoa Willd.). J. Hered . 92, 83–86 (2001)
Article CAS Google Scholar
Mertens, J. et al. The bHLH transcription factors TSAR1 and TSAR2 regulate triterpene saponin biosynthesis in Medicago truncatula. Plant Physiol . 170, 194–210 (2016)
Article CAS Google Scholar
Laule, O. et al. Crosstalk between cytosolic and plastidial pathways of isoprenoid biosynthesis in Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 100, 6866–6871 (2003)
Article CAS ADS Google Scholar
Rodríguez-Concepción, M. & Boronat, A. Breaking new ground in the regulation of the early steps of plant isoprenoid biosynthesis. Curr. Opin. Plant Biol . 25, 17–22 (2015)
Article Google Scholar
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)
Article CAS Google Scholar
Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res . 18, 821–829 (2008)
Article CAS Google Scholar
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009)
Article CAS Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)
Article Google Scholar
Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol . 28, 511–515 (2010)
Article CAS Google Scholar
Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl Acad. Sci. USA 108, 1513–1518 (2011)
Article CAS ADS Google Scholar
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)
Article Google Scholar
Smit, A. F. A. & Hubley, R. RepeatModeler Open-1.0. http://www.repeatmasker.org (2008–2015)
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2013–2015)
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008)
Article CAS Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
Article CAS Google Scholar
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005)
Article CAS Google Scholar
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res . 18, 188–196 (2008)
Article CAS Google Scholar
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010)
Article Google Scholar
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)
Article Google Scholar
Heberle, H., Meirelles, G. V., da Silva, F. R., Telles, G. P. & Minghim, R. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics 16, 169 (2015)
Article Google Scholar
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol . 30, 772–780 (2013)
Article CAS Google Scholar
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet . 16, 276–277 (2000)
Article CAS Google Scholar
Felsenstein, J. PHYLIP (Phylogeny Inference Package) Version 3.6a3. http://evolution.genetics.washington.edu/phylip.html
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011)
Article CAS Google Scholar
Lee, T.-H., Guo, H., Wang, X., Kim, C. & Paterson, A. H. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics 15, 162 (2014)
Article Google Scholar
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol . 7, 539 (2011)
Article Google Scholar
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol . 30, 2725–2729 (2013)
Article CAS Google Scholar
Lyons, E. & Freeling, M. How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J . 53, 661–673 (2008)
Article CAS Google Scholar
Revanna, K. V. et al. A web-based multi-genome synteny viewer for customized data. BMC Bioinformatics 13, 190 (2012)
Article Google Scholar
Jones, D. T., Taylor, W. R. & Thornton, J. M. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci . 8, 275–282 (1992)
CAS PubMed Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
Article CAS Google Scholar
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res . 19, 1639–1645 (2009)
Article CAS Google Scholar
Lyons, E., Pedersen, B., Kane, J. & Freeling, M. The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Trop. Plant Biol . 1, 181–190 (2008)
Article CAS Google Scholar
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res . 40, e49 (2012)
Article CAS ADS Google Scholar
Xu, Y. et al. VGSC: a web-based vector graph toolkit of genome synteny and collinearity. BioMed Res. Int . 2016, 7823429 (2016)
PubMed PubMed Central Google Scholar
Kawamoto, T. Use of a new adhesive film for the preparation of multi-purpose fresh-frozen sections from hard tissues, whole-animals, insects and plants. Arch. Histol. Cytol . 66, 123–143 (2003)
Article Google Scholar
Sud, M. et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res . 35, D527–D532 (2007)
Article CAS Google Scholar
Giavalisco, P., Köhl, K., Hummel, J., Seiwert, B. & Willmitzer, L. ¹³C isotope-labeled metabolomes allowing for improved compound annotation and relative quantification in liquid chromatography-mass spectrometry-based metabolomic research. Anal. Chem . 81, 6546–6551 (2009)
Article CAS Google Scholar
Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195–201 (2006)
Article CAS Google Scholar
Källberg, M., Margaryan, G., Wang, S., Ma, J. & Xu, J. RaptorX server: a resource for template-based protein structure modeling. In Protein Structure Prediction (ed. Kihara, K. ) 17–27 (Springer, New York, 2014)
Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 80, 1715–1735 (2012)
Article CAS Google Scholar
Holm, L. & Rosenström, P. Dali server: conservation mapping in 3D. Nucleic Acids Res . 38, W545–W549 (2010)
Article CAS Google Scholar

Download references

Acknowledgements

Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST), by USDA/NIFA-REEIS grant #2012-51300-20100 (WSU/BYU), by NSF grant #1339412, and by a grant from the German Research Foundation, DFG (grant no. JU205/24-1). We thank T. Ramaraj from the National Center for Genomic Resources (Santa Fe) for his bioinformatics assistance, O. Mohammed Eid Alharbi at the KAUST Imaging and Characterization Laboratory for generating SEM images, K. Zemmouri at the KAUST Greenhouse for plant care, and H. Ho (Heno) Hwang and I. D. Gromicho at the KAUST Academic Writing Department for assistance with quinoa illustrations and photographs. We also thank H. Štorchová, F. Morton, D. Bertero, F. Fuentes and D. Brenner of USDA-ARS-NPGS for their assistance in providing seeds. Metabolite imaging was conducted at Metabolomics Australia (School of BioSciences, The University of Melbourne, Australia), a NCRIS initiative under Bioplatforms Australia Pty Ltd.

Author information

David E. Jarvis, Yung Shwen Ho, Damien J. Lightfoot, Sandra M. Schmöckel, Bo Li and Kevin Murphy: These authors contributed equally to this work.
Washington State University, Department of Crop and Soil Sciences, College of Agricultural, Human, and Natural Resource Sciences, Pullman, Washington 99164-6420, USA

Authors and Affiliations

Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
David E. Jarvis, Yung Shwen Ho, Damien J. Lightfoot, Sandra M. Schmöckel, Bo Li, Noha Saber, Yong H. Woo, Ge Gao, Xiujie Guo, Sónia Negrão, Salim Al-Babili, Christoph Gehring & Mark Tester
Wageningen University and Research, Wageningen UR Plant Breeding, Wageningen, The Netherlands
Theo J. A. Borm, C. Gerard van der Linden & Eibertus N. van Loo
Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
Hajime Ohyanagi, Afaque A. Momin, Stefan T. Arold & Takashi Gojobori
Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Thuwal, 23955-6900, Saudi Arabia
Katsuhiko Mineta
Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC), Thuwal, 23955-6900, Saudi Arabia
Craig T. Michell
King Abdullah University of Science and Technology (KAUST), Analytical Core Lab, Thuwal, 23955-6900, Saudi Arabia
Najeh M. Kharbatia
Department of Plant and Wildlife Sciences, Brigham Young University, College of Life Sciences, Provo, 84602, Utah, USA
Ryan R. Rupper, Aaron R. Sharp, Eric N. Jellen & Peter J. Maughan
Plant Breeding Institute, Christian-Albrechts-University of Kiel, Olshausenstr. 40, D-24118 Kiel, Germany
Nadine Dally & Christian Jung
Metabolomics Australia, The School of Biosciences, The University of Melbourne, Parkville, 3010, Victoria, Australia
Berin A. Boughton & Ute Roessner
PRI Bioscience, Plant Research International, Wageningen UR, Wageningen, The Netherlands
Elio G. W. M. Schijlen

Authors

David E. Jarvis
View author publications
You can also search for this author in PubMed Google Scholar
Yung Shwen Ho
View author publications
You can also search for this author in PubMed Google Scholar
Damien J. Lightfoot
View author publications
You can also search for this author in PubMed Google Scholar
Sandra M. Schmöckel
View author publications
You can also search for this author in PubMed Google Scholar
Bo Li
View author publications
You can also search for this author in PubMed Google Scholar
Theo J. A. Borm
View author publications
You can also search for this author in PubMed Google Scholar
Hajime Ohyanagi
View author publications
You can also search for this author in PubMed Google Scholar
Katsuhiko Mineta
View author publications
You can also search for this author in PubMed Google Scholar
Craig T. Michell
View author publications
You can also search for this author in PubMed Google Scholar
Noha Saber
View author publications
You can also search for this author in PubMed Google Scholar
Najeh M. Kharbatia
View author publications
You can also search for this author in PubMed Google Scholar
Ryan R. Rupper
View author publications
You can also search for this author in PubMed Google Scholar
Aaron R. Sharp
View author publications
You can also search for this author in PubMed Google Scholar
Nadine Dally
View author publications
You can also search for this author in PubMed Google Scholar
Berin A. Boughton
View author publications
You can also search for this author in PubMed Google Scholar
Yong H. Woo
View author publications
You can also search for this author in PubMed Google Scholar
Ge Gao
View author publications
You can also search for this author in PubMed Google Scholar
Elio G. W. M. Schijlen
View author publications
You can also search for this author in PubMed Google Scholar
Xiujie Guo
View author publications
You can also search for this author in PubMed Google Scholar
Afaque A. Momin
View author publications
You can also search for this author in PubMed Google Scholar
Sónia Negrão
View author publications
You can also search for this author in PubMed Google Scholar
Salim Al-Babili
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Gehring
View author publications
You can also search for this author in PubMed Google Scholar
Ute Roessner
View author publications
You can also search for this author in PubMed Google Scholar
Christian Jung
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Murphy
View author publications
You can also search for this author in PubMed Google Scholar
Stefan T. Arold
View author publications
You can also search for this author in PubMed Google Scholar
Takashi Gojobori
View author publications
You can also search for this author in PubMed Google Scholar
C. Gerard van der Linden
View author publications
You can also search for this author in PubMed Google Scholar
Eibertus N. van Loo
View author publications
You can also search for this author in PubMed Google Scholar
Eric N. Jellen
View author publications
You can also search for this author in PubMed Google Scholar
Peter J. Maughan
View author publications
You can also search for this author in PubMed Google Scholar
Mark Tester
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M.T. and D.E.J. conceived the project. M.T. supervised the research. M.T., D.E.J., Y.S.H., D.J.L., S.M.S., P.J.M., E.N.J., E.N.v.L., C.G.v.d.L. and T.G. conceived and designed the experiments and managed particular components of the project. Y.S.H. led the bioinformatics analyses. Y.S.H., D.J.L., D.E.J. and B.L. did most of the compilation of the genome scaffolds and the genomic analyses. Y.S.H. annotated the genome and undertook the analysis of repetitive elements. E.N.v.L. did the final genetic mapping. E.N.J. and P.J.M. provided germplasm, oversaw sequencing of the diploid genomes, oversaw the BioNano mapping, led the comparative genomics work and shared their knowledge of quinoa. S.M.S. oversaw all saponin-related analyses. C.T.M., A.R.S., T.B., E.S., K.M., E.N.J. and P.J.M. prepared materials and undertook sequencing activities. K.M., H.O. and S.N. oversaw all phylogenetic analyses. Y.H.W. and G.G. analysed the microRNAs. N.S., N.M.K., R.R.R., X.G. and S.A.-B. did saponin analyses. N.D. and C.J. analysed the genes related to flowering time. S.T.A. and M.A.M. did the computational structure–function analysis. B.A.B. and U.R. did the metabolomics imaging. All authors contributed to the writing of the paper. D.E.J., Y.S.H., D.J.L., S.M.S., B.L. and M.T. organised the manuscript. D.E.J. and M.T. coordinated the project. D.E.J., Y.S.H., D.J.L., S.M.S. and B.L. contributed equally.

Corresponding author

Correspondence to Mark Tester.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks S. Bak, A. Paterson and N. Stein for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Quinoa linkage map.

Linkage map generated by integrating maps from three independent quinoa populations. Black bands in each linkage group represent mapped markers. Letters next to each linkage group name indicate putative assignments of the linkage group to the A or B sub-genome. Numbers of each linkage group correspond to the numbering of chromosomes (for example, LG01(B) is equivalent to CqB01). Scale bar, cM.

Extended Data Figure 2 Quinoa genome annotation.

a, Representative gene model showing mapped RNA sequencing reads generated using Illumina or isoform sequencing technologies. The top and middle panels show isoform sequencing and RNA-seq reads, respectively, that have been mapped to the chromosomal location containing the AUR62017258 gene model, which is shown on the bottom panel. Light grey lines in the top two panels indicate regions where reads were split to indicate introns positions. Full-length isoform sequencing reads were able to span the 5′ untranslated region, all exons, and the 3′ untranslated region in a single read. b, Gene density and GC% in 100-kb windows in quinoa chromosomes. c, The frequency of annotation edit distance (AED) scores for the assemblies of quinoa (blue), C. pallidicaule (red) and C. suecicum (green).

Extended Data Figure 3 Distribution of SNPs in the quinoa genome.

Frequency of SNPs in the sequenced quinoa accessions, relative to the reference quinoa genome assembly, in a 1-Mb window size. y axis scale is from 0 to 10,000 SNPs. The innermost track shows scaffolds arranged according to their placement in the linkage groups, with scaffolds coloured according to sub-genome assignment based on mapping sequencing reads from C. pallidicaule and C. suecicum, as in Fig. 2a. From inside to outside, the remaining tracks show SNPs in PI 634921, Atlas, CICA-17, Carina Red, Cherry Vanilla, Chucapaca, G-205-95DK, Ku-2, Kurmi, 0654, Ollague, Pasankalla, Real, Regalona and Salcedo INIA.

Extended Data Figure 4 Orthologous genes in quinoa, Amaranthus hypochondriacus, Beta vulgaris and Spinacia oleracea.

Venn diagram representing the number of protein-coding gene clusters shared between, or distinct to, the indicated species. The number in parentheses indicates the total number of genes contained within the associated clusters.

Extended Data Figure 5 Syntenic relationships supporting the orthologous relationships between quinoa and Beta vulgaris FT genes.

a, Chromosomal regions containing BvFT1 and the quinoa genes AUR62013052 (CqFT1B) and AUR62010060 (CqFT1A). The annotation for AUR62013052 appears to contain two duplicate genes, which we have separated and designated as CqFT1B-1 and CqFT1B-2. b, Chromosomal regions containing BvFT2 and the quinoa genes AUR62000271 (CqFT2A) and AUR62006619 (CqFT2B). Green and grey arrows represent FT1 and FT2 genes, respectively, and coloured lines connect orthologous coding sequences surrounding the FT genes.

Extended Data Figure 6 Imaging mass spectrometry visualization of saponins in the pericarp of a quinoa seed.

Table lists saponin annotation. ^aPA, phytolaccagenic acid; Hed, hederagenin; SA, serjanic acid; OA, oleanolic acid; AG489, AG515, AG487 refer to new aglycones with a specific m/z. Pen, pentose; Hex, hexose; HexA, corresponding sugar acid. ^bPairs of saponins with the same chemical formula but different retention times. Image label indicates saponin ID. Scale bars, 500 μm.

Extended Data Figure 7 Saponin content in 0654 × Kurmi population.

Twelve individual seeds of 17 bitter lines, and 12 pooled seeds from each of 16 sweet lines were analysed for derivatized saponins using gas chromatography/mass spectrometry. Data for sweet lines, including parent 0654, were consolidated into one box plot (Sweet). Box plots show median values (solid horizontal lines), 25th and 75th percentile values (box), 90th percentile values (whiskers) and outlier values (open circles). Quantification was performed using standards of oleanolic acid. Letters above each box plot represent statistically significant (P < 0.05) differences between groups based on Games Howell post hoc test.

Extended Data Figure 8 Variation in the thickness of seed coat layers in bitter and sweet lines.

a, Representative scanning electron microscopy image of a quinoa seed cross-section, showing an example of a region (white box) from which measurements were taken. Scale bar, 1 mm. b, Representative scanning electron microscopy image showing measurements of inner and outer seed coat layers. Scale bar, 30 μm c, Thickness of the internal seed coat layer in bitter and sweet lines. d, Thickness of the external seed coat layer in bitter and sweet lines. Letters above each box plot represent statistically significant (P < 0.001) differences between groups based on ANOVA. Box plots show median values (solid horizontal lines), 25th and 75th percentile values (box), 90th percentile values (whiskers) and outlier values (open circles). n = 91, 160, 54, and 129 for internal bitter, internal sweet, external bitter, and external sweet, respectively.

Extended Data Figure 9 Prediction of TSARL1 function using phylogenetics and computational modelling.

a, Maximum likelihood tree of select bHLH peptide sequences from quinoa, Arabidopsis thaliana and Medicago truncatula, showing the close evolutionary relationship between the quinoa bHLH TSARL1 (AUR62017204) and the M. truncatula bHLHs TSAR1 and TSAR2 (MEDTR7G080780 and MEDTR4G066460, respectively). Branch values represent the percentage of 500 bootstrap replicates that support the topology. Subclades of the bHLH family, as defined in A. thaliana, are indicated on the right. b, Sequence alignment of TSARL bHLH sequences. Underlined blue, bHLH homology domain (solid line, helix; dashed line, loop). Underlined red, C-terminal domain. Boxed, residues that confer specificity to E box DNA binding. Green boxed Arg, residue that selects for the central CG dinucleotide. AUR62017204_AS designates the alternatively spliced protein. c, Schematic drawing of full-length TSARL1 (AUR62017204). Dashed lines indicate regions predicted to be flexible. Boxed region shows C-terminal domain lost in alternative splice variant. d, Zoomed-in view of the black dashed box in c, showing TSARL1 specifically binding the CACGHG motif.

Extended Data Figure 10 Structural variation in TSARL1 in the sweet progeny of Atlas and Carina Red.

a, Screenshot from Integrative Genomics Viewer (IGV) showing read alignment results in a 3-kb region (track 1, top) around TSARL1 (track 5, bottom). Shown are alignments for Atlas (track 2), Carina Red (track 3), and the merged F₂ sweet lines (track 4). In tracks 2–4, the top portion shows coverage, and the bottom portion shows individual reads. The F₂ merged data (track 4) shows evidence of three structural variants (labelled A, B and C) relative to the reference, whereas Atlas (track 2) only shows evidence of the G2078C SNP (labelled D). b, c, Screenshots from IGV showing before (a) and after (b) local re-assembly of the reference sequence to include the B insertion site shown in panel a, illustrating the effect on read mapping from the merged F₂ sweet lines. The dips in coverage and disconcordantly mapped reads (indicated by colours assigned to the reads) around this insertion site are resolved after re-assembly.

Supplementary information

Supplementary Information

This file contains Supplementary text, Supplementary Tables 1-9 and Supplementary References. This file was replaced on the 15 February 2017 to correct the numbering of the reference list. (PDF 1107 kb)

Supplementary Data 1

This file contains information for the integrated linkage map, including Excel spreadsheet tabs for markers in the map and their positions, scaffolds assigned to each linkage group, scaffolds mapping to more than one linkage group, unmapped scaffolds, and average nucleotides per centimorgan for each linkage group. (XLSX 826 kb)

Supplementary Data 2

Linkage group plots for each of the two new linkage maps, in comparison to linkage groups from the previously published quinoa linkage map, showing alignments between the three maps. (PDF 5933 kb)

Supplementary Data 3

A table of genes that are differentially expressed between green- and red-stemmed plants, as output from the CuffDiff program. (XLSX 23 kb)

Supplementary Data 4

A portion of the quinoa genome annotation, containing genes in a window surrounding the mapped betalain (stem colour) locus. (XLSX 16 kb)

Supplementary Data 5

Information about the lines used for sequencing. (XLSX 11 kb)

Supplementary File 6

The sub-genome assignment of scaffolds, based on the number of bases onto which reads from the A (C. pallidicuale) and B (C. suecicum) diploids were mapped, as well as the number of BLAST hits from the 18-24J repeat which is overrepresented in B genomes. (XLSX 332 kb)

Supplementary File 7

A portion of the quinoa genome annotation, containing genes in a window surrounding the mapped saponin production locus. (XLSX 14 kb)

Supplementary Data 8

This table shows genes that are differentially expressed between bitter and sweet plants, as output from the CuffDiff program. (XLSX 68 kb)

Supplementary Data 9

Accession numbers for all data deposited on public databases. (XLSX 13 kb)

PowerPoint slides

PowerPoint slide for Fig. 1

PowerPoint slide for Fig. 2

PowerPoint slide for Fig. 3

PowerPoint slide for Fig. 4

Source data

Source data to Fig. 1

Source data to Fig. 2

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jarvis, D., Ho, Y., Lightfoot, D. et al. The genome of Chenopodium quinoa. Nature 542, 307–312 (2017). https://doi.org/10.1038/nature21370

Download citation

Received: 21 June 2016
Accepted: 08 January 2017
Published: 08 February 2017
Issue Date: 16 February 2017
DOI: https://doi.org/10.1038/nature21370

This article is cited by

Mining genomic regions associated with agronomic and biochemical traits in quinoa through GWAS
- Hifzur Rahman
- Prashant Vikram
- Rakesh Kumar Singh
Scientific Reports (2024)
Association study of morpho-phenological traits in quinoa (Chenopodium quinoa Willd.) using SSR markers
- Ebrahim Souri Laki
- Babak Rabiei
- Andreas Börner
Scientific Reports (2024)
The role of tissue succulence in plant salt tolerance: an overview
- Ranran Liu
- Tong Wang
- Jie Song
Plant Growth Regulation (2024)
Wild foods, woodland fuels, and cultivation through the Ceramic and Early Historical periods in Araucanía, Southern Chile (400–1850 ce)
- Constanza Roa Solís
- Ayelen Delgado Orellana
- Aylén Capparelli
Vegetation History and Archaeobotany (2024)
Multilayered regulation of secondary metabolism in medicinal plants
- Yan Zhao
- Guanze Liu
- Shengchao Yang
Molecular Horticulture (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.