The absence of the caffeine synthase gene is involved in the naturally decaffeinated status of Coffea humblotiana, a wild species from Comoro archipelago

Raharimalala, Nathalie; Rombauts, Stephane; McCarthy, Andrew; Garavito, Andréa; Orozco-Arias, Simon; Bellanger, Laurence; Morales-Correa, Alexa Yadira; Froger, Solène; Michaux, Stéphane; Berry, Victoria; Metairon, Sylviane; Fournier, Coralie; Lepelley, Maud; Mueller, Lukas; Couturon, Emmanuel; Hamon, Perla; Rakotomalala, Jean-Jacques; Descombes, Patrick; Guyot, Romain; Crouzillat, Dominique

doi:10.1038/s41598-021-87419-0

Download PDF

Article
Open access
Published: 14 April 2021

The absence of the caffeine synthase gene is involved in the naturally decaffeinated status of Coffea humblotiana, a wild species from Comoro archipelago

Nathalie Raharimalala¹^na1,
Stephane Rombauts^2,11^na1,
Andrew McCarthy³,
Andréa Garavito^4,12,
Simon Orozco-Arias^5,6,
Laurence Bellanger⁸,
Alexa Yadira Morales-Correa⁴,
Solène Froger⁸,
Stéphane Michaux⁸,
Victoria Berry⁸,
Sylviane Metairon⁷,
Coralie Fournier⁷^nAff13,
Maud Lepelley⁸,
Lukas Mueller⁹,
Emmanuel Couturon¹⁰,
Perla Hamon¹⁰,
Jean-Jacques Rakotomalala¹,
Patrick Descombes⁷,
Romain Guyot^6,10 &
…
Dominique Crouzillat⁸

Scientific Reports volume 11, Article number: 8119 (2021) Cite this article

5046 Accesses
15 Citations
11 Altmetric
Metrics details

Subjects

Abstract

Caffeine is the most consumed alkaloid stimulant in the world. It is synthesized through the activity of three known N-methyltransferase proteins. Here we are reporting on the 422-Mb chromosome-level assembly of the Coffea humblotiana genome, a wild and endangered, naturally caffeine-free, species from the Comoro archipelago. We predicted 32,874 genes and anchored 88.7% of the sequence onto the 11 chromosomes. Comparative analyses with the African Robusta coffee genome (C. canephora) revealed an extensive genome conservation, despite an estimated 11 million years of divergence and a broad diversity of genome sizes within the Coffea genus. In this genome, the absence of caffeine is likely due to the absence of the caffeine synthase gene which converts theobromine into caffeine through an illegitimate recombination mechanism. These findings pave the way for further characterization of caffeine-free species in the Coffea genus and will guide research towards naturally-decaffeinated coffee drinks for consumers.

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Article Open access 15 April 2024

A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range

Article Open access 11 April 2024

The variation and evolution of complete human centromeres

Article Open access 03 April 2024

Introduction

Coffea humblotiana Baill., also called “Caféier de Humblot”, is the sole Coffea species endemic to the Comoro archipelago. It was probably consumed, even planted in the past on Grande Comore, a neighboring island of Mayotte in the archipelago, although the documentation on this subject remains very poor¹. Due to local expansion of agricultural land, this species is now classified as endangered^2,3. Currently, there are probably fewer than 110 trees surviving on Mayotte Island^3,4 (https://www.iucnredlist.org) while its presence on the other islands of the archipelago remains unsure (Fig. 1). C. humblotiana, belongs to the Coffea genus, comprising 124 admitted species^3,5,6, with a natural distribution covering tropical Africa, Madagascar, Comoros, Mauritius and the Reunion Islands, extending to southern and southeast Asia, and Australasia. It forms, with other Madagascan Coffea species, a large monophyletic clade separated from the African Coffea species an estimated 11.15 Mya⁷. All Coffea species are diploids with a chromosome number of x = 11, with the exception of C. arabica, which is an allotetraploid resulting from a recent cross between C. eugenioides and C. canephora⁸. The remarkable feature of C. humblotiana, is the complete absence of caffeine in seeds and leaves^9,10, shared by most species from Madagascar, the Mascarene Islands and some species from East and Central Africa⁷. Caffeine is produced in the young leaves and immature fruits¹¹ of African coffee species mainly, with a maximum level in C. canephora^10,12 (2.4–3.3% dmb). The caffeine biosynthesis pathway involves three methylation steps catalyzed by different N-methyltransferase genes (NMT); the XMT gene (xanthosine 7 N-methyltransferase), the MXMT gene (7-methylxanthine methyltransferase) and the DXMT gene^13,14, (3,7-dimethylxanthine methyltransferase or caffeine synthase). The three NMT genes are located on two distinct regions in C. canephora, on chromosome 1 for the DXMT gene and on chromosome 9 for the XMT and MXMT genes¹⁵. As output trait for the industry, caffeine as well as chlorogenic acid (CGA) compounds are of a great interest since they participate in producing metabolites which activates five human bitter taste receptors, therefore contributing to an inferior final cup quality^16,17. More particularly, the CGA degradation into phenol derivatives during roasting of coffee seeds contributes significantly to bitterness¹⁸.

In order to limit the amount of caffeine, different approaches have been followed such as RNA interference¹⁹ or interspecific crosses^20,21, but with contrasting success. By investigating the natural variability of caffeine contents in C. arabica, three plants were identified with almost no caffeine²². Here, molecular analyses have suggested both a combination of transcriptional regulation and a mutation in the DXMT gene as likely responsible for the very low caffeine content of these mutants²³, indicating that control of the caffeine synthesis pathway appears to be particularly complex²⁴. However, as of yet, despite molecular investigations, no genomic characterization had been undertaken in naturally caffeine-free coffee species to discover the origin of this absence of caffeine, which limits any efforts to develop strategies for transferring this trait into the cultivated species.

Here we report the nearly complete assembly of the Coffea humblotiana genome, a wild endangered caffeine-free species from the Comoro archipelago, comprising 422 Mb of genomic sequences and 32,874 predicted genes. Comparative analyses with the African C. canephora (C. canephora Pierre ex A.Froehner) genome revealed extensive genome conservation, despite an estimated 11 million years of divergence and wide variation in genome size. We postulate here that it is the loss of the Caffeine Synthase (DXMT) gene (which converts theobromine into caffeine) through an illegitimate recombination mechanism, which is likely involved in the absence of caffeine. This loss corroborates with our findings of the presence of theobromine in the leaves. Our analyses encompass unprecedented information characterizing the genome of a wild caffeine-free species in the Coffea genus and bring forth a significant contribution towards developing a naturally-decaffeinated coffee drink.

Results

Genome sequencing, chromosome-level assembly and annotation

The size of the C. humblotiana genome was previously estimated to be 475 Mb using flow cytometry²⁵. Based on 21 k-mer distribution of Illumina reads, a size of 406 Mb was predicted, with a heterozygous rate of 0.6% and a repeat frequency of 40% (Supplementary Figure S1). A de novo genome assembly was performed using 48 Gb of PacBio SMRT reads representing a coverage of about 102X (mean length = 7.6 Kb). This assembly produced 783 contigs for a total of 422 Mb with an N50 of 1.5 Mb (Table 1). To improve the contiguity of the assembly, Hi-C scaffolding was performed using 90 million paired-end reads of 150 bp. Finally, we obtained 390 scaffolds with a N50 of 29.6 Mb. The 11 largest scaffolds correspond to 88.7% (374.5 Mb) of the total size of the assembly and 92.5% of the k-mer-based genome size with pseudo-chromosome sizes ranging from 26,432,012 to 57,522,413 bp. Among the 390 scaffolds, 53 correspond to large and redundant fragments of the chloroplast genome, accounting all together for 2.8 Mb.

Table 1 Statistics for the C. humblotiana genome and gene annotation.

Full size table

The BUSCO score revealed a completeness of 90.3% with only 88 missing genes. The C. humblotiana assembly was also evaluated for its contiguity by estimating the LTR Assembly Index (LAI) from LTR Retriever²⁶. The C. humblotiana assembly shows a score of 10.73 indicating good contiguity of the assembly, while the C. canephora assembly has an LAI of 3.64, suggesting a greater contiguity for the C. humblotiana assembly as compared to C. canephora assembly. Finally, mapping genomic and transcriptomic Illumina reads on the final assembly using respectively Bowtie2²⁷ and Hisat2²⁸ produced an assembly completement estimate of 86.5–86.7% of alignment rates, while the RNA reads range between 86.8 and 88.6% (Supplementary Table S1). All estimations suggest good overall completeness of the assembly.

The C. humblotiana genome comprised 32,874 predicted genes (Fig. 2, Table 1), relatively more than the C. canephora gene composition (25,574) (Supplementary Table S2). The proteomes from C. canephora (Denoeud et al.¹⁵ 25,574), Arabidopsis (27,910) and the predicted gene set from C. humblotiana (32,874) were compared using OrthoFinder. This platform assigned 59,113 genes (68.5% of total) to 16,350 orthogroups. Fifty percent of all the genes were in orthogroups of 3 or more genes (G50 was 3) and were contained in the 9761 largest orthogroups (O50 was 9761). There were 11,382 orthogroups with all species present and 7021 of these consisted entirely of single-copy genes (Supplementary Figure S2).

Transposable elements account for approximately 35% of the genome (143,851,980 bp). LTR retrotransposons Gypsy and Copia represent 19% and 4% respectively. The Del family represents 10.3% of the genome. Besides Del, CRM accounts for 3.5% and Tork 1.7%. This is significantly much lower than for the C. canephora genome for which almost 60% of the genome is represented by transposable elements, of which about 25% is represented by the Del family (Supplementary Figure S3). The diversity of LTR retrotransposons lineages was studied using reverse transcriptase (RT)-based phylogenetic analysis. We recovered 1992 RT domains from the sequenced genome that were aligned and analyzed. At the overall lineage level, all Copia and Gypsy elements observed in C. canephora¹⁵ were also present in C. humblotiana (Supplemental Figure S3, S4). In order to compare the difference in LTR retrotransposon content in detail, we performed an RT-based phylogenetic analysis grouping together recovered RT domains of C. canephora and C. humblotiana. The phylogenetic trees show major differences in the number of RT domains and LTR retrotransposon families of different lineages (Supplemental Figure S5). Specific amplification of Del, TAT and CRM for Gypsy and SIRE, Tork and Bianca for Copia was evident for C. canephora (in blue, Supplementary Figure S5), while little specific amplification could be observed for C. humblotiana (in orange). This observation suggests that the differential amplification of several LTR retrotransposon families in C. canephora may have occurred since the divergence of the two species, although some eliminations may also have occurred via unequal homologous and illegitimate recombination²⁹.

The timing of LTR retrotransposon insertions was studied in the two genomes (Supplementary Figure S6). C. canephora and C. humblotiana genomes show different trends. A recent insertion of elements is observable between 0 and 1 My for C. canephora while a more gradual insertion of elements is visible between 0.5 and 5 My for C. humblotiana. In C. canephora, this activity is mainly due to Del, CRM, SIRE, Athila and TAT lineages. In C. humblotiana, most of the recent insertions are due to Del, while the insertion of CRM, Athila and TAT decrease in the last 0.5 My. Almost no recent insertion of SIRE is detected.

Comparative genomic analysis of C. humblotiana

The C. humblotiana and C. canephora assemblies were globally aligned using i-ADHoRe³⁰ and D-genies³¹ (Fig. 3; Supplementary Figure S7). The overall chromosomal structure appears particularly well conserved at this level, suggesting a high degree of synteny, despite the lower quality of the C. canephora assembly¹⁵ (80% of the 710 Mb genome is assembled with an N50 scaffold of 26 Mb and only 51% of the scaffolds anchored onto pseudo-chromosomes). At the level of pseudo-chromosomes, conservation is restricted to the distal part of the sequences, while the putative centromere and pericentromeric regions showed few occurrences of conservation or fragmented ones (Fig. 3; Supplementary Figure S7). In order to make a comparison with a better-known genomic sequence of interest, the orthologous region bearing the Coffee Leaf Rust resistance locus S_H3 in C. liberica (C. liberica W.Bull ex Hiern) and characterized in C. arabica (C. arabica L.), was retrieved from the reference (C. canephora) and from the C. humblotiana genomes, using the reported BAC clone sequences³² as a guide. A 727-Kb sequence comprised between positions 3,118,174 and 3,846,040 bp, and an 820-Kb sequence comprised between positions 3,208,179 and 4,028,477 bp were identified at the S_H3 orthologous region and were extracted respectively from the C. humblotiana and the C. canephora genomes. A graphical comparison shows a very high collinearity of the two segments, with most genes having an orthologous pair and a high-protein identity, with the notable exception of the genes from C. humblotiana present in the corresponding C. canephora gaps (Fig. 3D).

Evolution of N-methyltransferase genes related to caffeine synthase in C. humblotiana

By using BLAST analysis with NMT proteins related to caffeine synthase sequences annotated from C. canephora^13,15 as queries against the C. humblotiana-annotated genome, we identified and manually inspected six C. humblotiana genes which were highly similar to these NMTs (Table 2; Supplementary Data S1; FASTA protein sequences of recovered genes). NMT genes were also searched and manually corrected in the assembled genome of Gardenia jasminoides³³, (Gardenia jasminoides J.Ellis) a Rubiaceae species in the sub-family Ixoroideae³⁴ for which only one complete gene was recovered: Gj9, similar to GjNMT2 (Gj9A1032T108; pseudomolecule Gardenia 9). Two other genes were also located on pseudomolecule 1 (similar to GjA458T26 and GjA458T26), but precise manual annotation revealed stop codons and frameshifts for both of them. Besides the poorer quality of the genome sequence, one probable explanation is that these genes are probably non-functional.

Table 2 List of annotated and classified NMT genes.

Full size table

The protein sequences of all recovered genes were used for a phylogenetic analysis (Fig. 4A,B). It showed that each NMT protein in C. canephora was conserved in C. humblotiana (i.e. XMT, MXMT, MTL, NMT3 and NMT2) except for the DXMT protein. This gene is absent from the C. humblotiana assembled genome, and BLAST analysis against the sequence outputs of each step of the genome-assembly procedure failed to identify this gene. Except for DXMT, the phylogenetic analysis indicated that the NMT2 proteins assumed a basal position for the XMT, MXMT, MTL and NMT3 groups. Interestingly, Gardenia has only one complete NMT-like gene, NMT2.

It is peculiar to note how few RNA-seq reads from leaf tissue map to XMT and MXMT genes while numerous reads map to MTL in C. humblotiana (data not shown). This situation is different from C. canephora since XMT, MXMT and MTL were found highly transcribed in leaves¹⁵.

A sequence alignment of the C. canephora and C. humblotiana NMT proteins, and their comparison with the three-dimensional structures of C. canephora DXMT and XMT (PMID: 17434991) confirmed that all the genes identified contain the highly-conserved SAM binding motifs necessary for catalysis (Fig. 4C). However, only two of the C. humblotiana NMT proteins identified, XMT (cohum09g08800) and MXMT (cohum09g08760), are likely to be functional. The first NMT in the pathway, XMT (cohum09g08800), is clearly identifiable by the presence of a highly conserved serine (Ser-316 in CcXMT1) which is required for xanthosine recognition. The second NMT gene in the pathway, MXMT (cohum09g08760), can be annotated as an MXMT gene by the presence of a phenylalanine (Phe-266 in CcDXMT1), which prevents theobromine binding at the active site. The other four potential NMT genes are not likely to be involved in the caffeine biosynthesis. NMT2 (cohum02g11490) and NMT3 (cohum09g08830) are probably inactive due to large destabilizing deletions in their core protein fold, while NMT3 (cohum09g8820) contains a large and poorly conserved C-terminal extension that would also impact overall protein stability and activity. Lastly, the variable N-terminal sequence of MTL (cohum09g08730), compared with functional NMTs, and the absence of a highly-conserved tyrosine (Y18 in CcDXMT1) positioned adjacent to the active site and proposed to facilitate catalysis, both suggest that this NMT is not able to support methyl-transferase activity. To summarize, only two NMT proteins in the caffeine biosynthesis pathway of C. humblotiana are likely to be functional from a sequence comparison, suggesting that the biosynthesis pathway would stop at theobromine, corroborating the observed presence of theobromine.

We conducted a microsynteny analysis between C. canephora and C. humblotiana to identify the evolutionary mechanisms that have shaped the regions of the NMT genes. In C. canephora, three loci carry the DXMT, NMT2, XMT, MXMT, MTL and NMT3 genes on chromosomes 1, 2 and 9. The DXMT gene (cc01_g00720) is located on C. canephora chromosome 1 (positions 1,210,274–1,212,695) but it is absent in C. humblotiana. Microsynteny between the DXMT locus C. canephora (CC; the chromosome 1 position 1.1–1.3 Mb) and C. humblotiana (CH; chromosome 1 position 0.4–0.6 Mb) revealed the complete absence of a segment of 76 kb in C. humblotiana (Fig. 5). Interestingly, this segment carrying the DXMT gene in C. canephora, is flanked by a duplication of about 10 kb in C. canephora, while this region is present in only one copy in C. humblotiana. This observation suggests that the presence or absence of the DXMT locus might be due to either a large insertion (in C. canephora) or a deletion (in C. humblotiana) mechanism. However, the microsynteny between the G. jasminoides ortholog locus (Pseudochromosome 5, positions 2.2–2.5 Mb) and C. canephora also showed the insertion in C. canephora flanked by a duplication. Although the locus and gene are absent in G. jasminoides, the duplication is still kept in G. jasminoides, albeit degenerated. These observations may suggest several possible scenarios for the presence or absence of the DXMT locus, including an insertion in the ancestor of C. canephora after the C. canephora/C. humblotiana divergence, or an insertion in the ancestor of both C. canephora and C. humblotiana followed by a deletion in C. humblotiana (Supplementary Figure S8). Deletion in C. humblotiana could be due to a mechanism of illegitimate recombination (IR). IR produces a specific signature, with the presence of direct repeats flanking the duplicated regions or the presence of only one direct repeat associated with the deletion³⁵.

A similar microsynteny analysis was performed at the MXMT/XMT locus on the C. humblotiana chromosome 9 (position 6.5–6.9 Mb). This locus is syntenic to the caffeine gene synthase cluster in C. canephora chromosome 9 (positions 8.1–8.35 Mb). However, the microsynteny is largely altered by numerous and different tandem duplications of NMT genes and insertion of transposable elements (Fig. 5). In the C. humblotiana MXMT/XMT locus, nine NMT genes constitute a cluster of tandem duplication of which four genes may be classified as pseudogenes. Pseudogenes correspond either to isolated exons of NMT genes lacking starting codons or to a complete gene (located at 6,789,033–6,795,587 bp) but displaying stop codons and a large insertion of a full-length LTR retrotransposon in the third intron of the pseudogene. The NMT tandem gene cluster in C. humblotiana is interrupted by frequent insertions of transposable elements. Moreover, the percentage of transposable elements reaches 63.4% in this region. The C. canephora syntenic regions follow the same trend with five NMT genes in tandem, two pseudogenes and 60.4% of transposable elements. It should be noted, however, that the MXMT gene in the C. canephora genome is not anchored to this locus but is still in an unanchored contig in the publicly available C. canephora genome assembly¹⁵. Furthermore, the C. canephora locus showed two large gaps due to anchoring difficulties during the genome-assembly step (Fig. 5). Recent improvements of the C. canephora genome assembly indicates the presence of the MXMT gene on the chromosome 9 locus (data not shown, personal communication).

The conservation of the NMT2 locus was also investigated, despite the fact that the NMT2 protein was not reported as participating in the caffeine-biosynthesis pathway. The C. humblotiana NMT2 locus (chromosome 2 position 7.5–7.7 Mb) was found highly syntenic with C. canephora (chromosome 2 position 7.2–7.4 Mb) (Supplementary Figure S9). In addition, a high degree of conservation was also observed between the C. humblotiana NMT2 locus and G. jasminoides chromosome 9 (positions: 103,200,000–103,300,000 bp)³³ (Supplementary Figure S10).

Biochemical analysis of caffeine and theobromine contents

Biochemical analyses were carried out on young leaves to characterize the chemical composition of the coffee species C. canephora, C. arabica and C. humblotiana using three different accessions for each. The results show significant contrasting quantities of caffeine and chlorogenic acids (CGA) among these three coffee species (Table 3 and Supplementary Table S3). The data confirms the absence of caffeine and the lower quantities of CGA in C. humblotiana (Table 3). These findings confirm previous studies based on biochemical analysis on seed¹⁰, which underlined that the biochemical diversity is related to the species analyzed. In this regard, the confirmed absence of caffeine in C. humblotiana was the main interest for our study.

Table 3 Caffeine, theobromine and total chlorogenic acids contents on young leaves of C. arabica, C. canephora and C. humblotiana.

Full size table

Discussion

The C. humblotiana genome is highly collinear with C. canephora

The C. humblotiana genome sequence reported here is the second published complete genome in the genus Coffea after C. canephora¹⁵ and third in the large Rubiaceae family (13,000 species³⁶) after Gardenia jasminioides³³. This assembly reveals overall good conservation of genomic collinearity with the African species C. canephora, despite an estimated genome divergence⁷ of more than 11 million years and a significant difference in genome-size. A small number of blocks show local inversions or rearrangements, but due to the different technologies used for genome sequencing and assembly, it is premature to conclude on the occurrence of these evolutionary events. A more complete assembly of C. canephora will be needed to allow a more precise synteny analysis. Whole genome duplications are expected to contribute to the evolution of gene function, but also promote genome modifications including gene losses and chromosome rearrangements³⁷. This overall conservation of the collinearity between the two genomes is probably attributable to the absence of any recent whole genome duplication since the γ triplication at the origin of the Eudicots, in coffee trees and probably in Rubiaceae¹⁵, explaining its ancestral synteny with grapevine, two species that diverged 114–125 Mya³⁸. The direct implication of the good conservation of the genomic collinearity is the ease of microcolinearity analyses in regions of interest between C. canephora and C. humblotiana, and a better understanding of their evolution and (re)organization over 11 My of divergence. We show that the S_H3 region, which confers resistance to Coffee Leaf Rust³² (Hemileia vastatrix) in Coffea liberica and recently characterized in C. arabica, is strongly collinear in a syntenic block. This indicates that comparative genome analysis could provide crucial information in the evolution of the region, since the gene responsible for this resistance had not yet been characterized.

LTR retrotransposons are mainly responsible of genome size variations

The genome size of Coffea species varies from 0.96 to 1.84 pg per 2C³⁹. The smallest genomes come from East African species, Comoros and Indian Ocean islands, while the largest genomes are from West Africa and southeast Madagascar, suggesting gradients of genome size increase from East to West in Africa and North to Southeast in Madagascar³⁹.

Previous studies, based on partial genome sequencing of eleven Coffea species, with 454 technology, have suggested the role of two LTR retrotransposon lineages, Del and Sire, in genome size variations and divergences⁴⁰. However, the mechanisms underlying the variation could not be investigated due to the small amount of data produced. Here the comparison of the genome content in terms of transposable elements clearly indicates a higher number of LTR Retrotransposon of the Gypsy superfamily in C. Canephora with recent insertions of some TE lineages such as the Del, TAT, Athila and CRM. Our RT-based phylogenetic analysis highlights the expansion of these lineages that are accompanied by a diversification into new subfamilies for each lineage in C. canephora. Particularly, Del shows an interesting expansion in C. canephora, representing nearly 25% of the genome while only 10% in C. humblotiana. Interestingly, recent insertions of Del are also found in C. humblotiana, highlighting the activity of this TE lineage is accompanied by less accumulation of new copies, probably by mechanisms counter-acting its expansion. Of particular interest is the activity of the SIRE (Copia LTR retrotransposon) lineage. Showing few new insertions and few copies in C. humblotiana, SIRE demonstrates recent insertions and a diversification of new subfamilies in C. canephora. Preliminary studies have suggested that this lineage is virtually absent in Madagascar and Indian Ocean island Coffea species, marking its association mainly with the genomes of African Coffea species.

The C. humblotiana genome also shed light on the evolution of genes involved in the caffeine biosynthesis pathway

Gene annotation and detailed microsynteny analyses with C. canephora revealed the absence of a segment of 76 kb containing the DXMT gene on chromosome 1 that converts theobromine to caffeine, probably due to an illegitimate recombination. This deletion is likely responsible for the lack of caffeine in C. humblotiana tissues, although it may also be the evolutionary consequence of a more complex mechanism. The absence of the caffeine synthase activity is supported by the detection of theobromine—the precursor of caffeine—in leaves, since the two first NMT genes located on the caffeine pathway (i.e. XMT and MXMT) appear functional, converting xanthosine to theobromine. However, in caffeine-free C. arabica mutants, higher levels of theobromine contents were detected in leaves²². Similarly, a high accumulation of theobromine was also detected in young shoots of Hongyacha, a naturally caffeine-free tea plant⁴¹. Unlike C. humblotiana, the DXMT gene is present in both C. arabica mutants and Hongyacha. In C. arabica mutants with low caffeine content, a mutation in the DXMT gene apparently dramatically reduces its caffeine synthase activity. A low enzymatic activity of caffeine synthase has even been detected in fruits, suggesting that some transcripts were able to be translated to functional proteins²⁴. In Hongyacha, the Tea caffeine synthase gene (TCS) is unable to produce caffeine but keeps a reduced activity of theobromine synthase since the TCS gene has a dual activity, being able to add methyl groups on both 7-methyxanthosine and theobromine. One possible explanation for the reduced theobromine content in C. humblotiana leaves compared to C. arabica mutants leaves would be a similar dual-functional activity of the caffeine synthase protein as the TCS gene⁴². Hence, a dual activity of the caffeine synthase gene (DXMT) has been reported in C. arabica, where it is able to add methyl groups on both 7-methyxanthosine and theobromine. Therefore, the simple absence of the DXMT gene in the C. humblotiana genome would lead to both the reduction of the theobromine accumulation and the absence of caffeine, while the mutation of the DXMT gene in C. arabica would only lead to a partial loss of its function. Contradictory to these studies, Maluf and coworkers identified a decrease of theobromine synthase transcripts accumulation in different fruit stages of C. arabica mutants, hypothesizing a mechanism of feedback regulation activated by the accumulation of theobromine²³. Such feedback regulation of the caffeine biochemical pathway was recently highlighted in tea leaves (Camelia sinensis) in which the pathway is fully functional⁴³. In C. humblotiana a possible feedback regulation could participate to the reduction of theobromine accumulation since very few transcriptions of the XMT and MXMT genes was observable using leaves RNA-seq read mapping. Further analysis of co-expression of transcript/metabolic networks including theobromine degradation metabolites may be needed to decipher the complex regulation of the caffeine pathway in C. humblotiana.

Apart from the absence of the DXMT gene, there are few alternative hypotheses able to explain the absence of caffeine in C. humblotiana. In Coffea millotii J.-F.Leroy and Coffea perrieri Drake ex Jum. & H. Perrier, two Mascarocoffea species, the biosynthesis of purine alkaloid stops at the 7-methylxanthine step formation and both the theobromine synthase and caffeine synthase activities are missing⁴⁴. These absences are accompanied by the reduction of the total purine alkaloid biosynthesis activity with a shift of purine metabolites toward purine catabolism. Similar results were observed with anti-sense and RNA interference transgenic plants of C. canephora MXMT gene⁴⁵. A such initial shift in the purine metabolites pathway would have left the possibilities of subsequent accumulation of neutral mutations such as a deletion of the DXMT gene.

Based on our comparative genome data, we hypothesized that the absence of caffeine in C. humblotiana would derive from a direct ancestor that was capable of producing this alkaloid. This is intriguing, since the complete absence of caffeine is shared by geographical groups of the wild coffee species from Indian Ocean islands, East Africa and a group of ancestral species formally called the genus Psilanthus, suggesting that the ancestor of all coffee trees would be caffeine free^7,46. However, some exceptions exist in geographical groups such as in Africa and in the Indian Ocean island, but not in the former Psilanthus genus. For example, the Central African Coffea charrieriana Stoff. & F.Anthony does not synthetize caffeine⁴⁷. On the other hand, a very low content (0.07%) of caffeine was reported in Coffea mauritiana Lam. seeds⁴⁷, and two species from Madagascar (Coffea kianjavatensis J.-F.Leroy and Coffea lancifolia A.Chev. subsp. auriculata) showed significant amounts of caffeine (0.55% and 0.81% dmb)⁴⁸. These findings do not question the caffeine-free ancestor hypothesis at this stage, but may suggest more complex mechanisms to explain the absence of caffeine in coffee species in Africa and Madagascar and possibly different genetic causes. Further genomic studies in the Coffea genus are necessary to evaluate the presence of functional NMT genes. This will help to construct a robust hypothesis on the origin and timing of evolutionary mechanisms targeting NMT genes in caffeine-free Coffea species and build an evolutionary model.

The microcolinearity of the XMT/MXMT locus on chromosome 9 revealed a specific pattern of evolution linked to the NMT genes, with tandem duplication and pseudogenization of copy in a transposable element-rich environment (60%). The orthologous regions in the G. jasminoides genome³³ show a comparable organization, suggesting a common pattern of evolution similar to the Birth-and-Death model of resistance-gene clusters (NBS-LRR type) in plants⁴⁹. It involves unequal intergenic crossing-over mechanisms, generating losses and duplications as well as mutations that might explain genetic changes on chromosome 9 NMT regions. Similar to resistance gene clusters, NMT tandem gene copies are associated with a rich transposable-element environment.

In conclusion, the assembly of C. humblotiana generated in the frame of the present study provides the first high-quality reference genome for the Coffea genus. It provides valuable information for promoting the preservation of the diversity of this wild species in its environment and it represents a perfect resource for genomic and evolutionary studies on Coffea and Rubiaceae. It is also of interest in helping to develop new strategies for characterizing coffee cup-quality traits. However, additional and substantial efforts will be necessary to develop strategies for improving coffee cultivars. In this regard, the absence of caffeine and the low CGA content of C. humblotiana beans, which together strongly reduce the bitterness of coffee beverage, is an interesting avenue to follow. However, this species may contain another bitter compound: cafamarine (glycosidic diterpene)^50,51, the content of which will need to be understood and controlled to open the door to new sensory experiences for consumers in the future.

Methods

Plant material and biochemical analysis

All the plant material used in this study is compliant with the Nagoya protocol and with the Access and Benefit-sharing Clearing-House (ABSCH-IRCC-FR-254781-1). Three genotypes were analyzed for each Coffea species (C. arabica, C. canephora and C. humblotiana). For C. arabica we selected GPFA03, ET39 and GPFA107, for C. canephora the samples were coded as 16M, FRT95 and FRT141-8 and for C. humblotiana three plants from bulk seedlings were randomly chosen (Supplementary Table S3). Samples were growing in greenhouse. Samples from young leaves (2–3 weeks old) were ground in liquid nitrogen and 500 mg of the resulting powder were add to 70 ml of a solution of 70% methanol for 30 min at 40 °C. Caffeine, theobromine and chlorogenic acids (CGA) were analyzed using the Dionex HPLC U3000. Two mobile phases were used: (1) an aqueous solution containing 8% acetonitrile and 0.1% formic acid (Mobile Phase 1); and (2) an aqueous solution containing 50% acetonitrile and 0.1% formic acid (Mobile Phase 2). Samples and standard solutions (10 µl) were analyzed at 30 °C using a gradient elution with flow rate of 0.8 µl/min and UV detection at 272 nm and 325 nm wavelengths, corresponding to, respectively, caffeine and theobromine and CGA. Quantification is achieved by peak area measurement and comparison with standards.

Genome and RNA sequencing

High-molecular-weight genomic DNA was extracted from fresh leaf tissue of C. humblotiana (accession RM-CF-00679). The ^plant conserved at the International Coffee Collection at La reunion was collected in 2010 in Mayotte (Mont Bénara). Long reads were obtained using PacBio RSII technology (63 SMRT cells). Forty-eight Gb of the sequences representing a coverage of about 102× (mean length 7.6 Kb) were generated. In addition, 24 Gb of 2 × 100 bp paired-end Illumina (Hiseq 2500) sequences (51 × coverage) were produced. The same tissue from the same genotype (RM-CF-00679) was also used to construct a library for Hi-C analysis and 90 million paired-end reads of 150 bp were obtained from the Illumina platform. In addition, RNA-seq was performed by extracting RNA from fresh leaves, and sequencing libraries were prepared using the Truseq Stranded Kit from Illumina according to the manufacture protocol. The libraries were sequenced using a HiSeq 2500 Illumina platform (2 × 150 bp) and 63,917,118 reads representing 9.5 Gb were generated. Genome quality reads and RNA Illumina reads were evaluated using FASTQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Truseq sequence adapters were removed using Trimmomatic V.0.39⁵².

Genome assembly

The size and heterozygosity of the C. humblotiana genome were estimated via the 21-mer depth distribution of Illumina reads. In detail, the k-mers were counted using Jellyfish⁵³ and the k-mer count histogram was analyzed using Genoscope (http://qb.cshl.edu/genomescope/). A de novo genome assembly was performed using the FALCON assembler⁵⁴. PBJelly⁵⁵ was used for scaffolding and gap-closing the FALCON primary contigs and finally, assembly-polishing was performed using the Illumina data produced to correct potential errors in low-quality regions. Ninety-four percent of the Illumina reads map to the PBJelly2 contigs, 89% of those with the correct orientation and within the expected insert-size range. The assembly was performed by Computomics (https://computomics.com), while Hi-C was performed by Dovetail Genomics (https://dovetailgenomics.com). The quality of the genome assembly was estimated by searching for Benchmarking Universal Single-Copy Orthologs (BUSCO v4.0; https://busco.ezlab.org) with Embryophya odb 9. The C. humblotiana assembly was also evaluated for its contiguity by estimating the LTR Assembly Index (LAI) from LTR Retriever²⁶. The genome-completement assessment was also conducted by mapping genome and RNA reads to the assembly using Bowtie2²⁷ and Hisat2²⁸.

Genome annotation and gene prediction

The genome after-assembly and scaffolding with Hi-C was validated using a collinearity analysis with C. canephora. This revealed that some chromosomes needed to be reoriented to be in accordance with the chromosome orientation from C. canephora. This version of the genome was masked with RepeatMasker⁵⁶ using a repeat library made with REPET from the C. humblotiana genome.

The trained models for AUGUSTUS⁵⁷ for C. humblotiana were retrieved from BUSCO as it trains AUGUSTUS to achieve more precise results, using the mappings from its data set as a reference set. These files need minor intervention in order to make them generally useful for gene prediction. To achieve more precise prediction with AUGUSTUS, special care was taken to produce “hints”. These hints come from protein mapping and RNAseq produced in the frame of the project. The proteins were C. canephora proteins collected from NCBI, which were mapped with GenomeThreader⁵⁸ (with parameters set as: − mincoverage 0.65 − minalignmentscore 0.7 − species Arabidopsis) onto the genome of C. humblotiana. The GenomeThreader output was converted into GFF3, and only CDS and intron features were kept as protein hints (source = P) for AUGUSTUS.

The RNAseq reads were cleared of any adaptors sequences and merged into longer SE reads based on their overlap (min 30 bp) into joined reads. Reads that did not comply with the requirements for joining were left as PE-reads. These reads were then mapped onto the genome using HISAT2 (-k 1 -no-unal -max-intronlen 35000 -min-intronlen 40) as single-end reads (also the remaining PE-reads). The resulting BAM file was processed with RegTools to extract junctions to bed format. This bed-file was converted into a GFF3 format and filtered for junctions with a minimum coverage of 10. From the resulting GFF3 file intron features were given as hints while the remaining exon bits on either side of the introns were given as an exon part (source = E). The accompanying parameter file for extrinsic data for AUGUSTUS was adapted to include these hints as well as to mask the genomic sequence. AUGUSTUS was run as augustus –species = $species –hintsfile = $hint –extrinsicCfgFile = extrinsic.MPE.cfg –softmasking = 1 $fasta.

The resulting gene predictions from AUGUSTUS were further curated with EvidenceModeler⁵⁹ using the same extrinsic data. EVM managed this way to clean a few predicted genes with ambiguous structure and no support. The final predicted gene set (32,874) was subsequently processed to be uploaded into ORCAE (https://bioinformatics.psb.ugent.be/orcae/).

The proteomes from C. canephora (v 2014, 25,574), Arabidopsis thaliana (27,910) and the predicted gene set from C. humblotiana (32,874) were compared using OrthoFinder⁶⁰ with default parameters.

Repeats were de novo identified using different approaches. First an ab initio identification was performed with REPET (V.2.5; https://urgi.versailles.inra.fr/Tools/REPET) leading to the identification of 430 consensuses after removing chimeric sequences and potential host genes. The final library comprises 3 helitrons (DHX), 168 DNA transposons (DTX), 79 unclassified transposons (DXX), 63 unclassified elements, 31 LINEs, 66 LTR retrotransposons, 6 SINEs, 14 unclassified retrotransposons. RepeatMasker was used with default parameters to mask the genomes and get statistics. LTR_STRUC⁶¹, LTR Retriever²⁶ and Inpactor⁶² were used to recover and they classified full-length LTR retrotransposons. A circular annotation plot was performed using ShinyCircos⁶³. A phylogenetic analysis of RT domains was performed as in Yu and coworkers⁶⁴. The timing of LTR retrotransposon insertions was estimated as in Orozco-Arias et al.⁶² with full-length LTR retrotransposon recovered by LTR_STRUC and an average base substitution rate of 1.3E–8²⁹.

Comparative genome analysis

Coffea canephora and C. humblotiana genomes were globally compared using a dot plot as implemented in D-genies with the Minimap2 Aligner³¹ and i-ADHoRe³⁰.

In order to retrieve the sequence containing the S_H3 locus and the corresponding annotations from the C. canephora and C. humblotiana genomes, the region was manually reconstructed using five BAC clones belonging to each Coffea arabica sub-genome, as previously described³². Once reconstructed, the sequences were graphically aligned using Dotter⁶⁵, and the common segment was extracted using the Extractseq function from the EMBOSS suite⁶⁶ and then used to identify the equivalent coordinates on the C. humblotiana and C. canephora genomes using BLASTN⁶⁷ with the default parameters. The region was extracted from each genome based on the coordinates found, using the Extractseq function, and the corresponding gene annotations were retrieved using the Bedtools intersect function⁶⁸. Protein and CDS sequences from the region for each genome were also retrieved and were used to find orthologous gene pairs with the get_homologues-est.pl script from the Get_homologues program⁶⁹, with the OrthoMCL option (-M). Additionally, a matrix of pairwise protein identity was obtained using the Clustal omega program⁷⁰. Graphical comparisons between the two segments were completed using the Genoplotr R package.

Phylogenetic analysis (NMT proteins)

Phylogenetic analyses of NMT genes were conducted using Seaview⁷¹. Proteins were aligned with Muscle and trees were computed using PhyML (Model LG, Branch support aLRT (SH-like); tree searching operation: NNI).

NMT protein-sequence alignment and structural analysis

Sequence alignments were produced with Clustal Omega⁷⁰ and ESPript 3.0⁷².

Data availability

Genome assembly information has been deposited at NCBI under the Bioproject ID: PRJNA665152. We built a C. humbotiana genome website at SGN (https://solgenomics.net) and ORCAE (https://bioinformatics.psb.ugent.be/orcae/), providing data download, Blast and genome browser. All data that support the findings of this study are also available from the corresponding authors upon request.

References

Charrier, A. Rapport de Mission aux Comores. 1–8 (1971).
International Union for Conservation of Nature (IUCN). The IUCN Red List of Threatened Species (2020).
Davis, A. P. et al. High extinction risk for wild coffee species and implications for coffee sector sustainability. Sci. Adv. 5, eaav3473 (2019).
Article ADS PubMed PubMed Central Google Scholar
Chadburn, H. & Davis, A. P. Coffea humblotiana, Caféier de Humblot. The IUCN Red List of Threatened Species 2017 e.T108652718A108665565 (2017).
Guyot, R. et al. WCSdb: A database of Wild Coffea Species. Database (2020).
WCSP. World Checklist of Selected Plant Families (Royal Botanic Gardens, 2018).
Google Scholar
Hamon, P. et al. Genotyping-by-sequencing provides the first well-resolved phylogeny for coffee (Coffea) and insights into the evolution of caffeine content in its species: GBS coffee phylogeny and the evolution of caffeine content. Mol. Phylogenet. Evol. 109, 20 (2017).
Article CAS Google Scholar
Yu, Q. et al. Micro-collinearity and genome evolution in the vicinity of an ethylene receptor gene of cultivated diploid and allotetraploid coffee species (Coffea): Recent speciation event of Coffea arabica. Plant J. 67, 305–317 (2011).
Article CAS PubMed Google Scholar
Bertrand,. Sur la Composition Biochimique du Café de la Grande Comore 162–164 (G. Masson, 1901).
Google Scholar
Campa, C., Doulbeau, S., Dussert, S., Hamon, S. & Noirot, M. Diversity in bean Caffeine content among wild Coffea species: Evidence of a discontinuous distribution. Food Chem. 91, 633–637 (2005).
Article CAS Google Scholar
Ashihara, H. Metabolism of alkaloids in coffee plants. Braz. J. Plant Physiol. 18, 1–8 (2006).
Article CAS Google Scholar
Ky, C. L. et al. Caffeine, trigonelline, chlorogenic acids and sucrose diversity in wild Coffea arabica L. and C. canephora P accessions. Food Chem. 75, 223–230 (2001).
Article CAS Google Scholar
Perrois, C. et al. Differential regulation of caffeine metabolism in Coffea arabica (Arabica) and Coffea canephora (Robusta). Planta 241, 179–191 (2014).
Article PubMed PubMed Central CAS Google Scholar
Ashihara, H., Sano, H. & Crozier, A. Caffeine and related purine alkaloids: Biosynthesis, catabolism, function and genetic engineering. Phytochemistry 69, 841–856 (2008).
Article CAS PubMed Google Scholar
Denoeud, F. et al. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345, 1181–1184 (2014).
Article ADS CAS PubMed Google Scholar
Lang, T. et al. Numerous compounds orchestrate coffee’s bitterness. J. Agric. Food Chem. 68, 6692–6700 (2020).
Article CAS PubMed Google Scholar
Tran, H. T., Lee, L. S., Furtado, A., Smyth, H. & Henry, R. J. Advances in genomics for the improvement of quality in coffee. J. Sci. Food Agric. 96, 3300–3312 (2016).
Article CAS PubMed Google Scholar
Leloup, V., Louvrier, A. & Liardon, R. Degradation Mechanisms of Chlorogenic Acids During Roasting. 192–198 (1995).
Ogita, S., Uefuji, H., Yamaguchi, Y., Koizumi, N. & Sano, H. Producing decaffeinated coffee plants. Nature 423, 823–823 (2003).
Article ADS CAS PubMed Google Scholar
Charrier, A. La structure genetique des cafeiers spontanes de la region Malgache et leurs relations avec les cafeiers d’arigine Africaine (Eucoffea). (1978).
Mazzafera, P. & Carvalho, A. Breeding for low seed caffeine content of coffee (Coffea L.) by interspecific hybridization. Euphytica 59, 55–60 (1991).
Article CAS Google Scholar
Silvarolla, M. B., Mazzafera, P. & Fazuoli, L. C. A naturally decaffeinated arabica coffee. Nature 429, 826 (2004).
Article ADS CAS PubMed Google Scholar
Maluf, M. P. et al. Altered expression of the caffeine synthase gene in a naturally caffeine-free mutant of Coffea arabica. Genet. Mol. Biol. 32, 802–810 (2009).
Article CAS PubMed PubMed Central Google Scholar
Mazzafera, P., Baumann, T. W., Shimizu, M. M. & Silvarolla, M. B. Decaf and the steeplechase towards decaffito—the coffee from caffeine-free Arabica plants. Trop. Plant Biol. 2, 63–76 (2009).
Article CAS Google Scholar
Razafinarivo, N. J. et al. Genetic structure and diversity of coffee (Coffea) across Africa and the Indian Ocean islands revealed using microsatellites. Ann. Bot. 111, 20 (2013).
Article CAS Google Scholar
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. https://doi.org/10.1093/nar/gky730 (2018).
Article PubMed PubMed Central Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ma, J. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. Proc. Natl. Acad. Sci. 101, 12404–12410 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Proost, S. et al. i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 40, e11–e11 (2012).
Article CAS PubMed Google Scholar
Cabanettes, F. & Klopp, C. D-GENIES: Dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).
Article PubMed PubMed Central CAS Google Scholar
Ribas, A. F., Cenci, A., Combes, M. C., Etienne, H. & Lashermes, P. Organization and molecular evolution of a disease-resistance gene cluster in coffee trees. BMC Genom. 12, 20 (2011).
Article CAS Google Scholar
Xu, Z. et al. Tandem gene duplications drive divergent evolution of caffeine and crocin biosynthetic pathways in plants. BMC Biol. 18, 1–14 (2020).
Article CAS Google Scholar
Ly, S. N. et al. Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae. PLoS One 15, e0232295 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wicker, T., Yahiaoui, N. & Keller, B. Illegitimate recombination is a major evolutionary mechanism for initiating size variation in plant resistance genes. Plant J. 51, 631–641 (2007).
Article CAS PubMed Google Scholar
Davis, A. P. et al. A global assessment of distribution, diversity, endemism, and taxonomic effort in the Rubiaceae. Ann. Missouri Bot. Garden 20, 68–78 (2009).
Article Google Scholar
Wendel, J. F., Jackson, S. A., Meyers, B. C. & Wing, R. A. Evolution of plant genome architecture. Genome Biol. 17, 37 (2016).
Article PubMed PubMed Central CAS Google Scholar
Guyot, R. et al. Ancestral synteny shared between distantly-related plant species from the asterid (Coffea canephora and Solanum Sp.) and rosid (Vitis vinifera) clades. BMC Genom. 13, 103 (2012).
Article CAS Google Scholar
Razafinarivo, N. J. et al. Geographical gradients in the genome size variation of wild coffee trees (Coffea) native to Africa and Indian Ocean islands. Tree Genet. Genomes 8, 1345–1358 (2012).
Article Google Scholar
Guyot, R. et al. Partial sequencing reveals the transposable element composition of Coffea genomes and provides evidence for distinct evolutionary stories. Mol. Genet. Genom. 291, 1979–1990 (2016).
Article CAS Google Scholar
Jin, J.-Q. et al. Hongyacha, a naturally caffeine-free tea plant from Fujian, China. J. Agric. Food Chem. 66, 11311–11319 (2018).
Article CAS PubMed Google Scholar
Mizuno, K. et al. Isolation of a new dual-functional caffeine synthase gene encoding an enzyme for the conversion of 7-methylxanthine to caffeine from coffee (Coffea arabica L.). FEBS Lett. 534, 75–81 (2003).
Article CAS PubMed Google Scholar
Deng, C. et al. Metabolite and transcriptome profiling on xanthine alkaloids-fed tea plant (Camellia sinensis) shoot tips and roots reveal the Complex metabolic network for caffeine biosynthesis and degradation. Front. Plant Sci. 11, 551288 (2020).
Article PubMed PubMed Central Google Scholar
Deng, W. W., Rakotomalala, J.-J., Nagai, C. & Ashihara, H. Caffeine biosynthesis and purine metabolism in leaves of mascarocoffea species. Eur. Chem. Bull. 6, 223 (2017).
Article CAS Google Scholar
Ashihara, H. et al. Caffeine biosynthesis and adenine metabolism in transgenic Coffea canephora plants with reduced expression of N-methyltransferase genes. Phytochemistry 67, 882–886 (2006).
Article CAS PubMed Google Scholar
Davis, A. P., Tosh, J., Ruch, N. & Fay, M. F. Growing coffee: Psilanthus (Rubiaceae) subsumed on the basis of molecular and morphological data; implications for the size, morphology, distribution and evolutionary history of Coffea: Psilanthus subsumed in Coffea. Bot. J. Linn. Soc. 167, 357–377 (2011).
Article Google Scholar
Hamon, P. et al. Caffeine-free species in the genus coffea. Coffee Health Dis. Prev. 20, 39–44. https://doi.org/10.1016/B978-0-12-409517-5.00005-X (2015).
Article Google Scholar
Clifford, M. N., Gibson, C. L., Rakotomalala, J.-J.R., Cros, E. & Charrier, A. Caffeine from green beans of Mascarocoffea. Phytochemistry 30, 4039–4040 (1991).
Article CAS Google Scholar
Michelmore, R. W. & Meyers, B. C. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res. 8, 1113–1130 (1998).
Article CAS PubMed Google Scholar
Chevalier, A. Un nouveau Caféier sauvage de Madagascar à grains sans caféine. Rev. Bot. Appl. Agric. Colon. 17, 821–826 (1937).
Google Scholar
Dornano, M., Chassevent, F. & Pougneaud, S. Composition et caractéristiques chimiques de Coffea sauvages de Madagascar. II. Recherche de la caféine et d’autres méthylxanthines dans les feuilles et les graines de caféiers sauvages et cultivés. III. Cafamarine et trigonelline contenues dans les graines de trois caféiers sauvages. Café Cacao 11, 235–249 (1967).
CAS Google Scholar
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Article CAS PubMed PubMed Central Google Scholar
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Article PubMed PubMed Central CAS Google Scholar
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Article CAS PubMed PubMed Central Google Scholar
English, A. C. et al. Mind the Gap: Upgrading genomes with Pacific biosciences RS long-read sequencing technology. PLoS One 7, e47768 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 25, 20 (2009).
Article Google Scholar
Nachtweide, S. & Stanke, M. Multi-genome annotation with AUGUSTUS. In Gene Prediction, vol 1962 (ed. Kollmar, M.) 139–160 (Springer, 2019).
Google Scholar
Gremme, G., Brendel, V., Sparks, M. E. & Kurtz, S. Engineering a software tool for gene structure prediction in higher organisms. Inf. Softw. Technol. 47, 965–978 (2005).
Article Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Article PubMed PubMed Central CAS Google Scholar
Emms, D. M. & Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Article PubMed PubMed Central Google Scholar
McCarthy, E. M. & McDonald, J. F. LTR_STRUC: A novel search and identification program for LTR retrotransposons. Bioinformatics 19, 362–367 (2003).
Article CAS PubMed Google Scholar
Orozco-Arias, S. et al. Inpactor, integrated and parallel analyzer and classifier of LTR retrotransposons and its application for pineapple LTR retrotransposons diversity and dynamics. Biology 7, 32 (2018).
Article PubMed Central CAS Google Scholar
Yu, Y., Ouyang, Y. & Yao, W. shinyCircos: An R/Shiny application for interactive creation of Circos plot. Bioinformatics 34, 1229–1231 (2018).
Article CAS PubMed Google Scholar
Ming, R. et al. The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47, 1435–1442 (2015).
Article CAS PubMed PubMed Central Google Scholar
Sonnhammer, E. L. L. & Durbin, R. A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167, GC1–GC10 (1995).
Article CAS PubMed Google Scholar
Rice, P., Longden, I. & Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Article CAS PubMed Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Article CAS PubMed Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Contreras-Moreira, B. et al. Analysis of plant pan-genomes and transcriptomes with GET_HOMOLOGUES-EST, a clustering solution for sequences of the same species. Front. Plant Sci. 8, 20 (2017).
Article Google Scholar
Sievers, F. & Higgins, D. G. Clustal omega. Curr. Protoc. Bioinform. 48, 20 (2014).
Article Google Scholar
Gouy, M., Guindon, S. & Gascuel, O. SeaView Version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).
Article CAS PubMed Google Scholar
Gouet, P., Robert, X. & Courcelle, E. ESPript/ENDscript: Extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic Acids Res. 31, 3320–3323 (2003).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Dr. Stephane Dussert from IRD (CRB Coffea, Saint-Pierre, Reunion Island) for providing C. humblotiana.

Author information

Coralie Fournier
Present address: University of Geneva, CMU-Décanat, 1 Rue Michel Servet, 1211, Geneva 4, Switzerland
These authors contributed equally: Nathalie Raharimalala and Stephane Rombauts.

Authors and Affiliations

Centre National de Recherche Appliquée au Développement Rural, BP 1444, 101, Ambatobe, Antananarivo, Madagascar
Nathalie Raharimalala & Jean-Jacques Rakotomalala
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
Stephane Rombauts
European Molecular Biology Laboratory, 71 Avenue des Martyrs, CS 90181, 38042, Grenoble Cedex 9, France
Andrew McCarthy
Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas y Naturales, Universidad de Caldas, Manizales, Colombia
Andréa Garavito & Alexa Yadira Morales-Correa
Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
Simon Orozco-Arias
Universidad Autónoma de Manizales, Manizales, Colombia
Simon Orozco-Arias & Romain Guyot
Nestle Research, Société des Produits Nestlé SA, 1015, Lausanne, Switzerland
Sylviane Metairon, Coralie Fournier & Patrick Descombes
Nestle Research-Plant Science Research Unit, BP 49716, 37097, Tours Cedex 2, France
Laurence Bellanger, Solène Froger, Stéphane Michaux, Victoria Berry, Maud Lepelley & Dominique Crouzillat
Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, 14853, USA
Lukas Mueller
Institut de Recherche pour le Développement, UMR DIADE, Université de Montpellier, Montpellier, France
Emmanuel Couturon, Perla Hamon & Romain Guyot
VIB Center for Plant Systems Biology, 9052, Gent, Belgium
Stephane Rombauts
Centro de Bioinformática y biología computacional de Colombia – BIOS, Ecoparque los Yarumos, Manizales, Caldas, Colombia
Andréa Garavito

Authors

Nathalie Raharimalala
View author publications
You can also search for this author in PubMed Google Scholar
Stephane Rombauts
View author publications
You can also search for this author in PubMed Google Scholar
Andrew McCarthy
View author publications
You can also search for this author in PubMed Google Scholar
Andréa Garavito
View author publications
You can also search for this author in PubMed Google Scholar
Simon Orozco-Arias
View author publications
You can also search for this author in PubMed Google Scholar
Laurence Bellanger
View author publications
You can also search for this author in PubMed Google Scholar
Alexa Yadira Morales-Correa
View author publications
You can also search for this author in PubMed Google Scholar
Solène Froger
View author publications
You can also search for this author in PubMed Google Scholar
Stéphane Michaux
View author publications
You can also search for this author in PubMed Google Scholar
Victoria Berry
View author publications
You can also search for this author in PubMed Google Scholar
Sylviane Metairon
View author publications
You can also search for this author in PubMed Google Scholar
Coralie Fournier
View author publications
You can also search for this author in PubMed Google Scholar
Maud Lepelley
View author publications
You can also search for this author in PubMed Google Scholar
Lukas Mueller
View author publications
You can also search for this author in PubMed Google Scholar
Emmanuel Couturon
View author publications
You can also search for this author in PubMed Google Scholar
Perla Hamon
View author publications
You can also search for this author in PubMed Google Scholar
Jean-Jacques Rakotomalala
View author publications
You can also search for this author in PubMed Google Scholar
Patrick Descombes
View author publications
You can also search for this author in PubMed Google Scholar
Romain Guyot
View author publications
You can also search for this author in PubMed Google Scholar
Dominique Crouzillat
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.D., L.B., S.F., Syl.M., V.B., Ste.M., and C.F. performed sequencing, N.R., S.R., A.M.C, A.G., S.O.A., A.M.-C., M.L., L.M., E.C., P.H., J.J.R. performed analysis and R.G., D.C., and N.R. wrote the main manuscript.

Corresponding authors

Correspondence to Romain Guyot or Dominique Crouzillat.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Raharimalala, N., Rombauts, S., McCarthy, A. et al. The absence of the caffeine synthase gene is involved in the naturally decaffeinated status of Coffea humblotiana, a wild species from Comoro archipelago. Sci Rep 11, 8119 (2021). https://doi.org/10.1038/s41598-021-87419-0

Download citation

Received: 09 November 2020
Accepted: 23 March 2021
Published: 14 April 2021
DOI: https://doi.org/10.1038/s41598-021-87419-0

This article is cited by

High quality genomes produced from single MinION flow cells clarify polyploid and demographic histories of critically endangered Fraxinus (ash) species
- Steven J. Fleck
- Crystal Tomlin
- Victor A. Albert
Communications Biology (2024)
A chromosome-scale assembly reveals chromosomal aberrations and exchanges generating genetic diversity in Coffea arabica germplasm
- Simone Scalabrin
- Gabriele Magris
- Michele Morgante
Nature Communications (2024)
Selection signatures and population dynamics of transposable elements in lima bean
- Daniela Lozano-Arce
- Tatiana García
- Jorge Duitama
Communications Biology (2023)
Making headway toward enduring changes: perspectives on breeding tree crops through genome editing
- Renan Terassi Pinto
- Chanaka Roshan Abeyratne
- Vagner Augusto Benedito
Tree Genetics & Genomes (2023)
Multiplex CRISPR/Cas9-mediated knockout of the phytoene desaturase gene in Coffea canephora
- Tatiane Casarin
- Natália Chagas Freitas
- Luciano Vilela Paiva
Scientific Reports (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Genome sequencing, chromosome-level assembly and annotation

Comparative genomic analysis of C. humblotiana

Evolution of N-methyltransferase genes related to caffeine synthase in C. humblotiana

Biochemical analysis of caffeine and theobromine contents

Discussion

The C. humblotiana genome is highly collinear with C. canephora

LTR retrotransposons are mainly responsible of genome size variations

The C. humblotiana genome also shed light on the evolution of genes involved in the caffeine biosynthesis pathway

Methods

Plant material and biochemical analysis

Genome and RNA sequencing

Genome assembly

Genome annotation and gene prediction

Comparative genome analysis

Phylogenetic analysis (NMT proteins)

NMT protein-sequence alignment and structural analysis

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links