Comparison of eight complete chloroplast genomes of the endangered Aquilaria tree species (Thymelaeaceae) and their phylogenetic relationships

An Author Correction to this article was published on 22 April 2021

This article has been updated


Aquilaria tree species are naturally distributed in the Indomalesian region and are protected against over-exploitation. They produce a fragrant non-timber product of high economic value, agarwood. Ambiguous species delimitation and limited genetic information within Aquilaria are among the impediments to conservation efforts. In this study, we conducted a comparative analysis of the complete chloroplast (cp) genomes of eight Aquilaria species, seven of which were newly sequenced on the Illumina HiSeq X Ten platform and assembled de novo. Aquilaria cp genomes possess a typical quadripartite structure, with conserved gene order and genomic structure. Each cp genome is about 174 kbp long and encodes between 89 and 92 proteins, 38 tRNAs, and 8 rRNAs, with 27 genes duplicated in the IR (inverted repeat) region. In addition, 832 repeats (forward, reverse, palindrome and complement repeats) and nine highly variable regions were identified. Phylogenetic analysis of the cp genome data set resolved the Aquilaria topology with strong support values, and the topology matches the geographic distribution pattern of the species. In summary, the complete cp genomes will facilitate development of species-specific molecular tools to discriminate Aquilaria species and resolve the evolutionary relationships of members of the Thymelaeaceae family.


Aquilaria Lam. is a tropical tree genus from the family Thymelaeaceae. Members of this taxon are widely distributed in the Indomalesia region, but their numbers are declining in the wild. Consequently, they have entered various Red List categories of the International Union for Conservation of Nature (IUCN). The latest IUCN publication placed four species (A. crassna, A. khasiana, A. malaccensis, and A. rostrata) under “Critically Endangered”; one species (A. microcarpa) as “Endangered”; nine species (A. banaensis, A. beccariana, A. cumingiana, A. decemcostata, A. filaria, A. hirta, A. rugosa, A. sinensis, A. yunnanensis) as “Vulnerable”; and seven species (A. apiculata, A. baillonii, A. brachyantha, A. citrinicarpa, A. parvifolia, A. subintegra, A. urdanetensis) as “Data Deficient” (IUCN, accessed on 6 November 2019). Aquilaria produces a fragrant resin (agarwood) as a self-healing response to external wounding of its trunk, branches and roots1. Agarwood is a valuable raw material in the production of perfumes, incense and traditional medicines2. The high demand for agarwood has led to illegal logging and non-selective felling, because only a low percentage of trees in the wild contain naturally occurring agarwood2,3. The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) has listed all Aquilaria species under Appendix II as one of the countermeasures to reduce illegal agarwood trade2,4.

Agarwood is chiefly traded in wood form or as consumer products, yet Aquilaria species delimitation is based on the plant’s botanical characteristics, with much emphasis given to the reproductive parts, specifically flower and fruit5. Because such distinguishing morphological characteristics are often incomplete, genetic detection tools are indispensable during the agarwood identification process. Recent approaches have utilized short DNA sequences to identify Aquilaria species and the species-origin of agarwood6,7,8,9. Short cp gene sequences have been adequate for barcoding some land plants10; unfortunately, it has been shown that Aquilaria requires highly evolved DNA barcodes. Several DNA barcoding loci have been tested in Aquilaria, yet their discrimination power across a majority of the Aquilaria species is insufficient7.

A recent phylogenetic analysis utilizing five non-coding cp DNA regions from 15 species (Aquilarieae) yielded inconclusive resolution due to the high percentage of conserved sites11. Meanwhile, in the same study, the nuclear ribosomal DNA internal transcribed spacer (ITS) revealed a paraphyletic relationship between Aquilaria species from Indochina and Malesia. It has been suggested that genome-scale data could provide better resolution and more thorough information on a studied genus with respect to taxonomy, genetic diversity, and pattern of evolution11.

The chloroplast is an important organelle that contributes to the growth and development of most plants and plays important roles in photosynthesis and carbon fixation12. The chloroplast, or plastid, was originally derived from a free-living photosynthetic prokaryote; present-day chloroplasts arose from a cyanobacterial endosymbiont13. Chloroplasts still exhibit many prokaryotic characteristics such as having a circular DNA, reproducing in a similar way as bacteria through division, and importing nuclear encoded proteins through thylakoids14. In general, the cp genomes in angiosperms are circular DNA molecules with a highly conserved region, gene content, and gene order, and the standard cp genome size ranges between 120 and 170 kbp in length15. A typical cp genome consists of a pair of inverted repeat (IR) regions separated by a large single copy (LSC) region and a small single copy (SSC) region16. The advancement of high-throughput sequencing technology has amassed thousands of complete cp genomes from various land plants in just several years17. A total of 23,867 complete cp genomes of land plants have been sequenced (NCBI, accessed on 12 February 2020). The maternal inheritance of the cp genome has provided exclusive and substantial information for plant systematics and evolutionary relationships18. Cp genomes have been used in species identification, phylogeny and population genetic analyses17,19,20. Potential markers can be developed through cp genome analysis for identification of plant species, particularly for taxonomically complicated groups10,21.

Due to the small genome size, high interspecific and low intraspecific divergence, and ease of handling, cp genome is an attractive alternative to provide more variations in discriminating closely related plants10. To date, research on Aquilaria genomics has yielded a draft whole genome of A. agallocha22 and complete sequences of six cp genomes, one each from A. crassna23, A. malaccensis24, and A. yunnanensis25, and three from A. sinensis18,26,27. In this study, we report complete cp genome sequences of seven Aquilaria species and incorporate these new sequences together with another published cp genome from our group (A. malaccensis)24 in all our analyses. In addition, we also completely sequenced the cp genomes of Gonystylus affinis and Phaleria macrocarpa and retrieved complete cp sequences of another six species, all of which are under the Thymelaeaceae family, to determine their molecular placement within the phylogenetic tree. Collectively, we provide a rich genomic resource to better understand Aquilaria, which may help facilitate the conservation efforts of these endangered species.

Materials and methods

Sample materials

Fresh leaf samples were collected from individual trees growing in the greenhouse of the Faculty of Forestry and Environment, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia, and the Aquilaria germplasm of Forest Research Institute of Malaysia (FRIM), Kepong, Selangor, Malaysia. For comparative analysis, seven species, A. beccariana, A. hirta, A. microcarpa, A. rostrata, A. crassna, A. sinensis, and A. subintegra, were sequenced. The first four species are native to Malaysia, while the remaining three are introduced plantation species in the country. For phylogenetic analysis, the cp genomes of G. affinis and P. macrocarpa, two close relatives of Aquilaria, were also sequenced.

DNA extraction and sequencing

A total of 100 mg of fresh leaves was pulverized into powder using mortar and pestle, with the aid of liquid nitrogen. Total genomic DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method28. The quantity and quality of the DNA samples were determined using the Qubit dsDNA BR assay (Life Technologies, Carlsbad, CA, USA) following the manufacturer's instructions. DNA samples were fragmented using sonication, purified and end-repaired. Fragment sizes, determined by gel electrophoresis, were between 200 and 500 bp. A genomic library with an insert size of 300 bp was prepared using TruSeq DNA Sample Prep Kit (Illumina, CA, USA) and next-generation sequencing was conducted on a HiSeq X Ten platform (Illumina, USA).

Chloroplast genome assembly and annotation

Approximately 8 Gb of raw data that consisted of 150-bp paired-end reads were generated and the sequence adaptors for the raw reads were trimmed off using the base quality control software NGS QC Toolkit v2.3.329. The cp genome was assembled using NOVOPlasty v3.8.230, with the rbcL sequence of A. yunnanensis (KR528756) as the seed sequence. The assembled cp genome sequence was annotated using online annotation tool GeSeq31, and further compared manually against A. yunnanensis cp genome (MG656407). The circular cp genome maps were visualized using OGDRAW v1.3.132.

Comparative analysis of Aquilaria chloroplast genomes

For comparative cp genome analysis, the sequence of A. malaccensis (MH286934) was included. Base composition and GC content were determined using DNA Baser Sequence Assemble v5.15 and Emboss, respectively. Sequences were aligned using MAFFT v733 with default settings (strategy of FFT-NS-2) and then transferred into DnaSP v5.10.134 to identify nucleotide diversity in the total genome, LSC, SSC and IR regions. The boundaries between the IR and SC regions were further evaluated manually to examine the differences in length variation in the cp genomes of Aquilaria.
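As an illustration of what base composition and GC content mean here, the following is a minimal Python sketch. It is not the procedure used by the tools named above; the function names `base_composition` and `gc_content` and the toy sequence are hypothetical, and ambiguous bases are simply ignored as an assumption.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, T, G and C among the unambiguous bases of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")  # assumption: N and other codes are skipped
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def gc_content(seq: str) -> float:
    """GC content in percent, derived from the base composition."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


if __name__ == "__main__":
    toy = "ATGCGCATTA"  # hypothetical toy sequence, not real data
    print(base_composition(toy))
    print(gc_content(toy))
```

Applied separately to the LSC, SSC and IR segments of a genome, the same calculation would give the per-region percentages of the kind reported in the results.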

Repeat structure analysis and identification of highly variable regions

Repeat sequences, comprising forward (F), reverse (R), complement (C) and palindrome (P) repeats, were identified using REPuter35, with the maximum and minimum repeat size set at 50 and 30, respectively, and Hamming distance ≤ 3. To identify highly variable regions, polymorphic sites and nucleotide variability (Pi) in the eight MAFFT-aligned cp genomes were evaluated using the sliding window analysis available in DnaSP v5.10.01, with a 200-bp step size and a 600-bp window length. Regions in which the number of polymorphic sites exceeds the average plus twice the standard deviation are regarded as highly variable regions in the cp genome34.
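The sliding window analysis and the "average plus twice the standard deviation" cutoff described above can be sketched in Python as follows. This is a simplified illustration and not DnaSP's implementation: Pi is taken as the mean proportion of pairwise differences per site, columns with gaps or ambiguous bases are skipped, and the sample standard deviation is used; all of these are assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def column_pi(column):
    """Proportion of sequence pairs that differ at one alignment column."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def sliding_pi(aligned, window=600, step=200):
    """Return (window midpoint, Pi) for each window of an uppercase alignment."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, length - window + 1, step):
        columns = [[s[i] for s in aligned] for i in range(start, start + window)]
        # Assumption: columns containing gaps or ambiguous bases are ignored.
        valid = [c for c in columns if all(b in "ACGT" for b in c)]
        pi = sum(column_pi(c) for c in valid) / len(valid) if valid else 0.0
        results.append((start + window // 2, pi))
    return results


def flag_high(values):
    """Indices of values above mean + 2 * standard deviation (sample SD assumed)."""
    threshold = mean(values) + 2 * stdev(values)
    return [i for i, v in enumerate(values) if v > threshold]
```

In use, `flag_high` would be applied to the per-window counts of polymorphic sites, so that only windows well above the genome-wide background are reported as highly variable.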

Phylogenetic analysis

Phylogenetic analysis was performed to determine relatedness of the seven Aquilaria cp genomes sequenced in this study and four publicly available Aquilaria cp genomes: (1) A. crassna (MK779998)23, (2) A. sinensis (KT148967)18, (3) A. yunnanensis (MG656407)25, and (4) A. malaccensis (MH286934)24. To place Aquilaria in relation to other species in the family Thymelaeaceae, sequences from G. affinis and P. macrocarpa, and three available accessions: (1) Daphne tangutica (MK455880)36, (2) Daphne kiusiana (KY991380)37, and (3) Stellera chamaejasme (MK681211)38, were included. Neobalanocarpus heimii (MH746730)39 and Eucalyptus grandis (HM347959)40 were used as outgroups. Sequences were aligned using MAFFT v733 with default settings (strategy of FFT-NS-2). Phylogenetic analyses were subsequently performed using Maximum likelihood (ML) and Bayesian inference (BI) methods. Maximum likelihood (ML) analyses were performed using IQ-TREE v.1.4.241 with branch support estimated using 2,000 replicates of both SH-like approximate likelihood-ratio test (SH-aLRT)42 and the ultrafast bootstrapping algorithm (UFboot)43. The ModelFinder option was used to identify the optimal partitioning scheme and substitution models44, in which the DNA substitution model that is most suitable for our dataset was transversion model (TVM) with empirical base frequencies (+ F) and discrete Gamma model with default 4 rate categories (+ G4) (= TVM + F + G4). The phylogenetic tree was rooted using E. grandis and visualized using Figtree v1.4.445. Bayesian inference (BI) analyses were performed using the program MrBayes v3.2.746. Markov chain Monte Carlo (MCMC) simulations were run twice independently for 2 million generations, and sampling trees every 100 generations. Convergence was determined by examining the average standard deviation of split frequencies (≤ 0.01). The first 25% of trees was discarded as burn-in, and the remaining trees were used to build a majority-rule consensus tree.

Results and discussion

Chloroplast genome sequencing

Approximately 60,000,000 raw reads were obtained for each species sequenced using the HiSeq X Ten system. Raw reads were inserted directly into the pipeline without filtering or quality trimming to obtain maximum useful data. To accelerate the assembly of plastid genomes, we selected only the first 13.6 million sequences of each paired-end data set, yielding a total of 15.4 Gb clean data after sequencing. For base quality assessment, 94.7% of Q30 bases were obtained. All newly sequenced cp genomes of Aquilaria species with gene annotations have been deposited in GenBank (A. beccariana, MN125347; A. crassna, MN125348; A. hirta, MN125349; A. microcarpa, MN125350; A. rostrata, MN125351; A. sinensis, MN147870; A. subintegra, MN147871). In addition, we also sequenced cp genomes of selected species of the two closely related taxa, G. affinis (MN147872) and P. macrocarpa (MN147873), due to the limited number of available cp genome sequences for Thymelaeaceae.

Complete chloroplast genomes of Aquilaria species

The cp genomes of the Aquilaria species differ in size by only 68–214 bp, from the smallest, 174,693 bp (A. rostrata), to the biggest, 174,907 bp (A. sinensis) (Table 1). All eight Aquilaria cp genomes share a typical quadripartite structure composed of a pair of IRs known as IRA and IRB, and a single LSC and SSC (Fig. 1). In addition, the gene content and order are highly similar. This agrees with the consensus that the genomic structure in cp genomes of angiosperms is highly conserved15,17. The length of LSC ranged from 87,221 bp (A. hirta) to 87,355 bp (A. sinensis), while the length of SSC and IR ranged from 3,233 bp (A. rostrata) to 3,347 bp (A. malaccensis), and from 42,085 bp (A. microcarpa) to 42,102 bp (A. sinensis), respectively. The GC content is highest in IR (~ 38%), moderate in LSC (~ 34%) and lowest in SSC (~ 29%) (Table 1). Furthermore, the GC contents of LSC and SSC in all Aquilaria species are much lower than that of IR (Table 1) because of the reduction of AT nucleotides in the five rRNA genes (rrn23s, rrn16s, rrn12s, rrn4.5s, and rrn5s)47,48 (Table 2). The GC content of the Aquilaria species in our study is similar to that reported in A. yunnanensis (38%)25 and A. sinensis (36.7%)26. The contraction and expansion of IR region boundaries are considered the primary mechanism that affects the varying lengths in angiosperm cp genomes, as demonstrated in Apiales49 and Trochodendraceae50. However, in this study, variations were in fact detected at the LSC/IRA, IRA/SSC, SSC/IRB, and IRB/LSC border regions of the Aquilaria cp genomes (Fig. 3). When comparing the boundary (IR/SC) regions between Aquilaria species and two of their Thymelaeaceae relatives, S. chamaejasme (MK681211) and D. kiusiana (KY991380), they all share nearly identical genes at the border junctions.

Table 1 Summary of the assembly data of eight Aquilaria chloroplast genomes.
Figure 1

Chloroplast genome maps of eight Aquilaria species (A. beccariana, A. crassna, A. hirta, A. malaccensis, A. microcarpa, A. rostrata, A. sinensis and A. subintegra). Genes inside outer ring are transcribed clockwise, while genes outside outer ring are transcribed counterclockwise. Genes of different functional groups are shown in colored bars. The inner circle (dashed gray area) indicates the proportional GC content of the corresponding genes. Regions of the large single-copy (LSC), small single-copy (SSC) and inverted repeats (IRA and IRB) are indicated.

Table 2 List of annotated genes in Aquilaria chloroplast genomes.

The number of encoded functional genes from the species we sequenced ranged from 135 to 138 (Table 1), which is not markedly different from the 137 reported in A. sinensis26, 142 in A. yunnanensis (MG656407) and 125 in A. crassna (MK779998). There are 38 tRNA genes, eight rRNA genes, and 89–92 protein coding genes in each cp genome (Tables 1 and 2). The IR region contains five rRNA genes and 22 other single repeat genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnL-UAG, trnN-GUU, trnR-ACG, trnV-GAC, rps7, rps12, rps15, rpl2, rpl23, ndhA, ndhB, ndhD, ndhE, ndhG, ndhH, ndhI, psaC, and ycf15). There are slight differences between the Aquilaria cp genomes, such as the number of genes present, even though the cp genomes of land plants are generally considered highly conserved51. For example, several genes appeared to be species-specific: rps18 in A. malaccensis and A. microcarpa, rpl16 in A. hirta and A. malaccensis, and ycf2 in A. malaccensis (Table 2). The presence of a specific gene in some Aquilaria species and its absence from others could be due to the gene being transferred to the nucleus. Gene transfer events have been observed in gene knockout experiments, such as for rps18 in tobacco52. Such an event most likely happens when plants are exposed to biotic and abiotic stresses, which induce the accumulation of reactive oxygen species (ROS). ROS activate signalling pathways at low levels, but can cause irreparable injury to cells when produced excessively53,54. ROS are a normal product of plant cellular metabolism that can be affected by various types of stress55. ROS generated in chloroplasts can also act as signals that travel from the cp to the nucleus under stress conditions56. Since signals move from the chloroplast to the nucleus under stress conditions, this signalling may also promote the transfer of chloroplast genome fragments to the nucleus, where they could be incorporated into the nuclear genome54.

When comparing the eight cp genomes for base/nucleotide composition, in LSC, A. malaccensis has the highest percentage of A (34%) and G (18%) nucleotides, but the lowest percentage in T (32%) and C (16%) nucleotides (Fig. 2). In SSC, A. sinensis has the highest percentage of A (41%) nucleotide, while A. crassna and A. hirta have the highest percentage of T (32%) and C (16%) nucleotides, respectively. The overall A + T content is more than 50% when compared to G + C content (Fig. 2). This study shows that Aquilaria cp genomes have high levels of A + T content, a feature generally observed in many cp genomes sequences of angiosperm species57.

Figure 2

Base composition in the eight Aquilaria chloroplast genomes. The percentage (%) of each nucleotide (A, T, G, C) content in the three different regions, large single-copy (LSC), small single-copy (SSC) and inverted repeats (IR) region, are shown. Data for A. malaccensis were from Lee et al.24.

Interspecies chloroplast genome sequence analysis

Multiple cp genome sequence alignment of the eight Aquilaria cp genomes with a total of 174,832 nucleotide sites revealed 697 variable (polymorphic) sites including 405 singleton variable sites (SVS) and 292 parsimony informative sites (PIS) (Table 3). There are two different categories under SVS, 403 sites with two variants (SV2V) and two sites with three variants (SV3V). Similarly, the PIS also has two variants (PIS2V) (288 sites) and three variants (PIS3V) (4 sites) (Table 3). In the SV2V category, A. rostrata has the highest number of SVS (A = 17, T = 22, G = 47, and C = 45), followed by A. hirta (A = 30, T = 19, G = 35, and C = 35) and A. sinensis (A = 23, T = 24, G = 40, and C = 28), while A. crassna has none (Table 4). In the PIS2V category, A. hirta has the highest number of PIS for nucleotide A (81), while A. crassna has the highest for nucleotide T (90). In summary, most of the variable sites were identified in A. hirta, A. rostrata and A. sinensis (Table 4). The information on SVS and PIS is useful for species identification studies and for determining phylogenetic relationships58,59.
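The site categories above follow the usual definitions: a parsimony informative site has at least two nucleotide variants that each occur in at least two sequences, and a singleton variable site is any other polymorphic site. The reported counts are internally consistent (403 + 2 = 405 SVS, 288 + 4 = 292 PIS, and 405 + 292 = 697 variable sites). A minimal Python sketch of this classification is given below; the function names are hypothetical, and excluding columns with gaps or ambiguous bases is an assumption rather than a documented setting of the original analysis.

```python
from collections import Counter


def classify_site(column: str) -> str:
    """Label one alignment column as invariable, SV<n>V, PIS<n>V or excluded."""
    counts = Counter(column.upper())
    if any(base not in "ACGT" for base in counts):
        return "excluded"  # assumption: gapped or ambiguous columns are skipped
    variants = len(counts)
    if variants < 2:
        return "invariable"
    # Informative: at least two variants are each present in two or more sequences.
    informative = sum(1 for n in counts.values() if n >= 2) >= 2
    return f"{'PIS' if informative else 'SV'}{variants}V"


def tally_sites(aligned):
    """Count site categories over all columns of an alignment."""
    return Counter(classify_site("".join(col)) for col in zip(*aligned))
```

For eight aligned sequences, a column such as `AAAAAAAT` would be labelled SV2V, while `AAAATTTT` would be labelled PIS2V.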

Table 3 Variable site analysis shows the presence of singleton variable sites (SV) and parsimony informative sites (PIS) in the eight Aquilaria chloroplast genomes.
Table 4 Number of nucleotides in variable site analysis among eight Aquilaria species.

IR contraction and expansion

Close examination of the IR/SC boundary regions among the eight Aquilaria species revealed three main differences (Fig. 3). Firstly, the rps19 gene (284 bp) extends beyond the LSC into the IRA region by 15 bp in all species. Secondly, the ndhF gene spans the IRA/SSC border, with 25 to 28 bp in the IRA region and 2,211 bp in the SSC region, except in A. microcarpa, where it lies completely in the SSC region, 6 bp away from the IRA region. Thirdly, in all eight Aquilaria species, the rpl32 and trnL genes are in the SSC region and IRB region, respectively, although with slight differences in their distance from the SSC/IRB border. No differences were observed in the IRB/LSC border region; the rpl2 gene is in the IRB region and the trnH gene is in the LSC region. In general, the IR region is one of the main reasons for a change in the size of the cp genome due to expansion, shrinkage and loss of the IR60.

Figure 3

Comparison of the border regions of LSC, SSC, and IR among the eight Aquilaria chloroplast genomes.

Large sequence repeat analyses

The large sequence repeats (LSR) of the eight Aquilaria cp genomes were analyzed using REPuter software. A total of 832 repeats (at least 30 bp per repeat unit with Hamming distance = 3), including forward (F), reverse (R), palindromic (P) and complement (C) repeats, were identified (Table 5, and Supplementary Tables 1 to 31). In general, F repeats are the most common type detected in the Aquilaria cp genomes, while C repeats are the least common. Among the eight species, C repeats are absent from A. rostrata, although it has 50 F, 2 R, and 48 P repeats (Fig. 4). Large repeat sequences are informative for phylogenetic studies of Aquilaria species as they play important roles in cp genome evolution and may aid in future development of molecular markers61.

Table 5 Repeat sequence analysis in eight Aquilaria cp genomes with different hamming distance.
Figure 4

Large sequence repeat (LSR) analysis. The eight Aquilaria chloroplast genomes have four repeat types, which are forward (F), reverse (R), palindrome (P) and complement (C).

Frequent variation in repeat regions in most angiosperm plants occurs due to slipped-strand mispairing and illegitimate recombination. This variation also plays an important role in sequence variation and rearrangement in cp genomes20,62. In addition, the number of identified repeats is sensitive to the Hamming distance used. For example, when we reduced the Hamming distance from 3 to 1 (in other words, made the matching criterion more stringent), fewer repeat sequences were detected (Table 5).
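The effect of the Hamming distance on repeat counts can be illustrated with a brute-force Python sketch. This is not REPuter's algorithm; it only compares every pair of fixed-length windows for forward repeats, and the function names and toy sequence are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def forward_repeat_pairs(seq: str, size: int, max_dist: int):
    """Start positions (i, j), i < j, of windows that match within max_dist mismatches."""
    hits = []
    last_start = len(seq) - size
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            if hamming(seq[i:i + size], seq[j:j + size]) <= max_dist:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    toy = "ACGTGGACGA"  # hypothetical toy sequence, not real data
    print(forward_repeat_pairs(toy, 4, 0))  # exact matches only
    print(forward_repeat_pairs(toy, 4, 1))  # one mismatch allowed
```

Because every pair accepted at a smaller distance is also accepted at a larger one, lowering the allowed Hamming distance can only keep or reduce the number of reported repeats.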

Identification of highly variable regions within the Aquilaria cp genomes

Using the alignment created by MAFFT and DnaSP software, the nucleotide variability (Pi) values within 600 bp window were calculated in all eight cp genomes. They are in the range from 0 to 0.01370 (Fig. 5). There are nine highly divergent regions (Pi > 0.005), divided between the intergenic spacer (IGS) region (trnD-trnY, trnT-trnL, trnL-trnF, trnF-ndhJ, trnV-trnM) and the coding sequence (CDS) regions (matK-rps16, rpoC1-rpoC2, petA-cemA and rpl32) (Fig. 5). In total, there are 144 variable sites, 72 parsimony informative sites and Pi values from 0.00630 to 0.01370, in the nine regions (Table 6). Among these, rpl32 has the most nucleotide variation (0.01370). Meanwhile, we found that the IR region is extremely conserved (Pi < 0.005) because highly variable region/divergent sequences were not found.

Figure 5

Nucleotide variability values compared between the eight chloroplast genomes of Aquilaria using the window sliding analysis (window length: 600 bp and step size: 200 bp). X-axis indicates the position of the midpoint of the window, while Y-axis indicates the nucleotide diversity of each window.

Table 6 Nine regions of highly variable sequences in Aquilaria.

Phylogenetic analysis

For construction of phylogenetic trees, the Maximum Likelihood (ML) analyses were performed via IQ-TREE v.1.4.2 software41 and Bayesian Inference (BI) analyses were performed via MrBayes v3.2.7 software46 using the complete cp genomes of 11 accessions (nine Aquilaria species), and the results are summarized in Fig. 6. Similar phylogenetic topologies were found in the ML and BI nucleic acid analyses. The nine Aquilaria species diverged into two major clades (Clade 1 and 2) showing a paraphyletic relationship, with strong support as indicated by the high SH-aLRT, UFBoot, and posterior probability values (100%, 100%, and 1, respectively) (Fig. 6A). Clade 1 has three species of Malay Peninsula origin (A. hirta, A. beccariana and A. malaccensis) and one species of Borneo origin (A. microcarpa). They come from a recent common ancestor (99.7%, 99%, and 1). A branch is considered reliable when the SH-aLRT, UFBoot, and posterior probability values are > 80%, > 95% and > 0.75, respectively46,63. Clade 2 further diverged into two sub-clades. Sub-clade 2a shows that A. rostrata is genetically distanced from A. crassna and A. subintegra (99.9%, 100%, and 1). Aquilaria rostrata is an endemic species of the Malay Peninsula, confined to high altitudes64. We also showed that A. crassna (MK779998), which originated from Cambodia, and A. subintegra (MN147871) from Thailand are sisters. However, the low support values (85.1%, 85%, and 1) suggest that local hybridization might have occurred65, although this needs further investigation. We have observed in the field that A. crassna and A. subintegra have very similar morphological features5. Interestingly, when comparing the polymorphic sites, except for the SV2V category, these two species have the same number of nucleotides in the PIS3V category (Table 4). Meanwhile, sub-clade 2b shows that two Aquilaria species of China origin (A. sinensis and A. yunnanensis) are grouped together with strong support values (100%, 100%, and 1) (Fig. 6A). We conclude that the phylogenetic positions within the Aquilaria species reported here correspond well with their natural geographic distribution pattern. When compared to the recent Aquilaria phylogenetic tree constructed using a concatenated dataset of five cp gene sequences (matK, rbcL, trnL intron, trnL-trnF, and psbC-trnS)11, a consistent clustering pattern was observed. However, our cp genome-ML tree has higher statistical support. This shows that comparative analysis using complete cp genomes reveals a greater abundance of informative characters than the short gene fragments in Aquilaria.
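The reliability rule stated above (SH-aLRT > 80%, UFBoot > 95%, posterior probability > 0.75) can be written as a small Python check. This is only an illustrative sketch; the function name `is_reliable` is hypothetical, and the support triplets are the values quoted in the text.

```python
def is_reliable(sh_alrt: float, ufboot: float, posterior: float) -> bool:
    """True when all three support values exceed the thresholds given in the text."""
    return sh_alrt > 80.0 and ufboot > 95.0 and posterior > 0.75


if __name__ == "__main__":
    # Support values (SH-aLRT %, UFBoot %, posterior probability) quoted in the text.
    branches = {
        "Clade 1 common ancestor": (99.7, 99.0, 1.0),
        "A. crassna + A. subintegra": (85.1, 85.0, 1.0),
        "A. sinensis + A. yunnanensis": (100.0, 100.0, 1.0),
    }
    for name, support in branches.items():
        print(name, is_reliable(*support))
```

Under these thresholds, the A. crassna and A. subintegra branch fails on its UFBoot value alone, which is consistent with the text describing its support as low.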

Figure 6

Phylogram depicting the relationships among different Aquilaria species (A), and molecular placement of Aquilaria genus in the family Thymelaeaceae (B), estimated using maximum likelihood (ML) and bayesian inference (BI) analysis in IQ-TREE and MrBayes. The data set was partitioned by the optimal scheme identified using the ModelFinder option of IQ-TREE. The bootstrap was set at 2000 replicates and the support value is indicated at each branch, where the first number indicates SH-aLRT value, the second number indicates UFboot value and the third number indicates posterior probability (pp) value. Five closely related taxa, Daphne kiusiana (KY991380), Daphne tangutica (MK455880), Gonystylus affinis (MN147872), Phaleria macrocarpa (MN147873), and Stellera chamaejasme (MK681211), and two outgroups, Eucalyptus grandis (HM347959) and Neobalanocarpus heimii (MH746730), were included. Species sequenced in this study are in bold. GenBank accession numbers are indicated for each species.

Figure 6B exhibits Aquilaria’s position in relation to other member taxa in the family Thymelaeaceae. All Aquilaria species clustered into a strongly supported clade (100%, 100%, and 1) after G. affinis. Aquilaria is the first to diverge from Gonystylus, followed by Phaleria and Daphne (Fig. 6B). Both latter taxa are under the Daphneae tribe, placed as sister to the Aquilarieae tribe in the subfamily Thymelaeoideae. Our findings are in agreement with the classification system by Herber (2003)66, who proposed two major subfamilies for Thymelaeaceae, Octolepidoideae (Gonystylus) and Thymelaeoideae (s.l.). The latter subfamily is further divided into tribes Aquilarieae (Aquilaria), Daphneae (Phaleria, Daphne, and Stellera) and Synandrodaphneae. Similarly, our results complement the classification system of Domke (1934)67, as shown by Beaumont et al.68 through a phylogenetic analysis involving 143 specimens from members of the Thymelaeaceae family and the combined dataset of rbcL + trnL-trnF + ITS. Aquilaria, placed under the subfamily Aquilarioideae, is shown to evolve after the Gonystyloideae (Gonystylus), and to be sister to Thymelaeoideae (Phaleria, Daphne, and Stellera)66.


In this study, we report new complete cp genome sequences from seven Aquilaria species and analyzed these genomes together with another that we recently published. The eight Aquilaria cp genomes were similar in genome content, structure, and gene order. Comparison of the eight Aquilaria cp genomes revealed 832 LSR and nine divergent regions (trnD-trnY, trnT-trnL, trnL-trnF, trnF-ndhJ, trnV-trnM, matK-rps16, rpoC1-rpoC2, petA-cemA and rpl32). Both ML and BI phylogenetic analyses strongly supported the phylogenetic positions within the Aquilaria species, which agree with their natural geographic distribution pattern. We have successfully revealed the complete cp genome sequences for eight Aquilaria species, of which five are native to Malaysia. Future studies should identify potential molecular markers to provide a clear discrimination between these important and closely related genetic resources.


References

  1. Liu, Y. et al. Whole-tree agarwood-inducing technique: an efficient novel technique for producing high-quality agarwood in cultivated Aquilaria sinensis trees. Molecules 18(3), 3086–3106 (2013).
  2. Azren, P. D., Lee, S. Y., Emang, D. & Mohamed, R. History and perspectives of induction technology for agarwood production from cultivated Aquilaria in Asia: a review. J. For. Res. 30(1), 1–11 (2019).
  3. Ng, L. T. A review on agar (gaharu) producing Aquilaria species. J. Trop. Forest Prod. 2, 272–285 (1997).
  4. UNEP-WCMC. 2019. The checklist of CITES species website. Compiled by UNEP-WCMC, Cambridge, UK. CITES Secretariat, Geneva. Accessed 2019 June 28.
  5. Lee, S. Y. & Mohamed, R. The origin and domestication of Aquilaria, an important agarwood-producing genus. In Agarwood 1–20 (Springer, Singapore, 2016).
  6. Jiao, L., Yin, Y., Cheng, Y. & Jiang, X. DNA barcoding for identification of the endangered species Aquilaria sinensis: comparison of data from heated or aged wood samples. Holzforschung 68(4), 487–494 (2014).
  7. Lee, S. Y., Ng, W. L., Mahat, M. N., Nazre, M. & Mohamed, R. DNA barcoding of the endangered Aquilaria (Thymelaeaceae) and its application in species authentication of agarwood products traded in the market. PLoS ONE 11(4), e0154631 (2016).
  8. Tanaka, S. & Ito, M. DNA barcoding for identification of agarwood source species using trnL-trnF and matK DNA sequences. J. Nat. Med. 74(1), 42–50 (2020).
  9. Lee, S. Y., Turjaman, M. & Mohamed, R. Phylogenetic relatedness of several agarwood-producing taxa (Thymelaeaceae) from Indonesia. Trop. Life Sci. Res. 29(2), 13 (2018).
  10. Li, X. et al. Plant DNA barcoding: from gene to genome. Biol. Rev. 90(1), 157–166 (2015).
  11. Farah, A. H. et al. Genome size, molecular phylogeny, and evolutionary history of the tribe Aquilarieae (Thymelaeaceae), the natural source of agarwood. Front. Plant Sci. 9, 712 (2018).
  12. Neuhaus, H. E. & Emes, M. J. Nonphotosynthetic metabolism in plastids. Annu. Rev. Plant Biol. 51(1), 111–140 (2000).
  13. Ku, C. et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524(7566), 427–432 (2015).
  14. Finkemeier, I. & Leister, D., 2001. Plant chloroplasts and other plastids. eLS.
  15. Wicke, S., Schneeweiss, G. M., Depamphilis, C. W., Müller, K. F. & Quandt, D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol. Biol. 76(3–5), 273–297 (2011).
  16. Sugiura, M. The chloroplast genome. In 10 Years Plant Molecular Biology 149–168 (Springer, Dordrecht, 1992).
  17. Yu, X. Q., Drew, B. T., Yang, J. B., Gao, L. M. & Li, D. Z. Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny. PLoS ONE 12(6), e0178026 (2017).
  18. Wang, Y. et al. Complete chloroplast genome sequence of Aquilaria sinensis (Lour.) Gilg and evolution analysis within the Malvales order. Front. Plant Sci. 7, 280 (2016).
  19. Zhang, Y. J., Ma, P. F. & Li, D. Z. High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS ONE 6(5), e20596 (2011).
  20. Zhang, Y. et al. The complete chloroplast genome sequences of five Epimedium species: lights into phylogenetic and taxonomic analyses. Front. Plant Sci. 7, 306 (2016).
  21. Chen, J. et al. The complete chloroplast genome sequence of the relict woody plant Metasequoia glyptostroboides Hu et Cheng. Front. Plant Sci. 6, 447 (2015).
  22. Chen, C. H. et al. Identification of cucurbitacins and assembly of a draft genome for Aquilaria agallocha. BMC Genom. 15(1), 578 (2014).
  23. Li, G. D., Rao, P. Y., Guo, J. L. & Zhang, Y. H. The complete chloroplast genome of a critically endangered agarwood tree, Aquilaria crassna (Thymelaeaceae). Mitochondrial DNA B 4(1), 1810–1811 (2019).
  24. Lee, S. Y., Ng, W. L., Mohamed, R. & Terhem, R. The complete chloroplast genome of Aquilaria malaccensis Lam. (Thymelaeaceae), an important and threatened agarwood-producing tree species. Mitochondrial DNA B 3(2), 1120–1121 (2018).
  25. Zhang, Y. H., Huang, Y., Li, Z. M. & Zhang, S. D. Characterization of the complete chloroplast genome of the vulnerable agarwood tree, Aquilaria yunnanensis (Thymelaeaceae). Conserv. Genet. Resour. 11(2), 161–164 (2019).
  26. Lin, C. P. et al. The complete chloroplast genome of agarwood producing species, Aquilaria sinensis (Lour.) Gilg: a species on IUCN red list. Mitochondrial DNA B 4(2), 2992–2993 (2019).
  27. Deng, X. et al. Characterization of the complete chloroplast genome of Aquilaria sinensis, an endangered agarwood-producing tree. Mitochondrial DNA B 5(1), 422–423 (2020).
  28. Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1(5), 2320 (2006).
  29. Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7(2), e30619 (2012).
  30. Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucl. Acids Res. 45(4), e18 (2017).
  31. Tillich, M. et al. GeSeq-versatile and accurate annotation of organelle genomes. Nucl. Acids Res. 45(W1), W6–W11 (2017).
  32. Greiner, S., Lehwark, P. & Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucl. Acids Res. 47(W1), W59–W64 (2019).
  33. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013).
  34. Librado, P. & Rozas, J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25(11), 1451–1452 (2009).
  35. Kurtz, S. et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucl. Acids Res. 29(22), 4633–4642 (2001).
  36. Yan, F., Wang, Q. L., Zhang, Y. J., Zhang, C. M. & Chen, Y. The complete chloroplast genome sequence of medicinal plant, Daphne tangutica Maxim. (Thymelaeaceae). Mitochondrial DNA B 4(1), 1776–1777 (2019).
  37. Cho, W. B., Han, E. K., Choi, G. & Lee, J. H. The complete chloroplast genome of Daphne kiusiana, an evergreen broad-leaved shrub on Jeju Island. Conserv. Genet. Resour. 10(1), 103–106 (2018).
  38. Yun, N., Park, J. & Oh, S. H. The complete chloroplast genome of the traditional medicinal plant Stellera chamaejasme L. (Thymelaeaceae). Mitochondrial DNA B 4(1), 1796–1797 (2019).
  39. Lee, S. Y., Ng, W. L., Hishamuddin, M. S. & Mohamed, R. The complete chloroplast genome sequence of Chengal (Neobalanocarpus heimii, Dipterocarpaceae), a durable tropical hardwood. Mitochondrial DNA B 4(1), 19–20 (2019).
  40. Paiva, J. A. et al. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries. BMC Genom. 12(1), 137 (2011).
  41. Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32(1), 268–274 (2015).
  42. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59(3), 307–321 (2010).
  43. Minh, B. Q., Nguyen, M. A. T. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30(5), 1188–1195 (2013).
  44. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14(6), 587 (2017).
  45. Rambaut, A., 2018. FigTree v1.4.2, a graphical viewer of phylogenetic trees. 2014.
  46. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17(8), 754–755 (2001).
  47. Liu, M. L. et al. Evolutionary analysis of plastid genomes of seven Lonicera L. species: implications for sequence divergence and phylogenetic relationships. Int. J. Mol. Sci. 19(12), 4039 (2018).
  48. Meng, J. et al. Comparative analysis of the complete chloroplast genomes of four Aconitum medicinal species. Molecules 23(5), 1015 (2018).
  49. Downie, S. R. & Jansen, R. K. A comparative analysis of whole plastid genomes from the Apiales: expansion and contraction of the inverted repeat, mitochondrial to plastid transfer of DNA, and identification of highly divergent noncoding regions. Syst. Bot. 40(1), 336–351 (2015).
  50. Sun, Y. X. et al. Complete plastid genome sequencing of Trochodendraceae reveals a significant expansion of the inverted repeat and suggests a Paleogene divergence between the two extant species. PLoS ONE 8(4), e60429 (2013).
  51. Kumar, S., Stecher, G. & Tamura, K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33(7), 1870–1874 (2016).
  52. Rogalski, M., Ruf, S. & Bock, R. Tobacco plastid ribosomal protein S18 is essential for cell survival. Nucl. Acids Res. 34(16), 4537–4545 (2006).
  53. Martin, W. et al. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393(6681), 162–165 (1998).
  54. Cullis, C. A., Vorster, B. J., Van Der Vyver, C. & Kunert, K. J. Transfer of genetic material between the chloroplast and nucleus: how is it related to stress in plants? Ann. Bot. 103(4), 625–633 (2009).
  55. Sharma, P., Jha, A. B., Dubey, R. S. & Pessarakli, M., 2012. Reactive oxygen species, oxidative damage, and antioxidative defense mechanism in plants under stressful conditions. J. Bot., 2012.
  56. Woodson, J. D. & Chory, J. Coordination of gene expression between organellar and nuclear genomes. Nat. Rev. Genet. 9(5), 383–395 (2008).
  57. Bi, Y. et al. Chloroplast genomic resources for phylogeny and DNA barcoding: a case study on Fritillaria. Sci. Rep. 8(1), 1–12 (2018).
  58. Wynen, L. P. et al. Phylogenetic relationships within the eared seals (Otariidae: Carnivora): implications for the historical biogeography of the family. Mol. Phylogenet. Evol. 21(2), 270–284 (2001).
  59. Zhang, J., Wang, J., Xia, T. & Zhou, S. DNA barcoding: species delimitation in tree peonies. Sci. China Ser. C Life Sciences 52(6), 568–578 (2009).
  60. Bock, R. & Knoop, V. (eds) Genomics of Chloroplasts and Mitochondria Vol. 35 (Springer, Berlin, 2012).
  61. Zong, D. et al. Comparative analysis of the complete chloroplast genomes of seven Populus species: Insights into alternative female parents of Populus tomentosa. PLoS ONE 14(6), e0218455 (2019).
  62. Yuan, C. et al. The complete chloroplast genome sequence and phylogenetic analysis of Chuanminshen (Chuanminshen violaceum Sheh et Shan). Physiol. Mol. Biol. Plants 23(1), 35–41 (2017).
  63. Hoang, D. T., Chernomor, O., Von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35(2), 518–522 (2018).
  64. Lee, S. Y. & Mohamed, R. Rediscovery of Aquilaria rostrata (Thymelaeaceae), a species thought to be extinct, and notes on Aquilaria conservation in Peninsular Malaysia. Blumea Biodivers. Evol. Biogeogr. Plants 61(1), 13–19 (2016).
  65. Stegemann, S., Keuthe, M., Greiner, S. & Bock, R. Horizontal transfer of chloroplast genomes between plant species. Proc. Natl. Acad. Sci. 109(7), 2434–2438 (2012).
  66. Herber, B. E. Thymelaeaceae. In Flowering Plants · Dicotyledons 373–396 (Springer, Berlin, 2003).
  67. Domke, W., 1934. Untersuchungen über die systematische und geographische Gliederung der Thymelaeaceen nebst einer Neubeschreibung ihrer Gattung.
  68. Beaumont, A. J. et al. Gnidia (Thymelaeaceae) is not monophyletic: taxonomic implications for Thymelaeoideae and a partial new generic taxonomy for Gnidia. Bot. J. Linn. Soc. 160(4), 402–417 (2009).



We thank Dr. Mohd Noor Mahat from FRIM for providing valuable plant specimens for this study. This work was supported by Universiti Putra Malaysia under the Research University Grant Scheme – PUTRA Grant [Project No. UPM/800-3/3/1/9630100] from the Ministry of Education of Malaysia.

Author information




R.M. and S.Y.L. designed the study and collected and identified the plant materials. W.L.N. prepared the samples. M.S.H. performed the data analysis under the guidance of W.L.N., S.Y.L. and S.I.R., and drafted and revised the manuscript. S.Y.L. and R.M. edited the manuscript. D.U.L., S.I.R., and R.M. supervised the work. R.M. acquired funding. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Rozi Mohamed.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


About this article


Cite this article

Hishamuddin, M.S., Lee, S.Y., Ng, W.L. et al. Comparison of eight complete chloroplast genomes of the endangered Aquilaria tree species (Thymelaeaceae) and their phylogenetic relationships. Sci Rep 10, 13034 (2020).
