Characteristics of Amorphophallus konjac as indicated by its genome

Amorphophallus konjac, a member of the genus Amorphophallus in the family Araceae, is an economically important crop widely used in health products and biomaterials. In the present work, we performed a whole-genome assembly of A. konjac based on sequence data from the NovaSeq platform. The final genome assembly was 4.58 Gb with a scaffold N50 of 3212 bp. The genome contains 39,421 protein-coding genes, and 71.75% of the assembly consists of repetitive sequences. Comparative genomic analysis showed that 1647 gene families have expanded and 2685 have contracted in the A. konjac genome. Genome evolution analysis further indicated that A. konjac underwent a whole-genome duplication, which may have contributed to the expansion of certain gene families. Furthermore, we identified many candidate genes involved in tuber formation and development, cellulose synthesis and lignification. The A. konjac genome obtained in this work provides a valuable resource for further study of the genetics, genomics, and breeding of this economically important crop, as well as for evolutionary studies of the Araceae family.

order to better understand the association between genetic diversity and KGM content in a broader population of Amorphophallus species.
The main species of the genus Amorphophallus have been studied and described with respect to their morphology and palynology [12][13][14][15]. Because morphological and palynological characters are highly variable, a number of molecular markers have been employed to resolve relationships within the genus. These markers include the LEAFY (FLint2) gene and the chloroplast regions rbcL, matK and trnL [16][17][18][19]. Because phylogenetic studies based on these regions do not produce consistent cladograms (owing to a high level of conflicting signal among the informative characters), additional variable regions and other, non-sequencing molecular methods are needed to establish the evolutionary history of Amorphophallus. Transcriptomic approaches may also provide useful insights into important traits such as KGM production, tuber formation and development, and other characteristics.
The genomes of two important monocotyledonous species in the order Alismatales, Spirodela polyrhiza 20 and Zostera marina 21, have been sequenced and characterised by the authors of those studies. Although A. konjac is a glucomannan-producing cash crop in many Asian countries, no genomic information on A. konjac had been reported before we conducted whole-genome sequencing of this species. We therefore sequenced the whole genome of A. konjac and submitted the data to the NCBI database in 2020. Although Gao et al. subsequently provided a high-quality chromosome-level genome of A. konjac 22, our results can still enrich the genomic information available for Amorphophallus. In this study, we performed a series of genomic analyses on A. konjac, including genome assembly, annotation, phylogenetic reconstruction, gene family analysis and divergence time estimation. We also identified genes involved in cellulose synthesis and lignification, and in tuber formation and development. The results will provide important insights and resources for future studies of A. konjac.

Genome assembly and annotation
The DNA sequencing data (1119.58 Gb, average 110× coverage) of the A. konjac sample were obtained using an Illumina HiSeq 2500 sequencer. A summary of the sequence data used for the assembly is presented in Table S1. The genome size was estimated at 4,512,012,462 bp from the 19-mer frequency distribution of the paired-end sequencing data (Fig. S1), which is consistent with the flow cytometry measurement (Fig. S2). From the Illumina sequencing data, contigs totalling 2.99 Gb were assembled using SOAPdenovo2 23 (Table S2). After scaffolding and gap filling, a 4.58 Gb A. konjac reference genome was obtained, comprising 7,423,768 scaffolds with a scaffold N50 of 3212 bp (Tables 1, S2). The A. konjac genome shows significant synteny with that of S. polyrhiza. The assembly captured 75.81% (188 of 248) of the core eukaryotic genes assessed with the Core Eukaryotic Genes Mapping Approach (CEGMA) (Table S3) and 624 complete BUSCOs assessed with BUSCO v5.2.2 24 (Table S4).
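The k-mer logic behind the 19-mer genome size estimate, and the N50 statistic used to summarise the scaffolds, can be illustrated with a minimal Python sketch. It assumes that a k-mer depth histogram (depth → number of distinct k-mers) has already been produced by a k-mer counter, and it uses the simple estimate "genome size ≈ total k-mers / depth of the main peak" after discarding the low-depth error peak. Dedicated tools such as GenomeScope fit a fuller model, so this is not necessarily the exact procedure behind Fig. S1; the function names are illustrative.

```python
def estimate_genome_size(kmer_hist):
    """Estimate genome size from a k-mer depth histogram (illustrative sketch).

    kmer_hist maps depth (how often a k-mer was seen) to the number of distinct
    k-mers with that depth. K-mers with depth below the first valley are treated
    as sequencing errors and discarded.
    """
    depths = sorted(kmer_hist)
    cutoff = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if kmer_hist[cur] > kmer_hist[prev]:  # counts start rising: valley found
            cutoff = prev
            break
    solid = {d: n for d, n in kmer_hist.items() if d >= cutoff}
    peak_depth = max(solid, key=solid.get)  # depth of the main peak
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For example, a toy histogram `{1: 1000, 2: 100, 3: 50, 4: 200, 5: 500, 6: 200, 7: 50}` has its valley at depth 3 and its peak at depth 5, giving 5000 / 5 = 1000.0, and `n50([100, 200, 300, 400])` returns 300.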
Combining de novo prediction and homology-based searches identified 3,289,511,160 bp of repetitive elements in the A. konjac genome, making up about 71.75% of the assembly (Table S5). Most repeats were predicted de novo (70.98%), and relatively few were detected by the homology-based method (Table S5). Among the repeats in the A. konjac genome, 69.16% were transposable elements (TEs), of which 52.06% were long terminal repeats (LTRs), including 31.42% Gypsy LTRs and 11.6% Copia LTRs (Table S6).
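Because de novo and homology-based repeat annotations overlap, total repeat content is normally computed on the union of the annotated intervals rather than by adding the two totals. The following minimal Python sketch shows such a merge, assuming each repeat is given as a (scaffold, start, end) tuple with end-exclusive coordinates. The source does not describe its merging step, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (scaffold, start, end) intervals, end exclusive."""
    merged = []
    for scaffold, start, end in sorted(intervals):
        if merged and merged[-1][0] == scaffold and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (scaffold, last_start, max(last_end, end))
        else:
            merged.append((scaffold, start, end))
    return merged


def repeat_fraction(denovo, homology, assembly_length):
    """Fraction of the assembly covered by the union of both repeat sets."""
    covered = sum(end - start for _, start, end in merge_intervals(denovo + homology))
    return covered / assembly_length
```

Counting overlapping bases only once is what keeps a combined figure such as 71.75% from exceeding the true coverage.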
A total of 39,421 protein-coding genes were predicted in the assembled genome using a combination of homology-based and ab initio methods, with an average coding length of 1372.75 bp and a mean of 2.29 exons per gene (Table 1, Fig. S3, Table S7). The gene number and average gene length are close to those of S. polyrhiza, and the average gene is longer than in Oryza sativa and Zea mays (Fig. S4, Table S7). Moreover, on average 92.22% of the RNA sequencing (RNA-seq) reads from four A. konjac tissues (leaf, stem, root and tuber) could be mapped to the genome. In addition, 65.26% of the predicted genes (25,725/39,421) were expressed (FPKM > 0.05), as determined by aligning leaf, stem, root and tuber RNA-seq data to the protein-coding gene set with TopHat2 25 and estimating expression values from the resulting alignments with Cufflinks 26. In total, 26,456, 26,512, 25,797 and 33,715 of the predicted genes were functionally annotated using the Swiss-Prot, KEGG, InterProScan and TrEMBL databases, respectively, and 34,126 of the predicted genes (87%) were assigned a functional annotation in at least one database (Table S8).
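The FPKM threshold used to call a gene expressed can be made concrete with a short Python sketch. It uses the textbook definition FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). Cufflinks additionally corrects for effective transcript length and multi-mapping reads, so this is a simplification, and the function names are illustrative.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of feature per million mapped fragments (simplified)."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fraction_expressed(counts, lengths, total_mapped_fragments, threshold=0.05):
    """Share of genes whose FPKM exceeds the threshold (0.05 in the text)."""
    expressed = sum(
        1 for gene, n in counts.items()
        if fpkm(n, lengths[gene], total_mapped_fragments) > threshold
    )
    return expressed / len(counts)
```

With 100 fragments on a 1000 bp gene and 1,000,000 mapped fragments, `fpkm` returns 100.0. The very low 0.05 cut-off counts a gene as expressed even when only a handful of fragments map to it.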
An overview of the annotated ncRNAs is shown in Table S9: 1078 miRNAs, 761 tRNAs, 2894 rRNAs and 1553 snRNAs were predicted in A. konjac. A. konjac has more unclustered genes and unique gene families than the other four species (Table S10), and 4509 single-copy orthologous genes. The Venn diagram (Fig. S5a) shows that the five species share a common core set of 6438 gene families.
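The core and species-specific counts in a Venn diagram such as Fig. S5a reduce to set operations on gene family identifiers. This minimal Python sketch assumes that each species is represented by the set of family IDs in which it has at least one gene. The species and family IDs used with it are hypothetical.

```python
def core_and_unique(families_by_species):
    """Return families shared by all species and families found in only one species."""
    core = set.intersection(*families_by_species.values())
    unique = {}
    for species, families in families_by_species.items():
        # Union of the family sets of every other species.
        others = set().union(
            *(f for s, f in families_by_species.items() if s != species)
        )
        unique[species] = families - others
    return core, unique
```

Applied to five species, `core` corresponds to the shared core set (6438 families in the text) and `unique[species]` to that species' unique families.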
The unique gene families in A. konjac were enriched in nucleobase-containing compound biosynthetic process, nucleobase-containing compound catabolic process, regulation of nucleobase-containing compound metabolic process, aromatic compound biosynthetic process, heterocycle catabolic process, negative regulation of growth, 1,3-beta-d-glucan synthase complex, cytoskeleton organization, membrane, molecular function regulator, peptidase regulator activity, 1,3-beta-d-glucan synthase activity and other terms (Fig. S5B). Moreover, the unique gene families contain a large number of unique paralogous genes (7847 genes) that are not orthologous to any known genes in the other four species. In the cellular component category, these genes were enriched in the 1,3-beta-d-glucan synthase complex and a series of vesicle membrane-related components. The 1,3-beta-d-glucan synthase complex catalyses the transfer of a glucose residue from UDP-glucose to a (1→3)-beta-d-glucan chain, which may be related to the high starch content of the tuber and the fast-growing trait of A. konjac.
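The source does not state which test underlies its GO enrichment results. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched here in Python under that assumption. N is the number of annotated background genes, K the number carrying the term, n the size of the test set (for example, unique paralogous genes) and k the number of those carrying the term.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    This is the probability of seeing at least k term-annotated genes in a
    random set of n genes drawn without replacement from N background genes.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms, and only the final ratio is converted to a float.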

Evolution, expansion and contraction
To systematically study the evolutionary dynamics of Alismatales species, we reconstructed the species phylogeny using single-copy orthologous genes among the five species (4509 single-copy orthologous genes in A. konjac). As illustrated in Fig. 1c, the estimated divergence time between Alismatales and Poaceae is 130.7 (124.6-139.9) million years ago (MYA), Araceae and Zosteraceae separated at about 124.6 (115.3-131.9) MYA, and S. polyrhiza and A. konjac diverged 86.2 (78.2-96.0) MYA. These genome-based results provide a phylogenetic framework for interpreting evolutionary events in the family.
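Single-copy orthologous genes for species-tree inference are typically taken from gene families that contain exactly one gene from every species. This minimal Python sketch assumes a table of per-species gene counts for each family. The family and species names used with it are hypothetical.

```python
def single_copy_families(counts, species):
    """Families in which every listed species has exactly one gene."""
    return sorted(
        family for family, per_species in counts.items()
        if all(per_species.get(sp, 0) == 1 for sp in species)
    )
```

Families missing from any species, or duplicated in any species, are excluded, so each retained family yields one gene per species for alignment and maximum-likelihood tree building.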
Comparative analysis of gene family expansion and contraction showed that 1647 gene families have expanded and 2685 have contracted in the A. konjac genome (Fig. 1c). Based on InterProScan functional annotation, the expanded genes in A. konjac were enriched in biological processes such as iron coordination entity transport, vitamin E metabolic process, vitamin E biosynthetic process, cofactor transport and heme transport (p-value < 0.05) (Fig. S6). The contracted gene families were enriched in reproduction, pollination, pollen-pistil interaction, multi-sprout formation, reproductive process, cell recognition and various other biological processes (p-value < 0.05) (Fig. S7). This may suggest that A. konjac reproduces mainly asexually and that sexual reproduction requires particular conditions.
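Expansion and contraction are usually called by comparing a family's size in a species with the size inferred for its most recent ancestor, for example with the birth-death model implemented in CAFE; the source does not name its tool. The Python sketch below covers only the final classification step and assumes ancestral counts are already available. A full analysis would also test whether each change is statistically significant.

```python
def classify_families(current, ancestral):
    """Split families into expanded and contracted lists relative to the ancestor.

    current and ancestral map family ID -> gene count. Families without an
    ancestral estimate are skipped.
    """
    expanded, contracted = [], []
    for family, n in current.items():
        anc = ancestral.get(family)
        if anc is None:
            continue
        if n > anc:
            expanded.append(family)
        elif n < anc:
            contracted.append(family)
    return sorted(expanded), sorted(contracted)
```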
Whole-genome duplication (WGD) followed by gene loss has been found in most eudicots and is regarded as a major evolutionary force giving rise to gene neofunctionalisation in both plants and animals 27. The distribution of synonymous substitutions per synonymous site (Ks) was unimodal, indicating that a recent WGD occurred in A. konjac (Fig. 1d). A better reference genome is needed to determine whether this event corresponds to the ⍺SP/βSP WGDs in Alismatales 20.
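A Ks distribution such as Fig. 1d is commonly summarised by its peak. A peak can be converted to an approximate age with T = Ks / (2r), where r is the synonymous substitution rate per site per year. The source gives no rate, so the value below (6.5 × 10^-9, a rate often cited for grasses) is only a placeholder, and the histogram bin width is also an assumption.

```python
from collections import Counter

PLACEHOLDER_RATE = 6.5e-9  # assumption for illustration, not taken from this study


def ks_peak(ks_values, bin_width=0.05, max_ks=3.0):
    """Centre of the most populated Ks bin, ignoring Ks = 0 and saturated values."""
    bins = Counter(int(ks // bin_width) for ks in ks_values if 0 < ks <= max_ks)
    best_bin = max(bins, key=bins.get)
    return (best_bin + 0.5) * bin_width


def age_years(ks, rate_per_site_per_year=PLACEHOLDER_RATE):
    """Approximate age of a duplication or divergence event: T = Ks / (2r)."""
    return ks / (2 * rate_per_site_per_year)
```

The factor of 2 reflects that substitutions accumulate independently along both gene copies after duplication. Any age obtained this way scales directly with the assumed rate.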

Detection of positively selected genes
Positive selection is thought to contribute to fitness. In comparisons with S. polyrhiza and Z. marina, 686 and 122 A. konjac genes, respectively, were identified as positively selected (Tables S11, S12). GO enrichment showed that, relative to S. polyrhiza, positively selected genes in A. konjac were more often involved in RNA biosynthetic process, regulation of biosynthetic process, regulation of gene expression, protein modification process, cell wall organization or biogenesis, transcription, DNA-templated cell synthesis, cell growth and other processes (Fig. S8). Moreover, relative to Z. marina, positively selected genes in A. konjac were more often involved in leucine biosynthetic process, regulation of signal transduction, regulation of cell communication, regulation of signaling, regulation of response to stimulus and other processes (Fig. S9).

Analysis of transcription factor families
Transcription factors regulate gene expression, and protein kinases regulate cellular activities by phosphorylating target proteins in response to internal or external signals. This study identified a total of 1275 transcription factors and 345 transcriptional regulators in A. konjac (Table S13). A. konjac has more transcription factors than S. polyrhiza (1115 genes) and more transcriptional regulators than both S. polyrhiza and Z. marina (271 and 307 genes, respectively), but fewer transcriptional regulators than maize (573 genes). The AP2/ERF-ERF, GRAS, HSF, SBP and ULT transcription factors, as well as the AUX/IAA, mTERF and SNF2 transcriptional regulators, are more abundant in A. konjac than in S. polyrhiza and Z. marina. This difference may reflect their different growth environments: A. konjac is a terrestrial plant, whereas the other two are aquatic plants. In addition, A. konjac has more BBR-BPC and ULT genes than maize. In co-transfection experiments, BBR activates (GA/TC)-containing promoters 27, and its overexpression in tobacco leads to a pronounced modification of leaf shape 28. In Arabidopsis, the ULTRAPETALA1 (ULT1) gene is a key negative regulator of cell accumulation in shoot and floral meristems. Mutations in ULT1 cause enlargement of inflorescence and floral meristems, production of supernumerary flowers and floral organs, and a delay in floral meristem termination, while downregulation of both ULT genes leads to shoot apical meristem arrest shortly after germination, revealing a requirement for ULT activity in early development 29.

Contracted cellulose and lignification synthesis genes
Amorphophallus konjac is prone to lodging, a trait consistent with a reduction in the number of genes involved in cell wall biosynthesis and lignification. According to InterProScan annotation, 50 cellulose synthase (CesA) and cellulose synthase-like (Csl) genes were identified in A. konjac (Table 2), markedly fewer than in woody bamboo species. Lignin, a major component of the secondary cell wall, plays important roles in support, water transport and stress responses in vascular plants 19. A total of 20 genes involved in the lignin biosynthesis pathway were detected in A. konjac (Table 2). These represent 6 of the 10 lignin biosynthesis gene families: PAL, 4CL, HCT, CCR, F5H and CAD are present, whereas C4H, C3H, CCoAMT and COMT are absent. Overall, the absolute copy numbers of both cellulose- and lignin-related genes are lower in A. konjac than in woody species. The expression of CesA and Csl genes showed two distinct profiles (Fig. 2a): most genes (clusters I and II) were more highly expressed in tuber, fibre and stem than in leaf, whereas six genes (cluster III) were more highly expressed in leaf than in tuber, fibre and stem. For the lignin-related genes, expression in leaf and stem differed clearly from that in fibre and tuber (Fig. 2b).

Tuber formation and development genes
Sucrose metabolism is considered important for the development of plant sink organs. In most plants, carbon assimilated in source leaves is transported as sucrose into sink organs, including roots, tubers, fruit and seeds 30. The present study examined the genes of the starch and sucrose metabolism pathway and found that the expression profiles of most genes in fibre and tuber differed clearly from those in leaf and stem, with consistently high expression in fibre and tuber (Fig. 3, Table S14). To utilise sucrose, its glycosidic bond must be cleaved to release the two monosaccharides, glucose and fructose. Sucrose synthase (SUS) is the key enzyme that catalyses both the synthesis and the cleavage of sucrose 30. SUS is a glycosyltransferase that, in the presence of uridine diphosphate (UDP), converts sucrose into UDP-glucose and fructose. SUS showed consistently high expression in fibre and tuber and low expression in leaf and stem (Fig. 3). Sucrose-phosphate synthase (SPS), by contrast, plays a major role in photosynthetic sucrose synthesis by catalysing the rate-limiting step of sucrose biosynthesis from UDP-glucose and fructose-6-phosphate. The SPS gene was more highly expressed in leaf (Fig. 3), consistent with its role as a limiting factor in the export of photoassimilates out of the leaf. These results suggest that sucrose synthase specifically facilitates the storage and maturation of sinks. Sucrose generated from photosynthates in source organs is transported to sink organs and then converted into starch. Plants store sugar as polymerised starch, which allows larger amounts of sugar to be stored without the problems caused by osmotic pressure 30. In A. konjac, starch synthase (glgA), granule-bound starch synthase (WAXY) and glucose-1-phosphate adenylyltransferase (glgC), which catalyse the conversion of precursors into starch, showed high expression in fibre and tuber (Fig. 3). Notably, the 1,4-alpha-glucan branching enzyme (GBE1) gene was slightly more highly expressed in leaf than in the other three tissues. GBE catalyses the formation of α-1,6 branch points in starch and plays a key role in its synthesis 31. In general, starch is synthesised and accumulated directly from the products of photosynthesis in the leaf during the day and is then degraded into sugars as an energy source for the following night 32. The high expression of GBE1 in leaf may therefore be related to starch synthesis driven by photosynthesis. In addition, 59 putative genes involved in the glucomannan biosynthesis pathway were identified (Fig. 4) based on previous studies of glucomannan biosynthesis 22,33, and most of them were also highly expressed in fibre and tuber.

Figure 3. Expression profiles in FPKM (fragments per kilobase per million reads mapped) of genes involved in the starch and sucrose metabolism pathway in four tissues (tuber, fibre, stem and leaf) of a 7-month-old A. konjac plant. Data are plotted as log10 values. PYG: glycogen phosphorylase; SUS: sucrose synthase; GBE1: 1,4-alpha-glucan branching enzyme; glgA: starch synthase; malQ: 4-alpha-glucanotransferase; HK: hexokinase; FRK: fructokinase; glgC: glucose-1-phosphate adenylyltransferase; otsB: trehalose 6-phosphate phosphatase; AMY: alpha-amylase; BMY: beta-amylase; EG: endoglucanase; malZ: alpha-glucosidase; bglU: beta-glucosidase; INV: beta-fructofuranosidase; ENPP1_3: ectonucleotide pyrophosphatase/phosphodiesterase family member 1/3; GPI: glucose-6-phosphate isomerase; pgm: phosphoglucomutase; bglX: beta-glucosidase; bglB: beta-glucosidase; SPP: sucrose-6-phosphatase; WAXY: granule-bound starch synthase; TPS: trehalose 6-phosphate synthase/phosphatase; GN: includes GN1_2_3 (glucan endo-1,3-beta-glucosidase 1/2/3), GN4 (glucan endo-1,3-beta-glucosidase 4) and GN5_6 (glucan endo-1,3-beta-glucosidase 5/6).

Discussion
As a major source of KGM, A. konjac is abundant in southern China and Japan. Species of the genus Amorphophallus show high genetic diversity, and A. konjac is classified as a species with high KGM content: its tubers contain between 40 and 70% KGM 33. In its natural habitat, the fruiting efficiency of A. konjac through sexual reproduction is less than 1%. Although A. konjac can be bred both asexually and sexually, sexual reproduction depends on cross-pollination. A growing number of agricultural studies report that the special structure of the A. konjac inflorescence can facilitate cross-pollination and possibly increase the diversity of the KGM-biosynthetic gene pool. However, the genomic basis of many A. konjac traits remains poorly understood.
Here, we report the earliest sequenced A. konjac genome, which our research team sequenced in 2018 and uploaded to the NCBI database. The A. konjac genome assembly has a total size of 4.58 Gb, smaller than the other A. konjac genome (5.60 Gb), which Gao et al. assembled using a combination of Illumina, PacBio and Hi-C technologies 22. Gao et al. also identified 80.6% of the assembled sequences as repetitive, with 75.6% being transposable elements (TEs) 22. Among the various TEs, long terminal repeats (LTRs, 74.04%), especially of the Gypsy (40.28%) and Copia (9.58%) types, were remarkably prevalent in that genome 22. By comparison, we found that 71.75% of our A. konjac assembly consisted of repeat sequences and 69.16% of TEs, including 31.42% Gypsy LTRs and 11.6% Copia LTRs. A potential reason for the smaller genome size and lower repeat content is the second-generation sequencing data used in the present study. Short-read (second-generation) sequencing technologies have difficulty resolving long repetitive sequences, which leads to incomplete assemblies 34,35. A strong correlation between genome size and the proportion of TEs (especially LTR-Copia and LTR-Gypsy) has been reported in many studies 34,36,37. In addition, a previous study also found that A. konjac and S. polyrhiza share a recent WGD event, which is consistent with the results of this study 21. This study used genome analysis to characterise the genetic traits of A. konjac. Compared with the other four species (Z. marina, O. sativa, S. polyrhiza and Z. mays), A. konjac possesses 3001 unique gene families and 4509 single-copy orthologs among a total of 13,190 identified gene families. In addition, the time tree from the phylogenetic analysis showed a closer genetic relationship between S. polyrhiza and A. konjac (divergence time, 86.2 million years) than between A. konjac and the other three species (divergence times of over 100 million years between A. konjac and Z. marina, O. sativa and Z. mays). Moreover, our data further show that some contracted genes in the A. konjac genome are involved in pollination, pollen-pistil interaction and reproductive processes, which may offer genomic clues to sexual reproduction in A. konjac.
Positive selection is thought to contribute to fitness. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) is widely used to estimate positive selection at amino-acid sites 38. An analysis of Ka/Ks ratios between two Chrysanthemum species, Chrysanthemum morifolium and C. boreale, indicated that 107 genes had experienced positive selection (Ka/Ks > 1), which may have been crucial for the adaptation, domestication and speciation of Chrysanthemum 39. In the current study, we identified 625 and 111 A. konjac genes under positive selection in comparisons with S. polyrhiza and Z. marina, respectively. Enrichment analysis suggested that these genes are involved in the biosynthesis of RNA and other organic substances, the regulation of biogenesis, cellular organization and cell growth. These results suggest that diverse genes have been under positive selection in A. konjac, which might influence its adaptation and evolution. Some positively selected genes could serve as potential biomarkers for breeding outcrossing species. So far, asexual propagation of tubers has been widely used to breed A. konjac in traditional agriculture. However, asexual breeding has many problems, such as low breeding efficiency, a long cultivation cycle, a high risk of infectious diseases and breeding degeneration. The genome analysis in the present study partially illuminates the evolutionary history of A. konjac under artificial breeding and may help to screen outcrossing populations with high KGM content.
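Ka and Ks for a pair of aligned coding sequences can be estimated in several ways, and the source does not say which method it used. The Python sketch below implements one common option: the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, for gap-free, in-frame sequences without stop codons under the standard genetic code. When sites are tallied, changes to stop codons count as non-synonymous. When differences are tallied, mutational pathways passing through stop codons are skipped. A gene would then be flagged when Ka/Ks > 1, the criterion used in the Chrysanthemum study cited above.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order at each position.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (0 to 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences, averaged over every order in
    which the differing positions could have mutated (stop-codon paths skipped)."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn = nonsyn = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, s, n, ok = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                ok = False
                break
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        if ok:
            syn, nonsyn, valid_paths = syn + s, nonsyn + n, valid_paths + 1
    if valid_paths == 0:  # every path hits a stop: count all as non-synonymous
        return 0.0, float(len(positions))
    return syn / valid_paths, nonsyn / valid_paths


def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        return math.inf  # saturated: distance cannot be estimated
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codons are not allowed")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    ka = jukes_cantor(Nd / N) if N else math.nan
    ks = jukes_cantor(Sd / S) if S else math.nan
    return ka, ks


def is_positively_selected(ka, ks):
    """Flag a gene when Ka/Ks > 1 (undefined ratios are not flagged)."""
    return ks > 0 and ka / ks > 1
```

For instance, ten alanine codons (GCT) that differ only by one GCT→GCC change give Ka = 0 and Ks ≈ 0.107. In practice, dedicated software (for example, KaKs_Calculator or PAML) is normally used, and results depend on the chosen model.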
Additionally, our analysis identified a total of 20 genes acting in the lignin biosynthetic pathway, which might help A. konjac adapt to habitats that favour fast growth.
Over the past few decades, purified KGM from A. konjac tubers, a dietary fibre composed of a hydrocolloidal polysaccharide, has been widely used as a food additive and dietary supplement in many countries. Results from nutritional studies indicate that KGM can lower triglyceride, glucose and cholesterol levels and blood pressure, and can prevent many chronic diseases through wide-ranging regulation of metabolism 40. Other studies suggested that material with a KGM content above 50% of dry matter should be used to obtain high-purity glucomannan for additives and supplements, since high-purity glucomannan readily forms a transparent, odourless gel with high viscosity. Cultivated A. konjac has been reported to be a major source of material with high KGM content (over 45% of dry matter). Apart from environmental factors and cultivation conditions, genetic factors are presumed to contribute to high KGM content. However, it is still unclear which genes in the A. konjac genome regulate KGM biosynthesis in tubers. In this study, genomic and transcriptomic analyses were applied to characterise starch and sucrose metabolism in A. konjac. Previous studies have demonstrated that polysaccharide metabolism is essential both for the formation of the tuber sink and as the biosynthetic source of KGM in A. konjac. Transcriptomic analysis of A. konjac in the present study suggested that the expression patterns of starch and sucrose metabolism genes differ between tubers and leaf or stem, and that sucrose metabolism-related genes maintain consistently higher expression in tubers than in leaf and stem. For example, starch synthase (glgA), granule-bound starch synthase (WAXY) and glucose-1-phosphate adenylyltransferase (glgC) are more highly expressed in tubers and fibres than in leaf and stem. Previous physiological tests suggested that sucrose-phosphate synthase (SPS) acts as a factor in the export of photoassimilates out of the leaf. Downregulation of SPS may specifically help A. konjac promote the storage and maturation of polysaccharides in tubers. The findings of the present study partially clarify the versatile functions of tuber-specific polysaccharide metabolism in A. konjac and may thus help to elucidate the biosynthetic mechanism of KGM formation.

Figure 1. Overview of the evolutionary analysis of A. konjac. (a) Images of the sequenced A. konjac. (b) Ortholog clustering analysis of the protein-coding genes in the A. konjac genome. (c) Phylogenetic tree and divergence times of A. konjac and four other plant species. The phylogenetic tree was generated from the single-copy orthologs using the maximum-likelihood method. Divergence time ranges are shown by red blocks, and predicted divergence times are shown as numbers inside the pink blocks. Pie charts show the proportion of expanded/contracted gene families in each plant species. (d) Distribution of substitutions per synonymous site (Ks) in A. konjac.

Figure 2. Heatmaps of gene expression.(a) Heatmap depicting the expressed profile of CesA and Csl genes; (b) Heatmap depicting the expressed profile of lignin-related genes.

Table 1. Summary of genome assembly and annotation.

Based on pairwise protein sequence similarity, gene family clustering was carried out for the genes of five species: Z. marina, O. sativa, S. polyrhiza, Z. mays and A. konjac. A total of 22,730 A. konjac genes were clustered into 13,190 gene families, while 16,691 A. konjac genes remained unclustered and 3001 gene families were unique to A. konjac (Table 1, Fig. 1b, Fig. S5A, Table S10).

Table 2.
a Data from Guo et al. 70. b Data from Wang et al. 20.