Genome sequence and analysis of the Japanese morning glory Ipomoea nil

Ipomoea is the largest genus in the family Convolvulaceae. Ipomoea nil (Japanese morning glory) has been utilized as a model plant to study the genetic basis of floricultural traits, with over 1,500 mutant lines. In the present study, we used second- and third-generation sequencing platforms to produce a draft genome of I. nil with a scaffold N50 of 2.88 Mb (contig N50 of 1.87 Mb), covering 98% of the 750 Mb genome. Scaffolds covering 91.42% of the assembly are anchored to 15 pseudo-chromosomes. The draft genome has enabled the identification and cataloguing of the Tpn1 family transposons, known as the major mutagen of I. nil, and the analysis of the dwarf gene, CONTRACTED, located on the genetic map published in 1956. Comparative genomics suggests that a whole genome duplication in Convolvulaceae, distinct from the recent Solanaceae event, occurred after the divergence of the two sister families.
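For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: the sequences are sorted from longest to shortest, and the N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name and the example lengths are hypothetical and are not taken from the I. nil assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Hypothetical toy assembly (not real data).
example_lengths = [50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Applied to the list of scaffold lengths or the list of contig lengths, the same function would give the scaffold N50 and the contig N50, respectively.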

were mapped on the classic linkage map. The cd, fe, dy, a3, mg, dp, and dk-2 mutations were assigned to classic LG1, LG2, LG3, LG4, LG5, LG6, and LG10, respectively. The recessive mutations of c1 and sp were also assigned to LG3. In this study, LG3N with dy and LG3S containing c1 and sp were found to correspond to different chromosomes (Supplementary Table S22).

Figure S7. Mis-assembly breakage process. Cases 1 and 2 depict breakage using BAC-end pair information. In Case 1, the breakpoint is at the nearest complete BAC-end pair, and in Case 2, the breakpoint is at the nearest BAC-end read whose read-pair is in a different scaffold. When there was insufficient BAC-end read information, the SNP marker from the linkage maps was used as the breakpoint (Case 3). All cases were identified using disputes in linkage maps and were split into 3 separate scaffolds. The first and last scaffolds were assigned to the corresponding chromosomes from the linkage map.

Figure S8. Histogram of the observed BAC-end inserts. The BAC-end reads were aligned against the scaffolds, the insert lengths between the pairs were calculated, and a histogram was plotted after removing outliers.

Supplementary Table S20. Supplementary Table S1 Step 1

Supplementary Tables
Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 The asterisks indicate phosphorothioate bonds.

Tissues and sampling conditions

Flowers
Tissues include sepals, petals, stamens, and carpels with short peduncles. Fully opened flowers, large flower buds (1-3 days before flower opening), and small flower buds (more than 4 days before flower opening) were separately collected in the evening.

Leaves
Various-sized leaves with short peduncles were mixed. Samples were collected at 4:30 on October 5, 2011 and at 14:30 on January 11, 2012.

Stems
Young stems, including the tips.

Seed coats
Seed coats on immature seeds in various developmental stages were mixed.

Embryos
Immature green embryos. Small embryos without bending cotyledons and large embryos with bending cotyledons were separately collected and subjected to RNA extraction.

Roots
Three-week-old roots cultured in vermiculite.

Flowers, leaves, stems, and seed coats are from the mature TKS plant used for the whole genome shotgun sequencing. Embryos and roots are from the progeny of the plant.

Supplementary Table S23. Starter materials for the EST analysis
Tissues and sampling conditions

Flowers, flower buds
Plant was grown in a greenhouse. Tissues include sepals, petals, stamens, and carpels with short peduncles. Fully opened flowers and flower buds at 6 different stages were collected separately. The stages were 12 h and 36 h before flower opening, 30-40 mm, 20-30 mm, 10-20 mm, and less than 10 mm in length.

Seedlings
Plants were grown in a growth chamber, Biotron LH300 (Nippon Medical and Chemical Instruments) set to 28 °C. Aerial parts of the 8-day-old seedlings were collected. The light conditions were continuous light, 16 h light and 3 h dark after continuous light, and 10 h dark after continuous light.

Seed coats
Seed coats on immature seeds in various developmental stages were mixed. Plant was grown in a greenhouse.

DNA isolation
Genomic DNA for the shotgun sequence analysis was extracted from flower petals of young buds. Young buds were collected, frozen with liquid nitrogen, and stored at -80 °C.

To characterize the CT gene, genomic DNA was isolated from the leaves using either the NA-2000 or the PI-480 (Kurabo) automated DNA isolation system.

RNA isolation
Samples were collected, immediately frozen using liquid nitrogen, and stored at -80 °C until use. For RNA-seq analysis of the six tissues (Supplementary Table S21)

Total RNA subjected to cDNA library construction was isolated from tissues using a guanidinium isothiocyanate extraction buffer and purified by CsCl (cesium chloride) centrifugation. The tissues and stages of the samples are listed in Supplementary Table S23. Each 1-g sample was ground to powder in liquid nitrogen with a mortar and pestle, 10 ml of the extraction buffer was added, and it was then homogenized using an ultra

were used to synthesize the first- and second-strand cDNA, respectively. The fragments were digested with XhoI and cloned into the SalI site of the λFLC-I vector 5.
Construction of the JMSF library was outsourced to Danaform. First- and second-strand cDNA was synthesized using the 1st strand and 2nd strand primers, respectively. One round of normalization was performed as described 6, and the cDNA was then cloned into the λFLC-III vector 5. The λ vector clones were subsequently converted into pFLC-I and pFLC-III phagemid derivatives by in vivo excision and transformed into phage-resistant E. coli DH10B T1.
JMCP stands for Japanese morning glory seed coat PCR. The JMCP library was constructed using the SMART cDNA Library Construction Kit (Clontech) in accordance with the manufacturer's protocol, with slight modifications. After cDNA synthesis using long-distance PCR, the amplified cDNA fragments were cloned into pCR-XL-TOPO (Invitrogen) and transformed into E. coli TOP10.

EST analysis
Plasmid DNA was prepared from cDNA clones that were randomly chosen from the cDNA libraries. The 5´- and 3´-end sequences of the clones were determined using the ABI Prism 3100 Genetic Analyzer and ABI Prism 3700 Genetic Analyzer with BigDye version 3.1 chemistry (Applied Biosystems). The numbers of clones analyzed and of EST sequences obtained are listed in Supplementary Table S24.

The entire sequences of the remaining five BAC clones were sequenced using a shotgun sequencing procedure and were used for genome assembly validation.

Organellar genome sequence and annotation
BAC clones carrying the chloroplast and mitochondrial genome fragments were selected by using the end sequences of the BAC clones. The clones are JMHiBa067I20, and editing the assembled sequence. The assembled chloroplast and mitochondrial genomes were annotated using DOGMA 9 and MITOFY 10, respectively. Initiation and stop codons as well as intron/exon boundaries were manually corrected. The published partial chloroplast genome of I. nil line REM459 (KF242487) 11 was used as a reference for manual correction. The organellar genome maps were generated using OrganellarGenomeDRAW 12,13.
Leaving out the partial and smaller overlapping contigs, the chloroplast and mitochondrial sequences could be completely reconstructed from just five and three sequences, respectively, in the PacBio-based assembly. One of the three mitochondrial sequences (approximately 244 kb), which had been merged as a chimeric mis-assembly with the end of a chromosomal contig, was separated manually prior to scaffolding.

Mis-assembly elimination at the contig level
When mis-assemblies predicted using the linkage maps occurred at the contig level rather than the scaffold level, the following method was used to split the scaffolds.
At the contig level there are no gap boundaries, so the exact junction point cannot be located without a reference sequence. Hence, i) a larger chimeric region was identified using linkage maps; and ii) two breakpoints were introduced, one on each side of the chimeric region, splitting the scaffold into 3 parts. The first part maps to one chromosome (linkage group) and the last part maps to a different chromosome, while the middle part remains chimeric; its length was kept as short as possible. The following three cases were used to find the breakpoints in chimeric regions: 1) the last base position where both reads of a BAC-end pair were concordantly mapped, 2) the last base position of a BAC-end pair where only one read was mapped near the scaffold, and 3) the base position after the SNP marker from the linkage maps (Supplementary Fig. S7). The mis-assembled contigs were then split manually at these breakpoints. Breakpoints were introduced when a scaffold contained at least 2 markers corresponding to two different pseudo-chromosomes in the linkage maps. This conservative strategy may lead to unnecessary contig breaking and therefore to shorter contigs; however, it ensures that there will be fewer mis-assemblies.
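The marker-based part of this procedure can be sketched as follows. This is a simplified illustration only, not the pipeline used in the study: it covers Case 3 alone (SNP markers, no BAC-end read information), and the function names, the marker tuple layout, and the choice to start the last part at the right-hand marker are all assumptions made for the example.

```python
from collections import defaultdict


def find_chimeric_regions(markers):
    """Locate candidate chimeric regions from linkage-map markers.

    markers: iterable of (scaffold_id, position, linkage_group) tuples,
    with 1-based positions (hypothetical layout).
    Returns {scaffold_id: [(left_pos, right_pos), ...]}, where each pair
    is two adjacent markers assigned to different linkage groups.
    """
    by_scaffold = defaultdict(list)
    for scaffold_id, pos, group in markers:
        by_scaffold[scaffold_id].append((pos, group))

    regions = {}
    for scaffold_id, items in by_scaffold.items():
        items.sort()  # order markers along the scaffold
        switches = [
            (pos1, pos2)
            for (pos1, group1), (pos2, group2) in zip(items, items[1:])
            if group1 != group2
        ]
        if switches:
            regions[scaffold_id] = switches
    return regions


def split_three_ways(sequence, left_pos, right_pos):
    """Split a sequence into first, middle (still chimeric), and last parts.

    The first breakpoint is placed after the left marker (Case 3 in the
    text); placing the second breakpoint just before the right marker is
    an assumption of this sketch.
    """
    first = sequence[:left_pos]
    middle = sequence[left_pos:right_pos - 1]
    last = sequence[right_pos - 1:]
    return first, middle, last


# Hypothetical toy data: scaffold s1 carries markers from two linkage groups.
toy_markers = [("s1", 3, "LG1"), ("s1", 12, "LG2"), ("s1", 5, "LG1"), ("s2", 7, "LG4")]
toy_regions = find_chimeric_regions(toy_markers)
left, right = toy_regions["s1"][0]
print(split_three_ways("AAAAACCCCCGGGGG", left, right))
```

In this sketch a scaffold is flagged only when it carries at least two markers from different linkage groups, which mirrors the condition stated above; the middle part returned by `split_three_ways` corresponds to the region that remains chimeric.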

Isolation of the TnpA and TnpD transcripts
To isolate the transcripts derived from autonomous Tpn1 family transposons, total RNA was extracted from the Q1072 strain, where Tpn1 actively transposes. Primers were designed from a series of defective Tpn1 family transposons 14 (Supplementary Fig. S20 and Supplementary

Tpn+2569F were used to amplify the TnpD transcript. Using TNPA15-1R and Tpn+2569F, the TnpA transcript was obtained. 5´-RACE (rapid amplification of cDNA ends) was performed using 5´-RACE systems (Invitrogen) in accordance with the manufacturer's protocols. The reverse-transcription step was performed using