Nanopore sequencing reads improve assembly and gene annotation of the Parochlus steinenii genome

Shin, Seung Chul; Kim, Hyun; Lee, Jun Hyuck; Kim, Han-Woo; Park, Joonho; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Ji Hee; Lee, Hyoungseok; Kim, Sanghee

doi:10.1038/s41598-019-41549-8

Article
Open access
Published: 25 March 2019

Seung Chul Shin ORCID: orcid.org/0000-0001-6835-484X¹,
Hyun Kim¹,
Jun Hyuck Lee^1,2,
Han-Woo Kim^1,2,
Joonho Park³,
Beom-Soon Choi⁴,
Sang-Choon Lee⁴,
Ji Hee Kim⁵,
Hyoungseok Lee ORCID: orcid.org/0000-0002-5831-6345^1,2 &
Sanghee Kim⁵

Scientific Reports volume 9, Article number: 5095 (2019)

Abstract

Parochlus steinenii is a winged midge from King George Island. It is cold-tolerant and endures the harsh Antarctic winter. Previously, we reported the genome of this midge, but the genome assembly with short reads had limited contig contiguity, which reduced the completeness of the genome assembly and the annotated gene sets. Recently, assembly contiguity has been increased using nanopore technology. A number of methods for enhancing the low base quality of the assembly have been reported, including long-read (e.g. Nanopolish) or short-read (e.g. Pilon) based methods. Based on these advances, we used nanopore technologies to upgrade the draft genome sequence of P. steinenii. The final assembled genome was 145,366,448 bases in length. The contig number decreased from 9,132 to 162, and the N50 contig size increased from 36,946 to 1,989,550 bases. The BUSCO completeness of the assembly increased from 87.8 to 98.7%. Improved assembly statistics helped predict more genes from the draft genome of P. steinenii. The completeness of the predicted gene model increased from 79.5 to 92.1%, but the numbers and types of the predicted repeats were similar to those observed in the short read assembly, with the exception of long interspersed nuclear elements. In the present study, we markedly improved the P. steinenii genome assembly statistics using nanopore sequencing, but found that genome polishing with high-quality reads was essential for improving genome annotation. The number of genes predicted and the lengths of the genes were greater than before, and nanopore technology readily improved genome information.

Introduction

Parochlus steinenii is a winged midge found on islands off the coast of Antarctica^1,2. It is a polytypic species and is widely distributed throughout Patagonia and the Maritime Antarctic and sub-Antarctic¹. P. steinenii midges from the Maritime Antarctic are more closely related to those from the sub-Antarctic than to those from Patagonia. The divergence time between midges from the Maritime Antarctic South Shetland Islands and those from sub-Antarctic South Georgia is 7.6 million years (Myr)³. In the Maritime Antarctic, another midge, Belgica antarctica, occurs naturally alongside P. steinenii¹. The wingless midge B. antarctica is freeze-tolerant in its larval stage, and its draft genome was recently reported⁴. P. steinenii, however, is not freeze-tolerant but cold-tolerant¹. These different adaptations in Antarctic midges are interesting in terms of evolutionary processes within a harsh environment. Previously, we reported the genome of the Antarctic midge P. steinenii⁵, but the completeness of the genome assembly was only 67.2% and the completeness of the annotated gene sets was only 70.7%. The genome completeness and gene set completeness of the draft genome of B. antarctica are 86.4% and 86.6%, respectively. The lower values for P. steinenii were due to the limited contig contiguity of its draft genome. Recently, there have been many reports of improvements in assembly using nanopore technology^6,7,8,9,10. Base-calling methods have improved^11,12 to the point that the base quality of nanopore reads has been reported to be sufficient for de novo genome assembly^6,7,10,13. The generation of ultra-long reads of up to 882 kb is a merit unique to nanopore technology⁸. Various methods for improving the low base quality of the assembled sequence have also been reported^10,14: high-quality reads and the signal-level data of nanopore reads have been used to improve the base quality of draft genome sequences^10,14.
In this study, we applied these nanopore technologies to upgrade the draft genome sequence of P. steinenii. Prior to a comparative analysis between Antarctic midges, we investigated the difference in annotation.

Results and Discussion

Oxford Nanopore Technology 1D sequencing

We obtained 2 μg of total DNA from 50 adult midges, and constructed an Oxford Nanopore Technologies (ONT) library. The total amount of final library was 930 ng of DNA (Table 1). Through ONT 1D sequencing using a single 1D flow cell, 10,970,289,711 bases were identified from 1,999,088 reads (Table 2).

Table 1 Library preparation.

Table 2 Summary of nanopore read statistics.

We found that 80% of all reads were longer than 5 kilo base pairs (kbp), 60% of reads were longer than 10 kbp, and 24% of reads were over 20 kbp. The longest read comprised 96,705 bases, and the reads had a mean Phred score (a measure of the quality of base identification) of over 10.4.
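
As a rough illustration of what these read statistics imply, the sketch below converts the mean Phred score into a per-base error probability using the standard relation Q = -10 log10(P), and estimates the sequencing depth by dividing the total number of sequenced bases by the final assembly size reported in this paper (145,366,448 bases). The function names are hypothetical and the calculation is not part of the authors' pipeline; converting a mean Phred score directly into an error probability is itself an approximation.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Approximate depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the text: mean Phred score 10.4, 10,970,289,711 sequenced
# bases, and a final assembly of 145,366,448 bases.
error_probability = phred_to_error_probability(10.4)
depth = sequencing_depth(10_970_289_711, 145_366_448)

print(f"error probability per base: {error_probability:.3f}")
print(f"approximate depth: {depth:.1f}x")
```

Under these assumptions, a mean Phred score of 10.4 corresponds to an error probability of about 9% per base, and the run amounts to roughly 75-fold coverage of the assembled genome.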

De novo genome assembly of Illumina reads and nanopore reads

The scaffold sequence generated by ALLPATHS-LG in a previous study⁵ contained information about ambiguities within the assembly. For comparison with assemblies from nanopore reads, we removed the assembly ambiguity information and filled the gaps in the resulting scaffolds. The final assembly using Illumina reads had a total size of 138 mega base pairs (Mbp), comprising 9,132 contigs with an N50 contig size of 36,946 bp and an N50 scaffold size of 176 kbp (Table 3).

Table 3 Genome assembly statistics.

Assembly of the nanopore reads was performed using the Canu-SMARTdenovo method¹⁵. Nanopore reads were corrected with Canu (ver. 1.1.1)¹⁶ before assembly, and we obtained 341,108 corrected reads with 5,742,044,883 bp (Table 2). All trimmed reads were longer than 5 kb, 96% were longer than 10 kb, and 39% were longer than 20 kb. The maximum read length was reduced to 87,202 bp. The resulting reads were assembled using SMARTdenovo¹⁷. The final assembled genome comprised 145,366,448 bp, the number of contigs decreased from 9,132 to 162, and the N50 contig size increased from 36,946 to 1,989,550 bp. The maximum contig size increased markedly from 320,332 to 9,644,260 bp (Table 3). The draft genome sequence assembled from nanopore reads (NR) exhibited excellent contiguity compared to that of the draft genome sequence assembled from the Illumina reads (IR).
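
For readers unfamiliar with the N50 statistic used above, the following is a minimal sketch of how it is usually computed: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are hypothetical and are not taken from the P. steinenii assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

With the toy input, the total is 20 bases, and the two longest contigs (6 and 5) already cover more than half, so the N50 is 5.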

Genome polishing and the genome completeness of draft genome sequences

The accuracy of draft genome sequences assembled from nanopore sequencing reads is reported to be below 98%⁸. We used two programs to improve the accuracy of the draft genome sequence (Fig. 1)⁸. First, we used Nanopolish (ver. 0.10.1)¹⁰, a software package for signal-level analysis of nanopore sequencing data. Nanopolish can improve the quality of the consensus sequence using the signal-level data in the FAST5 files. We aligned the nanopore reads to the draft assembly using BWA (ver. 0.7.17)¹⁸ and used these alignments together with the signal-level data to improve the quality of the consensus sequence during genome polishing¹⁰. Next, we used Pilon (ver. 1.22) to polish the draft assembly¹⁴. Pilon was developed to improve variant detection and genome assembly. It uses high-quality reads, such as Illumina reads, to correct draft assemblies constructed from relatively low-quality reads^8,14. After genome polishing of the NR, the identity between the IR and the NR increased by 0.53 to 0.79% (Table 4). However, the maximum identity remained below 99%. This may have been due to heterogeneity and variation in the DNA samples, which were obtained at different times, albeit from the same site.

Table 4 Summary of genome polishing.

The genome completeness of the draft genome sequences was validated using benchmarking universal single-copy orthologs (BUSCO; ver. 3)^19,20. We conducted BUSCO analyses against the Eukaryota, Insecta, and Diptera datasets (Fig. 2 and Table 5). Although the contiguity of the NR was markedly improved, its BUSCO completeness assessments for the genome were lower than those of the IR. Because BUSCO estimates genome completeness through gene annotation using Augustus with BUSCO group consensus sequences, low-quality bases in the NR may decrease the rate of gene annotation and thereby lower the BUSCO completeness assessments for the genome. Consistent with this, genome polishing, which improves base accuracy, increased the BUSCO completeness assessment for the genome of the NR (Tables 4 and 5). Although the identity did not increase dramatically after genome polishing, the genome completeness assessment of the NR polished using Nanopolish with signal-level data (NR + np) increased to a level similar to that of the IR. Nanopolish improved the genome completeness assessment, but the effect was smaller than that of genome polishing using Illumina reads. Genome polishing with Pilon using Illumina reads (NR + pl, NR + np + pl, and NR + np + pl × 2) increased the completeness values of the NRs to more than 98.7% in the BUSCO analysis against Eukaryota odb9, to 97.9% against Insecta odb9, and to 91.3% against Diptera odb9 (Fig. 2). Genome polishing using Pilon alone markedly increased the genome completeness assessment of the NRs.
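
The completeness percentages discussed here are simple proportions of the BUSCO groups in a dataset. The sketch below shows how such percentages can be derived from the four BUSCO categories (complete single-copy, complete duplicated, fragmented, missing). The function name and the counts are hypothetical and chosen only for illustration; they are not the values in Table 5. The total of 303 groups is assumed to match the size of the Eukaryota odb9 set.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Summarise BUSCO category counts as percentages of all BUSCO groups.

    'complete' is the sum of single-copy and duplicated complete BUSCOs.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one BUSCO group is required")

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
        "n": total,
    }


# Hypothetical counts that sum to 303 groups (assumed Eukaryota odb9 size).
summary = busco_percentages(single=295, duplicated=4, fragmented=2, missing=2)
print(summary)
```

With these illustrative counts, 299 of 303 groups are complete, which rounds to a completeness of 98.7%.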

Table 5 BUSCO completeness assessments for genomes.

Repeat analysis and non-coding RNA

The total coverage of repeat sequences in P. steinenii ranged from 6.74 to 11.89% of the total contig length (Table 6). Almost all statistics for repeats were similar among the draft genome sequences (Table 6); however, the number and the total length of masked interspersed repeats increased in the NR, and those of predicted long interspersed nuclear elements (LINEs) and unclassified repeats among the interspersed repeats increased markedly (Table 7). The total length of non-LTR retrotransposons, which comprise LINEs and short interspersed nuclear elements (SINEs), also increased. The number of predicted tRNAs ranged from 151 to 172 (Table 6).

Table 6 Major repetitive content and tRNAs.

Table 7 Statistics of interspersed repeats contents.

Gene annotation and gene set completeness of draft genome sequences

As reported in Table 8, 11,690 genes were predicted in the IR. Similar numbers of genes were predicted in the polished NRs (NR + np, NR + pl, NR + np + pl, and NR + np + pl × 2). Except for the NR, the number of genes ranged from 11,690 to 12,074. A relatively large number of genes (16,956) was predicted in the NR compared to the other draft genome sequences, whereas the total length of the gene regions was smaller than in the other sequences. The total length of the gene regions increased in the NRs (NR + np, NR + pl, NR + np + pl, and NR + np + pl × 2) after genome polishing, but the total lengths of the coding sequence and gene regions did not increase compared with the total length of the gene regions in NR + np. Instead, the total lengths of the intron and untranslated regions (UTRs) increased. In the NRs polished using Pilon (NR + pl, NR + np + pl, and NR + np + pl × 2), the total lengths of the exons, coding sequences (CDSs), and introns increased, and the total lengths of the 5′-UTR and 3′-UTR regions were similar to those of the IR (Table 8).

Table 8 Summary of MAKER2 annotation.

Annotation edit distance (AED) values of annotated genes lie between 0 and 1; if the alignment evidence matches the annotated gene exactly, the AED value is 0; if there is no supporting evidence, the AED value is 1²¹. Figure 3 comprises a plot of the cumulative distribution of the AED values for each assembly and a box plot of the AED scores. The AED distribution of NR + np was shifted slightly toward lower AED values relative to the IR below 0.5, and those of the NR were shifted toward much lower AED values than NR + np. The AED distribution of the IR and the NRs polished using Pilon (NR + pl, NR + np + pl, and NR + np + pl × 2) had similar cumulative distributions of AED below 0.2, but those of the NRs were shifted slightly to lower AED values relative to the IR from an AED value of 0.2 (Fig. 3a). In the box plot, the 25th percentile, the 75th percentile, and the median showed that the annotated gene quality of the NRs polished with Illumina reads (NR + pl, NR + np + pl, and NR + np + pl × 2) did not increase markedly compared with that of the IR (Fig. 3b).
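
To make the AED scale concrete, the sketch below computes an AED value at the nucleotide level, assuming the commonly used definition AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotated bases supported by the evidence. This is a simplified illustration with hypothetical function names and coordinates; it is not MAKER2's implementation, which works on its own evidence alignments.

```python
def positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def annotation_edit_distance(annotation, evidence):
    """Nucleotide-level AED between an annotated gene and its aligned evidence.

    Returns 0.0 for a perfect match and 1.0 when there is no overlap at all.
    """
    annotated = positions(annotation)
    supported = positions(evidence)
    if not annotated or not supported:
        return 1.0  # no annotation or no evidence: treated as unsupported
    overlap = len(annotated & supported)
    sensitivity = overlap / len(supported)  # evidence recovered by the annotation
    specificity = overlap / len(annotated)  # annotation backed by evidence
    return 1 - (sensitivity + specificity) / 2


# Hypothetical exon coordinates, for illustration only.
print(annotation_edit_distance([(1, 100)], [(1, 100)]))    # identical intervals
print(annotation_edit_distance([(1, 100)], [(51, 150)]))   # half overlapping
print(annotation_edit_distance([(1, 100)], [(201, 300)]))  # no overlap
```

The three calls cover the two extremes described in the text (exact match and no supporting evidence) and one intermediate case in which half of each interval is shared.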

We performed a BUSCO analysis against three datasets (Eukaryota odb9, Insecta odb9, and Diptera odb9) to assess the annotated gene set completeness of the assemblies. In the NRs, the gene set completeness increased markedly after genome polishing (Fig. 4 and Table 9). The gene set completeness of NR + np exceeded that of the IR. Genome polishing using Pilon (NR + pl, NR + np + pl, and NR + np + pl × 2) improved the gene set completeness to more than 88.8% against Eukaryota odb9, to 89.5% against Insecta odb9, and to 84.2% against Diptera odb9, irrespective of genome polishing using Nanopolish or the number of Pilon repetitions. Before genome polishing, the NR had low gene set completeness (below 50%). The number of fragmented BUSCOs appeared to increase owing to the low base accuracy of the assembly (Fig. 4 and Table 9). The IR had a gene set completeness of 79.5% against Eukaryota odb9, 79.7% against Insecta odb9, and 67.8% against Diptera odb9.

Table 9 BUSCO completeness assessments for gene sets.

Conclusion

Recently, reports of genome assemblies produced from nanopore reads have increased, and the improvement in contiguity of such genome assemblies is seen as a benefit of using long reads⁸. Therefore, we applied nanopore reads to a draft genome of P. steinenii assembled from Illumina MiSeq data, and investigated the difference in annotation. Low-quality nanopore reads were sufficient to improve the genome completeness, but nanopore reads alone were not sufficient to improve the annotation quality of the assembly when compared with that of the draft assembly produced using Illumina reads. Genome polishing with high-quality reads effectively improved the gene set completeness of the genome assembly produced using nanopore reads. Through MAKER annotation, we identified improvements in the gene set completeness without a difference in AED value. The genome of P. steinenii is smaller than 150 Mbp, so just one MinION cell is sufficient to increase the quality of its assembly and annotation.

Materials and Methods

Sample and DNA preparation

We collected P. steinenii adults from fresh water on King George Island, West Antarctica (62° 14′ S, 58° 47′ W) during 2018. We used 50 adult midges for DNA preparation. Genomic DNA was extracted using a DNeasy Tissue Kit (Qiagen, Valencia, CA, USA), and we used 2 μg of DNA for library construction and sequencing.

Oxford Nanopore Technology library preparation and 1D sequencing

We constructed a genomic library for ONT sequencing using the ONT 1D ligation sequencing kit (SQK-LSK108) according to the manufacturer’s instructions^8,9. We constructed the library in three steps and measured the DNA concentration using a PicoGreen assay at the end of each step (Table 1). First, we subjected 2.0 μg of genomic midge DNA to DNA repair using an NEBNext FFPE Repair Mix (NEB cat no. M6630) to eliminate DNA fragmentation. After purification using AMPure XP beads, we subjected the repaired genomic DNA to end repair and dA-tailing using an NEBNext Ultra II End-Repair/dA-tailing Module (NEB cat no. E7546), and purified the DNA using AMPure XP beads. We ligated a sequencing adapter to the purified DNA using adapter mix 1D from the SQK-LSK108 kit and an NEB Blunt/TA ligase Master Mix (NEB cat no. M0367). Finally, we cleaned up the adapter-ligated DNA using AMPure XP beads, an ABB buffer, and an elution buffer. We quantified the final library using a Qubit.

Oxford Nanopore Technology 1D sequencing and base-calling

We carried out sequencing using a GridION X5 sequencer and a single 1D flow cell (FLO-MIN106) with protein pore R9.4 1D chemistry for 48 h according to the manufacturer’s instructions. The FAST5 files generated during sequencing were live base-called using Guppy software (ver. 0.5.1) installed on GridION X5 using the default parameters. Sequencing and base-calling were controlled using ONT MinKNOW software (ver. 1.14.1). The FASTQ files obtained by base-calling were merged into single files and used for trimming using Porechop (ver. 0.2.3)²². All sequencing procedures were performed by Phyzen Co. Ltd. (Seongnam, Korea).

De novo genome assembly of Illumina reads

The sequencing reads generated from the paired-end library (400 bp: SRX1976250) and the mate-pair libraries (3 kbp: SRX1976251 and 5 kbp: SRX1976252) in a previous study⁵ were trimmed using fastq_quality_trimmer in the FASTX-Toolkit (ver. 0.0.11)²³ with the parameters “-t 30 -l 200 -Q 33”, and the resulting trimmed Illumina reads were assembled into scaffolds using ALLPATHS-LG (ver. 44849)²⁴. The resulting scaffold sequence contained information about ambiguities within the assembly. These ambiguities are represented as a comma-separated list of alternatives within curly braces in the extended FASTA (eFASTA) format, which is another output format of ALLPATHS-LG. We removed the assembly ambiguity information using the efasta2fasta script²⁵, which converts eFASTA to FASTA. The gaps in the resulting scaffolds were filled using GapFiller (ver. 2.1.1) with the parameters “-m 30 -o 2 -r 0.7 -n 5 -d 3000 -t 5 -g 1 -T 10 -i 1”²⁶.
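
The eFASTA convention described above can be illustrated with a short sketch that collapses each brace-enclosed list of alternatives into a single sequence. It assumes, purely for illustration, that the first alternative in each list is kept; the efasta2fasta script used in this study may resolve ambiguities differently, and the function name below is hypothetical.

```python
import re

# Matches one ambiguity block such as {T,C} or {,GG} (alternatives may be empty).
_AMBIGUITY = re.compile(r"\{([^{}]*)\}")


def flatten_efasta_sequence(sequence: str) -> str:
    """Replace each {alt1,alt2,...} block with its first alternative.

    Assumption: keeping the first alternative is an illustrative choice,
    not necessarily what the efasta2fasta script does.
    """
    return _AMBIGUITY.sub(lambda match: match.group(1).split(",")[0], sequence)


# Hypothetical sequence line with two ambiguity blocks.
print(flatten_efasta_sequence("ACG{T,C}A{,GG}T"))
```

In the example, the first block resolves to `T` and the second to the empty string, so the flattened sequence is `ACGTAT`.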

Error correction and de novo genome assembly of nanopore reads

De novo genome assembly was performed using the Canu-SMARTdenovo method¹⁵. Nanopore reads were corrected using Canu (ver. 1.1.1)¹⁶. As the default parameters of Canu are applicable to a single 1D flow cell with protein pore R9.4 1D chemistry, and the genome size of P. steinenii predicted with GenomeScope is 143.8 Mbp according to a previous study^5,27, we corrected the trimmed reads with default parameters and with “genomeSize=140m -nanopore-raw” according to the Canu FAQ²⁸. The resulting reads were assembled using SMARTdenovo^15,17. The dot matrix overlapper was selected as the overlapper engine, and the k-mer size was set to 16.

Genome polishing and the identity values of the draft genome sequences

We aligned the sequencing reads obtained from ONT to the draft assembly using the Burrows-Wheeler Aligner (BWA; ver. 0.7.17)¹⁸ with the parameter “-x ont2d”, and the assembly was polished using Nanopolish (ver. 0.10.1)¹⁰. MiSeq reads were also aligned using BWA, and the resulting alignments were used for genome polishing using Pilon (ver. 1.22)¹⁴. The identity values of the draft genome sequence assembled from nanopore reads were computed relative to the draft genome sequence assembled from the Illumina reads using the nucmer command in the MUMmer tool (ver. 3.0) with the parameters “-l 100 -c 500 -maxmatch”^8,29. The resulting delta file was processed with the dnadiff script in the MUMmer tool, and the average 1-to-1 alignment identity was used⁸.

Repeat analysis and non-coding RNA

Repeat sequences for P. steinenii were predicted using RepeatMasker (ver. 3.3.0)³⁰, with a de novo repeat library as the database and rmblastn (ver. 2.6.0) as the search program³¹. The de novo repeat library was constructed using RepeatModeler (ver. 1.0.11)³², including the RECON (ver. 1.08)³² and RepeatScout (ver. 1.0.5) software³³, with default parameters. Tandem repeats, including simple repeats, satellites, and low-complexity repeats, were predicted using TRF³⁴. Putative tRNA genes were identified using tRNAscan-SE (ver. 2.0)³⁵ with the options “-E -H”.

Gene annotation

We carried out gene annotation using the MAKER annotation pipeline^21,36. We used the RepBase library (ver. 20170100)³⁷ to mask the repeat sequences in the draft genome with RepeatMasker (ver. 3.3.0)³⁰, and selected the SNAP gene finder³⁸ for ab initio gene prediction. RNA and protein sequences used in previous studies were aligned and used to find the best possible gene model in MAKER2³⁶. The upper limit of the AED metric, which controls the quality of annotation for the final gene predictions, was set to 1 in MAKER2³⁶.

Genome and gene set completeness of draft genome sequences

The genome completeness and gene set completeness of the draft genome sequences were validated using BUSCO (ver. 3)^19,20. For the Augustus step in BUSCO, the training data set for Aedes aegypti was selected. We conducted BUSCO analyses against the Eukaryota, Insecta, and Diptera datasets.

Accession codes

The raw data have been deposited at the National Center for Biotechnology Information (NCBI) BioProject repository PRJNA284858 (SRX5001002).

References

Convey, P. & Block, W. Antarctic Diptera: ecology, physiology and distribution. European Journal of Entomology 93, 1–14 (1996).
Edwards, M. & Usher, M. B. The winged Antarctic midge Parochlus steinenii (Gerke) (Diptera: Chironomidae) in the South Shetland Islands. Biological Journal of the Linnean Society 26, 83–93 (1985).
Allegrucci, G., Carchini, G., Todisco, V., Convey, P. & Sbordoni, V. A molecular phylogeny of Antarctic Chironomidae and its implications for biogeographical history. Polar Biology 29, 320–326 (2006).
Kelley, J. L. et al. Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. Nature communications 5 (2014).
Kim, S. et al. Genome sequencing of the winged midge, Parochlus steinenii, from the Antarctic Peninsula. GigaScience 6, 1–8 (2017).
Eccles, D. et al. De novo assembly of the complex genome of Nippostrongylus brasiliensis using MinION long reads. BMC biology 16, 6 (2018).
Giordano, F. et al. De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Sci Rep 7, 3935 (2017).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature biotechnology 36, 338 (2018).
Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome biology 17, 239 (2016).
Loman, N. J., Quick, J. & Simpson, J. T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature methods 12, 733 (2015).
Wick, R. R., Judd, L. M. & Holt, K. E. Comparison of Oxford Nanopore basecalling tools (2018).
Sahoo, N. Sequence Base-calling through Albacore software: A part of the Oxford Nanopore Technology (2017).
Deschamps, S. et al. Characterization, correction and de novo assembly of an Oxford Nanopore genomic dataset from Agrobacterium tumefaciens. Sci Rep 6, 28625 (2016).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS one 9, e112963 (2014).
Schmidt, M. H.-W. et al. De novo assembly of a new Solanum pennellii accession using nanopore sequencing. The Plant Cell 29, 2336–2348 (2017).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research, gr. 215087.215116 (2017).
SMARTdenovo, https://github.com/ruanjue/smartdenovo. Accessed 19 November 2018.
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, btv351 (2015).
Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular biology and evolution 35, 543–548 (2017).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. Bmc Bioinformatics 12, 1 (2011).
Porechop. https://github.com/rrwick/Porechop Accessed 19 November 2018.
FASTX-Toolkit. http://hannonlab.cshl.edu/fastx_toolkit. Accessed 19 November 2018.
Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences 108, 1513–1518 (2011).
efasta2fasta script. https://github.com/nylander/efasta2fasta. Accessed 19 November 2018.
Nadalin, F., Vezzi, F. & Policriti, A. GapFiller: a de novo assembly approach to fill the gap within paired reads. Bmc Bioinformatics 13, S8 (2012).
Vurture, G. W. et al. GenomeScope: Fast reference-free genome profiling from short reads. bioRxiv, 075978 (2016).
Canu FAQ. https://canu.readthedocs.io/en/latest/faq.html. Accessed 19 November 2018.
Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic acids research 30, 2478–2483 (2002).
Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics, 4.10. 11–14.10. 14 (2009).
RMBlast. http://www.repeatmasker.org/RMBlast.html. Accessed 19 November 2018.
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome research 12, 1269–1276 (2002).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573 (1999).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research 25, 955–964 (1997).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome research 18, 188–196 (2008).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and genome research 110, 462–467 (2005).
Korf, I. Gene finding in novel genomes. Bmc Bioinformatics 5, 1 (2004).

Acknowledgements

The present study was supported by the following: grant PE18090 and PE19090; Modeling responses of terrestrial organisms to environmental changes on King George Island grant funded by the Korea Polar Research Institute (KOPRI); a grant from the National Research Foundation of Korea (NRF), which was funded by the Ministry of Science and ICT (MSIT) (Grant Number NRF-2017M1A5A1013568; title: Application study on the Arctic cold-active enzyme degrading organic carbon compounds); and KOPRI’s basic research project (Grant Numbers PN18082 and PN19082).

Author information

Authors and Affiliations

Unit of Polar Genomics, Korea Polar Research Institute (KOPRI), Incheon, 21990, Republic of Korea
Seung Chul Shin, Hyun Kim, Jun Hyuck Lee, Han-Woo Kim & Hyoungseok Lee
Department of Polar Sciences, University of Science and Technology, Incheon, 21990, Republic of Korea
Jun Hyuck Lee, Han-Woo Kim & Hyoungseok Lee
Department of Fine Chemistry, Seoul National University of Science and Technology, Seoul, 01811, Republic of Korea
Joonho Park
Phyzen Co., Ltd, Seongnam, 13558, Republic of Korea
Beom-Soon Choi & Sang-Choon Lee
Division of Life Sciences, Korea Polar Research Institute (KOPRI), Incheon, 21990, Republic of Korea
Ji Hee Kim & Sanghee Kim

Contributions

S.H.K., J.H.L., H.W.K., J.H.P., J.H.K., H.S.L. and S.C.S. designed the study. S.C.S. and S.H.K. collected the samples and performed the experiments. H.K., B.S.C. and S.C.L. analyzed the data. All authors participated in the writing of the manuscript.

Corresponding authors

Correspondence to Seung Chul Shin or Sanghee Kim.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shin, S.C., Kim, H., Lee, J. et al. Nanopore sequencing reads improve assembly and gene annotation of the Parochlus steinenii genome. Sci Rep 9, 5095 (2019). https://doi.org/10.1038/s41598-019-41549-8

Received: 06 December 2018
Accepted: 08 March 2019
Published: 25 March 2019
DOI: https://doi.org/10.1038/s41598-019-41549-8
