Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system

Rajeev, Meora; Jung, Ilsuk; Lim, Yeonjung; Kim, Suhyun; Kang, Ilnam; Cho, Jang-Cheon

doi:10.1038/s41597-023-02622-0

Data Descriptor
Open access
Published: 17 October 2023

Scientific Data volume 10, Article number: 707 (2023)

Abstract

Biofloc technology is increasingly recognised as a sustainable aquaculture method. In this technique, bioflocs are generated as microbial aggregates that play pivotal roles in assimilating toxic nitrogenous substances, thereby ensuring high water quality. Despite the crucial roles of the floc-associated bacterial (FAB) community in pathogen control and animal health, earlier microbiota studies have primarily relied on metataxonomic approaches. Here, we employed shotgun sequencing on eight biofloc metagenomes from a commercial aquaculture system. This generated 106.6 Gbp of sequence data and enabled the reconstruction of 444 metagenome-assembled genomes (MAGs). Among the recovered MAGs, 230 were high-quality (≥90% completeness, ≤5% contamination), and 214 were medium-quality (≥50% completeness, ≤10% contamination). Phylogenetic analysis revealed Rhodobacteraceae as dominant members of the FAB community. The reported metagenomes and MAGs are crucial for elucidating the roles of diverse microorganisms and their functional genes in key processes such as nitrification, denitrification, and remineralization. This study will contribute to scientific understanding of the phylogenetic diversity and metabolic capabilities of microbial taxa in aquaculture environments.

Background & Summary

Uncultured microorganisms constitute a significant proportion of microbial populations in an ecosystem and play a vital role in its functioning¹. The challenges associated with cultivating these microbes have constrained access to the vast phylogenetic and functional diversity they possess. However, recent advancements in metagenomics have opened a new window to explore the enigmatic “microbial dark matter”, revealing the hidden genetic potential of as-yet-uncultured microorganisms².

One of the recent advancements in shotgun metagenomic data analysis is the generation of metagenome-assembled genomes (MAGs) through de novo assembly and binning of individual bacterial genomes from complex microbial communities³. This approach provides a culture-independent way to directly reconstruct genomes from environmental samples, thereby offering insights into the genomic makeup and metabolic potential of previously uncharacterized microbial taxa⁴. Since the first successful recovery of MAGs^5,6, the approach has seen a remarkable expansion, with construction of hundreds to thousands of MAGs from a variety of complex environments, including thermal pools⁷, animal and human guts⁸, river estuaries⁹, deep marine sediments¹⁰, and activated sludge^11,12. In fact, these MAGs have been used to explore the functional potential of microbes across various environments^12,13.

Aquaculture is one of the fastest developing food sectors, meeting the global seafood demand¹⁴. As traditional open-water aquaculture systems encounter several challenges such as water pollution, disease outbreaks, and inefficient resource utilization, there is a growing need for sustainable and environmentally friendly aquaculture methods. In this context, biofloc technology (BFT) has emerged as a promising approach that facilitates recycling of toxic nitrogenous components into microbial biomass by supporting the growth of definite microbial consortia¹⁵.

The BFT-based aquaculture system principally relies on balancing the carbon-to-nitrogen (C/N) ratio to stimulate the growth of dense microbial aggregates (biofloc)¹⁶. The floc-associated bacterial (FAB) community helps regulate excessive nutrients, particularly inorganic nitrogen (e.g., ammonia and nitrite), by promoting heterotrophic assimilation. As organic matter accumulates in the biofloc aquaculture system, heterotrophic bacteria use the organic carbon compounds as a source of energy and simultaneously assimilate ammonia and nitrite into cellular components, including proteins and nucleic acids. Through this process, deleterious nitrogenous compounds are converted into microbial biomass, which subsequently serves as a valuable nutrient source for the cultured animals^17,18.

In this manner, BFT systems not only maintain adequate water quality but also offer several other advantages, including enhanced productivity, regulation of animal health, and assurance of biosafety¹⁹. Since microbial communities determine the overall functioning of a BFT aquaculture system, substantial scientific efforts have been devoted to understanding the bacterial community composition of various BFT components^20,21,22. However, most of these studies have used 16S rRNA gene amplicon sequencing (a metataxonomic approach), which provides information on community composition but falls short of capturing the complete genetic diversity and functional potential of microorganisms^23,24. Therefore, earlier studies have recommended the use of a metagenomic approach to investigate aquaculture systems²⁵.

In the present study, we characterized eight metagenomes derived from the FAB community (>3 µm size fraction) of a commercial aquaculture system in South Korea that operates based on BFT. These metagenomes represent the temporal variations in the FAB community during the growth of two batches of Pacific white shrimp (Litopenaeus vannamei) (Table 1). A schematic diagram of the workflow followed in this study is presented in Fig. 1. The workflow involves the collection of rearing water from a commercial biofloc aquaculture system, nucleic acid extraction from the FAB community, Illumina sequencing, and finally the bioinformatics analyses to recover MAGs. The Illumina shotgun metagenome sequencing effort produced a total of 106.6 Gbp, with 12.3–16.8 Gbp per sample, and 353.18 million raw paired-end (PE) reads, with an average of 44.14 million reads per sample (Table 2). After eliminating low-quality reads and applying other quality control criteria, 300.25 million (average 37.53 million per sample) high-quality PE reads were retained. These metagenome reads exhibited a Phred quality score >30 according to the MultiQC report, indicating that the reads are of very good quality. The quality control criteria implemented in our study resulted in the elimination of 13.97% to 16.14% of metagenome reads across the analysed metagenomes. Taxonomic classification of the high-quality reads against various RefSeq databases revealed that a predominant fraction of metagenome reads remains unclassified. The relative proportion of these unclassified reads ranges from 60.33% to 82.10% across the biofloc metagenomes, with an average of 70.15% (Fig. 2 and Table 3). Of the classified reads, the highest proportion was attributed to bacteria (average 29.37%), followed by eukaryota (0.28%), archaea (0.10%), fungi (0.06%), and viruses (0.01%). This observation is consistent with a previous study that investigated the biofloc-forming community through a metagenomic approach²⁶.
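The read-retention figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the totals stated in the text (353.18 million raw and 300.25 million retained PE reads across eight metagenomes); the variable and function names are illustrative and are not taken from the authors' code.

```python
# Totals quoted in the text (millions of paired-end reads, eight metagenomes).
RAW_READS_M = 353.18
RETAINED_READS_M = 300.25
N_SAMPLES = 8


def percent_removed(raw: float, retained: float) -> float:
    """Return the percentage of reads discarded during quality control."""
    return 100.0 * (raw - retained) / raw


overall_removed = percent_removed(RAW_READS_M, RETAINED_READS_M)
mean_retained_per_sample = RETAINED_READS_M / N_SAMPLES

print(f"Overall reads removed: {overall_removed:.2f}%")
print(f"Mean retained reads per sample: {mean_retained_per_sample:.2f} million")
```

The pooled removal rate comes out at roughly 15%, which lies inside the per-sample range of 13.97% to 16.14% reported for the individual metagenomes.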

Table 1 Sampling period, physicochemical properties, and inorganic nutrient content of rearing water collected from a commercial aquaculture system operating based on BFT.

Table 2 An overview of the Illumina sequencing performed on the biofloc metagenomes obtained from a commercial BFT-based aquaculture system.

Table 3 Taxonomic classification of biofloc metagenomes based on the Kraken2 program using various RefSeq databases.

Next, we used both individual assembly and co-assembly (collectively termed the “mix-assembly” approach) on our datasets (Table 4). The individual assemblies of quality-filtered reads using metaSPAdes generated a total of 1,175,916 contigs with lengths of ≥1 kbp. The longest contig per individual assembly ranged from 1.16 Mbp to 2.34 Mbp. Co-assembly produced a total of 878,328 contigs (length ≥1 kbp) with an N50 length of 3,235 bp.
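For readers unfamiliar with the N50 statistic reported here, the following minimal sketch shows how it is conventionally computed from a list of contig lengths. The function name and the toy contig lengths are illustrative assumptions; they are not data from this study, and the study itself obtained its assembly statistics from metaQUAST.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (bp), not data from this study.
toy_contigs = [8000, 5000, 3000, 2000, 1000, 1000]
print(n50(toy_contigs))
```

In the toy example the total is 20,000 bp, and the two longest contigs already cover more than half of it, so the N50 is the length of the second contig.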

Table 4 Overview of the assembly statistics for the analysed biofloc metagenomes.

We further performed binning of the contigs to recover MAGs. The bins obtained from all eight individual assemblies and one co-assembly were dereplicated at an average nucleotide identity (ANI) ≥95%, resulting in a total of 444 non-redundant MAGs with completeness ≥50% and contamination ≤10% (see Quality Metrics File). Among the reconstructed MAGs, 230 were classified as high-quality (completeness ≥90%; contamination ≤5%), while 214 were categorized as medium-quality (completeness ≥50%; contamination ≤10%) (Fig. 3a). All recovered MAGs had a quality score value [defined as completeness – (5 × contamination)] of ≥50. The genome sizes vary from 0.14 to 11.59 Mbp, with the majority falling within the range of 2–5 Mbp (Fig. 3b). About half of the MAGs (n = 229) comprised fewer than 200 contigs (Fig. 3c). Of the 230 high-quality MAGs, 61 contained the full set of ribosomal RNA genes (5S, 16S, and 23S) as well as at least 18 tRNA genes (see Quality Metrics File). These MAGs met the stringent criteria outlined by the Genomic Standards Consortium for high-quality MAGs, ensuring their adherence to the minimum information about a metagenome-assembled genome (MIMAG) standards²⁷. As expected, a higher proportion of the MAGs recovered in our study lacked ribosomal genes. This may be attributed to the inherent challenges associated with accurately assembling repetitive regions using short-read sequencing methods²⁸.
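The thresholds described in this paragraph can be expressed as a small classifier. This is a minimal sketch of the completeness, contamination, and quality-score rules as stated in the text; the function names and the example values are illustrative assumptions. It deliberately ignores the rRNA and tRNA criteria, which the full MIMAG high-quality definition also requires.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score as defined in the text: completeness - 5 x contamination."""
    return completeness - 5.0 * contamination


def classify_mag(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness and contamination (both in %)."""
    if quality_score(completeness, contamination) < 50:
        return "rejected"
    if completeness >= 90 and contamination <= 5:
        return "high-quality"
    if completeness >= 50 and contamination <= 10:
        return "medium-quality"
    return "rejected"


# Hypothetical (completeness, contamination) pairs, not values from the study.
for comp, cont in [(95.0, 2.0), (92.0, 8.0), (70.0, 3.0), (60.0, 4.0)]:
    print(comp, cont, quality_score(comp, cont), classify_mag(comp, cont))
```

Note that the quality-score filter is stricter than the medium-quality thresholds alone: a bin with 60% completeness and 4% contamination satisfies the latter but has a score of 40 and would be discarded.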

The taxonomic classification of the recovered MAGs revealed their distribution across nine dominant bacterial phyla, namely Proteobacteria (161 MAGs), Bacteroidota (86), Planctomycetota (38), Patescibacteria (29), Myxococcota (27), Actinobacteriota (20), Verrucomicrobiota (16), Bdellovibrionota (11), and Chloroflexota (11), in addition to Bdellovibrionota_C (7) (Fig. 4a and Quality Metrics File). Among the recovered MAGs, the family Rhodobacteraceae accounted for the largest proportion, followed by Flavobacteriaceae. The prevalence of Rhodobacteraceae members in biofloc aquaculture systems has been documented in earlier studies as well^29,30. Notably, phylogenetic molecular network analysis in our recent study revealed that some Rhodobacteraceae members served as keystone taxa in both rearing water and bioflocs³¹. Therefore, this bacterial family may be an essential component in regulating the microbial communities of various components in biofloc aquaculture systems.

Several low-abundance bacterial phyla (each represented by <10 MAGs) were also recovered from the FAB community. These phyla include Acidobacteriota (4 MAGs), Armatimonadota (2), Calditrichota (2), Chlamydiota (6), CLD3 (1), Cyanobacteria (4), Delongbacteria (1), Dependentiae (1), Desulfobacterota (2), Eisenbacteria (1), Eremiobacterota (1), Gemmatimonadota (3), Hydrogenedentota (3), and Nitrospirota (1) (see Quality Metrics File). Notably, approximately 39% of the recovered MAGs (n = 174) could not be classified at the genus level, while 93% of the MAGs (n = 415) could not be classified at the species level (Fig. 4b). These data emphasize the need to investigate the microbial phylogeny of aquaculture environments.
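The genus- and species-level figures can be reproduced from GTDB-style lineage strings, in which each rank carries a prefix (d__, p__, c__, o__, f__, g__, s__) and an empty name after the prefix means the genome was not assigned at that rank. The sketch below assumes that string format; the helper names and the two example lineages are illustrative and are not records from the Quality Metrics File.

```python
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def assigned_ranks(classification: str) -> dict:
    """Split a GTDB-style lineage string into {rank prefix: taxon name}."""
    ranks = {}
    for field in classification.split(";"):
        field = field.strip()
        prefix, name = field[:3], field[3:]
        if prefix in RANK_PREFIXES:
            ranks[prefix] = name
    return ranks


def is_unclassified_at(classification: str, prefix: str) -> bool:
    """True when the lineage has no taxon name at the given rank."""
    return not assigned_ranks(classification).get(prefix, "")


# Made-up example lineages for illustration only.
novel_genus = ("d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
               "o__Rhodobacterales;f__Rhodobacteraceae;g__;s__")
novel_species = ("d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
                 "o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria;s__")

print(is_unclassified_at(novel_genus, "g__"), is_unclassified_at(novel_species, "g__"))

# Percentages implied by the counts quoted in the text.
TOTAL_MAGS = 444
pct_no_genus = 100 * 174 / TOTAL_MAGS
pct_no_species = 100 * 415 / TOTAL_MAGS
print(f"{pct_no_genus:.1f}% without genus, {pct_no_species:.1f}% without species")
```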

To the best of the authors’ knowledge, this is the first report of multiple MAGs being recovered from a biofloc aquaculture system. The genome-resolved metagenomic approach employed in this study is expected to provide deeper insights into the metabolic potential and functional roles of individual microorganisms in BFT-based aquaculture systems. Gaining a comprehensive understanding of the genomic composition of biofloc-associated bacterial communities can help elucidate their roles in nutrient cycling, water quality management, disease prevention, and overall system performance. Our findings will contribute to the effective management and optimization of aquaculture systems.

Methods

Rearing water sampling and shotgun metagenomic sequencing

The entire methodological workflow followed in this study is represented in Fig. 1. Water samples for metagenomic analysis of the FAB community were collected from a commercial aquaculture system that uses a BFT-based approach to cultivate whiteleg shrimp (Litopenaeus vannamei). The investigated aquaculture system is located in Ganghwa-do, Incheon, Republic of Korea (37.7000 N, 126.3888 E). We collected surface rearing water during the growth of two L. vannamei batches (batch-1 and -2) on a total of eight occasions from April 2018 to July 2018 (Table 1). On each occasion, samples were collected randomly from three sites of the aquaculture tank and pooled to generate representative samples. Physicochemical characteristics such as temperature, dissolved oxygen, salinity, and pH were measured on-site using a handheld multi-parameter analyser YSI 556MPS (YSI Inc., Yellow Springs, USA). The concentrations of nitrite (NO₂⁻), nitrate (NO₃⁻), phosphate (PO₄³⁻), and total ammonia-nitrogen (TAN, NH₄⁺-N) were determined using a spectrophotometer (DR/2010, HACH Company, USA), following the standard protocol described in our previous study³² (Table 1). The collected samples were immediately transported to the laboratory under ice-cold conditions.

Subsequently, the water samples were centrifuged gently to separate the high-density bioflocs. The supernatant resulting from this centrifugation step was then filtered through 3 µm pore-size membrane filters (Advantec MFS, Inc., Japan) to recover any remaining low-density bioflocs¹⁴. Both fractions were combined and subjected to whole community nucleic acid extraction using the DNeasy PowerWater DNA isolation kit (QIAGEN, Hilden, Germany), as per the manufacturer’s instructions. The extracted metagenomic DNAs were assessed for quality and quantity using 1% agarose gel electrophoresis and a Qubit 4 Fluorometer (Thermo Fisher Scientific, USA), respectively, and preserved at −20 °C until further processing.

Illumina library preparation and the subsequent sequencing followed a standard shotgun metagenomic sequencing protocol, as detailed in a previous study³³. In brief, DNA samples were fragmented by sonication, end-polished, A-tailed, and ligated to adapter sequences. The shotgun metagenomic library was then constructed using the Nextera XT library preparation kit (Illumina, San Diego, CA, USA), in accordance with the manufacturer’s guidelines. The resulting libraries were pooled at equimolar concentrations and then sequenced on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA) at ChunLab, Inc. (Seoul, Republic of Korea) using a paired-end method (150 bp × 2). In total, eight metagenomes, representing the FAB community at various growth stages of L. vannamei, were sequenced from a biofloc aquaculture system.

Quality enhancement, taxonomic classification, and assembly of metagenomes

Forward and reverse Illumina raw reads were initially visualized using MultiQC v1.11³⁴, followed by processing through the BBDuk program from the BBTools suite v39.01³⁵. Adapters were trimmed, contaminants were screened, and short-length reads were removed using the following parameters: k=23, ktrim=r, mink=11, hdist=1, tpe, tbo, ftm=5, qtrim=rl, trimq=20, and minlen=100. The resulting high-quality reads were then subjected to taxonomic classification against various preconstructed databases (https://benlangmead.github.io/aws-indexes/k2), including RefSeq archaea, bacteria, viruses, plasmids, human, UniVec Core, protozoa, and fungi, using the Kraken2 program v2.1.3³⁶.
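To make the trimming step easier to reproduce, the sketch below assembles a BBDuk command line from the parameters listed above. The parameter values come from the text; the input and output file names and the adapter reference path are hypothetical placeholders, since the text does not state which reference file was supplied. The function only builds the argument list and does not run the tool.

```python
import shlex

# Trimming and filtering parameters exactly as listed in the text.
BBDUK_PARAMS = [
    "k=23", "ktrim=r", "mink=11", "hdist=1", "tpe", "tbo",
    "ftm=5", "qtrim=rl", "trimq=20", "minlen=100",
]


def bbduk_command(r1, r2, out1, out2, adapters):
    """Build (but do not execute) a paired-end BBDuk argument list.

    All file paths are caller-supplied placeholders; `adapters` is an
    assumed adapter/contaminant FASTA that the text does not name.
    """
    return [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        *BBDUK_PARAMS,
    ]


# Hypothetical file names for a single sample.
cmd = bbduk_command(
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
    "sample1_R1.clean.fastq.gz", "sample1_R2.clean.fastq.gz",
    "adapters.fa",
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it on a system where BBTools is installed; the authors' actual invocation is available in their GitHub repository.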

In parallel, the high-quality reads were assembled into longer fragments using metaSPAdes v3.15.4 with k-mer values of 21, 33, 55, 77, 99, and 127³⁷. Both individual assembly and co-assembly approaches (collectively referred to as the “mix-assembly” approach)³⁸ were applied to our dataset. The individual assembly was used to obtain high-quality genomes from relatively abundant bacterial groups, while the co-assembly approach was employed to recover genomes from less abundant bacteria^39,40. The adopted assembly approaches provided eight individual assemblies and one co-assembly. Finally, we used metaQUAST v5.1.0⁴¹ to evaluate quality metrics and statistics of each metagenome assembly.
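As an illustration of the individual-assembly step, the following sketch validates the k-mer list against the constraints SPAdes documents for its -k option (odd values, below 128, in ascending order) and then builds a metaSPAdes command. The k-mer values come from the text; the file and directory names are hypothetical, and how reads were pooled for the co-assembly is not specified in the text, so only a single-sample call is shown.

```python
import shlex

KMER_SIZES = [21, 33, 55, 77, 99, 127]  # values quoted in the text


def validate_kmers(kmers):
    """Check the documented SPAdes -k constraints: odd, < 128, ascending."""
    if any(k % 2 == 0 for k in kmers):
        raise ValueError("k-mer sizes must be odd")
    if any(k >= 128 for k in kmers):
        raise ValueError("k-mer sizes must be below 128")
    if list(kmers) != sorted(set(kmers)):
        raise ValueError("k-mer sizes must be strictly ascending")
    return list(kmers)


def metaspades_command(r1, r2, outdir, kmers=KMER_SIZES):
    """Build (but do not execute) a metaSPAdes paired-end argument list."""
    k_arg = ",".join(str(k) for k in validate_kmers(kmers))
    return ["metaspades.py", "-1", r1, "-2", r2, "-k", k_arg, "-o", outdir]


# Hypothetical file and directory names for a single sample.
cmd = metaspades_command(
    "sample1_R1.clean.fastq.gz", "sample1_R2.clean.fastq.gz", "assembly_sample1"
)
print(shlex.join(cmd))
```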

Reconstruction of MAGs and taxonomic assignment

Contigs with a length >1 kb were binned to recover MAGs using the metaWRAP v1.3.2 pipeline⁴². During the metaWRAP processing, the binning module was deployed to generate the initial bin sets based on read coverage and tetranucleotide frequencies. Subsequently, the bin_refinement module (parameters: -c 50, -x 10) was employed to recover consolidated sets of bins. The multiple bin sets recovered from all eight individual assemblies and one co-assembly were de-replicated using dRep v3.4.2 with a 95% ANI threshold to remove redundant bins and retain only the highest quality ones³⁹. Default parameters were used for dRep, except for -comp 50. The final non-redundant collection of MAGs, showing medium- to high-quality (completeness ≥50%; contamination ≤10%), was retrieved after a quality evaluation using CheckM2 v1.0.1⁴³, according to the proposed definition of MIMAG²⁷. CheckM2 estimates the completeness and contamination of microbial genomes using machine learning models. Additional quality control measures were enforced to ensure the recruitment of only high-quality MAGs. Specifically, we selected MAGs with a quality score ≥50, calculated by deducting five times the contamination from the completeness⁴⁴. In addition, ribosomal RNA genes and transfer RNA genes were detected using Barrnap v0.9 (https://github.com/tseemann/barrnap) and tRNAscan-SE v2.0.9⁴⁵, respectively.

Of the approximately 950 bins initially reconstructed, a total of 444 passed the imposed quality control criteria and were therefore considered as MAGs (see Quality Metrics File). These MAGs were named using the following scheme: the characters preceding the term ‘bin’ represent the assembly from which they were binned (‘1’ to ‘8’ for individual assemblies and ‘Co’ for co-assembly), and the numerical value following the term ‘bin’ corresponds to the number of non-redundant MAGs within each assembly. A comprehensive overview of various statistics, including completeness, contamination, genome size, GC content, positions of the ribosomal RNA genes, and the number of contigs of the recovered 444 MAGs, is provided in Quality Metrics File and summarized in Fig. 3. Finally, the MAGs were taxonomically assigned against the Genome Taxonomy Database (GTDB; release R207_v2) using the Genome Taxonomy Database toolkit (GTDB-Tk) v2.2.4 (options: --full_tree, --skip_ani_screen)⁴⁶. The entire bioinformatics roadmap used for the reconstruction and taxonomic classification of MAGs is illustrated in Fig. 1.

Data Records

The shotgun metagenome reads generated in this study are publicly available on the NCBI Sequence Read Archive (SRA) under BioProject identifier PRJNA967453⁴⁷ and accession number SRP436034⁴⁸. The reconstructed MAGs have been deposited in the DDBJ/ENA/GenBank database under accession numbers JAUHVK000000000–JAUIML000000000, and their fasta files have been made accessible through figshare⁴⁹. Detailed information pertaining to all the reconstructed MAGs, including their corresponding BioSample and GenBank accession numbers, is provided in Quality Metrics File⁴⁹.

Technical Validation

The removal of contaminant bases, adapter sequences, and short-length reads was performed using BBDuk. The final read sets were then visualized using MultiQC. The retained reads had Phred quality scores >30 in the MultiQC report, indicating that the majority of analysed metagenome reads were of high quality. In adherence to the MIMAG guidelines, the quality of recovered MAGs was assessed using CheckM2 for their completeness and contamination. We only selected those MAGs that met the specified quality thresholds (as presented in Quality Metrics File). As an additional measure of quality, we identified the presence of tRNA and rRNA genes in all MAGs using tRNAscan-SE and Barrnap, respectively.

Code availability

All software used, with versions and non-default parameters, is described and referenced in the Methods section to ensure easy access and reproducibility. For further transparency, the complete code employed throughout the bioinformatics workflow has been uploaded to a GitHub repository at https://github.com/Meora-Rajeev/Biofloc-Metagenomics⁵⁰.

References

1. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
2. Sharon, I. & Banfield, J. F. Genomes from metagenomics. Science 342, 1057–1058 (2013).
3. Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
4. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
5. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
6. Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
7. Wilkins, L. G., Ettinger, C. L., Jospin, G. & Eisen, J. A. Metagenome-assembled genomes provide new insight into the microbial diversity of two thermal pools in Kamchatka, Russia. Sci. Rep. 9, 3059 (2019).
8. Chen, C. et al. Expanded catalog of microbial genes and metagenome-assembled genomes from the pig gut microbiome. Nat. Commun. 12, 1106 (2021).
9. Xu, B. et al. A holistic genome dataset of bacteria, archaea and viruses of the Pearl River estuary. Sci. Data 9, 49 (2022).
10. Nathani, N. M. et al. 309 metagenome assembled microbial genomes from deep sediment samples in the Gulfs of Kathiawar Peninsula. Sci. Data 8, 194 (2021).
11. Ye, L., Mei, R., Liu, W.-T., Ren, H. & Zhang, X.-X. Machine learning-aided analyses of thousands of draft genomes reveal specific features of activated sludge processes. Microbiome 8, 1–13 (2020).
12. Singleton, C. M. et al. Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing. Nat. Commun. 12, 2009 (2021).
13. Weigel, B. L., Miranda, K. K., Fogarty, E. C., Watson, A. R. & Pfister, C. A. Functional insights into the kelp microbiome from metagenome-assembled genomes. mSystems 7, e0142221 (2022).
14. Wei, G. et al. Prokaryotic communities vary with floc size in a biofloc-technology based aquaculture system. Aquaculture 529, 735632 (2020).
15. Crab, R., Defoirdt, T., Bossier, P. & Verstraete, W. Biofloc technology in aquaculture: beneficial effects and future challenges. Aquaculture 356, 351–356 (2012).
16. Guo, H. et al. Effects of carbon/nitrogen ratio on growth, intestinal microbiota and metabolome of shrimp (Litopenaeus vannamei). Front. Microbiol. 11, 652 (2020).
17. Robles‐Porchas, G. R. et al. The nitrification process for nitrogen removal in biofloc system aquaculture. Rev. Aquac. 12, 2228–2249 (2020).
18. Abakari, G., Luo, G. & Kombat, E. O. Dynamics of nitrogenous compounds and their control in biofloc technology (BFT) systems: A review. Aquac. Fish. 6, 441–447 (2021).
19. Kumar, V., Roy, S., Behera, B. K. & Das, B. K. Biofloc microbiome with bioremediation and health benefits. Front. Microbiol. 12, 3499 (2021).
20. Cardona, E. et al. Bacterial community characterization of water and intestine of the shrimp Litopenaeus stylirostris in a biofloc system. BMC Microbiol. 16, 1–9 (2016).
21. Deng, M. et al. The effect of different carbon sources on water quality, microbial community and structure of biofloc systems. Aquaculture 482, 103–110 (2018).
22. Huang, L. et al. The bacteria from large-sized bioflocs are more associated with the shrimp gut microbiota in culture system. Aquaculture 523, 735159 (2020).
23. Poretsky, R., Rodriguez-R, L. M., Luo, C., Tsementzi, D. & Konstantinidis, K. T. Strengths and limitations of 16S rRNA gene amplicon sequencing in revealing temporal microbial community dynamics. PLoS One 9, e93827 (2014).
24. Durazzi, F. et al. Comparison between 16S rRNA and shotgun sequencing data for the taxonomic characterization of the gut microbiota. Sci. Rep. 11, 3030 (2021).
25. Martínez‐Porchas, M. & Vargas‐Albores, F. Microbial metagenomics in aquaculture: a potential tool for a deeper insight into the activity. Rev. Aquac. 9, 42–56 (2017).
26. Meenakshisundaram, M., Sugantham, F., Muthukumar, C. & Chandrasekar, M. S. Metagenomic characterization of biofloc in the grow‐out culture of Genetically Improved Farmed Tilapia (GIFT). Aquac. Res. 52, 4249–4262 (2021).
27. Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
28. Baptista, R. P. et al. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231. Microb. Genom. 4 (2018).
29. Chen, X. et al. Metagenomic analysis of bacterial communities and antibiotic resistance genes in Penaeus monodon biofloc-based aquaculture environments. Front. Mar. Sci. 8, 762345 (2022).
30. Kim, S. K. et al. Exploring bacterioplankton communities and their temporal dynamics in the rearing water of a biofloc-based shrimp (Litopenaeus vannamei) aquaculture system. Front. Microbiol. 13, 995699 (2022).
31. Rajeev, M., Jung, I., Song, J., Kang, I. & Cho, J. C. Comparative microbiota characterization unveiled a contrasting pattern of floc-associated versus free-living bacterial communities in biofloc aquaculture. Aquaculture 577, 739946 (2023).
32. Moon, K., Kim, S., Kang, I. & Cho, J. C. Viral metagenomes of Lake Soyang, the largest freshwater lake in South Korea. Sci. Data 7, 349 (2020).
33. Nho, S. W. et al. Taxonomic and functional metagenomic profile of sediment from a commercial catfish pond in Mississippi. Front. Microbiol. 9, 2855 (2018).
34. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
35. Bushnell, B. BBTools software package. http://sourceforge.net/projects/bbmap, 578–579 (2014).
36. Lu, J. et al. Metagenome analysis using the Kraken software suite. Nat. Protoc. 17, 2815–2839 (2022).
37. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
38. Delgado, L. F. & Andersson, A. F. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 10, 1–11 (2022).
39. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
40. Saheb Kashaf, S., Almeida, A., Segre, J. A. & Finn, R. D. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 16, 2520–2541 (2021).
41. Mikheenko, A., Saveliev, V. & Gurevich, A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32, 1088–1090 (2016).
42. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 1–13 (2018).
43. Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 1–10 (2023).
44. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
45. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
46. Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2020).
47. Rajeev, M. et al. Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system. BioProject https://identifiers.org/ncbi/bioproject:PRJNA967453 (2023).
48. Rajeev, M. et al. Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system. Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP436034 (2023).
49. Rajeev, M. et al. Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system. Figshare https://doi.org/10.6084/m9.figshare.23599461 (2023).
50. Rajeev, M. et al. Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system. GitHub https://github.com/Meora-Rajeev/Biofloc-Metagenomics (2023).

Acknowledgements

This research was supported by High Seas Bioresources Program of Korea Institute of Marine Science & Technology Promotion (KIMST) funded by the Ministry of Oceans and Fisheries (KIMST-20210646) and the Mid-Career Research Program (NRF-2022R1A2C3008502) through the National Research Foundation (NRF) funded by the Ministry of Sciences and Information and Communications Technology, Korea.

Author information

Authors and Affiliations

Department of Biological Sciences and Bioengineering, Inha University, Inharo 100, Incheon 22212, Republic of Korea
Meora Rajeev, Ilsuk Jung & Jang-Cheon Cho
Institute for Specialized Teaching and Research, Inha University, Inharo 100, Incheon 22212, Republic of Korea
Meora Rajeev
Center for Molecular and Cell Biology, Inha University, Inharo 100, Incheon 22212, Republic of Korea
Yeonjung Lim, Suhyun Kim, Ilnam Kang & Jang-Cheon Cho

Contributions

M.R. and J.-C.C. conceptualized and designed the study. I.J., Y.L. and S.K. collected the samples, analysed physicochemical parameters, and extracted DNA for Illumina sequencing. M.R., with the assistance of Y.L., S.K. and I.K., performed bioinformatics analyses and constructed figures. M.R., I.K. and J.-C.C. wrote the manuscript. J.-C.C. supervised the study. All authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Jang-Cheon Cho.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rajeev, M., Jung, I., Lim, Y. et al. Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system. Sci Data 10, 707 (2023). https://doi.org/10.1038/s41597-023-02622-0

Received: 13 July 2023
Accepted: 06 October 2023
Published: 17 October 2023
DOI: https://doi.org/10.1038/s41597-023-02622-0