Introduction

Tilapia lake virus disease (TiLVD) is a highly contagious viral disease that affects tilapia, an important and affordable source of fish protein produced in aquaculture globally. Its causative agent, tilapia lake virus (TiLV), has spread rapidly and has now been reported in over 18 countries1,2,3,4, posing a significant threat to tilapia production and to the livelihoods of farmers who rely on tilapia farming for income and food security5. To mitigate the introduction and spread of TiLV and its impacts, it is imperative to implement surveillance, improved biosecurity measures and farming practices, and the continuous development of effective diagnostic and rapid sequencing methods.

The TiLV genome consists of 10 segments, which complicates genome sequencing and has so far precluded large-scale sequencing efforts. The first TiLV genome was sequenced using a shotgun transcriptome approach on an Illumina sequencing platform6. TiLV genomes were also sequenced using the Sanger sequencing technique7,8. Recently, a similar approach using shotgun metagenomics was used to generate the near-complete genome of a TiLV isolate causing mass mortality in tilapia farmed in Bangladesh9. Shotgun metagenomics involved random sequencing of all RNA fragments in the sample (TiLV-positive tilapia liver, without any mRNA enrichment), followed by bioinformatics analysis to identify and assemble the 10 segments of the TiLV genome present in the sample. This approach is not scalable because the RNA library preparation cost is high and most of the sequencing data belong to the host, so high sequencing depth is required to successfully assemble the TiLV-derived contigs. To address these challenges and improve sequencing effectiveness, several approaches have been explored. One such approach involves the propagation of viruses in cell culture prior to sequencing10. Additionally, enrichment through single RT-PCR amplification of the 10 TiLV segments before subjecting them to Illumina sequencing has also been employed3. Nevertheless, the use of Illumina technology necessitates a significant investment in infrastructure, which hinders rapid on-site deployment and real-time sequencing.

In recent years, Oxford Nanopore Technologies (ONT) platforms have become more commonly used to sequence part(s) or whole genomes of pathogens affecting aquatic animals11. Nanopore sequencing offers several advantages, including high throughput, portability, cost-effectiveness, and real-time sequencing, which can greatly facilitate the detection and sequencing of viral genomes in remote locations12. Rapid amplicon-based reverse transcription polymerase chain reaction (RT-PCR) assays coupled with Nanopore technologies can provide a sensitive and specific means of detecting and genotyping TiLV based on segment 1 in field samples. This approach allows for fine epidemiological surveillance and timely management and control of outbreaks13. While Nanopore has been used to sequence the genomes of non-segmented fish viruses such as infectious spleen and kidney necrosis virus (ISKNV)12 and salmonid alphavirus (SAV), as well as segments 5 and 6 of infectious salmon anaemia virus (ISAV)11, it has not yet been applied to the whole TiLV genome.

In this study, we designed, for the first time, a method for TiLV whole genome sequencing using singleplex and multiplex amplicon-based RT-PCR protocols coupled with MinION Nanopore sequencing. These novel tools enable real-time diagnosis and characterization of TiLV genomes, thereby facilitating improved surveillance and effective control measures in tilapia aquaculture.

Methods

Ethics declarations

No animal ethical approval was required for this study as it involved the utilization of archived preserved fish tissue samples. These tissues were obtained from experimentally infected fish and constituted our archived samples from a previous study conducted by our research group. All methods were performed in accordance with the relevant guidelines and regulations. The animal ethics committee of the Faculty of Science at Mahidol University had approved all the experimental protocols of that previous study (ethics approval #MUSC62-017-481). For further details, please refer to the corresponding publication available at Dong et al. 202014. Additionally, the tissues obtained from naturally infected fish used in this study were previously provided to our laboratory, as described in our publication by Taengphu et al. 202215.

Primer design

Primer sequences targeting all 10 complete genome segments of TiLV were manually designed with reference to the TiLV genome of the Israel strain Til-4-2011 (GenBank accession no. KU751814 to KU751823)6 (Table 1). The primer sequences were carefully selected at the outermost regions of the 5′ and 3′ terminal ends, enabling the amplification of full-length genomic segments, thereby ensuring maximal preservation of genetic information. In addition to our primary objective of achieving full-length amplification, we followed general criteria for PCR primer design as outlined by ThermoFisher Scientific (2019)16. In this study, these criteria comprised a primer length of 22–25 nucleotides and a similar melting temperature (Tm) and %GC content within each primer pair.
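The primer criteria described above (length of 22–25 nucleotides, similar Tm and %GC within a pair) can be illustrated with a minimal Python sketch. The Tm estimate uses a basic GC-count approximation, and the tolerance values (`max_tm_diff`, `max_gc_diff`) and the two primer sequences in the usage note are hypothetical placeholders; none of them are taken from Table 1 or from the authors' design procedure.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def estimate_tm(primer: str) -> float:
    """Rough Tm (deg C) from the basic GC-count formula for oligos > 13 nt.

    Assumption: this approximation is used only for illustration; it is not
    the Tm calculation used in the study.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def check_primer_pair(forward: str, reverse: str,
                      min_len: int = 22, max_len: int = 25,
                      max_tm_diff: float = 5.0,    # hypothetical tolerance
                      max_gc_diff: float = 10.0):  # hypothetical tolerance
    """Report whether a primer pair meets the length and similarity criteria."""
    tm_diff = abs(estimate_tm(forward) - estimate_tm(reverse))
    gc_diff = abs(gc_percent(forward) - gc_percent(reverse))
    return {
        "length_ok": all(min_len <= len(p) <= max_len for p in (forward, reverse)),
        "tm_diff": tm_diff,
        "tm_ok": tm_diff <= max_tm_diff,
        "gc_diff": gc_diff,
        "gc_ok": gc_diff <= max_gc_diff,
    }
```

For example, `check_primer_pair("ACGTTGCATGCAAGTCCATGGA", "TGCAAGGTCCATTGCAGTACCA")` (two made-up 22-mers) returns a dictionary in which the three `*_ok` flags can be inspected before a pair is taken forward to gradient RT-PCR.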

Table 1 TiLV primers designed in this study were based on the TiLV genome isolate of the Israeli strain Til-4-2011 (GenBank accession no. KU751814 to KU751823).

Samples, total RNA extraction, and TiLV quantification

RNA templates (N = 10) for the amplification and analysis of the TiLV genome sequence were prepared from various sources, including tissues of TiLV-infected Nile tilapia (Oreochromis niloticus) and red tilapia (Oreochromis spp.), TiLV isolates propagated in E-11 cell culture, and a concentrated water sample from a tilapia rearing river (Table 2). RNA from fish tissues (liver, kidney, spleen, and/or brain) was extracted using Trizol reagent (Invitrogen), while RNA from TiLV-infected E-11 cell culture was isolated using MagTec™ ViroNA Nano-magnetic beads for DNA/RNA virus isolation (Bioentist), following the manufacturer’s instructions. The virus in the river water sample was concentrated using the iron flocculation method and then filtered through a 0.4-μm pore size filter using a vacuum pump15. The filters that trapped the flocculate were subsequently used for nucleic acid extraction with the Patho Gene-spin DNA/RNA extraction kit (iNtRON Biotechnology). The obtained nucleic acids were quantified using spectrophotometry, measuring absorbance at OD260 nm and OD280 nm. TiLV quantification of the 10 samples by probe-based qPCR assays was performed based on segment 915 and segment 1 (this study) (Supplemental Table 1).

Table 2 Background information of TiLV strains reported in this study as well as strains with publicly available genomes.

Development of singleplex one-step RT-PCR for the enrichment of TiLV genome

The efficiency of the designed TiLV primers and their optimal annealing temperatures (Ta) were investigated by one-step gradient RT-PCR assays with Ta ranging from 50 to 60 °C. Singleplex RT-PCR (sPCR) reaction mixtures of 25 µL were prepared and subjected to amplification as outlined in Supplemental Table 1. Amplified sPCR products were analyzed by agarose gel electrophoresis. Four RNA templates were used in this assay (Table 2).

Development of multiplex RT-PCR to streamline the PCR enrichment of TiLV genome

Two multiplex PCR (mPCR) reactions were developed to reduce the number of PCR reactions from 10 to only two per sample. The primers were divided into two sets based on the similarity of their annealing temperatures, and we also considered the separability of product sizes on agarose gel electrophoresis. Initial trials with different primer mixes and amplification conditions were conducted (data not shown). Based on the results, reaction 1 employs primers for segments 1, 2, 3, 4, 5 and 8 with Ta at 52 °C, while reaction 2 uses primers for segments 6, 7, 9 and 10 with Ta at 60 °C. Detailed reaction mixtures and amplification conditions are shown in Supplemental Table 1. Various PCR conditions were then tested by varying the dNTPs (200–500 nM), MgSO4 (1.6–1.8 mM), enzyme (1–2.5 μl) and primer concentrations (100–300 nM) to obtain optimal PCR outcomes. Nine RNA templates were used in this assay (Table 2).
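The two-pool layout described above can be written down as a small configuration and checked programmatically, which may help when pools are later split or rebalanced. The sketch below is illustrative only: the dictionary keys and the `validate_pools` helper are hypothetical names, while the segment assignments and annealing temperatures are those stated in the text.

```python
# Pool composition and annealing temperatures as stated in the text.
MULTIPLEX_POOLS = {
    "reaction_1": {"segments": [1, 2, 3, 4, 5, 8], "annealing_c": 52},
    "reaction_2": {"segments": [6, 7, 9, 10], "annealing_c": 60},
}


def validate_pools(pools: dict, n_segments: int = 10) -> dict:
    """Check that every genome segment is assigned to exactly one pool."""
    assigned = [seg for pool in pools.values() for seg in pool["segments"]]
    duplicates = sorted({seg for seg in assigned if assigned.count(seg) > 1})
    missing = sorted(set(range(1, n_segments + 1)) - set(assigned))
    return {"duplicates": duplicates, "missing": missing}
```

Calling `validate_pools(MULTIPLEX_POOLS)` yields empty `duplicates` and `missing` lists for the layout above; a non-empty list would flag a segment that is amplified twice or not at all.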

Nanopore sequencing

PCR products from the singleplex (15 µl from each of the 10 PCR reactions) and the multiplex (40 µl from each of the 2 PCR reactions) assays were pooled, followed by PCR clean-up using a NucleoSpin Gel and PCR Clean-up column (Macherey–Nagel) and quantification with the Qubit dsDNA Broad Range kit (Invitrogen). Approximately 250 ng of the purified and pooled amplicons was used as the template for library preparation using the native barcoding expansion 1–12 kit (EXP-NBD104) according to the manufacturer’s instructions. The prepared library was loaded onto an R9.4.1 Flongle and sequenced for 24 h. Basecalling of the fast5 raw signals used Guppy v4.4.1 in super accuracy mode to generate the fastq sequences for subsequent bioinformatics analysis.

Reference-based genome assembly of TiLV samples

Raw reads were quality- and length-filtered using NanoFilt (qscore > 9 and length > 250 bp). The raw and filtered read statistics were generated using seqkit v.2.1.0. Reference-based genome assembly of TiLV was performed according to the ARTIC pipeline (https://github.com/artic-network/fieldbioinformatics)17. This pipeline is open-source software that integrates a series of tools for base-calling, quality control, read trimming, reference-based mapping, variant calling, consensus sequence generation, and annotation. Briefly, the filtered reads were aligned to the reference TiLV genome using Minimap2 v2.1718, followed by variant calling using Medaka (r941_min_sup_g507) (https://github.com/nanoporetech/medaka). The variants identified were subsequently filtered based on several criteria, including the quality score, depth of coverage, strand bias, and frequency of occurrence. In addition, genomic regions with a read depth lower than 20× were masked prior to generating the final consensus sequence for each sample. Each assembled viral segment from each sample was analyzed with QUAST v519 to calculate the percentage of the assembled viral genome that is represented by gaps (Ns), providing insights into the PCR and pooling efficiency.
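The filtering, masking, and gap-percentage steps described above can be sketched in a few lines of Python. This is a simplified illustration rather than the NanoFilt, ARTIC, or QUAST implementation: it assumes Phred+33 quality strings, assumes the mean read quality is derived from averaged error probabilities, applies the thresholds as strict inequalities as written in the text, and takes per-base depth as a plain list of integers. All function names are hypothetical.

```python
import math


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean read quality computed from averaged per-base error probabilities."""
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(sequence: str, quality: str,
                  min_q: float = 9.0, min_len: int = 250) -> bool:
    """Keep reads with mean qscore > 9 and length > 250 bp."""
    return len(sequence) > min_len and mean_qscore(quality) > min_q


def mask_low_depth(consensus: str, depth: list, min_depth: int = 20) -> str:
    """Replace consensus bases covered by fewer than min_depth reads with N."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(base if d >= min_depth else "N"
                   for base, d in zip(consensus, depth))


def gap_percent(consensus: str) -> float:
    """Percentage of a consensus segment represented by N (masked) bases."""
    return 100.0 * consensus.upper().count("N") / len(consensus)
```

As a toy usage example, `mask_low_depth("ACGT", [25, 19, 20, 5])` masks the second and fourth positions, and `gap_percent` on the result reports the fraction of the segment lost to low depth.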

Phylogenetic analysis

The assembled viral segments with less than 20% gaps were selected and combined with publicly available TiLV genomes for phylogenetic analysis (Table 2). The DNA sequences of the viral genome segments from each sample were extracted and grouped based on their segment number, followed by alignment with MAFFT v8 (--adjustdirection --maxiterate 1000 --localpair)20. All 10 individual alignments were subsequently concatenated and used to reconstruct a maximum likelihood tree using FastTree 221. The resulting tree was visualized and annotated using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
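The concatenation step can be illustrated with a short Python sketch. It assumes each per-segment alignment is available as a dictionary mapping a sample name to its aligned sequence, and it pads a sample that lacks a segment with gap characters; that padding choice and the function name are assumptions made for illustration and may differ from how the authors handled excluded segments.

```python
def concatenate_alignments(alignments: list) -> dict:
    """Join per-segment alignments (sample -> aligned sequence) end to end.

    Samples missing from a segment alignment are padded with '-' so that
    every concatenated sequence has the same total length.
    """
    samples = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in samples}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = widths.pop()
        for name in samples:
            parts[name].append(aln.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

For instance, concatenating `{"s1": "ACG-", "s2": "ACGT"}` with `{"s1": "TT"}` gives two sequences of six columns each, with `s2` padded by two gaps for the segment it lacks.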

Results

Ten primer pairs for the recovery of complete TiLV genome from various isolation sources

A total of 10 primer pairs were designed and their PCR conditions optimized (Table 1 and Supplemental Table 1), each to amplify one complete TiLV genomic segment. Intact and specific bands corresponding to the respective sizes of the TiLV genomic segments were successfully obtained when total RNA extracted from TiLV-infected tilapia, the TiLV-infected E-11 cell line, and the concentrated river water sample was used as the template for RT-PCR (Fig. 1). However, the PCR band intensity for segment 4 (1250 bp) of the water sample was substantially lower than that of the other segments, requiring another round of PCR (Table 3, Supplemental Table 1).

Figure 1

Gel electrophoresis (original photo) results of one-step RT-PCR amplification of 10 genomic segments of TiLV. Representative results from sample D1-2 are shown. A 1% agarose gel was used to visualize the PCR products, with expected band sizes indicated at the bottom of the gel. M, DNA marker (New England Biolabs). The original unlabeled image can be found in the supplemental materials.

Table 3 PCR outcome, sequencing, and alignment statistics of 10 individual TiLV segments for each sample used in this study.

A streamlined two-tube multiplex RT-PCR for TiLV genome amplification

To minimize the risk of human error associated with handling multiple singleplex PCR reactions (10 per template), and to reduce chemical costs, a two-tube multiplex RT-PCR was designed (Supplemental Table 1). The addition of MgSO4 (increased magnesium ion concentration) was crucial for improved sensitivity (stronger band intensity), while an increase in dNTP concentration did not improve PCR efficiency (Supplemental Fig. 1). In addition, increasing the amount of RT/Taq enzyme mix slightly improved overall band intensity (Supplemental Fig. 1). As a result, the concentrations of MgSO4 and RT/Taq enzyme mix were increased in further multiplex RT-PCR assays (Supplemental Fig. 2). After applying the final mPCR conditions to the 10 RNA templates, it was not surprising to observe that samples with high TiLV loads (as determined by qPCR assays) (Table 2) produced the expected six bands and four bands in mPCR reactions 1 and 2, respectively (Fig. 2, Table 3). These samples included A1-3, B1-1, B1-2, D1-2, FM2, and the cell line. In contrast, samples NK and Ri, which had lower TiLV loads, showed some missing amplicons, while sample A1-2 exhibited no observable bands in either multiplex RT-PCR reaction (Fig. 2, Table 3). It is important to note that mPCR reaction 1 is less sensitive than mPCR reaction 2 (Fig. 2) and inconsistently produces observable bands when the Cq value of the tested sample exceeds 19.

Figure 2

Amplification results of multiplex PCR (mPCR) for TiLV segments. Two separate reactions (Reaction#1 and Reaction#2) were used to amplify 10 TiLV segments. Reaction#1 amplified segments 1, 2, 3, 4, 5, and 8 (A), while Reaction#2 amplified segments 6, 7, 9, and 10 (B). A 2-log DNA marker (New England Biolabs) was used to visualize the PCR products. −ve, no template control. Codes of samples are listed in Table 2. The white dashed lines indicate where two images were joined, since one lane (Lane X) was excluded from the original gel photo (Supplemental Fig. 3). The original unlabeled image can be found in the supplemental materials.

Rapid and on-site sequencing of the TiLV genome using Oxford Nanopore

A total of 413,379 demultiplexed raw reads with a cumulative length of 238,983,973 bp were generated from the Flongle sequencing runs (Supplemental Table 2). After filtering (qscore > 9, length > 250 bp), only 194,564 reads (147,415,018 bp) remained. On average, more than 35% data reduction was observed across the samples, with sample A1-2 showing the largest reduction (67.7%, from 28 to 9.2 Mb) in the amount of usable data. Overall, the ARTIC-based reference genome assembly successfully assembled viral segments 6, 7, 8, 9, and 10 for all samples with 99–100% completeness, except for samples A1-2_m and Nk_m, which showed a slightly lower completeness of 98% for segments 6 and 7 (Table 3). In contrast, several segments from the first set of multiplex RT-PCR showed reduced completeness (a high percentage of gaps in the sequence), particularly for samples with high Cq. The reduced completeness is a direct result of the low read depth (< 20×) observed for the viral segments in the respective samples. Generally, any viral segment with a read depth of more than 50× will produce a highly complete assembly that can be used in subsequent analysis (Table 3).
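The aggregate effect of filtering can be worked out directly from the run totals reported above. The short sketch below computes the overall reduction in read count and in base count; note that these pooled figures are not the same quantity as the per-sample average quoted in the text, and the helper name is hypothetical.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage of data removed by filtering."""
    return 100.0 * (before - after) / before


# Run-level totals as reported in the text (raw vs. filtered).
read_reduction = percent_reduction(413_379, 194_564)
base_reduction = percent_reduction(238_983_973, 147_415_018)

print(f"reads removed: {read_reduction:.1f}%")
print(f"bases removed: {base_reduction:.1f}%")
```

The proportion of reads removed is larger than the proportion of bases removed, which is what one would expect if many of the discarded reads were short.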

High phylogenetic diversity among Thai TiLV strains

The total alignment length after concatenation of the 10 individually aligned TiLV segments is 10,396 bp. Using a midpoint rooting approach, multiple clades with high SH-like support values were observed in the maximum likelihood tree (Fig. 3). Nanopore-sequenced samples from either the pooled singleplex (N = 4 templates) or multiplex amplicons (N = 9 templates) were always placed in the same cluster, consistent with their identical sample origin (Table 2). This observation suggests that accurate genome sequences can be obtained using either singleplex or multiplex amplicon enrichment methods. TiLV strains from Peru, Ecuador, Israel, and India clustered together, and this subclade formed a sister group, with slightly lower support, with two earlier TiLV strains from Thailand isolated in 2013 and 2014 to form Clade A (Fig. 3). Clade B, consisting entirely of Thai TiLV strains from 2015 to 2016, formed a sister group with Clade A. However, a majority of Thai TiLV strains reported from 2018 onwards showed yet another distinct clustering, as indicated by their phylogenetic placement in Clades C and E, with most of the sequences reported in this study belonging to subclade E1. On the other hand, subclade E2 consists of a mixture of Thai and USA TiLV strains. The currently sampled Bangladeshi TiLV strains form only a single clade despite being isolated 2 years apart (2017 and 2019), while the Vietnamese strains are highly divergent even between themselves (pairwise nucleotide similarity of only 92%), possibly representing novel strains of TiLV.

Figure 3

Maximum likelihood tree showing the evolutionary relationships of TiLV strains analyzed in this study. Thirteen samples (10 unique strains) with the ONT-TH prefix and publicly available genomes were used. The blue tip labels indicate the TiLV strains reported in this study. Node values are SH-like local support values, and branch lengths indicate the number of substitutions per site. NT: Nile tilapia; RT: Red tilapia; HT: Hybrid tilapia.

Discussion

Nanopore sequencing technology is known for its capability to sequence a broad range of DNA inputs, including non-target bands. To ensure the specificity of our study, we employed an amplicon-based approach with specific primers. However, it is important to note that this approach does not completely eliminate the potential for sequencing non-target DNA. To address this issue, we aligned the generated reads in silico to the TiLV reference. This step allowed us to effectively filter out non-target reads, thereby enhancing the accuracy and specificity of our sequencing results.

In this study, we report the successful recovery of the complete TiLV genome using a novel approach that combines singleplex or multiplex PCR with Nanopore amplicon sequencing. The approach accommodates a range of biological resources, including fish tissues, water samples, and cell cultures, with appropriate sample preparation steps tailored to the nature of each sample type. Our findings indicate that mPCR is particularly effective for samples with high TiLV loads. Therefore, we recommend utilizing mPCR for heavily infected TiLV samples, while sPCR can be employed for lightly infected samples. Notably, since the maximum length of the viral segment is 1641 bp, our approach obviates the need for a PCR-tiling strategy typically used for recovering large non-segmented viruses such as ISKNV and SARS-CoV-212,22. Moreover, our method offers the added advantage of visualizing PCR efficiency and specificity for each viral segment on a gel, as each fragment has a different size.

Our current multiplex PCR appears to show lower efficiency for Multiplex reaction 1, particularly in samples with high Cq values. The variation in sequencing depth among different segments may be attributed to amplicon input and amplicon size and can result in reduced completeness and gaps in the consensus. It is evident that Multiplex reaction 1 predominantly amplifies the larger segments of the TiLV genome, specifically segments 1–5. Larger amplicon sizes can pose challenges in terms of PCR sensitivity, particularly when dealing with samples with lower viral loads. To improve the uniformity and efficiency of multiplex RT-PCR amplification, poorly performing primers can be improved by optimizing their concentrations or by adjusting their annealing temperature through changes in sequence length.

Nevertheless, it remains unknown whether individual genomic segments of TiLV exhibit variations in expression levels and timing, akin to the patterns observed in other viruses23,24. Additionally, the Multiplex set 1 reaction, which amplifies TiLV genomic segments 1–5 and 8, can be further split into two pools (e.g., 1A and 1B) that each amplify an average of three viral segments. In addition, the use of a more processive high-fidelity DNA polymerase such as Q5 from New England Biolabs, which has been used for high-degree multiplex tiling PCR of the SARS-CoV-2 and ISKNV viral genomes, is also worth exploring12,17. It is also worth noting that despite the absence of visible bands for some of the samples, partial or even near-complete genome assembly was still attainable using our sequencing pipeline. It is possible that the amount of PCR product was below the detection limit of the gel-staining dye at its loading concentration, although the product was in fact present in the samples, as revealed by the sequence data. To streamline future work in high-throughput sequencing of TiLV using this approach, gel visualization may be skipped once a lab can consistently reproduce the PCR outcome with evidence from sequencing data.

Nanopore sequencing is an attractive approach for viral amplicon sequencing due to its portability, convenience, and speed13. Our method, which utilizes Nanopore sequencing, eliminates the need for additional fragmentation steps, allowing motor proteins to be directly ligated to amplicons for native sequencing. On the same day, tens of samples can be prepared and sequenced, and the low computing requirements of the ARTIC protocol enable swift genome assembly on a laptop computer, without requiring access to a dedicated server. To further streamline TiLV genome sequencing on the Nanopore platform, we suggest designing multiplex primers that incorporate a partial adapter suitable for Nanopore sequencing25. This enables cost-effective PCR-based barcoding that is both efficient and scalable. In cases of low data output, samples can be re-pooled and sequenced on a separate flow cell to achieve the necessary sequencing depth for genome assembly.

By utilizing R9.4.1 sequencing chemistry with super accuracy mode and implementing the ARTIC pipeline, we successfully recovered TiLV genomes that are highly suitable for phylogenetic inference. Our study revealed the presence of TiLV in both fish and environmental water samples from the same farm, which clustered together in Clade E1. Our approach, combining the previously reported water sample TiLV concentration method15 with a singleplex RT-PCR amplicon-based Nanopore sequencing strategy, allowed for direct recovery of TiLV genomes from water samples. This innovative method has significant implications for non-lethal, environmental DNA/RNA monitoring, as it eliminates the need for sacrificing fish for genomic analysis. The potential applications of our innovative approach extend to TiLV detection and genome sequencing in fish samples across wider geographical areas. Additionally, for cell culture samples, our approach can serve both as a confirmatory diagnosis tool and allow us to investigate genetic stability of TiLV in relation to potential downstream events, such as viral virulence and genetic changes over time. To gain a more comprehensive understanding of TiLV genetic variations and dynamics, further research is essential, involving the collection of additional samples and more extensive comparative studies.

Our analysis suggests that, in addition to country of origin, the genetic background of the hosts may also contribute to the clustering patterns observed within Clade E. With few exceptions, our results indicate phylogenetic grouping of Thai TiLV strains by host (E1: red tilapia, E2: Nile tilapia), suggesting multiple introductions into the country or rapid viral evolution. The presence of the Thai isolates in multiple clusters indicates significant genetic diversity within the virus. RNA viruses are known for their high mutation rate, attributed to the absence of proofreading ability in RNA polymerases26, which allows them to undergo rapid evolutionary changes.

Furthermore, our findings based on the current genomic sampling contradict the hypothesis previously put forth on tilapia trade movement, which was based on a small genome-based phylogenetic tree with weakly supported clustering of Bangladeshi and Thai TiLV strains9. Specifically, we found no grouping of Thai strains within the Bangladesh clade (Fig. 3, Clade D), thereby reducing support for the previously proposed hypothesis.

Although viral whole genome sequencing of TiLV is now technically feasible, the current representation of its genome in public databases is limited, making it difficult to infer its evolutionary relationships. Given the significant impact of TiLV on the tilapia aquaculture industry, there is a critical need for more robust genomic surveillance to facilitate better management and tracking in relevant regions. Our method can be used in future studies to generate more representative genomes from Vietnam. Our proposed multiplex PCR Nanopore-based amplicon sequencing approach offers a promising solution, as it enables cost-effective and high-throughput sequencing of TiLV virus genomes. This strategy is poised to revolutionize the field of advanced diagnostics and surveillance of multiple pathogens concurrently from biological samples of animals as well as environmental DNA/RNA of pathogens in water, within a single assay. This strategy eliminates the need for separate reactions and reduces the overall cost and time required for sequencing multiple samples. We anticipate that our approach will provide a valuable resource for ongoing efforts to understand the molecular epidemiology and evolution of TiLV, with important implications for disease control and prevention (e.g., vaccination). The implementation of our method in resource-limited regions is expected to face several challenges, e.g., limited access to finance and skilled labor, procurement, and logistical difficulties. As a result, the primary focus should be on enhancing the capacity of local operators by fostering collaboration with local institutions. This collaborative effort will leverage both existing and new resources to kickstart pilot programs aimed at refining the implementation strategy within real-world settings.