Introduction

Genome sequencing of viruses has been used to study the spread of disease in outbreaks1. Real-time genomic surveillance is important in managing viral outbreaks, as it can provide insights into how viruses transmit, spread and evolve1,2,3,4. Such work depends on rapid sequencing of viral material directly from clinical samples—i.e., without the need to isolate the virus in pure culture. During the Ebola virus epidemic of 2013–2016, prospective viral genome sequencing was able to provide critical information on virus evolution and help inform epidemiological investigations3,4,5,6. Sequencing directly from clinical samples is faster, less laborious and more amenable to near-patient work than time-consuming culture-based methods. Metagenomics, the process of sequencing the total nucleic acid content in a sample (typically cDNA or DNA), has been successfully applied to both virus discovery and diagnostics7,8,9. Metagenomic approaches have seen rapid adoption over the past decade, fueled by relentless improvements in the yield of high-throughput sequencing instruments5,10,11,12. Whole-genome sequencing of Ebola virus directly from clinical samples without amplification was possible because of the extremely high virus copy numbers found in acute cases13,14,15. However, direct metagenomic sequencing from clinical samples poses challenges with regard to sensitivity: genome coverage may be low or absent when attempting to sequence viruses that are present at low abundance in a sample with high levels of host nucleic acid background.

Development of the protocol

During recent work on the Zika virus epidemic16, we found that it was difficult to generate whole-genome sequences directly from clinical samples using metagenomic approaches (Table 1). These samples had cycle threshold (Ct) values between 33.9 and 35.9 (equivalent to 10–48 genome copies per microliter). Before sequencing, these samples were depleted of human rRNA and prepared for metagenomic sequencing on the Illumina MiSeq platform as previously described2,17. In these cases, sequences from Zika virus comprised <0.01% of the data set, resulting in incomplete coverage. Greater coverage and depth are critical for accurate genome reconstruction and subsequent phylogenetic inference. In addition, there are substantial sequencing, analysis and storage costs associated with generating large sequencing data sets; therefore, metagenomic approaches currently do not lend themselves to the cost-effective use of lower-throughput portable sequencing devices such as the Oxford Nanopore MinION.

Table 1 Results of metagenomic sequencing of five Zika-positive clinical samples collected from Colombia in January 2016 (unpublished data, K.G.A. and N.D.G.).

To generate complete viral genome coverage from clinical samples in an economic manner, target enrichment is often required18. Enrichment can be achieved directly through isolation in culture or the use of oligonucleotide bait probes targeting the virus of interest, or indirectly via host nucleic acid depletion. Amplification may also be required to generate sufficient material for sequencing (>5 ng for typical Illumina protocols and 100–1,000 ng for MinION). PCR can provide both target enrichment and amplification in a single step, and is relatively cheap, available and fast as compared with other methods. To generate coding-sequence complete coverage, a tiling amplicon scheme is commonly used19,20,21. During our work with Ebola virus, we were able to reliably recover >95% of the genome by sequencing 11 long amplicons (1–2.5 kb in length) on the MinION5.

The likelihood of long fragments being present in the sample, however, decreases with lower virus abundance. Therefore, we anticipated that, for viruses such as Zika that are present at low abundance in clinical samples, we would be more likely to amplify shorter fragments. As an extreme example, a recent method termed 'jackhammering' was used to amplify degraded HIV-1 samples stored for >40 years; it used 200–300-nt amplicons to help maximize sequence recovery22. Using shorter amplicons necessitates a larger number of products to generate a tiling path across a target genome. Doing this in individual reactions requires a large number of manual pipetting steps and therefore increases the potential for mistakes, with a heightened risk of cross-contamination, as well as a greater cost in time and consumables. To solve these problems, we designed a multiplex assay to carry out tens of reactions in an individual tube. This method has been subsequently used to perform Zika sequencing in order to understand the spread of Zika virus in the Americas16,23,24,25,26. Our resulting step-by-step protocol, described here, allows any researcher to successfully amplify and sequence viruses of low abundance directly from clinical samples. The method also has other potential uses that are not demonstrated here. One potential application is multilocus sequence typing, which could be carried out by amplifying conserved genes from bacteria, fungi and yeasts. Antibiotic-resistance-determining genes or key virulence genes could be targeted in the same assay. The scheme could also be used to sequence chloroplast and mitochondrial genomes.

Comparison with other approaches

The three most common approaches for sequencing viruses are metagenomic sequencing, PCR amplicon sequencing and target enrichment sequencing, recently reviewed in detail by Houldcroft et al.27. The main benefits of the PCR-based approach described here are cost and sensitivity. In theory, both PCR and cell culture require only one viral copy, making them both exquisitely sensitive. In practice, however, the reaction conditions do not allow single-genome amplification, and, typically, multiple starting molecules are required. PCR also has limited sensitivity in cases in which the template sequence is divergent from the expected because of primer-binding kinetics. However, in an outbreak situation in which isolates are highly related, and low cost per sample and rapid turnaround time are required, PCR is particularly suitable. Sequencing amplicons on the Oxford Nanopore MinION is a popular method for determining viral genomes and has been used for diverse viruses, including Ebola, influenza and poxvirus, using either single primer pair reactions generating long amplicons (>1 kb) or multiple reactions that are pooled before sequencing5,28,29,30. However, these approaches are laborious to scale up when many small amplicons are required (because of low viral copy numbers), or when multiple samples are sequenced on a single sequencing run, as in this protocol.

The most similar alternative approach to the one described here is AmpliSeq (Life Technologies), which was previously used for Ebola sequencing on the Ion Torrent PGM6. However, this method is specific to the Ion Torrent platform, and primer schemes must be ordered directly from the manufacturer, so it may be more expensive per sample. Alternative software packages for designing primer schemes are available, some of which cater specifically to multiplex or tiling amplicon schemes20,21,31,32, and these may perform better when dealing with divergent genomes because of an increased emphasis on oligonucleotide degeneracy. Primers generated with such software may also be compatible with this protocol, although PCR conditions may require optimization, as the Primal Scheme software used in this protocol is designed with an emphasis on monitoring short-term evolution of known lineages, and primer conditions have been optimized for multiplex PCR amplification efficiency.

Propagation in cell culture is another method that has been widely used for virus enrichment33,34,35. This process is time-consuming, and requires specialist expertise and high containment laboratories for especially dangerous pathogens. There is also concern that viral passage can introduce mutations that are not present in the original clinical sample, potentially confounding analysis36,37.

Oligonucleotide bait probes have also shown promise as an alternative to metagenomics and amplicon sequencing38,39,40,41,42. These isolate viral nucleic acid sequences by hybridizing target-specific biotinylated probes to the DNA/RNA sample and then separating them using magnetic streptavidin-coated beads. Such methods, however, are limited by the efficiency of the capture step because of the kinetics of nucleic acid hybridization in complex samples such as those containing the human genome. The complete hybridization of all probes to targets can take hours (typical protocols suggest a 24-h incubation, although shorter times may be possible) and may never be achieved because of competitive binding by the host DNA. These methods suffer from a coverage bias, which worsens at lower viral abundances, resulting in increasingly incomplete genomes, as demonstrated by recent work on the Zika virus43. They work best on samples with higher viral abundances and may not have the sensitivity to generate near-complete genomes for the majority of isolates in an outbreak. Probes for hybridization capture are also more expensive than PCR primers because they are usually designed in a fully overlapping 75-nt scheme, which can run to hundreds of probes per virus and thousands for panels of viruses.

Direct sequencing of RNA has been recently demonstrated on the Oxford Nanopore MinION44,45. This method is attractive because it eliminates the need for reverse transcription, and may therefore reduce biases resulting from nonrandom priming and copying errors introduced by reverse transcriptase. However, this method currently requires 500 ng of RNA as starting material and would suffer from the same sensitivity issues associated with cDNA metagenomics approaches when applied to samples containing very low viral copy numbers.

Limitations of tiling amplicon sequencing

Our method is not suitable for the discovery of new viruses or for sequencing highly diverse or recombinant viruses because primer schemes are virus-genome-specific. This protocol has not been validated for discovery of intra-host nucleotide variants, and we expect that minor allele frequencies will not be reliably recovered when amplifying from very small amounts of starting virus, as shown by Metsky et al.25. We expect that this method will work for larger virus genomes, but we have not tested this protocol with viral genomes longer than 12 kb. The protocol is designed for infections resulting from single clones, and may not perform well with mixed infections of diverse viruses. We have not tested performance of the method in chronic infections in which large amounts of diversity may have evolved within a patient (for example, viral quasispecies during HIV infection). Amplicon sequencing is prone to coverage dropouts that may result in incomplete genome coverage, especially at lower abundances, and to the loss of 5′ and 3′ terminal sequences that lie outside the regions covered by primer pairs. Sequencing of complete 5′ and 3′ UTRs may require alternative techniques such as RACE46. Targeted methods are also highly sensitive to amplicon contamination from previous experiments. Extreme caution should be taken to keep pre-PCR areas, reagents and equipment free of contaminating amplicons.

Experimental design

Description of the protocol. We describe a fully integrated end-to-end protocol for rapid sequencing of viral genomes directly from clinical samples. The protocol proceeds in four stages: (i) multiplex primer pool design, (ii) multiplex PCR, (iii) sequencing on MinION or Illumina instruments and (iv) bioinformatic analysis and quality control (QC) (Fig. 1).

Figure 1: Workflow for tiling amplicon sequencing on MinION/Illumina platforms, with associated Procedure step numbers indicated.

Primer design. We developed a web-based primer design tool called Primal Scheme (http://primal.zibraproject.org), which provides a complete pipeline for the development of efficient multiplex primer schemes. Each scheme is a set of oligonucleotide primer pairs that generate overlapping products; the number of pairs is determined by the target genome length, the amplicon length and the overlap required, as discussed below. For Zika, we use 35 primer pairs, amplifying products of 400-nt length with a 100-nt overlap for the 11-kb viral genome. Together, the amplicons generated by the pairs span the target genome or region of interest (Fig. 2).

Figure 2: Overview of multiplex primer design using Primal Scheme online primer design tool.

(a) Submission box for online primer design tool. (b) Primer table of results. (c) Schematic showing expected amplicon products for each pool in genomic context for the ZikaAsian and ChikAsianECSA schemes.

As input, Primal Scheme requires a FASTA file containing one or more reference genomes. The user specifies a desired PCR amplicon length (default = 400 nt, suggested values between 200 and 2,000 nt) and the desired length of overlap between neighboring amplicons (default = 75 nt). Using a shorter amplicon length may be useful for samples in which longer products fail to amplify (e.g., when the virus nucleic acid is highly degraded). However, if amplicon lengths become too short (e.g., <300 nt), it may not be possible to find suitable primer pairs; reducing the overlap parameter may help with this.
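The relationship between genome length, amplicon length and overlap can be sketched with a short calculation. This is only a back-of-the-envelope estimate, assuming every amplicon has exactly the requested length and overlap; it is not how Primal Scheme itself places primers, and real schemes may contain a slightly different number of pairs because amplicon lengths vary and the genome termini are not fully covered. The function name `estimate_amplicon_count` is hypothetical.

```python
import math


def estimate_amplicon_count(genome_length: int,
                            amplicon_length: int = 400,
                            overlap: int = 75) -> int:
    """Estimate how many tiled amplicons are needed to span a genome.

    Assumes idealized tiling: each amplicon is exactly `amplicon_length`
    long and shares exactly `overlap` bases with its neighbor.
    Defaults mirror the Primal Scheme defaults quoted in the text.
    """
    if overlap >= amplicon_length:
        raise ValueError("overlap must be shorter than the amplicon length")
    if genome_length <= amplicon_length:
        return 1
    # Each additional amplicon advances the tiling path by this many bases.
    step = amplicon_length - overlap
    return math.ceil((genome_length - overlap) / step)


if __name__ == "__main__":
    # A 1,000-nt target tiled with 400-nt amplicons and 100-nt overlaps
    # needs three amplicons: 0-400, 300-700 and 600-1,000.
    print(estimate_amplicon_count(1000, amplicon_length=400, overlap=100))
```

Shortening the amplicons or lengthening the overlap reduces the step size, which is why shorter amplicons require more primer pairs per scheme.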

The Primal Scheme software performs the following processes:

Generation of candidate primers: The first sequence listed in the FASTA file should be the most representative genome, with further sequences spanning the expected interhost diversity. Primal Scheme uses the Primer3 software to generate candidate primer pairs (five, by default)47. It selects primers based on thermodynamic modeling, which takes into account length, annealing temperature, %GC, 3′ stability, estimated secondary structure and likelihood of primer–dimer formation, maximizing the chance of a successful PCR reaction. Primers are designed with a high annealing temperature within a narrow range (65–68 °C) that allows PCR to be performed as a 2-step protocol (95 °C denaturation, 65 °C combined annealing and extension) for highly specific amplification from clinical samples without the need for nested primers.

Testing of candidate primers: Subsequent reference genomes in the file are used to help choose primer pairs that maximize the likelihood of successful amplification of known virus diversity. A semi-global alignment score between each candidate primer and all supplied references is calculated to ensure that the most 'universal' candidate primers are picked for the scheme. Mismatches at the 3′ end are severely penalized, as they have a disproportionate effect on the likelihood of successful extension48,49. The alignment scores are summed, and the single best-scoring pair for each region is selected. If no candidates are returned by Primer3 for a region, most likely because all primers had insufficient annealing temperature, an error message prompting you to adjust the amplicon length or the overlap parameter will appear.

Output of primer pairs: Output files include a table of primer sequences to be ordered, a BED file of primer locations that can be used subsequently for primer trimming and a diagram of the primer scheme.
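The candidate-testing idea above can be illustrated with a simplified sketch. It is not the Primal Scheme implementation: instead of a semi-global alignment it slides each primer along each reference without gaps, it scores single forward-strand primers rather than pairs, and the scoring values (`three_prime_window`, `three_prime_penalty`) are assumptions chosen only to show how 3′ mismatches can be penalized more heavily before scores are summed across references.

```python
def primer_score(primer: str, reference: str,
                 three_prime_window: int = 3,
                 three_prime_penalty: int = 5) -> int:
    """Best ungapped match score of a primer against one reference.

    +1 per matching base, -1 per mismatch, and a larger penalty for
    mismatches within the last `three_prime_window` bases (the 3' end).
    """
    n = len(primer)
    if len(reference) < n:
        raise ValueError("reference is shorter than the primer")
    best = None
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        score = 0
        for i, (p, r) in enumerate(zip(primer, window)):
            if p == r:
                score += 1
            elif i >= n - three_prime_window:
                score -= three_prime_penalty  # mismatch near the 3' end
            else:
                score -= 1
        if best is None or score > best:
            best = score
    return best


def pick_best_candidate(candidates, references):
    """Return the candidate with the highest score summed over all references."""
    return max(candidates,
               key=lambda c: sum(primer_score(c, ref) for ref in references))


if __name__ == "__main__":
    refs = ["ACGTACGTAA", "ACGTACGTTA"]  # toy references differing at one site
    print(pick_best_candidate(["GTACGTAA", "ACGTACGT"], refs))
```

In this toy case the second candidate is preferred because the first one carries a mismatch close to its 3′ end in one of the references.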

Choice of amplicon length. The choice of amplicon length when designing primer pools for sequencing is important. There is an inverse relationship between amplicon length and the number of primer pairs. It is believed that increasing the number of primer pairs reduces the likelihood of successful amplification of each region, owing to interaction between primers18. It is plausible that as the number of primer pairs increases, competitive inhibition may decrease PCR efficiency, although the high annealing temperature used in this protocol should reduce this risk. Longer amplicons are preferred, as they mean fewer primer pairs are needed per reaction. They also increase the amount of linkage information that can be recovered as haplotypes, which is of importance for investigation of within-host diversity. On the Illumina platform, 600 bases is the maximum size of amplicon that can be obtained using this protocol without an additional fragmentation step (using 600 cycle kits in paired-end mode—i.e., paired 300 nucleotides without any overlap), although read accuracy may degrade during the last 50 cycles. On the Oxford Nanopore MinION, there is no limit to the maximum amplicon length that can be sequenced; the maximum length is effectively limited by the performance of the reverse transcription and PCR (practically to 5 kb). However, longer amplicons are less likely to amplify successfully when viral copy number is low or there is sample degradation (e.g., because of inadequate storage).

Optimization of primer schemes. The majority of primers are expected to work even when pooled in equimolar amounts, meaning largely complete genomes can be recovered without optimization. For example, the chikungunya virus data shown in Table 2 were generated without any optimization. However, to achieve coding-sequence-complete genomes, problem primers causing inefficient amplification of certain regions may need to be replaced or their concentrations adjusted relative to other primers in an iterative manner. Complete coverage of the genome covered by the scheme—i.e., all amplicons successfully amplified—should be achievable for the majority of samples using this protocol; however, coverage is still expected to correlate with viral abundance (Table 3).

Table 2 Results of MinION R9.4 2D sequencing after barcode demultiplexing for an isolate of Zika and for a clinical sample of chikungunya virus.
Table 3 Results of amplicon scheme sequencing of five Zika-positive clinical samples collected from Colombia in January 2016 using the ZikaAsian scheme on the Illumina MiSeq (unpublished data, contributed by K.G.A. and N.D.G.).

Multiplex PCR Protocol. Next, we developed a multiplex PCR protocol using novel reaction conditions: specifically low individual primer concentrations, high primer annealing temperatures (>65 °C) and long annealing times, which allows amplification of products covering the whole genome in two reactions (Fig. 3). In comparison with single-plex methods, this markedly reduces the cost of reagents and minimizes potential sources of laboratory error. We assign alternate target genome regions to one of two primer pools, so that neighboring amplicons do not overlap within the same pool (which would result in a short overlap product being generated preferentially). By screening reaction conditions based on the concentration of cleaned-up PCR products and specificity as determined by gel electrophoresis, we determined that lower primer concentrations and a longer annealing/extension time were optimal. Given the low cost of the assay, this step could also be performed alongside standard diagnostic quantitative PCR as a quality control measure to help reveal potential false positives50.
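The alternating pool assignment can be sketched as follows. This is a minimal illustration, assuming amplicons are described by hypothetical `(name, start, end)` tuples with half-open coordinates; the helper names are not taken from the published software.

```python
def assign_pools(amplicons):
    """Assign amplicons alternately to pool 1 and pool 2 along the genome.

    `amplicons` is a list of (name, start, end) tuples. Neighboring
    amplicons end up in different pools, so overlapping products are
    never amplified in the same tube.
    """
    ordered = sorted(amplicons, key=lambda a: a[1])
    pools = {1: [], 2: []}
    for index, amplicon in enumerate(ordered):
        pools[1 if index % 2 == 0 else 2].append(amplicon)
    return pools


def has_overlap_within_pool(pool):
    """Return True if any two consecutive amplicons in one pool overlap."""
    ordered = sorted(pool, key=lambda a: a[1])
    return any(prev[2] > cur[1] for prev, cur in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    scheme = [("a1", 0, 400), ("a2", 300, 700),
              ("a3", 600, 1000), ("a4", 900, 1300)]
    pools = assign_pools(scheme)
    for number, members in pools.items():
        print(number, [name for name, _, _ in members],
              has_overlap_within_pool(members))
```

The check only holds when the overlap is shorter than half the amplicon length; with larger overlaps, amplicons two positions apart would also collide and more than two pools would be needed.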

Figure 3: Overview of multiplex tiling PCR and pooling.

(a) Schematic showing the regions amplified in pools 1 (upper track) and 2 (lower track), and the intended overlap between pools (as determined in Step 1). (b) Products generated by PCR in Step 9 from pools 1 (left tube) and 2 (right tube) for the hypothetical scheme shown in a. (c) In Step 12A(ii), the input amount is normalized based on the number of samples and the scheme length; pool 1 and 2 products can be pooled at this stage (shown) or kept separate if you wish to barcode them individually. In Step 12A(iv), products for each sample are then barcoded by ligation of a unique barcode. In Step 12A(vi), all barcoded products are pooled together before sequencing adaptor ligation, yielding a sequenceable library.

Sequencing protocol optimizations. Optimized library preparation methods for both the MinION and Illumina MiSeq platforms are provided and should be readily adaptable to other sequencing platforms, if required. The MinION system is preferred when portability and ease of setup in harsh environments are important5. The Illumina platform is more suited to sequencing very large numbers of samples, because of greater sequence yields, and the ability to barcode and accurately demultiplex hundreds of samples. Both platforms use ligation-based methods to add the required sequencing adaptors and barcodes.

For the MinION, we used the native barcoding kit (Oxford Nanopore Technologies) to allow up to 12 samples to be sequenced per flow cell. As the manufacturer's protocol is designed for genomic DNA fragmented to 6–8 kb, we have adjusted the input mass to achieve an equivalent number of moles of DNA ends; this improves the efficiency of barcode/adaptor ligation and run yields. In the development of the protocol, we used R9 or R9.4 flow cells (FLO-MIN105/FLO-MIN106) and the 2D barcoded library preparation kit (EXP-NBD002/SQK-LSK208). The protocol is also compatible with the current 1D barcoded library preparation kit (EXP-NBD103/SQK-LSK108). Because of the regular revisions of the kits, we have avoided including any specific component names or volumes; be sure to follow the appropriate protocol for your chosen kit version. Depending on the number of reads required, the number of samples multiplexed and the performance of the flow cell, sequencing on the MinION can take from a few minutes up to 72 h. Typically, 2–4 h of sequencing is sufficient for 12 samples. For the MiSeq platform, we used the Agilent SureSelectXT2 adaptors and the KAPA Hyper library preparation kit, allowing up to 96 samples to be sequenced per MiSeq run. Other library prep kits (e.g., Illumina TruSeq) and dual-indexed adaptors could also be used on the MiSeq. For the MiSeq, we recommend using the 2 × 250-nt read length for 400-nt amplicons, which takes 48 h to complete.
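The reasoning behind adjusting input mass for equal moles of DNA ends can be shown with a small calculation. This is a sketch under stated assumptions: the 1,000-ng reference input is an illustrative value rather than a figure from the kit protocol, and the conversion uses the common approximation of 650 g/mol per base pair of double-stranded DNA. Always take the actual input amounts from the protocol for your kit version.

```python
AVERAGE_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation)


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass of dsDNA fragments of a given length to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVERAGE_BP_MASS)


def equimolar_input_ng(reference_mass_ng: float,
                       reference_length_bp: float,
                       amplicon_length_bp: float) -> float:
    """Mass of short amplicons that has as many molecules (and therefore
    as many DNA ends) as `reference_mass_ng` of longer fragments."""
    return reference_mass_ng * amplicon_length_bp / reference_length_bp


if __name__ == "__main__":
    # Assumed example: 1,000 ng of 8-kb fragments versus 400-nt amplicons.
    mass = equimolar_input_ng(1000, 8000, 400)
    print(mass, dsdna_pmol(mass, 400), dsdna_pmol(1000, 8000))
```

Because each double-stranded fragment contributes two ends regardless of its length, matching the number of molecules also matches the number of ends available for ligation.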

Bioinformatics workflow; MinION pipeline. We developed bioinformatic pipelines consisting of primer trimming, alignment, variant calling and consensus generation for both the Oxford Nanopore and Illumina platforms. The MinION pipeline was developed by building upon tools previously developed for Ebola virus sequencing in Guinea and is freely available with components developed under the permissive MIT open source license at https://github.com/zibraproject/zika-pipeline. The pipeline runs under the Linux operating system and is available as a Docker image, which means that it can also be run on Mac and Windows operating systems. The MinION version of the pipeline can process the data from basecalled reads to consensus sequences on the instrument laptop, given the correct primer scheme (a BED file).

FAST5 reads containing raw nanopore signal data may be basecalled in real time using MinKNOW (accessible via the MinION Community Portal for registered users at http://community.nanoporetech.com) or off-line using Albacore. Albacore is a recurrent neural network (RNN) basecaller developed by Oxford Nanopore Technologies and also made available through the MinION Community Portal. Reads are extracted into a FASTA file using the poretools fasta command. This FASTA file may be demultiplexed by a script, demultiplex.py, into separate FASTA files for each barcode, as specified in a config file. By default, these are set to the barcodes NB01–12 from the native barcoding kit. Alternatively, the Metrichor online service (https://www.metrichor.com) and versions of Albacore 1.0.1 or later may be used to basecall read files and demultiplex samples. Each file is then mapped to the reference genome using bwa mem with the -x ont2d flag and converted to BAM format using samtools view. Alignments are preprocessed using a script (align_trim.py) that performs primer trimming and coverage normalization. Primer trimming is performed by reference to the expected coordinates of sequenced amplicons, and therefore requires no knowledge of the sequencing adaptor (Fig. 3). Signal-level events are aligned and variants are called using nanopolish variants. Low-quality or low-coverage variants are filtered out and consensus sequences are generated using a script, margin_cons.py. Variant calls and frequencies can be visualized using vcfextract.py and pdf_tree.py.
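Coordinate-based primer trimming can be illustrated with a simplified sketch. This is not the align_trim.py script: it works on plain alignment coordinates rather than BAM records, it performs no coverage normalization, and the data layout (dictionaries with hypothetical `left` and `right` primer intervals, 0-based and half-open) is assumed for illustration only.

```python
def trim_to_insert(aln_start: int, aln_end: int, amplicons):
    """Clip an alignment to the primer-free insert of its amplicon.

    `amplicons` is a list of dicts, each with `left` and `right` primer
    intervals as (start, end) tuples on the reference. The alignment is
    assigned to the amplicon it overlaps most, then clipped so that it
    starts after the left primer and ends before the right primer.
    Returns (start, end), or None if nothing remains.
    """
    def overlap(amplicon):
        return (min(aln_end, amplicon["right"][1])
                - max(aln_start, amplicon["left"][0]))

    best = max(amplicons, key=overlap)
    if overlap(best) <= 0:
        return None  # alignment does not fall on any expected amplicon
    new_start = max(aln_start, best["left"][1])
    new_end = min(aln_end, best["right"][0])
    if new_start >= new_end:
        return None  # only primer sequence was aligned
    return new_start, new_end


if __name__ == "__main__":
    scheme = [
        {"left": (0, 22), "right": (378, 400)},
        {"left": (300, 322), "right": (678, 700)},
    ]
    print(trim_to_insert(0, 400, scheme))
    print(trim_to_insert(305, 690, scheme))
```

Because the clipping relies only on where each amplicon is expected to lie on the reference, it does not need to recognize primer or adaptor sequences in the read itself.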

Bioinformatics workflow; Illumina pipeline. First, we use Trimmomatic51 to remove primer sequences (first 22 nt from the 5′ end of the reads) and bases at both ends with Phred quality scores <20. Reads are aligned to the genome of a Zika virus isolate from the Dominican Republic, 2016 (GenBank: KU853012), using Novoalign v3.04.04 (http://www.novocraft.com/support/download/). SAMtools is used to sort the aligned BAM files and to generate alignment statistics52. The code and reference indexes for the pipeline can be found at https://github.com/andersen-lab/zika-pipeline. Snakemake is used as the workflow management system53.
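The read-trimming rule described above can be mimicked in a few lines of Python to make the logic explicit. This sketch is not Trimmomatic and does not reproduce the authors' pipeline configuration; Trimmomatic offers steps such as HEADCROP, LEADING and TRAILING for this kind of trimming, but which steps the pipeline actually uses is an assumption here. The function name `trim_read` is hypothetical.

```python
def trim_read(sequence: str, qualities, primer_length: int = 22,
              min_quality: int = 20):
    """Remove a fixed-length primer from the 5' end of a read, then strip
    bases with Phred quality below `min_quality` from both ends.

    `qualities` is a list of integer Phred scores, one per base.
    Returns the trimmed (sequence, qualities) pair.
    """
    seq = sequence[primer_length:]
    quals = list(qualities[primer_length:])

    start = 0
    while start < len(quals) and quals[start] < min_quality:
        start += 1  # drop low-quality bases at the 5' end
    end = len(quals)
    while end > start and quals[end - 1] < min_quality:
        end -= 1  # drop low-quality bases at the 3' end
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "A" * 22 + "CGTACG"
    phred = [30] * 22 + [10, 30, 30, 30, 15, 5]
    print(trim_read(read, phred))
```

Only the read ends are quality-trimmed; a low-quality base in the middle of the read is left in place and handled downstream by the aligner and variant filters.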

Alignment-based consensus generation. We have used an alignment-based consensus approach to generate genomes as opposed to de novo assembly. Although de novo assembly could in theory be used with this protocol, the use of a tiling amplicon scheme already assumes that the viral genome is present in a particular fixed order. This assumption may be violated in the presence of large-scale recombination. Some de novo assemblers, such as SPAdes, use a frequency-based error correction preprocessing stage, and this may result in primer sequences being artificially introduced into the reference if primer sequences are not removed in advance54. Importantly, when we compared alignment with de novo-based analysis methods for our generated Zika virus genomes, we found that we always obtained the same consensus sequences.
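A toy version of alignment-based consensus calling is sketched below. It assumes ungapped alignments given as hypothetical `(start, sequence)` tuples, takes the majority base at each reference position and masks positions with insufficient depth as N. The depth threshold is an arbitrary illustrative value, and the published pipelines use dedicated variant callers rather than a simple majority vote.

```python
from collections import Counter


def call_consensus(reference: str, aligned_reads, min_depth: int = 2) -> str:
    """Build a consensus from ungapped read alignments.

    `aligned_reads` is a list of (start, sequence) tuples, where `start`
    is the 0-based reference position of the first base of the read.
    Positions covered by fewer than `min_depth` reads are reported as N.
    """
    counts = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                counts[position][base] += 1

    consensus = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            consensus.append("N")  # mask low-coverage positions
        else:
            consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    reads = [(0, "ACGA"), (0, "ACGA"), (2, "GAAC"), (4, "ACG")]
    print(call_consensus("ACGTACGT", reads))
```

The fixed coordinate system supplied by the reference is what makes this approach a natural fit for tiling amplicons, and it is also why large-scale rearrangements would not be detected.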

Preparing sequencing controls. We recommend that positive sample controls be included in each sequencing run. To check that the protocol is generating the expected results, we recommend choosing a positive sample with an established, trusted reference sequence. For the Zika virus, we used the previously sequenced World Health Organization reference strain PF13/251013-18 (GenBank accession: KX369547), which can be obtained on request from the Paul-Ehrlich-Institut55,56. Sample archives such as the National Collection of Pathogenic Viruses in the United Kingdom can provide high-quality reference materials for other viruses. Positive controls should have viral copy numbers similar to those of the clinical samples on the same run. This may require the positive control to be heavily diluted until the Ct values are comparable. Negative sequencing controls should be processed in a manner as similar as possible to that used for clinical samples and should not be simply water controls; for example, if samples are collected by swabs, then the same type of unused swab should be subjected to RNA extraction and PCR. Additional negative water controls may be added at each step (e.g., reverse transcription, PCR and library preparation) to detect the sources of contaminants. Even if amplification is not detected (e.g., by gel electrophoresis) or DNA quantity is low or undetectable by fluorimetry, a sequencing library should still be prepared as normal using the total available amount, as contamination may still be detectable by sequencing.

Contamination. Cross-contamination is a serious potential problem when working with amplicon sequencing. Contamination risk is minimized by maintaining physical separation between pre- and post-PCR areas, and performing regular decontamination of work surfaces and equipment, e.g., by UV exposure or with 1% (vol/vol) sodium hypochlorite solution. Contamination becomes harder to mitigate with decreasing viral copy numbers. Processing high-viral-count samples can lead to overamplification during PCR (e.g., generation of unnecessarily high numbers of amplicons), which can increase the risk of amplicon contamination in subsequently processed samples with low viral counts. Such 'between-sample contamination' can occur during sequencing library preparation, or may result from barcode misidentification or 'barcode hopping' (incorporation of incorrect barcode sequences during sequence library preparation) during sequencing. When determining how many PCR cycles to use, begin with a lower number and increase gradually to minimize this contamination risk.

The best safeguard for helping to detect contamination is the use of negative controls. These controls should be sequenced even if no DNA is detected by quantification or no visible band is present on a gel. Negative control samples should be analyzed through the same software pipeline as is used for the other samples, and you should assume that any contaminating amplicons in the negative control will also be present in your other samples. The relative number of reads as compared with positive samples gives a simple guide to the extent of contamination, and inspection of coverage plots can help identify any specific region involved.
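The read-count comparison can be expressed as a simple ratio. The sketch below assumes a dictionary of mapped read counts per barcode, with hypothetical sample names; no acceptance threshold is implied, because what counts as an acceptable level of background depends on the run and the application.

```python
def negative_control_ratio(read_counts, negative_control: str):
    """Express negative-control reads as a fraction of each sample's reads.

    `read_counts` maps sample name to number of mapped reads. A ratio
    close to or above 1 means the sample has no more reads than the
    negative control and cannot be distinguished from contamination.
    """
    background = read_counts[negative_control]
    return {
        sample: (background / count if count else float("inf"))
        for sample, count in read_counts.items()
        if sample != negative_control
    }


if __name__ == "__main__":
    counts = {"negative": 50, "sample_1": 10000, "sample_2": 500}
    for sample, ratio in negative_control_ratio(counts, "negative").items():
        print(sample, round(ratio, 4))
```

A per-amplicon version of the same ratio, computed from coverage over each region, would show which specific amplicons are affected.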

Materials

REAGENTS

Tiling amplicon generation

  • Clinical sample (serum, plasma, urine) or isolate

    Caution

    Any potentially infectious clinical samples should be handled and made safe in accordance with biosafety regulations. If unsure, contact your local safety officer.

    Caution

    Please follow local institutional review board guidelines covering the collection and storage of clinical samples for research purposes. Our study was evaluated and approved by institutional review boards (IRBs) at The Scripps Research Institute and relevant local IRBs in Colombia and Brazil for Zika and chikungunya sample collection and sequencing.

  • QIAamp Viral RNA Mini Kit (Qiagen, cat. no. 52906)

    Caution

    Please consult the MSDS document for safety information on specific kits.

  • Random hexamers (50 μM; Thermo Fisher Scientific, cat. no. N8080127)

  • ProtoScript II First Strand cDNA Synthesis Kit (NEB, cat. no. E6560)

  • dNTP solution mix (NEB, cat. no. N0447)

  • Q5 Hot Start High-Fidelity DNA Polymerase (NEB, cat. no. M0493)

    Critical

    The primer annealing temperatures are optimized for these reagents. While others may work, thermocycling conditions may need to be optimized.

  • PCR primers (Integrated DNA Technologies; listed in Supplementary Tables 1 and 2)

  • Agencourt AMPure XP (Beckman Coulter, cat. no. A63881)

  • Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, cat. no. Q32854)

  • HyPure Molecular Biology Grade Water (GE Life Sciences, cat. no. SH30538.01)

  • EB buffer (10 mM Tris-Cl, pH 8.5; Qiagen, cat. no. 19086)

  • TE buffer (10 mM Tris-Cl, 1 mM EDTA, pH 8.0; Sigma-Aldrich, cat. no. 93283-500ML)

  • Ethanol, absolute (Thermo Fisher Scientific, cat. no. BP28184)

Gel electrophoresis

  • Gel Loading Dye, Purple (6×) (NEB, cat. no. B7024)

  • 100-bp DNA Ladder (NEB, cat. no. N3231)

  • SeaKem LE Agarose (Lonza, cat. no. 5000)

  • 10× TBE Buffer (Lonza, cat. no. 50843)

  • SYBR Safe DNA Gel Stain (Thermo Fisher Scientific, cat. no. S33102)

MinION sequencing

  • MinION Flow Cell (Oxford Nanopore Technologies, cat. no. FLO-MIN106)

  • Nanopore Sequencing Kit (Oxford Nanopore Technologies, cat. no. SQK-LSK108)

  • Native Barcoding Kit (Oxford Nanopore Technologies, cat. no. EXP-NBD103)

  • NEBNext Ultra II End Repair/dA-Tailing Module (NEB, cat. no. E7546)

  • NEB Blunt/TA Ligase Master Mix (NEB, cat. no. M0367)

MiSeq sequencing

  • KAPA Hyper Prep Kit (Roche, cat. no. 07962363001)

  • SureSelectXT2 indexes, MSQ, 16 (Agilent, cat. no. G9622A)

  • MiSeq Reagent Kit v2 (500 cycle) (Illumina, cat. no. MS-102-2003)

  • D1000 ScreenTape (Agilent, cat. no. 5067-5582)

  • D1000 Reagents (Agilent, cat. no. 5067-5583)

  • KAPA Library Quantification Kit for Illumina platforms (Roche, cat. no. 07960140001)

EQUIPMENT

Standard equipment

  • Filtered pipette tips

  • 1.5-ml microcentrifuge tube (Eppendorf, cat. no. 0030 108.051)

  • 0.2-ml strip tubes with attached caps (Thermo Fisher Scientific, cat. no. AB2000)

  • UV spectrophotometer (Thermo Fisher Scientific NanoDrop 2000, cat. no. ND-2000)

  • 96-well thermocycler (Applied Biosystems Veriti, cat. no. 4375786)

  • Benchtop microcentrifuge (Thermo Fisher Scientific mySPIN 6, cat. no. 75004061)

  • Benchtop heater/shaker (Eppendorf ThermoMixer C)

  • Magnetic rack (Thermo Fisher Scientific DynaMag-2, cat. no. 12321D)

  • PCR cabinet or pre-PCR room

MinION sequencing

  • MinION (Oxford Nanopore Technologies, cat. no. MinION Mk1B)

  • Laptop with solid-state disk (SSD) drive

MiSeq sequencing

  • MiSeq (Illumina)

  • TapeStation 2200 (Agilent)

Gel electrophoresis

  • Mini-Sub Cell GT (Bio-Rad, cat. no. 1704406)

  • PowerPac Universal Power Supply (Bio-Rad, cat. no. 1645070)

Procedure

Design and ordering of primers

Timing 1 h

  1. 1

    (Optional) Identify representative reference sequences (e.g., from public databases such as GenBank) and generate a primer scheme using Primal Scheme by visiting http://primal.zibraproject.org; see EXPERIMENTAL DESIGN section for further information. Alternatively, predesigned primer schemes are provided in Supplementary Tables 1 and 2.

    Critical Step

    Choose an amplicon length that is suitable for your sequencing platform, and the likely viral copy number of your sample—e.g., 300–500 nt for Zika on MinION/Illumina.

    Troubleshooting

  2. 2

    Order the primers from your oligonucleotide supplier, using either the scheme generated by the online tool in Step 1 or the predesigned schemes provided in Supplementary Tables 1 and 2.

    Critical Step

    Primers can be ordered prediluted in TE buffer to 100 μM (usually at additional cost) to avoid manually resuspending a large number of primers.

Extraction and preparation of nucleic acid template

  1. 3

    Tiling amplification is a general technique that can be applied to DNA or to cDNA generated from RNA by reverse transcription. Use option A to extract viral RNA from samples and prepare cDNA by reverse transcription for analysis of RNA viruses, or use option B to extract viral DNA from samples for analysis of DNA viruses:

    1. A

      RNA extraction and preparation of cDNA for analysis of RNA viruses • TIMING 2 h

      1. i

        Extract RNA from 200 μl of serum, plasma or urine using the QIAamp Viral RNA Mini Kit according to the manufacturer's instructions, eluting in 50 μl of EB buffer.

      2. ii

        Measure the absorption spectra using a spectrophotometer. Pure RNA should have a 260/280 ratio of 2.0 and a 260/230 ratio of 2.0–2.2.

        Troubleshooting

      3. iii

        Wash all surfaces with 1% (vol/vol) sodium hypochlorite solution and irradiate labware with UV light for at least 10 min.

        Critical Step

        Perform the following three steps in a hood or dedicated pre-PCR area.

      4. iv

        Mix the following components in a 0.2-ml tube:

        Component | Amount (μl) | Final concentration
        Template RNA (from Step 3A(i)) | 7 |
        Random hexamers, 50 μM | 1 | 2.5 μM

      5. v

        Denature the template RNA by incubating it on a heat block at 65 °C for 5 min before promptly placing it on ice.

        Critical Step

        This denaturation step minimizes the secondary structure in the RNA before cDNA synthesis.

      6. vi

        Complete the cDNA synthesis reaction preparation by adding the following to the tube:

        Component | Amount (μl) | Final concentration
        ProtoScript II Reaction mix (2×) | 10 |
        ProtoScript II Enzyme Mix (10×) | 2 |
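The final concentrations quoted in these tables follow from a simple dilution calculation. The sketch below is illustrative only: the function name is hypothetical, and the 20-μl total is taken as the sum of the volumes listed in Steps 3A(iv) and 3A(vi) (7 + 1 + 10 + 2 μl).

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution (C1 * V1 = C2 * V2), in the units of stock_conc."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes from the two tables: template RNA, hexamers, reaction mix, enzyme mix.
cdna_total_ul = 7 + 1 + 10 + 2

# 1 ul of 50 uM random hexamers in the completed cDNA reaction.
hexamer_um = final_concentration(50, 1, cdna_total_ul)
print(hexamer_um)
```

With these inputs the hexamers come out at the 2.5 μM stated in the table.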

      7. vii

        Place the tube in a thermocycler and run the following program:

        Cycle number | Condition
        1 | 25 °C, 5 min
        1 | 48 °C, 15 min
        1 | 80 °C, 5 min

        Pause point

        cDNA can be stored at −20 °C for a month.

    2. B

      DNA extraction and preparation for analysis of DNA viruses • TIMING 1 h

      1. i

        Extract DNA from 200 μl of serum, plasma or urine using the QIAamp MinElute Virus Spin Kit, according to the manufacturer's instructions.

      2. ii

        Measure the absorption spectra using a spectrophotometer. Pure DNA should have a 260/280 ratio of 1.8 and a 260/230 ratio of 2.0–2.2.

        Troubleshooting

Preparation of the primer pools

Timing 1 h

  1. 4

    (Optional) Resuspend lyophilized primers by prespinning tubes to make sure that the pellet is at the bottom of the tube and adding TE buffer to a concentration of 100 μM. If primers were ordered prediluted to 100 μM, continue to the next step.

    Critical Step

    The volume of TE buffer needed to yield a 100 μM solution is often given on the QC document supplied with the primer.
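If the QC document does not state the resuspension volume, it can be worked out from the delivered yield. This is a minimal sketch; the function name is hypothetical and the yield in nanomoles is assumed to come from the supplier's documentation.

```python
def te_volume_ul(yield_nmol, target_um=100.0):
    """Volume of TE buffer (ul) that resuspends a dried oligo at target_um."""
    if yield_nmol <= 0 or target_um <= 0:
        raise ValueError("yield and target concentration must be positive")
    # nmol divided by uM gives ml; multiply by 1000 to convert to ul.
    return yield_nmol / target_um * 1000.0


# Example: a primer delivered at 25 nmol needs 250 ul of TE for 100 uM.
print(te_volume_ul(25))
```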

  2. 5

    Label two 1.5-ml Eppendorf tubes with the scheme name and the pool number ('1' or '2'). Primers for adjacent regions go into alternate pools, so that amplicons overlap between pools but not within a pool. Add an equal volume of each 100 μM primer stock to the appropriate tube, such that both the forward and reverse primers for alternate regions are pooled together. For example, pool '1' for a ZikaAsian scheme would contain ZIKA_400_1_LEFT, ZIKA_400_1_RIGHT, ZIKA_400_3_LEFT, ZIKA_400_3_RIGHT, ZIKA_400_5_LEFT, ZIKA_400_5_RIGHT and so on. Dilute each pool at a ratio of 1:10 with nuclease-free water to a working concentration of 10 μM.

Performing of multiplex tiling PCR

Timing 5 h

  1. 6

    In Eppendorf tubes, prepare a mastermix for each of the 2 primer pools, as follows.

    Component | Amount (μl) | Final concentration
    Q5 reaction buffer (5×) | 5 |
    dNTPs, 10 mM | 0.5 | 200 μM
    Q5 DNA polymerase | 0.25 |
    Primer pool 1 or 2 (10 μM) | Variable | 0.015 μM per primer
    PCR-grade water | Up to 22.5 μl (assuming 2.5 μl of cDNA template will be added in Step 7) |

    Mix thoroughly by vortexing and spin down in a microcentrifuge.

    Critical Step

    The total volume of mastermix should be 22.5 μl multiplied by the number of samples plus 10% excess volume; this is done to reduce variability between reactions.

    Critical Step

    The volume of primers to use will depend on the number of primers in the pool, as the final concentration should be 0.015 μM per primer. For example, the ZikaAsian scheme from Supplementary Table 1 has 36 primers in pool 1, so the volume to use would be 1.35 μl.
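The two calculations above (primer pool volume and mastermix scaling) can be sketched as follows. The function names are hypothetical, and the code assumes that 10 μM is the total primer concentration of the working pool, which is the reading that reproduces the 1.35-μl example for 36 primers.

```python
def primer_pool_volume_ul(n_primers, per_primer_final_um=0.015,
                          reaction_ul=25.0, pool_total_um=10.0):
    """Volume of working primer pool needed per reaction."""
    # uM x ul = pmol: total primer required in one reaction.
    total_pmol = n_primers * per_primer_final_um * reaction_ul
    return total_pmol / pool_total_um


def mastermix_volumes_ul(n_samples, n_primers, excess=0.10):
    """Scale the per-reaction recipe to n_samples plus a fractional excess."""
    per_reaction = {
        "Q5 reaction buffer (5x)": 5.0,
        "dNTPs, 10 mM": 0.5,
        "Q5 DNA polymerase": 0.25,
        "Primer pool (10 uM)": primer_pool_volume_ul(n_primers),
    }
    # Water brings each reaction to 22.5 ul; 2.5 ul of template is added in Step 7.
    per_reaction["PCR-grade water"] = 22.5 - sum(per_reaction.values())
    scale = n_samples * (1.0 + excess)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}


print(round(primer_pool_volume_ul(36), 2))
for name, vol in mastermix_volumes_ul(n_samples=10, n_primers=36).items():
    print(f"{name}: {vol} ul")
```

For ten samples with the 36-primer pool, the scaled volumes sum to 247.5 μl, that is, 22.5 μl × 10 plus 10% excess.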

  2. 7

    Label 0.2-ml PCR tubes and add 22.5 μl of mastermix from Step 6 to each tube. If using cDNA from Step 3A(vii) as template, add 2.5 μl of cDNA to each tube. If you are using extracted DNA from Step 3B(ii), however, a larger volume of template (up to 10 μl) can be added, if required, and may improve amplification efficiency.

    Critical Step

    It is recommended that cDNA volume be kept to 10% of the final volume of the PCR reaction to avoid affecting the buffer conditions.

    Critical Step

    This step should ideally be performed in a cabinet used only for template addition in order to minimize the risk of amplicon contamination.

  3. 8

    Place in a thermocycler and run the following program:

    Cycle number | Denature | Anneal/extend
    1 | 98 °C, 30 s |
    2–40 | 98 °C, 15 s | 65 °C, 5 min

Cleanup and quantification of amplicons

Timing 2 h

  1. 9

    Transfer the contents of the tubes to 1.5-ml Eppendorf tubes. Add the volume of AMPure XP beads given in the table below, taking into account amplicon length. Perform washes following the instructions in the 1D barcoding protocol and elute in 30 μl of EB buffer.

    Amplicon length (bp) | Ratio | Volume of beads (μl) for a 25-μl PCR reaction
    <500 | 1.0× | 25
    500–1,000 | 0.8× | 20
    >1,000 | 0.6× | 15
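The bead table can be expressed as a small lookup, which is convenient when the reaction volume differs from 25 μl. This is a sketch with hypothetical function names; the thresholds are those of the table.

```python
def ampure_ratio(amplicon_bp):
    """Bead-to-sample ratio for a given amplicon length, following the table."""
    if amplicon_bp <= 0:
        raise ValueError("amplicon length must be positive")
    if amplicon_bp < 500:
        return 1.0
    if amplicon_bp <= 1000:
        return 0.8
    return 0.6


def bead_volume_ul(amplicon_bp, reaction_ul=25.0):
    """Volume of AMPure XP beads to add to a reaction of reaction_ul."""
    return ampure_ratio(amplicon_bp) * reaction_ul


for length in (400, 800, 2000):
    print(length, bead_volume_ul(length))
```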

  2. 10

    Quantify 1 μl of the cleaned products using the Qubit instrument with the high-sensitivity assay per the manufacturer's instructions. You should expect concentrations in the range of 5–50 ng/μl for each reaction from the Qubit quantification, except for the PCR negative control, which should be repeated if >1 ng/μl.

    Troubleshooting

    Pause point

    Cleaned-up PCR products can be stored at −20 °C for up to a month.

  3. 11

    (Optional) Make a gel by melting 1% (wt/vol) agarose powder in 1× TBE buffer and then adding 1× SYBR Safe gel stain before allowing it to set. Place in a gel tank submerged in 1× TBE buffer. Mix 10 μl of cleaned product from Step 9 or a ladder with 2 μl of 6× loading dye and load on the gel. Perform electrophoresis at 6 V/cm until bands are distinguishable by transillumination. A specific band of the correct size for your scheme should be observed.

    Troubleshooting

Library preparation and sequencing

  1. 12

    Perform library preparation and sequencing; these procedures are platform specific and have been validated on the MinION from Oxford Nanopore Technologies (option A) and on the MiSeq from Illumina (option B).

    1. A

      Library preparation and sequencing using the MinION • TIMING 1–2 d

      1. i

        Determine the number of samples per flow cell. We recommend using two barcodes per sample or negative control (one barcode per pool per sample) initially. This means that up to five samples and one negative control can be sequenced on each flow cell, and it allows each pool to be barcoded individually, making it easier to detect contamination that may be pool- rather than sample-specific. However, a single barcode per sample can also be used to maximize the number of samples per flow cell.
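The sample capacity of a run follows directly from the number of barcodes available. A minimal sketch, assuming the 12 native barcodes (NB01 to NB12) of the barcoding kit; the function name is hypothetical.

```python
def samples_per_run(n_barcodes, barcodes_per_sample=2, negative_controls=1):
    """Number of samples that fit on a run after reserving slots for controls."""
    if barcodes_per_sample < 1:
        raise ValueError("barcodes_per_sample must be at least 1")
    slots = n_barcodes // barcodes_per_sample
    return max(slots - negative_controls, 0)


print(samples_per_run(12))                         # one barcode per pool
print(samples_per_run(12, barcodes_per_sample=1))  # one barcode per sample
```

With two barcodes per sample, 12 barcodes give six slots: five samples and one negative control, as recommended above.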

      2. ii

        Normalization. Use the table below to determine the quantity of amplicons to load to achieve a total input of 0.3 pmol per flow cell. Divide the total input quantity by the number of barcodes being used to calculate the quantity per barcode. Keep PCR products separate at this stage; add the appropriate volume of each sample from Step 9 to individual 1.5-ml Eppendorf tubes and then adjust the volume in each tube to 20 μl with nuclease-free water.

        Amplicon length (bp) | Input total (ng)
        300 | 60
        400 | 80
        500 | 100
        1,000 | 200
        1,500 | 300
        2,000 | 400
        5,000 | 1,000

        Troubleshooting
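The tabulated inputs correspond to about 0.3 pmol of double-stranded amplicon, and the per-barcode volume then depends on the Qubit reading from Step 10. The sketch below is an illustration under stated assumptions: an average mass of about 660 g/mol per base pair (a common approximation, not taken from this protocol) and hypothetical function names.

```python
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def input_ng(amplicon_bp, total_pmol=0.3):
    """Total amplicon mass (ng) equivalent to total_pmol of dsDNA."""
    # pmol x g/mol = pg; divide by 1000 to convert to ng.
    return total_pmol * amplicon_bp * BP_MASS_G_PER_MOL / 1000.0


def volume_per_barcode_ul(amplicon_bp, n_barcodes, qubit_ng_per_ul, total_pmol=0.3):
    """Volume of cleaned PCR product to use for each barcode."""
    ng_per_barcode = input_ng(amplicon_bp, total_pmol) / n_barcodes
    return ng_per_barcode / qubit_ng_per_ul


def water_ul(sample_ul, final_ul=20.0):
    """Nuclease-free water needed to bring the sample to final_ul."""
    if sample_ul > final_ul:
        raise ValueError("sample volume exceeds the final volume")
    return final_ul - sample_ul


vol = volume_per_barcode_ul(amplicon_bp=400, n_barcodes=12, qubit_ng_per_ul=20.0)
print(round(input_ng(400), 1), round(vol, 2), round(water_ul(vol), 2))
```

The calculated totals (for example, 59.4 ng at 300 bp) agree with the rounded values in the table.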

      3. iii

        End-repair and dA-tailing. For each sample, set up the following end-repair/dA-tailing reaction in a 1.5-ml Eppendorf tube and incubate for 5 min at 20 °C, followed by 5 min at 65 °C. Perform SPRI cleanup by repeating Step 9, eluting in 10 μl of EB buffer.

        Component | Amount (μl)
        Normalized amplicons (from Step 12A(ii)) | 20
        Ultra II End Prep Reaction Buffer | 2.8
        Ultra II End Prep Enzyme Mix | 1.2

      4. iv

        Barcode ligation. In a 1.5-ml Eppendorf tube, prepare the following ligation reactions—one reaction per barcode being used.

        Component | Amount (μl)
        dA-tailed amplicons (from Step 12A(iii)) | 10
        Native barcode NB01-NB12 | 2.5
        Blunt/TA Ligase Master Mix | 12.5

      5. v

        Incubate at room temperature (20 °C) for 10 min, followed by 65 °C for 10 min to denature the ligase.

      6. vi

        Pool barcoded amplicons. Combine all the barcode ligation reactions into a single 1.5-ml Eppendorf tube. Perform SPRI cleanup by repeating Step 9 and elute in 30 μl of nuclease-free water.

        Critical Step

        If the pellet is large, you can speed up drying by briefly incubating at 50 °C; do not allow the pellet to overdry and crack, or recovery will be reduced.

      7. vii

        Barcoding adaptor ligation. In a 1.5-ml Eppendorf tube, prepare the adaptor ligation reaction according to the native barcoding kit protocol. As Blunt/TA ligase is a 2× master mix, we have reduced the elution volume in the previous step to accommodate this.

        Critical Step

        We have found that using Blunt/TA Ligase instead of the NEBNext Quick Ligation Module as described in the protocol improves the efficiency of this step.

      8. viii

        Library cleanup and elution. Complete library construction according to the native barcoding kit protocol.

      9. ix

        Library loading. Perform library loading according to the native barcoding kit protocol; there is a helpful video guide to loading the library on the MinION Community Portal (http://community.nanoporetech.com).

        Troubleshooting

      10. x

        Start sequencing run. By default, you will need an Internet connection before the sequencing script can be started, although off-line versions of MinKNOW can be requested from the manufacturer if an Internet connection is not available. Once the flow cell is detected, enter an experiment name and the flow cell ID into the blank fields and then choose the appropriate sequencing script for the library preparation kit and flow cell version.

        Critical Step

        Note that if a 'Live' basecalling script is run, reads will be basecalled in real time. If you do this, then you do not need to perform the basecalling step when running the subsequent bioinformatic pipeline.

        Troubleshooting

    2. B

      Library preparation and sequencing using the MiSeq • TIMING 3 d

      1. i

        Determine the number of samples per flow cell; we recommend using two barcodes per sample, which means up to 47 samples plus a negative control can be sequenced on each run and allows each pool to be barcoded individually. This makes it easier to detect contamination that may be pool- rather than sample-specific. It also results in greater yield per sample, which improves genome coverage in samples with more uneven amplification.

      2. ii

        Normalization. Keep pools in individual 1.5-ml Eppendorf tubes, add 50 ng of sample material from Step 9 and add nuclease-free water to adjust the total volume to 50 μl.

      3. iii

        End-repair and dA-tailing. Perform end-repair and dA-tailing according to the Hyper Prep Kit protocol (KAPA).

      4. iv

        Library preparation. Complete library construction with the KAPA Hyper Prep Kit according to the manufacturer's instructions, using SureSelectXT2 indexing adaptors in place of the KAPA adaptors in the adaptor ligation step. Perform a 0.8× instead of a 1× SPRI cleanup during the postamplification cleanup to remove potential adaptor–dimers.

      5. v

        Library quality control. Measure the size distribution of the library using the TapeStation 2200 according to the manufacturer's instructions.

      6. vi

        Library pooling. Calculate the molarity of each library using the KAPA Library Quantification Kit according to the manufacturer's instructions and pool libraries in an equimolar manner.

      7. vii

        Library denaturation and dilution. Prepare library for loading onto the MiSeq according to the manufacturer's instructions.

      8. viii

        Start sequencing run. Generate a SampleSheet.csv file with the Illumina Experiment Manager software by entering sample and barcode information. Complete the instrument setup and start the sequencing run according to the manufacturer's recommended instructions.

      9. ix

        Basecalling and demultiplexing. Basecalling and demultiplexing will be performed automatically on the instrument using the sample information provided in the SampleSheet.csv file.

Data analysis

Timing 1–2 h

  1. 13

    Download the Docker application for Linux, Macintosh or Windows from https://www.docker.com/products/overview. Run the installer to set up the Docker tools on your machine. You should now be able to open a terminal window and run the command docker --version without getting an error.

  2. 14

    Download the Zika pipeline image from DockerHub by typing docker pull zibra/zibra into the terminal window.

    Critical Step

    The source code of the Zika analysis pipeline is also available from https://github.com/zibraproject/zika-pipeline.

    Critical Step

    The Zika pipeline is compatible with both MinION data and Illumina data, yet there are some differences in the data handling required.

  3. 15

    Start a Docker container with the following command: docker run -t -i zibra/zibra:latest

    By default, Docker containers do not have access to the file system of the computer they run within. You will need to provide access to a local directory in order to see data files. This is achieved using the -v parameter. You may need to grant access to Docker to share the drive via the 'Shared Drives' menu option under 'Settings'. For example, on Windows, if you wish to provide access to the c:\data\reads directory to the Docker container, use the following: docker run -v c:/data/reads:/data -t -i zibra/zibra:latest /bin/bash

    Then, within the Docker container, the /data directory will refer to c:\data\reads on the Windows machine.

  4. 16

    Run the platform-specific pipeline using option A for MinION data or option B for MiSeq data:

    1. A

      Running analysis pipeline on MinION data

      1. i

        Ensure that the reads are basecalled using either Metrichor or an off-line basecaller. Compatible off-line basecallers include Albacore (available as installable packages for Linux, Windows and Macintosh through the MinION Community Portal) or the freely available and open-source nanonet (https://github.com/nanoporetech/nanonet) software. nanonet is compatible with graphics processing unit cards to increase speed.

      2. ii

        Metrichor will perform demultiplexing if a barcoding workflow is selected. For other basecallers you may need to demultiplex reads manually. To do this, run the script that is provided within the Docker image with the command: demultiplex <directory of FAST5 Files> <output directory>

      3. iii

        Run the Zika pipeline using the following command: fast5_to_consensus <scheme> <sampleID> <directory>

        The pipeline takes three required arguments, given in this order:

        scheme: the name of the scheme directory, e.g., ZikaAsian

        sampleID: the sample name (should not contain space characters)

        directory: the directory containing the FAST5 files for a single sample (e.g., the demultiplexed output directory from Step 16A(ii))

        For example: fast5_to_consensus ZikaAsian Zika1 /data/NB08/downloads/pass

        Output files will be written to the current directory. The final consensus file will be named <sampleID>.consensus.fasta
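When several barcodes are processed, it can help to assemble the command lines programmatically and check the sample names first. The helper below is a hypothetical sketch, not part of the Zika pipeline; it only builds the argument list in the order shown above and does not run anything.

```python
import shlex


def consensus_command(scheme, sample_id, directory):
    """Build the fast5_to_consensus argument list: scheme, sample ID, directory."""
    if not sample_id or any(ch.isspace() for ch in sample_id):
        raise ValueError("sample ID must be non-empty and contain no spaces")
    return ["fast5_to_consensus", scheme, sample_id, directory]


# Hypothetical mapping of sample names to demultiplexed read directories.
samples = {"Zika1": "/data/NB08/downloads/pass"}
for sample_id, directory in samples.items():
    cmd = consensus_command("ZikaAsian", sample_id, directory)
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run inside the container if desired.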

    2. B

      Running analysis pipeline on Illumina data

      1. i

        Download and follow the instructions for the Illumina pipeline by referring to https://github.com/andersen-lab/zika-pipeline and using the following command: illumina_pipeline <sampleID> <fastq1> <fastq2> <scheme>

Quality control

Timing 1 h

  1. 17

    Check the coverage of the genomes by reference to the alignment file. Use an alignment viewer such as IGV57 or Tablet58 and load the <sampleID>.primertrimmed.sorted.bam file in conjunction with the reference sequence. Amplicons should be evenly spread throughout the genome. Deep piles of reads representing amplification of single regions are potential warning signs of contamination. Compare the alignments with the positive and negative control alignments to help indicate problematic samples or regions.

    Troubleshooting
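Alongside visual inspection, genome coverage can be summarized numerically. The sketch below assumes per-base depths have been exported from the <sampleID>.primertrimmed.sorted.bam file in the three-column, tab-separated format written by samtools depth (reference, position, depth); samtools is not named in this protocol, the function name is hypothetical, and the 25× threshold is used here only because it is the depth quoted in ANTICIPATED RESULTS.

```python
def fraction_covered(depth_lines, genome_length, min_depth=25):
    """Fraction of genome positions with depth of at least min_depth.

    Positions missing from the input are counted as uncovered.
    """
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _reference, _position, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            covered += 1
    return covered / genome_length


example = ["ref\t1\t30", "ref\t2\t25", "ref\t3\t10"]
print(fraction_covered(example, genome_length=4))
```

Comparing this figure between samples and controls can highlight samples that need closer inspection in the alignment viewer.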

  2. 18

    Use the variant frequency plot produced by the Zika pipeline to help determine the allele frequency of mutations in the sample (as compared with the reference). The variant frequency plot is given the name <sampleID>.variants.png and is generated from the <sampleID>.variants.tab file that can be opened in spreadsheet applications or statistical software. The principle of the variant frequency plot is to identify mutations that occur at lower-than-expected allele frequencies and help decide whether they are a biological phenomenon (e.g., intra-host single-nucleotide variants), potential signs of contamination or sequencing errors (for example, in homopolymeric tracts in MinION data).

Troubleshooting

Troubleshooting advice can be found in Table 4.

Table 4 Troubleshooting table.

Timing

Steps 1 and 2, design and ordering of primers: 1 h

Step 3A, RNA extraction and preparation of cDNA: 2 h

Step 3B, DNA extraction: 1 h

Steps 4 and 5, preparation of the primer pools: 1 h

Steps 6–8, performing of multiplex tiling PCR: 5 h

Steps 9–11, cleanup and quantification of amplicons: 1 h

Step 12A, library preparation and sequencing using the MinION: 1–2 d

Step 12B, library preparation and sequencing using the MiSeq: 3 d

Steps 13–16, data analysis: 1–2 h

Steps 17 and 18, quality control of consensus sequences: 1 h

Anticipated results

This protocol should achieve near-complete genome coverage.

MinION sequencing

As a demonstration of the ZikaAsian scheme on MinION, we sequenced the World Health Organization Zika reference sample 11474/1655 (Table 2) and a chikungunya clinical sample from Brazil, PEI-N11602. The Ct value for the Zika virus sample was between 18 and 20, depending on the RNA extraction method used. The Ct value for the chikungunya sample was 20, as determined by the RealStar Chikungunya RT-PCR Kit 1.0 from Altona Diagnostics (Hamburg, Germany). For the Zika virus sample, 97.7% of the genome was covered at a depth above 25×. Coverage of the genome was reasonably even, with a dropout in the middle of the genome (Fig. 4). The WHO Control Reference MinION data set is available from the CLIMB website (https://s3.climb.ac.uk/nanopore/Zika_Control_Material_R9.4_2D.tar).

Figure 4: Coverage plots for ZikaAsian scheme sequenced on MinION before (top panel) and after primer trimming and coverage normalization (bottom panel).

During the preprocessing step, reads are trimmed using a BED file containing primer positions, and read coverage is normalized. The coverage plot was produced using the Tablet genome viewer58 with the Zika reference genome coordinates represented on the x axis and depth of coverage on the y axis. Alignments are colored by depth of coverage, with darker regions indicating higher depths of coverage—e.g., in overlapping regions.

Illumina sequencing

Using the Illumina MiSeq protocol, we compared metagenomic sequencing with the ZikaAsian scheme on five clinical samples of Zika virus from Colombia. Using a previously described method for metagenomic sequencing2,17, only a small percentage (<0.01%) of our reads aligned to Zika virus, and they covered only a fraction of the genome (Table 1). Using the ZikaAsian scheme, we were able to generate high coverage of all the genomes (Table 3). Illumina sequencing reads are available from BioProject PRJNA358078 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA358078).

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.