Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence-complete genomes, comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimized library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an Internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved in 1–2 d by starting with clinical samples and following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. The protocol can be used to sequence other viral genomes using the online Primal Scheme primer designer software. It is suitable for sequencing either RNA or DNA viruses in the field during outbreaks or as an inexpensive, convenient method for use in the lab.
Genome sequencing of viruses has been used to study the spread of disease in outbreaks1. Real-time genomic surveillance is important in managing viral outbreaks, as it can provide insights into how viruses transmit, spread and evolve1,2,3,4. Such work depends on rapid sequencing of viral material directly from clinical samples—i.e., without the need to isolate the virus in pure culture. During the Ebola virus epidemic of 2013–2016, prospective viral genome sequencing was able to provide critical information on virus evolution and help inform epidemiological investigations3,4,5,6. Sequencing directly from clinical samples is faster, less laborious and more amenable to near-patient work than time-consuming culture-based methods. Metagenomics, the process of sequencing the total nucleic acid content in a sample (typically cDNA or DNA), has been successfully applied to both virus discovery and diagnostics7,8,9. Metagenomic approaches have seen rapid adoption over the past decade, fueled by relentless improvements in the yield of high-throughput sequencing instruments5,10,11,12. Whole-genome sequencing of Ebola virus directly from clinical samples without amplification was possible because of the extremely high virus copy numbers found in acute cases13,14,15. However, direct metagenomic sequencing from clinical samples poses challenges with regard to sensitivity: genome coverage may be low or absent when attempting to sequence viruses that are present at low abundance in a sample with high levels of host nucleic acid background.
Development of the protocol
During recent work on the Zika virus epidemic16, we found that it was difficult to generate whole-genome sequences directly from clinical samples using metagenomic approaches (Table 1). These samples had cycle threshold (Ct) values between 33.9 and 35.9 (equivalent to 10–48 genome copies per microliter). Before sequencing, these samples were depleted of human rRNA and prepared for metagenomic sequencing on the Illumina MiSeq platform as previously described2,17. In these cases, sequences from Zika virus comprised <0.01% of the data set, resulting in incomplete coverage. Greater coverage and depth are critical for accurate genome reconstruction and subsequent phylogenetic inference. In addition, there are substantial sequencing, analysis and storage costs associated with generating large sequencing data sets; therefore, metagenomic approaches currently do not lend themselves to the cost-effective use of lower-throughput portable sequencing devices such as the Oxford Nanopore MinION.
To generate complete viral genome coverage from clinical samples in an economic manner, target enrichment is often required18. Enrichment can be achieved directly through isolation in culture or the use of oligonucleotide bait probes targeting the virus of interest, or indirectly via host nucleic acid depletion. Amplification may also be required to generate sufficient material for sequencing (>5 ng for typical Illumina protocols and 100–1,000 ng for MinION). PCR can provide both target enrichment and amplification in a single step, and is relatively cheap, available and fast as compared with other methods. To generate coding-sequence complete coverage, a tiling amplicon scheme is commonly used19,20,21. During our work with Ebola virus, we were able to reliably recover >95% of the genome by sequencing 11 long amplicons (1–2.5 kb in length) on the MinION5.
The likelihood of long fragments being present in the sample, however, decreases with lower virus abundance. Therefore, we anticipated that, for viruses such as Zika that are present at low abundance in clinical samples, we would be more likely to amplify shorter fragments. As an extreme example, a recent approach termed 'jackhammering' was used to amplify degraded HIV-1 samples stored for >40 years; it used 200–300 nt amplicons to help maximize sequence recovery22. Using shorter amplicons necessitates a larger number of products to generate a tiling path across a target genome. Doing this in individual reactions requires a large number of manual pipetting steps and therefore increases the potential for mistakes, with a heightened risk of cross-contamination, as well as a greater cost in time and consumables. To solve these problems, we designed a multiplex assay to carry out tens of reactions in an individual tube. This method has been subsequently used to perform Zika sequencing in order to understand the spread of Zika virus in the Americas16,23,24,25,26. Our resulting step-by-step protocol, described here, allows any researcher to successfully amplify and sequence viruses of low abundance directly from clinical samples. The method also has other potential uses that are not demonstrated here. One potential application is multilocus sequence typing, which could be carried out by amplifying conserved genes from bacteria, fungi and yeasts. Simultaneously, antibiotic-resistance-determining genes or key virulence genes could also be targeted in the same assay. The scheme could also be used to sequence chloroplast and mitochondrial genomes.
Comparison with other approaches
The three most common approaches for sequencing viruses are metagenomic sequencing, PCR amplicon sequencing and target enrichment sequencing, recently reviewed in detail by Houldcroft et al.27. The main benefits of the PCR-based approach described here are cost and sensitivity. In theory, both PCR and cell culture require only one viral copy, making them both exquisitely sensitive. In practice, however, the reaction conditions do not allow single-genome amplification, and, typically, multiple starting molecules are required. PCR also has limited sensitivity in cases in which the template sequence is divergent from the expected because of primer-binding kinetics. However, in an outbreak situation in which isolates are highly related, and low cost per sample and rapid turnaround time are required, PCR is particularly suitable. Sequencing amplicons on the Oxford Nanopore MinION is a popular method for determining viral genomes and has been used for diverse viruses, including Ebola, influenza and poxvirus, using either single primer pair reactions generating long amplicons (>1 kb) or multiple reactions that are pooled before sequencing5,28,29,30. However, these approaches are laborious to scale up when many small amplicons are required (because of low viral copy numbers), or when multiple samples are sequenced on a single sequencing run, as in this protocol.
The most similar alternative approach to the one described here is AmpliSeq (Life Technologies), which was previously used for Ebola sequencing on the Ion Torrent PGM6. However, this method is specific to the Ion Torrent platform, and primer schemes must be ordered directly from the manufacturer; it may therefore be more expensive per sample. Alternative software packages for designing primer schemes are available, some of which cater specifically to multiplex or tiling amplicon schemes20,21,31,32, and these may perform better when dealing with divergent genomes because of an increased emphasis on oligonucleotide degeneracy. Primers generated with such software may also be compatible with this protocol, although PCR conditions may require optimization, as the Primal Scheme software used in this protocol is designed with an emphasis on monitoring short-term evolution of known lineages, and primer conditions have been optimized for multiplex PCR amplification efficiency.
Propagation in cell culture is another method that has been widely used for virus enrichment33,34,35. This process is time-consuming, and requires specialist expertise and high containment laboratories for especially dangerous pathogens. There is also concern that viral passage can introduce mutations that are not present in the original clinical sample, potentially confounding analysis36,37.
Oligonucleotide bait probes have also shown promise as an alternative to metagenomics and amplicon sequencing38,39,40,41,42. These isolate viral nucleic acid sequences by hybridizing target-specific biotinylated probes to the DNA/RNA sample and then separating them using magnetic streptavidin-coated beads. Such methods, however, are limited by the efficiency of the capture step because of the kinetics of nucleic acid hybridization in complex samples such as those containing the human genome. The complete hybridization of all probes to targets can take hours (typical protocols suggest a 24-h incubation, although shorter times may be possible) and may never be achieved because of competitive binding by the host DNA. These methods suffer from a coverage bias, which worsens at lower viral abundances, resulting in increasingly incomplete genomes, as demonstrated by recent work on the Zika virus43. They work best on samples with higher viral abundances and may not have the sensitivity to generate near-complete genomes for the majority of isolates in an outbreak. Probes for hybridization capture are also more expensive than PCR primers because they are usually designed in a fully overlapping 75-nt scheme, which can run to hundreds of probes per virus and thousands for panels of viruses.
Direct sequencing of RNA has been recently demonstrated on the Oxford Nanopore MinION44,45. This method is attractive because it eliminates the need for reverse transcription, and so potentially may reduce biases resulting from nonrandom priming and copying errors introduced by reverse transcriptase. However, this method currently requires 500 ng of RNA as starting material and would suffer from the same sensitivity issues associated with cDNA metagenomics approaches when applied to samples containing very low viral copy numbers.
Limitations of tiling amplicon sequencing
Our method is not suitable for the discovery of new viruses or for sequencing highly diverse or recombinant viruses because primer schemes are virus-genome-specific. This protocol has not been validated for discovery of intra-host nucleotide variants, and we expect that minor allele frequencies will not be reliably recovered when amplifying from very small amounts of starting virus, as shown by Metsky et al.25. We expect that this method will work for larger virus genomes, but we have not tested this protocol with viral genomes longer than 12 kb. The protocol is designed for infections resulting from single clones, and may not perform well with mixed infections of diverse viruses. We have not tested performance of the method in chronic infections in which large amounts of diversity may have evolved within a patient (for example, viral quasispecies during HIV infection). Amplicon sequencing is prone to coverage dropouts that may result in incomplete genome coverage, especially at lower abundances, and the loss of both 5′ and 3′ regions that fall in regions not covered by primer pairs. Sequencing of complete 5′- and 3′-UTR regions may require alternative techniques such as RACE46. Targeted methods are also highly sensitive to amplicon contamination from previous experiments. Extreme caution should be taken to keep pre-PCR areas, reagents and equipment free of contaminating amplicons.
Description of the protocol. We describe a fully integrated end-to-end protocol for rapid sequencing of viral genomes directly from clinical samples. The protocol proceeds in four stages: (i) multiplex primer pool design, (ii) multiplex PCR, (iii) sequencing on MinION or Illumina instruments and (iv) bioinformatic analysis and quality control (QC) (Fig. 1).
Primer design. We developed a web-based primer design tool called Primal Scheme (http://primal.zibraproject.org), which provides a complete pipeline for the development of efficient multiplex primer schemes. Each scheme is a set of oligonucleotide primer pairs that generate overlapping products, the size of which is determined by the target genome length, amplicon length and overlap required, as discussed below. For Zika, we use 35 primer pairs, amplifying products of ∼400-nt length with a 100-nt overlap for the ∼11-kb viral genome. Together, the amplicons generated by the pairs span the target genome or region of interest (Fig. 2).
As input, Primal Scheme requires a FASTA file containing one or more reference genomes. The user specifies a desired PCR amplicon length (default = 400 nt, suggested values between 200 and 2,000 nt) and the desired length of overlap between neighboring amplicons (default = 75 nt). Using a shorter amplicon length may be useful for samples in which longer products fail to amplify (e.g., when the virus nucleic acid is highly degraded). However, if amplicon lengths become too short (e.g., <300 nt), it may not be possible to find suitable primer pairs; reducing the overlap parameter may help with this.
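The relationship between genome length, amplicon length and overlap can be illustrated with a short calculation. The following is a minimal sketch, not part of Primal Scheme: it assumes idealized amplicons of exactly equal length, whereas a real scheme has variable amplicon lengths because primer positions are chosen by Primer3. The function names and the 10,600-nt target length in the usage note are hypothetical.

```python
import math


def count_amplicons(target_len: int, amplicon_len: int = 400, overlap: int = 75) -> int:
    """Minimum number of equal-length amplicons needed to tile a target region."""
    if overlap >= amplicon_len:
        raise ValueError("overlap must be smaller than the amplicon length")
    if target_len <= amplicon_len:
        return 1
    step = amplicon_len - overlap  # new bases gained by each additional amplicon
    return 1 + math.ceil((target_len - amplicon_len) / step)


def tile_starts(target_len: int, amplicon_len: int = 400, overlap: int = 75) -> list:
    """0-based start coordinate of each idealized amplicon."""
    step = amplicon_len - overlap
    return [i * step for i in range(count_amplicons(target_len, amplicon_len, overlap))]
```

Under these assumptions, `count_amplicons(10_600, 400, 100)` gives 35 amplicons, and shortening the amplicon length or increasing the overlap raises the number of primer pairs required.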
The Primal Scheme software performs the following processes:
Generation of candidate primers: The first sequence listed in the FASTA file should be the most representative genome, with further sequences spanning the expected interhost diversity. Primal Scheme uses the Primer3 software to generate candidate primer pairs (five, by default)47. It selects primers based on thermodynamic modeling, which takes into account length, annealing temperature, %GC, 3′ stability, estimated secondary structure and likelihood of primer–dimer formation, maximizing the chance of a successful PCR reaction. Primers are designed with a high annealing temperature within a narrow range (65–68 °C) that allows PCR to be performed as a 2-step protocol (95 °C denaturation, 65 °C combined annealing and extension) for highly specific amplification from clinical samples without the need for nested primers.
Testing of candidate primers: Subsequent reference genomes in the file are used to help choose primer pairs that maximize the likelihood of successful amplification of known virus diversity. A semi-global alignment score between each candidate primer and all supplied references is calculated to ensure that the most 'universal' candidate primers are picked for the scheme. Mismatches at the 3′ end are severely penalized, as they have a disproportionate effect on the likelihood of successful extension48,49. The alignment scores are summed, and the single best-scoring pair for each region is selected. If no candidates are returned by Primer3 for a region, most likely because all primers had insufficient annealing temperature, an error message prompting you to adjust the amplicon length or the overlap parameter will appear.
Output of primer pairs: Output files include a table of primer sequences to be ordered, a BED file of primer locations that can be used subsequently for primer trimming and a diagram of the primer scheme.
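The candidate-testing step can be sketched as follows. This is a deliberately simplified illustration rather than the Primal Scheme implementation: it uses an ungapped sliding comparison instead of a true semi-global alignment, it scores single primers written in the same orientation as the references rather than primer pairs, and the scoring values (match, mismatch, 3′ penalty and 3′ window) are assumed for illustration only.

```python
def best_ungapped_score(primer, reference, match=1, mismatch=-1,
                        three_prime_mismatch=-5, three_prime_window=3):
    """Best score of the primer against any ungapped position in the reference.

    Mismatches within the last `three_prime_window` bases (the 3' end) receive
    the heavier `three_prime_mismatch` penalty. All values are assumptions.
    """
    n = len(primer)
    if len(reference) < n:
        raise ValueError("reference is shorter than the primer")
    best = None
    for start in range(len(reference) - n + 1):
        score = 0
        for i, base in enumerate(primer):
            if base == reference[start + i]:
                score += match
            elif i >= n - three_prime_window:
                score += three_prime_mismatch
            else:
                score += mismatch
        if best is None or score > best:
            best = score
    return best


def pick_most_universal(candidates, references):
    """Return the candidate with the highest score summed over all references."""
    return max(candidates,
               key=lambda p: sum(best_ungapped_score(p, r) for r in references))
```

A candidate that matches every reference perfectly outscores one with a 3′-terminal mismatch against any reference, which mirrors the preference for 'universal' primers described above.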
Choice of amplicon length. The choice of amplicon length when designing primer pools for sequencing is important. There is an inverse relationship between amplicon length and the number of primer pairs. It is believed that increasing the number of primer pairs reduces the likelihood of successful amplification of each region, owing to interaction between primers18. It is plausible that as the number of primer pairs increases, competitive inhibition may decrease PCR efficiency, although the high annealing temperature used in this protocol should reduce this risk. Longer amplicons are preferred, as they mean fewer primer pairs are needed per reaction. They also increase the amount of linkage information that can be recovered as haplotypes, which is of importance for investigation of within-host diversity. On the Illumina platform, 600 bases is the maximum size of amplicon that can be obtained using this protocol without an additional fragmentation step (using 600 cycle kits in paired-end mode—i.e., paired 300 nucleotides without any overlap), although read accuracy may degrade during the last 50 cycles. On the Oxford Nanopore MinION, there is no limit to the maximum amplicon length that can be sequenced; the maximum length is effectively limited by the performance of the reverse transcription and PCR (practically to ∼5 kb). However, longer amplicons are less likely to amplify successfully when viral copy number is low or there is sample degradation (e.g., because of inadequate storage).
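The Illumina length limit described above is simple arithmetic, sketched here with hypothetical function names. The sketch assumes that both reads of a pair start at opposite ends of the amplicon and ignores the loss of accuracy in late cycles.

```python
def max_amplicon_without_fragmentation(read_length: int) -> int:
    """Longest amplicon fully covered by two paired-end reads with no overlap."""
    return 2 * read_length


def paired_read_overlap(amplicon_length: int, read_length: int) -> int:
    """Bases covered by both reads of a pair; a negative value means an
    uncovered gap in the middle of the amplicon."""
    return 2 * read_length - amplicon_length
```

For example, a 600-cycle kit (2 × 300 nt) gives a maximum of 600 nt, and a 400-nt amplicon sequenced with 2 × 250-nt reads has 100 nt covered by both reads.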
Optimization of primer schemes. The majority of primers are expected to work even when pooled in equimolar amounts, meaning largely complete genomes can be recovered without optimization. For example, the chikungunya virus data shown in Table 2 were generated without any optimization. However, to achieve coding-sequence-complete genomes, problem primers causing inefficient amplification of certain regions may need to be replaced or their concentrations adjusted relative to other primers in an iterative manner. Complete coverage of the genome covered by the scheme—i.e., all amplicons successfully amplified—should be achievable for the majority of samples using this protocol; however, coverage is still expected to correlate with viral abundance (Table 3).
Multiplex PCR protocol. Next, we developed a multiplex PCR protocol using novel reaction conditions: specifically low individual primer concentrations, high primer annealing temperatures (>65 °C) and long annealing times, which allow amplification of products covering the whole genome in two reactions (Fig. 3). In comparison with single-plex methods, this markedly reduces the cost of reagents and minimizes potential sources of laboratory error. We assign alternate target genome regions to one of two primer pools, so that neighboring amplicons do not overlap within the same pool (which would result in a short overlap product being generated preferentially). By screening reaction conditions based on the concentration of cleaned-up PCR products and specificity as determined by gel electrophoresis, we determined that lower primer concentrations and a longer annealing/extension time were optimal. Given the low cost of the assay, this step could also be performed alongside standard diagnostic quantitative PCR as a quality control measure to help reveal potential false positives50.
Sequencing protocol optimizations. Optimized library preparation methods for both the MinION and Illumina MiSeq platforms are provided and should be readily adaptable to other sequencing platforms, if required. The MinION system is preferred when portability and ease of setup in harsh environments are important5. The Illumina platform is more suited to sequencing very large numbers of samples, because of greater sequence yields and the ability to barcode and accurately demultiplex hundreds of samples. Both platforms use ligation-based methods to add the required sequencing adaptors and barcodes.
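The two-pool arrangement can be checked computationally. The sketch below assumes amplicons are given as half-open (start, end) coordinates sorted by start position; the function names and example coordinates are hypothetical and are not taken from the published schemes.

```python
def assign_pools(amplicons):
    """Assign alternate amplicons (sorted by start) to pool 1 and pool 2."""
    pools = {1: [], 2: []}
    for index, region in enumerate(amplicons):
        pools[1 if index % 2 == 0 else 2].append(region)
    return pools


def has_within_pool_overlap(regions):
    """True if any two neighboring regions in the list share bases."""
    ordered = sorted(regions)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))
```

With 400-nt amplicons spaced 300 nt apart, for example `[(0, 400), (300, 700), (600, 1000), (900, 1300)]`, neighbors overlap when considered together but neither pool contains overlapping amplicons after alternate assignment.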
For the MinION, we used the native barcoding kit (Oxford Nanopore Technologies) to allow up to 12 samples to be sequenced per flow cell. As the manufacturer's protocol is designed for 6–8 kb of fragmented genomic DNA, we have adjusted the input mass to achieve an equivalent number of moles of DNA ends; this improves the efficiency of barcode/adaptor ligation and improves run yields. In the development of the protocol, we used R9 or R9.4 flow cells (FLO-MIN105/FLO-MIN106) and the 2D barcoded library preparation kit (EXP-NBD002/SQK-LSK208). The protocol is also compatible with the current 1D barcoded library preparation kit (EXP-NBD103/SQK-LSK108). Because of the regular revisions of the kits, we have avoided including any specific component names or volumes; be sure to follow the appropriate protocol for your chosen kit version. Depending on the number of reads required, the number of samples multiplexed and the performance of the flow cell, sequencing on the MinION can take from a few minutes up to 72 h. Typically, 2–4 h of sequencing is sufficient for 12 samples. For the MiSeq platform, we used the Agilent SureSelectxt2 adaptors and the KAPA Hyper library preparation kit, allowing up to 96 samples to be sequenced per MiSeq run. Other library prep kits (e.g., Illumina TruSeq) and dual-indexed adaptors could also be used on the MiSeq. For the MiSeq, we recommend using the 2 × 250-nt read-length for 400-nt amplicons, which takes 48 h to complete.
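The adjustment of input mass to keep the number of DNA ends constant can be sketched as below. The calculation uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The reference values in the usage note (1,000 ng of 8-kb fragments) are assumptions chosen for illustration, not figures from the manufacturer's protocol.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def dsdna_fmol(mass_ng: float, length_bp: float) -> float:
    """Femtomoles of dsDNA molecules in a given mass of fragments."""
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def equivalent_mass_ng(ref_mass_ng: float, ref_length_bp: float,
                       amplicon_length_bp: float) -> float:
    """Mass of amplicons containing the same number of molecules (and
    therefore the same number of ends) as the reference input."""
    return ref_mass_ng * amplicon_length_bp / ref_length_bp
```

Under these assumptions, 1,000 ng of 8-kb fragments corresponds to about 189 fmol, and the same number of 400-nt amplicons weighs 50 ng, which shows why short amplicons need far less input mass.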
Bioinformatics workflow; MinION pipeline. We developed bioinformatic pipelines consisting of primer trimming, alignment, variant calling and consensus generation for both the Oxford Nanopore and Illumina platforms. The MinION pipeline was developed by building upon tools previously developed for Ebola virus sequencing in Guinea and is freely available with components developed under the permissive MIT open source license at https://github.com/zibraproject/zika-pipeline. The pipeline runs under the Linux operating system and is available as a Docker image, which means that it can also be run on Mac and Windows operating systems. The MinION version of the pipeline can process the data from basecalled reads to consensus sequences on the instrument laptop, given the correct primer scheme (a BED file).
FAST5 reads containing raw nanopore signal data may be basecalled in real time using MinKNOW (accessible via the MinION Community Portal for registered users at http://community.nanoporetech.com) or off-line using Albacore. Albacore is a recurrent neural network (RNN) basecaller developed by Oxford Nanopore Technologies and also made available through the MinION Community Portal. Reads are extracted into a FASTA file using the poretools fasta command. This FASTA file may be demultiplexed by a script, demultiplex.py, into separate FASTA files for each barcode, as specified in a config file. By default, these are set to the barcodes NB01–12 from the native barcoding kit. Alternatively, the Metrichor online service (https://www.metrichor.com) and versions of Albacore 1.0.1 or later may be used to basecall read files and demultiplex samples. Each file is then mapped to the reference genome using bwa mem with the -x ont2d flag and converted to BAM format using samtools view. Alignments are preprocessed using a script (align_trim.py) that performs primer trimming and coverage normalization. Primer trimming is performed by reference to the expected coordinates of sequenced amplicons, and therefore requires no knowledge of the sequencing adaptor (Fig. 3). Signal-level events are aligned and variants are called using nanopolish variants. Low-quality or low-coverage variants are filtered out and consensus sequences are generated using a script, margin_cons.py. Variant calls and frequencies can be visualized using
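Coordinate-based primer trimming, as described above, can be illustrated with a minimal sketch. This is not the align_trim.py implementation: it works only on reference coordinates (half-open, 0-based), ignores CIGAR strings and coverage normalization, and the `Amplicon` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Amplicon(NamedTuple):
    left_start: int   # start of the left (forward) primer
    left_end: int     # end of the left primer, i.e. start of the insert
    right_start: int  # start of the right (reverse) primer, i.e. end of the insert
    right_end: int    # end of the right primer


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def trim_to_insert(read_start: int, read_end: int,
                   scheme: List[Amplicon]) -> Optional[Tuple[int, int]]:
    """Clip an aligned read to the primer-free insert of its best-matching amplicon.

    Returns the trimmed (start, end), or None if nothing remains after trimming.
    """
    best = max(scheme, key=lambda a: _overlap(read_start, read_end,
                                              a.left_start, a.right_end))
    new_start = max(read_start, best.left_end)
    new_end = min(read_end, best.right_start)
    return (new_start, new_end) if new_start < new_end else None
```

Because the clip positions come from the primer scheme (for example, parsed from the BED file), no adaptor or primer sequence matching is needed.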
Bioinformatics workflow; Illumina pipeline. First, we use Trimmomatic51 to remove primer sequences (first 22 nt from the 5′ end of the reads) and bases at both ends with Phred quality scores <20. Reads are aligned to the genome of a Zika virus isolate from the Dominican Republic, 2016 (GenBank: KU853012), using Novoalign v3.04.04 (http://www.novocraft.com/support/download/). SAMtools is used to sort the aligned BAM files and to generate alignment statistics52. The code and reference indexes for the pipeline can be found at https://github.com/andersen-lab/zika-pipeline. Snakemake is used as the workflow management system53.
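The read-trimming operations described above can be sketched in plain Python. This is an illustration of the two operations only (removing a fixed 22-nt primer prefix, then removing bases with Phred quality <20 from both ends); it is not Trimmomatic, and the order of operations and the function names are assumptions.

```python
def phred33(quality_string: str) -> list:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(char) - 33 for char in quality_string]


def trim_read(seq: str, quals: list, primer_len: int = 22, min_q: int = 20):
    """Remove the primer prefix, then trim low-quality bases from both ends."""
    seq, quals = seq[primer_len:], quals[primer_len:]
    start = 0
    while start < len(quals) and quals[start] < min_q:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]
```

Bases of low quality in the interior of a read are left untouched by this sketch; only the ends are trimmed.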
Alignment-based consensus generation. We have used an alignment-based consensus approach to generate genomes as opposed to de novo assembly. Although de novo assembly could in theory be used with this protocol, the use of a tiling amplicon scheme already assumes that the viral genome is present in a particular fixed order. This assumption may be violated in the presence of large-scale recombination. Some de novo assemblers, such as SPAdes, use a frequency-based error correction preprocessing stage, and this may result in primer sequences being artificially introduced into the reference if primer sequences are not removed in advance54. Importantly, when we compared alignment with de novo-based analysis methods for our generated Zika virus genomes, we found that we always obtained the same consensus sequences.
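The principle of alignment-based consensus generation can be shown with a minimal sketch. It assumes ungapped alignments supplied as (reference start, sequence) pairs, takes the most frequent base at each position and masks positions below a depth threshold with N. It does not model insertions, deletions or base qualities, and the `min_depth` default is a placeholder, not the threshold used in either pipeline.

```python
from collections import Counter


def pileup(reads, ref_len: int):
    """Count bases observed at each reference position.

    reads: iterable of (ref_start, sequence) ungapped alignments (0-based).
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns


def consensus(columns, min_depth: int = 2) -> str:
    """Majority base per position; 'N' where depth is below min_depth."""
    called = []
    for column in columns:
        depth = sum(column.values())
        if depth < min_depth:
            called.append("N")
        else:
            called.append(column.most_common(1)[0][0])
    return "".join(called)
```

Masking low-coverage positions rather than filling them from the reference avoids reporting reference bases that were never observed in the sample.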
Preparing sequencing controls. We recommend that positive sample controls be included in each sequencing run. To check that the protocol is generating the expected results, we recommend choosing a positive sample with an established, trusted reference sequence. For the Zika virus, we used the previously sequenced World Health Organization reference strain PF13/251013-18 (GenBank accession: KX369547), which can be obtained on request from the Paul-Ehrlich-Institut55,56. Sample archives such as the National Collection of Pathogenic Viruses in the United Kingdom can provide high-quality reference materials for other viruses. Positive controls should have viral copy numbers similar to those of the clinical samples on the same run. This may require the positive control to be heavily diluted until the Ct values are comparable. Negative sequencing controls should be processed in a manner as similar as possible to that used for clinical samples and should not be simply water controls; for example, if samples are collected by swabs, then the same type of unused swab should be subjected to RNA extraction and PCR. Additional negative water controls may be added at each step (e.g., reverse transcription, PCR and library preparation) to detect the sources of contaminants. Even if amplification is not detected (e.g., by gel electrophoresis) or DNA quantity is low or undetectable by fluorimetry, a sequencing library should still be prepared as normal using the total available amount, as contamination may still be detectable by sequencing.
Contamination. Cross-contamination is a serious potential problem when working with amplicon sequencing. Contamination risk is minimized by maintaining physical separation between pre- and post-PCR areas, and performing regular decontamination of work surfaces and equipment—e.g., by UV exposure or with 1% (vol/vol) sodium hypochlorite solution. Contamination becomes harder to mitigate with decreasing viral copy numbers. Processing high-viral-count samples can lead to overamplification during PCR (e.g., generation of unnecessarily high numbers of amplicons), which can increase the risk of amplicon contamination in subsequently processed samples with low viral counts. Such 'between-sample amplification' can occur during sequencing library preparation, or may result from barcode misidentification or 'barcode hopping' (incorporation of incorrect barcode sequences during sequence library preparation) during sequencing. When determining how many PCR cycles to use, begin with a lower number and increase gradually to minimize this contamination risk.
The best safeguard for helping to detect contamination is the use of negative controls. These controls should be sequenced even if no DNA is detected by quantification or no visible band is present on a gel. Negative control samples should be analyzed through the same software pipeline as is used for the other samples, and you should assume that any contaminating amplicons in the negative control will also be present in your other samples. The relative number of reads as compared with positive samples gives a simple guide to the extent of contamination, and inspection of coverage plots can help identify any specific region involved.
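The comparison of negative-control and positive-sample read counts can be expressed as a simple calculation. The sketch below is illustrative only: the function names, the amplicon labels in the usage note and the `min_reads` threshold are hypothetical, and suitable thresholds should be chosen per experiment.

```python
def relative_read_count(negative_reads: int, positive_reads: int) -> float:
    """Reads in the negative control as a fraction of reads in a positive sample."""
    if positive_reads <= 0:
        raise ValueError("positive sample must contain reads")
    return negative_reads / positive_reads


def contaminated_regions(negative_depth_by_amplicon: dict, min_reads: int = 10) -> list:
    """Amplicons whose read count in the negative control reaches min_reads."""
    return [name for name, depth in negative_depth_by_amplicon.items()
            if depth >= min_reads]
```

For example, 50 reads in the negative control against 10,000 in a positive sample gives a relative read count of 0.005, and `contaminated_regions({"ZIKA_400_1": 0, "ZIKA_400_2": 120, "ZIKA_400_3": 3})` flags only the second amplicon.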
Tiling amplicon generation
Clinical sample (serum, plasma, urine) or isolate
QIAamp Viral RNA Mini Kit (Qiagen, cat. no. 52906)
Random hexamers (50 μM; Thermo Fisher Scientific, cat. no. N8080127)
ProtoScript II First Strand cDNA Synthesis Kit (NEB, cat. no. E6560)
dNTP solution mix (NEB, cat. no. N0447)
Q5 Hot Start High-Fidelity DNA Polymerase (NEB, cat. no. M0493)
Agencourt AMPure XP (Beckman Coulter, cat. no. A63881)
Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, cat. no. Q32854)
HyPure Molecular Biology Grade Water (GE Life Sciences, cat. no. SH30538.01)
EB buffer (10 mM Tris-Cl, pH 8.5; Qiagen, cat. no. 19086)
TE buffer (10 mM Tris-Cl, 1 mM EDTA, pH 8.0; Sigma-Aldrich, cat. no. 93283-500ML)
Ethanol, absolute (Thermo Fisher Scientific, cat. no. BP28184)
Gel Loading Dye, Purple (6×) (NEB, cat. no. B7024)
100-bp DNA Ladder (NEB, cat. no. N3231)
SeaKem LE Agarose (Lonza, cat. no. 5000)
10× TBE Buffer (Lonza, cat. no. 50843)
SYBR Safe DNA Gel Stain (Thermo Fisher Scientific, cat. no. S33102)
MinION Flow Cell (Oxford Nanopore Technologies, cat. no. FLO-MIN106)
Nanopore Sequencing Kit (Oxford Nanopore Technologies, cat. no. SQK-LSK108)
Native Barcoding Kit (Oxford Nanopore Technologies, cat. no. EXP-NBD103)
NEBNext Ultra II End-repair/dA-tailing Module (NEB, cat. no. E7546)
NEB Blunt/TA Ligase Master Mix (NEB, cat. no. M0367)
KAPA Hyper Prep Kit (Roche, cat. no. 07962363001)
SureSelectxt2 indexes, MSQ, 16 (Agilent, cat. no. G9622A)
MiSeq Reagent Kit v2 (500 cycle) (Illumina, cat. no. MS-102-2003)
D1000 ScreenTape (Agilent, cat. no. 5067-5582)
D1000 Reagents (Agilent, cat. no. 5067-5583)
KAPA Library Quantification Kit for Illumina platforms (Roche, cat. no. 07960140001)
Filtered pipette tips
1.5-ml microcentrifuge tube (Eppendorf, cat. no. 0030 108.051)
0.2-ml strip tubes with attached caps (Thermo Fisher Scientific, cat. no. AB2000)
UV spectrophotometer (Thermo Fisher Scientific NanoDrop 2000, cat. no. ND-2000)
96-well thermocycler (Applied Biosystems Veriti, cat. no. 4375786)
Benchtop microcentrifuge (Thermo Fisher Scientific mySPIN 6, cat. no. 75004061)
Benchtop heater/shaker (Eppendorf ThermoMixer C)
Magnetic rack (Thermo Fisher Scientific DynaMag-2, cat. no. 12321D)
PCR cabinet or pre-PCR room
MinION (Oxford Nanopore Technologies, cat. no. MinION Mk1B)
Laptop with solid-state disk (SSD) drive
TapeStation 2200 (Agilent)
Mini-Sub Cell GT (Bio-Rad, cat. no. 1704406)
PowerPac Universal Power Supply (Bio-Rad, cat. no. 1645070)
Design and ordering of primers
Timing: 1 h
(Optional) Identify representative reference sequences (e.g., from public databases such as GenBank) and generate a primer scheme using Primal Scheme by visiting http://primal.zibraproject.org; see EXPERIMENTAL DESIGN section for further information. Alternatively, predesigned primer schemes are provided in Supplementary Tables 1 and 2.
Extraction and preparation of nucleic acid template
Tiling amplification is a general technique that can be applied to DNA or to cDNA generated from RNA by reverse transcription. Use option A to extract viral RNA from samples and prepare cDNA by reverse transcription for analysis of RNA viruses, or use option B to extract viral DNA from samples for analysis of DNA viruses:
RNA extraction and preparation of cDNA for analysis of RNA viruses • TIMING 2 h
Extract RNA from 200 μl of serum, plasma or urine using the QIAamp Viral RNA Mini Kit according to the manufacturer's instructions, eluting in 50 μl of EB buffer.
Measure the absorption spectra using a spectrophotometer. Pure RNA should have a 260/280 ratio of 2.0 and a 260/230 ratio of 2.0–2.2.
Wash all surfaces with 1% (vol/vol) sodium hypochlorite solution and irradiate labware with UV light for at least 10 min.
Mix the following components in a 0.2-ml tube:
Component                        Amount (μl)   Final concentration
Template RNA (from Step 3A(i))   7
Random hexamers, 50 μM           1             2.5 μM
Denature the template RNA by incubating it on a heat block at 65 °C for 5 min before promptly placing it on ice.
Complete the cDNA synthesis reaction preparation by adding the following to the tube:
Component                          Amount (μl)   Final concentration
ProtoScript II Reaction mix (2×)   10            1×
ProtoScript II Enzyme Mix (10×)    2             1×
Place the tube in a thermocycler and run the following program:
Cycle number   Condition
1              25 °C, 5 min
1              48 °C, 15 min
1              80 °C, 5 min
DNA extraction and preparation for analysis of DNA viruses • TIMING 1 h
Extract DNA from 200 μl of serum, plasma or urine using the QIAamp MinElute Virus Spin Kit, according to the manufacturer's instructions.
Measure the absorption spectra using a spectrophotometer. Pure DNA should have a 260/280 ratio of 1.8 and a 260/230 ratio of 2.0–2.2.
Preparation of the primer pools
Timing: 1 h
(Optional) Resuspend lyophilized primers by prespinning tubes to make sure that the pellet is at the bottom of the tube and adding TE buffer to a concentration of 100 μM. If primers were ordered prediluted to 100 μM, continue to the next step.
Label two 1.5-ml Eppendorf tubes with the scheme name and the pool number ('1' or '2'). Primers for adjacent regions are assigned to alternate pools so that neighboring amplicons overlap between pools but never within a pool. Add an equal volume of each 100 μM primer stock to the appropriate tube, so that the forward and reverse primers for alternate regions are pooled together. For example, Pool '1' for a ZikaAsian scheme would contain ZIKA_400_1_LEFT, ZIKA_400_1_RIGHT, ZIKA_400_3_LEFT, ZIKA_400_3_RIGHT, ZIKA_400_5_LEFT, ZIKA_400_5_RIGHT and so on. Dilute each pool at a ratio of 1:10 with nuclease-free water to a working concentration of 10 μM.
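The alternating pool assignment can be sketched in a few lines of Python. This is only an illustration, assuming primer names follow the pattern in the example above (prefix, amplicon size, region number, LEFT or RIGHT); the function name assign_pools is hypothetical and is not part of Primal Scheme.

```python
from collections import defaultdict


def assign_pools(primer_names):
    """Split primers into pool '1' (odd regions) and pool '2' (even regions).

    Assumes names look like ZIKA_400_3_LEFT, i.e. the second-to-last
    underscore-separated field is the region number.
    """
    pools = defaultdict(list)
    for name in primer_names:
        region = int(name.split("_")[-2])
        pool = "1" if region % 2 == 1 else "2"
        pools[pool].append(name)
    return dict(pools)


# Illustrative primer names for the first four regions of a scheme.
primers = [
    f"ZIKA_400_{region}_{side}"
    for region in range(1, 5)
    for side in ("LEFT", "RIGHT")
]
pools = assign_pools(primers)
for pool_name in sorted(pools):
    print(pool_name, pools[pool_name])
```

Because both primers of a region share the same region number, each primer pair always lands in the same pool, and neighboring regions always land in different pools.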
Performing of multiplex tiling PCR
Timing: 5 h
In Eppendorf tubes, prepare a mastermix for each of the 2 primer pools, as follows.
Component                    Amount (μl)     Final concentration
Q5 reaction buffer (5×)      5               1×
dNTPs, 10 mM                 0.5             200 μM
Q5 DNA polymerase            0.25
Primer pool 1 or 2 (10 μM)   Variable        0.015 μM per primer
PCR-grade water              Up to 22.5 μl (assuming 2.5 μl of cDNA template will be added in Step 7)
Mix thoroughly by vortexing and spin down in a microcentrifuge.
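The 'Variable' primer pool volume depends on how many primers the pool contains. The sketch below shows one way to work it out, under the assumption that the 10 μM working concentration is the summed concentration of all primers in the pool (so each of N primers is present at 10/N μM). The function names are hypothetical, and the 36-primer pool is only an example.

```python
def primer_pool_volume_ul(n_primers, per_primer_um=0.015,
                          reaction_ul=25.0, pool_total_um=10.0):
    """Volume (ul) of working primer pool needed per 25-ul reaction.

    Assumption: pool_total_um is the combined concentration of all primers,
    so each primer is at pool_total_um / n_primers in the working pool.
    """
    per_primer_in_pool_um = pool_total_um / n_primers
    return per_primer_um * reaction_ul / per_primer_in_pool_um


def water_volume_ul(primer_ul, buffer_ul=5.0, dntp_ul=0.5,
                    polymerase_ul=0.25, mastermix_ul=22.5):
    """PCR-grade water needed to bring one reaction's mastermix to 22.5 ul."""
    return mastermix_ul - (buffer_ul + dntp_ul + polymerase_ul + primer_ul)


# Example: a hypothetical pool containing 36 primers (18 primer pairs).
primer_ul = primer_pool_volume_ul(36)
water_ul = water_volume_ul(primer_ul)
print(f"primer pool: {primer_ul:.2f} ul, water: {water_ul:.2f} ul per reaction")
```

Under these assumptions, a 36-primer pool needs about 1.35 μl of primer pool and about 15.4 μl of water per reaction; multiply by the number of reactions (plus a small excess) when preparing the mastermix.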
Label 0.2-ml PCR tubes and add 22.5 μl of mastermix from Step 6 to each tube. If using cDNA from Step 3A(vii) as template, add 2.5 μl of cDNA to each tube. If you are using extracted DNA from Step 3B(ii), however, a larger volume of template (up to 10 μl) can be added, if required, and may improve amplification efficiency.
Place in a thermocycler and run the following program:
Cycle number   Denature       Anneal/extend
1              98 °C, 30 s
2–40           98 °C, 15 s    65 °C, 5 min
Cleanup and quantification of amplicons
Timing: 2 h
Transfer the contents of the tubes to 1.5-ml Eppendorf tubes. Add the volume of AMPure XP beads given in the table below, taking into account amplicon length. Perform washes following the instructions in the 1D barcoding protocol and elute in 30 μl of EB buffer.
Amplicon length (bp)   Ratio   Volume of beads (μl) for a 25-μl PCR reaction
<500                   1.0×    25
500–1,000              0.8×    20
>1,000                 0.6×    15
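For reaction volumes other than 25 μl, the bead volume scales with the ratio in the table. A minimal Python sketch of that lookup follows; the function names are hypothetical.

```python
def bead_ratio(amplicon_bp):
    """Return the AMPure XP bead-to-sample ratio for an amplicon length (bp)."""
    if amplicon_bp < 500:
        return 1.0
    if amplicon_bp <= 1000:
        return 0.8
    return 0.6


def bead_volume_ul(amplicon_bp, reaction_ul=25.0):
    """Bead volume (ul) to add to a reaction of the given volume."""
    return bead_ratio(amplicon_bp) * reaction_ul


for length in (400, 800, 2000):
    print(length, "bp:", round(bead_volume_ul(length), 1), "ul of beads")
```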
Quantify 1 μl of the cleaned products using the Qubit instrument with the high-sensitivity assay per the manufacturer's instructions. You should expect concentrations in the range of 5–50 ng/μl for each reaction from the Qubit quantification, except for the PCR negative control, which should be repeated if >1 ng/μl.
(Optional) Make a gel by melting 1% (wt/vol) agarose powder in 1× TBE buffer and then adding 1× SYBR Safe gel stain before allowing it to set. Place in a gel tank submerged in 1× TBE buffer. Mix 10 μl of cleaned product from Step 9 or a ladder with 2 μl of 6× loading dye and load on the gel. Perform electrophoresis at 6 V/cm until bands are distinguishable by transillumination. A specific band of the correct size for your scheme should be observed.
Library preparation and sequencing
Perform library preparation and sequencing; these procedures are platform specific and have been validated on the MinION from Oxford Nanopore Technologies (option A) and on the MiSeq from Illumina (option B).
Library preparation and sequencing using the MinION • TIMING 1–2 d
Determine the number of samples per flow cell. We recommend using two barcodes per sample or negative control (one barcode per pool per sample) initially. This means that up to five samples and one negative control can be sequenced on each flow cell, and it allows each pool to be barcoded individually, making it easier to detect contamination that may be pool- rather than sample-specific. However, a single barcode per sample can also be used to maximize the number of samples per flow cell.
Normalization. Use the table below to determine the quantity of amplicons to load to achieve a total input of ∼0.3 pmol per flow cell. Divide the total input quantity by the number of barcodes being used to calculate the quantity per barcode. Keep PCR products separate at this stage; add the appropriate volume of each sample from Step 9 to individual 1.5-ml Eppendorf tubes and then adjust the volume in each tube to 20 μl with nuclease-free water.
Amplicon length (bp)   Input total (ng)
300                    60
400                    80
500                    100
1,000                  200
1,500                  300
2,000                  400
5,000                  1,000
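The table values correspond closely to ∼0.3 picomoles of double-stranded DNA if an average mass of about 660 g/mol per base pair is assumed; that constant is a common approximation and is not stated in the protocol. The Python sketch below uses it to estimate the input for amplicon lengths not listed, together with the sample and water volumes per barcode. All function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def total_input_ng(amplicon_bp, pmol=0.3):
    """Mass (ng) of dsDNA amplicons corresponding to the target amount in pmol."""
    # pmol * (g/mol) gives picograms; divide by 1,000 to convert to nanograms.
    return pmol * amplicon_bp * AVG_BP_MASS_G_PER_MOL / 1000.0


def sample_volume_ul(amplicon_bp, n_barcodes, conc_ng_per_ul, pmol=0.3):
    """Volume (ul) of cleaned PCR product to use for one barcode."""
    ng_per_barcode = total_input_ng(amplicon_bp, pmol) / n_barcodes
    return ng_per_barcode / conc_ng_per_ul


def water_volume_ul(sample_ul, final_ul=20.0):
    """Nuclease-free water (ul) needed to bring the sample to the final volume."""
    if sample_ul > final_ul:
        raise ValueError("sample volume exceeds the final volume; sample is too dilute")
    return final_ul - sample_ul


# Example: 400-bp amplicons, 12 barcodes, Qubit reading of 10 ng/ul.
vol = sample_volume_ul(400, n_barcodes=12, conc_ng_per_ul=10.0)
print(f"total input: {total_input_ng(400):.1f} ng")
print(f"per barcode: {vol:.2f} ul sample + {water_volume_ul(vol):.2f} ul water")
```

For 400-bp amplicons this gives about 79 ng in total, in line with the 80 ng in the table. Volumes well below 1 μl are hard to pipette accurately, so in practice a dilution of the cleaned product may be easier to handle.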
End-repair and dA-tailing. For each sample, set up the following end-repair/dA-tailing reaction in a 1.5-ml Eppendorf tube and incubate for 5 min at 20 °C, followed by 5 min at 65 °C. Perform SPRI cleanup by repeating Step 9, eluting in 10 μl of EB buffer.
Component                                   Amount (μl)
Normalized amplicons (from Step 12A(ii))    20
Ultra II End Prep Reaction Buffer           2.8
Ultra II End Prep Enzyme Mix                1.2
Barcode ligation. In a 1.5-ml Eppendorf tube, prepare the following ligation reactions—one reaction per barcode being used.
Component                                   Amount (μl)
dA-tailed amplicons (from Step 12A(iii))    10
Native barcode NB01-NB12                    2.5
Blunt/TA Ligase Master Mix                  12.5
Incubate at room temperature (20 °C) for 10 min, followed by 65 °C for 10 min to denature the ligase.
Pool barcoded amplicons. Combine all the barcode ligation reactions into a single 1.5-ml Eppendorf tube. Perform SPRI cleanup by repeating Step 9 and elute in 30 μl of nuclease-free water.
Barcoding adaptor ligation. In a 1.5-ml Eppendorf tube, prepare the adaptor ligation reaction according to the native barcoding kit protocol. As Blunt/TA ligase is a 2× master mix, we have reduced the elution volume in the previous step to accommodate this.
Library cleanup and elution. Complete library construction according to the native barcoding kit protocol.
Library loading. Perform library loading according to the native barcoding kit protocol; there is a helpful video guide to loading the library on the MinION Community Portal (http://community.nanoporetech.com).
Start sequencing run. By default, you will need an Internet connection before the sequencing script can be started, although off-line versions of MinKNOW can be requested from the manufacturer if an Internet connection is not available. Once the flow cell is detected, enter an experiment name and the flow cell ID into the blank fields and then choose the appropriate sequencing script for the library preparation kit and flow cell version.
Library preparation and sequencing using the MiSeq • TIMING 3 d
Determine the number of samples per flow cell; we recommend using two barcodes per sample, which means up to 47 samples plus a negative control can be sequenced on each run and allows each pool to be barcoded individually. This makes it easier to detect contamination that may be pool- rather than sample-specific. It also results in greater yield per sample, which improves genome coverage in samples with more uneven amplification.
Normalization. Keep pools in individual 1.5-ml Eppendorf tubes, add 50 ng of sample material from Step 9 and add nuclease-free water to adjust the total volume to 50 μl.
End-repair and dA-tailing. Perform end-repair and dA-tailing according to the Hyper Prep Kit protocol (KAPA).
Library preparation. Complete library construction with the KAPA Hyper Prep Kit according to the manufacturer's instructions, using SureSelectxt2 indexing adaptors in place of the KAPA adaptors in the adaptor ligation step. Perform a 0.8× instead of 1× SPRI cleanup during the postamplification cleanup to remove potential adaptor dimers.
Library quality control. Measure the size distribution of the library using the TapeStation 2200 according to the manufacturer's instructions.
Library pooling. Calculate the molarity of each library using the KAPA Library Quantification Kit according to the manufacturer's instructions and pool libraries in an equimolar manner.
Library denaturation and dilution. Prepare library for loading onto the MiSeq according to the manufacturer's instructions.
Start sequencing run. Generate a SampleSheet.csv file with the Illumina Experiment Manager software by entering sample and barcode information. Complete the instrument setup and start the sequencing run according to the manufacturer's recommended instructions.
Basecalling and demultiplexing. Basecalling and demultiplexing will be performed automatically on the instrument using the sample information provided in the SampleSheet.csv file.
Data analysis
Timing: 1–2 h
Download the Docker application for Linux, Macintosh or Windows from https://www.docker.com/products/overview. Run the installer to set up the Docker tools on your machine. You should now be able to open a terminal window and run the following command without getting an error:
docker --version
Download the Zika pipeline image from DockerHub by typing the following into the terminal window:
docker pull zibra/zibra
Start a Docker container with the following command:
docker run -t -i zibra/zibra:latest
By default, Docker containers do not have access to the file system of the computer they run within. You will need to provide access to a local directory in order to see data files. This is achieved using the -v parameter. You may need to grant access to Docker to share the drive via the 'Shared Drives' menu option under 'Settings'. For example, on Windows, if you wish to provide access to the c:\data\reads directory to the Docker container, use the following:
docker run -v c:/data/reads:/data -t -i zibra/zibra:latest /bin/bash
Then, within the Docker container, the /data directory will refer to c:\data\reads on the Windows machine.
Run the platform-specific pipeline using option A for MinION data or option B for MiSeq data:
Running analysis pipeline on MinION data
Ensure that the reads are basecalled using either Metrichor or an off-line basecaller. Compatible off-line basecallers include Albacore (available as installable packages for Linux, Windows and Macintosh through the MinION Community Portal) or the freely available and open-source nanonet (https://github.com/nanoporetech/nanonet) software. nanonet is compatible with graphics processing unit cards to increase speed.
Metrichor will perform demultiplexing if a barcoding workflow is selected. For other basecallers you may need to demultiplex reads manually. To do this, run the script that is provided within the Docker image with the command:
demultiplex <directory of FAST5 Files> <output directory>
Run the Zika pipeline using the following command:
fast5_to_consensus <scheme> <sampleID> <directory>
The pipeline takes three required arguments, in the order shown in the command above:
scheme: the name of the scheme directory, e.g., ZikaAsian
sampleID: the sample name (should not contain space characters)
directory: the directory containing the FAST5 files for a single sample (e.g., demultiplexed output directory from Step 16A(ii))
For example:
fast5_to_consensus ZikaAsian Zika1 /data/NB08/downloads/pass
Output files will be written to the current directory. The final consensus file will be named
Running analysis pipeline on Illumina data
Download and follow the instructions for the Illumina pipeline by referring to https://github.com/andersen-lab/zika-pipeline and using the following command:
illumina_pipeline <sampleID> <fastq1> <fastq2> <scheme>
Quality control of consensus sequences
Timing: 1 h
Check the coverage of the genomes by reference to the alignment file. Use an alignment viewer such as IGV57 or Tablet58 and load the <sampleID>.primertrimmed.sorted.bam file in conjunction with the reference sequence. Amplicons should be evenly spread throughout the genome. Deep piles of reads representing amplification of single regions are potential warning signs of contamination. Compare the alignments with the positive and negative control alignments to help indicate problematic samples or regions.
Use the variant frequency plot produced by the Zika pipeline to help determine the allele frequency of mutations in the sample (as compared with the reference). The variant frequency plot is given the name <sampleID>.variants.png and is generated from the <sampleID>.variants.tab file that can be opened in spreadsheet applications or statistical software. The principle of the variant frequency plot is to identify mutations that occur at lower-than-expected allele frequencies and help decide whether they are a biological phenomenon (e.g., intra-host single-nucleotide variants), potential signs of contamination or sequencing errors (for example, in homopolymeric tracts in MinION data).
Troubleshooting advice can be found in Table 4.
Steps 1 and 2, design and ordering of primers: 1 h
Step 3A, RNA extraction and preparation of cDNA: 2 h
Step 3B, DNA extraction: 1 h
Steps 4 and 5, preparation of the primer pools: 1 h
Steps 6–8, performing of multiplex tiling PCR: 5 h
Steps 9–11, cleanup and quantification of amplicons: 1 h
Step 12A, library preparation and sequencing using the MinION: 1–2 d
Step 12B, library preparation and sequencing using the MiSeq: 3 d
Steps 13–16, data analysis: 1–2 h
Steps 17 and 18, quality control of consensus sequences: 1 h
This protocol should achieve near-complete genome coverage.
As a demonstration of the ZikaAsian scheme on MinION, we sequenced the World Health Organization Zika reference sample 11474/1655 (Table 2) and a chikungunya clinical sample from Brazil, PEI-N11602. The Ct value for the Zika virus sample was between 18 and 20 depending on the RNA extraction method used. The Ct value for the chikungunya sample was 20, as determined by the RealStar Chikungunya RT-PCR Kit 1.0 from Altona Diagnostics (Hamburg, Germany). For the Zika virus sample, 97.7% of the genome was covered at a depth above 25×. Coverage of the genome was reasonably even, with a dropout in the middle of the genome (Fig. 4). The WHO Control Reference MinION data set is available from the CLIMB website (https://s3.climb.ac.uk/nanopore/Zika_Control_Material_R9.4_2D.tar).
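A figure such as the percentage of the genome covered above 25× can be computed from per-position read depths. The Python sketch below is one way to do this and is not part of the Zika pipeline. It assumes the depths come from a three-column, tab-separated table (reference, position, depth), the layout written by samtools depth -a; whether a position at exactly 25× counts as covered is a choice made here. The function names are hypothetical.

```python
def parse_depth_lines(lines):
    """Extract depths from 'reference<TAB>position<TAB>depth' rows."""
    return [int(line.rstrip("\n").split("\t")[2]) for line in lines if line.strip()]


def fraction_covered(depths, min_depth=25):
    """Fraction of positions whose depth is at least min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Toy example with four positions; real input would have one row per base.
rows = ["ref\t1\t30\n", "ref\t2\t10\n", "ref\t3\t25\n", "ref\t4\t0\n"]
depths = parse_depth_lines(rows)
print(f"{100 * fraction_covered(depths):.1f}% of positions at >=25x")
```

Positions absent from the depth table are not counted, so the table should list every reference position (including those with zero depth) for the percentage to be meaningful.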
Using five clinical samples of Zika virus from Colombia, we compared metagenomic sequencing with amplicon sequencing using the ZikaAsian scheme and the Illumina MiSeq protocol. With a previously described method for metagenomic sequencing2,17, only a small percentage (<0.01%) of our reads aligned to Zika virus, and they covered only a fraction of the genome (Table 1). With the ZikaAsian scheme, we were able to generate high coverage of all the genomes (Table 3). Illumina sequencing reads are available from BioProject PRJNA358078 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA358078).
The authors thank the Brazilian Ministry of Health and the Latin American Community Engagement Networks (LACENs) of Natal, João Pessoa, Recife, Maceió and Salvador for their support. We thank T. Fredeking from Antibody Systems for providing the Zika virus samples from Colombia. We thank K. Brunker for testing the Primal Scheme software. The Zika in Brazil Real-time Analysis (ZiBRA) project (http://www.zibraproject.org) is supported by the Medical Research Council/Wellcome Trust/Newton Fund Zika Rapid Response Initiative (grant no. ZK/16-078), which also provides J.Q.'s salary. N.J.L. is supported by a Medical Research Council Bioinformatics Fellowship as part of the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project. Primal Scheme is hosted on the CLIMB platform, where pipeline development and MinION data analysis were performed59. N.D.G. is supported by National Institutes of Health (NIH) training grant 5T32AI007244-33. K.G.A. is a Pew Biomedical Scholar, and his work is supported by NIH National Center for Advancing Translational Sciences Clinical and Translational Science Award UL1TR001114 and National Institute of Allergy and Infectious Diseases (NIAID) contract HHSN272201400048C. A.B. and T.B. were supported by NIH awards R35 GM119774 and U54 GM111274. T.B. is a Pew Biomedical Scholar. A.B. is supported by the National Science Foundation Graduate Research Fellowship Program under Grant no. DGE-1256082. N.R.F. was funded by a Sir Henry Dale Fellowship (Wellcome Trust/Royal Society grant 204311/Z/16/Z). Work at the Paul-Ehrlich-Institut was supported by a grant ('Sicherheit von Blut(produkten) und Geweben hinsichtlich der Abwesenheit von Zikaviren') from the German Ministry of Health. This study was supported by USAID Emerging Pandemic Threats Program-2 PREDICT-2 (cooperative agreement AID-OAA-A-14-00102). The contents of this article are the responsibility of the authors and do not necessarily reflect the views of USAID or the US government.
Supplementary Tables 1 and 2.