A MiSeq-HyDRA platform for enhanced HIV drug resistance genotyping and surveillance

Conventional HIV drug resistance (HIVDR) genotyping uses Sanger sequencing (SS) methods, which are limited by low data throughput and the inability to detect low abundant drug resistant variants (LADRVs). Here we present a next generation sequencing (NGS)-based HIVDR genotyping platform that leverages the advantages of Illumina MiSeq and HyDRA Web. The platform consists of a fully validated sample processing protocol and HyDRA Web, an open web portal that allows automated, customizable processing of NGS-based HIVDR data. This platform was characterized and validated using a panel of HIV-spiked plasma representing all major HIV-1 subtypes, pedigreed plasmids, HIVDR proficiency specimens and clinical specimens. All examined major HIV-1 subtypes were consistently amplified at viral loads of ≥1,000 copies/ml. The gross error rate of this platform was determined to be 0.21%, and minor variants were reliably detected down to 0.50% in plasmid mixtures. All HIVDR mutations identifiable by SS were detected by the MiSeq-HyDRA protocol, while LADRVs at frequencies of 1% to 15% were detected by MiSeq-HyDRA only. Compared with SS approaches, the MiSeq-HyDRA platform has several notable advantages, including reduced cost and labour and increased sensitivity for LADRVs, making it suitable for routine HIVDR monitoring for both patient care and surveillance purposes.

plasmids, and clinical and proficiency specimens. Specimens at VLs of ≥1,000 copies/ml (cp/ml) were consistently amplified for all examined subtypes. The MiSeq-HyDRA protocol identified all HIVDR mutations found by SS; however, in contrast to SS, only the MiSeq-HyDRA platform identified LADRVs at frequencies between 1% and 15%. Pedigreed plasmids and plasmid mixtures were used to determine the gross error rate and the detection limit for LADRVs in and outside of homopolymer regions. The MiSeq-HyDRA platform is a favoured alternative to SS for HIVDR genotyping in clinical and surveillance settings because of its increased sensitivity, reduced cost and labour, and superior detection of LADRVs in homopolymer regions.

samples.
A panel of HIV-spiked plasma with known VLs was obtained from the External Quality Assurance Program Oversight Laboratory (EQAPOL, Duke University, USA) and used to test the subtype specificity and sensitivity of our primers. The EQAPOL panel included 2 samples from each of the following subtypes: A1, B, C, D, G, F, CRF01_AE, and CRF02_AG. Certificates of Analysis for each EQAPOL sample provided the reference viral load (Roche COBAS methodology). Two commercially prepared plasmids with known HIVDR mutations, P1 (G73T_K103N) and P2 (G73T_K65R) (Genscript, Piscataway, USA), were used to assess the error rate of the assay, as well as the limit of detection for minority variants in a range of P1:P2 mixtures.
Further characterization and validation were performed using a small cohort of anonymized clinical specimens (n = 58) obtained through the Research Ethics Board Exempt Strain and Drug Resistance (SDR) Surveillance Program. These clinical specimens were collected in 2012 and 2013 from treatment-naïve HIV-1-positive patients with unknown VL and were selected for testing on our MiSeq-HyDRA platform to represent a variety of clades and HIVDR mutations. Also included were two panels, each consisting of 5 HIV-1-positive plasma samples with known VL (n = 10), from the Virology Quality Assurance (VQA) Program (Rush University Medical Center, USA), originally acquired for HIVDR genotyping proficiency testing. All clinical and panel samples had previously been sequenced using an in-house VQA-validated SS protocol and represented all major HIV-1 subtypes, including A, B, C, D, F, G, CRF01_AE, CRF02_AG, CRF06_cpx, and CRF12_BF, as well as two A1/D and one G/B recombinant viruses.
HIV RNA extraction. For all samples tested, total nucleic acid was extracted from 400 µl of HIV-1-infected plasma and eluted in 110 µl using the Nuclisens EasyMag system (Biomerieux, St-Laurent, Canada) according to the manufacturer's suggested protocol. Prior to HIV RNA extraction, the EQAPOL panel of HIV-spiked plasma was serially diluted in normal human plasma (NHP) to represent a range of VLs from 10,000 cp/ml to 50 cp/ml. An extraction efficiency of 90%, estimated from previous data (not shown), was used to calculate the approximate viral RNA copy number in each RT-PCR reaction. For the clinical SDR specimens and VQA samples, the same RNA extract used for SS was used to prepare amplicons for sequencing on the MiSeq.
Sample amplification. RT-PCR was performed on 10 µl of each extract of the EQAPOL, VQA, and clinical samples using the SuperScript™ III One-Step RT-PCR Platinum® Hi-Fidelity Taq System (Thermo Fisher Scientific, Canada), according to the manufacturer's suggested protocol. The primers used were as follows: PR_F1 5′-GARAGACAGGCTAATTTTTTAGGGA-3′ (HXB2 loci 2071-2095) and RT_R1 5′-ATCCCTGCATAAATCTGACTTGC-3′ (HXB2 loci 3348-3370). The RT-PCR cycling conditions were as follows: 50 °C for 30 minutes, 94 °C for 2 minutes, 40 cycles of 94 °C for 20 seconds, 50 °C for 30 seconds and 68 °C for 90 seconds, and a final extension at 68 °C for 5 minutes. The same RT-PCR conditions were also used to prepare amplicons for Sanger sequencing 30.
Following RT-PCR, a 5 µl aliquot was transferred to a nested-PCR reaction, performed using Phusion® Hot Start II Hi-Fidelity DNA Polymerase (Thermo Fisher Scientific, Canada), according to the manufacturer's recommended protocol. The primers used were as follows: PR_F2 5′-CTTTARCTTCCCTCARATCACTCT-3′ (HXB2 loci 2243-2266) and RT_R2 5′-CTTCTGTATGTCATTGACAGTCC-3′ (HXB2 loci 3304-3326).
www.nature.com/scientificreports/
as described in the Reference Guide was optimally suited to our purpose. Libraries were pooled in equal volumes and diluted 30× in the provided hybridization buffer. A 20 pM PhiX control library (v3, Illumina, USA) was spiked in at 20% to provide quality control measures and to increase the diversity of the amplicon libraries. Samples were sequenced on the MiSeq using either the v2 500-cycle or the v3 600-cycle MiSeq reagent kit (Illumina, USA).
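The forward primers PR_F1 and PR_F2 contain the IUPAC ambiguity code R (A or G), so each is synthesized as a mixture of sequence variants; degeneracy of this kind is a common way to tolerate subtype-specific polymorphisms at primer-binding sites. The short Python sketch below is illustrative only and is not part of the published protocol: it expands each primer into its concrete variants and checks that each primer length matches the span of its stated HXB2 coordinates. The names `expand_degenerate` and `PRIMERS` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each symbol maps to the bases it represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences (5'->3') and HXB2 coordinates as listed in Methods.
PRIMERS = {
    "PR_F1": ("GARAGACAGGCTAATTTTTTAGGGA", 2071, 2095),
    "RT_R1": ("ATCCCTGCATAAATCTGACTTGC", 3348, 3370),
    "PR_F2": ("CTTTARCTTCCCTCARATCACTCT", 2243, 2266),
    "RT_R2": ("CTTCTGTATGTCATTGACAGTCC", 3304, 3326),
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete oligonucleotide encoded by a degenerate primer."""
    options = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for name, (seq, start, end) in PRIMERS.items():
        # Primer length should equal the inclusive span of its HXB2 coordinates.
        assert len(seq) == end - start + 1, name
        variants = expand_degenerate(seq)
        print(f"{name}: {len(seq)} nt, {len(variants)} sequence variant(s)")
        for v in variants:
            print("   ", v)
```

With one R, PR_F1 corresponds to two concrete sequences; with two, PR_F2 corresponds to four.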
MiSeq data analysis. MiSeq data analysis was performed using HyDRA Web (http://hydra.canada.ca), a freely available online automated pipeline for HIVDR analysis of NGS-derived data 24. The HyDRA pipeline is coded in Python and leverages existing open-source bioinformatics software for annotated reference-based mapping and variant calling against the HXB2 pol gene (loci 2253-5096, GenBank accession number K03455). The pipeline carries out multi-step data processing, from raw NGS data input (in .fastq, .fastq.gz, or .sff format) to customizable reporting of HIVDR mutations at any defined frequency. Advanced options for data quality assurance, filtering, variant calling, and reporting thresholds can be modified to customize the analysis.
For our analysis, we used the HyDRA Web default settings unless otherwise indicated 24. Briefly, reads were first filtered for a minimum quality score of Q30 and a minimum length of 100 bp, then mapped with Bowtie2 to the reference sequence (HXB2 or the known plasmid sequences). Variants were called based on a pre-determined error rate of 0.0021 for the MiSeq platform (see Results), a minimum variant quality of Q30, a minimum read depth of 100×, and a minimum allele count of five. HIVDR mutations detected above a 1% frequency were reported based on the default HyDRA Web Mutation Database, which combines the Stanford 2015 list of HIV-1 drug resistance mutations (http://hivdb.stanford.edu) with added annotations from the WHO 2009 list of mutations for surveillance of transmitted HIV drug resistance. HIVDR mutations are reported for reverse transcriptase and protease using the Stanford classification designations Major, Accessory or Other (http://hivdb.stanford.edu). Consensus sequences were generated with thresholds of 20%, 15%, and 10% for genotypic and phylogenetic comparison with the corresponding Sanger data.
Statistical analysis. MiSeq-derived data for the pedigreed plasmids were analyzed for variant frequencies using a local installation of Galaxy, with tools to apply Q30 quality filters to the FASTQ files, Bowtie2 for reference mapping, and an in-house tool for collating variant frequencies into a CSV file for further statistical analysis in Microsoft Excel. Clinical samples sequenced by both Sanger and MiSeq-based assays were analyzed for nucleotide and amino acid percent identity using MEGA6.0 34.
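The statistical analysis relies on an in-house tool, not described further, that collates per-position variant frequencies into a CSV file. A minimal Python sketch of that kind of collation is given below, assuming per-position call tallies (for example, derived from a Bowtie2 alignment) are already available. The function name `collate_frequencies` and the use of '-' and '+' to tally deletions and insertions are hypothetical choices for illustration and do not describe the authors' tool.

```python
import csv
import io
from collections import Counter

# Calls tallied at each reference position; '-' = deletion, '+' = insertion
# (an assumed encoding for this sketch).
ALLELES = ["A", "C", "G", "T", "-", "+"]


def collate_frequencies(ref_seq: str, pileup: dict[int, Counter], out) -> None:
    """Write one CSV row per reference position with allele counts and the
    fraction of calls that differ from the reference base."""
    writer = csv.writer(out)
    writer.writerow(["position", "ref", "depth", *ALLELES, "nonref_freq"])
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref = ref_seq[pos - 1]  # positions are 1-based
        nonref = depth - counts[ref]  # Counter returns 0 for absent keys
        freq = nonref / depth if depth else 0.0
        writer.writerow([pos, ref, depth, *(counts[a] for a in ALLELES), f"{freq:.6f}"])


if __name__ == "__main__":
    # Hypothetical tallies for a 4-nt reference.
    pileup = {
        1: Counter({"A": 998, "G": 2}),
        2: Counter({"C": 997, "-": 1, "+": 2}),
    }
    buf = io.StringIO()
    collate_frequencies("ACGT", pileup, buf)
    print(buf.getvalue())
```

Writing to any text file object keeps the sketch independent of where the CSV ends up; an open file handle works the same way as the in-memory buffer used here.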

Results
HIV-1 subtype coverage and viral load limit of detection. The HIV subtype coverage of any given protocol depends on the initial viral RNA extraction and PCR amplification steps that provide templates for subsequent HIV genotyping. The subtype specificity and sensitivity of the new protocol were first assessed using an EQAPOL panel of HIV-spiked plasma with known viral load, covering eight major HIV-1 subtypes: A1, B, C, D, F, G, CRF01_AE, and CRF02_AG. A plasma dilution series representing VLs from 10,000 cp/ml down to 50 cp/ml was processed in three independent assays over a 6-month period by 3 different technicians using different reagent lots. The viral RNA copy number per PCR reaction expected from each plasma dilution was estimated from the volumes described in Methods, assuming an extraction efficiency of 90%.
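The per-reaction copy number follows directly from the volumes given in Methods: 400 µl of plasma extracted, RNA eluted in 110 µl, 10 µl of extract used per RT-PCR, and an assumed 90% extraction efficiency, i.e. copies ≈ VL × 0.4 ml × 0.9 × 10/110. The Python sketch below simply makes this arithmetic explicit; the function name is illustrative.

```python
def copies_per_reaction(viral_load_cp_ml: float,
                        plasma_ml: float = 0.4,        # 400 µl plasma extracted
                        eluate_ul: float = 110.0,      # elution volume
                        template_ul: float = 10.0,     # extract used per RT-PCR
                        extraction_efficiency: float = 0.90) -> float:
    """Approximate viral RNA copies entering one RT-PCR reaction."""
    recovered = viral_load_cp_ml * plasma_ml * extraction_efficiency
    return recovered * template_ul / eluate_ul


if __name__ == "__main__":
    for vl in (10_000, 5_000, 1_000, 500, 100, 50):
        print(f"{vl:>6} cp/ml -> ~{copies_per_reaction(vl):.1f} copies/reaction")
```

For 5000, 500 and 100 cp/ml this gives roughly 164, 16 and 3 copies per reaction, in line with the approximate values of 160, 16 and 3 quoted for the LADRV dilution experiment.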
As shown in Table 1, all examined major subtypes were successfully amplified in the three independent assays down to 500 cp/ml, except for subtype D, which failed to amplify at ≤1,000 cp/ml in one dilution series. In addition, a subset of clinical samples of unknown viral load that had previously been tested for HIVDR was examined in this study, and all of these specimens were successfully amplified and sequenced. HIV-1 subtypes in the clinical subset included subtypes A, B, C, D, E, F, G, CRF01_AE, CRF02_AG, CRF50_A1D, and CRF06_cpx.
Table 1. Successful (+) or unsuccessful (−) PCR amplifications of 8 major HIV-1 subtypes in three independent assays. *Estimated viral RNA copy number in the PCR was calculated based on 90% extraction efficiency.
Error rate as determined using pedigreed plasmids. To accurately assess the gross error rate associated with the MiSeq-HyDRA platform, including the PCR amplification, sequencing and data processing steps, we used two commercially prepared plasmids constructed from the PR-RT region of the HXB2 pol gene, each with two DRMs: P1 (G73T_K103N) and P2 (G73T_K65R) (Genscript, USA). Both plasmids were processed using the same nested-PCR, MiSeq library preparation and sequencing protocol in triplicate intra-assays and triplicate inter-assays (see Methods). Using an in-house Galaxy pipeline, MiSeq-derived FASTQ files were filtered with the same parameters applied by the HyDRA pipeline, i.e. a minimum quality of Q30 and a minimum read length of 100 bp, and mapped with Bowtie2 to the known plasmid reference sequence [35][36][37]. All detected discrepancies from the plasmid reference sequences were recorded as errors derived from the workflow.
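The read-filtering step can be illustrated with a short Python sketch. The text does not specify how "minimum Q30" is evaluated, so this sketch assumes it means a mean Phred score of at least 30 per read (Phred+33 encoding), combined with a minimum length of 100 bp; HyDRA and the Galaxy tools may instead trim or filter per base. All function names are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def passes_filter(seq: str, qual: str, min_len: int = 100, min_q: float = 30.0) -> bool:
    # Length is checked first, so mean_phred is never called on an empty string.
    return len(seq) >= min_len and mean_phred(qual) >= min_q


def make_record(name: str, seq: str, qual: str) -> str:
    """Build one FASTQ record (helper for the example)."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    demo = io.StringIO(
        make_record("r1", "A" * 150, "I" * 150)    # Q40, 150 bp: kept
        + make_record("r2", "A" * 80, "I" * 80)    # 80 bp: too short
        + make_record("r3", "A" * 150, "5" * 150)  # Q20: below Q30
    )
    kept = [h for h, s, q in read_fastq(demo) if passes_filter(s, q)]
    print(kept)
```

Streaming records one at a time keeps memory use flat, which matters for MiSeq runs with millions of reads.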
To calculate the error rate for each plasmid, we first determined the error rate at each nucleotide position (n = 974) by dividing the sum of mismatched base calls, insertions, and deletions by the total number of base calls at that locus. The per-position error rates were then averaged to obtain an error rate for each plasmid (Supplementary Table S1). The overall gross error rate for the MiSeq-HyDRA platform, calculated by averaging the error rates obtained for the two plasmids, was 0.21%, with a standard deviation (SD) of 0.01%.
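The error-rate calculation can be expressed as a short Python sketch. It assumes per-position tallies of reference bases, mismatches, deletions ('-') and insertions ('+'), counts every non-reference call as an error, and uses all calls as the denominator; whether the original analysis counted insertions in the denominator is not stated, so that choice is an assumption. The toy counts below are hypothetical.

```python
from collections import Counter
from statistics import mean


def position_error_rate(ref_base: str, calls: Counter) -> float:
    """(mismatches + insertions + deletions) / all calls at one position.

    `calls` tallies observed bases plus '-' (deletion) and '+' (insertion);
    every call that is not the reference base counts as an error.
    """
    total = sum(calls.values())
    return (total - calls[ref_base]) / total


def plasmid_error_rate(ref_seq: str, pileup: dict[int, Counter]) -> float:
    """Average of per-position error rates over all covered (1-based) positions."""
    return mean(position_error_rate(ref_seq[p - 1], pileup[p]) for p in sorted(pileup))


def gross_error_rate(per_plasmid_rates: list[float]) -> float:
    """Overall error rate as the mean of the per-plasmid error rates."""
    return mean(per_plasmid_rates)


if __name__ == "__main__":
    p1 = plasmid_error_rate("AC", {1: Counter({"A": 998, "G": 2}),
                                   2: Counter({"C": 997, "-": 1, "+": 2})})
    p2 = plasmid_error_rate("GT", {1: Counter({"G": 999, "T": 1}),
                                   2: Counter({"T": 999, "+": 1})})
    print(f"P1 {p1:.4%}, P2 {p2:.4%}, gross {gross_error_rate([p1, p2]):.4%}")
```

Averaging per position first, rather than pooling all calls, gives every locus equal weight regardless of its coverage, which matches the two-step averaging described in the text.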
To further assess the suitability of the MiSeq-HyDRA platform for HIVDR genotyping, we calculated error rates at HIV DRM codons and at homopolymeric genomic regions. The forty-three PR and RT codons identified as surveillance drug-resistance mutation (SDRM) sites by the Stanford HIVDR database (https://hivdb.stanford.edu/) were included in this analysis. Homopolymeric loci were defined as runs of three or more identical nucleotides. For the SDRM loci, the mean error rate was 0.21% (SD = 0.01%), essentially the same as that observed for the full plasmid amplicon sequence. The mean error rate at the homopolymeric loci was 0.24% (SD = 0.01%). Comparable results were obtained when these plasmids were tested in three independent assays conducted by three different technicians, showing excellent reproducibility (Supplementary Table S1).
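Selecting homopolymeric and SDRM loci before averaging can be sketched in Python as follows. Homopolymers follow the definition in the text (runs of three or more identical nucleotides). For codons, the sketch assumes codon 1 starts at a known nucleotide offset of the amplicon reference; PR and RT are numbered from their own gene starts, so in practice each gene would use its own offset. The function names and the example sequence are hypothetical.

```python
from itertools import groupby


def homopolymer_positions(seq: str, min_run: int = 3) -> set[int]:
    """1-based positions lying in runs of >= min_run identical nucleotides."""
    positions, pos = set(), 1
    for _base, group in groupby(seq.upper()):
        run = len(list(group))
        if run >= min_run:
            positions.update(range(pos, pos + run))
        pos += run
    return positions


def codon_positions(codons: list[int], frame_start: int = 1) -> set[int]:
    """Nucleotide positions of 1-based codon numbers, assuming codon 1 starts
    at nucleotide `frame_start` of the reference (an assumption of this sketch)."""
    out = set()
    for c in codons:
        first = frame_start + 3 * (c - 1)
        out.update((first, first + 1, first + 2))
    return out


def mean_rate_at(positions: set[int], per_position_rate: dict[int, float]) -> float:
    """Mean error rate over the subset of positions that have data."""
    rates = [per_position_rate[p] for p in sorted(positions) if p in per_position_rate]
    return sum(rates) / len(rates) if rates else float("nan")


if __name__ == "__main__":
    ref = "ACGTTTACCCCA"
    print(sorted(homopolymer_positions(ref)))  # [4, 5, 6, 8, 9, 10, 11]
    print(sorted(codon_positions([2, 4])))     # [4, 5, 6, 10, 11, 12]
```

The resulting position sets can then be passed to `mean_rate_at` together with the per-position error rates from the previous step.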

Sensitivity for minor HIV-1 DRM detection in artificial plasmid mixtures. Artificial mixtures of the aforementioned P1 and P2 plasmids harbouring distinct DRMs were prepared and used to assess the sensitivity of the MiSeq-HyDRA platform for detecting minor variants in a mixed population. Pure plasmids, P1 (G73T_K103N) and P2 (G73T_K65R), were mixed at various ratios and the resulting plasmid mixtures were used for 3 independent PCR amplifications and library preparations, and then sequenced on a single MiSeq run.
As expected, the G73T variant, which is present in both plasmids, was detected at >99% in the pure plasmids as well as in all the mixtures (Table 2). In contrast, the K103N substitution was detected at >99% in the pure P1 plasmid, and its frequency decreased progressively as the proportion of P2 in the mixtures increased. The same pattern was observed for K65R derived from P2. When present as the minor variant in the mixture, K103N was consistently detected down to a minimum average frequency of 0.53%, but was not detected in any of the replicates at 0.1%. Similarly, the minor variant K65R from P2 was consistently detected down to an average minimum frequency of 0.61%, but because of imprecision in the 99.9:0.1 plasmid mixture (possibly due to a combination of mixture quantitation error and sequencing error), the 0.1% frequency could not be accurately assessed. More importantly, analysis of all the mixed plasmid sequence data by HyDRA Web revealed no amino acid substitutions above 0.5% other than the known expected variants. Based on the observed error rate of 0.21% and the ability of the assay to reliably detect minor variants at >0.5%, the default threshold for minor variant reporting in HyDRA Web was conservatively set to 1%.
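Methods list the variant-calling cut-offs (error rate 0.0021, minimum variant quality Q30, minimum depth 100×, minimum allele count of five, reporting at ≥1%) but not how HyDRA combines them. The Python sketch below shows one plausible combination under stated assumptions: the hard cut-offs are applied first, and a one-sided binomial test then asks whether the allele count is unlikely to arise from sequencing error alone at the 0.21% rate. The binomial test, the significance level `alpha`, and all names here are hypothetical and are not claimed to be HyDRA's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    mutation: str         # e.g. "K103N"
    depth: int            # reads covering the position
    allele_count: int     # reads supporting the variant
    mean_quality: float   # mean Phred quality of the supporting base calls


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def is_reportable(c: Candidate, error_rate: float = 0.0021, min_depth: int = 100,
                  min_count: int = 5, min_quality: float = 30.0,
                  min_freq: float = 0.01, alpha: float = 0.05) -> bool:
    """Apply depth/count/quality/frequency cut-offs, then require the allele
    count to be unlikely under sequencing error alone (hypothetical test)."""
    if c.depth < min_depth or c.allele_count < min_count or c.mean_quality < min_quality:
        return False
    if c.allele_count / c.depth < min_freq:
        return False
    return binomial_sf(c.allele_count, c.depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical candidate at 1% of 20,000 reads (~42 error reads expected).
    print(is_reportable(Candidate("K103N", depth=20000, allele_count=200, mean_quality=35.0)))
```

At the 20,000× coverage typical of this assay, a 1% variant is supported by about 200 reads, against roughly 42 reads expected from a 0.21% error rate. Applying the gross per-base error rate to a single specific substitution is conservative, since errors are spread across several possible alternative bases.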
Performance characteristics of the MiSeq-HyDRA platform and concordance with Sanger consensus sequences on clinical specimens. To further assess the performance characteristics and clinical suitability of this new platform for HIVDR genotyping, we analyzed 68 clinical samples, consisting of a cohort of 58 SDR specimens and two 5-member VQA genotyping panels, using SS and the MiSeq-HyDRA protocol.
Consensus sequence concordance. To assess the ability of the MiSeq-HyDRA platform to reproduce SS results, we compared the NGS consensus sequences generated by HyDRA with the corresponding SS sequences from a cohort of clinical specimens. HyDRA Web allows consensus sequences to be generated from MiSeq data at a user-defined mixed-base calling threshold. The percent nucleotide identity between SS and MiSeq-HyDRA consensus sequences was determined using three different thresholds for the MiSeq data: 20%, 15% and 10%. Overall, concordance between the sequencing platforms was high. The percent nucleotide identity was >99% at all three thresholds and decreased only marginally as the threshold was lowered, from 99.62% at the 20% threshold, to 99.58% at 15%, to 99.35% at 10% (Supplementary Table S2). Most of the nucleotide discordance between MiSeq-HyDRA consensus sequences and the matching SS sequences was due to either a mixed base call or a component of that mixed base, with only a small fraction arising from complete base changes.
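HyDRA Web's consensus builder is described here only in terms of a user-defined mixed-base threshold. The Python sketch below illustrates one common way such a threshold can work, assuming that every base whose frequency reaches the threshold is retained and the retained set is written as its IUPAC ambiguity code. Indels are ignored, and sequences are compared position by position after alignment, so a mixed base against one of its components counts as a mismatch. The function names and counts are hypothetical.

```python
from collections import Counter

# IUPAC code for each set of bases retained at a position.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(columns: list[Counter], threshold: float = 0.20) -> str:
    """Mixed-base consensus: every base at >= threshold is kept and encoded."""
    out = []
    for counts in columns:
        depth = sum(counts[b] for b in "ACGT")
        if depth == 0:
            out.append("N")  # no coverage at this position
            continue
        kept = {b for b in "ACGT" if counts[b] / depth >= threshold}
        if not kept:  # only possible when threshold > 0.25
            kept = {max("ACGT", key=lambda b: counts[b])}
        out.append(IUPAC_CODE[frozenset(kept)])
    return "".join(out)


def percent_identity(a: str, b: str) -> float:
    """Percent identical characters between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    cols = [Counter(A=100), Counter(A=85, G=15), Counter(C=70, T=30)]
    for t in (0.20, 0.15, 0.10):
        print(t, consensus(cols, t))
```

Lowering the threshold admits more minor bases as ambiguity codes, which is consistent with the small drop in identity to Sanger sequences observed at 10%.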
Accuracy. Accuracy in HIVDR genotyping refers to the ability of the assay to detect known HIV DRMs, using conventional SS results as the benchmark. Using the same RT-PCR amplicon that was used for SS, we compared the HyDRA Mutation Report for each clinical sample obtained from the MiSeq-HyDRA assay with the Stanford Genotypic Resistance Report derived from the same sample using our in-house SS protocol. The HyDRA report identified 100% of the 166 HIV-1 DRMs previously detected by SS, 67 of which were classified as surveillance drug resistance mutations (SDRMs) (Supplementary Table S3). In addition, the MiSeq-HyDRA assay identified another 84 HIV DRMs in this sample cohort, 33 of which were classified as SDRMs, at frequencies ranging from 37% down to 1%, which were not identified in the SS data (Supplementary Table S3). Most of these mutations (n = 76) were detected at frequencies <10%. However, three of these discrepancies between the MiSeq-HyDRA data and SS occurred at frequencies >20% (Table 3), challenging the generally accepted detection limit for SS of 20% 6,10.
Specifically, sample VQA29-5 (Table 3), taken from the VQA HIV DR Genotyping proficiency panel, was tested by 43 independent laboratories using either an in-house SS-based genotyping protocol or the ViroSeq Genotyping System (Abbott). According to the VQA panel report, only 3 of the 43 laboratories reported the DRM L10I. Similarly, the clinical sample SDR-89 was sequenced in duplicate by our SS-based method, and the DRM K20R was not detected by the RECall analysis software in either run. By contrast, SDR-89 and its K20R mutation were also used to validate the inter-assay reproducibility of the MiSeq-HyDRA platform (Supplementary Table S5). For SDR-112, there was only sufficient material for a single run on each platform, SS and MiSeq. In addition, several other mutations at frequencies between 9% and 20% were detected by SS (Table 4), indicating that a consensus sequence generated at a 20% threshold is not ideal for DRM analysis.
Precision and reproducibility. Precision measures the ability of an assay to generate the same result on replicates of the same sample within a test run (intra-assay variability), whereas reproducibility measures the ability of an assay to generate the same result on the same sample in different test runs (inter-assay variability) 28,29. Both parameters are used to measure the performance of a given assay. To assess the precision and reproducibility of MiSeq-HyDRA on clinical specimens, a subgroup of well-characterized specimens (n = 7) was analyzed in triplicate intra- and inter-assays (see Methods). The concordance between the MiSeq-HyDRA consensus sequences generated at 3 different thresholds (20%, 15%, and 10%) and the matching SS sequences was recorded. We observed high concordance (≥99.40%) for both nucleotide and amino acid consensus sequences between the two platforms at all 3 thresholds for both intra- and inter-assay performance (Supplementary Table S4).
When consensus sequences generated at a 20% threshold from MiSeq-HyDRA were compared with their corresponding Sanger sequences, the few discordant nucleotides and amino acids among all examined specimens were all due to mixed base calls. Notably, four of these discrepancies were found in HIVDR-associated codons, resulting in two silent substitutions (data not shown) and two DRMs, both of which were identified in specimen SDR-95. The HyDRA report revealed the frequencies of the 2 discrepant HIVDR-related substitutions, L33F in PR and K101R in RT, in the SDR-95 replicates (Table 4). Although neither amino acid substitution affected the overall HIVDR profile for that individual, this demonstrates the need for a lower threshold when defining a consensus sequence that encompasses all potentially relevant HIVDR mutations. The precision and reproducibility of the MiSeq-HyDRA assay were further assessed by examining its consistency in detecting the frequency of DRMs. Table 4 details the frequencies of LADRVs detected in 4 of the 7 clinical specimens used in the intra-assay comparison. The remaining 3 samples harboured HIV DRMs at >99% frequencies, with 100% precision in intra-assay triplicates (data not shown). In contrast, a lack of precision for detecting LADRVs <10% was observed in specimens SDR-55, 95, 140, and 143 (Table 4). The results of inter-assay comparisons of another 7 clinical specimens (Supplementary Table S5) further support the observation that reproducibility is 100% for mutations present at a frequency of 10% or greater. These results indicate that reliable detection of LADRVs begins at frequencies ≥10%, even with coverage of 20,000× at all sites (data not shown). Given the relatively low average error rate of 0.2% for our MiSeq-HyDRA platform, we do not consider these LADRVs to be false positives; rather, their absence from some replicates represents false negatives.
These findings highlight the importance of viral RNA copy number extracted from samples with unknown VL and the effect of PCR bias on variant frequencies. This interpretation is further supported by data obtained from inter-assay experiments (Supplementary Table S5).
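The replicate comparison above can be tabulated with a short Python sketch: for each mutation, record its frequency (%) in every replicate (None when not reported), then count how many mutations are reported in all replicates, separately above and below 10%. The 1% reporting cut-off comes from Methods. Classifying each mutation by the mean of its detected replicate frequencies is an assumption of this sketch, and the function names are hypothetical; no values from Table 4 or Supplementary Table S5 are reproduced here.

```python
from statistics import mean
from typing import Optional


def detected_in_all(freqs: list[Optional[float]], report_threshold: float = 1.0) -> bool:
    """True if the mutation is reported (>= threshold, in %) in every replicate."""
    return all(f is not None and f >= report_threshold for f in freqs)


def reproducibility_by_band(mutations: dict[str, list[Optional[float]]],
                            band: float = 10.0,
                            report_threshold: float = 1.0) -> dict[str, tuple[int, int]]:
    """Count (fully reproducible, total) mutations above and below `band` percent,
    classifying each mutation by the mean of its detected replicate frequencies."""
    hi, lo = f">={band:g}%", f"<{band:g}%"
    summary = {hi: [0, 0], lo: [0, 0]}
    for freqs in mutations.values():
        seen = [f for f in freqs if f is not None]
        if not seen:
            continue  # never detected: nothing to classify
        key = hi if mean(seen) >= band else lo
        summary[key][1] += 1
        summary[key][0] += detected_in_all(freqs, report_threshold)
    return {k: (v[0], v[1]) for k, v in summary.items()}
```

A mutation missing from one replicate counts against reproducibility but is not treated as a false positive, mirroring the interpretation given in the text.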

Effect of viral load on detection of LADRVs.
To evaluate the minimum VL required to detect LADRVs, we analyzed the sequences of HIV-spiked plasma specimens from EQAPOL across a serial dilution series representing viral loads of 5000, 500, and 100 cp/ml.
Table 4. Intra-assay precision for LADRVs identified by the MiSeq-HyDRA assay. SDRMs are highlighted in bold. ¶ Mutation frequencies (%) detected in replicates; *Detection by Sanger-based sequencing assay.
defined as a frequency below 10%. Adopting a conservative extraction efficiency of 90%, we calculated the approximate viral RNA copy number per PCR reaction expected from each diluted specimen. Following our protocol, plasma VLs of 5000, 500, and 100 cp/ml correspond to approximately 160, 16, and 3 viral RNA copies/PCR reaction, respectively. Although our test group was small (n = 5) because few specimens with both known VL and LADRVs were available, the results were as anticipated: LADRVs were detected only in the samples with higher VLs.
For example, the 5% variant M230I in the CRF02_AG specimen was detected only when the initial plasma VL was 5000 cp/ml, equivalent to approximately 160 viral RNA copies/PCR reaction (about 8 copies of the variant). At VLs of 500 cp/ml and lower, the total template input falls to ≤16 copies/PCR reaction, so this variant would be expected at fewer than one copy per reaction; as expected, it was not detected at these dilutions (Table 5). The same holds true for the other LADRVs with frequencies of 1% to 4%; these LADRVs were detected only in the samples with a VL of 5000 cp/ml, with the exception of E138K at 1.08% in the CRF02_AG specimen at 100 cp/ml. This variant was not found in the specimens with higher VLs, and given the PCR copy number at that VL (approximately 3 viral RNA copies), we conclude that it is a false-positive result due to intrinsic assay error, and that its absence (ND, not detected) in the higher-VL specimens is, in fact, a true negative. An additional consideration when evaluating the frequency of LADRVs is the extent of re-sampling during PCR, particularly in the range of 10% to 20%. Without single-genome sequencing methodologies, which are not practical for routine HIVDR testing, we have no way of determining whether these variants are in fact present at an even lower frequency in the viral quasispecies, or to what extent they influence clinical outcomes. Therefore, both viral load and PCR re-sampling influence the observed frequency of HIV DRMs and should be taken into account when analyzing LADRVs.
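The sampling argument can be made concrete with a short Python sketch. It reuses the per-reaction copy number derived from the Methods volumes and, as a simplifying assumption, treats the input templates as independent draws, so a variant at frequency f is sampled at least once with probability 1 − (1 − f)^N. Sampling one template is necessary but not sufficient for detection: a single template may be over- or under-represented after PCR re-sampling, and the variant must still pass the reporting cut-offs. The function names are hypothetical.

```python
def copies_per_reaction(viral_load_cp_ml: float, plasma_ml: float = 0.4,
                        eluate_ul: float = 110.0, template_ul: float = 10.0,
                        extraction_efficiency: float = 0.90) -> float:
    """Approximate viral RNA copies entering one RT-PCR (volumes from Methods)."""
    return viral_load_cp_ml * plasma_ml * extraction_efficiency * template_ul / eluate_ul


def expected_variant_copies(viral_load_cp_ml: float, variant_freq: float) -> float:
    """Mean number of variant templates expected in one reaction."""
    return copies_per_reaction(viral_load_cp_ml) * variant_freq


def prob_variant_sampled(viral_load_cp_ml: float, variant_freq: float) -> float:
    """Probability that at least one variant template enters the reaction,
    treating the ~N input templates as independent draws (binomial assumption)."""
    n = round(copies_per_reaction(viral_load_cp_ml))
    return 1.0 - (1.0 - variant_freq) ** n


if __name__ == "__main__":
    for vl in (5000, 500, 100):
        print(f"{vl:>5} cp/ml: {expected_variant_copies(vl, 0.05):.2f} expected copies "
              f"of a 5% variant, P(>=1 sampled) = {prob_variant_sampled(vl, 0.05):.2f}")
```

For a 5% variant such as M230I, this gives about 8 expected copies at 5000 cp/ml but fewer than one at 500 cp/ml, in line with the dilution results above.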

Discussion
HIVDR testing is recommended for persons newly diagnosed with HIV to guide the selection of an initial ART regimen, and it also provides surveillance of transmitted HIVDR in the population. In addition, testing for acquired or secondary HIVDR mutations should be performed when patients adhering to an ART regimen experience suboptimal virologic response or treatment failure, as indicated by an increase in VL to >1000 cp/ml. Recent evidence shows that low abundant drug-resistant variants (LADRVs) in the HIV quasispecies may be clinically significant and influence virologic response to ART 6,10,[13][14][15][16][17]. Current HIVDR testing methods need to be revisited to satisfy the requirements of accessible, low-cost testing and an increased ability to detect LADRVs.
NGS technologies allow ultra-deep sequencing and reliable detection of LADRVs. With one of the lowest error rates and highest read coverage among currently available NGS technologies 18,21, the Illumina MiSeq platform is the leading candidate for routine HIVDR genotyping, as reflected by a growing number of studies supporting the use of the MiSeq for HIVDR testing 19,[38][39][40][41][42][43][44].
One limitation of NGS platforms is their absolute requirement for bioinformatics tools, and the knowledge to use them, for data analysis. Many laboratories do not have the computational capacity to process the large amounts of data produced by the MiSeq or any other NGS system. Bioinformatic pipelines are needed to ascertain the identity and frequency of HIVDR mutations, in particular LADRVs. Although no standardized method of analysis has been developed, several open-source and commercial pipelines exist to carry out this analysis 22. Here we introduce HyDRA Web, a free online application that gives users access to an automated and fully customizable NGS analysis pipeline for HIVDR mutations. HyDRA Web is designed to let users analyze large NGS data sets while shielding them from the computational burden of doing so. Notably, HyDRA accommodates NGS HIVDR analysis of the HIV-1 protease, reverse transcriptase and integrase genes, although integrase was not covered in this study.
Using the Illumina MiSeq platform combined with the ease of HyDRA Web data analysis, we have developed a robust, high-throughput and cost-effective HIVDR genotyping workflow that outperforms conventional Sanger-based protocols in both sensitivity and reliability. Following the WHO guidelines for validation of HIVDR genotyping assays 28,29, we validated our MiSeq-HyDRA platform for cross-clade specificity, viral load and minor variant sensitivity, error rate, accuracy compared with Sanger sequencing data, and precision and reproducibility for LADRVs.
A robust PCR-based HIVDR assay should readily amplify the main group M subtypes, despite sequence variation across subtypes and within a quasispecies. Dudley et al. reported a cross-clade HIVDR genotyping assay for PR, RT and IN 38. However, their protocol requires concentrating the plasma prior to extraction to achieve successful amplification, as well as secondary algorithms for touchdown PCR for difficult samples, additional steps that are not desirable when establishing a standardized protocol. In contrast, our assay is effective down to 500 cp/ml for all examined major group M subtypes, with no modifications required. Although not presented here, this platform is amenable to IN genotyping, which only requires combining PR/RT and IN amplicons prior to library preparation 30, and HyDRA Web can already analyze IN sequence data if present in the input FASTQ files. In addition, our library preparation followed the streamlined Nextera™ library prep protocol with bead-based normalization, which was fully sufficient to generate coverage of at least 20,000× across the PR/RT amplicon region.
Because other platforms have difficulty resolving sequences in homopolymer regions, it was essential to determine the error rate of the MiSeq-HyDRA platform with a focus on these regions. Sequence analysis of pedigreed plasmid amplicons demonstrated a low error rate in homopolymer regions (0.24%), consistent with the overall average error rate of 0.21%. MiSeq-HyDRA also demonstrated a low error rate at known SDRM-associated codons, again consistent with the overall error rate of 0.21%.
LADRVs present at levels as low as 1% in quasispecies infections can have a detrimental impact on treatment efficacy 6,10,16,17. The MiSeq-HyDRA platform can detect variants at frequencies as low as 1%, surpassing the typical 20% threshold attributed to Sanger sequencing. Here, we have demonstrated the ability of the MiSeq-HyDRA platform to detect minority variants present at as low as 0.5% in mixed plasmid populations, consistent with other studies reporting on MiSeq sensitivity 39.
Accurate detection of LADRVs using NGS depends largely on the amount of original template used in the assay. The actual RNA copy number in the PCR is determined by the VL, specimen storage conditions, extraction method, and laboratory technique. We have shown the influence of VL on the detection of LADRVs using serial dilutions of specimens with known VL. The ability of the MiSeq-HyDRA platform to correctly identify LADRVs and to distinguish between false-positive and false-negative results is directly related to the VL of the specimen. In addition, previous studies have shown that in plasma samples with fewer than 10,000 HIV RNA cp/ml, PCR bias can cause the output of ultra-deep sequencing to reflect only a small proportion of PCR amplicons, skewing allelic frequencies and misrepresenting the true variant alleles in the HIV quasispecies 13,45.
The "one specimen, one sequence" model of SS has long been the gold standard for HIVDR genotyping, and many downstream data-mining strategies have been developed based on this paradigm. HyDRA Web supports the "one specimen, one sequence" model by allowing Sanger-like consensus sequences to be generated from MiSeq data at a user-defined threshold. Consensus sequences derived at three different mixed-base thresholds (20%, 15%, 10%) from the MiSeq-HyDRA assay have >99% concordance with the corresponding Sanger sequences, as well as high intra-assay precision and inter-assay reproducibility. Determining the clinical relevance of minor variants, LADRVs in particular, and the ideal cut-off for reporting HIVDR is critical to defining a threshold for MiSeq-derived consensus sequences.
In addition to producing valid data, the reagent cost and turnaround time of a test are important when considering its feasibility for routine HIVDR genotyping. Compared with the SS assay, the MiSeq-HyDRA platform may reduce reagent cost and labour intensity or turnaround time by approximately 10% when a batch of 96 specimens is processed (Supplementary Table S6). The per-sample cost could be reduced by a further 30% if an amplicon sequencing approach is used, which bypasses the expensive Nextera XT library kit 46. Notably, such cost and labour savings can be achieved only when specimens are processed in batches using the MiSeq-HyDRA protocol. Hence, rather than individualized clinical testing, the MiSeq-HyDRA platform is better suited to HIVDR surveillance testing, which usually involves processing large numbers of samples.
In summary, we have demonstrated a successful and valid HIV drug resistance genotyping protocol for the Illumina MiSeq, used in conjunction with our custom data analysis pipeline, HyDRA Web. The protocol described here accurately and reproducibly identifies HIVDR mutations comparable with SS results, with broad subtype coverage and a sensitivity of 500 cp/ml. We have shown that the MiSeq-HyDRA platform performs with a low error rate (0.21% overall and at SDRM codons, and 0.24% in homopolymer regions) and can detect minority variants down to 0.5% in a mixed population. Consensus sequences derived from the MiSeq-HyDRA assay have >99% concordance with the corresponding Sanger sequences, as well as high intra-assay precision and inter-assay reproducibility. Using our MiSeq protocol in conjunction with HyDRA Web yields high-quality data that can be used in laboratories worldwide to produce complete and individualized HIV-1 surveillance drug resistance reports.

Data Availability
All data generated or analyzed during this study are included in this published article and its Supplementary Information files, or are available upon reasonable request.