A highly sensitive method for the detection of recombinant PERV-A/C env RNA using next generation sequencing technologies

Several xenogenic cell-based therapeutic products are currently under development around the world for the treatment of human diseases. Porcine islet cell products for treating human diabetes are a typical example. Since porcine cells possess endogenous retrovirus (PERV), which can replicate in human cells in vitro, the potential transmission of PERV has raised concerns in the development of these products. Four subgroups of infectious PERV have been identified, namely PERV-A, -B, -C, and recombinant PERV-A/C. Among them, PERV-A/C shows a high titre and there was a paper reported that an incidence of PERV-A/C viremia was increased in diseased pigs; thus, it would be important to monitor the emergence of PERV-A/C after transplantation of porcine products. In this study, we developed a highly sensitive method for the detection of PERV-A/C using next generation sequencing (NGS) technologies. A model PERV-C spiked with various doses of PERV-A/C were amplified by RT-PCR and the amplicons were analysed by NGS. We found that the NGS analysis allowed the detection of PERV-A/C at the abundance ratios of 1% and 0.1% with true positive rates of 100% and 57%, respectively, indicating that it would be useful for the rapid detection of PERV-A/C emergence after transplantation of porcine products.

Scientific Reports | (2020) 10:21935 | https://doi.org/10.1038/s41598-020-78890-2 www.nature.com/scientificreports/ was found in somatic cells of miniature swine, indicating that PERV-A/C could also arise in porcine cells alone without human cell infection [22][23][24][25] . Because PERV-A/C recombinants replicate rapidly in human cells in vitro [18][19][20] and an incidence of PERV-A/C viremia is increased in diseased swine 26 , it would be important to monitor for the appearance of PERV-A/C after xenotransplantation of porcine cell-based therapeutic products 19 , even though PERV infection in humans exposed to living porcine cells has not been reported thus far 7,27 . The PCR assay is considered to be the most sensitive and rapid method for the detection of known microorganisms and there are reliable primers for detecting PERV-A/C recombinants 22,28 . However, since the recombination sites between PERV-A and PERV-C are not fixed 18,20,23 and these viral genome sequences are highly identical, there is a concern about false positive and negative. Therefore, examining the sequence of PCR-amplified products is thought to be more reliable in the case of detecting the recombinants. There are 2 classical methods based on Sanger sequencing used in detecting mutations: direct sequencing, in which the PCR-amplified target DNA region is read directly, and sequencing after cloning, in which the amplified DNA is incorporated into a plasmid, cloned in E. coli, and then the inserted DNA of the plasmid is read. Although the direct sequencing method is simple, it has the disadvantage of low sensitivity because only the majority sequence can be read. On the other hand, the sequencing after cloning requires an increased number of clones for increased sensitivity, which is laborious and time-consuming. Amplicon sequencing (Amplicon-Seq) refers to deep sequencing of PCR products using next generation sequencing (NGS) technologies, and is performed for various applications, such as to detect somatic mutations in one or several exons and introns 29 or to explore microorganisms in samples by sequencing the 16S rRNA 30 . It also has been used to detect minor variants of viral populations [31][32][33] . In this study, we performed Amplicon-Seq analysis using the primers that amplify the entire env region of PERV-C and PERV-A/C and characterized the ability of this analysis to detect the presence of PERV-A/C recombinants.

Results
Data processing. To determine PERV-A/C recombinants, a phylogenetic tree was created from the multiple alignment of the NGS read sequences and PERV env sequences obtained from the NCBI GenBank, with NGS reads being classified by whether or not the read was in the same cluster as previously reported PERV-A/C env sequences (Fig. 1). The RNA sequences of the env of PERV-C (AF038600.1) and PERV-A/C (13653) 23 Figure 1. A phylogenetic tree of PERV subgroups. PERV env sequences obtained from the NCBI GenBank were multiple aligned, and obtained alignment data were used to create the phylogenetic tree using the neighbor-joining method. www.nature.com/scientificreports/ synthesized by in vitro transcription reaction were used as model viruses, and analytic samples were prepared by mixing PERV-C and PERV-A/C at ratios of 1 to 0 (negative control), 1 to 0.1, 1 to 0.01, and 1 to 0.001, which were referred to as "0% PERV-A/C", "10% PERV-A/C", "1% PERV-A/C", and "0.1% PERV-A/C", respectively. The entire env region was amplified by reverse transcription polymerase chain reaction (RT-PCR), and the sequence of the amplified products, which were about 2 kbp, was determined by PacBio single molecule, real-time (SMRT) technology. The primers used in this reaction amplified neither PERV-A nor PERV-B. In our preliminary experiment, RT-PCRs using PERV-A&C-For and PERV-C-Rev as primers also showed no fragment amplification from total mRNA of porcine cell lines PKF (IFO50422), PKR (IFO50423), PKS (IFO50421), and LLC-PK1 (JCRB0060) (Supplementary Fig. S1 online), indicating not only the absence of PERV-C and A/C in the cell lines but also the specificity of the primers. We used the PacBio RS II as the NGS platform, because it is suitable for long-read sequencing. Raw data were initially processed through the PacBio SMRT portal and filtered for a minimum of 6 passes expected for accuracy of 99.9% 34 . Six independent analyses offered approximately 23,000 to 44,000 sequence reads from each sample ( Table 1). As the amount of the reads was too large for multiple alignment analysis, we did some processing to reduce the data. First, reads were filtered for a length of 1500 bp or longer, resulting in a reduction to approximately 7,400 to 36,000. A homology analysis was then performed to further reduce the amount of data. To determine which PERV sequence would be suitable for the query, we first performed the homology analysis among PERV-C and PERV-A/C subgroups obtained from the NCBI GenBank (Table 2). Both global and local alignments were conducted for homology analysis, and both analyses showed that PERV-C was highly conserved among the subgroups, indicating that PERV-C was suitable for the query sequence. Next, a homology analysis was performed between PERV-C and PERV-A/C ( Table 3). The identity between PERV-C and PERV-A/C was 83 to 93% by global alignment and 70 to 93% by local alignment. Therefore, we used PERV-C (AF038600.1) as the query and extracted the NGS reads with 80 to 94% identity of the global alignment or 70 to 94% identity of the local alignment. The read number after global alignment was under 1,000, which was manageable for use in the multiple alignment analysis. On the other hand, because some of the data after local alignment contained more than 1000 reads, these data were divided into several files of under 1,000 reads each.
Specificity and true positive rate of the analysis. Phylogenetic trees were created from the multiple alignment data. Based on these, we determined the spiked PERV-A/C recombinants by whether or not reads were in the same cluster as previously reported PERV-A/C recombinants (see Supplementary Figs. S2, S3, S4, and S5 online). Since PERV-A/C was not detected in the 0% of samples in all experiments, the specificity of this analysis was regarded to be 100%. PERV-A/C at the abundance ratios of 10% and 1% was always detectable, www.nature.com/scientificreports/ regardless of the alignment method ( Table 1). The number of PERV-A/C reads detected after local alignment was significantly more than that after global alignment (p < 0.003, two-way repeated measures analysis of variance and the Student-Newman-Keuls's post-hoc test), indicating that local alignment was more suitable for this analysis. The mean number of the PERV-A/C reads detected in the 0.1% PERV-A/C samples was 0.33 and 0.83 using global and local alignments, respectively. Assuming that the PERV-A/C reads follow the Poisson distribution, the true positive rate of the 0.1% PERV-A/C samples was calculated to be 28% an 57%, when global and local alignments were applied, respectively (Table 4).

Discussions
Although xenotransplantation has been expected to serve as a potential solution to the chronic donor shortage in transplantation, its application has not progressed well due to concerns regarding immunological rejection and transmission of infectious diseases. The problem of immunological rejection has been resolved in recent years with the development of superior immunosuppressive drugs and advances in gene editing technology 35,36 , but there is still much uncertainty about the transmission of infectious diseases caused by known or unknown microorganisms. The prediction of the transmission of animal viruses to humans from in vitro or in vivo studies is difficult, and thus it is important to monitor the infection in the actual transplanted patients. The Amplicon-Seq with the local alignment detected PERV-A/C recombinants at the abundance ratios of 10%, 1%, and 0.1% with true positive rates of 100%, 100%, and 57%, respectively (Table 4). Although highly sensitive PCR methods for the detection of PERV-A/C have already been established 22,28 , there is a possibility that they fail to detect the novel PERV-A/C which possess new recombination sites. Therefore, the Amplicon-Seq analysis in this study  www.nature.com/scientificreports/ could be a complementary method for the early detection of PERV-A/C in patients after xenotransplantation of porcine cell-based therapeutic products. The number of PERV-A/C reads derived from the 1% PERV-A/C samples after global alignment was significantly less than that after local alignment, and the 0.1% PERV-A/C samples showed the same trend. We examined the relationship between the sequence identity and the read number and found that there was a large peak at 40 to 49% in the global alignment (see Supplementary Fig. S6 online). This peak contained about half of total reads, suggesting that the PERV-A/C reads were lost there. Because global alignment compares the end to end sequences, reads which were not read at the full length could be scored as low identity. On the other hand, local alignment compares the most matched portion of the sequences, thus partial reads might have been unaffected by the analysis.
In this study, the Amplicon-Seq analysis using PacBio RS II showed that more than 100 recombinant reads were detected in the 10%, 6 to 49 reads in 1%, and 0 to 2 reads in 0.1% of the samples. The number of detected reads was roughly correlated with the concentration of the samples, suggesting that the first NGS data would need to be increased by a factor of 10 or more in order to detect the recombinants at a ratio of 0.01% or less in the samples. This increase in the obtained data would be possible by using other NGS equipment, such as PacBio Sequel I or II. On the other hand, the increase of data would lead to a heavy burden in analysis. Multiple alignment especially takes time to be calculated and thus it is important to reduce the number of reads before that in order to process the analysis as smoothly as possible using general computers. Sequence clustering, which sorts nucleotide or amino-acid sequences into clusters based on sequence similarity is used as one of the methods to reduce the NGS data 37,38 . Since the global alignment is used for clustering and the homology between PERV-C and PERV-A/C is less than 93%, clustering at 95% similarity or more might reduce the data without affecting the detection sensitivity.
Although PERV-A/C recombinants were detected in 1% of samples in 6 independent experiments, we were unable to detect them when we analysed low-purified RT-PCR products (see Supplementary Table S1 and Figure  S7 online). This result indicated that it is important to set a quality standard of the band purity after electrophoresis to ensure the quality of this analysis. In addition, it is important to clarify the detection limit of the lowest PERV-A/C copy number. However, the primers used in this Amplicon-Seq analysis bind to not only PERV-A/C but also PERV-C, thus the PCR amplified products consist of both PERV-C and PERV-A/C. Since PERV-C mRNA is constantly expressed in cells which possess PERV-C proviruses in their genome, the only abundance ratios contribute to the sensitivity of the analysis.
Amplicon-Seq analysis in this study is assumed to be a complementary method for the detection of PERV-A/C in patients after xenotransplantation of porcine cell-based therapeutic products. Additionally, this approach could be applied not only as a detection method for the emergence of PERV-A/C in porcine organs, tissues, or cells intended for transplantation, but also an option to estimate the amount of other PERVs. There have been several attempts of developing pigs in which their PERVs were inactivated by using genome editing technologies 39,40 . However, the previous and most of ongoing porcine pancreatic islet products were derived from normal pigs 9,27 . Therefore, the importance of the analysis would continue in terms of the safety use and public health.

Materials and methods
Model viruses and generation of PERV env standard. To generate PERV-A env cDNA, total RNA was extracted from 293-PERV-PK-CIRCE (Public Health England) 13 , which have been infected with both PERV-A and -B but not with PERV-C, using an RNeasy Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. RT reaction was conducted by using a ReverTra Ace qPCR RT Kit (TOYOBO, Osaka, Japan) and 5′-CTA GAG GTC AGT TTC TCC TTG GCT CAG AAG GCC-3′ (PERV-A-Rev) which binds to 3′ end region of PERV-A (Y12238) env as the gene specific primer. PCR was then performed by using KOD -Plus-Neo (TOYOBO) and primers of 5′-ATG CAT CCC ACG TTA AGC CGG CGC C-3′ (PERV-A-For), which binds to 5′ end region of PERV-A (Y12238) env, and PERV-A-Rev. The PCR cycling condition consisted of 2 min at 94 °C followed by 35 cycles of 98 °C for 10 s and 68 °C for 1 min (two-step PCR). Amplified products (1983 bp) were www.nature.com/scientificreports/ then cloned into the pTA2 vector (TOYOBO) and its nucleotide sequence was confirmed (pTA2/PERV-A env).
To generate PERV-C env cDNA, total RNA was extracted from pig blood (SWINE BLOOD; Rockland, Limerick, Ireland), which was collected in accordance with relevant guidelines and regulations at the licensed Rockland facilities (USDA #23-R-184, NIH BPA#00008695, and NIH Assurance OPRR #A4062-01), using a NucleoSpin RNA Blood (TAKARA, Shiga, Japan) following the manufacturer's instructions. RT reaction was conducted using a ReverTra Ace qPCR RT Kit (TOYOBO) and 5′-CTA GCG GCC AGC TTC CCT GCT AGA CGG-3′ (PERV-C-Rev) which binds to 3′ end region of PERV-C (AF038600.1) and -A/C (AF417230.1) env as the gene specific primer. PCR was then performed by using KOD -Plus-Neo (TOYOBO) and primers of 5′-ATG CAT CCC ACG TTA AAC CGG CGC CAC CTC CCG-3′ (PERV-A&C-For), which binds to 5′ end region of PERV-A (Y12238), -C (AF038600.1), and -A/C (AF417230.1) env, and PERV-C-Rev. The PCR cycling condition consisted of 2 min at 94 °C followed by 35 cycles of 98 °C for 10 s and 68 °C for 1 min. The pig blood was purchased from a commercial source (Rockland) and no pigs have directly been handled as a part of this study. Amplified products (1917 bp) were then cloned into the pTA2 vector and its nucleotide sequence was confirmed (pTA2/PERV-C env). PERV-A/C env in which a portion of the env has been recombined with that of PERV-A was generated by site-directed mutagenesis using the PCR-mediated overlap primer extension method 41 . The recombination site of the PERV-A/C was based on a previous report (PERV-AC env (13653)) 23 . To generate the PERV-A/C env, 3 successive PCR reactions were performed using pTA2/PERV-A env and pTA2/PERV-C env as templates. The first PCR reaction used pTA2/PERV-A env as a template and PERV-A&C-For and 5′-GGT GTT GGT GGG ATG GGG GAA CCT TCCC TAT GCA GGT GCC TTT TCC -3′ (underline; PERV-A (Y12238) env (1111-1131)) as primers, and the second used pTA2/PERV-C env as a template and 5′-GGA AAA GGC ACC TGC ATA GGG AAG GTT CCC CCA TCC CAC CAA CAC C-3′ (underline; PERV-C (AF038600.1) env (1068-1093)) and PERV-C-Rev as primers. The resultant first (1131 bp) and second (849 bp) fragments were used as templates in the third reaction with PERV-A&C-For and PERV-C-Rev as primers. The resultant products (1980 bp) were then cloned into the pTA2 vector and its nucleotide sequence was confirmed (pTA2/PERV-AC env (13653)). The sequence was nearly identical (98.8%) to that of the previous reported PERV-A/C (AF417230.1) env (see Supplementary Fig. S8 online).
To synthesize RNAs of PERV-C and PERV-A/C env, in vitro transcription reactions were performed using Not I-linearized pTA2/PERV-C env and pTA2/PERV-AC env (13653) as templates (RiboMAX Large Scale RNA Production System-T7, Promega, Madison, WI, USA). The synthesized RNAs were purified using PureYield RNA Midiprep System (Promega), and the purified RNAs were then used as model viruses.
RT-PCR and cDNA purification. Briefly, 1 ng of PERV-C env RNA was mixed with serial diluted 0.1, 0.01, or 0.001 ng of PERV-A/C env RNA, and the mixtures were used as templates in RT-PCR reactions. RT reaction was conducted using a ReverTra Ace qPCR RT Kit (TOYOBO) and PERV-C-Rev which binds to 3′ end region of PERV-C (AF038600.1) and PERV-A/C (AF417230.1) as the gene specific primer. PCR was then performed by using KOD -Plus-Neo (TOYOBO) with primers of PERV-A&C-For which binds to 5′ end region of PERV-C (AF038600.1) and PERV-A/C (AF417230.1) and PERV-C-Rev. The PCR cycling condition consisted of 2 min at 94 °C followed by 35 cycles of 98 °C for 10 s and 68 °C for 1 min. Amplified products were then separated by agarose gel electrophoresis and products of 2kbp were purified using the FastGene Gel/PCR Extraction Kit (NIPPON Genetics, Tokyo, Japan). The purity of the products was confirmed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).
Sequencing. DNA samples were further purified using the AMPure XP (BECKMAN COULTER, Brea, CA, USA), blunt-ended, and ligated to SMRTbell adaptors using the SMRTBell Template Prep Kit 1.0 (Pacific Biosciences, Menlo Park, CA, USA) in order to construct SMRTbell libraries. One SMRT cell was used for each library and analysed using the PacBio RS II platform (Pacific Biosciences).
True positive rate of the analysis. The mean number of PERV-A/C reads detected in a single experiment was calculated (λ), and the probability of detecting k reads in a single experiment was calculated by the following probability mass function of the Poisson distribution: where k! is the factorial of k and e is the Euler's number (e = 2.71828…). P(k) = e − k /k!