How many papillomavirus species can go undetected in papilloma lesions?

A co-infection comprising to at least seven papillomavirus (PV) types was detected by next generation sequencing (NGS) of randomly primed rolling circle amplification (RCA) products of a bovine (Bos taurus) papilloma lesion from the Brazilian Amazon region. Six putative new PV types that could not be detected by commonly used PCR protocols were identified. Their overall L1 nucleotide identities were less than 90% compared to described PV species and types. L1 nucleotide BLAST sequence hits showed that each new type was related to Beta, Gamma, Dyokappa, Dyoeta, and Xipapillomavirus, as well as two likely new unclassified genera. Our results show that the employment of NGS is relevant to the detection and characterization of distantly related PV and is of major importance in co-infection studies. This knowledge will help us understand the biology and pathogenesis of PV, as well as contribute to disease control. Moreover, we can also conclude that there are many unknown circulating PVs.


Results
NGS from RCA products reveals distinct BPV genomes. NGS from RCA products of one papilloma lesion enabled the amplification of seven full-length PV genomes. The contigs associated with PV were assembled from 113,616 high-quality reads ( Table 1). The seven contigs were named BPV13 BR/14RO and putative BPV BR/14RO-16 to BPV BR/14RO-21. Chimeric forms were not detected using the RDP4 software. Primer alignment with the Geneious software (version 9) (http://www.geneious.com) 25 found mismatches in all sequences at the 3' end of the FAP59 and MY09 primers (Supplementary Figure 1). A low number of reads probably corresponding to the sequence 14RO_12 (GenBank accession number KP701419) was also identified in the sample. Since the genome was not complete it was not further analyzed.
The genomes are 7,149 to 7,961 Kb in length and display the archetypal organization of PVs (Fig. 1). The first nucleotide in E6 was assigned the number 1 in the sequences. All putative new viruses (BPV BR/14RO-16 to BPV BR/14RO-21) were predicted to contain six (BPV BR/14RO- 18 and 19) to seven ORFs (BPV BR/14RO-16, BPV BR/14RO-17, BPV BR/14RO-20 and BPV BR/14RO-21), coding for early (E6, E7, E1, E2, and E4) and late (L1 and L2) proteins (Fig. 1). The BPV13 variant sequence showed the same characteristics as the reference genome 17 . The overall L1 nucleotide identities of the putative new types were less than 90% in comparison to other PV species and types 1 .

Phylogenetic inferences.
Phylogenetic inferences showed that the six new PV genomes clustered with three known and two unknown PV genera (Fig. 2). Their nucleotide and amino acid identities to the closest related PVs are summarized in Table 2. L1 identities to the most closely related PVs with corresponding GenBank accession numbers following phylogenetic analysis are summarized in Table 2

Discussion
Remarkable efforts have been made to identify human PVs using numerous clinical arrays that can detect dozens of distinct HPV genotypes in the same sample 13,14 using several generic PV primer systems 17 . These efforts reflect over than 200 HPV genomes that are fully sequenced, characterized and cataloged (PaVE). In comparison to HPV, only 15 BPV types have been detected and fully sequenced thus far (PaVE). This scenario is probably pictured because of the lower efforts in BPV studies when compared to HPV due clinical relevance and funding applied. Co-infections comprising HPV are commonly reported in young or immunodepressed women [26][27][28] . On the other hand, BPV co-infections comprising up to six known PV types based on multiplex BPV genotyping assay 20 or specific primers 18,29 are rarely reported. Although the majority of BPV types and putative new types have been characterized using generic or genus-specific primers 8 because they allow the discovery of only closely related PVs but rarely detect mixed infections. In the present study, the combination of RCA and NGS allowed the detection of at least seven BPVs co-infecting the same lesion, including six putative new BPV types. An isothermal protocol that uses φ 29 DNA polymerase to amplify complete PV genomes was previously developed 31 . Following the amplification of the genome, there is the need to analyze the DNA using restriction enzymes, cloning, and sequencing which is labor-intensive and time-consuming. Additionally, multiply primed RCA and primer-walking for entire genome sequencing already enabled the discovery of novel PVs 21 . Moreover, there is no evidence of co-infection. One possibility is that the specific degenerate primer pairs used in these studies selected a virus population with higher affinity to primer binding than unknown viruses that may be present in the same sample. This could lead to an underestimation of the detection of other viruses in mixed infections and even a failure to detect distant phylogenetic PVs.
A RCA followed by NGS approach was applied to analyze a sample from which have been previously found a putative new BPV type 8 . The method enabled the detection of seven PVs with six being uncharacterized so far. The randomly primed RCA followed by NGS offers the possibility to amplify and detect any circular DNA that is present in a sample without specificity, thus allowing a great overview of unknown PVs. This technique could magnify the sensitivity of all PVs present in one sample and allow the understanding of the natural history of infection by different PV types. This approach is meaningful once there is more evidence suggesting that cervical infections caused by some HPV types may also depend on the existence of other HPV types 13 , thereby suggesting a synergistic pattern. NGS present some restrictions as limiting capability to find mutations like single nucleotide polymorphisms (SNP), insertions and deletions in regions of lower coverage 32,33 . To minimize the possible base calling errors, in the present study, only high quality reads (Q ≥ 30) were used for de novo assembly. Also, although this Illumina platform displays some underestimation in AT-rich and CG-rich regions 33 all putative new types and the BPV13 described in this study presented a GC content considered normal when compared to other PV family members.
The present method enables the detection of a large number of putative new types suggesting the existence of many other BPV types that may have been underestimated thus far. Such a massive PV co-infection indicates that these mammals can harbor genetically diverse PVs similar to humans. Additionally, the Amazon region ecosystem harbors one of the largest global biodiversities and it is quite propitious for the emergence of novel strains. However, there is a need for deeper investigations on this issue by applying this method in all PV affected animals, including other cattle herds worldwide and humans.
We have shown that the enrichment method together with the Illumina NGS platform works for a range of PV genera detection such as Dyokappa, Xi and Gammapapillomavirus. Moreover, the identity of three new types showed inter-genera localization, and these types probably compose two new genera in the Papillomaviridae family. These findings indicate a high number of undetected PVs ignored in the usual assays. Therefore, it is essential to use unbiased methods for the discovery of highly divergent novel viruses as has been done with numerous human and animal agents 34 .
Most of the xipapillomaviruses present E5 or E8 gene in the E6 genomic position, including BPV4, BPV9, BPV10, BPV11 and BPV15 (GenBank accession no. X05817, AB331650, AB331651, AB543507, and KM983393, respectively) that encode E5, and BPV3, BPV6 and BPV12 (GenBank accession no. AF486184, AJ620208, and JF834523, respectively) that encode E8. The novel xipapillomaviruses detected in the present study present E6 in the E6 genomic position as well as RtPV2 that clustered in the same terminal node that BR/14RO-20. BR/14RO-17 formed a separated branch within Xipapillomavirus and probably is a novel species. In conclusion, the combination of two relatively simple and fast methods already developed to amplify and genotype PV genomes proved to be very effective in the detection of known and unknown PV viruses using small amounts of DNA derived from one PV lesion. Furthermore, viral genomes can be largely reconstituted by currently available de novo assembly algorithms. The presence of multiple PV types and variants in the same lesion will allow the development of new studies regarding the roles of these different viruses in the biology and pathogenesis of the diseases in which they are involved.  was removed using scalpels after local anesthesia (performed with 2% lidocaine, Bravet, Brazil). One putative new PV type was previously detected when a L1 fragment was sequenced using FAP primers 8 . To obtain the complete genome sequence of this putative new type, the tissue was processed and genomic DNA was extracted as described previously 8 . To amplify the full PV genomes, randomly-primed rolling circle amplification (RCA) was performed essentially as described by Rector et al. 31 using 100 ng of purified DNA from the biopsy specimen. The amplicons were analyzed by agarose gel electrophoresis stained with Blue Green Loading Dye I (LGC, Brazil) and examined under UV light with the Molecular Imaging Software Gel Logic (Kodak, USA).

Material and Methods
Following RCA, DNA was purified using a Genomic DNA Clean & Concentrator (Zymo Research). The quality and quantity of the DNA were assessed using a Nanodrop spectrophotometer (Thermo Scientific) and a Qubit fluorometer (Invitrogen). DNA fragment libraries were further prepared with one ng of purified RCA DNA using a Nextera XT DNA sample preparation kit and sequenced using an Illumina MiSeq instrument (2 × 150 paired-end reads with the Illumina v2 reagent kit).
Genome assemblies and sequence analyses. The paired-end sequence reads were assembled into contigs using SPAdes 3.5 35 and compared to sequences in the GenBank nucleotide and protein databases using BLASTn/BLASTx. Geneious software version 9 25 was used for open reading frame (ORF) and conserved domain predictions as well as genome annotation. Motif Scan (http://myhits.isb-sib.ch/cgi-bin/motifscan) was used to confirm the conserved domain prediction pointed by Geneious software version 9 25 .
Similarities searches were performed using local sequence alignments BLAST 36 . Global sequence alignments were accomplished to determine sequence identities with MUSCLE software 37 . Representative sequences within genera and sequences with the highest identities to the sequences from the present study that are available in GenBank were retrieved from the NCBI homepage (http://www.ncbi.nlm.nih.gov/) for phylogenetic analysis. Altogether, the dataset consisted of 45 sequences of the L1 gene. The multiple sequence alignments was performed through the MUSCLE software 37 .
The phylogeny was reconstructed with a maximum likelihood method using the MEGA6 software 38 . These analyses were performed using the GTR substitution model, and the algorithm was modeled with a gamma distribution (shape parameter = 5). The nucleotide substitution model was defined by the tool "find best DNA/Protein model (ML)" of MEGA6 software 38 . Statistical support was provided by 1,000 non-parametric bootstrap analyses.
RDP4 software 39 , using the RDP 40 , GENECONV 41 , BOOTSCAN 42 , MAXCHI 43 , CHIMAERA 44 , SISCAN 45 and 3 SEQ 46 methods using default settings were used to look for the presence of chimeric genomes that can arise during the building of contigs. Recombination was considered credible in sequences only if they were detected by more than three methods having significant P values coupled with strong phylogenetic support for recombination. To verify any mismatches that could make difficult the annealing of the viruses detected with the degenerate primer regions, all generated sequences were aligned with primers FAP59/64 4 and MY09/11 47 using ClustalX software 38 .