Metagenomic analysis of a throat swab sample collected in China on a patient infected with Varicella Zoster Virus

Varicella Zoster Virus (VZV) is endemic worldwide, causing varicella in children and zoster upon reactivation in adults. This study concerned a metagenomic analysis of a throat swab sample collected in China, on a young patient suffering from Systemic Lupus Erythematosus (SLE) and diagnosed with varicella. The complete genome sequence of a VZV strain of clade 2 has been generated. Clade 2 strains are the most prevalent in Asian countries. A comparison of 223 VZV genomes identified 77 clade specific markers, 20 of them specific to clade 2. The metagenomic analysis also identified sequences covering most of the genome of the bacteria Schaalia odontolytica also known as Actinomyces odontolyticus. VZV infection and bacterial infection in the context of SLE is further discussed. Even though the patient presented only mild symptoms, this study is a reminder that vaccination against VZV is critical to avoid severe complications like bacterial superinfection or even death in the case of immunodeficiency.


Introduction
Varicella Zoster Virus (VZV) belongs to the alphaherpesviruses and is also known as human herpesvirus 3 (HHV3) 1 . VZV genome is a double stranded DNA of around 120 kb. It consists of two regions, large and short ( L and S ), each consisting of unique and repeat sequence (U and R). Each U region is anked by a repeat sequence, called terminal (T) and internal (I). In addition to the genome structure TR L U L IR L IR S U S TR S , the genome contains 6 repeat regions called IR1, 2, 3, 4a, 5 and 4b. IR4 is located in IR S and is inversely duplicated in TRs. The genome encodes 73 open reading frames (ORFs), 3 of them being inversely duplicated in TR S . Sequence analysis identi ed 7 distinct clades (1,2,3,4,5,6,9). A putative clade VIII has been reported once 2 .
VZV infection usually causes varicella or monkeypox in children and zoster upon reactivation in adults. In addition to varicella and zoster, VZV infection has been associated with severe complications like bacterial superinfection, pneumonia, hepatitis, nephritis, encephalitis and even death 1 .
A live attenuated VZV vaccine has been developed based on the clade 2 Oka strain. Currently, many countries have introduced varicella vaccine in universal routine vaccination program and administered 2 doses of vaccine to control and prevent chickenpox 3 . In China, varicella vaccine became available in 1998 and is currently available in the private sector.
This study concerned a metagenomics analysis of a throat swab sample collected in China on a young patient suffering from lupus and diagnosed with varicella. In addition to generating the complete genome sequence of the VZV strain contained in the throat swab sample, the metagenomics analysis also identi ed bacterial sequences present in the sample. Finally, association between VZV and lupus is discussed.

Materials And Methods
Page 3/9

Ethical statement
The second session of the Ethics Review Committee of the National Institute for Viral Disease Control and Prevention (IVDC) at China Centers for Disease Control and Prevention (CDC) determined that the present study followed the working regulations of ethics review committee of Institute for viral disease control and prevention of Chinese Center for Disease Control and Prevention and therefore approved the study. The legal guardians of the patient involved in this study provided written informed consent to have data/samples from her medical records used in research. All methods were performed in accordance with the relevant guidelines and regulations.

Clinical sample
A throat swab sample was collected on a 10-year-old girl diagnosed with varicella at Children's Hospital of Chongqing Medical University. The date of onset was estimated on December 21 st 2018. The patient immediately found a rash all over her body after having contact with her sister with chickenpox. The patient began to exhibit macular papules that gradually turned into herpetic rash with itching, oral pain and no salivation. During the course of the disease, there was neither fever, spasm, disturbance of consciousness, headache, vomiting, unstable walking, skin/mucous membrane bleeding, abdominal pain, vomiting nor diarrhea. At the age of 9, the patient was diagnosed with Systemic Lupus Erythematosus (SLE) and Lupus Nephritis (LN) and was treated with glucocorticoids and immunosuppressants. The sample was collected in viral transport medium and stored at -80 °C until use.

Next Generation Sequencing (NGS)
Total DNA was extracted using QIAamp DNA Mini Kit (QIAGEN, Germany) then fragmented using the ultrasonicator S220 (Covaris, USA). A sequencing library was generated with the KAPA HyperPlus kit (Roche, Switzerland) and sequenced with the NovaSeq 6000 system (Illumina, USA). The paired-end reads have been deposited in the NCBI Sequencing Read Archive under the accession number PRJNA681411.

NGS analysis pipeline
Sequence quality was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Sequences were cleaned up with Trimmomatic 4 (http://www.usadellab.org/cms/index.php?page=trimmomatic). The sequences from the human genome (GRCh38) were depleted using the assembler Burrows-Wheeler Alignment tool (BWA) and Sequence Alignment/Map (SAM) tools 1.9 5,6 . The remaining sequences were processed in 2 ways: 1assembly with SPAdes 7 ( http://bioinf.spbau.ru/spades ); 2-mapping against the VZV reference strain Dumas (NC_001348) using BWA and SAMtools 1.9 5,6 . The VZV-related sequences were then de novo assembled using Sequencher 5.0 (Genecodes Corp., Ann Arbor, MI, USA). The sequence strategy as well as the number of reads at each step of the analysis pipeline are shown in Fig. S1. The full-length genome sequence was annotated in Artemis 16.0.0 8 based on the Dumas reference genome (NC_001348) and submitted to GenBank (MW316406). The viral strain was named VZV/Chongqing.CHN/2018/V [2]. For convenience, however, the sample is referred in the manuscript as SD14.

VZV sequence analysis
Two hundred thirty-two VZV genomes were downloaded from GenBank. Identical or incomplete sequences were discarded. The remaining 222 genome sequences were compared with the de novo SD14 sequence (Table S1) was submitted to GenBank (MW316406).

Results
Close to 230 million sequencing reads were analyzed (Fig. S1). VZV-related sequences (11,275 sequences mapped on the VZV reference genome sequence strain Dumas (NC_001348)) were assembled in Sequencher. The de novo assembly generated 4 contigs. Gap sizes were estimated based on the Dumas strain, from 11 to 18 nucleotides. SD14 full genome was estimated at 125,184 nt long and 46.13% G+C content. This is in the same range as other HHV 3. For example, Dumas strain was estimated at 124,884 nt and 46.02% G+C 16 . As expected, SD14 genome encoded all known 73 ORFs, ORF62, 63 and 64 being duplicated in reverse direction in the TRs. A comparison with 222 genomes identi ed 12 SNPs only found in the SD14 genomic sequence (Table 1). All 12 positions were located within ORFs but only 3 nucleotide substitutions were non-synonymous changes, L135R in ORF6, K98T in ORF37 and N47S in ORF54. None of these 3 non-synonymous substitutions were predicted to have an effect on protein function based on PROVEAN analysis.
The comparative genomic analysis of 223 genomes identi ed 2880 SNPs (Table S2). Phylogenetic trees based on the 2880 concatenated SNPs identi ed in the 223 genomes were generated (Fig. 1, Fig. S2).
Both NJ and ML trees showed that SD14 strain belonged to clade 2 and could therefore be formerly named as VZV/Chongqing.CHN/2018/V [2]. The SNPs analysis identi ed 77 positions that were conserved within a clade and could therefore be considered as clade markers (Fig. 2, Tables S2-3). The number of markers was highly variable depending on the clade, from 20 in clade 2 to only 1 in clade 3. Furthermore, the distribution of the markers throughout the genome was not random. For example, 2 markers were identi ed within the 70-80 kb region of the genome whereas 15 markers were identi ed within the 90-100 kb genomic region. Finally, 34 of the 70 ORFs featured at least one clade speci c marker. If we consider the number of markers and the size of the ORF, ORF60 with 480 amino acid (AA) and 2 markers featured the most whereas ORF31 featured the less with 2 markers among 2796 AA. Regarding the 20 clade 2 speci c markers (in purple in Fig. 2), 18 were located within ORFs and 6 were non-synonymous changes: C1159R in ORF28, T136P in ORF31, P374F in ORF33, E128D in ORF54, H69P in ORF57 and A107T in ORF60 (Tables S2-3). The vaccine strains related to clade 2 Oka strain featured an additional marker (g911191t) (Tables S2-3). SD14 did not feature this vaccine marker and was therefore not related to vaccine strains as it was shown in the phylogenetic trees (Fig. 1, Fig. S2). Phylogenetic trees on the 8 genomic regions were generated in order to identify any major recombination event (Fig. S3). Whereas some recombination events were identi ed among VZV genomes from clades 3, 6 and 9 (identi ed with * in Fig. S3), no evidence of major recombination event was identi ed for SD14 genome, as SD14 sequences were always found within the clade 2 cluster.
Close to 230 million sequencing reads were depleted from the sequences of the human genome GRCh38 (Fig. S1). The remaining sequencing reads (~33 million, 14.4% of the sequencing data) were assembled using SPAdes and 159,894 contigs were generated (Table S4). The size of the contigs was highly variable, from 0.5Mb to 78 nt, with an average of 731 nt. A blastn search among the 100 largest contigs is summarized in Table 2. The bacteria Schaalia odontolytica was the most prevalent hit with 15 hits and 2.2 Mb of cumulated contig size.

Discussion
This report concerned the metagenomic analysis of a throat swab sample collected in China from a young VZV patient. The phylogenetic analysis showed that this sample was of clade 2. Seven clades have been identi ed for VZV, one additional clade (VIII) is putative as only one strain has been reported so far (reviewed in 2 ). Clades 1 and 3 are mainly observed in Europe and Americas whereas clades 4 and 5 are frequently seen in people with African origin 17,18 . Clade 2 has been the dominant clade in Asian countries like Korea, Japan and China. Clade 6 and 9 have been reported recently and there is not enough data to conclusively assign a geographic region to these clades.
The number of complete VZV genome sequences has dramatically increased recently, 232 as of December 2020. Comparative genomic analysis showed that the VZV genome is very stable. The current analysis compared 223 genomes and identi ed 2880 SNPs, representing 2.3% of the genome. Several studies reported recombination events among VZV 9,19 . The present study did not identify any obvious recombination event within SD14 strain. Despite recombination events among VZV genomes, the present study identi ed positions that were conserved among clades meaning that the genomic regions containing these positions were not involved in recombination.
The present study concerned a throat swab sample. VZV samples are generally collected from rash vesicles. Among the 222 genome sequences analyzed in the present study, 137 (62%) were derived from vesicle uid or skin lesion. Unfortunately, the sample information was not available for 64 sequences (29%). None of the analyzed sequences were from throat swab sample. In addition to VZV-related sequences, the present metagenomic analysis identi ed multiple bacterial sequences. The most prevalent was from Schaalia odontolytica. The genome of Schaalia odontolytica has been estimated at 2.3Mb (GB ID NZ_CP040006). The present study reported a cumulative contig size of 2.2Mb suggesting that most of the bacterial genome can be detected in the sequencing data and it is likely that the patient suffered a severe oral bacterial infection. Schaalia odontolytica is also known as Actinomyces odontolyticus 20  presented an increased risk of disease ares if they were infected with VZV 29 . To our knowledge, the young patient showed only mild symptoms limited to skin rash. It is possible that the young age (10) of the patient might be the reason why she suffered a mild disease despite her immune de ciency. Most of the reports of an effect of VZV on SLE/LN patients concerned older patients with herpes zoster 29,30 .The patient was diagnosed with SLE and LN at the age of 9 and was treated orally with 5mg glucocorticoid daily. After suffering from varicella, she was prescribed with proprietary Chinese medicine, Siji Antiviral Oral Liquid, Lysine Inosite and Vitamin B12 Oral Solution and treated with Acyclovir for external use. However, as the treatment was not e cacious, hormonal treatment was replaced by Piperacillin-Tazobactam. In China, patients with SLE are often treated with immunosuppressants and glucocorticoid. As a consequence, these patients are easily affected by external factors and are physically weak. To our knowledge, any association between varicella disease and lupus remains to be reported.
In summary, this study concerned the metagenomic analysis of a throat swab sample collected in China from a young VZV patient suffering from SLE and LN. The VZV strain identi ed was of clade 2, clade prevalent in Asian countries. A comparison of 223 VZV genomes identi ed 77 clade speci c markers, among them 20 were speci c to clade 2. The metagenomic analysis identi ed sequences covering the entire genome of the bacteria Schaalia odontolytica also known as A. odontolyticus which have been linked to tooth decay as well as severe complications especially in immunocompromised patients. Even though the patient presented only mild symptoms, this study is a reminder that vaccination against VZV is critical to avoid severe complications like bacterial superinfection or even death in the case of immunode ciency.