Metagenomic analysis and identification of emerging pathogens in blood from healthy donors

Emerging infectious pathogens that threaten blood transfusions are known to be present in blood samples from healthy/qualified donors. The objective of this study was to investigate the microbiome of blood from healthy donors from the Luzhou area in southwestern China. Potential pathogens and cytomegalovirus (CMV) infection in the donor blood were identified. Total plasma nucleic acids were extracted from one pool of 5734 samples and used to construct libraries for metagenomic analysis by Illumina sequencing. The microbiome and potential emerging/re-emerging pathogens were identified using bioinformatics analysis. Moreover, CMV antibodies were measured via an enzyme-linked immunosorbent assay, and the CMV DNA level was assessed by quantitative real-time PCR. A total of 132 bacterial reads, 65 viral reads and 165 parasitic reads were obtained. The most frequent bacterium was Escherichia coli (95/132, 72%), and the most prevalent parasite was Toxoplasma gondii (131/165, 79%). Among the viruses, cytomegalovirus (44/65, 68%) accounted for the highest frequency, followed by Hepatitis E Virus (10/65, 15%). Moreover, the positive rate of CMV-IgG was 46.25% (2652/5734), and the positive rate of CMV-IgM was 5.82% (334/5734). The rate of dual-positive (IgG+ and IgM+) CMV was 0.07% (4/5734). Twenty-one (0.37%) of the 5734 donated blood samples were positive for CMV DNA. The CMV DNA levels ranged from 7.56 × 10² to 3.58 × 10³ copies/mL. The current study elucidated the microbiome structure in blood from healthy/qualified donors in the Luzhou area and identified emerging/re-emerging pathogens. This preliminary study contributes information regarding blood transfusion safety in China.

Illumina HiSeq 4000 sequencing and bioinformatics analysis. The cDNA libraries were sent to Novogene (Tianjin, China) for high-throughput sequencing on the Illumina HiSeq 4000 platform. The samples were used to construct the PE150 library, and upstream quality control (QC) of the raw data was completed. The QC consisted of three filtering steps followed by a quality check. First, the adaptor sequences were removed. Second, very low-quality reads were removed: a read in which more than 50% of the bases had Q ≤ 5 was considered low quality. Third, duplicate reads were removed. Finally, the sequences with Q30 > 70% were identified using MCS 2.0 software, resulting in approximately 2 GB of data. The raw data contained a large amount of nontarget sequences, which were mainly from the host (human). Data filtration was therefore necessary before further processing. All raw data were compared to the human genome using Bowtie2, an alignment program developed specifically for second-generation sequencing data that offers high efficiency, speed and accuracy. Reads that matched the human genome were treated as nontarget sequences and filtered out. The sensitive mode was selected, and all other parameters were left at their defaults. After filtration, the data were used for Blastn, Blastx and tBlastx sequence comparisons. Sequences with E > 10⁻³ were considered nonidentifiable. Because the input sequences were shorter, most of the data yielded results with smaller E-values.
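The low-quality and duplicate filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the in-memory read representation, the function names and the choice to treat reads with identical sequences as duplicates are assumptions made for illustration only. The thresholds (more than 50% of bases with Q ≤ 5) are taken from the text.

```python
from typing import List, Tuple

# Hypothetical in-memory representation: (sequence, per-base Phred scores).
Read = Tuple[str, List[int]]


def is_low_quality(quals: List[int], q_cutoff: int = 5,
                   max_fraction: float = 0.5) -> bool:
    """Return True if more than max_fraction of the bases have Q <= q_cutoff."""
    if not quals:
        return True  # assumption: an empty read carries no usable information
    low = sum(1 for q in quals if q <= q_cutoff)
    return low / len(quals) > max_fraction


def filter_reads(reads: List[Read]) -> List[Read]:
    """Drop low-quality reads, then drop reads whose sequence was already seen."""
    seen = set()
    kept = []
    for seq, quals in reads:
        if is_low_quality(quals):
            continue
        if seq in seen:  # assumption: duplicates are exact sequence matches
            continue
        seen.add(seq)
        kept.append((seq, quals))
    return kept


example = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGT", [35, 35, 35, 35]),   # duplicate sequence, removed
    ("TTTT", [2, 3, 4, 40]),      # 3 of 4 bases with Q <= 5, removed
    ("GGCC", [5, 5, 30, 30]),     # exactly 50% low bases, kept
]
clean = filter_reads(example)
```

In practice adaptor trimming and host-read removal would be done with dedicated tools before and after this step; the sketch covers only the two rules that the text states numerically.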
Filtered sequencing reads were mapped to reference genomes using the Burrows-Wheeler Aligner (BWA), alignment software that performs fast alignments of short sequences against a reference sequence. Taxonomy was then assigned from the set of matches for each sequence. If all results in the match set belonged to one species, the sequence was assigned to that species. If they belonged to different species in a single genus, the sequence was assigned to that genus, and if they belonged to different genera in the same family, it was assigned to that family. Based on this logic, all results underwent taxonomy allocation. Once all the results were obtained, the total species and dominant species of the microbiome in each sample could be statistically analyzed.
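The species/genus/family rule above amounts to assigning each sequence to the lowest rank on which all of its matches agree. The following Python sketch shows that logic under stated assumptions: the lineage dictionaries and the function name `assign_taxon` are hypothetical, and the study's actual implementation is not described in the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lineage record for one match: rank name -> taxon name.
Lineage = Dict[str, str]

RANKS = ("species", "genus", "family")  # most specific rank first


def assign_taxon(hits: List[Lineage]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the lowest rank shared by every hit, else None."""
    for rank in RANKS:
        names = {hit.get(rank) for hit in hits}
        if len(names) == 1 and None not in names:
            return rank, names.pop()
    return None  # no agreement up to family level, or no hits at all


e_coli = {"species": "Escherichia coli", "genus": "Escherichia",
          "family": "Enterobacteriaceae"}
e_fergusonii = {"species": "Escherichia fergusonii", "genus": "Escherichia",
                "family": "Enterobacteriaceae"}
k_pneumoniae = {"species": "Klebsiella pneumoniae", "genus": "Klebsiella",
                "family": "Enterobacteriaceae"}

same_species = assign_taxon([e_coli, e_coli])
same_genus = assign_taxon([e_coli, e_fergusonii])
same_family = assign_taxon([e_coli, k_pneumoniae])
```

This is consistent with Table 2 containing entries at different ranks, such as Escherichia coli (species) alongside Enterobacteriaceae and Burkholderiaceae (families).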
Detection of immunoglobulin G (IgG) and IgM antibodies to CMV with a commercial ELISA kit. The CMV levels in the 5734 blood samples were measured via ELISA. The CMV antibodies in specimen serum were detected using anti-CMV IgG and IgM ELISA kits following the manufacturer's protocol (Human anti-cytomegalovirus antibody IgG ELISA Kit and Human anti-cytomegalovirus antibody IgM ELISA Kit, Cusabio, USA). The selected ELISA-reactive samples were used as external controls on the first and last plate during each testing day as an additional QC measure. A sample was considered positive (S/C.O. ≥ 1) when its absorbance was greater than or equal to the cut-off value, which indicated the presence of CMV antibodies.

CMV DNA detection by real-time PCR assay. DNA was extracted from 200 μL of each serum sample using the QIAamp DNA Blood Mini Kit (Qiagen). The DNA extracts were stored at −80 °C before PCR analysis. All the ELISA-positive samples were tested for CMV (AY186194.1). Real-time PCR was used to detect CMV DNA in the plasma samples. Standard curves were generated using quantified DNA containing the targeted sequence in the CMV major immediate-early (MIE) gene, prepared by inserting a 136-bp conserved region fragment into a pTA2 vector. All real-time PCRs were performed on an ABI 7500 instrument (Applied Biosystems, Foster City, CA, USA) with 25 μL of the FastStart Universal SYBR Green Master (Rox) Kit (Roche) and 5 μL of DNA template. The primers Q-CMV-F (forward primer 5′-GAC TAT CCC TCT G TCC TCA GTA-3′) and Q-CMV-R (reverse primer 5′-AGA CAC TGG CTC AGA CTT GA-3′) were used to amplify a 136-bp segment from the MIE gene. Negative controls that used water as a template and positive controls that used 500 plasmid copies as a template were also included in each run. The cycling conditions were as follows:

Informed consent. Informed consent was obtained from all individual participants included in the study.

Scientific Reports | (2020) 10:15809 | https://doi.org/10.1038/s41598-020-72808-8 www.nature.com/scientificreports/
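As an illustration of the standard-curve quantification mentioned in the real-time PCR methods, the sketch below fits the usual linear model Ct = slope × log10(copies) + intercept to a plasmid dilution series and inverts it to estimate the copy number of an unknown sample. The dilution series and Ct values are invented for the example and are not data from the study; the function names are likewise hypothetical. Converting copies per reaction to copies/mL would additionally require the extraction and elution volumes, which the text does not fully specify.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float],
                       ct_values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold plasmid dilution series (copies per reaction) and Ct values.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [33.36, 30.04, 26.72, 23.40, 20.08]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = copies_from_ct(30.04, slope, intercept)
efficiency = amplification_efficiency(slope)
```

A slope near −3.32 corresponds to an efficiency close to 100%, which is a common acceptance check before unknown samples are quantified against the curve.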

Results
High-throughput sequencing results. After extraction, nucleic acid concentrations were quantified using a UV spectrophotometer (DNA/cDNA concentration should be 200 ng/µl). The sample libraries were sent to Novogene (Tianjin, China) for Illumina HiSeq 4000 high-throughput sequencing to obtain raw data. The workflow is shown in Fig. 1. The raw data from the HiSeq were deposited in the short reads archive of GenBank. The base percentage distribution and read qualities in data filtering are shown in Fig. 2A, B, respectively. The adaptor sequences, contamination and low-quality reads were removed from the raw reads. The results are shown in Table 1. A total of 1.38 GB of DNA data were obtained, including 2,967,242 clean reads. In parallel, 2.08 GB of cDNA data were obtained, including 3,450,046 clean reads (Table 1). The Q30 value, which indicates the percentage of bases with quality values larger than or

Sequence analysis of potential pathogens in the blood samples. To evaluate the microbial community and potential pathogens, an additional bioinformatics analysis was employed as described in Materials and Methods. The microbial community results show that 36.5% (132/362) of the sequences were from bacteria, 18% (65/362) from viruses and 45.5% (165/362) from parasites, as shown in Fig. 3A-C and Table 2. Potential pathogens that had fewer than 5 reads were removed. Table 2 shows the taxonomic categories of the 132 reads from bacteria, 65 reads from viruses and 165 reads from parasites. Among the bacteria, the most frequent species was Escherichia coli (72%) with 95 reads, followed by Zymomonas mobilis (11%) with 15 reads, Burkholderiaceae (5%) with 7 reads and Ralstonia pickettii (4%), Pseudomonas sp. (4%) and Enterobacteriaceae (4%), each with 5 reads (Fig. 3A and Table 2).
Among the parasites, Toxoplasma gondii (79%) accounted for the highest frequency with 131 reads, followed by Leishmania infantum (10%) with 16 reads and Plasmodium falciparum (6%) and Spirometra erinaceieuropaei (5%) with 10 and 8 reads, respectively (Fig. 3B and Table 2). Among the viruses, cytomegalovirus (68%) with 44 reads accounted for the highest frequency, followed by Hepatitis E Virus (15%) with 10 reads. Moreover, 2 viruses in Anelloviridae were detected, including Torque teno mini virus (9% with 6 reads) and Torque teno virus (8% with 5 reads) (Fig. 3C and Table 2).
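The within-group percentages quoted above follow directly from the read counts. The short Python sketch below recomputes them from the counts given in the text; the dictionary layout and the helper name `shares` are illustrative choices, not part of the study's analysis.

```python
from typing import Dict

# Read counts per taxon as reported in the text and Table 2.
READ_COUNTS: Dict[str, Dict[str, int]] = {
    "bacteria": {
        "Escherichia coli": 95,
        "Zymomonas mobilis": 15,
        "Burkholderiaceae": 7,
        "Ralstonia pickettii": 5,
        "Pseudomonas sp.": 5,
        "Enterobacteriaceae": 5,
    },
    "parasites": {
        "Toxoplasma gondii": 131,
        "Leishmania infantum": 16,
        "Plasmodium falciparum": 10,
        "Spirometra erinaceieuropaei": 8,
    },
    "viruses": {
        "Cytomegalovirus": 44,
        "Hepatitis E Virus": 10,
        "Torque teno mini virus": 6,
        "Torque teno virus": 5,
    },
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts into percentages of their group total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


group_totals = {group: sum(c.values()) for group, c in READ_COUNTS.items()}
group_shares = shares(group_totals)            # share of all 362 reads
bacterial_shares = shares(READ_COUNTS["bacteria"])

for group, counts in READ_COUNTS.items():
    for name, pct in shares(counts).items():
        print(f"{group:10s} {name:30s} {pct:5.1f}%")
```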
CMV antibody and CMV DNA detection by ELISA and real-time PCR. The CMV IgG and IgM antibodies in healthy/qualified blood donor samples were measured by ELISA. As shown in Table 3, a total of 5734 serum samples were collected and screened for CMV antibodies, of which 2986 samples tested positive (IgG or IgM), a rate of 52.08% (2986/5734). The positive rate of CMV-IgG was 46.25% (2652/5734), while the positive rate of CMV-IgM was 5.82% (334/5734). The positive rate of both CMV-IgG and CMV-IgM was 0.07% (4/5734). No significant differences in the positivity rates were detected among sex, age, residence, profession, or ethnicity. The positive rate of CMV-IgG was 78.91% in the 45-55 age group, which was higher than in the other age groups. These results highlight the urgent need to test for CMV antibodies in donor blood to ensure safety. A quantitative real-time PCR system was used to detect CMV DNA in the plasma samples. As shown in Table 3, 21 CMV DNA-reactive samples were found, accounting for 0.37% of the total blood samples (21/5734). Interestingly, the CMV DNA levels in the positive specimens were low (7.56 × 10² to 3.58 × 10³ copies/mL).

Table 2. Annotation statistics of potential blood sample pathogens. After high-throughput sequencing and data filtering, bioinformatics analysis was employed to evaluate the microbial community and potential pathogens. The results show that 32.5% of sequences were obtained from bacteria, 15.1% from viruses and 2.5% from parasites. The reads are specific and not an indication of environmental contaminants. After next-generation sequencing (NGS), a total of 1.98 Gb of data were obtained, including 3,967,242 paired reads and 1,983,621,000 bases (bp). All the sequences with Q30 > 70% were identified using the MCS 2.0 software, resulting in approximately 2 GB of data. After removal of human sequences, the data were used for Blastn, Blastx and tBlastx sequence comparisons against the NCBI library.

Table 3. Clinical characteristics of CMV-IgG+, CMV-IgM+ and CMV-NAT+ in blood from healthy donors. The IgG and IgM antibodies of CMV in the blood samples were measured by ELISA. CMV DNA was detected by quantitative real-time PCR. No significant differences in the positivity rates were detected among sex, age, residence, profession and ethnicity. Of the 2986 positive samples (IgG or IgM), the positive rates of CMV-IgG, CMV-IgM and both CMV-IgG and CMV-IgM were 46.25%, 5.82% and 0.07%, respectively. Low CMV DNA levels were noted in the positive specimens (7.56 × 10² to 3.58 × 10³ copies/mL). The positive rate of CMV-IgG was 78.91% in the 45-55 age group, which was higher than in the other age groups. In this blood donor population, 21 CMV DNA-reactive samples were found by real-time PCR, accounting for 0.37% (21/5734), and the positive rate of both CMV-IgG and -IgM was 0.07% (4/5734).
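The positivity rates reported for the CMV assays can be reproduced from the stated counts with a few lines of Python. This is only an arithmetic check on the figures given in the text; the variable and function names are illustrative.

```python
TOTAL_DONORS = 5734

# Positive counts as stated in the text.
POSITIVE_COUNTS = {
    "CMV-IgG": 2652,
    "CMV-IgM": 334,
    "CMV-IgG and CMV-IgM": 4,
    "CMV DNA": 21,
}


def positive_rate(positives: int, total: int) -> float:
    """Return the positivity rate as a percentage of all tested samples."""
    if total <= 0:
        raise ValueError("total must be a positive number of samples")
    return 100.0 * positives / total


rates = {
    marker: round(positive_rate(count, TOTAL_DONORS), 2)
    for marker, count in POSITIVE_COUNTS.items()
}

for marker, rate in rates.items():
    print(f"{marker}: {rate:.2f}% ({POSITIVE_COUNTS[marker]}/{TOTAL_DONORS})")
```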

Discussion
The advantages of metagenomics technology in blood transfusion research include high efficiency and broad pathogen coverage. In recent years, metagenomics technology has been used to analyze inorganic environments, including the ocean 17 and soil 18, and has also proven remarkably useful in studies of pathogens carried by animals such as birds, bats, turkeys and sea turtles 19. The results from such analyses have allowed for the description of the microbiomes of these animals 20,21. Here, we employed Illumina HiSeq 4000 high-throughput sequencing for metagenomics analysis to resolve the microbiome in the blood of 5734 healthy/qualified donors collected from 2017 to 2018 in the Luzhou area in southwestern China. We identified the taxonomy of emerging/re-emerging pathogens and cytomegalovirus (CMV) infection using bioinformatics analysis, ELISA and quantitative real-time PCR.
In this study, we assessed the microbiome structure and demonstrated that healthy/qualified blood donors in southwestern China might carry emerging/re-emerging pathogens, including low-level CMV infection. We also showed that Toxoplasma gondii was the most prevalent parasitic pathogen, followed by Leishmania infantum and Plasmodium falciparum. Toxoplasma gondii infection is typically silent and is most commonly transmitted by animals 22,23. Close contact between humans and infected animals is one of the major transmission routes of Toxoplasma gondii infection 22-24. Moreover, Toxoplasma gondii infection can be transmitted through blood transfusion. Populations with low and defective immune function are particularly susceptible to acquiring Toxoplasma gondii infection from blood transfusion and can suffer severe consequences 25,26. The blood collection and supply system in China does not perform routine screening for toxoplasmosis; however, whether Toxoplasma gondii detection should be performed for certain blood recipient populations is worth consideration. Furthermore, DNA fragments of pathogens that are considered threats to blood transfusion safety in Europe and America 8, including P. falciparum and L. infantum, were discovered in this study. Considering that malaria and Leishmania infection are currently resurging 27,28 and that Luzhou and the surrounding region are within the endemic area, the blood collection and supply system should enhance its surveillance of these parasites in donated blood samples. Many types of bacteria were also identified in this study, including Escherichia coli, which accounted for the highest frequency. These bacteria can potentially cause chronic infection in the blood and bone marrow. Contamination may also occur due to improper disinfection during blood collection or experimental processes. Therefore, blood collection personnel should maintain high disinfection standards when manipulating blood samples.
Previous studies have shown a certain prevalence of CMV in Chinese blood donors 29. Interestingly, we found that the viral load of CMV infection was low (below 10⁴ copies/mL) in southwestern China. Although a high number of CMV reads were detected in samples from the Luzhou area, the positive rate of both CMV-IgG and CMV-IgM was low, and the quantitative DNA levels ranged from 7.56 × 10² to 3.58 × 10³ copies/mL. CMV infection, characterized by host immunosuppression, is most commonly transmitted through blood transfusion and causes an asymptomatic infection or mild flu-like symptoms 30. Clinical trials have found that primary CMV infection is typically silent in pregnant women, healthy children and adults 31. Populations with low and defective immune function are particularly susceptible to acquiring CMV infection from blood transfusion and can suffer severe consequences. The blood collection and supply system in China does not perform routine screening for CMV 32. The data from this study suggest that CMV detection should be considered for certain blood recipient populations. Future studies are required to isolate viruses from the 21 qPCR-positive CMV samples. Interestingly, the positive rate of CMV-IgG was 78.91% in the 45-55 age group, which was higher than in the other age groups. We speculate that this may depend on their habits or immunity.

Other viruses were identified in this study, including Hepatitis E Virus and 2 types of viruses in Anelloviridae: Torque teno mini virus and Torque teno virus. Anelloviridae infection causes a broad range of clinical manifestations as well as asymptomatic infection in humans 33,34. Currently, data on these viruses in China are scarce. The infection rate of Anelloviridae in healthy/qualified populations in countries such as Japan is close to 100%, and the infection rates in Great Britain and America are approximately 10% 33,35. A high viral load of Anelloviridae infection has been shown to cause some clinical symptoms in humans 34,36; however, whether these viruses can cause disease remains unclear.
This study identified pathogens in the microbiome of donated blood samples and discovered emerging pathogens that are already present in the blood supply. These pathogens therefore pose a risk yet are not being tested for in the blood supply. Because of constraints related to the number of collected samples and time, we were unable to perform a comprehensive analysis that is truly reflective of the prevalence of emerging/re-emerging pathogens in healthy/qualified blood donor samples in southwestern China. Our data suggest that parasites should be an area of focus for blood donors in the Luzhou area. These prospective results obtained using metagenomics provide references for the surveillance of certain pathogens. Large-scale epidemiological surveys targeting specific parasites should be performed to understand the actual prevalence of these parasites in blood from healthy/qualified donors.