Main

The aetiology of childhood leukaemia is unclear in 95% of all cases. The development of some acute lymphoblastic leukaemias (ALL) can be traced back to a prenatal origin, starting in utero with a pre-leukaemic clone, followed by the postnatal acquisition of genomic losses or gains originating from the prenatal cytogenetic aberration. The evidence of an in utero first event is documented by molecular studies from neonatal blood spots (NBS), cord blood, twin studies and space-time clustering data (Gustafsson and Carstensen, 2000; Greaves et al, 2003; Greaves 2005, 2006a, b, 2009; Bateman et al, 2010). Many leukaemic clones found at the time of diagnosis in ALL can also be detected in NBS, such as ETV6-RUNX1 fusion, hyperdiploid ALLs, infant MLL-rearrangements, Philadelphia chromosome-positive (Ph+) ALLs and rare forms of infant T-cell ALLs (Wiemels et al, 2002; Maia et al, 2004; Eguchi-Ishimae et al, 2008; Bateman et al, 2010; Cazzaniga et al, 2011; Mansur et al, 2011, 2015; Alpar et al, 2014). Concerning the MLL gene fusions, there is evidence of transplacental chemical carcinogenesis (Greaves, 2005). However, the cause of the cytogenetic aberrations in the majority of leukaemias remains unknown.

Several human tumour DNA viruses are known to be involved in the development of a malignant clone (Lindblom et al, 2005; Weitzman and Ornelles, 2005; Hudnall et al, 2008; Chang and Moore, 2012; Chabay and Preciado, 2013). These viruses can persist in lymphoid cells and suppress double-stranded DNA-break repair, also known as the DNA damage response (DDR). As the DDR protects the genome from the accumulation of deleterious mutations, its down-regulation is associated with an increased risk of clonal development. This suggests that a virus infection may generate the aberrant clones of lymphocytes that precede the development of ALL (Boggs, 1999; Hollingworth and Grand, 2015; Ornelles et al, 2015).

Although epidemiological evidence suggests that ALL may be initiated by in utero infection with a common pathogen, the identification of such a pathogen has remained elusive (Gustafsson and Carstensen, 1999; Wiemels et al, 1999; Canfield et al, 2004; Isa et al, 2004). Unlike virus-specific methods, unbiased next-generation sequencing (NGS) can provide a holistic picture of the virome of ALL patients, thus facilitating the identification of viral candidates that might lead to ALL development. Unbiased NGS has been widely used to study the human virome during the last decade, aiding in the detection of known and unknown viruses from both isolated cases and major disease outbreaks (Cox-Foster et al, 2007; Palacios et al, 2008; Towner et al, 2008; Nakamura et al, 2009; Schowalter et al, 2010; de Vries et al, 2011; Yozwiak et al, 2012).

The aim of this study is to identify viral candidates that could be involved in the first step of leukaemogenesis by investigating whether in utero infections of DNA viruses are more common in newborns who later developed ALL. To accomplish this, unbiased NGS was used to characterise and compare the DNA virome at the time of birth from ALL and non-ALL children.

Materials and methods

Patients

Children who later developed ALL were randomly identified from the Nordic Society of Paediatric Haematology and Oncology register and linked to the Swedish Medical Birth register to gain access to the personal codes of mothers and children necessary for NBS identification. Controls matched for age and birthplace were also collected for further comparisons. In total, NBS from 95 children diagnosed with ALL between 1992 and 2006 and 95 non-ALL children were analysed. The patient population included 39 girls and 56 boys with a median age at time of diagnosis of 5 years (5 months–17 years). Eighty-five patients were diagnosed with a B-cell leukaemia, 9 with a T-cell leukaemia and one with a leukaemia of undifferentiated lineage. Cytogenetic characteristics at diagnosis corresponded to 25 children with hyperdiploidy, 2 with hypodiploidy, 8 with t(12;21), 2 with t(1;14), 2 with t(4;11) (MLL-rearrangements) and 3 positive for the Philadelphia chromosome t(9;22). One child was positive for both MLL and Philadelphia chromosomes. Overall, 20 children were diagnosed with other cytogenetic changes, 20 children were diagnosed with no chromosomal changes and no cytogenetic analysis was performed in 39 diagnosed very early. This study was approved by The Regional Ethical Review Board, Stockholm, Sweden.

Sample collection

NBS were collected at 2–5 days of age for screening for several inborn metabolic diseases. They consisted of four droplets of capillary blood, containing 3 × 10⁴ nucleated cells/spot, blotted onto filter paper (Guthrie card), which were then dried and stored at 4 °C. The median storage time for the NBS was 9.9 years (2.6–22.8).

DNA isolation

DNA was extracted using the minimal essential medium (Barbi et al, 1996; Yamagishi et al, 2006) after punching four uniform discs of 3 mm in diameter from one of four blood spots. Extracted DNA was individually tested for the human albumin gene (ALB) using TaqMan Real-Time quantitative PCR (Laurendeau et al, 1999) to ensure the availability of DNA for subsequent DNA amplification and sequencing.

Unbiased NGS

Extracted DNA from the ALL patients and controls was independently pooled. DNA from both pools was randomly amplified with the illustra GenomiPhi V2 DNA Amplification Kit and sequenced using the Illumina MiSeq Sequencing System at Science for Life Laboratory, Stockholm, Sweden. A computational pipeline then estimated the composition of the viral community from the resulting sequencing data. The pipeline trimmed adapters and low-quality bases and discarded human sequences by a mapping approach. Multiple assembly strategies were applied to the remaining reads. Taxonomical annotation of contigs and unassembled reads was performed with BLASTN and BLASTX (Altschul et al, 1990) with an E value of 10⁻³ against frozen versions of the NCBI databases nt and nr, respectively, applying the lowest common ancestor algorithm in MEGAN5 (Huson et al, 2007). Further analyses were conducted on sequences that could not be classified with MEGAN. Here, (i) contigs and merged paired-end reads longer than 400 bp were selected, (ii) sequences were translated to proteins in all six reading frames, and (iii) virus-like motifs were searched using hmmscan from the HMMER3 suite (Eddy, 2011) against the Pfam-A v28.0 (Finn et al, 2014) and vFam-A 2014 (Skewes-Cox et al, 2014) protein family databases. Viral candidates identified by the pipeline with potential associations to ALL were later investigated using agent-specific PCR.
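Steps (i) and (ii) of the additional analysis can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names (select_long, six_frame_translations) are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are simply translated to X.

```python
# Illustrative sketch only; names and defaults are assumptions,
# not taken from the study's pipeline.

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def select_long(sequences, min_length=400):
    """Step (i): keep sequences strictly longer than min_length bases."""
    return {name: seq for name, seq in sequences.items() if len(seq) > min_length}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Step (ii): translate the three forward and three reverse frames."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

The resulting protein sequences would then be written to a FASTA file and passed to hmmscan for step (iii); that step is left out here because it depends on the local HMMER installation and database files.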

Results

All samples were positive for the ALB gene after DNA extraction, thus ensuring the presence of genetic material and excluding PCR inhibitors. The MiSeq Sequencing System generated approximately seven million paired-end reads from each pool. About 25% of the sequences were kept after quality filtering and 95% of the remaining reads mapped to the human genome. After filtering out human sequences, 5,112 and 3,259 reads remained from ALL and controls, respectively, and these were later assembled into contigs. Sequencing results are summarised in Table 1. Unassembled reads and contigs were then assigned a taxonomical classification as described in the ‘Materials and methods’ section.
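The principle behind the lowest common ancestor assignment can be sketched as follows. This is a naive illustration, not MEGAN's implementation, which applies additional filters on the BLAST hits that are not modelled here. The taxonomy in TOY_PARENT is a deliberately simplified, hypothetical tree, and the function names are assumptions.

```python
# Naive lowest-common-ancestor sketch over a toy taxonomy (illustrative only).

# Child -> parent links of a simplified, hypothetical taxonomy.
TOY_PARENT = {
    "HHV-6A": "Roseolovirus",
    "HHV-6B": "Roseolovirus",
    "Roseolovirus": "Herpesviridae",
    "Herpesviridae": "Viruses",
    "Parvovirus B19": "Parvoviridae",
    "Parvoviridae": "Viruses",
}


def lineage(taxon, parent):
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all hit taxa."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca
```

Under this toy tree, a read whose hits all fall within HHV-6A and HHV-6B would be placed at Roseolovirus, whereas a read hitting both HHV-6A and parvovirus B19 would only be placed at the root node Viruses.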

Table 1 Summary of sequencing results

Analysis of the libraries yielded two candidates potentially involved in leukaemogenesis. They corresponded to sequence hits detected only in the ALL library, assigned to HHV-6 and human parvovirus B19. Viral assignments of unassembled reads and contigs with BLASTN and BLASTX, and visualisation of virome datasets are shown in Table 2 and Figure 1, respectively. To further investigate a putative role of HHV-6 and parvovirus B19 in ALL development, the presence of the viruses in the original samples was determined by real-time PCR.

Table 2 Viral assignments of reads and contigs with BLASTN and BLASTX
Figure 1

Visualisation of virome datasets by MEGAN. Taxonomical annotation of contigs and unassembled reads with BLASTN and BLASTX with an E value of 10⁻³ against frozen versions of the NCBI databases nt and nr, respectively, applying the lowest common ancestor algorithm in MEGAN5 (Huson et al, 2007). (A) ALL BLASTN, (B) ALL BLASTX, (C) non-ALL BLASTN, (D) non-ALL BLASTX.

Besides HHV-6 and human parvovirus B19, assignments to human endogenous retroviruses (HERV) and Propionibacterium phage (PP) were detected in both libraries. HERVs are considered remnants of ancient retroviral infections, and around 8% of the human genome shows high similarity to HERV (Belshaw et al, 2004; Nelson et al, 2004). Although HERV sequences should be depleted during the human filtering step in the bioinformatics pipeline, the genomic variability of these regions with respect to the human genome reference sequence used in the mapping approach might allow HERV sequences to remain after host depletion. HERV assignments using BLASTN and BLASTX in ALL and controls are shown in Table 3. In particular, 15 sequences were assigned to HERV with BLASTX in ALL, whereas 4 and 1 sequences were assigned with BLASTX and BLASTN in controls, respectively. Although HERVs are generally considered to be harmless, replicatively active HERVs have been associated with carcinogenesis (Brodsky et al, 1993a, 1993b; Yin et al, 1997; Depil et al, 2002). Hence, the HERV strains found by NGS in ALL were further assessed by PCR. The same HERV composition was found in both ALL and controls, suggesting that there is no association between HERV and ALL in these samples. The other viral finding, PP, is considered part of the normal human microbial flora and a likely contaminant from skin contact. One sequence was assigned to PP with BLASTX in ALL, whereas 2 and 3 sequences were assigned with BLASTX and BLASTN, respectively, in controls. No follow-up PCR was performed for PP, as it was detected in both groups and is very unlikely to be associated with disease.

Table 3 HERVs assignments of reads with BLASTN and BLASTX

Regarding the additional analyses carried out on sequences not classified with MEGAN, one viral-like motif hit was found in ALL. The hit was a 24-amino-acid stretch, located in one of the contigs, that mapped to a nucleopolyhedrovirus protein (vFam_71, E value 2.2e−6). Nucleopolyhedrovirus is a genus of the family Baculoviridae, primarily pathogenic for insects and not capable of replicating in vertebrate cells (Summers, 1975). Owing to the lack of demonstrated pathogenicity of nucleopolyhedrovirus in humans, together with the difficulty of oligonucleotide design because of the limited sequence information obtained by NGS, we decided not to follow this finding up using PCR.

HHV-6 and parvovirus B19 determinations by qPCR

Extracted DNA from each ALL and control sample previously analysed by NGS was subjected to real-time quantitative PCR for HHV-6 and parvovirus B19 to explore their possible association with the development of ALL.

HHV-6 can be chromosomally integrated (ciHHV-6) into the human genome (Pellett et al, 2012) and transmitted to offspring after germ-line virus infection (Daibata et al, 1998). Therefore, the determination of an in utero HHV-6 infection by qPCR might be confounded by the presence of vertically transmitted ciHHV-6 unless a clear difference between the number of positive PCR reactions in ALL and controls is observed. HHV-6 was detected by PCR (Collot et al, 2002) in only 2 out of 94 ALL patients (one patient sample had insufficient material) and 3 out of 95 controls. Interestingly, the fact that HHV-6 was found in the control group by qPCR but not by NGS could be attributed to the very low copy numbers of HHV-6 viral genomes in the samples (free or integrated viral particles) and/or the high host content masking the presence of viruses during unbiased NGS. Regarding parvovirus B19, viral determinations were carried out as described previously (Broliden et al, 1998). The finding of parvovirus was confirmed in only 1 out of 94 patient samples and in none of the 95 controls. According to these results, HHV-6 and parvovirus B19 in utero infection did not have a major role in ALL development.
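The text above does not report a formal statistical test for these counts. Purely as an illustration of how such small 2 × 2 tables could be compared, the following sketch applies a two-sided Fisher's exact test to the reported numbers; the choice of test and the function name are assumptions and not part of the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x positives in the first row given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(
        p
        for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Counts as reported: positives and negatives in ALL versus controls.
p_hhv6 = fisher_exact_two_sided(2, 92, 3, 92)  # HHV-6: 2/94 vs 3/95
p_b19 = fisher_exact_two_sided(1, 93, 0, 95)   # B19: 1/94 vs 0/95
```

With these counts both p-values lie far above the conventional 0.05 threshold, which is in keeping with the interpretation that neither virus differs in frequency between the groups.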

Discussion

The role of viral infections during pregnancy in the development of childhood ALL has been investigated in several studies but no specific virus has been defined (Smith, 1997; Eden, 2010). Our group has previously examined the presence of adenoviruses, herpesviruses (cytomegalovirus, Epstein–Barr virus and HHV-6), polyomaviruses and parvovirus B19 by PCR from NBS in children who later in life developed ALL. So far, only adenovirus was identified in the NBS (Priftakis et al, 2003; Isa et al, 2004; Gustafsson et al, 2006, 2007, 2012; Gustafsson and Bogdanovic, 2007) where an increased frequency of adenoviral DNA in ALL versus non-ALL controls was observed. However, this trend could not be replicated in an extended study (Gustafsson et al, 2012).

The methods used in previous studies for pathogen identification could detect only the viral sequences targeted by the chosen primers. In this study, unbiased NGS was applied to search for viral candidates in NBS from newborns who later developed ALL. We opted to investigate only the DNA virome because of the nature of the starting material. The available samples consisted of extracted DNA from NBS used in previous studies, in which we did not expect sufficient amounts of intact RNA. This technology facilitated the screening of all potential pathogens simultaneously and was capable of inspecting the virome in NBS, which contained low numbers of free viral particles, bacteriophages, intracellular viruses and proviruses together with a large number of white blood cells. Among the commonly detected viruses, the presence of HERV in both the patient and control groups is likely explained by the detection of retroviral genomes integrated in the human genome, favoured by the high host background in the samples. Similarly, the bacteriophage sequences found in both groups correspond to part of the microbial flora in human material, and as such the phage was not considered a potential pathogen.

HHV-6 and parvovirus B19, both ubiquitous pathogens (Cohen and Buckley, 1988; Lamont et al, 2011), were the only viruses identified exclusively in the ALL group and were, therefore, further investigated. HHV-6 is a double-stranded DNA virus with observed oncogenic potential in vitro (Razzaque, 1990). A feature unique among the human herpesviruses is the integration of the whole HHV-6 genome into one of the human chromosomes and its germ-line transmission, which is detected in 1% of the general population (Pellett et al, 2012). The frequency of ciHHV-6 in children with ALL has been reported to be similar to that of the general population (Gravel et al, 2013). The detection of HHV-6 in both patients and controls by PCR could be associated with either active infection or detection of integrated viral genomes, which complicates the assessment of the role of viral infection in leukaemias. However, in line with the negative results of our previous studies (Bogdanovic et al, 2004), the fact that HHV-6 was detected in both groups and in few patients suggests that it does not have a major role in ALL development.

The role of parvovirus B19 in the pathogenesis of leukaemias has been speculated about, but has not been broadly examined. In children with ALL, parvovirus infection is associated with serious complications such as cytopenia (Lindblom et al, 2005). Approximately 30–50% of pregnant women are seronegative for parvovirus B19 and vertical transmission is possible following maternal infection in pregnancy (Lamont et al, 2011). In this study, parvovirus B19 was identified in one patient and in none of the controls; together with the negative results of our previous study, this provides no evidence to support a putative association between parvovirus B19 and ALL. It has been shown that parvovirus B19 DNA remains detectable in human tissues after past infection (Söderlund-Venermo et al, 2002), but it is difficult to conclude from the present findings whether any of the studied children were actually infected in utero.

All HERV strains detected by NGS in ALL were also found in controls. Although none of the strains tested seemed to be predominant in patients with ALL, the involvement of these mobile genetic elements cannot be fully discarded. Further studies are needed to investigate whether polymorphisms, recombination or transposition to oncogenes or tumour suppressor genes could possibly lead to genetic instability involved in the causation of ALL.

Torque teno virus (TTV), a single-stranded DNA virus, has been frequently detected in human samples (Okamoto, 1999, 2009), with a prevalence in the serum of healthy individuals of up to 90%. Moreover, TTV has also been detected during pregnancy in serum from mothers of children who later developed leukaemias (Bzhalava et al, 2012), and its transmission during pregnancy has been suggested in several studies (Gerner et al, 2000; Schröter et al, 2000). In our study, no TTV-like sequences were identified in NBS from ALL children or controls, indicating that primary infection with TTV may occur postnatally.

An important limitation of this study was the small amount of starting DNA and the high proportion of human DNA relative to viral DNA in the blood spots. Both factors limit the sensitivity of NGS for the detection of viruses. Despite these limitations, our approach was capable of detecting parvovirus (5.5 kb) in a pool of 95 samples containing high host background. After individual qPCR inspection, only one sample was positive (Ct>27). Nonetheless, caution is needed when interpreting the significance of the negative findings.

In conclusion, unbiased NGS was employed to search for potential infectious DNA agents in neonatal samples from children who later developed ALL. NGS was capable of identifying viral candidates in NBS from ALL patients despite the high host background in these samples. However, further investigation by PCR suggested that the viruses reported did not have a major role in ALL development.