Genomic investigations of unexplained acute hepatitis in children

Since its first identification in Scotland, over 1,000 cases of unexplained paediatric hepatitis have been reported worldwide, including 278 cases in the UK 1. Here we report an investigation of 38 cases, 66 age-matched immunocompetent controls and 21 immunocompromised comparator participants, using a combination of genomic, transcriptomic, proteomic and immunohistochemical methods. We detected high levels of adeno-associated virus 2 (AAV2) DNA in the liver, blood, plasma or stool from 27 of 28 cases. We found low levels of adenovirus (HAdV) and human herpesvirus 6B (HHV-6B) in 23 of 31 and 16 of 23 of the cases tested, respectively. By contrast, AAV2 was infrequently detected, and at low titre, in the blood or the liver from control children with HAdV, even when profoundly immunosuppressed. Phylogenetic analysis of AAV2, HAdV and HHV-6 excluded the emergence of novel strains in cases. Histological analyses of explanted livers showed enrichment for T cells and B lineage cells. Proteomic comparison of liver tissue from cases and healthy controls identified increased expression of HLA class II proteins, immunoglobulin variable regions and complement proteins. HAdV and AAV2 proteins were not detected in the livers. Instead, we identified AAV2 DNA complexes reflecting both HAdV-mediated and HHV-6B-mediated replication. We hypothesize that high levels of abnormal AAV2 replication products aided by HAdV and, in severe cases, HHV-6B may have triggered immune-mediated hepatic disease in genetically and immunologically predisposed children.

In March 2022, the report of five cases of severe hepatitis of unknown aetiology led the UK Health Security Agency (UKHSA) to identify 278 cases in total as of 30 September 2022 (ref. 1). Cases, defined as acute non-A-E hepatitis with serum transaminases of more than 500 IU/l in children under 10 years of age, were found to have been occurring since January 2022 (ref. 2). In the UK, 196 cases required hospitalization, 69 were admitted to intensive care and 13 required liver transplantation 1. Case numbers have declined since April 2022 (ref. 3).
UKHSA investigations found HAdV to be commonly associated with the unexplained paediatric hepatitis: 64.7% (156 of 241) of cases tested positive in one or more samples from whole blood (the most sensitive sample type 4) or mucosal swabs. HAdVs from the blood of 35 of 77 patients were typed as F41. Seven of eight patients in England who required liver transplantation tested HAdV positive in blood samples, and F41 was found in all five that were genotyped 2. SARS-CoV-2 infection was detected in 8.9% (15 of 169) of UK cases and 12.8% (16 of 125) of English cases 2.
Given the uncertainty around the aetiology of this outbreak, and the possibility that HAdV-F41, if implicated (Fig. 1a), could be a new or recombinant variant, we undertook untargeted metagenomic and metatranscriptomic sequencing of liver biopsies from five liver transplant cases and of whole blood from five non-transplanted cases (Table 1 and Fig. 1b). The results were verified by confirmatory PCRs of liver, blood, stool and nasopharyngeal samples from a total of 38 cases for which there was sufficient residual material. We compared our results with those from 13 healthy children and 52 previously healthy children presenting to hospital with other febrile illness, including HAdV infection, hepatitis unrelated to the current outbreak or a critical illness requiring admission to the intensive care unit. We also tested blood and liver biopsies from 17 profoundly immunosuppressed children with hepatitis who were not part of the current outbreak, in whom reactivation of latent infections might be expected.

Cases
We received samples from 38 children meeting the case definition ( Table 1). All cases were less than 10 years of age and 22 of 23 cases previously tested were positive by HAdV PCR (

Fig. 1 (caption, partial): Source: second-generation surveillance system data, that is, laboratory reports to UKHSA of a positive HAdV result from a laboratory in England, for any sample type. Dots represent the day of presentation for the 28 of 38 cases for which we had data. b, Case and control specimens by source. CMV, cytomegalovirus; HLH, haemophagocytic lymphohistiocytosis. c, Tests carried out by specimen type. More detail on samples tested and the results can be found in Tables 1 and 2. Not all tests were carried out on all samples due to lack of material. n refers to the total number of cases or controls. The numbers of each sample type may not sum to this total because samples of more than one type were sometimes taken from the same patient. For details, see Table 1. FFPE, formalin-fixed paraffin-embedded; tr, received liver transplant.
respectively). Mapping liver RNA-seq data to the RefSeq AAV2 genome (NC_001401.2) identified high expression of the Cap open reading frame, particularly at the 3′ end of the capsid, suggesting viral replication 6 (Extended Data Fig. 1a), and reverse transcription (RT)-PCR of two livers confirmed the presence of AAV2 mRNA from the Cap open reading frame (Extended Data Fig. 1c). In the blood samples, which had not been treated to preserve RNA, we detected low levels of AAV2 RNA reads mapping throughout the genome (Extended Data Fig. 1b).
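Reading "high expression ... particularly at the 3′ end" off an RNA-seq mapping amounts to comparing read depth along the genome. The sketch below is a minimal illustration, assuming that alignments have already been reduced to 0-based, half-open (start, end) intervals on the reference. The function names and the toy 100-bp genome are hypothetical; a real analysis would take alignments against NC_001401.2 from an aligner's output and use the annotated Cap coordinates rather than the arbitrary windows shown here.

```python
from typing import Iterable, List, Tuple


def depth_profile(alignments: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Per-base read depth from 0-based, half-open (start, end) alignment intervals.

    A difference array keeps the cost at O(reads + genome_length).
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval ({start}, {end}) lies outside the genome")
        diff[start] += 1  # coverage begins at `start`
        diff[end] -= 1    # and stops just before `end`
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def mean_depth(depth: List[int], start: int, end: int) -> float:
    """Mean depth over the half-open window [start, end)."""
    window = depth[start:end]
    return sum(window) / len(window)


# Hypothetical toy data: a 100-bp "genome" with reads piling up towards the 3' end.
toy_reads = [(0, 20), (60, 100), (70, 100), (75, 95), (80, 100)]
profile = depth_profile(toy_reads, 100)
print(mean_depth(profile, 0, 50), mean_depth(profile, 50, 100))  # 0.4 2.2
```

A difference array is used instead of incrementing every covered base, so the cost stays linear in the number of reads plus the genome length.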

Nanopore sequencing of explanted livers
Ligation-based untargeted nanopore sequencing was applied to DNA from four of five frozen liver samples. All four samples were initially sequenced at a lower depth (average N50 of 8.37 kb). Six to sixteen AAV2 reads were obtained from each sample (5.57-22.24 million total reads; Supplementary Table 3). Mapping revealed concatenation of the 4-kb genome, compatible with active AAV2 replication 7 . We observed alternating and head-to-tail concatemers, which could  Table 3). Of the reads in the more deeply sequenced datasets, 42-48% comprised randomly linked, truncated and rearranged genomes, with few that were intact and of full length (Extended Data Fig. 2). The remaining reads were less than 3,000 bp long and may represent sections either of monomeric genomes or of more complex structures.
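Head-to-tail and alternating concatemers can be distinguished by the strand of successive genome-unit alignments along a single long read. The following is a hypothetical sketch, not the authors' pipeline. It assumes that each read has already been split into ordered alignments of AAV2 genome units (for example, from an aligner's split or supplementary alignments), and it ignores the truncated and rearranged units described above.

```python
from typing import Sequence


def classify_concatemer(strands: Sequence[str]) -> str:
    """Classify one long read by the strands of its consecutive genome-unit alignments.

    strands: "+" or "-" for each AAV2 genome unit, in order along the read.
    Returns "monomer", "head-to-tail" (all units in the same orientation),
    "alternating" (orientation flips at every junction) or "mixed".
    """
    if not strands:
        raise ValueError("read has no genome-unit alignments")
    if any(s not in ("+", "-") for s in strands):
        raise ValueError("each strand must be '+' or '-'")
    if len(strands) == 1:
        return "monomer"
    same = [a == b for a, b in zip(strands, strands[1:])]
    if all(same):
        return "head-to-tail"
    if not any(same):
        return "alternating"  # head-to-head and tail-to-tail junctions alternate
    return "mixed"


for read in (["+"], ["+", "+", "+"], ["+", "-", "+", "-"], ["-", "-", "+"]):
    print(read, classify_concatemer(read))
```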

Integration analysis
Deeper nanopore sequencing of explanted livers gave some evidence of AAV2 integration (Supplementary Table 3); however, none of the integration sites was confirmed by Illumina metagenomic or targeted AAV2 sequencing. These putative sites are likely to be artefacts of this library preparation method; chimeric reads have been described to occur in 1.7-3% of reads 9,10. Given the number of human reads (72-120 million), we would expect this artefact to join AAV2 to human sequence far more often than to other AAV2 sequence.
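The argument about chimeric artefacts is an expectation under random joining. A minimal sketch, assuming (hypothetically) that a fixed fraction of AAV2 molecules is ligated to one partner drawn in proportion to abundance from a library containing only AAV2 and human molecules. The AAV2 read count in the usage line is an illustrative placeholder, not a value from this study.

```python
def expected_aav2_chimeras(n_aav2: int, n_human: int, chimera_rate: float) -> dict:
    """Expected AAV2-containing chimeric reads under random partner joining.

    Hypothetical model: a fraction `chimera_rate` of AAV2 molecules is joined to one
    partner molecule drawn in proportion to abundance from a library containing
    only AAV2 and human molecules.
    """
    if not 0 <= chimera_rate <= 1:
        raise ValueError("chimera_rate must be a fraction between 0 and 1")
    n_chimeric = chimera_rate * n_aav2
    p_human_partner = n_human / (n_aav2 + n_human)
    return {
        "aav2_human": n_chimeric * p_human_partner,
        "aav2_aav2": n_chimeric * (1 - p_human_partner),
    }


# Lower bounds quoted above (72 million human reads, 1.7% chimeras);
# the AAV2 read count is an illustrative placeholder.
print(expected_aav2_chimeras(n_aav2=1_000, n_human=72_000_000, chimera_rate=0.017))
```

Whatever placeholder is used for the AAV2 count, the expected AAV2-human artefacts dominate whenever human reads vastly outnumber AAV2 reads.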

Confirmatory real-time PCR
Where sufficient residual material was available, PCR tests were performed for AAV2 (28 of 38 cases), HAdV (31 of 38 cases) and HHV-6B (23 of 38 cases). The results confirmed high levels (cycle threshold (Ct) values: 17-21) of AAV2 DNA in all five frozen explanted livers that had undergone metagenomics (Table 2 and Fig. 2d), and lower levels of HHV-6B and HAdV DNA (Ct values: 27-32 and 37-42, respectively). AAV2 DNA was also detected (Ct values: 19-25) in blood samples from four of the five cases that had undergone metagenomics, whereas HAdV, at levels too low to genotype, and HHV-6B were detected in two of four and three of four cases, respectively (one case had insufficient material) (Table 2). One of the blood metagenomics cases (case 9, JBB1), with insufficient material to test for HAdV and HHV-6B, had tested positive for both viruses in the referring laboratory. The AAV2-negative blood sample (case 10, JBB15) was also negative for HAdV but positive for HHV-6B (Table 2). A further ten of ten blood samples tested from cases were positive for HAdV by PCR. Sufficient material was available for AAV2 PCR in six of these (all positive; Ct values: 20-23) and for HHV-6B PCR in two (one positive; Ct value: 37) (Extended Data Table 1). AAV2 PCR was positive in nine formalin-fixed paraffin-embedded (FFPE) liver samples, including seven from transplanted cases (Ct values: 23-25) and two from non-transplanted cases (Ct values: 34-36; Extended Data Table 1). HHV-6B PCR was positive in six of seven FFPE samples from transplanted cases (all except case 32; Ct values: 30-37) and in neither of the two from non-transplanted cases (cases 30 and 35); HAdV was positive (Ct values: 40-44) in four of the nine. Three transplanted (cases 32, 34 and 36) and three non-transplanted (cases 35, 37 and 38) cases had serum available for testing. All were AAV2 positive (Ct values: 27-32) and HHV-6B negative, with one transplanted case and one non-transplanted case testing HAdV positive (Extended Data Table 1).
In total, 27 of 28 cases tested were AAV2 PCR positive, 23 of 31 were HAdV positive and 16 of 23 were HHV-6B positive. When results from referring laboratories were included, 33 of 38 cases were positive for HAdV and 19 of 26 cases were positive for HHV-6B (Table 2 and Extended Data Table 1).
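Because each PCR cycle ideally doubles the template, a Ct difference can be read as an approximate fold difference in starting DNA. The sketch below assumes equal amplification efficiency for the assays being compared. This is a simplification: AAV2, HAdV and HHV-6B are measured with different assays, so the result is only an order-of-magnitude guide. The function name is illustrative.

```python
def fold_difference(ct_low: float, ct_high: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template implied by two Ct values.

    Assumes both assays amplify with the same efficiency (1.0 = perfect doubling
    per cycle). The lower Ct corresponds to more starting template.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    return (1 + efficiency) ** (ct_high - ct_low)


# Lower ends of the liver ranges above: AAV2 Ct 17 versus HAdV Ct 37.
print(f"{fold_difference(17, 37):.3g}")  # 1.05e+06
```

Under this assumption, the roughly 20-cycle gap between AAV2 and HAdV in the explanted livers corresponds to about a millionfold difference in DNA.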

Controls and comparators
To better contextualize the findings in cases with unexplained hepatitis, we selected control groups of children who were not part of the outbreak.

Blood from immunocompetent children
Whole blood from 65 immunocompetent children matched by age to cases (median age of 3.8 years; Fig. 1b, Extended Data Table 2a and Supplementary Table 4), who were healthy or had HAdV infection, hepatitis or critical illness (including illness requiring critical care), was selected from the PERFORM (personalised risk assessment in febrile illness to optimise real-life management; www.perform2020.org) and DIAMONDS (diagnosis and management of febrile illness using RNA personalised molecular signature diagnosis study; www.diamonds2020.eu) studies. Both studies recruited children presenting to hospital with an acute-onset febrile illness between 2017 and 2020 (PERFORM) and July 2020 to October 2021, during the COVID-19 pandemic
Table 2 (fragment): only the frozen explanted liver rows survived extraction, and the headings of column groups 2-4 were lost. Column group 1 gives PCR Ct values; every group lists AAV2, HAdV and HHV-6B in that order.
Case  Sample  Ct (AAV2 / HAdV / HHV-6B)  Group 2 (AAV2 / HAdV / HHV-6B)  Group 3 (AAV2 / HAdV / HHV-6B)  Group 4 (AAV2 / HAdV / HHV-6B)
1     JBL1    17 / 37 / 29               1,343 / 0 / 8                   574 / 0 / 0                     97 / − / 3
2     JBL4    21 / 42 / 32               360 / 0 / 8                     49 / 0 / 0                      93 / − / 2
3     JBL3    20 / 37 / 30               1,189 / 0 / 4                   95 / 0 / 0                      98 / − / 2
4     JBL2    20 / 37 / 27               1,564 / 0 / 203                 42 / 0 / 0                      98 / − / 94
5     JBL5    21 / 37 / 28               266 / 0 / (rest of row lost)
One participant with an HAdV-F41-positive blood sample, originally thought to have unexplained paediatric hepatitis, was later found to have a previous condition that explained the hepatitis and was therefore reclassified as a control (referred to as 'reclassified control' or CONB40). This blood sample was negative for AAV2 by PCR (Supplementary Table 5).
Fig. 2 (caption, partial): HAdV infection is in blue, non-HAdV hepatitis is in green and healthy is in red. e, HAdV levels in whole blood from cases and immunocompromised comparators. f, HHV-6 in whole blood from cases and immunocompromised comparators. g, HAdV, AAV2 and HHV-6 levels in frozen liver tissue from cases and immunocompromised comparators. In the box plots, the bold middle line represents the median and the upper and lower horizontal lines represent the upper (75th percentile) and lower (25th percentile) quartiles, respectively.
The whiskers show maximum and minimum values. Each point represents one case or control. n refers to the number of cases or controls. Where more than one sample for a case was tested, the midpoint of the Ct values was plotted. All repeat tests gave values less than 2 Ct apart, that is, within the limits of methodological error. The upper dashed line marked LLP indicates the LLP threshold (Ct = 38). Points below the second dashed line represent samples below the limit of PCR detection (Ct = 45). Wilcoxon non-parametric rank-sum tests were conducted for e and g, and a Kruskal-Wallis test followed by pairwise Wilcoxon tests with a Benjamini-Hochberg correction for multiple comparisons was used for d and f. All tests were two-tailed. Numbers show the P value compared with cases. ND, not determined (negative PCR result); NS, not significant.
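The Benjamini-Hochberg step named in the caption can be sketched in a few lines. This is a minimal illustration of the standard step-up procedure, not the authors' code, and the input p-values in the usage line are hypothetical. The rank tests themselves are available in SciPy as `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`, and statsmodels provides the same correction through `multipletests(..., method="fdr_bh")`.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg (step-up FDR) adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise Wilcoxon comparisons against cases.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```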

Liver from immunocompromised children
Frozen liver biopsy material from four immunocompromised children (median age of 10 years; CONL1-4) who had been investigated for other forms of hepatitis was also tested (Fig. 1b and Extended Data Table 2b). In three children, liver enzyme levels were raised (Supplementary Table 6); no results were available for CONL4. AAV2 was detected in CONL3 (Ct value: 39) and HHV-6B in CONL2 (Ct value: 34), whereas HAdV was not detected in any of the four (Fig. 2d and Supplementary Table 5).

Blood from immunocompromised comparators
We also tested immunocompromised children, who are more likely to reactivate latent viruses. Whole-blood samples from 17 immunocompromised children (median age of 1 year) with raised levels of liver transaminases (AST/ALT of more than 500 IU/l) and viraemia (HAdV or cytomegalovirus), all sampled in 2022 (Fig. 1b), were tested for AAV2, HHV-6B and HAdV (Extended Data Table 2b and Supplementary Table 5). The majority had received stem cell or solid organ transplants, and none was linked to the recent hepatitis outbreak (Extended Data Table 2b). Five of 15 (33%) whole-blood samples were positive for HHV-6B, whereas 6 of 17 (35%) were positive for AAV2, significantly fewer than in cases (P = 0.005957, Fisher's exact test), and at significantly lower levels, that is, at higher Ct values (P = 6.517 × 10^−5, Mann-Whitney test) (Fig. 2 and Supplementary Table 5). One HAdV-positive and AAV2-positive immunocompromised comparator (CONB23) was also positive for HHV-6B (Supplementary Table 5).
Four of the six AAV2-positive children from the PERFORM-DIAMONDS cohort and all six of the AAV2-positive immunocompromised children were also HAdV positive (Fig. 2a and Supplementary Table 5).
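The group comparison above uses Fisher's exact test on AAV2-positive and AAV2-negative counts. A minimal pure-Python sketch of the two-sided test follows. It sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, with a small tolerance for floating-point ties. The counts in the usage line are the classic textbook example, not the study's data, because the exact case denominator for this comparison is not restated here. `scipy.stats.fisher_exact` is the standard library routine for the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) with all margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Layout: rows = groups (e.g. cases, comparators); columns = AAV2 PCR positive, negative.
# The counts below are a textbook example.
print(fisher_exact_two_sided(3, 1, 1, 3))
```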

Viral whole-genome sequencing
One full HAdV-F41 genome sequence from the stool of one case (OP174926, case 22) (Supplementary 2015 and 2022, including 23 contemporaneous stool samples from children without the unexplained paediatric hepatitis (Figs. 1c and 3a). Sequencing and k-mer analysis 11 of HAdV from 13 cases with partial sequences identified the genotype HAdV-F41 in 12 cases (Supplementary Tables 7 and 8). The partial sequences, which mapped across the entire viral genome, showed most similarity to the control sequence OP047699 (Supplementary Table 8), further excluding a recombinant virus. Single-nucleotide polymorphisms were largely shared between the single HAdV-positive stool from a case (OP174926) and control whole-genome sequences (Extended Data Fig. 3a). Given reported mutation rates for HAdV-F41 and other adenoviruses 12,13, any differences are likely to have arisen before the outbreak. No new or unique amino acid substitutions were found in HAdV sequences from cases; there were only two substitutions overall (Extended Data Fig. 2d), and none in proteins critical for AAV2 replication.
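The idea behind k-mer-based genotyping of partial sequences can be illustrated with a small sketch. This is not the tool cited in ref. 11. It assumes genotype assignment by k-mer containment, that is, the fraction of the query's k-mers found in each reference. Containment is used instead of Jaccard similarity because a partial sequence should not be penalized for covering only part of a full reference genome. The reference names and toy sequences are hypothetical.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of seq (case-insensitive), skipping any that contain N."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if "N" not in seq[i:i + k]}


def containment(query: Set[str], reference: Set[str]) -> float:
    """Fraction of query k-mers found in the reference (suits partial queries)."""
    return len(query & reference) / len(query)


def best_reference(query_seq: str, references: Dict[str, str], k: int = 21) -> str:
    """Name of the reference genome sharing the largest fraction of query k-mers."""
    q = kmers(query_seq, k)
    if not q:
        raise ValueError("query is shorter than k or contains only N")
    return max(references, key=lambda name: containment(q, kmers(references[name], k)))


# Toy sequences with hypothetical genotype labels; real references would be
# full HAdV genomes, with k of about 21 or more.
toy_refs = {"genotype_A": "ACGTACGTTTGACCA", "genotype_B": "GGGCCCAAATTTGGG"}
print(best_reference("ACGTACGTTT", toy_refs, k=5))  # genotype_A
```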
AAV2 sequences from 15 cases, including five from the explanted livers and ten from whole blood from non-transplanted cases, clustered phylogenetically with control AAV2 sequences obtained from four immunocompromised HAdV-positive children with elevated ALT levels in the comparator group (Extended Data Table 2b) and from two healthy children with recent HAdV-F41 diarrhoea (Fig. 3b and Supplementary Table 9). The degree of diversity among case AAV2 genomes and the lack of a unique common ancestor suggest that these genomes are not specific to the hepatitis outbreak but instead reflect the current viral diversity in the general population. Although comparison of the AAV2 sequences showed no difference between cases and controls, contemporary AAV2 sequences showed changes in the capsid compared with historic AAV2 (Extended Data Fig. 3c). None of these changes was shared with the hepatotropic AAV7 and AAV8 viruses (Extended Data Fig. 3b). Most contemporary AAV2 genomes in cases and controls (20 of 21) contained a stop codon in the X gene, which is involved in viral replication 14, whereas historic AAV2 genomes contained it less frequently (11 of 35). The significance, if any, of this is currently unknown.
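Flagging a stop codon inside an open reading frame, as for the X gene above, reduces to scanning codons in frame. A minimal sketch follows, assuming the gene's coding sequence has already been extracted in frame from each genome. The X gene coordinates are not given in this section, so the sequences below are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop before the last codon, or None.

    cds is read in frame from its first base; the final codon is treated as the
    expected terminator, and any incomplete trailing codon is ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


print(premature_stop("ATGAAATAGCCCTAA"))  # 2 (TAG interrupts the frame)
print(premature_stop("ATGAAACCCTAA"))     # None
```

Counting genomes for which `premature_stop` returns an index gives proportions of the kind quoted above (20 of 21 contemporary versus 11 of 35 historic genomes).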

Transduction of AAV2 capsid mutants
Using a recombinant AAV2 (rAAV2) vector with a VP1 sequence (Extended Data Fig. 4a) containing the consensus amino acid sequence from AAV2 cases (AAV2Hepcase) (Extended Data Fig. 3b), we generated functional rAAV particles that transduced Huh-7 cells with efficiency comparable to both canonical AAV2 and the synthetic liver-tropic LK03 AAV vector 15. Unlike canonical AAV2, the AAV2Hepcase capsid, which contains mutations (R585S and R588T) that potentially affect the heparan sulfate proteoglycan (HSPG)-binding domain, was unaffected by heparin competition, a feature that is associated with increased hepatotropism 16,17 (Extended Data Fig. 4b,c).

Histology and immunohistochemistry
Histological examination of the 12 liver explants and two liver biopsies showed nonspecific features of acute hepatitis: ballooning of hepatocytes and disrupted liver architecture, with varying degrees of perivenular, bridging or pan-acinar necrosis. There was no evidence of fibrosis suggestive of an underlying chronic liver disease. The appearances were similar to those of historic cases of seronegative hepatitis of unknown cause in children. There were no typical histological features of autoimmune hepatitis; notably, there were no portal-based plasma cell-rich infiltrates. A cellular infiltrate was present in all cases; on staining, it appeared to consist predominantly of CD8+ T cells but also included CD20+ B cells. More widespread staining for the pan-B cell lineage marker CD79a, which also identifies plasma cells, was observed (Extended Data Fig. 5). Macrophage lineage cells showed some C4d complement staining, whereas staining for immunoglobulins was nonspecific, with disruption of the normal canalicular staining seen in controls due to the architectural collapse. MHC class I and class II staining, although increased in cases, was nonspecific and associated with sinusoids containing blood cells and with necrotic tissue (Extended Data Fig. 6a). No viral inclusions were observed and there were no features suggestive of a direct viral cytopathic effect.
Immunohistochemistry was negative for adenovirus. Staining of the five explanted livers with AAV2 antibodies showed nonspecific ingested debris but not the nuclear staining seen in the AAV2-infected positive-control cell lines and mouse tissue (Extended Data Fig. 6b). All five liver explants showed positive staining of macrophage-derived cells with antibody to HHV-6B, with no staining of negative control serial sections (Extended Data Fig. 6b). No specific HHV-6B staining was observed in 13 control liver biopsies from patients (including three children less than 18 years of age) with other viral hepatitis, toxic liver necrosis, autoimmune and other hepatitis, or normal liver. The control set was also negative for HAdV and AAV2 by immunohistochemistry.
Liver sections were morphologically suboptimal for electron microscopy, but no viral particles were identified in hepatocytes, blood vessel endothelial cells or Kupffer cells.

Transcriptomic analysis
We quantified functional cytokine activity by measuring the expression of independently derived cytokine-inducible transcriptional signatures of cell-mediated immunity (Supplementary Table 11) in bulk genome-wide transcriptional profiles from four of the frozen explanted livers. Results were compared with published data from normal adult livers (n = 10) and from adult hepatitis B-associated acute liver failure (n = 17) (GSE96851) 18. Compared with normal liver, the unexplained hepatitis cases showed increased expression of diverse cytokines and pathways. These included prototypic cytokines associated with T cell responses, such as IFNγ, IL-2, CD40LG, IL-4, IL-5, IL-7, IL-13 and IL-15 (Fig. 4a and Supplementary Table 12), as well as some evidence of innate type I interferon responses. Many of these responses showed substantially greater activity in unexplained hepatitis than in fulminant hepatitis B virus disease. The most striking enrichment was for TNF expression, and other canonical pro-inflammatory cytokines, including IL-1 and IL-6, were also enriched (Extended Data Fig. 7). These data are consistent with an inflammatory process involving multiple pathways.
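This section does not spell out how a signature's expression is turned into a per-sample activity score. One common approach, assumed here purely for illustration, is to z-score each signature gene across samples and average those z-scores within each sample. Gene names and values below are hypothetical. Comparing against an external dataset such as GSE96851 would additionally require cross-dataset normalization, which this sketch omits.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def signature_scores(expr: Dict[str, Sequence[float]], signature: List[str]) -> List[float]:
    """Per-sample score: mean across signature genes of each gene's z-score.

    expr maps gene -> log-scale expression, one value per sample (same sample
    order for every gene). Genes missing from expr or with zero variance are skipped.
    """
    z_rows = []
    for gene in signature:
        values = expr.get(gene)
        if values is None:
            continue
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no usable signature genes")
    return [mean(sample) for sample in zip(*z_rows)]


# Hypothetical log2 expression for three samples; G3 is flat and G4 is absent.
toy_expr = {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0], "G3": [5.0, 5.0, 5.0]}
print(signature_scores(toy_expr, ["G1", "G2", "G3", "G4"]))
```

Standardizing each gene first stops highly expressed genes from dominating the score. The trade-off is that scores are only comparable among samples standardized together.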

Proteomics
Proteomic analysis of the five frozen explanted livers did not detect AAV2 or HAdV proteins. Expression of HHV-6B U4, a protein of unknown function, was found in four of five cases; U43, part of the helicase primase complex, was found in two of five cases; and U84, a homologue of cytomegalovirus UL117, implicated in HHV-6B nuclear replication, was found in two of five cases (Extended Data Fig. 8).
The human proteome from the five frozen liver explants was compared with publicly available data from seven control 'normal' livers taken from two different studies 19,20. Both protein-level and peptide-level analyses (Fig. 4b,c and Supplementary Tables 13 and 14) found increased expression in unexplained hepatitis cases of HLA class II proteins and peptides (for example, HLA-DRB1 and HLA-DRB4), multiple peptides from the variable regions of immunoglobulin heavy and light chains, complement proteins (such as C1q) and intracellular and released extracellular proteins from neutrophils and macrophages (MMP8 and MPO).
There was no evidence of HAdV, AAV2 or HHV-6B in any of the control livers.
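The statistical test behind "increased expression" is not stated in this section. As an illustration only, a common per-protein comparison takes log2 intensities, reports the difference in group means as the log2 fold change and computes Welch's t statistic with Welch-Satterthwaite degrees of freedom. The values below are hypothetical. A p-value would come from the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`, and pooling controls from two external studies would also call for batch correction.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_log2_comparison(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float, float]:
    """Compare log2 protein intensities between two groups.

    Returns (log2 fold change, Welch t statistic, Welch-Satterthwaite degrees of
    freedom). Each group needs at least two values and some within-group spread.
    """
    n1, n2 = len(cases), len(controls)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two values per group")
    m1, m2 = mean(cases), mean(controls)
    se1, se2 = variance(cases) / n1, variance(controls) / n2  # squared standard errors
    if se1 + se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (m1 - m2) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return m1 - m2, t, df


# Hypothetical log2 intensities for one protein in cases and controls.
print(welch_log2_comparison([10.0, 11.0, 12.0], [7.0, 8.0, 9.0]))
```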

Discussion
Despite reports implicating HAdV-F41 as the cause of the recent outbreak of unexplained paediatric hepatitis, we found very low levels of HAdV DNA, no HAdV proteins, inclusions or viral particles (including in explanted liver tissue from affected cases) and no evidence of a change in the virus. By contrast, metagenomic and PCR analysis of liver tissue and blood identified high levels of DNA from AAV2, a member of the Dependoparvovirus genus that has not previously been associated with clinical disease, in 27 of 28 cases. Replication of AAV2 requires co-infection with a helper virus, such as HAdV, herpesviruses or papillomavirus 21, and can also be triggered in the laboratory by cellular damage 22, raising the possibility that the AAV2 detected was a bystander of previous HAdV-F41 infection and/or liver damage. Against this, we found little or no AAV2 in blood from age-matched, immunocompetent children, including those with HAdV infection, hepatitis or critical illness (Fig. 2d). AAV2 has been reported to establish latency in the liver 23; however, even in critically ill immunosuppressed children with hepatitis, in whom reactivation might occur, we detected AAV2 infrequently and at significantly lower levels in the blood or in liver biopsies (Fig. 2d,g).
RNA transcriptomic and real-time PCR data from explanted livers point to active AAV2 infection, although we did not detect AAV2 proteins by immunohistochemistry (Extended Data Fig. 6b) or proteomics (Extended Data Fig. 8), or any viral particles. The abundant AAV2 genomes in the explanted livers are concatenated in many complex and abnormal configurations. AAV genome concatenation may occur during AAV2 replication 8, and abnormal AAV2 DNA complexes and rearrangements have been observed in the liver following AAV gene therapy 7. Hepatitis following AAV gene therapy has been well described 24-26, with deaths occurring, albeit rarely 27. The pattern of complexes typifies both HAdV-mediated and herpesvirus (including HHV-6B)-mediated AAV2 DNA replication 6. HHV-6B DNA was present in 11 of 12 explanted livers but not in the livers of non-transplanted children (0 of 2) or in control livers, and HHV-6B proteins were detected in all five cases tested, including, in two cases, U43, a homologue of the HSV1 helicase primase UL52, which is known to aid AAV2 replication. Together, these findings highlight a possible role for HHV-6B, as well as HAdV, in the pathogenesis of AAV2 hepatitis, particularly in severe cases. Although AAV2 is also capable of chromosomal integration 28-30, we found little evidence of this by long-read sequencing, computational analysis of metagenomics data or examination of unmapped reads, but further confirmatory studies may be required. Although the pathogenesis of unexplained paediatric hepatitis and the role of AAV2 remain to be determined, our results point strongly to an immune-mediated process. Transcriptomic and proteomic data from the five explanted livers identified significant immune dysregulation involving genes and proteins that are strongly associated with activation of B cells, T cells, neutrophils and macrophages, as well as innate pathways.
The findings are supported by immunohistochemical staining showing infiltration of liver tissue by CD8+ T cells, B cells and B lineage cells. Upregulation of canonical pro-inflammatory cytokines, including IL-15 (which has also been seen in a mouse model of AAV hepatitis 31), IL-4 and TNF, occurred at levels even greater than those seen in fulminant liver failure following infection with hepatitis B virus. Increased levels of the same immunoglobulin variable region peptides, and of the corresponding proteins from both immunoglobulin heavy and light chains, across all five livers point to specific antibody involvement 32. The high frequency of HLA-DRB1*04:01 among children in our study (12 of 13 cases tested; Supplementary Table 1) supports the same genetic predisposition as proposed in a parallel study conducted in Scotland 33.
An immune-mediated process is consistent with studies of hepatitis following AAV gene therapy, in which raised AAV2 IgG and capsid-specific cytotoxic T lymphocytes are observed in affected patients; however, whether these directly mediate hepatitis remains unclear 26,34. Although AAV2 sequences in cases did not differ from the AAV2 found as a co-infection in HAdV-F41-positive stool collected from control children during the contemporary HAdV-F41 gastroenteritis outbreak (Fig. 3b), an rAAV vector carrying a consensus capsid sequence from the unexplained hepatitis cases (AAV2Hepcase) showed reduced HSPG dependency compared with canonical AAV2 (Extended Data Fig. 4), while retaining hepatocyte transduction ability. This points to likely greater in vivo hepatotropism of currently circulating AAV2 than has hitherto been assumed from data on canonical AAV2 (ref. 17). Another member of the parvovirus family, equine parvovirus-hepatitis, has also been associated with acute hepatitis in horses (Theiler's disease) 35.
There are several limitations to our study. Although other known infectious, autoimmune, toxic and metabolic aetiologies 3 have been excluded, including by other studies 36,37, the number of cases investigated here is small, the study is retrospective, the immunocompromised controls were not perfectly age-matched, and only one immunocompetent and 17 immunocompromised controls were sampled during exactly the same period as the outbreak. However, age-matched, immunocompetent controls from the DIAMONDS study that were contemporaneous with the outbreak, although few in number, were found to be AAV2 negative in a separate study carried out in Scotland 33.
Finally, our data alone are not sufficient to rule out a contribution from SARS-CoV-2 Omicron, the appearance of which preceded the outbreak of unexplained hepatitis (Supplementary Table 1). We did not detect SARS-CoV-2 metagenomically, even in three participants who tested positive on admission. Moreover, although seropositivity was higher in our cases (15 of 20) than in controls (3 of 10), this was not the case in another UK cohort 36 (38%) or in preliminary data from a UKHSA case-control study 3, which showed similar SARS-CoV-2 antibody prevalence in unexplained hepatitis cases and population controls (less than 5 years of age: 60.5% versus 46.3%, respectively; 5-10 years of age: 66.7% versus 69.6%, respectively). In line with UK national recommendations at the time, none of the children had received a COVID-19 vaccine.
Although we found little evidence for SARS-CoV-2 directly causing the hepatitis outbreak, we cannot exclude an effect of the COVID-19 pandemic on child mixing and infection patterns. The contemporaneous development of unexplained paediatric hepatitis with a national outbreak of HAdV-F41 (ref. 2), and the finding of HAdV-F41 in many cases, suggest that the two are linked. Enteric HAdV infection is most common in children younger than 5 years of age 2, and infection is influenced by mixing and hygiene 38. Few cases of HAdV-F41 occurred between 2020 and 2022 and no major outbreaks were recorded 2. The current HAdV outbreak followed the relaxation of pandemic restrictions and was one of many infections, including those with other enteric pathogens, that occurred in UK children following the return to normal mixing 39. Under normal circumstances, AAV2 antibody levels are high at birth, decline to their lowest point at 7-11 months of age and increase thereafter through childhood and adolescence 40. AAV2 is known to spread with respiratory HAdVs, infections that declined during the COVID-19 pandemic, and we have not detected it in over 30 SARS-CoV-2-positive nasopharyngeal aspirates (data not shown). We also found AAV2 DNA in HAdV-F41-positive stool from both cases and controls (Supplementary Table 5). Given the loss of child mixing during the COVID-19 pandemic, the reduced spread of common respiratory and enteric viral infections and the absence of AAV2 in SARS-CoV-2-positive nasopharyngeal samples, it is likely that immunity to both HAdV-F41 and AAV2 declined sharply in the age group affected by this unexplained hepatitis outbreak. Pre-existing antibody is known to reduce levels of AAV DNA in the liver of non-human primates following infusion of AAV gene therapy vectors 41.
The possibility that, in the absence of protective immunity, excessive replication of HAdV-F41 and AAV2, with accumulation of AAV2 DNA in the liver, led to immune-mediated hepatic disease in genetically predisposed individuals needs further investigation. Evaluation of drugs that inhibit TNF and other cytokines that are markedly elevated in this condition may identify important therapeutic options for future cases.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-023-06003-w. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Ethics
Metagenomic analysis and HAdV sequencing were carried out by the routine diagnostic service at Great Ormond Street Hospital (GOSH). Additional PCRs, immunohistochemistry and proteomics on samples received for metagenomics are part of the GOSH protocol for confirmation of new and unexpected pathogens. The use for research of anonymized laboratory request data, diagnostic results and residual material from any specimen received in the GOSH diagnostic laboratory, including all cases received from Birmingham Children's Hospital, UKHSA, Public Health Wales and Public Health Scotland, as well as non-case samples from UKHSA, Public Health Scotland and GOSH research, was approved by UCL
The sample IDs for the cases and controls are anonymized IDs that cannot reveal the identity of the study participants and are not known to anyone outside the research group, such as the patients or the hospital staff.

Samples
Initial diagnostic testing by metagenomics and PCR was performed at GOSH Microbiology and Virology clinical laboratories. Further WGS and characterization were performed at UCL.

Cases
Birmingham Children's Hospital provided us with explanted liver tissue from five biopsy sites from five cases, five 500-µl whole-blood samples from four cases and a serum/plasma sample from one case (Table 1 and Fig. 1b). These were used for metagenomic testing (Table 2), followed by HAdV, HHV-6 and AAV2 PCR and, depending on the Ct value, WGS (Supplementary Tables 7, 9 and 10). We subsequently received 25 additional specimens from UKHSA, Public Health Wales and Public Health Scotland/Edinburgh Royal Infirmary, comprising 16 blood samples, four respiratory specimens and five stool samples, for HAdV WGS and, when residual material allowed, AAV2 PCR followed by sequencing (Tables 1 and 2, Fig. 1b and Supplementary Tables 7, 9 and 10). We also received ten FFPE liver biopsy samples and six serum samples from 11 cases from King's College Hospital (Table 1). Of these cases, seven had received liver transplants.

Controls from DIAMONDS and PERFORM
PERFORM recruited children from ten EU countries (2016–2020) and was funded by the European Union's Horizon 2020 programme under grant agreement no. 668303. DIAMONDS is funded by the European Union's Horizon 2020 programme under grant no. 848196; recruitment commenced in 2020 and is ongoing. Both studies recruited children presenting with suspected infection or inflammation and assigned them to diagnostic groups according to a standardized algorithm.

Controls from GOSH for PCR
Blood samples from 17 patients not linked to the non-A-E hepatitis outbreak were tested by real-time PCR targeting AAV2 (Extended Data Table 2b). These comparators were patients with ALT/AST of more than 500 and HAdV or cytomegalovirus viraemia. The samples were DNA extracts purified from residual diagnostic specimens received in the GOSH microbiology and virology laboratory in the previous year. All residual specimens were stored at −80 °C before testing and were pseudo-anonymized at the point of processing and analysis. Viraemia had initially been detected by targeted real-time PCR during routine diagnostic testing, using UKAS-accredited laboratory-developed assays that conform to ISO 15189 standards.
In addition to the blood samples, four residual liver biopsies from four control patients referred for investigation of infection were tested by AAV2 and HHV-6B PCR. The liver biopsies were submitted to the GOSH microbiology laboratory for routine diagnosis by bacterial broad-range 16S rRNA gene PCR or metagenomics testing in 2021 and 2022. Three of four control patients were known to have elevated levels of liver enzymes. Two adult frozen liver samples previously tested by metagenomics were negative for AAV2 and positive for HHV-6B (Supplementary Table 5).

Controls from UKHSA
We received a blood sample from one patient with elevated levels of liver enzymes and HAdV infection. We also received one control stool sample from Public Health Scotland/Edinburgh Royal Infirmary and 22 control stool samples for sequencing.

Controls from King's College Hospital
A single FFPE liver biopsy control of normal marginal tissue from a hepatoblastoma from a child was negative for AAV2 and HAdV, but positive for HHV-6B (Ct = 37).

Controls from Queen Mary University of London
We received FFPE liver control samples from Queen Mary University of London from ten adults and three children (under 18 years of age), covering other viral hepatitis, toxic liver necrosis, autoimmune and other hepatitis, and normal liver. PCR gave valid results for samples from two children and eight adults, all of which were negative for AAV2 and HHV-6, apart from one adult sample that was positive for HHV-6 at a high Ct value (Supplementary Table 5).

Metagenomic sequencing
Nucleic acid purification. Frozen liver biopsies were infused overnight at −20 °C with RNAlater-ICE. Up to 20 mg biopsy was lysed with 1.4-mm ceramic, 0.1-mm silica and 4-mm glass beads, before DNA and RNA purification using the Qiagen AllPrep DNA/RNA Mini kit as per the manufacturer's instructions, with a 30 µl elution volume for RNA and 50 µl for DNA.
Up to 400 µl whole blood was lysed with 0.5-mm and 0.1-mm glass beads before DNA and RNA purification on a Qiagen EZ1 instrument with an EZ1 virus mini kit as per the manufacturer's instructions, with a 60 µl elution volume.
For quality assurance, every batch of samples was accompanied by a control sample containing feline calicivirus RNA and cowpox DNA, which was processed alongside clinical specimens, from nucleic acid purification through to sequencing. All specimens and controls were spiked with MS2 phage RNA internal control before nucleic acid purification.
Library preparation and sequencing. RNA from whole-blood samples with an RNA yield of more than 2.5 ng µl −1 and from biopsies underwent ribosomal RNA depletion and library preparation with KAPA RNA HyperPrep kit with RiboErase, according to the manufacturer's instructions. RNA from whole blood with an RNA yield of less than 2.5 ng µl −1 did not undergo rRNA depletion before library preparation.
DNA from whole-blood samples with a DNA yield of more than 1 ng µl −1 and from biopsies underwent depletion of CpG-methylated DNA using the NEBNext Microbiome DNA Enrichment Kit, followed by library preparation with the NEBNext Ultra II FS DNA Library Prep Kit for Illumina, according to manufacturer's instructions. DNA from whole blood with a DNA yield of less than 1 ng µl −1 did not undergo depletion of CpG-methylated DNA before library preparation.
Sequencing was performed with a NextSeq High output 150 cycle kit with a maximum of 12 libraries pooled per run, including controls.
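The yield thresholds above amount to a per-sample routing rule. The sketch below encodes it as a hypothetical helper (`needs_depletion` and its string labels are illustrative, not part of the authors' workflow). Because the text specifies only "more than" and "less than", treating a yield exactly at the threshold as not depleted is an assumption.

```python
# Hypothetical helper encoding the yield-based routing described above.
RNA_THRESHOLD_NG_PER_UL = 2.5  # whole-blood RNA above this gets rRNA depletion
DNA_THRESHOLD_NG_PER_UL = 1.0  # whole-blood DNA above this gets CpG-methylated DNA depletion


def needs_depletion(nucleic_acid: str, sample_type: str, yield_ng_per_ul: float) -> bool:
    """Return True if an extract should be depleted before library preparation.

    Biopsy extracts are always depleted. Whole-blood extracts are depleted only
    when the yield is strictly above the threshold; a yield exactly at the
    threshold is treated as 'not depleted' (an assumption).
    """
    if sample_type == "biopsy":
        return True
    if sample_type != "whole_blood":
        raise ValueError(f"unsupported sample type: {sample_type}")
    if nucleic_acid == "RNA":
        return yield_ng_per_ul > RNA_THRESHOLD_NG_PER_UL
    if nucleic_acid == "DNA":
        return yield_ng_per_ul > DNA_THRESHOLD_NG_PER_UL
    raise ValueError(f"unsupported nucleic acid: {nucleic_acid}")
```

For example, a whole-blood RNA extract at 3.1 ng µl−1 returns `True` (rRNA depletion), whereas a whole-blood DNA extract at 0.4 ng µl−1 returns `False` (no CpG-methylated DNA depletion).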

Metagenomics data analysis
Pre-processing pipeline. An initial quality control step trimmed adapters and low-quality ends from the reads (Trim Galore! (ref. 42), version 0.3.7). Human sequences were then removed by mapping to the human reference GRCh38.p9 (Bowtie2 (ref. 43), version 2.4.1), followed by removal of low-quality and low-complexity sequences (PRINSEQ (ref. 44), version 0.20.3). A second round of human sequence removal followed (megaBLAST (ref. 45), version 2.9.0). For RNA-seq, rRNA sequences were also removed using a similar two-step approach (Bowtie2 and megaBLAST). Finally, nucleotide and protein similarity searches were performed (megaBLAST and DIAMOND (ref. 46; version 0.9.30), respectively) against custom reference databases consisting of the nucleotide and protein sequences of the RefSeq collections (downloaded March 2020) for viruses, bacteria, fungi, parasites and human.
Taxonomic classification. DNA and RNA sequence data were analysed with the metaMix (ref. 5; version 0.4) nucleotide and protein analysis pipelines. metaMix resolves metagenomic mixtures using Bayesian mixture models and a parallel Markov chain Monte Carlo search of the potential species space to infer the most likely species profile. metaMix considers all reads simultaneously to infer relative abundances and probabilistically assigns the reads to the species most likely to be present. It uses an 'unknown' category to capture reads that cannot be assigned to any species. The resulting metagenomic profile includes posterior probabilities of species presence as well as Bayes factors for presence versus absence of specific species. There are two modes: metaMix-protein, which is optimal for RNA virus detection, and metaMix-nucl, which is best for speciation of DNA microorganisms. Both modes were used for RNA-seq, whereas metaMix-nucl was used for DNA-seq.
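metaMix itself fits a Bayesian mixture model and searches the species space with parallel MCMC. As a much simpler illustration of the underlying idea (probabilistic read assignment with an 'unknown' category), the sketch below estimates relative abundances for a fixed set of candidate species by expectation-maximization (EM). This is a swapped-in technique, not metaMix's algorithm; the per-read likelihood matrix and the constant 'unknown' likelihood are hypothetical inputs.

```python
from typing import List, Tuple


def em_abundances(
    likelihoods: List[List[float]], n_iter: int = 200, tol: float = 1e-10
) -> Tuple[List[float], List[List[float]]]:
    """Estimate relative abundances of candidate species by EM.

    likelihoods[r][j] is P(read r | category j); the last column is the
    'unknown' category, given a small constant likelihood (an assumption) so
    that reads matching no species are absorbed there.
    Returns (abundances, per-read posterior assignment probabilities).
    """
    n_reads, k = len(likelihoods), len(likelihoods[0])
    pi = [1.0 / k] * k  # start from uniform abundances
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each category
        resp = []
        for row in likelihoods:
            weighted = [p * l for p, l in zip(pi, row)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: new abundances are the mean posterior assignments
        new_pi = [sum(r[j] for r in resp) / n_reads for j in range(k)]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi, resp
```

Unlike metaMix, this fixed-candidate EM does not decide which species are present, so it yields no posterior probability of presence or Bayes factor; it only shows how abundances and read assignments inform each other.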
For sequence results to be valid, MS2 phage RNA had to be detected in every sample, and feline calicivirus RNA and cowpox DNA, with no additional unexpected organisms, had to be detected in the controls.
Confirmatory mapping of AAV2. The RNA-seq reads were mapped to the AAV2 reference genome (NCBI reference sequence NC_001401) using Bowtie2 with the --very-sensitive option. SAMtools (ref. 47; version 1.9) and Picard (version 2.26.9; http://broadinstitute.github.io/picard/) were used to sort, deduplicate and index the alignments and to create a depth file, which was plotted using a custom script in R.
De novo assembly of unclassified reads. We performed de novo assembly with metaSPAdes (ref. 48; v3.15.5) using all reads with no matches to the nucleotide database used for the similarity search. All resulting contigs over 1,000 bp in length were searched with megaBLAST against the standard nucleotide collection. All contigs longer than 1,000 bp matched human sequences, except two that mapped to torque teno virus.
Nanopore sequencing. DNA from up to 20 mg of liver was purified using the Qiagen DNeasy Blood & Tissue kit as per the manufacturer's instructions. Samples with a limited amount of DNA were fragmented to an average size of 10 kb using a Megaruptor 3 (Diagenode) to reach an optimal molar concentration for library preparation. Quality control was performed using a Femto Pulse System (Agilent Technologies) and a Qubit fluorometer (Invitrogen). Samples were prepared for nanopore sequencing using the ligation sequencing kit SQK-LSK110 and sequenced on a PromethION using R9.4.1 flow cells (Oxford Nanopore Technologies). Samples were run for 72 h, with a wash and reload step after 24 h and 48 h.
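The depth file produced during confirmatory AAV2 mapping can be summarised before plotting. The sketch below is in Python rather than the authors' R script, and the function name is hypothetical. It reads the three-column output of `samtools depth` (reference, position, depth) and reports mean depth and breadth of coverage over NC_001401; the genome length is passed in rather than assumed.

```python
from typing import Dict, Iterable


def summarise_depth(
    depth_lines: Iterable[str], genome_length: int, min_depth: int = 1
) -> Dict[str, float]:
    """Summarise `samtools depth` output (tab-separated: reference, position, depth).

    Positions missing from the table are treated as zero depth, because
    samtools depth omits zero-coverage positions unless run with -a.
    """
    depths: Dict[int, int] = {}
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _ref, pos, depth = line.split("\t")
        depths[int(pos)] = int(depth)
    covered = sum(1 for d in depths.values() if d >= min_depth)
    return {
        "mean_depth": sum(depths.values()) / genome_length,
        "breadth": covered / genome_length,
        "max_depth": max(depths.values(), default=0),
    }
```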
All library preparation and sequencing were performed by the UCL Long Read Sequencing facility.
Passed reads from MinKNOW were mapped to the AAV2 reference genome (NC_001401) using minimap2 (ref. 49) with default parameters. Reads were trimmed of adapters using Porechop v0.2.4 (https://github.com/rrwick/Porechop/), with the sequences of the adapters used added to adapters.py, and an adapter threshold of 85. Reads that also mapped with minimap2 to the human genome (Ensembl GRCh38_v107), which could be ligation artefacts, were excluded from further analysis. The passed reads were also classified using Kraken2 (ref. 50; version 2.0.8-beta) with the PlusPF database (17 May 2021). The AAV2 read data in Supplementary Table 3 refer to reads classified as AAV2 by both minimap2 and Kraken2, as the results from the two methods were similar. Four reads across all four lower-depth samples were classified as HHV-6B by the EPI2ME WIMP pipeline (ref. 51). No reads were classified as HAdV or HHV-6B by Kraken2 in the two higher-depth samples. Alignment dot plots were created for the AAV2 reads using redotable (ref. 52; version 1.1) with a window size of 20, and were manually classified into possible complex and monomeric structures.
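Selecting reads called AAV2 by both methods, minus reads that also map to human, is a set operation on read IDs. The sketch below assumes minimap2 PAF output (read name in column 1; unmapped reads absent by default) and Kraken2's standard per-read output without `--use-names` (classified flag, read ID, taxid). The AAV2 taxid is left as a caller-supplied parameter; the value used in the test is a placeholder, not a real taxid.

```python
from typing import Iterable, Set


def _paf_read_ids(paf_lines: Iterable[str]) -> Set[str]:
    # Column 1 of PAF is the query (read) name; by default minimap2 writes
    # PAF lines only for reads that mapped.
    return {line.split("\t", 1)[0] for line in paf_lines if line.strip()}


def confident_aav2_reads(
    paf_aav2: Iterable[str],
    paf_human: Iterable[str],
    kraken_lines: Iterable[str],
    aav2_taxid: str,
) -> Set[str]:
    """Reads mapped to AAV2 by minimap2 AND classified as AAV2 by Kraken2,
    excluding reads that also map to the human genome (possible ligation artefacts)."""
    mapped = _paf_read_ids(paf_aav2) - _paf_read_ids(paf_human)
    kraken_hits = set()
    for line in kraken_lines:
        # Kraken2 standard output: C/U flag, read ID, taxid, length, k-mer LCA list
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C" and fields[2] == aav2_taxid:
            kraken_hits.add(fields[1])
    return mapped & kraken_hits
```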
Integration analysis of Illumina data. We investigated potential integration of AAV2 and HHV-6 into the human genome using the Illumina metagenomics data for five liver transplant cases. Paired-end reads (average sequence coverage per genome = 5×) were quality checked with FastQC (ref. 53), and barcode and adaptor sequences were trimmed with Trim Galore! (Phred score threshold 20). Potential viral integrations were investigated with Vseq-Toolkit (ref. 54; mode 3 with default settings except for high stringency levels). Predicted genomic integrations were visualized with IGV (ref. 55), requiring at least three reads that support an integration site and span both human and viral sequences. The predicted integrations were each supported by only one read and therefore did not fulfil these criteria. Sequencing depth was lower than optimal for integration analysis, but no evidence was found for AAV2 or HHV-6B integration into the genomes of cases.
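The three-read support criterion can be illustrated with a small filter over chimeric reads. This is a hypothetical sketch, not Vseq-Toolkit's logic: it assumes chimeric reads have already been reduced to (read ID, human chromosome, breakpoint, virus) tuples and groups breakpoints into fixed-width bins, a simplification chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Chimera = Tuple[str, str, int, str]  # (read ID, human chromosome, breakpoint, virus)


def supported_sites(
    chimeras: Iterable[Chimera], min_reads: int = 3, window: int = 10
) -> Dict[Tuple[str, int, str], int]:
    """Group chimeric human-viral reads into candidate integration sites and keep
    only sites supported by at least min_reads distinct reads."""
    sites: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for read_id, chrom, pos, virus in chimeras:
        # Fixed-width binning of breakpoints is a simplification; reads close
        # to a bin edge can fall into neighbouring bins.
        sites[(chrom, pos // window, virus)].add(read_id)
    return {
        (chrom, b * window, virus): len(reads)
        for (chrom, b, virus), reads in sites.items()
        if len(reads) >= min_reads
    }
```

Counting distinct read IDs means that two segments of the same read support a site only once. Under this rule, the single-read predictions described above would all be filtered out.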

PCR.
Real-time PCR targeting a 62-nt region of the AAV2 inverted terminal repeat sequence was performed using primers and probes previously described (ref. 56). This assay is predicted to amplify both AAV2 and AAV6. The Qiagen QuantiNova Probe PCR kit (PERFORM and DIAMONDS controls) or the Qiagen QuantiFast Probe PCR kit (all other samples) was used. Each 25-µl reaction consisted of 0.1 µM forward primer, 0.34 µM reverse primer and 0.1 µM probe with 5 µl template DNA.
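The final oligonucleotide concentrations translate into per-reaction pipetting volumes via C1V1 = C2V2. The sketch below assumes 10 µM working stocks and a 12.5-µl 2× master mix. Neither is stated for the AAV2 assay: 12.5 µl is the master mix volume given for the HHV-6 and HAdV assays. The function names are hypothetical.

```python
from typing import Dict


def component_volume(final_um: float, stock_um: float, reaction_ul: float = 25.0) -> float:
    """Volume (µl) of stock needed for a final concentration, from C1*V1 = C2*V2."""
    return final_um * reaction_ul / stock_um


def reaction_mix(
    final_um: Dict[str, float],
    stock_um: Dict[str, float],
    template_ul: float,
    master_mix_ul: float,
    reaction_ul: float = 25.0,
) -> Dict[str, float]:
    """Per-reaction volumes (µl) of each oligo plus water to make up the volume."""
    volumes = {
        name: component_volume(conc, stock_um[name], reaction_ul)
        for name, conc in final_um.items()
    }
    water = reaction_ul - master_mix_ul - template_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    volumes["water"] = water
    return volumes
```

With these assumptions, the AAV2 assay needs 0.25 µl forward primer, 0.85 µl reverse primer, 0.25 µl probe and 6.15 µl water alongside 5 µl template.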
Real-time PCR targeting a 74-bp region of the HHV-6 DNA polymerase gene was performed using primers and probes previously described (ref. 57), multiplexed with an internal positive control targeting mouse (mus) DNA spiked into each sample during DNA purification, as previously described (ref. 58). In brief, each 25-µl reaction consisted of 0.5 µM of each primer, 0.3 µM HHV-6 probe, 0.12 µM of each mus primer, 0.08 µM mus probe and 12.5 µl Qiagen QuantiFast Fast mastermix with 10 µl template DNA.
Real-time PCR targeting a 132-bp region of the HAdV hexon gene was performed using primers and probes previously described (ref. 59), multiplexed with an internal positive control targeting mouse (mus) DNA spiked into each sample during DNA purification, as previously described (ref. 58). In brief, each 25-µl reaction consisted of 0.6 µM of each HAdV primer, 0.4 µM HAdV probe, 0.12 µM of each mus primer, 0.08 µM mus probe and 12.5 µl Qiagen QuantiFast Fast mastermix with 10 µl template DNA.
PCR cycling for all targets, apart from the controls from the PERFORM and DIAMONDS studies, was performed on an ABI 7500 Fast thermocycler and consisted of 95 °C for 5 min followed by 45 cycles of 95 °C for 30 s and 60 °C for 30 s. For the PERFORM and DIAMONDS controls, PCR was performed on a StepOnePlus Real-Time PCR System and consisted of 95 °C for 2 min followed by 45 cycles of 95 °C for 5 s and 60 °C for 10 s. Each PCR run included a no-template control and a DNA-positive control for each target.
Neat DNA extracts of the FFPE material inhibited PCR, so the PCR results shown were obtained after a 1-in-10 dilution.
AAV2 quantitative PCR with reverse transcription. RNA samples were treated with the TURBO DNA-free kit (Thermo) to remove residual genomic DNA. Complementary DNA (cDNA) was synthesized using the QuantiTect Reverse Transcription kit. In brief, 12 µl of RNA was mixed with 2 µl of genomic DNA Wipeout buffer, incubated at 42 °C for 2 min and transferred to ice. For reverse transcription, 6 µl of mastermix was added and the reaction was incubated at 42 °C for 20 min followed by 3 min at 95 °C.
PCR was performed on a StepOnePlus Real-Time PCR System and consisted of incubation at 95 °C for 2 min followed by 45 cycles of 95 °C for 5 s and 60 °C for 10 s. Each PCR run included a no-template control, a DNA-positive control and an RNA control from each sample to verify efficient removal of genomic DNA.
Immunohistochemistry. All immunohistochemistry was done on FFPE tissue cut at a thickness of 3 µm.

Adenovirus.
HAdV immunohistochemistry was carried out on a Ventana BenchMark ULTRA with the OptiView Detection Kit, using PIER with protease 1 for 4 min and antibody incubation for 32 min (AdV clones 2/6 and 20/11, Roche, 760-4870, pre-diluted). The positive control was a known HAdV-positive gastrointestinal surgical case.
Preparation of AAV2-positive controls. The plasmid used for transfection was pAAV2/2 (Addgene plasmid #104963; https://www.addgene.org/104963/), which expresses the genes encoding Rep/Cap of AAV2. This was delivered by tail-vein hydrodynamic injection (ref. 60). Negative reagent control slides were stained using the same antigen retrieval conditions and staining protocol incubation times, using only Bond Primary Antibody Diluent (AR9352) for the antibody incubation.
Electron microscopy. Samples of liver were fixed in 2.5% glutaraldehyde in 0.1 M cacodylate buffer, followed by secondary fixation in 1.0% osmium tetroxide. Tissues were dehydrated in graded ethanol, transferred to an intermediate reagent (propylene oxide) and then infiltrated and embedded in Agar 100 epoxy resin. Polymerization was undertaken at 60 °C for 48 h. Ultrathin sections of 90 nm were cut using a Diatome diamond knife on a Leica UC7 ultramicrotome. Sections were transferred to copper grids and stained with alcoholic uranyl acetate and Reynolds' lead citrate. The samples were examined using a JEOL 1400 transmission electron microscope and images were captured on an AMT XR80 digital camera.

WGS
Bait design. To produce the capture probes for hybridization, biotinylated RNA oligonucleotides (baits) used in the SureSelectXT protocols for HAdV and HHV-6 WGS were designed in-house using Agilent community design baits with part numbers 5191-6711 and 5191-6713, respectively. They were synthesized by Agilent Technologies (2021), are available through Agilent's Community Designs programme (SSXT CD Pan Adenovirus and SSXT CD Pan HHV-6) and have been used previously (refs. 61,62).
Library preparation and sequencing. For WGS of HAdV and HHV-6B, DNA (bulked with male human genomic DNA (Promega) if required) was sheared using a Covaris E220 focused ultrasonication system (PIP 75, duty factor of 10, 1,000 cycles per burst). End-repair, non-templated addition of 3′ poly A, adapter ligation, hybridization, PCR (pre-capture cycles dependent on DNA input and post-capture cycles dependent on viral load) and all post-reaction clean-up steps were performed according to either the SureSelectXT Low Input Target Enrichment for Illumina Paired-End Multiplexed Sequencing protocol (version A0), the SureSelectXT Target Enrichment for Illumina Paired-End Multiplexed Sequencing protocol (version C3) or the SureSelectXTHS Target Enrichment using the Magnis NGS Prep System protocol (version A0) (Agilent Technologies). Quality control steps were performed on the 4200 TapeStation (Agilent Technologies). Samples were sequenced using the Illumina MiSeq platform. Base calling and sample demultiplexing were performed as standard for the MiSeq platform, generating paired FASTQ files for each sample. A negative control was included on each processing run. A targeted enrichment approach was used due to the predicted high variability of the HHV-6 and HAdV genomes.
For AAV2 WGS, an AAV2 primer scheme was designed using PrimalScheme (ref. 63), with 17 AAV2 sequences from NCBI and one AAV2 sequence provided by GOSH (from metagenomic sequencing of a liver biopsy DNA extract) as the reference material. These primers amplify 15 overlapping 400-bp amplicons. Primers were supplied by Merck. Two multiplex PCRs were prepared using Q5 Hot Start High-Fidelity 2X Master Mix, with a combined annealing/extension step at 65 °C for 3 min. The pool 1 and pool 2 multiplex PCRs were run for 35 cycles. Ten microlitres of each PCR were combined, and 20 µl of nuclease-free water was added. Libraries were prepared either manually or on the Agilent Bravo NGS workstation option B, following a reduced-scale version of the Illumina DNA protocol as used in the CoronaHiT protocol (ref. 64). Equal volumes of the final libraries were pooled, bead purified and sequenced on the Illumina MiSeq. A negative control was included on each processing run.
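Tiled amplicon schemes split neighbouring amplicons between two multiplex pools. If two overlapping amplicons share a pool, the forward primer of one and the reverse primer of the other would amplify a short overlap product. The sketch below is a hypothetical illustration, not PrimalScheme's design algorithm. It builds an evenly spaced tiling of 15 × 400-bp amplicons over an illustrative genome length of about 4.7 kb, alternates the amplicons between pools 1 and 2 and checks that no pool contains overlapping amplicons.

```python
from typing import Dict, List, Tuple

Amplicon = Tuple[int, int]  # (start, end), 0-based, end-exclusive


def tile(genome_length: int, amplicon_len: int = 400, n: int = 15) -> List[Amplicon]:
    """Hypothetical evenly spaced tiling of n fixed-length amplicons across a genome."""
    step = (genome_length - amplicon_len) / (n - 1)
    return [(round(i * step), round(i * step) + amplicon_len) for i in range(n)]


def assign_pools(amplicons: List[Amplicon]) -> Dict[int, List[Amplicon]]:
    """Alternate tiled amplicons between pools 1 and 2, then check that no two
    amplicons within a pool overlap."""
    ordered = sorted(amplicons)
    pools: Dict[int, List[Amplicon]] = {1: [], 2: []}
    for i, amp in enumerate(ordered):
        pools[1 if i % 2 == 0 else 2].append(amp)
    for pool in pools.values():
        for (s1, e1), (s2, e2) in zip(pool, pool[1:]):
            if s2 < e1:
                raise ValueError(f"amplicons {(s1, e1)} and {(s2, e2)} overlap in one pool")
    return pools
```

With 15 amplicons, pool 1 receives eight and pool 2 seven. Each pool alone leaves gaps, which is why the two PCR products are combined before library preparation.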
All library preparation and sequencing were performed by UCL Genomics.
AAV2 sequence analysis. Adapters were trimmed from the raw FASTQ reads and low-quality reads were removed. The reads were mapped to the NC_001401 reference sequence, and the amplicon primer regions were then trimmed using the coordinates provided in a BED file. Consensus sequences were called at a minimum of 10× coverage. The entire processing of raw reads to consensus was carried out using the nf-core/viralrecon pipeline (https://nf-co.re/viralrecon/2.4.1; https://doi.org/10.5281/zenodo.3901628). Basic quality metrics for the samples sequenced are in Supplementary Table 9. All samples with at least 10× coverage across more than 90% of the genome were used for further phylogenetic analysis. Samples were aligned with known reference strains from GenBank using MAFFT (ref. 65; version 7.271), and trees were built with IQ-TREE (ref. 66; multicore version 1.6.12) with 1,000 rapid bootstraps and approximate likelihood-ratio test support. The samples were labelled on the trees by type and provider (Fig. 3a).
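The two thresholds above (10× to call a base, more than 90% of positions called to keep a genome) can be illustrated with a simplified majority-rule caller. This is a sketch under stated assumptions: nf-core/viralrecon uses dedicated variant and consensus callers rather than this logic. The input here is a hypothetical list of observed bases per reference position. The same masking idea, with a 15× default, applies to the HAdV consensus step.

```python
from collections import Counter
from typing import List


def call_consensus(pileup: List[str], min_depth: int = 10) -> str:
    """Majority-rule consensus from the bases observed at each reference position.
    Positions covered by fewer than min_depth reads are masked with 'N'."""
    consensus = []
    for bases in pileup:
        if len(bases) < min_depth:
            consensus.append("N")
        else:
            base, _count = Counter(bases).most_common(1)[0]
            consensus.append(base)
    return "".join(consensus)


def passes_coverage(consensus: str, min_fraction: float = 0.9) -> bool:
    """True if more than min_fraction of positions were called (not masked)."""
    called = sum(1 for b in consensus if b != "N")
    return called / len(consensus) > min_fraction
```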
For each AAV2 sample, we aligned the consensus nucleotide sequence to the AAV2 reference sequence. From these alignments, the exact coordinates of the sample capsid were determined. We then used the coordinates to extract the corresponding nucleotide sequence and translated it to find the amino acid sequence. Next, we compared each sample to the reference to identify amino acid changes. Amino acid sequences from AAV capsid sequences were retrieved from GenBank for AAV1 to AAV12. Amino acid sequences of capsid constructs designed to be more hepatotropic were retrieved from refs. 16,67. These sequence sets were then aligned to the AAV2 reference sequence using MAFFT (ref. 65). We then compared each construct to the AAV2 reference to identify the amino acid changes present, while retaining the AAV2 coordinate set.
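Reporting changes "while retaining the AAV2 coordinate set" means numbering every difference by reference residue, skipping alignment gaps in the reference. A minimal sketch, assuming the protein sequences are already aligned (with `-` as the gap character). The function name and the simple change notation are hypothetical, and the test uses toy sequences rather than real capsids.

```python
from typing import List


def aa_changes(ref_aln: str, sample_aln: str, gap: str = "-") -> List[str]:
    """Differences between an aligned sample and the reference protein,
    numbered in reference (1-based) coordinates.

    Substitutions: 'K4R'; deletions: 'A2del'; insertions: '2insQ'
    (inserted after reference residue 2)."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must be the same length")
    changes = []
    ref_pos = 0  # count of reference residues seen so far
    for r, s in zip(ref_aln, sample_aln):
        if r != gap:
            ref_pos += 1
        if r == s:
            continue  # identical residue, or a gap shared with the reference
        if r == gap:
            changes.append(f"{ref_pos}ins{s}")
        elif s == gap:
            changes.append(f"{r}{ref_pos}del")
        else:
            changes.append(f"{r}{ref_pos}{s}")
    return changes
```

Because insertions do not advance the reference counter, every sample and every hepatotropic construct is reported on the same numbering, so their changes can be compared position by position.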
For HAdV, genotyping was performed using AYUKA (ref. 11; version 22-111). This tool assigns one or more HAdV genotypes to a sample with confidence and assesses inter-genotype recombination when more than one genotype is detected. The results of this screening step determined which downstream analyses were performed and which reference genome (or genomes) was used. If mixed infection was suspected, reads were separated using bbsplit (https://sourceforge.net/projects/bbmap/) and each genotype was analysed independently as normal. If recombination was suspected, a more detailed analysis was performed using the Recombination Detection Program (RDP) and the sample was excluded from phylogenetic analysis. After genotyping, the cleaned reads were mapped with BWA to the relevant reference sequence (or sequences). SNPs and small insertions and deletions were called using bcftools (version 1.15.1, https://github.com/samtools/bcftools), and a consensus sequence was generated, also with bcftools, masking positions without sufficient read support (15× by default) with Ns. Consensus sequences generated with the pipeline were then combined with those of previously sequenced samples and aligned using the G-INS-I algorithm in MAFFT (v7.481). The multiple sequence alignment was then used for phylogenetic analysis with IQ-TREE 2 (version 2.2.0), using ModelFinder and 1,000 rapid bootstraps.
Proteomics data generation. Liver explant tissue from cases was homogenized in lysis buffer (100 mM Tris (pH 8.5), 5% sodium dodecyl sulfate, 5 mM tris(2-carboxyethyl)phosphine and 20 mM chloroacetamide), heated at 95 °C for 10 min and sonicated in an ultrasonic bath for another 10 min. The lysed proteins were quantified with a NanoDrop 2000 (Thermo Fisher Scientific). One hundred micrograms of protein was precipitated with the methanol/chloroform protocol, and the protein pellets were reconstituted in 100 mM Tris (pH 8.5) and 4% sodium deoxycholate (SDC). The proteins were digested with trypsin (1:50) overnight at 37 °C with constant shaking. Digestion was stopped by adding 1% trifluoroacetic acid to a final concentration of 0.5%. Precipitated SDC was removed by centrifugation at 10,000g for 5 min, and the supernatant containing digested peptides was desalted on a SOLAµ HRP (Thermo Fisher Scientific). Fifty micrograms of the desalted peptides was then fractionated on a Vanquish HPLC (Thermo Fisher Scientific) using an ACQUITY BEH C18 column (2.1 × 50 mm, 1.7-µm particles; Waters); buffer A was 10 mM ammonium formate at pH 10, buffer B was 80% acetonitrile and the flow rate was 500 µl per minute. An 8-min gradient was used to collect 24 fractions, which were then concatenated into 12 fractions. These 12 fractions were dried and dissolved in 2% formic acid before liquid chromatography–tandem mass spectrometry analysis. An estimated 2,000 ng from each fraction was analysed using an UltiMate 3000 high-performance liquid chromatography system coupled online to an Eclipse mass spectrometer (Thermo Fisher Scientific). Buffer A consisted of water acidified with 0.1% formic acid, whereas buffer B was 80% acetonitrile and 20% water with 0.1% formic acid.
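The text does not state how the 24 high-pH fractions were combined into 12. One common scheme pools fractions that are half a run apart (fraction k with fraction k + 12), so that each pooled sample contains both early- and late-eluting peptides. The sketch below shows that scheme purely as an illustration; the pairing is an assumption, not the authors' stated method.

```python
from typing import Dict, List


def concatenate_fractions(n_collected: int = 24, n_final: int = 12) -> Dict[int, List[int]]:
    """Hypothetical concatenation scheme: pooled fraction k combines collected
    fractions k, k + n_final, k + 2*n_final, ... (1-based numbering)."""
    if n_collected % n_final:
        raise ValueError("n_collected must be a multiple of n_final")
    return {k: list(range(k, n_collected + 1, n_final)) for k in range(1, n_final + 1)}
```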
The peptides were first trapped for 1 min at 30 µl per minute with 100% buffer A on a trap column (0.3 mm × 5 mm, PepMap C18, 5 µm, 100 Å; Thermo Fisher Scientific); after trapping, the peptides were separated on a 50-cm analytical column (Acclaim PepMap, 3 µm; Thermo Fisher Scientific). The gradient was 9–35% buffer B over 103 min at 300 nl per minute. Buffer B was then raised to 55% in 2 min and increased to 99% for the cleaning step. Peptides were ionized using a spray voltage of 2.1 kV and a capillary heated to 280 °C. The mass spectrometer was set to acquire full-scan mass spectra (m/z 350–1,400) with the maximum injection time set to auto, a mass resolution of 120,000 and an automated gain control target value of 100%. Within each 1-s cycle, the most intense precursor ions were selected for tandem mass spectrometry. Higher-energy collisional dissociation (HCD) fragmentation was performed in the HCD cell, with readout in the Orbitrap mass analyser at a resolution of 15,000 (isolation window of 3 Th), an automated gain control target value of 200%, the maximum injection time set to auto and a normalized collision energy of 30%. All raw files were analysed with MaxQuant (ref. 69) v2.1 using the integrated Andromeda search engine and searched against the Human UniProt Reference Proteome (February release, 79,057 protein sequences), together with UniProt-reported AAV proteins and a custom FASTA file of the patient's viral genome translated with EMBOSS sixpack. MaxQuant was run with standard parameters, with the sole addition of deamidation (N) as a variable modification. Data analysis was then carried out with Perseus (ref. 70) v2.05: proteins reported in the file 'proteinGroups.txt' were filtered to remove reverse hits and potential contaminants. Figures were created using OriginPro 2022b.
Transduction of AAV2 capsid mutants. A transgene encoding enhanced green fluorescent protein (eGFP) was packaged into rAAV particles bearing the patient-derived AAV2 capsid to track their expression in transduced cells. Expression was compared with that from rAAV capsids derived from canonical AAV2, AAV9 and a synthetic liver-tropic AAV vector called LK03 (ref. 15).
rAAV vector particles were delivered to Huh-7 hepatocytes at a multiplicity of infection of 100,000 vector genomes per cell before analysing eGFP expression by flow cytometry 72 h later.
Recombinant AAV capsid sequence. The VP1 sequence was designed as a consensus sequence derived from a multiple sequence alignment of AAV2 genomes sequenced from patient samples, using the Biopython (ref. 71) AlignIO module. The designed VP1 sequence was then synthesized as a gBlock (Integrated DNA Technologies) and cloned into an AAV2 RepCap plasmid (AAV2/2 was a gift from M. Fan; Addgene plasmid #104963) between the SwaI and XmaI restriction sites using InFusion cloning reagent (product 638948, Clontech).
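A column-wise majority consensus captures the idea of deriving one VP1 sequence from several aligned patient genomes. The authors used Biopython; the sketch below is plain Python and does not reproduce any Biopython function. Dropping columns where a gap is most common, and breaking ties in favour of the first-seen character, are assumptions.

```python
from collections import Counter
from typing import List


def msa_consensus(aligned: List[str], gap: str = "-") -> str:
    """Column-wise majority consensus of an alignment; columns whose most common
    character is a gap are dropped. Ties go to the character seen first in the
    column (Counter.most_common ordering)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        residue, _count = Counter(column).most_common(1)[0]
        if residue != gap:
            consensus.append(residue)
    return "".join(consensus)
```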
AAV vector production. rAAV particles were generated by transient transfection of HEK 293T cells as previously described (ref. 72). In brief, 1.8 × 10⁷ cells were plated in 15-cm dishes before transfection with the pAAV-CAG-eGFP transgene plasmid (a gift from E. Boyden; Addgene plasmid #37825), the relevant RepCap plasmid and the pAdDeltaF6 helper plasmid (a gift from J. M. Wilson; Addgene plasmid #112867), in amounts of 10.5 µg, 10.5 µg and 30.5 µg, respectively, using PEIpro transfection reagent (Polyplus) at a ratio of 1 µl per 1 µg DNA. Seventy-two hours after transfection, cell pellets and supernatant were harvested and rAAV particles were purified using an Akta HPLC platform. rAAV genome copy numbers were calculated by quantitative PCR targeting the vector transgene region. The rAAV2 vector used in this study was purchased as ready-to-use AAV2 particles from Addgene (Addgene viral prep #37825-AAV2).
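The plasmid amounts and the 1 µl PEIpro per 1 µg DNA ratio fix the transfection mix. The sketch below reads the stated amounts as per 15-cm dish (an interpretation) and scales them to several dishes; the function name is hypothetical. One dish needs 51.5 µg DNA and therefore 51.5 µl PEIpro.

```python
from typing import Dict, Tuple

# Plasmid amounts (µg) from the text, read here as per 15-cm dish (an assumption).
PER_DISH_UG = {"pAAV-CAG-eGFP": 10.5, "RepCap": 10.5, "pAdDeltaF6": 30.5}


def transfection_mix(
    n_dishes: int, pei_ul_per_ug: float = 1.0
) -> Tuple[Dict[str, float], float, float]:
    """Scale plasmid amounts to n dishes; return (µg per plasmid, total µg DNA, µl PEIpro)."""
    plasmids = {name: ug * n_dishes for name, ug in PER_DISH_UG.items()}
    total_ug = sum(plasmids.values())
    return plasmids, total_ug, total_ug * pei_ul_per_ug
```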
Analysis of rAAV transduction. Huh-7 hepatocytes (a gift from J. Baruteau, UCL) were plated in DMEM supplemented with 10% FBS and 1% penicillin-streptomycin. The cell line was validated by testing for glypican-3 and was not tested for mycoplasma contamination. Cells were plated at a density of 1.5 × 10³ cells per square centimetre and transduced with 1 × 10⁵ viral genomes per cell. Transductions were performed in the presence or absence of 400 µg ml−1 heparin, added directly to the cell medium. Seventy-two hours after transduction, cells were imaged using an EVOS Cell Imaging System (Thermo Fisher Scientific) before eGFP expression was quantified by flow cytometry on a CytoFLEX flow cytometer (Beckman). eGFP-positive cells were determined by gating the live-cell population and quantifying the eGFP signal relative to untransduced controls.
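The plating density and the multiplicity of infection determine how much vector goes into each well. The sketch below assumes a hypothetical well area and vector titre (neither is given in the text). It also assumes that the plated cell number still applies at transduction. With 1.5 × 10³ cells per cm², a 2-cm² well and 1 × 10⁵ vg per cell, each well receives 3 × 10⁸ vector genomes, or 0.3 µl of a 10¹² vg ml−1 preparation.

```python
from typing import Tuple


def vector_dose(
    density_per_cm2: float, well_area_cm2: float, moi_vg_per_cell: float, titre_vg_per_ml: float
) -> Tuple[float, float, float]:
    """Return (cells per well, vector genomes per well, µl of vector prep per well).
    Assumes the plated cell number is still the cell number at transduction."""
    cells = density_per_cm2 * well_area_cm2
    vg = cells * moi_vg_per_cell
    volume_ul = vg / titre_vg_per_ml * 1000.0  # ml -> µl
    return cells, vg, volume_ul
```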
Human short-read data analysis
Cytokine transcriptomics analysis. Cytokine-inducible gene expression modules were derived from previously published bulk tissue genome-wide transcriptomes of the tuberculin skin test, which have been shown to reflect canonical human in vivo cell-mediated immune pathways (ref. 73), using a validated bioinformatic approach (ref. 74). Cytokine regulators of genes enriched in the tuberculin skin test (ref. 73; ArrayExpress accession number E-MTAB-6816) were identified using Ingenuity Pathway Analysis (Qiagen). The average correlation of log2-transformed transcripts per million (TPM) values for every gene pair in each target gene module was compared with that of 100 iterations of randomly selected gene modules of the same size, to select cytokine-inducible modules that showed significantly greater co-correlation (adjusted P < 0.05), representing co-regulated transcriptional networks for each of 59 cytokines. We then used the average log2-transformed TPM expression of all the genes in each of these co-regulated modules to quantify the biological activity of the associated upstream cytokine within bulk genome-wide transcriptional profiles from AAV2-associated hepatitis (n = 4) obtained in the present study. These were compared with published log2-transformed and normalized microarray data from normal adult liver (n = 10) and hepatitis B adult liver (n = 17) (Gene Expression Omnibus accession number GSE96851; ref. 18). To enable comparison across datasets, we transformed the average gene expression values for each cytokine-inducible module to standardized Z scores using the mean and standard deviation of randomly selected gene sets of the same size within each individual dataset. Statistically significant differences in Z scores between groups were identified by Student's t-tests with multiple testing correction (adjusted P < 0.05).
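The standardization step scores a module's mean expression against random gene sets of the same size drawn from the same profile. Because of this, Z scores from RNA-seq and microarray data become comparable even though their raw scales differ. A minimal sketch for a single profile follows. The number of random sets and the seed are assumptions, and the gene names in the test are toy values.

```python
import random
import statistics
from typing import Dict, List


def module_mean(expr: Dict[str, float], genes: List[str]) -> float:
    """Mean log2 expression of the module genes present in one profile."""
    return statistics.mean(expr[g] for g in genes if g in expr)


def module_z(
    expr: Dict[str, float], module_genes: List[str], n_random: int = 100, seed: int = 0
) -> float:
    """Z score of a module's mean expression against random same-size gene sets
    drawn from the same profile (n_random and seed are assumptions)."""
    rng = random.Random(seed)
    present = [g for g in module_genes if g in expr]
    observed = module_mean(expr, present)
    universe = sorted(expr)  # sorted for reproducible sampling
    null = [module_mean(expr, rng.sample(universe, len(present))) for _ in range(n_random)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```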

Proteomics differential expression.
To compare the proteomics data from the explanted livers of cases with data from healthy livers, we downloaded the raw files from two studies (refs. 19,20) from PRIDE. These raw files were searched together with our files using the same settings and databases.
We performed differential expression analyses at the protein and peptide levels using a hybrid approach that included statistical inference on abundance (quantitative approach) as well as on the presence or absence (binary approach) of proteins or peptides. The DEP R package (ref. 75; version 1.18.0) was used for the quantitative analysis. Proteins or peptides were filtered for those detected in all replicates of at least one group (case or control). The data were background corrected and variance was normalized using variance-stabilizing transformation. Missing intensity values were not distributed randomly but were biased towards specific samples (either cases or controls). Therefore, missing data were imputed with random draws from a manually defined left-shifted Gaussian distribution using the DEP impute function with parameters fun = "man", shift = 1.8 and scale = 0.3. The test_diff function, based on linear models and the empirical Bayes method, was used to test for differential expression between case and control samples.

HLA typing methods.
Typing was undertaken in the liver centre units. High-resolution HLA typing was performed by next-generation sequencing (sequencing by synthesis; Illumina) using AllType kits (VHBio/One Lambda).

Statistical analysis
Fisher's exact tests and two-sided Wilcoxon (Mann-Whitney) nonparametric rank sum tests were used to test for differences between case and control groups. Where multiple groups were compared, Kruskal-Wallis tests were followed by pairwise Wilcoxon tests with Benjamini-Hochberg correction. All analyses were performed in R version 4.2.0.
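The Benjamini-Hochberg correction applied to the pairwise tests can be sketched in a few lines. Each P value is multiplied by n/rank, and a running minimum is taken from the largest P value downwards so that adjusted values stay monotone. This Python version is an illustration only; the analysis itself was run in R, where `p.adjust(p, method = "BH")` computes the same quantity.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest P value to the smallest, keeping a running minimum
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw P values of 0.01, 0.04, 0.03 and 0.005 adjust to 0.02, 0.04, 0.04 and 0.02.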

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The consensus genomes from viral WGS data are deposited in GenBank. IDs can be found in Supplementary