ATP7B variant c.1934T > G p.Met645Arg causes Wilson disease by promoting exon 6 skipping

Wilson disease is a recessive genetic disorder caused by pathogenic loss-of-function variants in the ATP7B gene. It is characterized by disrupted copper homeostasis resulting in liver disease and/or neurological abnormalities. The variant NM_000053.3:c.1934T > G (Met645Arg) has been reported as compound heterozygous, and is highly prevalent among Wilson disease patients of Spanish descent. Accordingly, it is classified as pathogenic by leading molecular diagnostic centers. However, functional studies suggest that the amino acid change does not alter protein function, leading one ClinVar submitter to question its pathogenicity. Here, we used a minigene system and gene-edited HepG2 cells to demonstrate that c.1934T > G causes ~70% skipping of exon 6. Exon 6 skipping results in frameshift and stop-gain, leading to loss of ATP7B function. The elucidation of the mechanistic effect for this variant resolves any doubt about its pathogenicity and enables the development of genetic medicines for restoring correct splicing.

Supplementary figure 2: UCSC browser 100 vertebrate multiple sequence alignment showing amino acid substitution patterns for Met645 (exon 6 is on the right, exon 7 is on the left).
Supplementary figure 3: WGS results for the 2F3 HepG2 clone. Top: alignment to the human reference sequence, visualized using the Integrative Genomics Viewer (IGV), reveals three major read clusters, one corresponding to the edited c.1934T>G allele (allele 1) and the others suggesting a partial exon 6 duplication and plasmid insertion (allele 2). Bottom: the reconstructed sequence of allele 2, showing the human genome reference sequence as gray blocks, the contigs obtained by de-novo assembly as green and violet lines, the assembly gap as a gold line, the PCR primer sets used for validation as red arrow pairs (see Supplementary Note 1 for more details, including PCR results); note that the plasmid insertion length is not proportional to the exon 6 length and that the de-novo assembly contigs span the exon 6 as well as nearby genomic reference sequence as suggested by the dashed lines.
Supplementary figure 4: MiSeq amplicon sequencing of ATP7B exon 6 demonstrates that HepG2 clones 1E8 and 1F6 are homozygous for c.1934T>G (chr13:52535985:A>C). Reads were down-sampled and visualized in IGV; nucleotide counts reflect the full read set.
Supplementary figure 5: Sanger sequencing of ATP7B exon 6 demonstrates that HepG2 clone 2A1 is homozygous for c.1931dupA (chr13:52535987:C>CT). Sanger reads correspond to the negative strand and thus are in reverse complement of the human reference sequence; the A insertion is identified by an orange box.
Supplementary Figure 6: qPCR for relative quantity (RQ) of transcripts containing exons 5, 6 and 7 between wild-type HepG2 and edited 2F3 cells. (A) Compared to wild-type cells, edited HepG2 cells have reduced exon 5, 6, 7 spanning transcript; the barplot displays the mean RQ of 12 independent RNA extractions for each cell line, with error bars corresponding to minimum or maximum RQ. (B) Melt curve for all biological replicates is indicative of a single PCR product. (C) PCR strategy with forward (FW) and reverse (RV) primers spanning the boundaries between exons 5 and 6 and 6 and 7 respectively. Supplementary Figure 7: PCR conducted on a single replicate each of HepG2 edited clones 1F6 (homozygous), 1E8 (homozygous) and 2F3 (compound heterozygous with a large structural rearrangement) shows partial exon 6 skipping, as expected, whereas no skipping is detected in WT HepG2. Note that NMD is expected to remove isoforms lacking exon 6 or produced by the allele with a large structural rearrangement, thus band intensity ratios are not directly reflective of percentage exon inclusion.
Supplementary figure 8: full gel images for the two western blot replicates, with different time exposures (top: 2 minutes; bottom: 1 minute, same as displayed in the main text Figure 3 ). Replicates were obtained by repeating protein isolation.
Supplementary figure 9: 2F3 HepG2 cells exhibit greater oxidative stress than WT HepG2 in response to copper challenge. Cells were treated with 0-700 µM CuCl. Reactive oxygen species were measured using fluorescence induced by CellROX Green oxidation (y axis). Error bars represent one standard deviation unit and each fluorescence measurement was performed on four replicate wells.

Supplementary Note 1
(1) ATP7B Isoforms 2 (2) In-silico predictions specificity 9 (  Figure 1.1 ). ENST00000242839 is almost identical to NM_000053, since they differ by only a few nucleotides at the 3'UTR exon 21 end. An isoform lacking only exon 6 is not expected to be stable, as it would be liable to NMD-mediated degradation and, in any case, the mutant protein would present a shift of the majority of its reading frame and it would probably misfold and undergo degradation. GTEx does not suggest the presence of an isoform lacking only exon 6. Similarly, RT-PCR in HepG2 cells using primers on exon 5 and exon 7 shows no evidence of exon 6 skipping (see Figure 1.2 ). A shorter isoform lacking exons 6, 7, 8 and 12 (corresponding to NM_001005918 and ENST00000344297) has been described in the literature as expressed in brain but not in liver [Petrukhin 1994]. Inspecting GTEx junctions, this isoform appears to be present at relatively low levels in cerebellum, but not in liver or other brain regions (see Figure 1.3 ). This shorter isoform lacks two transmembrane domains and localizes predominantly to the cytoplasm rather than to the Golgi [Yang et al. 1997]. The functional role of this isoform is unclear; its incapability to localize to the Golgi and its absence in hepatocytes suggest that it cannot replace the main isoform in regulating copper levels by promoting copper excretion into the bile and ceruloplasmin copper loading. Two additional isoforms are even shorter and do not include exon 6. ENST00000400370 skips exons 3-10, but the corresponding junction is not expressed in any GTEx tissue or cell type. ENST00000635406 contains only four exons, and its junctions are not expressed either (see Figure 1.4 ). Other isoforms lacking exon 6 contain few exons or lack many exons from the 5' portion and were not reviewed in detail. Incidentally, cerebellum (but not other brain regions) presents an additional alternate splicing event, the inclusion of a micro-exon of 20 bp between exon 12 and 13 (exon numbering based on ENST00000242839). The inclusion of this exon must be combined with alternative splicing of at least another exon for the resulting transcript to be in-frame (and presumably stable). ENST00000634308 presents the inclusion of this micro-exon and the exclusion of exon 8, but the junction reads supporting exon 8 skipping are not present in cerebellum (see Figure 1.5 ).     It is also noteworthy that western blot showed only one band in WT and edited HepG2 cells (see main text Figure 3 and Supplementary Figure 7 ). According to the manufacturer , the antibody 1 we used recognizes the protein region comprised between amino acid 150 and 250, which is 1 https://www.abcam.com/atp7b-antibody-epr6793-ab131208.html located in exon 2 of NM_000053, thus we expect that it should be able to recognize the shorter protein isoform lacking exons 6-8 and 12 if it were present.
In conclusion, multiple lines of evidence support using NM_000053 (or ENST00000242839) as the principal transcript for ATP7B in hepatocytes.
(2) In-silico Predictions Specificity Using our splicing predictor (DSN), a large delta score corresponds to a large change in exon recognition, resulting in exon skipping or sometimes in alternative splice site usage. As a reference: • When considering the splicing consensus sequence (acceptor: 3 bp exonic and 20 bp intronic around the splice site; donor: 3 bp exonic and 6 bp exonic around the splice site; while excluding the highly conserved GT/AG dinucleotide, for both donor and acceptor), and classifying ClinVar pathogenic or likely pathogenic variants versus benign or likely benign variants, a delta score < -0.9621 corresponds to a false positive rate of 5% and a true positive rate of 93% • When considering the MaPSy mini-gene assay variants, and classifying variants with skipping > 50% versus variants with limited or no skipping, a delta score < -0.3871 corresponds to a false positive rate of 5% and a true positive rate of 65% The score for c.1934T>G Met645Arg (chr13:52535985:A:C) is -0.7337561, which suggests skipping > 50%, as discussed in the main text of the manuscript.
To demonstrate specificity of DSN splicing predictions, we downloaded all ATP7B variants from gnomAD v2.1.1, excluding c.1934T>G Met645Arg (N = 2,152), and ran the predictor. We then calculated the max allele frequency based on the African, European non-Finnish, Latin American, East Asian and South Asian gnomAD populations (pooled over exomes and genomes), and grouped variants in three bins: • Variants with max allele frequency (0.005,1] were deemed very unlikely to be pathogenic, because mere homozygosity for one of these variants would result in a disorder up to 0.005^2 = 2.5 / 100,000, where the Wilson Disease prevalence is broadly reported as 3.3 / 100,000 • Variants with max allele frequency (0.001,0.005], like c.1934T>G Met645Arg, were considered potentially pathogenic, but some might have a mild effect and be pathogenic only in combination with other variants. • Variants with frequency ≤ 0.001 were considered potentially pathogenic Therefore, we expected that very few or no variants in the first group would exceed the c.1934T>G Met645Arg DSN delta score, and that the third group should be the most enriched in variants with large splicing effect predictions. This is exactly what we observed (see Table 2.1 , Figure 2.1 , Figure 2.2 ).    (3) 2F3 HepG2 Clone Characterization by WGS WGS alignment to the human reference sequence revealed three major read clusters at the ATP7B exon 6 locus. One corresponded to the c.1934T>G allele, showing no other edits or alterations. The other two clusters, corresponding to the second allele, had one portion of the reads aligning to the reference sequence, whereas the other portion of the reads was completely different than the human reference sequence and instead corresponded to a section of the co-transfection plasmid (see Figure 3.1 for the alignment results and Figure 3.2 for the plasmid structure). Based on these results, we inferred the presence of a partial exon 6 duplication flanking a plasmid sequence insertion; we used genomic PCR to validate the breakpoints joining human reference sequence to plasmid sequence (see Figure 3.1 for the diagram describing the reconstructed allele). To resolve the full length of the plasmid insertion, we aligned all reads to the human reference extended by the plasmid sequence as an additional chromosome, and retrieved all reads mapping to the plasmid and to 2 kb of reference sequence around the breakpoints; we then performed de-novo assembly and identified two contigs, which included the human-plasmid breakpoints but were separated by a small assembly gap when compared to the original plasmid sequence (see see Figure 3.1 for the de-novo assembly results). We hypothesized that the small assembly gap was due to a region with low sequencing coverage, and we confirmed this by genomic PCR spanning the two contig ends (see Figure 3.3 for genomic PCR results). Therefore, we concluded that this clone presents one allele with a clean c.1934T>G edit, whereas the other allele has a partial duplication of exon 6 and a ~ 5 kb plasmid insertion. alignment to the human reference sequence, visualized using the Integrative Genomics Viewer (IGV), reveals three major read clusters, one corresponding to the edited c.1934T>G allele (allele 1) and the others suggesting a partial exon 6 duplication and plasmid insertion (allele 2). Bottom: the reconstructed sequence of allele 2, showing the human genome reference sequence as gray blocks, the contigs obtained by de-novo assembly as green and violet lines, the assembly gap as a gold line, the PCR primer sets used for validation as red arrow pairs; note that the plasmid insertion length is not proportional to the exon 6 length and that the de-novo assembly contigs span the exon 6 as well as nearby genomic reference sequence as suggested by the dashed lines.   • Probability of encountering a homozygous patient: spHom = snHom_e / snTot = 0.075625 • One-sided binomial test p-value with alternative hypothesis that the observed number of homozygous patients is lower than expected (spHom): 0.04305 2 Therefore we can conclude that there is a significant depletion of c.1934T>G homozygotes.