Recently, the Mucin-1 (MUC1) gene has been identified as a causal gene of autosomal dominant tubulointerstitial kidney disease (ADTKD). Most causative mutations are buried within a GC-rich 60 basepair variable number of tandem repeat (VNTR), which escapes identification by massive parallel sequencing methods due to the complexity of the VNTR. We established long read single molecule real time sequencing (SMRT) targeted to the MUC1-VNTR as an alternative strategy to the snapshot assay. Our approach allows complete VNTR assembly, thereby enabling the detection of all variants residing within the VNTR and simultaneous determination of VNTR length. We present high resolution data on the VNTR architecture for a cohort of snapshot positive (n = 9) and negative (n = 7) ADTKD families. By SMRT sequencing we could confirm the diagnosis in all previously tested cases, reconstruct both VNTR alleles and determine the exact position of the causative variant in eight of nine families. This study demonstrates that precise positioning of the causative mutation(s) and identification of other coding and noncoding sequence variants in ADTKD-MUC1 is feasible. SMRT sequencing could provide a powerful tool to uncover potential factors encoded within the VNTR that associate with intra- and interfamilial phenotype variability of MUC1 related kidney disease.
Autosomal dominant tubulointerstitial kidney disease (ADTKD; formerly known as medullary cystic kidney disease (MCKD)) constitutes a prototypic group of usually slower progressive nephropathies. ADTKD can manifest in childhood, but typically comes to medical attention with chronic kidney disease from early adulthood onwards1,2,3,4.
With the recently identified ADTKD-MUC1 subtype (OMIM #174000; originally referred to as MCKD type 1), four genes (UMOD (OMIM #603860, #162000 and #609886; ADTKD-UMOD; MCKD type 2), REN (OMIM #613092; ADTKD-REN), and HNF1B (OMIM #137920; ADTKD-HNF1B)) have so far been repeatedly assigned to the ADTKD-spectrum1,5,6,7,8. Shared clinical features in ADTKD and phenotype overlap with other renal kidney diseases (RKDs) frequently make a definite clinical or histopathological diagnosis difficult4,8,9. Sanger sequencing or NGS based analyses of the first three identified genes is a widely available standard, while analysis of the MUC1 gene still poses a substantial diagnostic problem. ADTKD-MUC1 constitutes the first kidney disorder where mutations reside within a coding variable number of tandem repeat (VNTR contained in exon 2). The inaccessibility of many genomic repeats to direct short-read based sequencing approaches explains why the association of the MUC1 gene to ADTKD has been obscured for long5.
In contrast to simple tandem repeats commonly seen in (coding and noncoding) repeat expansion diseases (e.g. Huntington Disease or Fragile X Syndrome etc.) the actual size of the coding 60 basepair VNTR in ADTKD-MUC1 seems irrelevant to the pathomechanism of ADTKD (Fig. 1)5,10,11. As the VNTR size of each parental MUC1 allele is highly polymorphic (usually between 20 to 125 units per allele) with no expansion or retraction of units, commonly used methods to detect repeat size alterations are per se useless here12.
The prototypic causative variant in MUC1 associated ADTKD, the insertion of a eighth cytosine base (insC) in a seven cytosine stretch within one unit of the VNTR composed of almost identical 60-mer units (according to HGVS nomenclature recommendations a duplication, but for the sake of clarity we stay with term insertion used in most publications), has been repeatedly found worldwide (also arbitrarily referred to as c.428dupC)4,5. Based on the VNTR assembly from three families, the exact location of the insC with regard to the unit number seemed to be variable, although always occurring within the 7C-homopolymer at relative base positions 53–59 of the 60-mer unit (Fig. 1)5. Due to the nature of the VNTR, insC within different located 60-mer unit numbers as well as other small insertions or deletions located at different positions, would result in a frameshift leading to a similar, presumably toxic MUC1-neoprotein that is retained within the cell (Fig. 1)5.
The signal transducing transmembrane MUC1 protein is abundantly expressed in most epithelia and over-expressed in epitheloid malignancies. MUC1 is also expressed in renal development and later found at the apical surfaces of normal kidney epithelia that form luminal surfaces13,14,15,16. The role of MUC1 in the kidney is not very clear apart from the N terminal ectodomain of the glycoprotein being released into the urine. At urothelial surfaces MUC1 and uromodulin form protective polymer layers. Recently it was shown that MUC1 regulates the renal calcium channel TRPV5 and potentially is protective against calcium nephrolithiasis by increasing urinary calcium reabsorption17.
Since the high GC content in combination with the genomic architecture of the VNTR largely prevents direct sequencing, a probe extension (snapshot) assay refined by mass spectrometry validation has been developed as a first diagnostic test. The test needs to accomplish the detection of a single 60-mer unit carrying the insC against a high background of normal units (expected frequency between 0.8% (1/125 units) to 5% (1/20 units) depending on individual VNTR allele sizes)4,5,18. The evident limitations of the assay, the loss of almost all structural and positional information, led us to hypothesize on the existence of other causative alterations located within the VNTR that cannot be interrogated by the current assay.
To generate complete VNTR assemblies we thus adapted targeted single molecule real time (SMRT) sequencing on the PacBio RS II platform. We argued that this method could improve the testing in general and especially of sporadic cases, a group that has been largely omitted from current studies4,5,19,20. We show precise VNTR structural assembly and location of the causative insC, a prerequisite for identification of additional causative variants within the VNTR and detailed characterisation of genotype-phenotype correlations in ADTKD-MUC1.
Material and Methods
All investigations were conducted in accordance with the principles of the Declaration of Helsinki and after obtaining written informed consent. The study was approved by the local institutional review boards of the Universities of Cologne (IRB approval Nr 237/2013). Out of our (non-syndromal) ADTKD cohort previously tested negative for the ADTKD-genes UMOD, REN and HNF1B (including MLPA analysis for HNF1B, SALSA MLPA P241; MRC-Holland) we selected DNA samples from 9 families with positive snapshot findings as well as six families and one sporadic case with negative snapshot results who were still suspected to have ADTKD-MUC1 based on their clinical course and kidney biopsy findings for SMRT sequencing. Clinical and biochemical data were collected retrospectively from medical charts.
MUC1 VNTR PCR
Proband DNA was extracted from blood samples by NaCl precipitation and 30ng genomic DNA were used as templates21. The PCR protocol was adapted from a previous study with slightly modified cycling conditions and PCR composition22. Custom-designed 16 bp barcodes were added to the PCR primers for multiplexing different individuals following the guidelines for using PacBio Barcodes for SMRT Sequencing (Pacific Biosciences). Briefly, 25 µl reactions contained 30 ng of genomic DNA, 0.25 µM of PS2 (5′-XXXXXXXXXXXXXXXGGAGAAAAGGAGACTTCGGCTAC CCAG-3′; X stands for each nucleotide) and PS3 (5′-XXXXXXXXXXXX XXXXGCCGTTGTGCACCAGAGTAGAAGCTGA-3′) primers, 5% DMSO, 0.4 µM dNTP´s, 1 × reaction buffer with 1.5 mM MgCl2, 250 µM MgCl2, and 0.9 U DyNAzyme EXT DNA polymerase (Finnzymes, GRI Research, Braintree, UK). Thermocycling on C1000 touch instruments (Bio-Rad) included initial denaturation (1 min 30 s at 96°), 24 cycles (40 s at 96°, 30 s at 65° and 6 min at 68°) and final extension (10 min at 68°). The reaction was conducted three times for each sample, which were subsequently pooled. Amplification of VNTR (25 µl of each pooled sample) was verified by agarose gel electrophoresis (0.7%). The GeneRuler DNA Ladder (1 kb, 0.5 µg/µl, Fermentas) was used as a size marker. The remaining PCR products (50 µl) were sorted by allele length: (<4000 bp (≤66 variable repeat units) or >4000 bp (≥66 variable repeat units)), diluted 1:2 with H2O and purified using AMPure beads (Agencourt) according to the manufacturer´s instructions. AMPure bead:DNA ratio was adapted to the length of the allele of interest. Molar concentration was verified using the Qubit 3.0 Fluorometric Quantitation system (ThermoFischer).
Library preparation and SMRT sequencing
Target amplicons of equal size were pooled equimolar and a SMRT bell library was prepared as recommended by Pacific Biosciences (10 kb Template Preparation and Sequencing with Low-Input DNA) without an initial fragmentation. Raw sequence data is available upon request. Libraries were quantified by fluorometry and quality was assessed by capillary electrophoresis (Agilent DNA 12000 reagents and chips, Agilent). SMRT bell templates were bound to polymerases using the DNA/polymerase binding kit P6 and v2 primers. Polymerase-template complexes were bound to magnetic beads with the Magbead Binding Kit and sequencing was done on the PacBio RS II sequencer with C4 sequencing reagents with a movie length of 180 or 360 min. Cluster analysis to extract haplotypes of individuals was done with the in house algorithm (Wenzel and Pannes unpublished). Amplicons were split according to the 16mer barcodes of PacBio. Additionally amplicons were split with a Perl script.
Assembly of the MUC1 VNTR using an in-house algorithm
Assembly of both paternal VNTR alleles was performed with the help of an in-house developed algorithm using a database-software (Microsoft ACCESS). The barcode-separated reads were converted into text files using Notepad++ editor and loaded into the database. Each single read was recognized by “ > m” (fasta format) and at the end by barcodes, respectively. The second step included the identification of start and end of each single 60-mer repeat unit by known and universal sequences of fixed repeat units 1 and 9. Part three included the determination of the repeat type based on sequences of actual known repeat variants described in Kirby et al. Certain repeat types are recognized based on selected base pair sequences within one repeat unit. The 60 basepair sequences of all known repeat types were subdivided into 9 “identification sequences”, respectively, each with a length of 4 bases in a row. This procedure overcomes the sequence variation based on reading failures made by the polymerase. Reconstruction of the repeat type assembly can be transferred into Microsoft EXCEL including two sheets: 1. Identified repeat variants corresponding to the capital letter code provided by Kirby et al.5 are shown in the first sheet for every single read allowing complete VNTR assembly. If one repeat unit type cannot be identified this position remains blank providing the possibility to identify so far unknown repeat variants. 2. The second sheet contains the sequence of each single read including the identified repeat variant, respectively, allowing the verification of identified and not identified single repeat units by manual inspection. For each individual two different alleles were identified.
The publicly accessible database listed below:
National Center for Biotechnology Information (NCBI; http://ncbi.nlm.nih.gov/),
UCSC Genome Bioinformatics (http://genome.ucsc.edu/),
1000 Genomes (http://1000genomes.org/) and
Broad Institute (http://exac.broadinstitute.org).
The datasets generated during and/or analysed during the current study using the in-house algorithm are available from the corresponding author on reasonable request.
Prerequisites for MUC1-VNTR long-read sequencing
Using a modified PCR protocol published by Fowler et al. allowed stable generation of PCR amplicons across a wide range of MUC1 VNTR allele sizes (from 25 to 74 60-mer units corresponding to ca. 1.5 to 4.5 kb amplicons) that could be visualized by agarose gel electrophoresis (Fig. 2)22.
The fidelity of the PCR was validated by allele length determination using Southern blot analysis (data not shown).
We observed preferential amplification of shorter VNTR alleles (<4 kb), a phenomenom that is known as short allele dominance23,24. PCR amplification had to be repeated for 8 out of 43 individuals with at least one long allele, for PCR failure.
Targeted SMRT sequencing of the MUC1-VNTR
Direct sequencing of the long-range MUC1-VNTR on the PacBio RS II platform was performed in 43 individuals (16 unaffected, 32 affacted) in 6 independent runs and resulted in an average sequencing depth of 160 complete reads (minimum 4, maximum 600 reads). Libraries were prepared by pooling 5 to 18 long-range (LR) PCR products depending on VNTR size that were subsequently analyzed on single SMRT cells.
Short allele dominance was observed in the first run. Shorter VNTR alleles (<4 kb) were overrepresented (range 32 to 600 complete reads) compared to longer alleles (range 4 to 229 complete reads). To minimize this effect PCR products were pooled for library preparation according to their length in all further runs into two libraries containing amplicons <4 and >4kb, respectively (for detailed information please see Material & Methods). This measure improved read depth for the longer alleles (range 36 to 440 complete reads).
Sequencing had to be repeated for 8 individuals for low complete VNTR coverage, which was mostly associated with allele size >4kb.
MUC1 VNTR SMRT sequencing in ADTKD families previously tested positive
In all 9 families SMRT sequencing reconfirmed the insC previously detected by the probe extension assay with the exception of F7 in whom the insC was first identified by SMRT sequencing and later reconfirmed by the snapshot method (data not shown). In three families (F3*-F5*) the insC has been previously reported by Ekici et al., while the remaining six families (F1-F2, F6-F9) represent new families (Fig. 3)4.
Interfamilial variability of insC position within the X-unit in the VNTR
SMRT sequencing consistently enabled us to generate sequence reads that allowed for complete assembly of both the wildtype and the ADTKD risk allele carrying the insC.
The precise location of the 60-mer unit location harbouring the insC within the MUC1-VNTR could be solved in eight of the nine positive families studied. In family F3 the insC was detected in three affected family members, but for unclear reasons we were not able to locate the precise 60-mer unit here.
Position of the insC ranged (counting only the variable units; see Fig. 4a, Table 1) from the second up to the 39th unit and always occurred on the background of the X unit which is the most abundant 60-mer unit found within the VNTR (51.4–62.0%). The exact position of the causative variants was consistent in all affected individuals within families. Two families (F5 and F7) carried the insC in the 39th variable unit, but their VNTR assembly and allele size was different (Table 1, Figs 3 and 4a). Of note the Swiss family F1 demonstrated exactly the same risk allele assembly and position of the prototypic mutation within the second X unit as one family (family 4) previously reported by Kirby et al.5 Identical assembly could indicate that these two families are related.
SMRT sequencing in unclear non UMOD/HNF1B/REN ADTKD families
To further investigate our hypothesis that the VNTR might be a mutational hotspot prone to other mutations (e.g. small deletions etc.) within or outside of the 7C homopolymer a small cohort of three large families (F12, F13, and F16), three smaller families (F10, F14 and F15) and one sporadic case (F11), all compatible with a diagnosis of ADTKD-MUC1, were analysed (Table 2, Fig. 4b).
No other small insertion or deletion resulting in a frameshift consequence and no causative structural variant could be detected by complete VNTR assembly, although family F12 demonstrated linkage to a 12.9 Mb interval encompassing the MUC1-locus on chromosome 1q21 with a maximum parametric LOD score of 1.8 under assumption of autosomal dominant inheritance (data not shown; only affected individuals analyzed)25. No linkage data were available of the latter two larger families F13 and F16. Large deletions and loss of heterozygosity (LOH) could be excluded in this cohort as we were always able to reconstruct two parental VNTR alleles.
MUC1 mutations outside the VNTR were excluded by Sanger sequencing in all families (plus whole exome sequencing in F12).
Identification of novel repeat variants
We found altogether nine novel repeat variants and annotated them in continuation of the one letter code introduced by Kirby et al. (Fig. 5) since there is no suitable HGVS nomenclature for complex tandem repeat (TR). Three of these repeat variants include the following non-synonymous SNPs on relative repeat position 7G > A (p.G3S), position 22C > A (p.P8T), position 40C > A (p.P14T) termed Q, R, and N. In addition three novel repeat variants with synonymous SNPs were found, which were termed L (rel. position 18G > A, p.P6=), O (rel. position 42G > A, p.P6=), and P (36G > A, p.P6=). The remaining repeat variants 5C, M and S consist of previously described SNPs in a novel combination. Variant 5C is a fusion of previously described variants 5 (fixed variant) and C (non-synonymous SNPs at rel. positions 8G > A (p.G3D) and 59C > A (p.P20Q)). The repeat variant M contains one synonymous SNP at rel. position 21C > A (p.P7=) and variant S three non-synonymous known SNP´s at relative positions 27C > G (p.D9E), 29C > G (p.T10S), and 58C > G (p.P20A).
Sequencing with the PacBio RS II system based on single molecule real time (SMRT) technology offers the benefit of extraordinarily long sequencing reads with high accuracy. We adapted SMRT for targeted sequencing of the mutational hotspot in the coding VNTR of the MUC1 gene. The need for double stranded DNA and the comparatively low capacity of the system made the generation of PCR amplicons spanning the VNTR indispensable. SMRT sequencing has been previously performed for complex genomic regions, including another mucin, encoded by the MUC5AC gene, mainly secreted in the respiratory tract26.
The discovery of the genetic basis of this long sought ADTKD type cannot be overemphasized as it has opened the field of tandem repeats to the spectrum of hereditary kidney disease. The paper by Kirby et al. also points out to the pitfall of missing hereditary disease contained in those repetitive sequences with the current strategies employed in gene identification studies5.
Tandem repeats (TR) are a major class of repetitive DNA which makes up for a significant amount of the humane genome. TR the size of 9 to 80 bp that are repeated a few to more than a hundred times are also called VNTRs as they can be highly polymorphic within a population. From a molecular point of view tandem repeat polymorphisms (TRPs) provide a dynamic source of genomic variability as their mutation rate is higher and the extent of polymorphism (multiallelic) is more diverse than that contained in single nucleotide polymorphisms (SNPs). Expansions in short/simple TR´s are known to cause many monogenic disorders, which mainly result in neurodegenerative disease.
The technical dilemma caused by the difficult nature of the VNTR’s genomic sequence (tertiary structure resulting from the repetitive sequence, GC content, and repeat length) is well illustrated by the surprisingly few publications on ADTKD-MUC1 after the initial description. These inherent problems have created a diagnostic bottleneck for MUC1 testing worldwide with only a few laboratories performing the probe extension assay and only one laboratory being able to validate the snapshot assay by mass spectrometry4,5,20,27.
SMRT sequencing allowed complete assembly of both the wildtype and the ADTKD risk allele and precise positioning of the insC in all families (except family F3). In contrast even mutations occurring at the 5′ and 3′ borders of the VNTR are usually not accessible to Sanger sequencing or to short read based NGS technologies.
The reason why several affected family members from F3 repeatedly showed ambiguous position of the insC on various X units remains unexplained, but could not be related to a low number of complete VNTR reads. Interestingly, the prototypic mutation insC was exclusively found on the background of the X unit in our cohort. In contrast the only other publication providing data on VNTR assembly identified the insC on the variable X- and B- unit and on the”fixed” 5-unit5. Our observation could be an accidental finding as the number of analysed families in our study was still small and the X unit constitutes the most abundant unit within the VNTR. A binomial test calculates a one-tailed P-value of 0.016 for the chance of exactly 8 mutational hits in the X unit in 8 families, which does not support the hypothesis of a chance finding.
By pinpointing the precise repeat unit carrying the prototypic insC we could confirm the allelic heterogeneity of MUC1 associated kidney disease. The mutational spectrum is further expanded by a recent study that identified the first causative MUC1 mutation close to but outside of the VNTR that is accessible to standard sequencing technology and an abstract of Morniere et al. who reported the presence of a new five base pair deletion within the VNTR identified by the probe extension assay in one of 15 families tested positive. It would be interesting to see SMRT-sequencing results of the latter family28,29.
In the largest study on ADTKD-MUC1 24 families, with the insC identified by the snapshot method, have been described with high intra- and interfamilial variability regarding the progression of renal disease and onset of ESRD20. They speculated on the existence of anticipation (worsening of the phenotype in succeeding generations) in some families, a phenomenon that has been noted in the very first publication on the MUC1 locus on chromosome 1q2125. Although the underlying molecular mechanism in a non-expansion TR would be unclear at the moment, precise structural information on the MUC1-VNTR (allele sizes, position of the frameshift mutation, polymorphisms that could possibly alter the structure and thereby the toxicity of the frameshift protein) could provide valuable information to predict a potential role of the hypervariable VNTR in anticipation/disease progression from observer bias. To our knowledge this study contains the most comprehensive data on MUC1-VNTR topology, but was neither designed to nor is powered to answer those questions. Assessing the impact of coding and noncoding variation for potential genotype phenotype correlations in ADTKD-MUC1 will require a larger number of resolved VNTR datasets. Thus, our study provides a methodological basis to sequence this region which might help to uncover the factors explaining the considerable intra- and interfamilial variability in progression of MUC1 related kidney disease.
As the pathomechanisms in ADTKD-MUC1 are likely linked to inflammatory and immune processes in the kidney tissue and structural variation in the VNTR domain has been shown to alter B lymphocyte response, it is not beyond reason to assume a potential role of the VNTR’s topology here (please see Fig. 5 for further information)30,31,32.
In summary we recommend to perform first line or complementary SMRT sequencing in all snapshot positive and negative suspected ADTKD-MUC1 cases (to potentially identify novel mutations occurring within the VNTR in the latter). We propose to document next to the precise position of the causative mutation, allele lengths and (sequence) topology of both the wild type and the disease causing allele in all future studies until the role of the VNTR becomes clearer.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supported by intramural grants from Koeln Fortune Program/Faculty of Medicine (grant KF Nr 245/2011 to BBB, grant KF Nr 172/2013 to SS), University of Cologne, Germany is acknowledged. We would like to thank all participating patients and family members. We thank the group of Prof. Corinne Antignac and Dr. Vincent Morinière for sharing their probe extension protocol with us. We also are indebted to Prof. Dallas Swallow and Dr. Karine Rousseaus as well as Prof. Stanislav Kmoch for discussion on PCR amplification of the MUC1- VNTR.