Introduction

Owing to advances in sequencing technology, genetic variations causative for dystonia have been discovered more rapidly in recent years. Although a number have already been validated in independent studies, whether some putative disease-causing variants are truly specific has yet to be confirmed, and the challenge of proving causation remains for some diseases. X-linked dystonia-parkinsonism (XDP, DYT3, OMIM #314250) is a well-described X-linked recessive syndrome of combined dystonia-parkinsonism, with an underlying genetic cause that has not yet been unequivocally determined.1 The condition is indigenous to Panay Island in the Philippines, and all cases described so far have been linked to Filipino ancestry, suggesting a single genetic founder and genetic homogeneity. Although XDP is extremely rare globally, its prevalence is 0.31 per 100 000 in the Philippines and 5.74 per 100 000 in Panay Island. Ninety-five percent of affected individuals are males; the average age is 44 years (20–70 years); and the average age at onset is 39 years (12–64 years).2

The DYT3 disease locus has been mapped to an ~427-kb region on Xq13.1 flanked by DXS10015 proximally and DXS559 distally;3 however, previous studies did not reveal mutations in the coding regions of known genes inside this interval.3, 4, 5, 6 Instead, five disease-specific single-nucleotide changes (DSC1, DSC2, DSC3, DSC10, DSC12), one 48-bp deletion, and one 2627-bp SVA (SINE-VNTR-Alu element) retrotransposon insertion, all found either within ‘deep intronic’ or intergenic DNA segments, or in a nonconventional exon of the nearby TAF1 gene, have been identified only in affected individuals.7, 8 None of these disease-associated genetic variants is located in protein-coding segments based on current annotations of the human genome, although the regulation of neuron-specific TAF1 isoforms or of other genes in the region has been postulated as a disease mechanism.7, 8, 9 Nevertheless, the exact disease-causing variant has been a matter of debate,9, 10, 11, 12 and theoretically, only one, if any, is functionally related to disease causation, whereas the remainder are merely benign variants.

As it is difficult to conceive that seven independent genetic variants arose de novo simultaneously in a single individual to produce the first XDP case (ie, the genetic founder), we pursued genotyping of all currently known genetic variants associated with XDP in a large group of patients and in a carefully chosen cohort of geographically and ethnically matched controls. Our aim was to detect recombination events in patients or single occurrences of any of the genetic variants in controls, which would inform us through segregation analysis which variant or variants are truly specific to the phenotype. We took great effort to include informative controls fulfilling two criteria: (1) absence of family history of dystonia or parkinsonism; and (2) ancestry from Panay Island, as it was possible that in previous sequencing studies, mixed-race controls or Filipinos not from Panay Island were investigated. Thus, the great majority of our controls were offspring of the population to which the XDP founder also belonged, as these individuals were to our mind the most likely to harbor benign variants found also in the genomes of XDP patients.

Parallel to genotyping the known genetic variants, we used next-generation sequencing (NGS) to discover new potentially disease-causing single-nucleotide variants (SNV) within and outside the linked region on the X chromosome. We used non-segregating variants that we identified via genome sequencing to define the boundaries of the XDP haplotype and narrow the disease locus. Furthermore, we screened for copy number variants (CNV) within the implicated region and genome-wide using complementary microarray-based analyses. Finally, as none of the genetic variants are translated into protein, we used available online databases to investigate the possibility that the variants alter regulatory regions within the XDP locus.

Materials and methods

Samples

The patient group included 163 adult males (average age±SD: 49±9.6 years, range: 31–73 years) from the Philippines who were clinically diagnosed with XDP upon assessment by movement disorders specialists. In addition, three affected females were analyzed (average age: 62 years). We previously reported on the genetic underpinnings of the manifestation of XDP, an X-linked recessive syndrome, in two of these females, that is, extremely skewed X chromosome inactivation,13 and Turner mosaicism.14 From these patients, five males and one (homozygous) female were selected for genome sequencing. Apart from two samples being a mother and son pair, no other individuals selected for NGS were obviously related. Of note, the patients included in this study already represent a fourth of all cases ever documented in the Philippine XDP Study Group’s registry.

The control cohort consisted of 452 neurologically normal individuals from Panay Island. An additional 21 controls composed of six sons of XDP males and 15 healthy Filipinos who do not come from Panay Island were included. Thus, in total, we investigated 473 controls representing 695 healthy X-chromosomes (251 male, 222 female, average age±SD: 37±12.5 years in males and 38±13.5 years in females). All subjects gave written informed consent prior to genetic analysis and institutional review boards from Germany and from the Philippines approved the study.
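The number of control X chromosomes follows directly from the sex distribution of the cohort, since each male contributes one X chromosome and each female two. A minimal Python sketch of this arithmetic is given here; the function name is illustrative only and is not part of the study's analysis pipeline.

```python
def count_x_chromosomes(n_males: int, n_females: int) -> int:
    """Return the number of X chromosomes in a cohort.

    Each male carries one X chromosome and each female carries two.
    """
    return n_males + 2 * n_females


# Control cohort as reported in the text: 251 males and 222 females.
control_x_chromosomes = count_x_chromosomes(n_males=251, n_females=222)
print(control_x_chromosomes)  # 695
```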

Sanger sequencing and short tandem repeat (STR) polymorphism analysis

We genotyped samples via Sanger sequencing using an ABI3500XL Genetic Analyzer (Applied Biosystems, Darmstadt, Germany); the 48-bp deletion and the SVA insertion were detected via PCR amplification and analysis on agarose gels as previously described.15 For haplotype analysis, STR markers within or surrounding the disease locus (DXS10015, DXS10016, DXS10017, DXS10018, and DXS559)7 were analyzed on polyacrylamide gels or on an ABI3130XL Genetic Analyzer running GeneMapper 4.0 software (Applied Biosystems). Primer sequences and conditions are available upon request.

Genome sequencing

One genome (G01) was sequenced by Complete Genomics (Mountain View, CA, USA). The five other genomes (G02–G06) were sequenced by Knome Inc. (Boston, MA, USA) on an Illumina platform. Annotation and filtering of variant calls from both service providers were performed using Annovar,16 and using the following criteria to determine variants of interest: (1) rare variants (minor allele frequency <0.01 or undetermined) based on the 1000 Genomes Project (Annovar version 1000g2012apr), the Exome Variant Server (Annovar version 6500), or the dbSNP database (Build 137); (2) not within known regions of segmental duplication; and (3) not found in three in-house genomes from unrelated individuals without XDP. Variants of interest were submitted to the Leiden Open Variant Database for TAF1 at http://databases.lovd.nl/shared/genes/TAF1, with screening number 0000019895.
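The three filtering criteria can be illustrated with a short Python sketch. This is not the Annovar workflow itself: the `Variant` record, its field names, and the toy entries are hypothetical and serve only to show how the criteria combine, assuming that the population frequency, segmental-duplication status, and in-house genome status have already been annotated for each call.

```python
from dataclasses import dataclass
from typing import List, Optional

MAF_THRESHOLD = 0.01  # criterion (1): minor allele frequency must be below this


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; these are not Annovar output column names.
    variant_id: str
    max_population_maf: Optional[float]  # None means frequency undetermined
    in_segmental_duplication: bool       # criterion (2)
    seen_in_inhouse_genomes: bool        # criterion (3)


def is_variant_of_interest(v: Variant) -> bool:
    """Apply the three criteria described in the text to one variant."""
    rare = v.max_population_maf is None or v.max_population_maf < MAF_THRESHOLD
    return rare and not v.in_segmental_duplication and not v.seen_in_inhouse_genomes


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep only the variants that pass all three criteria."""
    return [v for v in variants if is_variant_of_interest(v)]


# Toy input, invented for illustration only.
examples = [
    Variant("var_rare", 0.002, False, False),
    Variant("var_common", 0.15, False, False),
    Variant("var_unknown_freq", None, False, False),
    Variant("var_segdup", 0.001, True, False),
    Variant("var_inhouse", None, False, True),
]
kept = filter_variants(examples)
print([v.variant_id for v in kept])
```

In this sketch a variant with an undetermined frequency is treated as rare, mirroring the "or undetermined" wording of criterion (1).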

Microarray-based copy number analysis

To search for the presence of novel or disease-specific CNVs near the implicated XDP region and elsewhere, we subjected 121 individuals to microarray-based SNP genotyping using two different technologies: the Genome-wide Human SNP Array 6.0 (Affymetrix, Inc., Santa Clara, CA, USA), and the HumanOmniExpress-24 Beadchip Array (Illumina, Inc., San Diego, CA, USA). Detailed methods for both analyses can be found in the accompanying Supplementary Methods.

Database search for regulatory regions

RegulomeDB (http://regulome.stanford.edu/)17 was used to search for predicted regulatory elements within the XDP locus. We further annotated each DSC (±100 bases) using the UCSC Browser (http://genome.ucsc.edu). To predict regulatory regions within the SVA insertion, we first obtained the sequence from GenBank (AB191243.1) then used the BLAT function of the UCSC Browser to search for similar regions in the genome. Chromosomal regions with >99% sequence identity to the SVA insertion were then annotated using the UCSC Browser.

Results

Sequencing of previously described genetic variants

We found that all disease-associated genetic variants, that is, the five DSCs, the 48-bp deletion, and the SVA insertion, completely cosegregate in a single haploblock (an ‘XDP haplotype’) in all but five XDP patients tested (‘XDP haplotype-carriers’: XDP001-XDP158, Table 1); in these samples, there was no disease-associated genetic variant that was observed without the co-occurrence of the others. However, five patients did not carry any of these genetic variants (‘non-carriers’: XDP159-163, clinical features described in Supplementary Table 1). The three affected females that we included also carried all the variants in homozygous (XDP164) or heterozygous (XDP165-166) states.
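The cosegregation check described here amounts to asking, for each sample, whether it carries all seven disease-associated variants, none of them, or only a subset. The following Python sketch shows one way to express that classification. The variant labels are taken from the text, whereas the function name, the category labels, and the sample identifiers are hypothetical and the genotypes are toy data, not study results.

```python
from typing import Iterable

# The seven disease-associated variants named in the text.
XDP_VARIANTS = frozenset(
    {"DSC1", "DSC2", "DSC3", "DSC10", "DSC12", "48-bp deletion", "SVA insertion"}
)


def classify_sample(observed: Iterable[str]) -> str:
    """Classify a sample by which disease-associated variants it carries."""
    carried = XDP_VARIANTS & set(observed)
    if carried == XDP_VARIANTS:
        return "haplotype-carrier"  # all seven variants co-occur
    if not carried:
        return "non-carrier"        # none of the variants present
    return "partial"                # would indicate recombination or an isolated variant


# Toy genotypes for three hypothetical samples.
toy_samples = {
    "S1": XDP_VARIANTS,
    "S2": set(),
    "S3": {"DSC1", "DSC3"},
}
for sample_id, observed in toy_samples.items():
    print(sample_id, classify_sample(observed))
```

A sample falling into the "partial" category would be the informative case for segregation analysis, because it would separate one variant from the others.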

Table 1 Genotypes at STR markers, previously described, and novel variants in the XDP-linked region in affected and control individuals

Analysis of STR polymorphisms revealed that the 158 XDP haplotype-carriers shared alleles at DXS10016, DXS10017, and DXS10018, except for differences accountable by ‘slippage mutations’. DXS10015 is excluded proximally, and DXS559 distally, from this haploblock, subtending a 427.4-kb region. Notably, these markers were previously reported by Németh et al.3 to enclose a 350-kb region; however, coordinates have since changed, and in the current human genome assembly (GRCh38) the enclosed region spans 427 kb.

Among controls, 2 of 473—one 35-year-old male and one female—carried the entire haploblock of XDP-associated genetic variants (C472–C473 in Table 1). At the time the male control was examined, he was 4 years below the average disease onset; it is unknown whether he has exhibited symptoms since then. We did not find any control carrying isolated or singly occurring disease-associated variants.

Genome sequencing and segregation analysis

We then selected three XDP haplotype-carriers (XDP001–003=G01–G03) and two of the five non-carriers (XDP159–160=G05–G06) for genome sequencing; we also included the homozygous female patient (XDP164=G04). Genome sequences were obtained with mean coverages of 51x, 36x, 47x, 35x, 31x, and 39x (G01–G06) and with an average of 97% of known sites mapped. Average read depth for the X chromosome was 30x, and was 23x for the region between DXS10015 and DXS559. Variant search within the linked region in the four genomes carrying the XDP haplotype (G01–G04) revealed 15 variants of interest shared by at least three genomes (Table 2). Comparing G01–G04 with the two sequences not carrying the XDP haplotype (G05–G06), only rs4844149 (a repeat polymorphism) was found to be shared. Later validation by Sanger sequencing confirmed that this allele was also present in controls (5/43 X-chromosomes). Notably, the paucity of variants found within the linked region is similar to that in previous sequencing studies7, 8 which used traditional methods (ie, not NGS) to sequence this locus. Outside of the linked region, we found no exonic SNV located on the X chromosome that is shared between G01–G04 and either G05 or G06.

Table 2 Variants in the linked region common among four genomes (G01–G04) sequenced

Out of the 15 variants shared by G01–G04 within the linked region, only two were completely novel (chrX.GRCh38:g.71301439delG, chrX.GRCh38:g.71653235C>T). The remaining variants included the five DSCs, rs4844149, and seven SNVs that were previously detected in other studies but have not yet been validated in controls. As we had already determined that the DSCs cosegregate in patients, we then identified six SNVs that flank the DSCs and genotyped these in all of our patient and control samples. The aim was to determine the boundaries of the XDP haplotype through recombination analysis. Non-segregating alleles in five patients establish the distal recombination event at rs41438158 (chrX.GRCh38:g.71633571C>T). In the proximal part of the linked region, alternative alleles at four variants place the boundary of the linked region downstream of rs41532445 (chrX.GRCh38:g.71339837G>A) (Table 1). Thus, we successfully narrowed the XDP disease locus to a 294.7-kb region (chrX.GRCh38:g.71339837_71633571) by validating polymorphisms discovered via genome sequencing in our large patient and control cohorts. This narrowed XDP haplotype subtends a region that includes the genes TAF1, OGT, ACRC, and CXCR3 (Figure 1).
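The logic of narrowing a haplotype through recombination analysis can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the procedure used in the study: markers are assumed to be listed in chromosomal order, each patient is reduced to the set of markers at which the disease-haplotype allele is carried, and the marker names (M1 to M6) and patient labels are invented toy data.

```python
from typing import Dict, List, Set, Tuple


def shared_interval(
    markers: List[str],
    carried: Dict[str, Set[str]],
    core_marker: str,
) -> Tuple[str, str]:
    """Return the first and last marker of the run shared by all patients.

    markers: marker names ordered by chromosomal position.
    carried: for each patient, the markers at which the disease-haplotype
        allele is present.
    core_marker: a marker known to lie inside the haplotype.
    """
    # A marker is shared if every patient carries the haplotype allele there.
    shared = [all(m in alleles for alleles in carried.values()) for m in markers]
    i = markers.index(core_marker)
    if not shared[i]:
        raise ValueError("core marker is not shared by all patients")
    lo = hi = i
    # Extend in both directions until a non-segregating marker is reached.
    while lo > 0 and shared[lo - 1]:
        lo -= 1
    while hi < len(markers) - 1 and shared[hi + 1]:
        hi += 1
    return markers[lo], markers[hi]


# Toy data: P2 has a recombination proximal to M2, P3 distal to M4.
toy_markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
toy_carried = {
    "P1": {"M1", "M2", "M3", "M4", "M5", "M6"},
    "P2": {"M2", "M3", "M4", "M5", "M6"},
    "P3": {"M1", "M2", "M3", "M4"},
}
print(shared_interval(toy_markers, toy_carried, core_marker="M3"))
```

In this toy case the shared run is M2 to M4, so the flanking markers M1 and M5, where at least one patient shows a non-segregating allele, would mark the recombination boundaries.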

Figure 1

Narrowed XDP haplotype on chromosome X. Validation of variants discovered through genome sequencing enabled us to narrow the disease locus and define the exact boundaries of the XDP haplotype (from 427 kb between DXS10015 and DXS559 to 294 kb between rs41416246 and rs41438158, respectively), subtending a region that includes four known genes: TAF1, OGT, ACRC, and CXCR3.

Analysis of copy number variations

To screen for potentially disease-associated CNVs, we first subjected one XDP haplotype-carrier (G02), and one non-carrier (G05) to microarray-based genotyping using the Affymetrix 6.0 array. Although known gains around Xq21.31–21.32 were detected in both individuals, we detected no disease-specific CNV in either sample on chromosome X (Supplementary Figure 1) or genome-wide (data not shown). We then extended the search by genotyping 120 XDP haplotype-carriers using the Illumina OmniExpress array. This extended analysis revealed CNV events in 14 of 105 samples (merged into eight distinct CNV regions of which seven were deletions) (Supplementary Figure 2). All but one locus (ie, Xq21.1, detected in one sample) had been reported previously, and the loci detected were only present in a small subset of the patients. Furthermore, no CNV loci were observed within the region of the XDP haplotype on the X chromosome (71.21–71.61 Mb, GRCh38). Collectively, the results rule out CNVs as the molecular cause underlying XDP in these individuals.

Search for regulatory regions

Finally, we used RegulomeDB to search for predicted regulatory elements within the XDP haplotype we have narrowed. This online tool computes a score predicting DNA-acting elements based on multiple public data sets and from the Encyclopedia of DNA Elements (ENCODE) Project. Regions surrounding the DSCs showed either no hit or a score of 6, equivalent to the lowest level of evidence/weak or no data supporting regulatory functions. The only SNV within the linked region with a reasonably high score (2b, evidence of transcription factor binding and a DNaseI footprint) was rs5981113, which is not among the DSCs, and is 10 000 bp away from the nearest XDP-specific genetic variant. We further annotated each DSC by hand using the UCSC Browser, which also collates predicted regulatory functions based on data from ENCODE. Among the DSCs, only the region surrounding DSC1 harbors a DNaseI hypersensitivity site based on experiments done in a lymphoblastoid cell line. Otherwise, the loci of the different DSCs contain no chromatin immunoprecipitation and sequencing (ChIP-Seq) peaks, nor is there evidence of brain histone modification in these regions. There seem to be areas of CpG island methylation based on weak methylated DNA immunoprecipitation and sequencing (MeDIP-Seq) signals, but only in non-neuronal cell lines.

To predict regulatory regions within the SVA insertion, three chromosomal regions with >99% sequence similarity to the SVA insertion sequence were annotated (chr19.GRCh38:g.40107194_40109853, chr11.GRCh38:g.64170742_64173483, chr4.GRCh38:g.150658643_150661524). Raw neuronal MeDIP-Seq signals were seen in all three regions, indicating sites of CpG island methylation.

Discussion

The combination of conventional sequencing, NGS, CNV analysis, and database search for regulatory functions that we performed in this study did not identify a novel, unique, and unambiguous genetic alteration that can be said to be the pathogenic mutation in XDP. Instead, what we observed is a disease-specific haplotype in almost all XDP patients, consisting of genetic variants occurring together in complete linkage disequilibrium. Thus, the first question raised by our findings is: how did these variants come about in the genetic founder?

It is difficult to formulate a scenario wherein seven distinct genetic variants arose simultaneously in the first XDP case. More plausibly, the original proband was a member of a healthy population in whose genomes six of the seven XDP-associated variants were already present, and a subsequent, seventh variant was introduced into the proband's genome de novo. However, among the large number of ethnically and geographically matched controls that we included in this study, we were not able to find any member or offspring of this pre-XDP population carrying only the original six genetic variants. As such, an intriguing hypothesis is that a partial haplotype was brought into Panay Island from another geographical region, and this, in combination with a de novo variant occurring locally later, gave rise to the disease haplotype in the XDP founder. Interestingly, there is one well-known Filipino legend that tells the story of 10 foreigners from the island of Borneo (part of which lies in present-day Malaysia) who traveled to and eventually settled in Panay Island. Irregularities, however, have cast doubt on the accuracy of these oral histories, and they are currently not considered reliable from an anthropological point of view.18 Nevertheless, we looked at publicly available genome sequences,19 including those from a cohort of 100 Malaysians,20 but found none of the XDP-associated genetic variants in these data sets, although one has to keep in mind the limited number of individuals investigated in these studies against the rarity of XDP-associated alleles.

Based on current knowledge, a likely explanation for the non-carriers of XDP haplotype is that they represent phenocopies. In two, we were able to search genome sequences for possible disease-causing variants in known genes for isolated dystonia (THAP1, GNAL, TH/GCH1/SPR), dystonia-parkinsonism syndromes (ATP13A2, FBXO7), and an adult-onset ataxia (KCNC3) that was previously described in a Filipino pedigree from Panay Island.21 We found no changes that point to another genetic cause in either. Although XDP is at the moment considered to be due to a founder mutation, and all patients are expected to share the same genetic alteration, genetic heterogeneity has also been described in other forms of dystonia occurring in other homogeneous communities/genetic isolates. In Amish-Mennonites with primary dystonia in whom the founder mutation in THAP1 has not been found, other variants in the same gene (but not the original indel), and in other genes for primary dystonia (GNAL and TOR1A) were later discovered to be causative for the phenotype.22 Of note, in the original linkage study3 and in the article that first described XDP-specific genetic variants,7 there was also one patient who did not carry any of the genetic variants reported. This was, however, dismissed as a misdiagnosed case. An explanation for the unaffected male carrying the haplotype could be late age at onset, or reduced penetrance, although this is not supported by the pedigrees we have analyzed to date.

Alternatively, if one considers that the genetic variants we and others have reported are present in the great majority of patients but not in all (ie, non-carriers), while also considering that we have detected the variants in controls, one arrives at the possibility of a yet undiscovered genetic variant that is truly specific for XDP. It is furthermore conceivable that the XDP haplotype and the variants within merely denote the location of this currently unknown genetic alteration. In this study, we used the complementary methods of NGS and microarray-based analyses to detect both SNVs and CNVs; however, other genetic alterations that escape our methods have not been completely ruled out, for example, translocations, chromosomal inversions, and mosaicism. Of note, one study has detected an 8.5-Mb inversion in Xq13.1,23 that spans and includes the entirety of the XDP haplotype, including 57 genes (esv1584074, chrX.GRCh38:g.63249949_71722146). Furthermore, a region in Xq13.1 centromeric to the XDP haplotype has been found to harbor breakpoints for chromosomal inversions associated with X-linked mental retardation.24 Although no formal experiments have been done to detect inversions of and within the haplotype, Makino et al. previously used overlapping contigs and were able to obtain an entire sequence spanning 463 kb between GJB1 (centromeric to DXS10015) and CXCR3,7 likely indicating that no small inversions exist within. Southern blotting of ACRC also has not revealed rearrangements within this gene.5 Interestingly, all five patients who are non-carriers of the disease-associated variants have the same alleles at the DXS10017 marker. Also, two non-carriers of the XDP haplotype share the distal STR marker (DXS10018) with haplotype-carriers.

It must be noted that all disease-associated genetic variants that others and we have identified so far lie in noncoding, deep intronic, or intergenic regions. In particular, the previously described variants are situated either within introns of TAF1 (eg, the SVA retrotransposon insertion in intron 32), or in an untranslated region termed the ‘multiple transcript system’.7 In theory, though not protein-coding themselves, any of the genetic variants within the XDP haplotype could be altering the function of silencers or enhancers of protein-coding genes within or interacting with the region. Although sequence analyses of OGT have revealed no XDP-specific genetic variants,3 this gene has been linked by functional analysis to DYT6, by means of a common THAP motif that associates with it.25 Our own database search, however, did not reveal strong evidence of regulatory elements among the DSCs, although this conclusion is highly limited by the paucity of studies available on neuronal tissue, and specifically targeting this region of the X chromosome. We also predicted regulatory regions within the SVA insertion and found possible sites of CpG island methylation in chromosomal regions with similarity to the SVA insertion sequence. One previous study likewise reports that the SVA insertion itself is hypermethylated in XDP patients and further hypothesizes that this status affects the function of a cis-regulatory element within the region.8 However, the purported regulatory element has not yet been found, and its function remains unverified. Thus, studies characterizing the regulatory and functional consequences of all disease-specific variants within the XDP haplotype are warranted, and are part of our future investigations.

In summary, we describe here a narrowed disease-specific haplotype for XDP, inclusive of untranslated variants within introns or nonconventional exons of TAF1. These genetic variants occur in linkage disequilibrium and are found in almost all XDP patients in this study. Any alteration within the haplotype may be the disease-causing variant, but at the moment, the genetic cause of XDP is far from completely understood.