Introduction

Primary ciliary dyskinesia (PCD, MIM no. 242650) is a rare genetic disorder caused, as far as is known, by mutations in genes encoding proteins important for ciliary beating.1 The major clinical consequences fall into three categories: (1) in the respiratory tract, immotility or dyskinetic beating of cilia in epithelial cells impairs mucociliary clearance, leading to recurrent infections, sinusitis and bronchiectasis; (2) in the urogenital tract, infertility occurs in some patients because of the dysmotility of sperm tails; (3) mirror reversal of body organ positioning (situs inversus) is seen in approximately half of PCD patients. Situs inversus is believed to occur at random and to be caused by the immotility of primary cilia of cells on the ventral surface of the embryonic node. The presence of situs inversus in an affected family defines a PCD subtype, known as Kartagener syndrome (KS, MIM no. 244400).

For diagnostic purposes, cilia in bronchial scrapings can be visualized under a light microscope, whereas the clinical diagnosis of PCD is routinely verified by electron microscopy analysis of respiratory cilia ultrastructure. About 20 different ultrastructural defects have been described in PCD patients, with lesions of outer and/or inner dynein arms being the most frequent defects of the internal anatomy of cilia.2, 3 Recently, it has been shown that immunofluorescence staining of ciliated epithelium with antibodies targeting DNAH5 can detect outer dynein arm defects and therefore aid diagnosis of PCD.4

Inheritance of PCD is autosomal recessive in most cases, although pedigrees with an X-linked mode of inheritance have also been described.5, 6 To date, mutations in four genes have been found to cause PCD, exclusively in patients with outer dynein arm defects. DNAI1, coding for intermediate dynein chain 1, was selected for mutation screening based on homology with the Chlamydomonas reinhardtii gene, which causes a slow-swimming phenotype when mutated,7, 8 whereas DNAH5, coding for heavy dynein chain 5, was identified by homozygosity mapping in a large family and subsequent sequence analysis.9 DNAH5 and DNAI1 are responsible for 28 and 10% of PCD cases, respectively,10, 11 clearly indicating that other genes are involved in PCD/KS etiology. DNAH11,12 coding for a dynein heavy chain, and TXNDC3,13 coding for a thioredoxin family member, were also found to be responsible for PCD, but so far mutations in each have been found in only one family.

Analysis of the ciliary proteome in the unicellular alga C. reinhardtii led to the identification of a number of proteins potentially involved in cilia formation and maintenance: 360 proteins with high and 293 with moderate confidence.14 Besides the known components of ciliary ultrastructure, there were also 90 signal transduction proteins, and many membrane proteins and metabolic enzymes. Given that our knowledge of ciliary genes comes mostly from studies of simple organisms and that the functional spectrum of the possible candidate genes is very wide, it is difficult to predict which of these should be considered as candidate genes for PCD.

On the basis of genetic linkage studies, extensive genetic heterogeneity of PCD is postulated. A total-genome scan performed in 31 multiplex families did not reveal any predominant locus; rather, it showed several peaks with suggestive and indicative LOD scores.15 Another genome scan, performed in five families of Arabic origin and with reported consanguinity, revealed linkage with chromosome 19q13.3 in only three of the families, thus confirming locus heterogeneity.16 Two additional loci, on 16p12 and 15q, were indicated by studies on genetically isolated, but heterogeneous, populations from the Faroe Islands and the Israeli Druze.17

We earlier performed genome-wide linkage analysis in 52 families with KS and reported a region on chromosome 15q24–25, between D15S973 and D15S1037, linked to KS.18 After obtaining a significant LOD score of 4.34 with D15S154 (using a strictly defined disease model), we defined a region of 3.5 cM (2.82 Mb) as containing the candidate gene.18 In the present study, we have performed further fine-mapping of the region of interest and screened all 18 genes in the linked region and three neighboring genes for mutations in seven KS patients from the families that contributed most to the linkage results.18

Materials and methods

Biological material

DNA samples from the existing PCD collection18 came from 31 Caucasian families (25 of Polish and 6 of Slovak origin) classified as KS families; none was from a genetically isolated population. Each family had at least one member diagnosed with PCD and exhibiting situs inversus, but with no other major anomalies or dysmorphologies present. The group studied consisted of 38 affected individuals and 99 unaffected family members. The primary complaints in these patients were related to the respiratory tract, and the clinical picture included symptoms of sinusitis, nasal polyps, bronchiectasis and recurrent infections of the upper respiratory tract. All cases were confirmed by a low concentration of nitric oxide (NO) measured in exhaled air.19 NO was measured from the nasal cavity by a chemiluminescence analyzer with a threshold value of 200 ppb for diagnosing PCD. In 60% of the families, the clinical diagnosis was confirmed by transmission electron microscopy analysis of bronchial cilia ultrastructure; the analysis revealed defects of outer or outer and inner dynein arms (a representative EM picture is shown in Supplementary Figure 1). Direct microscopy of bronchial scrapings was used to confirm the clinical diagnosis of PCD in the patients for whom electron microscopy imaging of cilia was not available.

KS inheritance in these pedigrees was in agreement with general assumptions for an autosomal recessive disease model. KS in these families has previously been shown to be linked with the region between D15S973 and D15S1037, assuming full penetrance, a phenocopy rate of 0.0001 and a disease allele frequency of 0.005. The genome-wide significant LOD score of 4.34 was obtained using a cohort of 52 families.18 Pair-wise LOD score analyses were performed using the FASTLINK program, and multipoint LOD scores for markers on chromosome 15 were calculated using GENEHUNTER. Multipoint LOD scores allowing for locus heterogeneity were calculated using SIMWALK2. The 95% confidence interval localized the KS locus to a 3.5 cM region between D15S973 and D15S1037.
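To make the LOD statistic concrete, the following minimal sketch computes a two-point LOD score in the simplest, phase-known case. It is a textbook simplification and not the likelihood algorithm implemented in FASTLINK, GENEHUNTER or SIMWALK2; the function name and the meiosis counts are hypothetical and serve only as an illustration. Only the value 4.34 is taken from the text.

```python
import math

def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    Textbook simplification (an assumption for illustration only):
    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r, n = recombinants, meioses
    if theta == 0.0:
        # With theta = 0 any observed recombinant makes the likelihood zero
        return n * math.log10(2) if r == 0 else float("-inf")
    log_l_theta = r * math.log10(theta) + (n - r) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical example: 10 informative meioses, no recombinants, theta = 0
lod = two_point_lod(0, 10, 0.0)

# A LOD score of 4.34 (the value reported in the text) expressed as odds for linkage
odds_for_linkage = 10 ** 4.34
```

Because the LOD score is a base-10 logarithm of a likelihood ratio, the reported value of 4.34 corresponds to odds of roughly 22 000:1 in favor of linkage under the stated disease model.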

Recombination mapping using microsatellite markers

To refine the candidate gene region between D15S973 and D15S1037, seven markers (D15S1027–380 kb–D15S524–33 kb–AFMA085WB9–580 kb–D15S989–219 kb–D15S1005–58 kb–D15S969–695 kb–D15S551) were genotyped in all 38 affected individuals and 99 unaffected family members. Amplification products, obtained using 5′ end fluorescent-dye-labeled primers, were pooled and analyzed on an automated ABI Prism 3730 DNA sequencer (Applied Biosystems, Foster City, CA, USA). Genotypes were scored using GENMAPPER (v3.5NT) software. Inheritance was checked using PEDCHECK20 and haplotypes in the 31 pedigrees were constructed from microsatellite data using SIMWALK2.21
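The intermarker distances listed above can be turned into relative positions with a few lines of code. The sketch below is illustrative only: the marker order and distances are those given in the text, whereas the variable names and the helper `span_kb` are hypothetical.

```python
from itertools import accumulate

# Marker order and intermarker distances (kb) as listed in the text
markers = ["D15S1027", "D15S524", "AFMA085WB9", "D15S989",
           "D15S1005", "D15S969", "D15S551"]
gaps_kb = [380, 33, 580, 219, 58, 695]

# Position of each marker relative to the first marker (kb)
positions_kb = dict(zip(markers, accumulate(gaps_kb, initial=0)))

def span_kb(first: str, last: str) -> int:
    """Distance in kb between two markers of the map (hypothetical helper)."""
    return abs(positions_kb[last] - positions_kb[first])

total_kb = span_kb("D15S1027", "D15S551")   # whole marker set
block_kb = span_kb("D15S524", "D15S989")    # three-marker block
```

With these distances the seven markers span 1965 kb in total, and the three consecutive markers D15S524, AFMA085WB9 and D15S989 span 613 kb.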

Analysis of candidate genes

Sequence analysis was performed in seven affected individuals from the families with the highest LOD score for marker D15S1005.18 Sequences of 135 exons and surrounding intron sequences of the 20 genes located in the region were obtained using Ensembl genome browser version 42 (Figure 1; Table 1). AL109678, not present in Ensembl, was also sequenced as it is an mRNA coding sequence annotated in the UCSC genome browser. At least 200 bp upstream of the transcription start site were also analyzed and the 3′UTR was sequenced in all genes except for TMC3, KIAA1199, ARNT2 and TMED3. The Primer3 program was used for designing PCR primers yielding products of 400–500 bp;22 primers were in the intronic sequences flanking the exons to be analyzed. PCR reactions were performed using standard protocols (data available upon request). Amplification products were purified, quantified on a 2% agarose gel, and diluted for direct sequencing on an automated ABI Prism 3730 DNA sequencer using BigDye Terminator Sequencing Standards (Applied Biosystems). Sequences were assembled using ContigExpress (Vector NTI suite v8.0, Informax Inc.).

Figure 1

The location of 21 genes in the linkage region on chromosome 15q24-25. The minimal KS gene-containing region is indicated by an arrow.

Table 1 Genes included in the Kartagener syndrome sequencing study

Sequencing reactions were performed first with the forward primer, but if the quality of the sequencing electropherogram was unsatisfactory, or if single-nucleotide polymorphisms (SNPs) were found, they were repeated with the reverse primer. Each of the newly identified variants was validated through an independent PCR and sequencing reaction. The sequenced fragments were only included in the results if data were obtained for at least six of the seven patients. Seven fragments could not be amplified despite at least three attempts with two different sets of primers (nucleotides in mRNA sequence: BCL2A1, 602–710; FAM108C1, 1–343; C15ORF37, 421–640; TMED3, 260–299; ENSG00000180725, 457–618; FAH, 683–783, 991–1037). These reactions failed with control DNA as well as with patients' DNA, indicating PCR problems related to the local sequence composition of those genomic fragments. All variations found in the protein-coding sequence and not previously described in public databases were sequenced in a control population group consisting of 48 unrelated, healthy individuals of Polish origin.

Results

Seven microsatellite markers located in the 3.5 cM region reported to be linked to KS were genotyped in 31 families that had positive LOD scores for chromosome 15q in our genome linkage scan18 (Figure 1). Haplotype analysis was concordant with linkage to chromosome 15q24–25 (data not shown). A recombination was mapped in families 114 and 126 (Figure 2), 1 Mb telomeric to the previous boundary defined by marker D15S973, thus narrowing the minimal gene-containing region down to 1.8 Mb, between markers AFMA085WB9 and D15S1037.

Figure 2

Pedigrees of the KS families included in the sequencing. The results of the pair-wise LOD score for marker D15S1005 are indicated below each family. The recombination (marked by a red arrow) defining the minimal KS gene-containing region can be observed in pedigrees 114 and 126. Individuals included in the sequencing are indicated by black arrows. Red dots indicate individuals exhibiting situs inversus. The numbers beside marker names indicate the intermarker distances in kilobase pairs. Missing data are denoted by zeros. The genotypes of the parents in pedigrees 117 and 126 were deduced from the genotypes of the siblings.

The analysis of haplotype sharing between unrelated KS patients was inconclusive. The most promising block, encompassing three markers (D15S524, AFMA085WB9 and D15S989) and spanning 0.6 Mb, was present on five unrelated, affected chromosomes (families 103, 117, 136, 142 and 147) and on one untransmitted parental chromosome (data not shown).

Seven families with the highest positive LOD scores were selected for sequence analysis of all the genes located in the 1.8 Mb region (Figure 2). These families contributed 70% of the LOD score for marker D15S1005 on chromosome 15q.18 Notably, DNAI1 mutation analysis was negative in all seven families. DNAH5 involvement was excluded in four families (114, 117, 136 and 147; Figure 2) on the basis of SNPs located within the gene (discordant genotypes in the affected sibs), whereas in the three remaining families the analyzed DNAH5 SNPs were uninformative (unpublished data). As the microsatellite data confirmed linkage of the disease to 15q24–25 in all seven families, only one affected individual, representative of each pedigree, was subjected to DNA sequencing. Supplementary Table 2 summarizes the clinical findings of the seven families included for sequencing.
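The logic behind excluding a gene on the basis of discordant genotypes can be sketched as follows. Under a recessive model, affected sibs are expected to carry the same pair of parental haplotypes at the disease gene, so any intragenic SNP at which they differ argues against that gene. The sketch assumes no intragenic recombination and no genotyping errors; the function names and genotypes are hypothetical and are not the unpublished DNAH5 data.

```python
from typing import Dict, List

def normalize(genotype: str) -> str:
    """Make allele order irrelevant, so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))

def gene_excluded(affected_sibs: Dict[str, List[str]]) -> bool:
    """Return True if at least one SNP genotype differs between affected sibs.

    affected_sibs maps a sib identifier to genotypes at the same ordered
    set of SNPs within the candidate gene.
    """
    profiles = [[normalize(g) for g in snps] for snps in affected_sibs.values()]
    # Compare each SNP (column) across all affected sibs
    return any(len(set(column)) > 1 for column in zip(*profiles))

# Hypothetical families, invented for illustration
family_discordant = {"sib1": ["AG", "CC", "TT"], "sib2": ["GG", "CC", "TT"]}
family_concordant = {"sib1": ["AA", "CT"], "sib2": ["AA", "TC"]}

excluded_a = gene_excluded(family_discordant)   # True: first SNP differs
excluded_b = gene_excluded(family_concordant)   # False: all SNPs identical
```

Concordant genotypes do not prove involvement of the gene: if the parents are homozygous at the tested SNPs, the markers are simply uninformative.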

We analyzed the 18 coding sequences located between markers AFMA085WB9 and D15S1037 (Table 1). We also sequenced three genes located in the vicinity of the region, which could be functionally related to cilia: MESDC2, which is important in embryonic development; C15ORF26, which is expressed in lung and testis and displays homology to a flagellar protein identified in Chlamydomonas;14 and TMC3, a membrane protein with partial homology to nephrocystin, as indicated by a BLAST search of the NCBI nr database (Figure 1). Except for C15ORF26, none of the genes has been reported to be part of the cilium.

In total, we analyzed 83 kb of genomic sequence in seven individuals (35 kb of coding sequences and 48 kb of intronic/flanking DNA). We found 60 SNPs, 45 of which were already present in public SNP databases. Twelve SNPs, including the four previously unknown ones, were located in the protein coding regions. None of the identified SNPs in the coding regions resulted in a stop codon change or a frame shift; six were amino-acid substitutions and seven were synonymous codon changes. Twenty-one SNPs, including seven novel ones, were located in the untranslated regions. Four SNPs were located in regions upstream of the transcription start site. Twenty-seven variations were located in noncoding regions, six of which were previously unknown. None of the SNPs found was closer than 10 bp to a splice site. Variations in the protein-coding sequence that had not previously been described in SNP databases were sequenced in 48 individuals of Polish origin to estimate population frequencies. Except for two SNPs (W399R in ARNT2 and A506G in AWP1), all the variations were found in the healthy controls. A complete list of the SNPs found is given in Table 2 and the distribution of all SNPs in the patients can be found in Supplementary Table 1.

Table 2 List of all sequence variations found in 7 Kartagener syndrome patients

Discussion

We have performed a follow-up study to our previous finding18 of linkage to chromosome 15q24–25 in families with KS. Fine-mapping of the 3.5 cM (2.82 Mb) linked region in 31 KS families allowed us to narrow down the minimal candidate gene region to 1.8 Mb. Analysis of haplotypes composed of microsatellite markers revealed that there was no extensive haplotype sharing among the affected individuals, arguing against the presence of only one or a few frequent mutations occurring on a common haplotype background. On the other hand, the absence of haplotype sharing does not exclude the presence of a causative gene or genes in the linked region, because linkage analysis does not depend on a founder effect. There are examples in the literature of genes that are involved in rare recessive diseases but show no evidence of haplotype sharing. This happens because most of the patients are compound heterozygous for unique mutations in the given gene.23

In searching for the KS gene in the 15q24-25 region, we found 37 SNPs in the mRNA-coding sequences and promoters of 14 genes, including 6 nonsynonymous codon changes. No evidently pathogenic mutations, such as the introduction of stop codons or frame-shift mutations, were detected.

Given the rare occurrence of the disease (approximately 1:40 000), the expected cumulative frequency of all disease variants in the general population should be below 1%. We therefore sequenced fragments harboring variants in mRNA-coding sequence in 96 control chromosomes (Table 2). One nonsynonymous amino-acid change (W399R in ARNT2) and one silent codon change (A506G in AWP1) were not seen in the controls, suggesting that their population frequencies may be of the order of 1% or lower. However, each of these two variants (in two different genes) was found in heterozygous form in only a single patient, with no other possibly pathogenic complementary allele; these are therefore unlikely to represent the disease-causing mutations. None of the remaining SNPs were particularly rare and their occurrence in patients did not significantly deviate from population frequencies (Table 2).
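The expectation of a cumulative disease-allele frequency below 1% follows from Hardy-Weinberg proportions, as the short sketch below illustrates. It assumes a single recessive locus, random mating and full penetrance, which is a simplification: with locus heterogeneity, the per-locus allele frequency would be lower still. The function name `prob_unseen` is hypothetical; the prevalence and the number of control chromosomes are taken from the text.

```python
import math

prevalence = 1 / 40_000            # disease frequency quoted in the text
q = math.sqrt(prevalence)          # cumulative disease-allele frequency (q^2 = prevalence)
carrier_freq = 2 * q * (1 - q)     # expected frequency of heterozygous carriers

def prob_unseen(allele_freq: float, chromosomes: int) -> float:
    """Probability that an allele of the given frequency is absent from a sample."""
    return (1 - allele_freq) ** chromosomes

# Chance of missing a variant entirely in the 96 control chromosomes
p_miss_at_1pct = prob_unseen(0.01, 96)
p_miss_at_q = prob_unseen(q, 96)
```

Under these assumptions q is 0.005, the same disease allele frequency that was used in the linkage model. A variant with a true frequency of 1% would still be absent from 96 control chromosomes in more than a third of such samples, so the absence of a variant from the controls places only a loose upper bound on its frequency.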

In conclusion, we did not find any variants that directly fulfilled the criteria of a pathogenic mutation for KS in any of the 21 genes located in the minimal gene-containing region, as indicated by linkage analysis (while assuming the rare occurrence of the disease and the recessive mode of inheritance). The analyzed region may, however, contain pathogenic variants that we failed to find. We did not sequence introns, which could contain regulatory elements or harbor mutations that create new splice sites. It is also possible that some of the genes have alternative exons not annotated in the current databases and thus not analyzed in our study, and we cannot exclude the possibility that some of the genes harbor heterozygous deletion(s) of one or more exons. We did not detect any extended homozygous regions in either the microsatellite or the SNP data, but the relatively low density of heterozygous SNPs could have left some regions noninformative with respect to possible hemizygosity (see Supplementary Table 1).
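The limitation regarding heterozygous deletions can be illustrated with a small sketch. A heterozygous SNP call shows that both chromosomes are present at that position, so only the stretches between consecutive heterozygous calls remain compatible with a hemizygous deletion. The function name, the length threshold and the genotypes below are hypothetical and do not reproduce the data in Supplementary Table 1; the sketch also ignores the flanks outside the first and last heterozygous SNP.

```python
from typing import List, Tuple

def uninformative_intervals(
    snps: List[Tuple[int, str]], min_len_bp: int
) -> List[Tuple[int, int]]:
    """Return gaps between consecutive heterozygous SNPs of at least min_len_bp.

    snps: (position_bp, genotype) pairs sorted by position, where the
    genotype is a two-letter string such as "AG".
    """
    het_positions = [pos for pos, gt in snps if gt[0] != gt[1]]
    return [
        (left, right)
        for left, right in zip(het_positions, het_positions[1:])
        if right - left >= min_len_bp
    ]

# Hypothetical genotypes for one patient (positions in bp, invented for illustration)
example = [(1_000, "AG"), (5_000, "CC"), (9_000, "TT"), (30_000, "CT"), (32_000, "AG")]
gaps = uninformative_intervals(example, min_len_bp=10_000)
```

In this invented example the only interval of at least 10 kb without a heterozygous call lies between positions 1000 and 30 000, and a deletion of one allele within it could not be ruled out from the SNP data alone.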

As the pedigree structures in the cohort we studied were simple, we cannot exclude the possibility that some of the families are false positives, although none of the families was crucial for the significant LOD score. Hence, despite the statistical significance of the LOD score value, it cannot be entirely excluded that the 15q locus is a false-positive locus. The current location of the linkage interval is based on two families (114 and 126; Figure 2) that contribute to the left recombination and on one family18 that contributes to the right recombination. This last family contains only one affected individual and, therefore, contributed very little to the overall LOD score. Under less stringent disease models, the linkage peak would still be significant with a 95% confidence interval matching the minimal gene-containing region. However, the linkage curve would be positive over a larger region with more than 200 genes. Hence, we cannot exclude the possibility that the KS gene is located closer to the 15q telomere. Given that this larger region does not contain obvious candidate genes, more families will be needed to establish the exact KS region.