Introduction

Shwachman-Diamond syndrome (SDS, MIM 260400) is a rare multi-organ disorder with an estimated incidence of 1 : 77 000.1 It presents in infancy with failure to thrive and recurrent infections. The phenotypic spectrum is broad and variable, but exocrine pancreatic dysfunction and haematological abnormalities are consistent features.2 Other common manifestations include short stature, skeletal abnormalities and liver dysfunction. Serious infections and acute myeloid leukaemia are the primary concerns for morbidity and mortality in the syndrome. Segregation analysis in SDS has established an autosomal recessive mode of inheritance,3 but the molecular defect remains unknown. Recently, a locus co-segregating with the disease on chromosome 7 was identified in a genome-wide scan, and recombination events defined a minimal 2.7 cM interval spanning the centromere.1 All the families analysed supported linkage to 7p12-q11, consistent with a single locus for SDS.

To further refine the disease locus and facilitate positional cloning of the disease gene, we took a linkage disequilibrium mapping approach. The transmission disequilibrium test (TDT) was used as a test of association in the presence of linkage,4 and haplotypes were constructed to identify founder chromosome(s) and ancestral recombination events. We constructed a physical map of the disease locus using somatic cell, radiation hybrid and STS-content mapping to establish the order of published and newly characterised markers used in the genetic analysis, as well as to identify candidate genes for SDS.

As the best characterised gene mapped to the critical genetic interval, tyrosylprotein sulfotransferase 1 (TPST1) was selected for mutation analysis, given that both its function and expression pattern are consistent with multi-organ defects observed in SDS. It is widely expressed, and codes for a Golgi transmembrane protein involved in posttranslational modification of secreted and membrane-bound proteins.5 Involvement of TPST1 in the syndrome was assessed by exon sequencing, Southern blotting and expression analysis.

Materials and methods

Patients

Most of the families with SDS included in the study have been described elsewhere.1,2,6 Additional families were obtained through ongoing recruitment. Diagnosis of SDS was based on documented evidence of exocrine pancreatic and bone marrow dysfunction, with the latter most commonly involving chronic neutropenia.2 Consent was obtained from all participating families, and procedural approval was obtained from the human subjects review board of The Hospital for Sick Children (HSC). Genomic DNA was isolated from leukocytes or EBV transformed lymphoblasts.7

Genotyping and genomic library screening

Primer sequences were obtained from the Genome Database (http://gdbwww.gdb.org/) or designed using Primer3 software.8 Microsatellite marker alleles were amplified by PCR, resolved on a 6% sequencing gel, transferred to Hybond-N+ membrane and visualised by hybridisation to one of the PCR primers end-labelled with [α-32P] dCTP.9 SNP markers were typed by allele-specific PCR.9 New polymorphic markers derived from CEPH YAC 763g2 were identified by hybridisation of (CA)15 oligonucleotide to 800 YAC subclones,11 followed by sequencing.

Library screening of the chromosome 7-specific YAC library12 and the RPCI-11 BAC library13 was performed by PCR and oligonucleotide hybridisation, respectively.

Mutation analysis

DNA from anonymous unrelated individuals of Canadian origin was obtained from The Centre for Applied Genomics, HSC. Primers for amplification and sequencing were selected by the Primer3 software. Sequencing was performed using radiolabelled terminators (Amersham Pharmacia Biotech), and included at least 19 nucleotides of intron sequence at the intron–exon junctions. Sequence variants were characterised by bi-directional PCR amplification of specific alleles (Bi-PASA).14 For Southern blot analysis, genomic DNA was digested separately with HincII and BglI. The three probes used for hybridisation were amplified from genomic DNA or cDNA, and correspond to partial exon I (nucleotides 164–859 of TPST1 mRNA), complete exons II–IV (nucleotides 928–1223), and partial exon V (nucleotides 1282–1754). For RT–PCR analysis, total RNA was isolated from EBV transformed lymphoblasts,15 and reverse transcribed with SuperScript II (Life Technologies) using random hexamers. The BLAST algorithm was used for database search and alignment.16

Results

Genetic analyses

The linkage disequilibrium study included 34 families of diverse ethnic origin from Canada, USA, Central and South America, Europe and Australia. Patients, parents and any unaffected siblings were genotyped at polymorphic marker loci extending across the SDS locus. Alleles at 19 marker loci (from D7S1485 to D7S482 in Figure 1, excluding RL15, RL14, B236I1 and D7S2503) were evaluated for association with disease using TDT. Transmissions from heterozygous parents to affected offspring of 34 parent–child trios were included in the test. After correction for multiple testing, none of the marker alleles showed significant excess transmission (positive association) or non-transmission (negative association) (data not shown).
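For a single allele tested against all others, the TDT reduces to a McNemar-type statistic computed from heterozygous parents only. The following is a minimal sketch of that calculation, not the analysis code used in this study. The transmission counts are hypothetical, and the Bonferroni adjustment is an assumed choice, since the correction method is not specified in the text.

```python
import math

def tdt_chi2(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT statistic for one allele versus all others.

    Counts come from heterozygous parents only: how often the allele
    was transmitted to an affected child versus not transmitted.
    """
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total

def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value (assumed correction, capped at 1)."""
    return min(1.0, p * n_tests)

# Hypothetical example: one allele transmitted 15 times, not transmitted 5 times,
# with 100 allele-level tests performed across all marker loci (assumed number).
stat = tdt_chi2(15, 5)          # (15 - 5)^2 / 20 = 5.0
p_raw = chi2_1df_pvalue(stat)
p_adj = bonferroni(p_raw, n_tests=100)
```

With many multi-allelic microsatellites, the number of allele-level tests grows quickly, which is one reason a nominally significant allele may not survive correction in a sample of this size.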

Figure 1

Shared disease haplotypes in SDS. Each of the six extended haplotypes (I–VI, outlined) was observed to co-segregate with the disease in two unrelated families of common ethnic origin (G, German; D, Dutch; I, Italian; FC, French Canadian; E, English). The families are from the US, the Netherlands (Ne), Italy (It), Canada (Ca) and the UK. All haplotypes span a smaller interval defined by recombination events on haplotypes II and IV. Chromosome designation refers to the SDS pedigree, the affected individual, and the parental origin (M, maternal; P, paternal). Alleles assumed to have arisen by microsatellite mutation (in brackets) differ in size by one or two repeat units. All markers involve dinucleotide or tetranucleotide repeats, with the exception of two SNPs, WIAF-2179 and WIAF-183 (Table 1).44 Marker order is based on the sex-averaged genetic map (cM),31 the Stanford G3 radiation hybrid map, v 2.0,18 and the physical map in Figure 3. Markers D7S473 and D7S494 were resolved by recombinations in two families (data not shown). Relative order of markers D7S1485 vs. WIAF-2179, and D7S494 vs. D7S1480 remains undetermined, but has no effect on either the extent of haplotype sharing, or the interpretation of recombination events.

Linkage disequilibrium, however, was detected by haplotype analysis. Haplotypes were constructed on disease chromosomes (chromosomes of affected individuals) and normal chromosomes (parental chromosomes not transmitted to affected offspring). Their comparison identified six haplotypes (I–VI in Figure 1), each of which occurred on two disease chromosomes of common ethnic origin. None of these haplotypes was found on normal chromosomes, nor was comparably extensive haplotype sharing detected among normal chromosomes. These observations suggest the existence of multiple founder chromosomes in SDS.
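The comparison of disease and normal chromosomes can be sketched as a simple counting step. The example below is illustrative only: the allele values are hypothetical, haplotypes are represented as tuples of alleles over a fixed marker order, and the sketch ignores complications handled in the actual analysis, such as microsatellite mutation, partial sharing after ancestral recombination, and ethnic origin.

```python
from collections import Counter

def shared_disease_haplotypes(disease, normal, min_count=2):
    """Return haplotypes seen on at least `min_count` disease chromosomes
    and on no normal (untransmitted parental) chromosome."""
    counts = Counter(disease)
    normal_set = set(normal)
    return sorted(
        hap for hap, n in counts.items()
        if n >= min_count and hap not in normal_set
    )

# Hypothetical allele calls at three ordered markers.
disease_chromosomes = [(3, 1, 5), (3, 1, 5), (2, 4, 5), (7, 1, 2)]
normal_chromosomes = [(2, 4, 5), (6, 6, 1)]

candidates = shared_disease_haplotypes(disease_chromosomes, normal_chromosomes)
```

Here only the haplotype carried by two disease chromosomes and absent from the normal set is reported as a candidate founder haplotype.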

To detect historical recombination events that may have occurred on these chromosomal backgrounds, haplotype analysis was extended to other available SDS families. Ancestral recombinants associated with disease haplotype II were observed in families SW93 and SW18, and those associated with haplotype IV were detected in family SW69 (Figure 1). The recombination events reduce the most likely location for the disease defect to an interval that is flanked by markers BS126 and D7S502, and is spanned by all six SDS haplotypes.

Refinement of the disease locus was supported and extended by observation of an intrafamilial recombination event in SDS family SW158, where the phase of the recombinant maternal chromosome was confirmed by genotyping first degree maternal relatives in an extended pedigree (Figure 2). The region of concordance between the affected and an unaffected sibling could be excluded from involvement in disease, thereby defining a new boundary for the SDS locus at marker D7S2429. Taken together, the ancestral and the intrafamilial recombinants position the SDS gene in the minimal 1.9 cM interval between D7S2429 and D7S502 (summarised in Figure 3).
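The exclusion logic applied to the sibling pair can be illustrated as follows. This is a sketch under stated assumptions: the disease is recessive and fully penetrant, both siblings are correctly diagnosed, and phase is known. The marker names and haplotype labels are hypothetical and do not correspond to the genotypes of family SW158.

```python
def concordant_markers(markers, affected, unaffected):
    """Markers at which both siblings inherited the same maternal and
    paternal haplotypes. Under a fully penetrant recessive model, the
    disease gene cannot lie in such a region, because the unaffected
    sibling would otherwise also be affected."""
    return [m for m, a, u in zip(markers, affected, unaffected) if a == u]

# Hypothetical phased data: (maternal, paternal) haplotype label per marker.
markers = ["M1", "M2", "M3", "M4", "M5"]
affected = [("m1", "p1")] * 5
# The unaffected sibling carries a recombinant maternal chromosome that
# switches from haplotype m1 to m2 between M3 and M4.
unaffected = [("m1", "p1"), ("m1", "p1"), ("m1", "p1"), ("m2", "p1"), ("m2", "p1")]

excluded = concordant_markers(markers, affected, unaffected)
```

In this toy example the first three markers are excluded, so the last concordant marker (M3) becomes the new boundary of the candidate interval.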

Figure 2

Recombination mapping in family SW158. The exclusion of the short arm and the centromeric region of chromosome 7 from disease involvement is based on concordant and discordant chromosomal regions in the affected and an unaffected sibling, determined by a recombination event on the maternal chromosome. The phase of the maternal chromosomes was confirmed by genotyping maternal relatives (dashes indicate undetermined genotype at uninformative markers).

Figure 3

Refinement and physical map of the SDS locus. The SDS locus refinement is based on ancestral recombinants observed on shared disease haplotypes (minimal interval is shown as hatched bar), as well as on the pattern of concordance (open bar) and discordance (solid bar) of the affected and an unaffected sibling in SDS family SW158. Marker order is based on radiation hybrid mapping (G3 bin) and STS-content mapping. The clone contig across the disease locus includes YAC clones from CEPH, Washington University and chromosome 7-specific libraries, and BAC clones from RPCI-11 library. In addition to genetic markers (in bold; amplification shown as solid circles), the physical map incorporates BAC-end STSs (open circles) and STSs derived from known genes and ESTs (hatched circles). The refined SDS locus is at 7q11 based on mapping of centromeric repeats D7Z1 and D7Z2, and on characterisation of somatic cell hybrids RuRag 14-4-7-44 (7p) and RuRag 6-20-12 (7q). In addition, many of the YAC clones in the contig (underlined) have been analysed by FISH and localised to 7q11.2 (Kunz et al;45 Integrated Chromosome 7 Database: http://www.genet.sickkids.on.ca/chromosome7/; YAC/BAC FISH Mapping Resource at MPIMG). For simplicity, a large number of BAC clones are omitted from the contig, as well as YAC clones with apparent deletions, with the exception of 763g2, and E1864; the former is shown as a reference point for polymorphic markers derived from this clone.

Detection of the critical recombination events described above was facilitated by characterisation of six new polymorphic markers (Table 1), with at least six alleles observed for each marker in the SDS pedigrees. Markers BS126 and B236I1 were identified using the publicly available sequence of RPCI-11 BAC clones R-211B24 (TIGR: http://www.tigr.org/) and R-458F8 (GenBank17 at NCBI), respectively. Markers RL11, RL12, RL14 and RL15 were derived from CEPH YAC 763g2, selected for its large size and unambiguous localisation to 7q11.2 by fluorescence in situ hybridisation (YAC/BAC FISH Mapping Resource at MPIMG: http://www.mpimg-berlin-dahlem.mpg.de/cytogen/).

Table 1 Polymorphic and STS markers developed in this study

Physical mapping

In order to establish the relative order of published genetic markers, as well as those identified in this study, we used several complementary mapping methods. Initially, radiation hybrid mapping was performed, using the Stanford G3 panel.18 Several markers across the SDS locus (D7S499, D7S659, D7S2429, WIAF-183, D7S2549, D7S663, D7S502, D7S2503 and D7S482) had already been mapped at the Stanford Human Genome Center (SHGC),18 or by others (D7S1480).19 An additional 11 markers were typed twice on the panel, and data were submitted to the SHGC RH server for analysis. All markers were placed in high-confidence (1000:1) bins on the Stanford G3 RH map, v 2.018 (Figure 3) with lod scores greater than 7. Markers positioned in the same bin could not be ordered with high confidence, but were resolved by observation of intrafamilial recombination events (see Figure 1 legend), and by mapping on a YAC/BAC clone contig constructed across the 1.9 cM SDS locus.

The contig, shown in Figure 3, was established by STS-content mapping of YAC clones retrieved from CEPH and Washington University libraries based on information in published physical maps.20,21 In addition, the chromosome 7-specific YAC library and the RPCI-11 BAC library were screened for marker-positive clones. At later stages, physical mapping incorporated STSs developed from available BAC-end sequence (TIGR Database; Table 1), as well as EST-derived STSs predicted to map to the SDS locus on GeneMap'99.22
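The principle behind STS-content mapping is that, for a correct marker order, the STSs amplified from any intact clone should be consecutive. A minimal consistency check along those lines is sketched below. The clone and STS names are hypothetical, and the check assumes clones without chimerism or internal deletions, which is not always true of YAC clones.

```python
def clone_gaps(order, clone_hits):
    """For each clone, list STSs that lie between its outermost positive
    STSs in the proposed order but were not amplified from the clone.
    An empty result means the order is consistent with all clones."""
    position = {sts: i for i, sts in enumerate(order)}
    gaps = {}
    for clone, hits in clone_hits.items():
        indices = sorted(position[s] for s in hits)
        if not indices:
            continue
        missing = [
            order[i] for i in range(indices[0], indices[-1] + 1)
            if order[i] not in hits
        ]
        if missing:
            gaps[clone] = missing
    return gaps

# Hypothetical STS content of three overlapping clones.
hits = {"y1": {"A", "B"}, "y2": {"B", "C"}, "y3": {"C", "D"}}

consistent = clone_gaps(["A", "B", "C", "D"], hits)
inconsistent = clone_gaps(["A", "C", "B", "D"], hits)
```

A gap reported for a single clone under an otherwise well-supported order may indicate a deletion in that clone rather than a wrong order, which is why such clones need to be interpreted with care.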

Based on integrated maps,21,23,24 it was evident that the 2.7 cM SDS locus spanned the centromere of chromosome 7. As the interval was saturated with genetic markers, the question arose as to which markers flank the centromere on the p and the q arms. This was of special interest in determining the chromosomal location of the critical recombination event in family SW158. To orient markers with respect to the alphoid domains of the centromere, two distinct chromosome 7-specific centromeric repeats,25 D7Z1 (Table 1) and D7Z2,26 were radiation hybrid mapped on the Stanford G3 panel and correlated with mapping results for the genetic markers (Figure 3). Chromosome arm assignment was directly confirmed by mapping markers and centromeric repeats on somatic cell hybrids RuRag 14-4-7-44 and RuRag 6-20-12 containing the short and the long arm of human chromosome 7, respectively27 (Figure 3). Together, the above findings position the refined 1.9 cM disease locus on the long arm of chromosome 7.

Based on radiation hybrid mapping and the constructed clone contig, the refined SDS locus exceeds 1.9 Mb, the size predicted from the average of 1 cM/Mb (see Discussion). The large physical size and the pericentromeric location of the disease locus present difficulties for the identification and characterisation of candidate genes. GeneMap'99 positions 28 transcripts within the critical genetic interval. To date, we have confirmed that 13 ESTs and three known genes, TPST1, PMS2L4 and ZFD25, map to the disease locus (Figure 3).

Mutation analysis of TPST1

In contrast to PMS2L4 and ZFD25, TPST1 is known to encode a functional protein product,5 and thus represents a true candidate gene for SDS. To facilitate mutation screening of TPST1, we established its genomic structure (Table 2). TPST1 mRNA sequence (AF038009) was aligned to the working draft sequence of the chromosome 7 BAC clone RP11-379F11 (AC026281), identified by BLASTN search of the HTGS database at NCBI. TPST1 consists of five exons, with consensus splice sites at the intron–exon boundaries. Coding sequence spans exons I–IV. Intron size was determined from contiguous genomic sequence available in GenBank for introns 1, 3 and 4 (AC079855, AC026281), and from that available in Celera Data (Celera Discovery System and Celera Genomics' associated databases) for intron 2 (hCG39437).

Table 2 Genomic structure of the TPST1 gene

Each of the five TPST1 exons, including the intron–exon junctions, was amplified from genomic DNA and sequenced in five patients with SDS and one control individual. The patients are from five unrelated families used to establish linkage.1 Primers used are shown in Table 3. Two heterozygous sequence variants were identified, one in intron 2 (1126-19(G→A)), and another in the non-coding exon V (1599(T→G)). Each was observed in a single patient, and neither was present in the control sample. Using the Bi-PASA assay for detection of single nucleotide polymorphisms (Table 3), we confirmed that each of the two variants co-segregates with disease in the respective family. However, both were also found in a control population of 52 individuals; the intron 2 variant was observed at a frequency of 3%, and the exon V variant at 13%, indicating that they are not exclusively associated with SDS.

Table 3 Primers used for amplification and sequencing of TPST1 exons, and for characterisation of detected sequence variants

To exclude the possibility of local deletions and rearrangements not detected by sequencing, TPST1 cDNA probes were hybridised to genomic DNA of one control individual and four of the five patients described above. Observed band patterns for two different restriction enzymes were identical in control and patient samples, and as predicted by TPST1 sequence, suggesting that large-scale mutations in TPST1 are not the molecular defect in SDS.

We assessed TPST1 expression by RT–PCR (Table 3) in nine SDS patients, including those identified to carry intron 2 and exon V sequence variants. A single amplification product of expected size, corresponding to exons II–IV, was observed in all patient samples. Therefore, loss of TPST1 expression or mRNA instability can be excluded as causing disease.

Discussion

Using TDT for linkage disequilibrium mapping, we did not detect association of individual marker alleles with disease. This does not necessarily imply absence of linkage disequilibrium, but may reflect limited power of TDT due to small sample size and/or allelic heterogeneity at the disease locus. Given the relatively small number and the ethnic diversity of the family trios, haplotype analysis was a more powerful method to detect disequilibrium. Our observation of multiple founder chromosomes in SDS (Figure 1) is similar to findings in Wilson disease (MIM 277900). Also reminiscent of Wilson disease28,29 are haplotypes shared by patients of Italian and Northern European origin (II and IV in Figure 1).

To ensure that haplotype construction and interpretation of recombination events were correct, knowledge of marker order was essential. The resolution of genetic maps across the SDS locus is low.30,31 Furthermore, discrepancies were noted between genetic and physical maps.20,21 A survey of the genomic sequence available in GenBank for this region revealed poor coverage with a high degree of discontinuity, insufficient to resolve the inconsistencies. This is not surprising given the complex organisation of pericentromeric regions, including that of chromosome 7.32 The associated difficulties in mapping and sequence assembly33 emphasise the utility of our physical map for fine mapping studies, and underscore the value of the newly identified genetic markers.

The predicted size of the critical 1.9 cM interval, based on the Stanford G3 RH map, v 2.0, is 3.3 Mb, which yields a ratio of genetic to physical distance of 0.58 cM/Mb. This deviation from the average 1 cM/Mb appears to be due to a reduced male recombination rate.31 Given the established correlation between the local recombination rate and the extent of linkage disequilibrium,34,35 the low recombination rate may be directly related to the extensive disease haplotype sharing observed over large physical distances (based on 0.58 cM/Mb, 4.6 Mb for haplotypes V and VI, 7.3 Mb for haplotypes III and IV, and 16.5 Mb for haplotype I; Figure 1).
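The arithmetic behind these figures can be checked directly. The short sketch below recomputes the ratio from the 1.9 cM and 3.3 Mb values given above and back-calculates the genetic lengths implied by the quoted physical sizes of the shared haplotypes. The back-calculated cM values are derived here for illustration and are not figures reported in the text.

```python
INTERVAL_CM = 1.9   # genetic size of the refined SDS interval
INTERVAL_MB = 3.3   # physical size predicted from the Stanford G3 RH map

# Local ratio of genetic to physical distance, about 0.58 cM/Mb.
ratio = INTERVAL_CM / INTERVAL_MB

def mb_from_cm(cm: float, cm_per_mb: float) -> float:
    """Convert a genetic distance to a physical distance at a given local rate."""
    return cm / cm_per_mb

def cm_from_mb(mb: float, cm_per_mb: float) -> float:
    """Convert a physical distance to a genetic distance at a given local rate."""
    return mb * cm_per_mb

# Physical extents of shared haplotypes quoted in the text (Mb).
haplotype_mb = {"V and VI": 4.6, "III and IV": 7.3, "I": 16.5}

# Implied genetic lengths (cM), using the rounded 0.58 cM/Mb from the text.
implied_cm = {name: cm_from_mb(mb, 0.58) for name, mb in haplotype_mb.items()}
```

At the average rate of 1 cM/Mb the same 1.9 cM interval would correspond to only 1.9 Mb, which is the comparison made in the Results.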

Since SDS is associated with a broad spectrum of clinical manifestations, it is difficult to predict the identity of the defective gene based on its functional features. Therefore, all genes in the critical genetic interval should be considered as candidates. Assessment of candidate genes is further complicated by the pericentromeric location of the refined SDS locus, where repetitive elements, including gene-related sequences, are abundant.32 This is well illustrated by the HSPC041 gene.36 Based on sequence alignment, it is located on chromosome 8, with highly related pseudogenes found on chromosomes 7 (Figure 3), 15, and 17. Of the three known genes mapped to the SDS locus, two (PMS2L4 (ref. 37) and ZFD25 (ref. 38); Figure 3) belong to gene families with multiple members mapped at 7q11, which is reminiscent of duplicated sequences associated with pericentromeric regions. In the absence of functional studies, it remains unclear whether these represent pseudogenes, or should be considered as candidate genes for SDS.

TPST1 is the best-characterised gene mapped to the SDS locus. Both publicly available22,39 and our own mapping data position TPST1 unambiguously within the SDS critical region (Figure 3), with no evidence of duplication locally or elsewhere in the genome. Broad tissue distribution of TPST1 mRNA (Northern blot analysis in Ouyang et al5) is consistent with multi-organ defects observed in SDS. This includes expression in the pancreas and liver,5 in neutrophils,40 and in the bone marrow (RT–PCR analysis by M Popovic, unpublished data), all of which are affected in SDS. Finally, functional implications of a defect in TPST1 are not inconsistent with the disease phenotype. Tyrosine sulfation of secreted and membrane-bound proteins is known to mediate protein–protein interactions involved in inflammatory and immune responses, such as leukocyte adhesion and chemokine signalling.41 This type of post-translational modification may be involved in signalling pathways that are of direct relevance to SDS. Although tyrosine sulfation has not been shown to be a dynamic process like phosphorylation, localisation of TPST1 and arylsulfatase E (ARSE) in the same subcellular compartment does support this possibility.42 Interestingly, mutations in ARSE gene lead to abnormalities in the skeletal system and short stature,42 which are components of the SDS phenotype.

The mutation analysis presented here excludes TPST1 as a candidate gene for SDS. Sequence analysis of exons and intron–exon junctions did not detect variants associated exclusively with disease. The identified sequence variants appear to co-segregate with SDS in affected families; however, they are located in non-coding sequence, and also occur in controls. Recently, the same variants were observed in healthy Japanese individuals screened for single nucleotide polymorphisms.43 We also found no evidence for deletions or rearrangements. Finally, we observed TPST1 expression in patient lymphoblasts. Although mutations in regulatory elements, which may have more subtle effects on gene expression, remain a possibility, we conclude that TPST1 is not involved in SDS.

In summary, we have refined the SDS locus to a 1.9 cM interval at 7q11, supported by observation of shared disease haplotypes across this interval. Construction of a physical map of the disease locus, based on a framework of ordered genetic markers, has facilitated genetic analysis, as well as mapping of candidate genes in the critical genetic interval. One of these, TPST1, was excluded as the causative gene for SDS by mutation analysis. In the future, integration of our map with the increasingly available genomic sequence of the region will allow a more comprehensive characterisation of candidate genes for SDS. Ultimately, this work should lead to identification of disease-causing mutations, providing for a more accurate diagnosis and better understanding of this complex disorder.