Introduction

Tuberous sclerosis (TSC) is an autosomal dominant disorder characterised by skin signs, seizures and mental handicap. It is very variable in severity but in general, variation is as great within families as between families. In 1987, TSC was found to be linked to the ABO blood group locus on chromosome 9q [1], but further studies showed unexpected genetic heterogeneity and it now seems likely that about half the cases are caused by a gene, TSC1, on 9q34 and half by a gene, TSC2, on chromosome 16 [2]. Linkage analysis in two large families, together with identification of deletions on chromosome 16 has recently led to the cloning of the TSC2 gene [3]. Positional cloning of the TSC1 gene, however, has been hampered partly by the fact that most families are small making it difficult to distinguish whether a recombinant with 9q34 markers is finely localising the disease gene, or whether in fact it is indicating that the family is linked to chromosome 16. There are also some conflicts in the recombination data obtained from larger families where TSC is clearly linked to 9q34. Evidence from informative families has placed the TSC1 gene between ABL and D9S114, a distance estimated as less than 5 Mb by FISH and between 4.5 and 6.7 Mb on pulsed-field gels [4, 5]. The current consensus based on recent recombination data suggests that the critical region for TSC1 is between D9S149 and D9S114, a distance of approximately 3 Mb [6]. In parallel with efforts to define the candidate region more precisely by genetic means, we set out to clone the entire region in cosmids, an undertaking which then enables assembly of the DNA into contigs and provides a mapped resource for efficient isolation of DNA as more precise positional information becomes available. The strategy we used was to generate somatic cell hybrids containing fragments of 9q34, and to employ Alu-PCR products from appropriate hybrids to identify cosmids which could then be assembled into contigs. 
The irradiation hybrids we used were generated from a somatic cell hybrid 64063a12 which contains human 9q as its only human chromosomal material [7]. Alu-PCR products from three irradiation lines, chosen on the basis of retention of markers in 9q34, were used to identify over 1,400 cosmid clones. Human-specific Alu-PCR [8] has been widely used to generate sequence-tagged sites (STSs) from defined regions contained in somatic cell hybrids [for examples see ref. 9–13]. Previous estimates for the average frequency of Alu-PCR products obtained from hybrids using a single primer have ranged from about 1 in 200 kb to 1 in 1,000 kb [14–16]. The exact figure presumably depends on the region of the genome involved, the amount of DNA in the hybrid and the exact experimental conditions. Complete coverage of a given region has required additional contributions from YAC screening [17, 18], while the assembly of cosmids into contigs by traditional methods of restriction mapping and walking would be very tedious when applied to a large region. The high density of probes obtained by Alu-PCR and other means, in conjunction with the availability of a 6-deep gridded chromosome 9 library (kindly supplied by Dr. Pieter de Jong), has allowed the use of an efficient computer-assisted fingerprinting method [19] to assemble the cosmids obtained by screening with Alu-PCR products, along with a smaller number of cosmids obtained by other means, into long contiguous stretches of DNA. Cosmid fingerprinting was developed and has been used for other organisms such as Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli [20–22], and it is currently being used for the analysis of human chromosomes 11 [23, 24] and Y [25].

Our results show that this approach can provide near-complete coverage of a large region, and can minimise the need to resort to YAC clones with their attendant problems of chimerism, internal deletions and rearrangements. We have also provided significant physical mapping data for a number of genes in 9q34.

Materials and Methods

Cell Lines and Culture Conditions

The cell line 64063a12 is a hamster-human hybrid line containing a single copy of human chromosome arm 9q [7]. It is able to grow in HAT medium. WG3H is an hypoxanthine phosphoribosyl transferase (HPRT)-deficient hamster cell line [26]. Attached cell lines were grown in DMEM supplemented with 10% fetal bovine serum, penicillin and streptomycin.

Irradiation-Fusion Hybrids

The irradiation fusion procedure was carried out essentially as described by Florian et al. [27], starting with 5 × 10⁶ cells of parental cell line 64063a12. The cells were given a 45,000-rad exposure at room temperature using an industrial X-ray unit (HF320 SR, Pentac) at a rate of 1,000 rad/min. After irradiation, the cells were fused to recipient hamster cells WG3H. Following 24 h incubation at 37°C, HAT was added to the growth medium. Individual colonies were picked and cell lines established after 14 days.

FISH

Cosmids to be used for painting were grown individually overnight in 96-well microtitre trays. They were subsequently pooled and DNA was prepared using a standard alkaline lysis method. FISH was performed as described by Woodward et al. [28].

DNA Preparation from Cell Lines

Cells from three 180-cm³ flasks were thoroughly washed in physiological saline. Seven millilitres of lysing solution (0.5% SDS, 100 µg/ml proteinase K) were added to each flask, and the flasks were incubated for 5 h at 37°C. The lysate was then extracted three times with phenol-chloroform. The DNA was ethanol-precipitated and resuspended in TE.

Alu-PCR

Human-specific Alu-PCR was carried out on DNA isolated from irradiation hybrid lines using primer AluIV [11], which primes from the 3′ end of Alu sequences, and primer 5R′ (GAACTCCTGACCTCAGGTGATCCAC), which primes from the 5′ end; both specifically amplify human DNA only. Approximately 1 µg of DNA was used in a mixture containing 10 mM Tris-Cl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.02% gelatin, 0.01% Tween 20, 0.1% Triton X-100, 0.1% Nonidet NP-40, 200 µM each dATP, dCTP, dGTP and dTTP, 1 µM primer and 2 U Promega Taq polymerase in a reaction volume of 100 µl. An initial denaturation step at 94°C for 5 min was followed by 35 cycles of 45 s at 94°C, 1 min at 56°C, and an elongation step at 72°C (3 min for 10 cycles, followed by 4 min for 10 cycles and 5 min for 15 cycles).
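The cycling profile described above can be written out as a simple step list, which makes it easy to check that the three elongation blocks add up to the stated 35 cycles. This is only an illustrative sketch: the variable and function names are hypothetical, and ramp times between temperatures are ignored, so the total is the programmed hold time rather than the real run time.

```python
# Illustrative encoding of the Alu-PCR cycling profile from the text.
# All names are hypothetical; ramp times are not modelled.

INITIAL_DENATURE_S = 5 * 60   # 94°C for 5 min
DENATURE_S = 45               # 94°C for 45 s in every cycle
ANNEAL_S = 60                 # 56°C for 1 min in every cycle

# (number of cycles, elongation time in seconds at 72°C)
ELONGATION_BLOCKS = [(10, 3 * 60), (10, 4 * 60), (15, 5 * 60)]


def build_program():
    """Return the profile as a flat list of (step, temperature in °C, seconds)."""
    steps = [("initial denaturation", 94, INITIAL_DENATURE_S)]
    for n_cycles, elongation_s in ELONGATION_BLOCKS:
        for _ in range(n_cycles):
            steps.append(("denature", 94, DENATURE_S))
            steps.append(("anneal", 56, ANNEAL_S))
            steps.append(("extend", 72, elongation_s))
    return steps


program = build_program()
n_cycles = sum(n for n, _ in ELONGATION_BLOCKS)
total_hold_s = sum(seconds for _, _, seconds in program)

print(f"cycles: {n_cycles}")
print(f"programmed hold time: {total_hold_s / 60:.2f} min")
```

The lengthening elongation step accounts for most of the programmed time, which fits a protocol intended to favour long inter-Alu products in later cycles.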

Sources of Cosmids

The majority of the clones were isolated from the arrayed chromosome-9-specific library LL09NC01 ‘P’, which consists of 3 × 10⁵ cosmid clones and was provided to us by Pieter de Jong. These cosmids are referred to by their plate, row and column addresses.

The following cosmid libraries were also employed: (1) a female genome library constructed by B. Cachon-Gonzalez [29], which provided 22 clones from screenings with single-copy probes; (2) a library constructed from 9q-only cell line 64063a12, which provided 150 clones on screening with human DNA; (3) a library constructed from cell line E6B [30], which provided 192 clones of human origin, and (4) libraries constructed from cell lines 17B and 20A, which provided 15 clones. In addition, 85 clones [31] and 23 clones [32] mapped to 9q34 by FISH were fingerprinted.

Single-Copy Probes

Probes from the following genes were obtained by RT-PCR [33, 34]: complement component 8 peptide (C8G); retinoid X receptor-α (RXRA); carboxyl ester lipase (CEL); D9S46E-Cain (CAN); endoglin (ENG); pregnancy-associated plasma protein (PAPPA), and collagen α1(V) (COL5A1). A progestagen-associated endometrial protein (PAEP) probe was made by RT-PCR using primers amplifying exon 2 of the gene, and an oligonucleotide probe for SURF3 was provided by B. Janssen. Cloned fragments from the following genes and markers were kindly supplied to us as follows: ABO locus (Dr. F. Yamamoto); Abelson proto-oncogene, v-ABL (Dr. N. Tyke); folylpolyglutamate synthase, FPGS (Dr. B. Shane); pre-B cell leukaemia transcription factor 3, PBX3 (Dr. T. Bech-Hansen); N-methyl-D-aspartate receptor, GRIN1 (Dr. P. Brett); dopamine β-hydroxylase, DBH (Dr. Jaques Mallet), and MCT 136/D9S10 (Dr. R. White). Cloned fragments from the homologue of the Drosophila notch gene (NOTCH1), from argininosuccinate synthetase (ASS) and for ATP-binding cassette 2 (ABC2) [EST01643; see ref. 35] were acquired from the American Type Culture Collection (ATCC). Multiple cosmids were isolated with all probes, with the exception of the loci ABL (no clones) and COL5A1 (1 clone).

Library Screening

An aliquot of the Alu-PCR reaction corresponding to approximately 100 ng of product was radiolabelled using an Amersham oligolabelling kit. LL09NC01 ‘P’ library filters were prehybridised overnight at 42°C in 50% formamide, 4 × SSC, 1 mM EDTA, 10% dextran sulphate, 1% SDS, 50 mg/ml sheared herring sperm DNA, 5 × Denhardt’s solution, and 500 mg/ml placental DNA. Probes were preassociated with 125 µg Cot 1 DNA (BRL) in 5 × SSC in a volume of 400 µl for 1 h at 65°C, then added to the filters and left to hybridise overnight at 42°C. Filters were subsequently washed twice for 30 min in 2 × SSC, 0.1% SDS, followed by two more washes in 0.1 × SSC, 0.1% SDS, and exposed at −70°C to Kodak X-omat AR X-ray film with an intensifying screen. Libraries were also screened with single-copy markers. These were individually labelled and combined in pools of varying sizes for screening. Hybridisation, washing and exposure were done as described for Alu-PCR products.

Fingerprinting

DNA for fingerprinting was prepared in microtitre dishes by the alkaline lysis method of Gibson and Sulston [36]. Samples were digested with HinfI and labelled with Klenow and ³²P-dCTP; bands were separated on non-denaturing acrylamide gels [25, 37] and analysed on a Vax II/GPX workstation [24].

Results

Characterisation of Irradiation Hybrids

A panel of 39 irradiation hybrids was initially tested for the retention of 23 loci from 9q. Twenty-five of the 39 hybrids tested positive for at least one marker, with 16 hybrids retaining loci from 9q34. FISH of the hybrids, using biotinylated human DNA as a probe on metaphase chromosomes, revealed only 1–3 fragments of human origin per cell in 14 of the lines. Hybrids which retained markers almost exclusively from 9q34 were characterised with an additional 8 markers from the region. The results are summarised in figure 1.

Fig. 1

Pattern of marker retention of the irradiation hybrid lines with fragments from 9q34. In addition to the ones shown here, the hybrids were negative for the following markers (between D9S15 and GSN): ASSP3, ASSP12, ALDOLB, ORM and HXB. Information on all markers is accessible in GDB. Hybrids 17B, 19B and 20A (bold lines) were chosen to obtain Alu-PCR probes for cosmid library screening. Hybrid 6C was not used because FISH data with total human DNA indicated that it contains multiple fragments of human origin.

Screening with Alu-PCR Products

Three hybrid lines, 17B, 19B and 20A, contain overlapping fragments from the TSC1 candidate region and little or no other detectable DNA (fig. 1). Alu-PCR products made from these three hybrids using two different primers were individually labelled and hybridised separately to filters from gridded library LL09NC01 ‘P’. A total of 1,950 signals corresponding to 1,431 different cosmid clones were obtained from the screenings. Substantial overlaps were observed between the sets of signals obtained by Alu-PCR products from the three hybrid lines, presumably reflecting their overlap in human DNA content (tables 1, 2). To evaluate the efficiency of our strategy for selecting cosmids from the defined region, six pools, each containing 96 individually grown cosmids identified by one of the screenings with Alu-PCR products (3 hybrid lines, 2 Alu primers), were used to probe human metaphase chromosomes by FISH. The DNA preparations were biotinylated and used to paint human metaphase chromosomes. As a control, 96 random cosmids from the LL09NC01 ‘P’ library were also used as probe. The signal from the control probe covered the entire length of chromosome 9 (fig. 2a). In contrast, the signal observed with the Alu-PCR cosmid pools was confined mainly to 9q34 (see table 3 and fig. 2b, c, f). To assess the extent of the overlaps between the hybrids, pools of cosmids simultaneously identified with Alu-PCR products from 2 or all 3 hybrids were used as probes on metaphase chromosomes from translocation cell line 9T12 [t(9;20)(q34;q11.2)] [38]. The breakpoint of this translocation has previously been mapped between D9S114 and D9S298 [28]. Results showed that cosmids identified by all three hybrids mapped proximal to the 9T12 breakpoint, while cosmids identified by any two of the hybrids mapped on either side of the 9T12 breakpoint (fig. 2e, f).
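The bookkeeping behind "signals" versus "different cosmid clones", and behind the pools of cosmids picked by two or by all three hybrids, amounts to set operations on clone addresses. The sketch below uses a small invented data set; the clone addresses, the dictionary layout and the variable names are hypothetical and do not reproduce the actual screening results.

```python
from collections import Counter

# Toy screening results: (hybrid, primer) -> set of positive clone addresses.
# Addresses are invented for illustration only.
screens = {
    ("17B", "AluIV"): {"1A1", "1A2", "2B5"},
    ("17B", "5R"): {"1A2", "3C7"},
    ("19B", "AluIV"): {"1A1", "2B5", "4D9"},
    ("19B", "5R"): {"3C7"},
    ("20A", "AluIV"): {"2B5", "5E2"},
    ("20A", "5R"): {"5E2", "1A1"},
}

# A "signal" is one positive clone in one screen; the same clone can
# contribute several signals across screens.
total_signals = sum(len(clones) for clones in screens.values())
distinct_clones = set().union(*screens.values())

# Merge the two primer screens of each hybrid, then count in how many
# hybrids each clone was detected.
by_hybrid = {}
for (hybrid, _primer), clones in screens.items():
    by_hybrid.setdefault(hybrid, set()).update(clones)

hybrid_hits = Counter(c for clones in by_hybrid.values() for c in clones)
in_all_three = sorted(c for c, n in hybrid_hits.items() if n == 3)
in_exactly_two = sorted(c for c, n in hybrid_hits.items() if n == 2)

print(total_signals, len(distinct_clones))
print("all three hybrids:", in_all_three)
print("exactly two hybrids:", in_exactly_two)
```

Clones found in every hybrid mark the DNA shared by all three fragments, while clones found in only two hybrids mark pairwise overlaps, which is the distinction used when the pools were mapped relative to the 9T12 breakpoint.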

Fig. 2

FISH of pools of cosmids on metaphase chromosomes. Arrows indicate the derivative 20 chromosome. a 96 random cosmids from the LL09NC01 ‘P’ library. Normal metaphase chromosomes. b–d 96 cosmids obtained with Alu-PCR products from hybrids 17B, 19B and 20A, respectively. Normal metaphase chromosomes. e 62 cosmids simultaneously identified with Alu-PCR products from hybrids 17B and 19B. Translocation line 9T12. f 39 cosmids simultaneously identified with Alu-PCR products from hybrids 17B and 20A. Translocation line 9T12.

Table 1 The total number of signals obtained by our 6 successive screens of the LL09 library using Alu-PCR products
Table 2 The number of signals obtained more than once by screening the LL09 library independently with Alu-PCR products from the different hybrids
Table 3 FISH results from painting metaphase chromosomes with Alu-PCR derived cosmids

Contig Assembly

A total of 1,894 cosmid clones have been fingerprinted and analysed. These clones have been assigned to 172 contigs which contain a total of 1,116 cosmids. Sixty-three of these contigs have been mapped with respect to three translocation breakpoints in 9q34, either on the basis of FISH experiments using cell lines containing defined translocation chromosomes [28] or because they are known to contain mapped markers (fig. 3). The amount of data is too great to show here in detail, but all data are available by anonymous FTP (see below). As an example, we present data from contig 69, which is 200 kb long and contains the genes ABC2 and C8G. Figure 4a shows a tiling path across the contig and figure 4b shows the fingerprints of the same clones run on one gel. Another contig includes the gene loci ABO, SURF and DBH and the marker D9S150. Examination of the fingerprints shows a duplication of 15 HinfI bands immediately distal to the SURF cluster and proximal to D9S150. However, additional repeats in the same region revealed by hybridisation make it impossible to establish with certainty that there is no gap between SURF and D9S150 [see ref. 39].

Fig. 3

Approximate extent of coverage of the 9q34 region. The diagram is taken from the chromosome 9 workshop report Sigma display with minor adjustment. Distances on this map were based on estimates of physical distance which were either published or were presented at the workshop. Breakpoints SD1, 9T12 and 9T01 are represented by vertical lines. The contigs have been drawn to scale on the assumption (see text) that 1 fingerprint band is equivalent to 1 kb. Contigs have been assigned to three classes: mapped (contigs which contain a mapped locus), pocketed (contigs which have been mapped with respect to translocation breakpoints by Woodward et al. [40]), and unassigned. Immediately above the chromosome ideogram are shown the mapped contigs, then a shaded band. Above that are first the pocketed contigs and then the unassigned contigs (with the exception of PAPPA which is in unassigned contig 78). All contigs are shown again in the shaded band to give an estimate of the density of coverage. Loci present in the contigs are shown directly below their respective contig. Thus the order of the loci reflects either their known map position or their position within a contig which may be only pocketed. It is therefore not definitive.

Fig. 4

a A tiling path of cosmids across contig 69. Only cosmids from the LL09NC01 ‘P’ library are shown. The positions of genes ABC2 and C8G are shown with error bars determined by those cosmids of the full contig which were found to be positive by hybridisation. b HinfI fingerprints of the cosmids across contig 69 arranged in the same sequence and orientation as in a. The scale refers to fingerprinting bands (see text).

Other contigs of interest include contig 7, which is 180 kb long and contains the genes FPGS and ENG in very close proximity. Contig 41 contains D9S10, D9S66 and the recently discovered Vav-2 [40, 41], and is about 120 kb long. Contig 71 contains both the markers D9S113 and D9S148 and is over 90 kb long. The largest contig to contain no known gene or marker is contig 33, which maps proximal to the SD1 breakpoint and is 230 kb long.

All our data are publicly available. The database has been exported from the VAX as a text file in the .ace format. In this form it can be imported directly into Acedb version 1.8. With minor amendment it can be imported into later versions of the program. We have placed a copy of this file on the chromosome 9 anonymous file server. It is available by anonymous ftp from ftp.gene.ac.uk. [128.40.82.1] as the file 9q34 contigs.ace in the subdirectory /pub/chr9/acefiles, or from the World Wide Web (WWW) URL http://diamond.gene.ucl.ac.uk/ with any Web client such as Mosaic or Lynx.

Discussion

We have used a novel combination of techniques to identify in cloned form and map most of the DNA from the entire TSC1 candidate region in 9q34. Radiation hybrids containing small fragments of chromosome 9q have been used as a basic starting resource. By screening a gridded chromosome 9 cosmid library with Alu-PCR products derived from a combination of hybrid lines, large numbers of cosmids were identified which were then fingerprinted and tested for contig formation by a semi-automatic procedure. The majority of the cosmids selected with the Alu-PCR products were confirmed by FISH to originate from the targeted region, so although none of our hybrid lines contained the entire region of interest, a sequential screening strategy which identifies cosmid signals in common between hybrids containing overlapping fragments was highly successful. Use of hybrids with overlapping fragments has also enabled a more complete coverage of the region, since the relative abundance of particular IRS-PCR products often varies between different hybrids. An estimate of the genome coverage achieved by Alu-PCR screening is obtained by examining the cosmids identified with 19 known genes or markers in the region. With only two exceptions, these cosmids were also independently identified with Alu-PCR products, and in the two exceptional cases, Alu-PCR products picked cosmids immediately adjacent to the markers. It appears, therefore, that the average inter-Alu distance in this region, obtained from hybrids using two primers consecutively (one 5′ and one 3′), is only about one cosmid length, and approaches the frequency observed with Alu-PCR of individual cosmids. Furthermore, these figures suggest that, taken together, the cosmids obtained by the Alu-PCR screenings along with those obtained by other means (see Materials and Methods) should identify nearly all the DNA in the region. This coverage is extensive enough to allow HinfI fingerprinting and contig assembly.

The technique of cosmid contig assembly by fingerprinting random clones has been little used in mapping the human genome. The genomes of C. elegans, E. coli and S. cerevisiae [20–22] have been assembled into cosmid contigs by this method, but the magnitude of the human genome has prevented anyone from making large inroads by the direct assembly of cosmid contig maps. Mapping the human genome has relied on YACs and a YAC-STS method has been employed to assemble human genome contigs [42, 43]. However, YAC libraries have been found to have a large proportion of rearranged clones and YACs themselves present problems in handling and DNA isolation. In addition, they make a relatively poor substrate for the isolation of transcribed sequences. To overcome these limitations, YACs are increasingly used to derive the corresponding cosmids. Rather than using the intermediate stage of cloning in YACs, we have gone directly to cosmids.

Fingerprinting has been an efficient method for assembling contigs from the set of selected 9q34 cosmids. It should be pointed out that, where the same cosmids are involved, our fingerprinting data, including the degree of overlap observed within contigs, are in agreement with data obtained by walking techniques [39]. The 1,116 fingerprinted clones have fallen into 172 contigs, giving an average content of 6.5 cosmids per contig. Using the approximation that one HinfI band is observed per kilobase (an average of 40 bands is observed per cosmid fingerprint), we estimate the total length of the DNA in contigs to be approximately 13.5 Mb. This is compatible with an estimate of about 10 Mb for the size of the region of 9q34 contained in our hybrids, derived from PFGE and FISH data presented at the chromosome 9 workshop [6]. It is not surprising that our empirical estimate of the length of our contigs is slightly larger than this. Firstly, a small number of cosmid contigs actually lie outside the expected region. More importantly, we have imposed a stringent requirement of overlap of at least 50% of the bands before clones are joined in a contig. Therefore there will be overlaps between the ends of contigs which have not been recognised. This would also explain why, with almost complete coverage of our target region, we have not yet closed the gaps in the contig map.
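The effect of a 50% band-overlap requirement can be illustrated with a deliberately simplified sketch. It is not the fingerprinting program cited in the Methods, whose matching rules are not described here. The band-matching tolerance, the choice of the smaller fingerprint as the denominator, the single-linkage grouping and all function names are assumptions made for illustration.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tol=1.0):
    """Count one-to-one band matches within `tol` (two-pointer scan)."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def overlap_fraction(fp_a, fp_b, tol=1.0):
    """Shared bands as a fraction of the smaller fingerprint (an assumption)."""
    if not fp_a or not fp_b:
        return 0.0
    return shared_bands(fp_a, fp_b, tol) / min(len(fp_a), len(fp_b))


def assemble(fingerprints, threshold=0.5, tol=1.0):
    """Group clones into contigs by single linkage using union-find."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if overlap_fraction(fingerprints[a], fingerprints[b], tol) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Invented band positions (arbitrary units) for four hypothetical clones.
toy = {
    "c1": [10, 20, 30, 40],
    "c2": [30, 40, 50, 60],
    "c3": [50, 60, 70, 80],
    "c4": [100, 110, 120, 130],
}
print(assemble(toy))
```

In this toy set c1 and c3 share no bands but end up in the same contig through c2. Raising `threshold` above 0.5 splits them apart, which mirrors the point made above that a stringent overlap requirement leaves genuine end-to-end overlaps between contigs unrecognised.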

TSC1 maps between the markers D9S149 and D9S114. However, within this interval, conflicting recombination data place TSC1 either proximal to ABO or distal to D9S66. To have the best chance of covering the TSC1 candidate region in our contigs, we have been particularly interested in the DNA between the SD1 and 9T12 breakpoints (shown relative to the genetic map in fig. 3). This region is estimated to contain about 3.7 Mb of DNA [6]. 20 of our contigs are known by FISH analysis to lie between these breakpoints [28] (see fig. 3), and these cover a total of 2.2 Mb. The (generally smaller) contigs not yet assigned to any breakpoint interval cover a total of 5.8 Mb, and judging from the proportion of previously unlocalised contigs found by FISH to lie between SD1 and 9T12, we might expect about 1.2 Mb of this region to be covered by unlocalised contigs. This gives a total coverage of 3.4 Mb out of the 3.7 Mb between the breakpoints, or 92% of the region. This appears to be rather lower than our coverage of 9q34 generally and can be partly accounted for by a region from ABL to D9S113 which seems to be absent from the hybrids (see fig. 1). However, in view of the great difficulty experienced by ourselves and others in obtaining genomic clones for ABO it may be that the region just proximal to ABO is resistant to cloning into cosmids. This region may be underrepresented in cosmid libraries.
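The coverage estimate for the SD1 to 9T12 interval is simple arithmetic on the figures quoted above, and can be checked as follows. The variable names are hypothetical, and the implied fraction of unassigned contig DNA is derived here from the quoted totals rather than stated in the text.

```python
# Figures quoted in the text (Mb).
interval_mb = 3.7               # estimated DNA between SD1 and 9T12
mapped_contigs_mb = 2.2         # 20 contigs placed in the interval by FISH
unassigned_total_mb = 5.8       # all contigs not yet assigned to an interval
unassigned_in_interval_mb = 1.2  # share of unassigned DNA expected in the interval

covered_mb = mapped_contigs_mb + unassigned_in_interval_mb
coverage_pct = 100 * covered_mb / interval_mb

# Fraction of unassigned contig DNA that the 1.2 Mb figure implies
# falls between the breakpoints (derived, not stated in the text).
implied_fraction = unassigned_in_interval_mb / unassigned_total_mb

print(f"covered: {covered_mb:.1f} Mb of {interval_mb} Mb ({coverage_pct:.0f}%)")
print(f"implied share of unassigned contigs in interval: {implied_fraction:.0%}")
```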

Our results contribute to the physical map of 9q34 by establishing physical connections between various genes and markers. We have positioned the SURF gene cluster between ABO and D9S150, approximately 80 kb distal to ABO. ABO was known to be near DBH, and estimates of the distance between them ranged from 650 to 250 kb [6]. For the reasons stated in the Results, we have not been able to confirm a connection between the ABO and DBH contigs. However, data by van Slegtenhorst et al. [39] indicate that any gap between these contigs cannot be more than 50 kb. In addition, between the SD1 and 9T12 breakpoints, we have found D9S113 and D9S148 to lie about 1 cosmid length apart in contig 71. Proximal to the SD1 breakpoint, we have found the genes FPGS and ENG very close together on overlapping cosmids, and in the most distal part of 9q34 we have found ABC2 and C8G to lie about 1 cosmid length apart (see fig. 4a).

Almost all of the DNA in 9q34 that can be cloned into cosmids is now represented in our contigs. We are now in a position to join the contigs together. This will involve screening the library with pooled end probes, followed by further fingerprinting to determine new overlaps. Concurrently, the cosmids that we have isolated within the TSC1 candidate region are a valuable resource for the isolation of expressed genes within this area.