Introduction

The frequency of congenital abnormalities is approximately twice as high in newborns with a de novo balanced chromosome rearrangement (6.1% for apparently balanced translocations and 9.4% for apparently balanced inversions) compared with the risk in the general population (approximately 3–4%). This suggests a causative link between the rearrangements and the observed phenotype in at least half of the disease-associated balanced chromosome rearrangements (DBCRs).1 The clinical phenotype in these cases can be caused by a microdeletion or microduplication at the translocation or inversion breakpoint(s) that is only detectable by high-resolution techniques, or by disruption or inactivation of specific gene(s) at or close to the breakpoint(s). Therefore, characterisation of the breakpoints in DBCRs has often been a promising starting point in the molecular elucidation of early-onset Mendelian disorders.2, 3, 4, 5 Recently, such a strategy has also been applied to search for genetic risk factors for complex and late-onset diseases.6

Mapping translocation breakpoints using conventional methods, such as fluorescence in situ hybridisation (FISH) with fluorescent dye-labelled bacterial artificial chromosome clones, is laborious and time consuming, and often provides limited resolution of breakpoint positions. With the development of array painting techniques, which combine DNA array and chromosome sorting technologies, the efficiency has been improved enormously,7, 8, 9 and ultra-high resolution has been achieved by sequential painting with two arrays, a tiling path large insert array and a region-specific, ultra-high-resolution oligonucleotide array.10 Recently, we introduced a novel and rapid method to map translocation breakpoints by shotgun sequencing flow-sorted derivative chromosomes using ‘next generation’ massively parallel sequencing technology. The coverage obtained by this method was sufficient to bridge the breakpoints by PCR amplification, and the procedure allowed us to determine their exact nucleotide positions in a short time frame.11

In this study, to map chromosome breakpoints at a high resolution with ultimate efficiency, we applied the paired-end sequencing strategy in four unrelated patients, two with balanced translocations and two with chromosome inversions. We mapped the breakpoints within a region of a few hundred base pairs (bps) by searching for the read pairs spanning the breakpoints out of millions of short reads generated from both ends of DNA fragments. Subsequent PCR amplifications and Sanger sequencing of the junction fragments confirmed the breakpoint regions.

Subjects and methods

Case 1

The patient was the second child of healthy and non-consanguineous parents. She was born after an uneventful pregnancy by caesarean section in the 38th gestational week. Birth weight (2340 g) and length (46 cm) were below the 3rd centile; head circumference was normal (34.5 cm, 50th centile). She had a very low muscle tone and seizures beginning at the age of 3 months. The seizures were successfully treated by topiramate and oxcarbazepine. Psychomotor development was severely retarded. At the age of 3 years, she could neither walk nor talk; she had short stature (85 cm, <3rd centile) and was mildly microcephalic (47 cm, 3rd centile). A cerebral MRI scan at the age of 2 years showed mild cortical atrophy and a hypoplastic corpus callosum. A muscle biopsy revealed no abnormalities.

Case 2

The patient was the first child of healthy and non-consanguineous parents. She was born after an uneventful pregnancy with normal birth measurements. Psychomotor development was retarded. She started walking and talking at the age of 2 years. On examination at the age of 2 years 10 months, she was able to speak several single words. Body measurements were normal, and apart from mild hypertelorism and bilateral epicanthal folds, she had no other facial dysmorphism. A brain MRI scan revealed no abnormalities.

Case 3

Prenatal diagnosis was performed in the 11th gestational week because of advanced maternal age. Karyotyping revealed a normal female karyotype 46,XX. During the genetic counselling session the father, a mathematician, reported that in two previous pregnancies of the couple a paternal inversion of chromosome 8 [46,XY,inv(8)(p11.22q22.3)] was diagnosed prenatally. Both boys were born after uneventful pregnancies and deliveries, and their early milestones were normal. The boys, now aged 8 and 9 years, respectively, suffer from dyslexia, as the father did during childhood and adolescence.

Case 4

The patient was adopted at the age of 4 years. No clinical information about the pre-adoption period or the family is available. From adoption onwards, a progressive asymmetry in the length of his legs was noted, resulting in a referral to a paediatrician at the age of 12 years. Eventually, this was treated by an epiphysiolysis of the left tibia. Although his overall cognitive development was normal (TIQ 96), his performance IQ (PIQ 76) was remarkably lower than his verbal IQ. Moreover, the boy experienced learning difficulties because of behavioural problems. At the age of 12 years, he was diagnosed with ADHD and impulsive regulation disorder. Chromosome studies at that age showed an apparently balanced paracentric inversion of chromosome 5: 46,XY,inv(5)(q13q35). Neither parent was available for chromosome analysis. On physical examination at the age of 12 years, his height was 162 cm (P80), weight 53.6 kg (P90) and head circumference 56.7 cm (P98). He had large hands (P97) but no other physical abnormalities or dysmorphism.

Examination of clinically relevant DNA copy number variation

To study submicroscopic deletions or duplications, comparative genome hybridisation experiments were performed in three patients using whole genome tiling path BAC arrays, as described earlier,12 and in case 4 on a 244k oligonucleotide array following the manufacturer's protocols (Agilent, Santa Clara, CA, USA). Results were compared with previously described copy number variants (CNVs) (http://projects.tcag.ca/variation/) and with CNVs observed in approximately 700 unrelated probands screened for genomic imbalances in our laboratory using tiling path BAC arrays.13 No potentially disease-associated CNVs were found in cases 1, 3 and 4. In case 2, we identified a paternally inherited duplication ranging from chr2:36538212 to 36996317 (NCBI Build 36.1) encompassing CRIM1, FEZ2, VIT, STRN and HEATR5.

Chromosome sorting and amplification

For chromosome sorting, the lymphoblastoid cell lines were cultured in RPMI 1640 medium supplemented with 10% foetal calf serum, 2 mM L-glutamine and antibiotics at 37°C in a humidified atmosphere containing 5% CO2. Cells in log phase were treated for 16 h with colcemid (0.05 mg/ml final concentration) to arrest cells in metaphase. Metaphase chromosomes were flow sorted as described earlier.14 The sorted chromosomes were amplified using the GenomiPhi V2 DNA Amplification kit (GE Healthcare, Piscataway, NJ, USA) following the protocol of the manufacturer.

Chromosome sequencing using Solexa

Approximately 2 μg amplified chromosomes were randomly fragmented to less than 800 bp by nebulisation. DNA fragments were then repaired to generate blunt ends by T4 polymerase and Klenow DNA polymerase, and phosphorylated with T4 polynucleotide kinase. After adding a single ‘A’ base to the 3′ end of the DNA fragments using Klenow exo (3′–5′ exo minus), we ligated Solexa paired-end adaptors with the DNA fragments using DNA ligase. Ligated products (size range 300–600 bp) were gel purified on 2% agarose, followed by 18 cycles of PCR amplification. We measured the DNA concentration with a Nanodrop 7500 spectrophotometer, and a 1 μl aliquot was diluted to 10 nM. Adaptor-ligated DNA was hybridised to the surface of paired-end flow cells, and DNA clusters were generated using the Illumina/Solexa cluster station, followed by 36 cycles of sequencing on the Illumina/Solexa 1G analyser from both ends, in accordance with the manufacturer's protocols.
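The dilution of the library to 10 nM can be illustrated with a short Python sketch. It uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the example concentration and the mean fragment length are illustrative assumptions and are not values reported in this study.

```python
def dsdna_nanomolar(ng_per_ul, mean_length_bp, bp_mass=660.0):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM).

    Assumes an average mass of 660 g/mol per base pair (approximation).
    """
    return ng_per_ul * 1e6 / (bp_mass * mean_length_bp)


def final_volume_ul(stock_nM, target_nM=10.0, aliquot_ul=1.0):
    """Final volume to which an aliquot must be diluted to reach target_nM."""
    return aliquot_ul * stock_nM / target_nM


# Hypothetical example: a 30 ng/ul library with a mean fragment length of 450 bp
stock = dsdna_nanomolar(30.0, 450)
print(f"stock: {stock:.1f} nM; dilute 1 ul to {final_volume_ul(stock):.1f} ul")
```

The mean fragment length should include the ligated adaptors, as they contribute to the mass of each molecule.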

Whole genome paired-end sequencing using SOLiD system

Genomic DNA of 40–60 μg was sheared using HydroShear and size selected to an average size of 2.5 kb by gel extraction. DNA fragments were then repaired to generate blunt ends and EcoP15I sites within the DNA fragments were methylated. After ligating EcoP15I CAP adaptors containing EcoP15I binding sites to both ends, we again size selected the DNA fragments of size 2–3 kb, which were subsequently circularised. After removing un-circularised DNA, the circularised DNA fragments were digested with EcoP15I, which cleaved 25/27 bp away from its unmethylated recognition sites. After the digested products were ligated with P1 and P2 adaptors, they were purified and amplified with 10 PCR cycles; 50 pg of the resulting library was then used for 40 cycles of emulsion PCR. Approximately 32 million beads from one library were deposited on one quarter of a slide, followed by 25 bp mate pair sequencing according to the manufacturer's protocol.

Data processing

Solexa sequence reads were compiled using a manufacturer-provided computational pipeline consisting of the open source Firecrest and Bustard applications. Sequence reads from a derivative chromosome were then aligned with the sequence of its two normal counterparts (NCBI build 36) using the Eland application. Only uniquely mapped reads with fewer than two mismatches were retained. Multiple sequencing reads mapped to the same position were probably the result of preferential PCR amplification during library construction; thus, only one of them was kept for further analysis. We identified the read pairs derived from the chromosome translocation junction fragments by searching for those whose two ends aligned to different chromosomes. Only junction fragments spanned by at least three read pairs were used to locate the breakpoint regions.
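The filtering steps described above can be sketched in Python as follows. This is a minimal illustration and not the pipeline used in this study: the `Read` record, the function name, the removal of duplicates at the level of read pairs and the 600 bp clustering window are assumptions made for the example.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int
    strand: str
    mismatches: int
    unique: bool


def spanning_clusters(pairs, max_mismatches=1, window=600, min_support=3):
    """Return clusters of interchromosomal read pairs supporting one junction."""
    seen, kept = set(), []
    for r1, r2 in pairs:
        # keep uniquely mapped reads with fewer than two mismatches
        if not (r1.unique and r2.unique):
            continue
        if max(r1.mismatches, r2.mismatches) > max_mismatches:
            continue
        # junction candidates: the two ends map to different chromosomes
        if r1.chrom == r2.chrom:
            continue
        a, b = sorted((r1, r2), key=lambda r: r.chrom)
        key = (a.chrom, a.pos, a.strand, b.chrom, b.pos, b.strand)
        if key in seen:  # identical positions: likely PCR duplicates
            continue
        seen.add(key)
        kept.append((a, b))

    kept.sort(key=lambda p: (p[0].chrom, p[1].chrom, p[0].pos))
    clusters = []
    for a, b in kept:
        last = clusters[-1] if clusters else None
        if (last
                and last[-1][0].chrom == a.chrom
                and last[-1][1].chrom == b.chrom
                and a.pos - last[-1][0].pos <= window        # assumed window
                and abs(b.pos - last[-1][1].pos) <= window):
            last.append((a, b))
        else:
            clusters.append([(a, b)])
    # require at least min_support pairs per junction fragment
    return [c for c in clusters if len(c) >= min_support]
```

Each returned cluster is a list of read pairs whose positions on both chromosomes lie close together, so that its coordinates delimit a candidate breakpoint region.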

SOLiD sequence reads were mapped against the human genome (NCBI build 36) using the Applied Biosystems (Foster City, CA, USA) Resequencing Analysis Pipeline software ‘Corona Lite’. Only the reads that could be mapped to a unique position with at most two mismatches in colour space were retained for further analysis. We identified the read pairs derived from chromosome inversion junction fragments by searching for those in which both ends mapped to the chromosome of interest (chr8 for case 3 and chr5 for case 4), but to opposite strands. Only junction fragments spanned by at least two read pairs were used to locate the breakpoint regions.
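The search for inversion-spanning read pairs can be sketched in Python in a similar way. The tuple layout, the function name, the 3 kb clustering window and the separation of the two junctions by the strand of the leftmost read are assumptions made for this illustration; they are not taken from the Corona Lite software.

```python
def inversion_candidates(pairs, chrom, window=3000, min_support=2):
    """Cluster same-chromosome, opposite-strand read pairs.

    pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2)
    tuples for uniquely mapped read pairs.
    """
    hits = set()  # a set also removes pairs with identical coordinates
    for c1, p1, s1, c2, p2, s2 in pairs:
        # both ends on the chromosome of interest, but on opposite strands
        if c1 != chrom or c2 != chrom or s1 == s2:
            continue
        (left_pos, left_strand), (right_pos, _) = sorted([(p1, s1), (p2, s2)])
        hits.add((left_strand, left_pos, right_pos))

    clusters = []
    for hit in sorted(hits):
        last = clusters[-1] if clusters else None
        if (last
                and last[-1][0] == hit[0]                   # same orientation
                and hit[1] - last[-1][1] <= window          # assumed window
                and abs(hit[2] - last[-1][2]) <= window):
            last.append(hit)
        else:
            clusters.append([hit])
    # require at least min_support pairs per junction fragment
    return [c for c in clusters if len(c) >= min_support]
```

Under these assumptions, the pairs from the two junction fragments of one inversion fall into two clusters that share approximately the same pair of coordinates but differ in orientation.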

PCR amplification and sequencing of junction fragments

Junction fragments were amplified by long range PCR using the Takara LA PCR kit version 2.1 (Otsu, Shiga, Japan). Primers and PCR conditions are available on request.

PCR products were used as templates for sequencing in both directions using BigDye Terminator chemistry (PE Biosystems, Foster City, CA, USA) on an Applied Biosystem 3730xl DNA Analyser.

Sequence analysis of junction fragments

The sequences of junction fragments were aligned to the human genome reference sequence (NCBI build 36) using Blat from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway).

Results and discussion

We applied a paired-end sequencing strategy to characterise the breakpoint regions in two patients with a reciprocal translocation and two patients with an inversion. In both patients with a translocation, one derivative chromosome was flow sorted to reduce the sequencing cost and both ends of DNA fragments were sequenced. Sequence reads were subsequently aligned to reference sequences of the two corresponding chromosomes. We were able to identify the breakpoint-spanning region by searching for the read pairs with both ends mapping to different chromosomes and confirmed the results by PCR amplification and Sanger sequencing of the junction fragments.

Case 1

In case 1 (46,X,t(X;17)(q21;p13), see Figure 1a), the der(X) chromosome was flow sorted. Both ends of DNA fragments in the range of 300–600 bp in size generated from der(X) were sequenced in one lane of a paired-end flow cell. In total, 6 631 558 paired reads were generated; 2 797 360 and 103 387 first reads were mapped uniquely to chromosomes X and 17, whereas 2 813 177 and 103 729 second reads could be uniquely mapped to chromosomes X and 17, respectively. Six read pairs were found to span the breakpoint consistently. On the basis of their position and the size of DNA fragments (<600 bp), the breakpoints were estimated to be at bp position chrX:83 988 444–83 988 730 and bp chr17:9 527 365–9 527 623 (NCBI Build 36.1). To confirm the finding, we performed PCR of the der(X) chromosome junction fragment using primers designed to flank the breakpoint. This yielded a PCR product of approximately 300 bp, which was sequenced using the Sanger technique. Precise breakpoint junctions were determined through alignment with the reference sequences of chromosomes X and 17. Together with subsequent sequencing of the PCR product generated from the junction fragment of der(17), this mapped the breakpoint on chromosome X to a deleted interval of 2 bp between nucleotides chrX:83 988 515 and 83 988 518, and the breakpoint on chromosome 17 between bp chr17:9 527 443 and 9 527 444 (Figure 2a). One bp of the breakpoint sequence on the der(17) chromosome could be aligned neither to chromosome X nor to chromosome 17, and one nucleotide change, which is not a known SNP, was found close to the breakpoint on the der(17) chromosome (Figure 2a). Although no annotated reference genes were in the vicinity of the breakpoint on chromosome X, the breakpoint on chromosome 17 disrupted USP43 (Figure 3a). USP43 encodes a putative ubiquitin-specific protease.15 As its function is unknown, we are reluctant to speculate on any causative relationship between USP43 disruption and the phenotype present in case 1.
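How a breakpoint interval can be estimated from the positions of spanning reads and the maximum fragment size is illustrated by the following Python sketch. It is an assumption-based example rather than the procedure used in this study: it considers one chromosome at a time, assumes that each read lies at the end of its fragment and that the mate maps entirely to the other chromosome; the function name and parameters are hypothetical.

```python
def breakpoint_interval(read_starts, read_len=36, max_fragment=600,
                        towards_higher=True):
    """Bound the breakpoint position on one chromosome.

    read_starts: leftmost mapped coordinates of the spanning reads.
    towards_higher: True if the breakpoint lies at higher coordinates
    than the reads, False if it lies at lower coordinates.
    Returns (lowest, highest) possible position of the last (or first)
    retained base.
    """
    if towards_higher:
        # the breakpoint cannot lie inside any read ...
        low = max(s + read_len - 1 for s in read_starts)
        # ... and the fragment must still hold the mate on the other side
        high = min(s + max_fragment - read_len - 1 for s in read_starts)
    else:
        high = min(read_starts)
        low = max(s + 2 * read_len - max_fragment for s in read_starts)
    return low, high


# Hypothetical read positions, 36 bp reads, fragments of at most 600 bp
print(breakpoint_interval([100, 150, 200]))                         # (235, 663)
print(breakpoint_interval([1000, 1100], towards_higher=False))      # (572, 1000)
```

Each additional spanning pair can only narrow the interval, which is why several pairs per junction give a resolution well below the fragment size.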

Figure 1

Ideograms of the patients with breakpoint regions indicated by arrows. (a) Case 1. (b) Case 2. (c) Case 3. (d) Case 4.

Figure 2

Junction fragment sequences of cases 1 (a), 2 (b), 3 (c) and 4 (d). The reference sequences are labelled in italic and normal capital characters, respectively. Deleted sequences are underlined. Inserted sequences and single nucleotide variations found on junction fragments are marked by lower case letters and bold capital characters, respectively. Chr, chromosome; Der, derivative chromosome. Junc, junction fragment; 5C, centromeric breakpoint of inv(5); 5T, telomeric breakpoint of inv(5).

Figure 3

Chromosome breakpoints (marked by arrows) and disrupted genes. (a) Case 1, the breakpoint on chromosome 17 disrupts USP43. (b) Case 2, the chromosome 13 breakpoints disrupted a 5′ extended ELF1 isoform. (c) Case 4, the proximal/centromeric breakpoint disrupted RHOBTB3.

Case 2

The same procedure was applied to case 2 (46,XX,t(2;13)(p16;q14), Figure 1b). In case 2, the der(2) chromosome was sequenced. In total, 14 701 429 read pairs were generated; 1 625 551 and 642 370 first reads were mapped uniquely to chromosomes 2 and 13, whereas 1 649 302 and 651 980 second reads could be uniquely mapped to chromosomes 2 and 13, respectively. Six pairs were found to consistently span the breakpoint. This enabled us to localise the corresponding breakpoints at or near position chr2:60 002 921 and close to chr13:40 506 425. The finding was then confirmed by Sanger sequencing of PCR products amplified from the two junction fragments of der(2) and der(13). We finally mapped the breakpoint on chromosome 2 between nucleotides chr2:60 002 811 and 60 002 812 and the breakpoint on chromosome 13 to a deleted interval of 6 bp between 40 506 411 and 40 506 418. Seven bps of the breakpoint sequence on the der(2) could be aligned neither to chromosome 2 nor to chromosome 13 (see Figure 2b). On chromosome 2, there is no annotated gene in a 200 kb region around the breakpoint, whereas the breakpoint on der(13) disrupts mRNA BX640798, which represents a 5′ extended ELF1 isoform (see Figure 3b).

Flow sorting of derivative chromosomes cannot always be used to study chromosome rearrangements. For example, in patients carrying inversions, it is impossible to separate the inversion chromosome from the normal one. For such cases, the paired-end sequencing strategy for breakpoint characterisation needs to be applied to the whole genome. We demonstrated the efficiency of such a strategy by mapping the breakpoints in two patients with an inversion.

Case 3

For case 3 with 46,XY,inv(8)(p11.22q22.3) (Figure 1c), we constructed a whole genome paired-end library with inserts of 2–3 kb in size. The library was sequenced in one quarter of a slide using the SOLiD system. In total, 33 165 957 read pairs were generated out of which 6 214 822 pairs could be uniquely mapped in the human genome. A total of 324 235 pairs were derived from chromosome 8. From these uniquely mapped read pairs, we searched for those spanning the breakpoints (see Subjects and Methods). A total of five pairs were found to consistently span the breakpoints, two pairs from the junction fragment on the short arm of der(8) and three from that on the long arm. On the basis of their mapping position, the two breakpoint regions were estimated at position chr8:25 703 933–25 705 876 and chr8:113 035 276–113 036 801, respectively. The junction fragments were subsequently amplified by PCR and Sanger sequenced. The exact breakpoint positions were finally mapped at chr8:25 704 251–25 704 252 and chr8:113 036 579–113 036 580. Two bps of the breakpoint sequence on the long arm junction fragment could not be mapped on the reference sequence and one nucleotide change, which was not a known SNP, was found close to the breakpoint (Figure 2c).

In this case no reference genes were disrupted or in the vicinity of either of the two inversion breakpoints. Only one transcript supported by a single mRNA clone (AK130123), which most likely represents an artefact because it contains exons of two neighbouring genes, was found at the der(8) short chromosome arm breakpoint.

Case 4

For case 4 with 46,XY,inv(5)(q13q35) (Figure 1d), sequencing of the whole genome paired-end library with inserts of 2–3 kb in size in one quarter of a slide using the SOLiD system was performed; 33 651 452 read pairs were generated out of which 7 826 287 pairs could be uniquely mapped in the human genome. A total of 510 640 pairs originate from chromosome 5. From these uniquely mapped read pairs, six were found to span the proximal junction fragment and two were from the distal one. Their coordinates enabled us to localise the breakpoint regions at chr5:95 127 797–95 129 377 and chr5:154 933 616–154 933 919. Subsequent Sanger sequencing of the PCR products amplified from the two junction fragments confirmed our finding and we could map the exact breakpoints to a deleted interval of 10 bp between chr5:95 128 323 and 95 128 333, and a deleted interval of 1 bp between chr5:154 933 881 and 154 933 883. Six bps of the breakpoint sequence on the distal junction fragment could not be mapped on the reference sequence (Figure 2d).

In this case, no known genes were found in a 200 kb region around the distal breakpoint. The proximal breakpoint disrupted RHOBTB3, a member of the evolutionarily conserved RHOBTB subfamily of Rho GTPases (Figure 3c). Further functional studies are required to determine whether truncated RHOBTB3 is causative of the clinical manifestations in case 4.

In this study, we described the use of a massively parallel paired-end sequencing approach to characterise chromosome rearrangement breakpoints with a resolution sufficient for subsequent PCR amplification and Sanger sequencing of junction fragments. For the two cases with reciprocal translocations, we took advantage of paired-end libraries with relatively small insert sizes (300–600 bp), which enabled us to map the breakpoints to an interval of 300 bp. This ultra-high mapping resolution made PCR confirmation straightforward. To reduce the sequencing costs, we constructed the libraries from flow-sorted derivative chromosomes. Depending on the purity of flow sorting (in our two cases, >60%), the number of sequencing reads required to achieve the same mapping resolution can be dramatically reduced compared with whole genome sequencing. However, depending on the physical properties of the chromosomes under investigation, such an approach is not always feasible, and whole genome paired-end sequencing is then required. To demonstrate its efficiency, we constructed and sequenced paired-end libraries from genomic DNA in two cases carrying an inversion. The relatively large insert size of 2–3 kb was chosen to obtain higher genomic coverage per sequenced read, whereas the expected resolution still allowed subsequent PCR amplification of junction fragments without much effort. Indeed, with around 30 million sequencing reads, we could map the breakpoints to regions ranging from 300 to 1900 bp. The accurate mapping of chromosome breakpoints by massively parallel paired-end sequencing, as shown in our study, has enabled us to unambiguously identify the potentially affected/disrupted genes. Implementation of this method will pave the way for large-scale breakpoint mapping in disease-associated balanced genomic rearrangements.
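The relationship between insert size, number of mapped read pairs and the number of pairs expected to span a junction can be illustrated with a rough Python calculation. This is a back-of-the-envelope sketch: the haploid genome size of 3.1 Gb, the mean insert size of 2.5 kb and the function name are assumptions, and effects such as read length, duplicate removal and regional mappability are ignored.

```python
def expected_spanning_pairs(n_pairs, insert_bp, genome_bp=3.1e9,
                            heterozygous=True):
    """Approximate number of read pairs expected to span one junction.

    The physical (span) coverage is n_pairs * insert_bp / genome_bp; for a
    heterozygous rearrangement only about half of the fragments at the
    locus derive from the rearranged chromosome.
    """
    coverage = n_pairs * insert_bp / genome_bp
    return coverage / 2 if heterozygous else coverage


# Uniquely mapped pairs reported for cases 3 and 4, assumed mean insert 2.5 kb
for case, n_pairs in (("case 3", 6_214_822), ("case 4", 7_826_287)):
    print(case, round(expected_spanning_pairs(n_pairs, 2500), 1))
```

Under these assumptions, the calculation gives roughly 2.5 and 3.2 expected spanning pairs per junction, which is of the same order as the two to six pairs observed per junction fragment in the two inversion cases.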