INTRODUCTION

Nuclear pseudogenes of mitochondrial (mt) DNA were initially discovered in the early 80's1, 2, 3, 4, 5, 6. However, the mechanisms that generate mtDNA pseudogenes are still not clear and may vary from case to case. Both RNA-7, 8 and DNA-mediated9, 10, 11 processes have been suggested. mtDNA fragments may even act as transposable elements12, 8 and create new copies of pseudogenes. Evidence shows that some mtDNA pseudogenes have gone through gene amplification at different stages of evolution13, 14. Our previous studies also showed that some human nuclear mtDNA pseudogenes are multi-copy sequences, probably resulting from one or more very recent gene amplification events15, 16. In the domestic cat, such amplification of mtDNA pseudogenes has led to tandemly repeated sequences17, 18. A possible role of mtDNA pseudogenes in carcinogenesis and aging has been suggested8, 19, 20.

The number of mtDNA pseudogenes in the human genome has been estimated to be about a thousand copies, and all regions of mtDNA have nuclear homologs10, 21. Here we report eight nuclear sequences homologous to the bp 2477-2593 region of the mt 16S rRNA gene, in addition to those previously reported15. Three pseudogenes homologous to the same portion of the 16S rRNA gene, designated ψA, ψB and ψC, were detected using a method based on high-efficiency restriction digestion of cellular DNA and denaturant gradient gel electrophoresis (DGGE). At least two of them have variable copy numbers in different samples. Genome-scale screening for mtDNA pseudogenes in human yeast artificial chromosome (YAC) libraries led to the discovery of numerous pseudogenes, one of which was located on a YAC clone carrying only repetitive sequence tag sites (STSs). The overall fraction of mtDNA pseudogenes in the nuclear genome was estimated. Our results suggest that mtDNA pseudogenes could become a variable part of the nuclear genome.

MATERIALS AND METHODS

Cell, tissue and DNA samples

The human lymphoblast line TK6 and its cloned population, TK6-KK2, were grown in stirred flasks in RPMI 1640 with 5% horse serum (JRH Biosciences, Lenexa, KS). Cell lines KB13, 30, 57 and 106 are cybrids of the nuclei from 143B20622 and the cytoplasms from patients with mitochondrial disease23. Cell line 143B12-3A2 is a rho° derivative of 143Btk-22, derived from the Kirsten murine sarcoma virus-transformed human osteosarcoma cell line HOS24, which was isolated from a 13-year-old Caucasian female25, 26. NA10495A, NA10496A, NA11325, NA11324, NA10846 and NA10847 are DNA samples from the National Institute of General Medical Sciences (NIGMS), Human Genetic Mutant Cell Repository (Camden, NJ), isolated from cultured lymphocytes from Pygmy, Chinese and White donors. The human Y chromosome-specific YAC library27 was purchased from Research Genetics (Huntsville, AL, USA). The CEPH human YAC library28 was a gift from Dr. Z. Chen, Shanghai Second Medical University, Shanghai, China.

Polymerase chain reaction (PCR)

PCR was carried out in 10 mM Tris.Cl, pH 8.4; 50 mM KCl; 2.25 mM MgCl2; 0.1% gelatin; 0.2 μM each primer; 150 μM dNTPs; 0.1 μg/μl BSA; and 5 units Taq DNA polymerase (Perkin Elmer Cetus, Branchburg, NJ) per 100 μl reaction. Thermal cycling was 94°C, 1 min; 45°C, 1 min; 72°C, 1 min. The primers used in PCR were Rsma1 (5′ AAA AAA AGT AAA AGG AAC TC 3′, homologous to bp 2457-2476 of mtDNA29); Rsma2A (5′ AGG AAC AAG TGA TTA TGC TA 3′, homologous to bp 2613-2594 of mtDNA); Rsma3 (5′ CTC ACT GTC AAC CCA ACA CA 3′, homologous to bp 2415-2434 of mtDNA); Rsma4 (5′ TTC ACT GGT TAA AAG TAA GA 3′, homologous to bp 2677-2658 of mtDNA); W (5′ GAA CTC GGC AAA TGT CGG CC 3′, homologous to bp 2471-2490 of mtDNA, mismatches with wild-type mtDNA underlined); and W/GC (5′ CGC CCG CCG CGC CCC GCG CCC GTC CCG CCG CCC CCG CCC G CT CGG CAA AT G TCG GCC 3′, primer W attached to an artificial high-melting-temperature GC clamp, which is in italics).
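The primer coordinates listed above fix the expected size of each PCR product on wild-type mtDNA. The following minimal Python sketch checks that each primer's length matches its stated coordinate span and computes the product length for the primer pairs used in this study. The helper names (`PRIMERS`, `footprint`, `amplicon_length`) are illustrative and not part of the original protocol, and the calculation assumes the target carries no insertions or deletions relative to the Cambridge sequence.

```python
# Primer sequences and mtDNA coordinates copied from the text.
# Each entry: (sequence 5'->3', coordinate of 5' end, coordinate of 3' end).
# Reverse primers are listed with decreasing coordinates, as in the text.
PRIMERS = {
    "Rsma1":  ("AAA AAA AGT AAA AGG AAC TC", 2457, 2476),
    "Rsma2A": ("AGG AAC AAG TGA TTA TGC TA", 2613, 2594),
    "Rsma3":  ("CTC ACT GTC AAC CCA ACA CA", 2415, 2434),
    "Rsma4":  ("TTC ACT GGT TAA AAG TAA GA", 2677, 2658),
    "W":      ("GAA CTC GGC AAA TGT CGG CC", 2471, 2490),
}


def primer_length(name):
    """Number of nucleotides in the primer (spaces are only for readability)."""
    return len(PRIMERS[name][0].replace(" ", ""))


def footprint(name):
    """Number of mtDNA positions covered by the stated coordinate range."""
    _, five_prime, three_prime = PRIMERS[name]
    return abs(three_prime - five_prime) + 1


def amplicon_length(forward, reverse):
    """Product length from the 5' end of the forward primer to the 5' end
    of the reverse primer, assuming no indels in the template."""
    return PRIMERS[reverse][1] - PRIMERS[forward][1] + 1


if __name__ == "__main__":
    for name in PRIMERS:
        print(name, primer_length(name), footprint(name))
    for pair in [("Rsma1", "Rsma2A"), ("Rsma3", "Rsma4"), ("W", "Rsma2A")]:
        print(pair, amplicon_length(*pair), "bp")
```

Under these assumptions the Rsma1/Rsma2A pair gives a 157 bp product, which agrees with the product length used later in the estimate of the mtDNA pseudogene fraction.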

Detection of mtDNA pseudogenes by eliminating wild-type sequences and DGGE

Cellular DNA was digested with SphI and PvuII. DNA fragments 215 ± 40 bp in length were PAGE purified and recovered by electro-elution using DEAE-cellulose, in order to optimize the restriction efficiency of cellular DNA30. Wild-type mtDNA was eliminated by KpnI digestion, and the residual undigested DNA was amplified with Taq DNA polymerase using primers W and Rsma2A. A 5 μl portion of the reaction was further amplified with primers Rsma2A and 32P-labeled W/GC. Pseudogenes were displayed on a DGGE gel.
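The selection logic of this step can be sketched in a few lines of Python: fragments are kept if they fall within the 215 ± 40 bp size window, and they survive KpnI digestion only if they lack the KpnI recognition sequence (GGTACC). The toy fragment sequences and the function names below are hypothetical and serve only to illustrate the principle; they are not the actual mtDNA or pseudogene sequences.

```python
KPNI_SITE = "GGTACC"       # KpnI recognition sequence (palindromic)
SIZE_CENTER_BP = 215       # size window from the text: 215 +/- 40 bp
SIZE_TOLERANCE_BP = 40


def in_size_window(length_bp):
    """True if a fragment length falls inside the gel-purified size window."""
    return abs(length_bp - SIZE_CENTER_BP) <= SIZE_TOLERANCE_BP


def survives_kpni(fragment):
    """True if the fragment carries no KpnI site and therefore stays intact.

    The site is palindromic, so scanning one strand is sufficient.
    """
    return KPNI_SITE not in fragment.upper()


# Hypothetical toy fragments, NOT real sequences: the second differs from
# the first by one base inside the KpnI site.
toy_fragments = {
    "wild_type_like": "ACGTTGGTACCTTAGC",
    "pseudogene_like": "ACGTTGGTATCTTAGC",
}

survivors = [name for name, seq in toy_fragments.items() if survives_kpni(seq)]

if __name__ == "__main__":
    print(survivors)
    print(in_size_window(180), in_size_window(300))
```

A single base change inside the recognition site is enough for a pseudogene to escape digestion, which is why sequences that diverge from wild-type mtDNA at that site are enriched before amplification.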

Other techniques

32P-end labeling of oligonucleotides using T4 polynucleotide kinase, digestion of DNA with restriction endonucleases (New England Biolabs, Beverly, MA) and DNA sequencing using the PCR Product Sequencing Kit (U.S. Biochemical, Cleveland, OH) were performed according to the manufacturers' instructions. PAGE purification of DNA was carried out on a 7% polyacrylamide gel (acrylamide:bis-acrylamide = 37.5:1). Gel pieces carrying the appropriate DNA fragments were cut from the gel using an ethidium bromide-stained pBR322/MspI digest (New England Biolabs, Beverly, MA) as the size standard, so that cellular DNA would not be exposed to ethidium bromide or UV light. DNA in gel pieces was electro-eluted onto DEAE-cellulose and recovered by elution with 0.5 ml of 1 M NaCl. DGGE was performed as previously described31.

RESULTS AND DISCUSSION

Copy number polymorphism of the nuclear pseudogenes of mtDNA

Nuclear pseudogenes homologous to the bp 2490-2594 region of mtDNA were screened from total cellular DNA by eliminating the previously sequenced pseudogenes15, 32 as well as wild-type mtDNA (see MATERIALS AND METHODS). Cellular DNA from two populations of human lymphoblast TK6 and four cybrid lines, KB13, 30, 57 and 106, containing nuclei from 143B206 (a rho° derivative of the human osteosarcoma cell line 143Btk-), was examined.

Two DNA sequences, designated ψA and ψB, were detected (Fig 1a). All cybrid lines with nuclei from 143B206 had an extra band, ψB, as compared with TK6. Both ψA and ψB were observed in the rho° cell line 143B12-3A2, a rho° derivative of 143Btk-22, confirming that they were nuclear pseudogenes of the mt 16S rRNA gene, not mutated mtDNA (Fig 1b). The presence of ψA and ψB was also tested in five human blood samples from white donors (data not shown), and in six DNA samples isolated from cultured lymphocytes representing both male and female donors of three races, Pygmy, Chinese and Caucasian (Fig 1b). ψB was detectable in only 9 out of 14 samples tested, while ψA was present in all of the samples. The intensity ratio of ψA versus ψB as measured by Phosphorimager (Molecular Dynamics, Sunnyvale, CA) varied from 2.8 to 0.1 (Fig 1b). This suggested that ψB is a polymorphic DNA sequence whose presence is independent of the sex or race of the sample donors. However, the ratio between ψA and ψB remained consistent among the four cybrids of 143B206, suggesting that changes in the ratio were either infrequent or took place in meiosis (Fig 1a).

Figure 1

Pseudogenes ψA and ψB displayed on a DGGE gel. Cellular DNA from cultured cells (panel a) and from NIGMS DNA samples (panel b) was tested as described in MATERIALS AND METHODS. Pseudogenes ψA and ψB were labeled with arrows. Sample names were indicated on top of each lane. The intensity ratios between bands A and B of the samples shown in panel b were measured by Phosphorimager analysis (Molecular Dynamics, Sunnyvale, CA). Pygmy male and female, Chinese male and female, and White male and female were NA10495A, NA10496A, NA11325, NA11324, NA10846 and NA10847 from the National Institute of General Medical Sciences (NIGMS), Human Genetic Mutant Cell Repository (Camden, NJ).

DNA sequencing indicated that ψA and ψB were 88% homologous with wild-type mtDNA, and that the only difference between the two sequences was an extra A→T mutation at bp 2512 in ψB compared to ψA (Fig 2).

Figure 2

DNA sequences of wild-type mtDNA and mtDNA pseudogenes. The GenBank accession numbers were labeled next to the names of the pseudogene sequences. Numbers indicated the base position in the Cambridge sequence [Anderson et al. 1981]. The recognition site for KpnI was in italics. Sequences to which primers Rsma3, Rsma1, W, Rsma2A and Rsma4 anneal were underlined.

The copy number of ψB was further examined in three Chinese pedigrees. The copy number ratio between ψA and ψB was found to be generally consistent within the pedigrees, suggesting that copy number variation of ψB is probably infrequent even in meiosis. However, another pseudogene, ψC, was unexpectedly identified and had variable copy numbers even between siblings. The assay was highly reproducible, as demonstrated by the quadruplicate assay (Fig 3).

Figure 3

Distribution of pseudogenes ψA, ψB and ψC in pedigrees. Samples were treated the same way as described in Fig 1 except that the DGGE gels were ethidium bromide-stained. Panels a, b and c were the DGGE displays of pseudogenes from pedigrees A, B, and C. f, father; m, mother. The numbered samples in each pedigree represented the children of parents f and m. WhF was the white female sample in Fig 3. Panel r was a quadruplicate study indicating that the variable intensity ratio between pseudogenes ψA, ψB and ψC was highly reproducible and was not an experimental artifact. The ratios (ψA/ψB/ψC) measured by Phosphorimager analysis were WhF (1/1.2/0); A6 (6/6/1), (6/6/1), (6/6/1.2), (6/6/1); A8 (1/1/0.9), (1/1/1), (1/1/0.9), (1/1.1/0.9).

Nuclear pseudogenes of mitochondrial DNA as a variable part of the nuclear genome

The observation that ψB is one base more divergent from wild-type mtDNA than ψA suggested that ψB was derived from ψA much more recently than the generation of their common ancestor sequence. This is consistent with observations from other mtDNA pseudogenes1, 13, 14, 15, 16. During the in vitro enzymatic amplification, ψA, ψB and ψC served as quantitative internal standards for each other, which should accurately reflect their relative gene dosage even at the plateau phase of PCR32, 33, 34. The variable intensity ratio between ψA, ψB and ψC thus implied that all three pseudogenes might be multi-copy sequences, and that the copy number of at least two of the pseudogenes differed among the samples. The fact that ψA was present in all samples led to the hypothesis that ψB and ψC were the ones with variable copy numbers, although the possibility that all three pseudogenes had copy number polymorphisms could not be ruled out. We thus suggest that mtDNA pseudogenes might be involved in the variable part of the human genome.

PCR screening of YAC libraries for mtDNA pseudogenes

We were very interested in the mechanism(s) by which the copy number of mtDNA pseudogenes changed. Cloning of those multi-copy pseudogenes would help to understand the related gene-amplification mechanisms. However, direct cloning of a particular pseudogene by any hybridization-based technique proved unsuccessful (data not shown). Major causes of the failure included interference from the wild-type mtDNA fragments cloned in most genomic libraries, the presence of a large number of pseudogenes homologous to the same part of the mtDNA, and the instability of the cloned multi-copy pseudogenes. As an alternative approach, we screened human YAC libraries for multi-copy pseudogenes by PCR15, expecting that some YAC clones might have maintained some repetitive sequences.

From a Y chromosome-specific YAC library27, YAC clone #143 was found to carry a new mtDNA pseudogene, designated ψY. PCR amplification using primers Rsma3 and Rsma4 gave rise to a longer sequence, as shown in Fig 2. Interestingly, YAC clone #143 could not be assigned to a particular position on the Y chromosome because it contained only repetitive STSs. This observation was consistent with the hypothesis that the mtDNA pseudogene was involved in the variable part of the nuclear genome.

Similar screening of 105 YAC pools from the CEPH human YAC library, each containing 96 YAC clones carrying 800-1200 kb of genomic DNA each, using primers Rsma1 and Rsma2A yielded a large number of pseudogenes. Those pseudogenes, after being further amplified with Rsma2A and the nested primer W/GC, were displayed by DGGE. On average, one pseudogene with a melting temperature different from that of wild-type mtDNA could be amplified from every 100 YAC clones, which corresponds to approximately 10^8 bp of human genomic DNA (data not shown), or 1/30 of a haploid human genome. Four of them, from YAC clones 963B7, 964F2, 699H8 and 965F2, were sequenced (Fig 2). However, none of the known multi-copy pseudogenes was detected in the YAC libraries, probably due to the instability of YAC clones containing the repetitive pseudogene sequences.
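The screening arithmetic in the preceding paragraph can be reproduced with a short Python sketch. The mean insert size (taken as the midpoint of the stated 800-1200 kb range) and the haploid genome size of 3 × 10^9 bp are assumptions made here; the latter is the value implied by the 1/30 figure in the text rather than one the text states. All variable names are illustrative.

```python
# Values stated in the text
YAC_POOLS = 105
CLONES_PER_POOL = 96
INSERT_RANGE_KB = (800, 1200)
CLONES_PER_PSEUDOGENE = 100          # one detectable pseudogene per ~100 clones

# Assumptions (not stated explicitly in the text)
HAPLOID_GENOME_BP = 3_000_000_000    # implied by the "1/30" figure
mean_insert_bp = sum(INSERT_RANGE_KB) // 2 * 1000   # midpoint of the range

total_clones = YAC_POOLS * CLONES_PER_POOL
bp_per_pseudogene = CLONES_PER_PSEUDOGENE * mean_insert_bp
pseudogenes_per_haploid_genome = HAPLOID_GENOME_BP // bp_per_pseudogene

if __name__ == "__main__":
    print("clones screened:", total_clones)
    print("genomic DNA per detected pseudogene (bp):", bp_per_pseudogene)
    print("detectable pseudogenes per haploid genome:",
          pseudogenes_per_haploid_genome)
```

With these inputs, 100 clones cover about 10^8 bp, so roughly 30 pseudogenes detectable with this primer pair are expected per haploid genome.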

The PCR product amplified by primers Rsma1 and Rsma2A is 157 bp in length, and the average length of the published uninterrupted mammalian mtDNA pseudogenes is 1228 bp13, 21, 35, 36, 37 (including those we identified in GenBank-deposited human genomic sequences, GenBank accession number U66061, bp46495-48989; AF029308, bp50199-52715; and AC004035, bp5778-7569, bp7870-9657, bp9990-10604 and bp10924-11009). If that is typical of all mtDNA pseudogenes, an mtDNA pseudogene has a (1228-157)/16569 = 6.46% chance of being amplified by a given primer pair that generates a 157 bp PCR product. Assuming that most parts of the mtDNA have a similar chance of being integrated into the nuclear genome, each pseudogene amplified by primers Rsma1 and Rsma2A should represent 1228/6.46% = 1.9 × 10^4 bp of mtDNA-like sequence in the chromosomes. The amount of mtDNA-like sequence in the human haploid genome can thus be estimated to be at least 1.9 × 10^4 × 30 = 5.7 × 10^5 bp, or about 35 equivalents of the complete mt genome. Considering the presence of the multi-copy pseudogenes and the pseudogenes that might be identical to the wild-type sequence16, the real fraction of mtDNA pseudogenes in the human nuclear genome should be even greater.
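The estimate above can be followed step by step in the Python sketch below, which uses only the figures quoted in the text (a 16569 bp mt genome, a 1228 bp mean pseudogene length, a 157 bp PCR product and 30 detectable pseudogenes per haploid genome). Variable names are illustrative. Because the sketch carries unrounded intermediate values, the final figure comes out at roughly 34.4 mt genome equivalents, close to the value of about 35 obtained in the text from rounded intermediates.

```python
MT_GENOME_BP = 16569            # length of the human mt genome
MEAN_PSEUDOGENE_BP = 1228       # mean length of uninterrupted pseudogenes
AMPLICON_BP = 157               # Rsma1/Rsma2A PCR product
DETECTED_PER_HAPLOID_GENOME = 30  # from the YAC screening (1 per 1/30 genome)

# Chance that a pseudogene of mean length contains the whole amplicon:
# the 157 bp window must fit within a 1228 bp segment placed anywhere
# on the 16569 bp mt genome.
detection_chance = (MEAN_PSEUDOGENE_BP - AMPLICON_BP) / MT_GENOME_BP

# mtDNA-like sequence represented by each pseudogene that is detected
bp_per_detected = MEAN_PSEUDOGENE_BP / detection_chance

# Total mtDNA-like sequence per haploid genome, in bp and mt genome equivalents
total_mt_like_bp = bp_per_detected * DETECTED_PER_HAPLOID_GENOME
mt_genome_equivalents = total_mt_like_bp / MT_GENOME_BP

if __name__ == "__main__":
    print(f"detection chance: {detection_chance:.2%}")
    print(f"bp represented per detected pseudogene: {bp_per_detected:.0f}")
    print(f"total mtDNA-like sequence (bp): {total_mt_like_bp:.0f}")
    print(f"mt genome equivalents: {mt_genome_equivalents:.1f}")
```

The result is a lower bound in the sense used in the text: multi-copy pseudogenes and pseudogenes indistinguishable from wild-type mtDNA are not counted.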

We observed mtDNA pseudogenes with variable copy numbers, and estimated the fraction of mtDNA pseudogenes in the nuclear genome. Minisatellite DNA is known to change copy number via intra-allelic duplications/deletions and inter-allelic recombination/conversions, depending on flanking sequences38, 39, 40. It would be interesting to find out whether mtDNA pseudogenes change copy number in the same way as minisatellite DNA. Characterizing the flanking sequences of the multi-copy pseudogenes, especially those with variable copy numbers such as ψB and ψC, would provide more information on how mtDNA pseudogenes have been generated during evolution, and how they might affect the genetic stability of the human nuclear genome.