Introduction

Retinitis pigmentosa (RP; MIM 268000) is a group of inherited retinal degeneration disorders, affecting 1 in 3700 individuals in the world population (http://www.ncbi.nlm.nih.gov/Omim/). It is characterised by night blindness, progressive loss of the peripheral visual field, bone spicule-like pigmentary deposits and abnormal electroretinograms (ERGs). This condition is genetically heterogeneous and is inherited in an autosomal recessive, autosomal dominant or X-linked fashion. However, in the majority of cases (about 50–60% in the Caucasian population) it is impossible to determine the pattern of inheritance because of the absence of family history. To date, mutations in over thirty genes have been found to have a causative role in RP, but a large number of RP genes have not yet been identified (http://www.sph.uth.tmc.edu/Retnet/).

The retinitis pigmentosa 1 (RP1) gene plays a role in the pathogenesis of 5–10% of autosomal dominant RP.1,2 RP1 is a photoreceptor-specific gene located on chromosome 8q12, and encodes a predicted protein sharing, in its N-terminal region, significant sequence homology with a domain of the human doublecortin protein (DC domain) known to interact with microtubules.1,3,4 Recently, immunofluorescence analysis showed that the RP1 protein is localised in the connecting cilia of rod and cone photoreceptors, which suggests that this protein plays a role in the transport of proteins or in the maintenance of cilial structure.5 This hypothesis is also strengthened by recent studies carried out in mice with a targeted disruption of the Rp1 gene. In these mutant animals, photoreceptor cell layers undergo progressive degeneration and disorganisation. These abnormalities are preceded by mislocalisation of rhodopsin in inner segments and cell bodies of rods.6

We screened public EST databases to identify, catalogue and characterise novel retina-specific genes (manuscript in preparation). The highly specialised function of the retina is likely to require a large number of genes expressed specifically or predominantly in this tissue. Such genes may play a critical role in the function of the retina and, when defective, may cause or predispose to retinal disease. In the course of this project, we identified a novel retina-specific gene, retinitis pigmentosa 1-like 1 (RP1L1), and its murine homologue Rp1l1, showing significant sequence similarity to the RP1 gene. We performed a detailed study of the expression of the RP1L1 gene using both Northern-blot and semi-quantitative RT–PCR analyses on different human tissues. We demonstrate that RP1L1 expression is restricted to the retina, suggesting that this gene may be involved in the pathogenesis of retinal degenerations.

Materials and methods

PCR analysis

For semi-quantitative RT–PCR, total RNA from seven human tissues (brain, liver, lung, skeletal muscle, placenta, heart, and kidney) was purchased from Clontech. Total RNA from retina, foetal eye, RPE/choroid, foetal cochlea and the RPE cell line ARPE-19 was isolated with RNAzol B (Campro Scientific) or CsCl purification and treated with DNase I (Gibco BRL). Semi-quantitative RT–PCRs were performed as described.7 A 5 μg aliquot of total brain, retina, and a retinoblastoma cell line (WERI-Rb) RNA, isolated as previously described,8 was used for reverse transcription, which was carried out using random hexanucleotide priming and Superscript II (Gibco BRL) in a 20 μl reaction, according to the protocol provided by the manufacturer. PCR with RP1L1-specific primers was performed using 1 μl of the reverse transcription reaction as template in a standard PCR reaction with AmpliTaq Gold (Applied Biosystems). In each experiment, a sample without reverse transcriptase was amplified under the same conditions as the reverse-transcribed RNA. PCR products were purified from agarose gels with the Qiagen Gel Extraction Kit and directly sequenced on an automated sequencer (ABI 3100; Applied Biosystems) using the ABI-PRISM big-dye terminator cycle sequencing ready reaction kit (Applied Biosystems). Oligonucleotide primers used in RT–PCR experiments were the following: o5F, 5′-CTC CAG CTT TCC GCT CAG CC-3′; o5R, 5′-AGC CTC CAG CAC CGG CCT C-3′.

To obtain the full-length RP1L1 cDNA, we performed 5′ and 3′ RACE–PCR on human retina Marathon-Ready cDNA (Clontech) according to the protocol provided. A first-round PCR was performed using the adaptor primer AP1 (Clontech) and the gene-specific primers o5/5RACE/F (5′-GTG GGA TGC AGG GAA GTC TTT GGC CGA G-3′) and o5/3RACE/R (5′-ATG AGG TGA GTG CCA ACA GAA GCC AAG GAG-3′) for 5′ and 3′ RACE, respectively. A nested PCR was then carried out using the adaptor primer AP2 and the specific primers o5/5RACE/nF (5′-AGC CAG GAC AGT GCC AGC CCA G-3′) and o5/3RACE/nR (5′-AAA GCC GTC TGC CCT GCC CAC TGC-3′) for 5′ and 3′ RACE, respectively. We cloned the resulting PCR products in a TOPO TA cloning vector (Invitrogen), according to the manufacturer's instructions, and directly sequenced them.

For analysis of polymorphisms in the human RP1L1 sequence, genomic DNA was extracted from peripheral blood leukocytes using standard lysis/phenol extraction protocols. PCR analysis was performed on 20 normal individuals using oligonucleotide primers RP1L1/4-9F (5′-TCG AAC CTG GAG CAG TTA GC-3′) and RP1L1/4-9R (5′-AGC CTC TCC TTG CAG TCC TC-3′). PCR reactions were performed with AmpliTaq Gold (Applied Biosystems) using an ABI 9700 automatic thermal cycler under the following conditions: 95°C for 10 min; then 35 cycles of 94°C for 1 min, 59°C for 1 min, and 72°C for 2 min; and a final step of 72°C for 10 min. PCR products were purified and sequenced as described above.
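As a rough plausibility check on the 59°C annealing step, the melting temperature of the two genomic primers can be estimated with the Wallace rule (2°C per A/T, 4°C per G/C). The following is a minimal sketch: the rule is only a coarse approximation for short oligonucleotides, it is not stated in the text that the primers were designed this way, and the function names are illustrative.

```python
def _clean(primer: str) -> str:
    """Strip the triplet spacing used in the text and normalise case."""
    return primer.replace(" ", "").upper()


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = _clean(primer)
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    seq = _clean(primer)
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "RP1L1/4-9F": "TCG AAC CTG GAG CAG TTA GC",
    "RP1L1/4-9R": "AGC CTC TCC TTG CAG TCC TC",
}

for name, seq in PRIMERS.items():
    print(f"{name}: GC={gc_fraction(seq):.0%}, Tm~{wallace_tm(seq)} C")
```

Both estimates fall a few degrees above the annealing temperature used, which is the usual relationship between the two values.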

Expression studies

Northern-blot analysis

Probes were synthesised by PCR amplification on genomic DNA using the primer pairs h/5F (5′-GAG CCC TCA GGT CAG TCT AG-3′) and h/5R (5′-GCT CTC TGA CAC TTC TGG AC-3′) for the human gene, and m/5F (5′-CAT CCA GTC TCC TCA GCT TC-3′) and m/5R (5′-GTG GGA GTT GAT GAG TGA GC-3′) for the mouse gene. Human and mouse multiple-tissue Northern blots (Clontech) were hybridised using (α-32P)dCTP-labelled probes according to the protocol provided. The hybridised filters were washed once in 2×SSC (300 mM NaCl, 30 mM sodium citrate, pH 7.5), 0.1% SDS at 65°C, twice in 1×SSC, 0.1% SDS at 65°C, and once in 0.5×SSC, 0.1% SDS at 65°C.
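The salt concentrations of the successive washes follow directly from the 2×SSC composition given above. A minimal sketch of that arithmetic, assuming simple linear dilution (the helper name is illustrative):

```python
# Composition of 2x SSC as stated in the text, in mM.
SSC_2X = {"NaCl": 300.0, "sodium citrate": 30.0}


def ssc_composition(strength: float) -> dict:
    """Component concentrations (mM) for a given SSC strength, e.g. 0.5 for 0.5x."""
    return {name: conc * strength / 2.0 for name, conc in SSC_2X.items()}


# The three wash stringencies used for the Northern blots.
for strength in (2.0, 1.0, 0.5):
    print(f"{strength}x SSC: {ssc_composition(strength)}")
```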

In situ hybridisation was performed using standard techniques.9 The anti-sense DIG-labelled riboprobes were transcribed from three different probes corresponding to both the 5′ end and the 3′ end of the transcript. The riboprobes were prepared with the DIG RNA labelling Kit from Roche according to the protocol supplied. Hybridisation was performed on a minimum of three slides in at least five independent experiments. Tissue sections of 20 μm thickness were cut from CD1 mouse embryos at 14.5 days of gestation (E14.5) and adult eyes of different ages (P40–80). For each eye, at least two series of sections were prepared. In each ISH experiment three slides derived from independent series were used.

Bioinformatic analysis

cDNA sequence analysis and nucleotide and protein database searches were performed using the BLAST2 algorithm.10 RP1L1 protein sequences were scanned for the presence of motifs and domains using PFAM (http://www.sanger.ac.uk/Software/Pfam/index.shtml)11 and SMART (http://smart.embl-heidelberg.de/).12 Sequence alignment was performed using the ClustalW program (http://www.hgsc.bcm.tmc.edu/). For the prediction of the coiled-coil domains, the COILS program was used.13

Results

Identification of the human RP1L1 gene

In the course of a project aimed at the identification of novel retina-specific genes, we isolated a cDNA, provisionally termed oEST5, corresponding to the UniGene cluster Hs.33538 (a cluster of eight ESTs, all derived from retina cDNA libraries), showing significant sequence similarity to the N-terminal region of the RP1 gene.1,2 Sequence analysis of one EST clone (IMAGE ID 363208) belonging to this UniGene cluster revealed the presence of a 1500-bp open reading frame (ORF) starting with an ATG located at position 277 and preceded by an in-frame stop codon located 57 nucleotides upstream. This ORF was incomplete at the 3′ end, since no in-frame stop codons were found. By analysing the public human genomic sequence at NCBI (http://www.ncbi.nlm.nih.gov) and at the Human Genome Browser at the University of California, Santa Cruz (UCSC, http://genome.ucsc.edu/) we identified two overlapping BAC clones, CTD-2135J3 and RP11-981G7 (GenBank AC105001 and AC104964), on chromosome 8p23.1, containing the entire sequence of IMAGE clone 363208. Analysis of these two clones provided us with the intron/exon junctions of the four exons of this novel gene (Figure 1). Moreover, we found that BAC clone RP11-981G7 also contained two human retina ESTs (GenBank BG395268 and W27732) located approximately 6–10 kb distal to the 3′ end of clone 363208 (Figure 1). By reverse transcriptase (RT)–PCR experiments with oligonucleotide primers located on the two above-mentioned retina ESTs and on the IMAGE clone 363208 sequence, we were able to demonstrate that these three sequences were part of the same transcript. Subsequently, 5′ and 3′ RACE experiments, as well as analysis of the human genome sequence at UCSC, enabled us to assemble the putative full-length cDNA sequence of the oEST5 transcript, which is 8194 bp long (GenBank AJ491324) and contains an ORF of 7392 bp encoding a predicted protein of 2464 amino acids (Figure 2a). The first ATG in the transcript is located at position 277, within a sequence context that, based on the presence of a purine at position −3, satisfies Kozak criteria for an efficient translation initiation codon,14 while the putative translation termination codon is located at position 7669. The bona fide nature of this transcript was confirmed by RT–PCR experiments on RNA from human retina and from retinoblastoma cell lines (data not shown). A region of 1167 bp, between positions 5968 and 7135, encodes 25 imperfect copies of a 16-aa repeat module (see also below) and could be less efficiently PCR amplified both from RNA and from genomic DNA. This type of repetitive nucleotide sequence, which has a high GC content (62%), might adopt unusual non-B-DNA conformations, including triplex structures, which are associated with reduced fidelity of replication.
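The transcript coordinates quoted above are mutually consistent, as the following minimal sketch illustrates. It assumes 1-based numbering, an ORF length that excludes the stop codon, and a three-nucleotide stop codon; the untranslated region lengths are derived here from those assumptions and are not figures reported in the text.

```python
def orf_coordinates(start: int, n_codons: int) -> tuple[int, int, int]:
    """Return (last coding nt, first nt of stop codon, last nt of stop codon), 1-based."""
    last_coding = start + 3 * n_codons - 1
    return last_coding, last_coding + 1, last_coding + 3


# Values stated in the text for the human RP1L1 transcript.
ATG_POSITION = 277
PROTEIN_LENGTH = 2464      # amino acids
TRANSCRIPT_LENGTH = 8194   # bp

last_coding, stop_first, stop_last = orf_coordinates(ATG_POSITION, PROTEIN_LENGTH)
orf_length = 3 * PROTEIN_LENGTH                   # coding nucleotides, stop codon excluded
five_prime_utr = ATG_POSITION - 1                 # derived, not reported in the text
three_prime_utr = TRANSCRIPT_LENGTH - stop_last   # derived, not reported in the text

print(f"ORF: {ATG_POSITION}-{last_coding} ({orf_length} bp)")
print(f"Stop codon: {stop_first}-{stop_last}")
print(f"5' UTR: {five_prime_utr} nt, 3' UTR: {three_prime_utr} nt")
```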

Figure 1

(A) Schematic representation of physical and transcript maps of the region encompassing the human RP1L1 gene. The region is covered by two overlapping BAC clones (CTD-2135J3 and RP11-981G7). Polymorphic markers and genes are indicated on the bar below the chromosome. (B) The spliced EST clone 363208 and the ESTs BG395268 and W27732 belonging to the RP1L1 gene are shown. (C) Genomic structure of the human RP1L1 gene, showing the exon/intron structure, as well as exon and intron sizes.

Figure 2

(A) Deduced amino acid sequence of the human RP1L1 protein. The DC domains are indicated by a dashed line, the two amino acid repeats are underlined and the two coiled-coil domains are boxed. (B) Multiple sequence alignment of the DC domains of human RP1L1 (hRP1L1), mouse Rp1l1 (mRp1l1), fugu Rp1l1 (fRp1l1), human RP1 (hRP1), mouse Rp1 (mRp1) and fugu Rp1 (fRp1). Conserved residues (>70% identity) among the hRP1L1, mRp1l1 and fRp1l1 proteins are shown in green; conserved residues (>70% identity) among the hRP1, mRp1 and fRp1 proteins are shown in red; conserved residues (>70% identity) among all six proteins are shown in white. Amino acid residues identical in all six proteins are also indicated.

The alignment of the oEST5 cDNA sequence to the human genome sequence revealed that this novel gene is organised in four exons of 258, 628, 141 and 6667 bp, respectively (Figure 1). All exons showed donor and acceptor splice site sequences conforming to the GT/AG rule. The oEST5 gene spans approximately 52 kb of genomic DNA and maps 122 kb proximal to the sex determining region Y-box 7 gene (SOX7) and approximately 319 kb distal to the methionine sulphoxide reductase A (MSRA) gene (Figure 1). The significant sequence homology of oEST5 with the RP1 protein (see below) prompted us to rename this cDNA retinitis pigmentosa 1-like 1 (RP1L1).

Sequence analysis of the RP1L1 predicted protein

We analysed the RP1L1 predicted protein for similar protein sequences and predicted domains using BLASTP and SMART analyses.12,15 This search revealed the presence of two DC domains (CDD: smart00357) in the N-terminal region, at residues 33–113 and 147–228, respectively, sharing the highest sequence homology with the RP1 protein (Figure 2b) and, at lower levels, with doublecortin and other proteins containing these domains (data not shown). The homology of RP1L1 with RP1 is not limited to the DC domains but encompasses the first 350 amino acids at the N-terminus (Figure 2b). The sequence similarity between RP1L1 and RP1 is confined to their N-terminal regions, while the remaining portions of these two large proteins are completely unrelated.

Using the program developed by Lupas et al.,13 we found that the predicted RP1L1 protein contains two putative coiled-coil domains. The program predicted the presence of several heptad repeats, characteristic of alpha-helices that form dimeric coiled-coil structures. Heptad repeats were predicted with high probability in two regions of the C-terminal part of the RP1L1 predicted protein, between amino acids 2026 and 2053 and between amino acids 2120 and 2140, respectively. In addition, the RP1L1 protein sequence was found to contain two imperfect amino acid repeats (Figure 2a). The first repeat is located between amino acids 1298 and 1443, and contains a region of low sequence complexity with high glutamic acid content. The consensus sequence of the 16-aa repeat module is EGLQEEGVQLEETKTE (Figure 3), and does not share any significant sequence similarity with known motifs and proteins. The number of units is polymorphic in the normal population, since we found that one of the repeat modules is present 1 to 5 times in the 20 individuals analysed (Figure 3). The polymorphic repeat module is surrounded by six additional modules, the first two at the N-terminal side and the remaining four at the C-terminal side (Figure 3). The second amino acid repeat, extending from amino acids 1896 to 2307, also contains a region of low sequence complexity with high glutamic acid and alanine content. We performed a BLASTP analysis without filtering for low complexity regions, and found that the repeat unit is slightly similar to the ‘Plaid domain’ of the RPGR protein,16 involved in an X-linked form of retinitis pigmentosa.17 The consensus sequence of this repeat is 16 amino acids long (EAPEAEGEAQPESEGV) and is repeated 25 times. We could not determine whether the length of this second repeat is polymorphic, since PCR amplification on genomic DNA could not be carried out reliably due to the highly repetitive content of this sequence.
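Counting imperfect copies of a repeat module such as those described above can be sketched as a scan for windows that differ from the consensus at no more than a few positions. This is an illustration only, not the procedure used in this study: the mismatch threshold, the function names and the toy sequence are assumptions, and the toy sequence is not part of the RP1L1 protein.

```python
CONSENSUS = "EAPEAEGEAQPESEGV"  # consensus of the second repeat, as given in the text


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_imperfect_repeats(protein: str, consensus: str, max_mismatches: int = 4) -> int:
    """Count non-overlapping windows within max_mismatches of the consensus.

    max_mismatches is an arbitrary illustrative threshold.
    """
    k = len(consensus)
    count = 0
    i = 0
    while i + k <= len(protein):
        if mismatches(protein[i:i + k], consensus) <= max_mismatches:
            count += 1
            i += k  # jump past the module just counted
        else:
            i += 1
    return count


# Hypothetical toy sequence: one perfect and two imperfect modules with short flanks.
toy = "MSTK" + "EAPEAEGEAQPESEGV" + "EAPEVEGEAQPESEGA" + "EAPEAEGKAQPESEGV" + "LLRW"
print(count_imperfect_repeats(toy, CONSENSUS))
```

Lowering `max_mismatches` to 0 restricts the count to perfect copies, which shows how strongly the reported number of modules depends on the tolerance chosen.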

Figure 3

Structure of the first amino acid repeat in the human RP1L1 protein. The third module of the repeat (EGLQEEGVQLEETKTE) is polymorphic in the normal population while the remaining six are invariant.

Identification of the murine and fugu Rp1l1 genes

To identify the murine homologue of the RP1L1 gene, we screened the Mouse Genome Browser at UCSC with the entire human RP1L1 sequence, and we were able to obtain the putative full-length mouse Rp1l1 cDNA (GenBank AJ491325). Rp1l1 shows the same intron/exon organisation as its human counterpart and is localised to mouse chromosome 14, in the region syntenic to human chromosome 8p23.1. The Rp1l1 mRNA is approximately 6.5 kb in size and contains an ORF of 5580 bp, encoding a predicted protein of 1859 amino acids, 605 amino acids shorter than the RP1L1 protein. The overall homology of the murine Rp1l1 gene with its human counterpart is unusually low, since it shows 60% identity at the nucleotide level and 46% similarity and 39% identity at the protein level. However, the degree of similarity between the two sequences is not homogeneous along their entire lengths. The highest degree of conservation was found in the N-terminal region (63% identity and 71% similarity at the protein level in the first 400 amino acids), while the C-terminal region was found to be less conserved. The murine Rp1l1 protein contains the two DC domains but contains neither significantly predicted coiled-coil domains nor the two repeats found in human RP1L1. However, similar to the second repeat, the mouse protein contains a short region rich in glutamic acid residues.
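Percentage identity and similarity values of the kind quoted above are computed from a pairwise alignment. The sketch below shows one common convention, counting only columns without gaps and treating identical residues as similar; the denominator convention and the residue groupings are assumptions made for illustration (alignment programs normally score similarity with a substitution matrix), and the aligned strings are hypothetical.

```python
# Hypothetical groups of chemically similar residues, used only for illustration.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a: str, b: str) -> bool:
    """True if both residues fall in the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in columns)
    similar = sum(a == b or _similar(a, b) for a, b in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


# Toy aligned fragments (not taken from RP1L1 or Rp1l1).
identity, similarity = identity_and_similarity("MKEWLV", "MRD-LP")
print(f"identity {identity:.0f}%, similarity {similarity:.0f}%")
```

Because identical residues also count as similar under this convention, the similarity value is never lower than the identity value, which matches the relationship between the figures reported for the two proteins.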

Interestingly, we also identified a partial sequence of the predicted pufferfish Rp1l1 protein (Genscan 789.1, 2115 amino acids) through the analysis of a database containing the annotated Fugu rubripes genomic sequence (http://www.fugubase.org).18 This peptide is clearly distinct from the putative fugu Rp1 homologue (Genscan 2330.1, 1627 amino acids), demonstrating that both paralogues of the Rp1 gene family are also present in this organism (Figure 2b). Similar to what we observed when comparing the human and mouse proteins, the fugu Rp1l1 predicted peptide showed significant sequence similarity with its mammalian homologues in the first 350 amino acids (Figure 2b), while the homology was much lower in the remaining portion of the protein.

Expression study

To determine the expression pattern of the RP1L1 gene, we first carried out a semi-quantitative RT–PCR analysis (25, 30 and 35 cycles) on RNA from 12 human tissues, using the oligonucleotide primers oEST5F and oEST5R. This analysis revealed that the RP1L1 gene is specifically expressed in retina, since a product was exclusively observed in this tissue after 25 and 30 cycles (Figure 4a). However, after 35 cycles we were also able to detect weaker amplifications in other tissues (data not shown). These data were confirmed by Northern blot analysis carried out on murine adult tissues using as probe a 900 bp fragment of the Rp1l1 cDNA covering the 3′ end of the coding region. Two weak bands of about 7.5 and 5 kb were detected only in mouse adult retina RNA, while no expression could be detected in other tissues (Figure 4b). The 7.5 kb band may correspond to the full-length transcript, whereas the smaller product could correspond to a transcript derived from alternative splicing or from the use of an alternative polyadenylation site. Northern analysis with a human RP1L1 probe on an RNA panel from 12 adult human tissues, not including retina, did not reveal any hybridisation product (data not shown). The restricted pattern of expression of the RP1L1 gene is further supported by the fact that all 19 ESTs (12 independent cDNA clones) corresponding to this gene present in dbEST (release 083002) are derived from retina and eye cDNA libraries.

Figure 4

Expression of the RP1L1 transcript in human tissues. (A) Semi-quantitative RT–PCR analysis performed with RP1L1 oligonucleotide primers after 25 cycles (top panel) and 30 cycles (middle panel). The bottom panel shows RT–PCR amplification of the ubiquitously expressed GAPDH cDNA. Lanes: 1 Liver; 2 Kidney; 3 Lung; 4 Retina; 5 Skeletal muscle; 6 ARPE-19; 7 Placenta; 8 RPE/Choroid; 9 Heart; 10 Foetal cochlea; 11 Brain; 12 Foetal eye; 13 no-cDNA. (B) Northern blot containing RNA from mouse adult tissues hybridised with an Rp1l1 probe. Lanes: 1 Heart; 2 Brain; 3 Spleen; 4 Lung; 5 Liver; 6 Skeletal muscle; 7 Kidney; 8 Testis; 9 Retina.

To gain more insight into the spatial and temporal expression of RP1L1 in mouse retina, we performed RNA in situ hybridisation experiments on mouse embryos (E9.5, E11.5, E14.5) and on adult mouse eyes. We used as probes three different fragments derived from different portions of the transcript, but we could not detect any signal above the background level (data not shown).

Discussion

In the present study, we describe the molecular cloning, determination of the genomic structure and study of the expression pattern of a novel retina-specific gene, termed retinitis pigmentosa 1-like 1 (RP1L1). RP1L1 shows significant sequence similarity to RP1, a gene responsible for an autosomal dominant form of retinitis pigmentosa.1,2 The homology between RP1L1 and RP1 is confined to the N-terminal region, which is characterised by the presence of the DC domain. This domain is normally found in the N-terminus of proteins, and consists of one or two tandemly repeated copies of a region of about 80 amino acids. It is important to note that the sequence similarity between RP1 and RP1L1 is not restricted to the DC domains but extends to a larger region including the first 350 amino acids (Figure 2b). The genomic organisation of RP1 and RP1L1 is also similar, since both genes have one non-coding exon followed by three coding exons of similar sizes. These observations suggest that these two genes may be derived from the duplication of a common ancestral gene.

Both the Rp1 and the Rp1l1 proteins are conserved in mammals and in distant vertebrates, as we identified their putative homologues in Fugu rubripes, an organism whose evolutionary distance from humans is about 450 million years. However, the degree of sequence identity of these proteins across evolution is not very high and is not homogeneous along the entire protein. It was previously reported that the degree of overall sequence identity between the human and the mouse Rp1 proteins is not particularly high (about 60%).5 We confirm this finding for the Rp1l1 protein, which displays a very low degree of overall sequence identity (only 39%) compared to the average values observed between human and mouse proteins.19 The most conserved portion of the human, mouse and fugu Rp1l1 proteins is the N-terminal region, which is characterised by the presence of the DC domains. The remaining part of the proteins shows a much lower degree of sequence similarity, indicating that this part of the protein underwent a faster divergence during evolution due to lower sequence constraints. This is also supported by the presence of a significant number of amino acid variations in the human population, including a repeated sequence that is polymorphic in its copy number (Figure 3 and data not shown).

The DC domain, which characterises the N-terminal region of the RP1L1 protein, was first identified in doublecortin (DCX), a brain-specific protein implicated in X-linked lissencephaly and double cortex syndrome.20 It has recently been shown that doublecortin, as well as other proteins containing two tandemly repeated copies of the DC domain, such as DCAMKL1, bind to microtubules through the DC domain, thus defining a new family of microtubule-associated proteins (MAPs).3,21,22 The localisation of the RP1 protein to the connecting cilia of photoreceptors5 led to the hypothesis that this protein may interact with the microtubules of the connecting cilia through the DC domains. Based on these observations, the presence of two DC domains in RP1L1 suggests that this protein may represent a novel member of the MAP family. The central and C-terminal regions of the Rp1l1 protein, which are poorly conserved across evolution, are characterised in human by the presence of two imperfect repetitive modules with high glutamic acid content, one of which shares moderate sequence similarity with the RPGR protein.16 Similar to the Rp1 protein, Rpgr was also localised to the connecting cilia of photoreceptors in mouse, and seems to play an important role in the maintenance of a polarised distribution of outer segment-specific protein(s).23

The significant sequence similarity of RP1L1 with RP1 and, to a lesser extent, with RPGR, as well as its expression restricted to the adult retina, suggest that this gene may play a similar role in the retina and may be involved in the pathogenesis of retinitis pigmentosa. Currently, there are no known loci for retinal degenerations mapping to 8p23, where the RP1L1 gene has been localised. To determine the possible involvement of RP1L1 in RP, its role in postnatal retina and its possible interactions with RP1 and RPGR, it will be necessary to carry out mutation analysis in a large cohort of RP patients and to perform further functional studies.