Introduction

In the last few years, a new cysteine-rich structural motif of approximately 50 amino acid residues, known as either the trefoil motif or the P-domain, has been described [1, 2]. Its six conserved cysteine residues form three intramolecular disulfide bridges, creating three loops that are responsible for the trefoil-like shape of the domain and its resistance to proteolytic degradation. P-domains were discovered in mucins of Xenopus skin as well as in porcine, rodent and human gastrointestinal peptides [for a review, see ref. 3]. P-domain-containing peptides display very distinctive expression patterns in normal as well as in pathological gastrointestinal tissues. Although their precise physiological role is not understood, recent in vitro models suggest that the peptides are involved in the maintenance of mucosal integrity and may accelerate ulcer healing, presumably by enhancing cell migration after wounding [4, 5]. Transgenic mice that overexpress a human trefoil peptide exhibit increased resistance to intestinal damage [6].

The human pS2 gene, which is under estrogen transcriptional control in a subclass of breast cancer cells, was reported to be expressed in normal stomach surface epithelial cells, whereas other gastrointestinal tissues, such as pancreas and colon, do not produce pS2 at all [7]. Estrogen-independent expression has been noted in various tumors of the human gastrointestinal tract, such as carcinomas of the stomach, pancreas, colon and biliary tract [8–11]. The mechanism of pS2 gene activation and its biological role in carcinogenesis remain to be elucidated.

A remarkable sequence homology exists between pS2, a porcine pancreatic spasmolytic protein (PSP), and its subsequently isolated human counterpart, hSP [2]. Like the pS2 gene, the hSP gene (also termed human spasmolysin; SML1) is expressed in stomach mucosa, but not in any other tissue of the normal gastrointestinal tract. In stomach, pancreas and biliary tract carcinomas, however, the presence of the regular pS2 mRNA is strongly correlated with hSP/SML1 expression [8–10]; the same correlation is found in the ulceration-associated cell lineage in Crohn's disease [12]. Interestingly, both genes have been localized to the same chromosomal region, 21q22.3 [13].

For the third P-domain peptide, the intestinal trefoil factor (hITF), the human cDNA sequence [14] and the human and rat genomic sequences [15, 16] have been reported. In contrast to the stomach-specific expression of pS2 and hSP, hITF is found in the human intestine. Most recently, we also mapped the genomic locus for hITF to 21q22.3 [17] and found that all three trefoil peptide genes are clustered within a region of less than 100 kb [18].

With the coordinated cell- and tissue-specific expression of these genes in mind, we show here the head-to-tail organization of the gene cluster of the three trefoil proteins, compare their 5′-flanking regions and present evidence for cell-specific, coordinated transcriptional regulation. The structural organization of the genes provides a framework for studying gene regulation in response to gastrointestinal pathological conditions such as defense against pathogens, ulcer healing and carcinogenesis.

Materials and Methods

Restriction Analysis, Southern Blotting and Hybridization of BAC DNA

BAC DNA was prepared by standard alkaline lysis and purified by phenol/chloroform extraction and ethanol precipitation. HindIII restriction digests were separated on 0.7% agarose-TAE gels. Digests with rare-cutting enzymes were separated by PFGE on 1% agarose gels in 0.25× TBE using an LKB 2015 electrophoresis unit, as described [18]. DNA was transferred in alkaline buffer for 2 h to Nytran membrane (Schleicher & Schüll) by downward blotting. DNA probes were labeled with 32P-dATP by random priming (Boehringer Mannheim Biochemicals). A 1-kb fragment of the 5′-flanking region of BCEI (−1100 to −90) was isolated from the plasmid pS2-cat. A 300-bp cDNA fragment of SML1 was isolated from pGEM-hsp200 [2]. Both plasmids were kindly provided by Dr. M.-C. Rio, Strasbourg. All other hybridization probes were generated by PCR with the oligonucleotides listed in table 1. Probes were hybridized at 65 °C overnight in 7% SDS, 0.5 M sodium phosphate buffer.

Table 1 Oligonucleotides used for PCR

PCR Analysis

Oligonucleotides were designed from the cDNA or genomic sequences of all three trefoil genes [2, 15, 19]. The products listed in table 1 were amplified from 10 ng BAC DNA using Gold Star polymerase (Eurogentec, Belgium) or the Expand Long Template PCR System (Boehringer Mannheim Biochemicals) under the standard conditions recommended by the manufacturers.

Sequence Analysis

Cycle sequencing with ThermoSequenase (Amersham Life Science) using either 32P-primer labeling or 35S-dATP internal labeling was performed according to the manufacturer’s protocol. Sequence data were processed by BLASTN, FACTOR, and SIMILARITY algorithms of the GCG Wisconsin package.

Gene Expression Study by RT-PCR

For RT-PCR studies, we used paraffin-embedded material from two samples of normal stomach mucosa and from four gastric hyperplastic polyps. Frozen material from two cases of gastric carcinoma and material from the cell lines GP220 and GP202 were also used [20]. RNA was extracted according to the method described by Chomczynski and Sacchi [21]. First-strand cDNA was synthesized with random hexamer primers using M-MLV reverse transcriptase at 42°C for 15 min in the presence of 7 mM MgCl2. PCR was performed with Taq polymerase in the presence of 2 mM MgCl2. Specific primers were chosen for hSP (5′-GGATCAGTGCTGCATGGAG and 5′-GTTGGAGAAGCAGCACTTCC), pS2 (PS1 and PS2, table 1) and hITF (HITFF and HITFR, table 1). Cycling conditions were 95°C for 4 min, followed by 30 cycles of 95°C for 1 min, 64°C (55°C for hSP) for 1 min and 72°C for 1 min. Controls without reverse transcriptase were routinely run to exclude genomic DNA contamination.
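For readers reproducing the assay, the cycling profile can be written down as data. The Python sketch below assumes that the "30 cycles" annotation covers the denaturation, annealing and extension steps, and ignores ramp times and any final extension (none is stated); the names `Step`, `program` and `hold_minutes` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


def program(annealing_c):
    """Return (one-off steps, cycled steps, number of cycles) for the RT-PCR profile."""
    initial = [Step("initial denaturation", 95, 4)]
    cycled = [
        Step("denaturation", 95, 1),
        Step("annealing", annealing_c, 1),  # 64 C, or 55 C for hSP
        Step("extension", 72, 1),
    ]
    return initial, cycled, 30  # assumption: all three steps are cycled 30 times


def hold_minutes(initial, cycled, n_cycles):
    """Time spent at the set temperatures; ramping between steps is ignored."""
    return sum(s.minutes for s in initial) + n_cycles * sum(s.minutes for s in cycled)


ANNEALING_C = {"pS2": 64, "hITF": 64, "hSP": 55}

for target, t_anneal in ANNEALING_C.items():
    print(f"{target}: anneal at {t_anneal} C, {hold_minutes(*program(t_anneal))} min at temperature")
```

Under these assumptions every target spends 4 + 30 × 3 = 94 min at the set temperatures; only the annealing temperature differs.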

Results

Orientation of the Contig

We have previously reported the isolation of clones coding for the three trefoil peptides from a human genomic BAC library [18]. These contiguous genomic fragments were ordered by HindIII restriction fingerprinting, with overlapping segments identified by hybridization of BAC 921F4 and BAC 801B4 (fig. 1). The size of each BAC, determined from the fragments produced by rare cutters (NotI, SalI, MluI, NruI, SgrAI) after PFGE, agrees with the size obtained by summing its HindIII fragments. On this basis, the contig spans a region of 350 kb. In addition, all seven BACs were probed independently by PCR and hybridization with the well-mapped markers D21S19 and D21S212. Only BACs 843E9 and 43A9 were positive for D21S19 (fig. 2). The known relative map positions of BCEI and D21S19 allowed us to orient the whole cluster on the chromosome. BCEI was placed 40–65 kb proximal to D21S19. D21S212 was not localized within the 350-kb BAC contig.
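Overlap between fingerprints was established here by hybridization (fig. 1). As a computational analogue only, and not the procedure used in this study, band sharing between two HindIII fingerprints and the clone size obtained by summing fragments can be sketched in Python. The fragment sizes and the 2% sizing tolerance are hypothetical.

```python
def shared_bands(a_kb, b_kb, tol=0.02):
    """Count bands of clone A that match a distinct band of clone B within a
    relative sizing tolerance; each band of B can be matched only once."""
    remaining = sorted(b_kb)
    shared = 0
    for size in sorted(a_kb):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                shared += 1
                del remaining[i]  # safe: the inner loop is left immediately
                break
    return shared


def clone_size_kb(fragments_kb):
    """Insert size estimated by summing fragments (vector bands assumed removed)."""
    return sum(fragments_kb)


# Hypothetical HindIII fingerprints (kb) of two overlapping clones.
bac_a = [12.0, 9.5, 7.2, 5.0, 3.1]
bac_b = [15.0, 9.6, 7.2, 4.4, 3.1]
print(shared_bands(bac_a, bac_b))      # 3 shared bands: 3.1, 7.2, 9.5/9.6
print(round(clone_size_kb(bac_a), 1))  # 36.8
```

A greedy match that uses each band once is the simplest choice; programs such as FPC additionally estimate the probability that shared bands arise by chance.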

Fig. 1

Fingerprint analysis of a contig of 7 BACs. 1 µg of BAC DNA was digested with HindIII, Southern blotted to duplicate filters and hybridized with labeled BAC 801B9 (left) and BAC 921F2 (right).

Fig. 2

Genomic organization of the gene cluster of trefoil peptides within a BAC contig located on 21q22.3 (upper part), and restriction enzyme mapping, gene order, orientation, and exon-intron structure of trefoil peptides on BAC 921F2 (lower part). Small upper bars indicate HindIII sites as determined by Southern hybridization with various probes; lower bars and hatched boxes indicate gene exons.

Distances and Orientation of Genes

The three trefoil genes BCEI, SML1 and TFF3, encoding pS2, hSP and hITF, respectively, were mapped in detail by PCR with appropriate primers (table 1) using the six BACs shown in figure 2. The positions of the genes were verified by hybridizing gene probes to restriction fragments generated by rare-cutting enzymes (fig. 2). Whereas BAC 1125H6 was positive for TFF3 only, BACs 548B9 and 90E5 also contained SML1. BACs 921F4, 843E9 and 43A9 yielded signals with all three gene probes (fig. 2).
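The presence/absence pattern above is an instance of marker-content mapping: in the true order, every clone must contain an unbroken run of markers (the consecutive-ones property). The brute-force Python sketch below checks all orders of the four markers against the BAC contents reported here, including the D21S19 results from the orientation experiment. It is an illustration, not the analysis the authors performed; BAC 801B4 is omitted because its marker content is not stated.

```python
from itertools import permutations

# Marker content of each BAC as reported in the text.
BAC_CONTENT = {
    "1125H6": {"TFF3"},
    "548B9": {"TFF3", "SML1"},
    "90E5": {"TFF3", "SML1"},
    "921F4": {"TFF3", "SML1", "BCEI"},
    "843E9": {"TFF3", "SML1", "BCEI", "D21S19"},
    "43A9": {"TFF3", "SML1", "BCEI", "D21S19"},
}
MARKERS = ["TFF3", "SML1", "BCEI", "D21S19"]


def is_contiguous(order, present):
    """True if the markers in `present` occupy one unbroken run in `order`."""
    idx = [i for i, m in enumerate(order) if m in present]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


def consistent_orders(markers, content):
    """All marker orders in which every clone covers a contiguous block."""
    return [
        order
        for order in permutations(markers)
        if all(is_contiguous(order, s) for s in content.values())
    ]


orders = consistent_orders(MARKERS, BAC_CONTENT)
for order in orders:
    print(" - ".join(order))
```

The eight surviving orders (four up to reversal) include both TFF3-SML1-BCEI-D21S19 and SML1-TFF3-BCEI-D21S19, so marker content alone does not fix the relative positions of TFF3 and SML1. This is consistent with the need for the long-range PCR, hybridization and PFGE data that follow.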

Using outward-directed primers derived from the known cDNA sequences, long-range PCR was performed to determine the intergenic distances of the three genes. A 12-kb product was specifically generated using primers T1 and R2, derived from the 3′-end of BCEI and the 5′-end of SML1, respectively. The product was verified by a second PCR with the nested primers T11 and R22. Thus, the two genes are organized in tandem (head to tail), and transcription is directed towards the centromere. The orientation and location of BCEI were further verified by Southern hybridization. A single NruI site, localized in intron I, created restriction fragments of 45 and 55 kb, which hybridized specifically with a probe harboring the promoter (−1100 to −90) and a probe harboring exon II, respectively. Attempts to generate a PCR fragment between SML1 and TFF3 failed, presumably because of the large distance (30–35 kb). To determine the location and orientation of TFF3, BAC 921F4 was probed by PCR with vector primers and with primers corresponding to the 3′- as well as the 5′-end of the hITF cDNA. A 1.7-kb PCR product, specifically amplified with the vector primer (SP6L) and the 3′-end-specific primers (H2 and H22, table 1), showed that TFF3 is oriented in the same direction as SML1 and BCEI (head to tail). As judged by PFGE with rare cutters (data not shown), TFF3 lies 30–35 kb downstream of SML1. To confirm these results, the 20-kb SalI restriction fragment of BAC 921F4 was isolated, further digested with BamHI, EcoRI and HindIII, and hybridized with cDNA probes of hSP and hITF, respectively (not shown). The intergenic region between SML1 and TFF3 was estimated to be 32 kb, as calculated from the sizes of the intergenic HindIII fragments (fig. 1).
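The 32-kb estimate rests on summing the HindIII fragments that lie between the two gene-positive fragments. A minimal Python sketch of that arithmetic follows, assuming the fragment order along the BAC is known; the fragment sizes are hypothetical, and the function name `intergenic_bounds_kb` is illustrative. It returns a lower bound (fragments strictly between the genes) and an upper bound (additionally counting the two gene-containing fragments in full).

```python
def intergenic_bounds_kb(fragments_kb, gene_a_idx, gene_b_idx):
    """Bounds on the distance between two genes from an ordered HindIII map.

    fragments_kb: fragment sizes in map order (assumed known).
    gene_*_idx:   index of the fragment that hybridizes to each gene probe.
    Lower bound: fragments strictly between the two probe-positive fragments.
    Upper bound: lower bound plus the two probe-positive fragments themselves.
    """
    lo, hi = sorted((gene_a_idx, gene_b_idx))
    between = sum(fragments_kb[lo + 1:hi])
    return between, between + fragments_kb[lo] + fragments_kb[hi]


# Hypothetical ordered map; the two gene probes hit fragments 0 and 4.
fragments = [6.0, 11.0, 8.5, 7.5, 5.0, 4.0]
print(intergenic_bounds_kb(fragments, 0, 4))  # (27.0, 38.0)
```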

Exon-Intron Mapping of SML1 and Translational Initiation

The genomic organization of SML1 was of special interest, since this gene codes for a protein with two P-domains, in contrast to BCEI and TFF3, which code for only one. Using the cDNA-derived primer pairs (SLA-SLAR, SLMF-SLM and SML1-SL2), we amplified introns of 900, 2,300 and 850 bp, respectively (fig. 2, 3). The exon-intron junctions were determined by sequence analysis of the corresponding PCR products (EMBL accession Nos. X97790, X97791, X97792, X97793), revealing that the two trefoil motifs of SML1 are encoded by exons 2 and 3 (fig. 3). Thus, each P-domain is encoded by a single exon. In contrast to the previously published sequence of hSP [2], we found an additional cytosine at position 56. As a consequence, the corrected human signal peptide shows high amino acid identity with the mouse signal peptide (22 of 30 residues identical; fig. 3). Both signal peptides comprise basic amino acid residues followed by a cluster of hydrophobic residues.
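A single-base insertion changes the reading frame on one side of the insertion site, so a one-nucleotide sequencing correction can alter a whole stretch of deduced amino acids. The Python sketch below illustrates this with a hypothetical 15-nt open reading frame translated before and after inserting one C, using the standard genetic code. The sequence, the insertion site and the helper names `translate` and `insert_base` are illustrative and are not taken from SML1.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the first base; trailing bases are ignored."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def insert_base(dna, pos, base):
    """Return `dna` with `base` inserted before 0-based index `pos`."""
    return dna[:pos] + base + dna[pos:]


toy = "ATGGCTAAACTGCTG"                     # hypothetical ORF fragment
print(translate(toy))                       # MAKLL
print(translate(insert_base(toy, 6, "C")))  # MAQTA (frame shifted after codon 2)
print(f"{22 / 30:.0%}")                     # 73%, i.e. 22 of 30 residues identical
```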

Fig. 3

Sequence of the SML1 gene and partial comparison with other trefoil peptide genes. The 5′-flanking region of SML1 is displayed up to position −887 with respect to the translational start codon (+1). Intron sequences are indicated by lowercase letters. Translated codons are denoted by the single-letter amino acid code. The resulting signal sequence of hSP is presented in italics and aligned with the homologous sequence of mSP. Possible regulatory targets are either underlined or printed in bold. Canonical target sequences of Myc, Pea3 (Ets-like transcription factor) and the ERE are indicated in bold italics. rITF denotes the corresponding sequence of the rat ITF gene.

Moreover, the translational initiation sequences of all trefoil peptides are homologous. A G is common to all of them at positions −2 and +4, although the sequences differ markedly from the Kozak consensus sequence (ACCATGG; fig. 3). Nevertheless, this consensus sequence is present three codons downstream of the presumed translational initiation site of BCEI and therefore might contribute to enhanced translation of the pS2 protein.
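The comparison with the Kozak consensus can be made explicit. The Python sketch below numbers the A of the start codon as +1 (there is no position 0), reads out positions −3, −2 and +4, counts mismatches to ACCATGG over the −3 to +4 window, and reports the distance from the presumed start codon to any exact ACCATGG match. The toy sequence is hypothetical: it was constructed only to mimic the situation described for BCEI (G at −2 and +4, and a consensus site three codons downstream) and is not the BCEI sequence.

```python
import re

KOZAK = "ACCATGG"  # spans positions -3..+4 around the start codon


def context(seq, atg):
    """Bases at -3, -2 and +4, numbering the A of ATG as +1 (no position 0)."""
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("index does not point at an ATG")
    return {"-3": seq[atg - 3], "-2": seq[atg - 2], "+4": seq[atg + 3]}


def kozak_mismatches(seq, atg):
    """Number of positions in the -3..+4 window that differ from ACCATGG."""
    window = seq[atg - 3:atg + 4]
    return sum(a != b for a, b in zip(window, KOZAK))


def kozak_offsets(seq, atg):
    """Distance (nt) from the presumed start codon to the ATG of each exact ACCATGG."""
    return [m.start() + 3 - atg for m in re.finditer(KOZAK, seq)]


toy = "CCACGCATGGCTACCATGGCA"   # hypothetical sequence, presumed ATG at index 6
print(context(toy, 6))          # {'-3': 'C', '-2': 'G', '+4': 'G'}
print(kozak_mismatches(toy, 6)) # 2
print(kozak_offsets(toy, 6))    # [9], i.e. three codons downstream, in frame
```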

Promoter Sequence of SML1 and Comparison of Gene Regulatory Sequences

Since the 5′-flanking regions of BCEI and TFF3 had already been published, only the corresponding region of SML1 was analyzed. Using primers T1 and T11, the nucleotide sequence of this region was determined up to position −887 by sequencing BAC 921F2 DNA with a primer-walking strategy (EMBL accession No. X97790). At this position, the downstream primer designated SPLR2 (table 1) was designed to amplify the promoter region, and the nucleotide sequence was confirmed by sequencing the PCR product. A TATAA box is present 67 bases upstream of the ATG codon, a feature shared with the BCEI but not with the TFF3 promoter. We found potential binding sites for several transcription factors (Myc; PEA3, an Ets-like factor; fig. 3) whose involvement in regulation remains to be determined. A search for homologous motifs shared by the promoters of the trefoil genes revealed several features that support the concept of coordinated gene expression. Although there is no overall homology among the promoters, some common signals with almost identical spacing with respect to the TATAA box were found (fig. 3). Motifs II and III are shared only by SML1 and BCEI. Motif II, a pyrimidine-rich region (TGAGA/CTG/CCTTCCCTTCC), is located adjacent to the estrogen-responsive element (ERE) of the pS2 gene. Remarkably, SML1 has no ERE. Another consensus sequence, AAAG/TGTTATCT, is located at almost identical positions (−106, −107) just upstream of the TATAA box. A consensus sequence (CAAACA, motif IV) located 12–16 bases upstream of the TATAA box is found in all human trefoil genes as well as in the rat ITF gene. This motif is extended to CAACAGAG in SML1 and BCEI. Finally, a consensus sequence (motif I; TTATTAAAA; fig. 3) located at position −628 is found in all trefoil gene promoters at similar positions. This consensus sequence partially overlaps with a 14-bp homeodomain-like sequence, found at a comparable position in the 5′-flanking region of the rat intestinal fatty acid binding protein gene [22], with partial homology to the Pit-1 homeodomain [23].
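The motif comparison above amounts to scanning each promoter for short, partly degenerate sequences and expressing hit positions relative to the TATAA box. A minimal Python sketch using the standard `re` module follows. It assumes that the slash notation (e.g. AAAG/TGTTATCT) denotes alternative bases at a single position, scans the forward strand only, and uses a synthetic promoter rather than any of the published sequences.

```python
import re

# Consensus motifs as written in the text. ASSUMPTION: "X/Y" means that
# either base X or base Y may occur at that single position.
MOTIFS = {
    "motif I": "TTATTAAAA",
    "motif II": "TGAGA/CTG/CCTTCCCTTCC",
    "AAAG/TGTTATCT motif": "AAAG/TGTTATCT",
    "motif IV": "CAAACA",
}


def to_regex(motif):
    """Rewrite slash notation as regex character classes, e.g. 'G/T' -> '[GT]'."""
    return re.sub(r"([ACGT])/([ACGT])", r"[\1\2]", motif)


def scan(promoter, tata="TATAA"):
    """Yield (motif name, start index, offset from TATA box start), forward strand only."""
    tata_pos = promoter.find(tata)
    if tata_pos < 0:
        raise ValueError("no TATA box found")
    for name, motif in MOTIFS.items():
        for m in re.finditer(to_regex(motif), promoter):
            yield name, m.start(), m.start() - tata_pos


# Synthetic promoter built from known pieces so the hit positions are easy to check.
toy = "CCCC" + "AAAGGTTATCT" + "CCCCCC" + "CAAACA" + "C" * 10 + "TATAA" + "CCCC"
for hit in scan(toy):
    print(hit)
```

A full analysis would also scan the reverse complement and tolerate mismatches; the sketch omits both for brevity.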

Expression Pattern in Gastric Tissue

To study the expression of the trefoil peptides at the transcriptional level, we used RT-PCR as a sensitive tool (fig. 4). In two paraffin-embedded samples of normal superficial mucosa, we detected pS2 (BCEI) mRNA, but not hITF (TFF3) or hSP (SML1) mRNA. In four paraffin-embedded samples of hyperplastic polyps of the stomach mucosa, pS2 and, surprisingly, hITF mRNA were present. Frozen material from two different gastric carcinomas of the diffuse type showed expression of all three trefoil peptides, as did the two gastric tumor cell lines GP220 and GP202.

Fig. 4

Example of trefoil gene expression by RT-PCR, analyzed on a 5% polyacrylamide (PAA) TBE gel stained with ethidium bromide, using primers specific for TFF3 (hITF; lanes 2–5), SML1 (hSP; lanes 6–9) and BCEI (pS2; lanes 10–13), respectively. RNA samples were from normal gastric mucosa (lanes 2, 6, 10), hyperplastic polyp (lanes 3, 7, 11), gastric tumor (lanes 4, 8, 12) and cell line GP220 (lanes 5, 9, 13). Lane 1 = DNA molecular weight marker (GibcoBRL No. 15615-024).

Discussion

The data presented here shed light mainly on the regulation of the genes coding for all currently known human trefoil peptides, and may eventually help to elucidate their physiological function. The order and distances of the corresponding genes are now mapped within a physically defined region on 21q22.3 (cen-D21S212-TFF3-(32 kb)-SML1-(12 kb)-BCEI-(40–65 kb)-D21S19-tel). BCEI and D21S19 were previously assigned to adjacent NotI fragments, 400 kb and 440 kb in size, respectively [24]. Since D21S212, the BCEI-proximal marker located on the 400-kb NotI fragment, is not present on our genomic contig, it is likely to lie 270–400 kb from BCEI, towards the centromere. Recently, the clustering of the genes encoding the trefoil peptide family was also reported by others [25], but without the gene order, distances and chromosomal fine mapping. This gene cluster in 21q22.3 lies within a CpG-rich region.
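The map string above can be turned into approximate coordinates by accumulating the interval estimates. The Python sketch below treats each locus as a point (gene lengths are ignored, an assumption) and propagates the 40–65 kb range as a (minimum, maximum) pair. The D21S212 estimate is left out because it is not given as an interval between adjacent loci.

```python
# Adjacent-locus intervals in kb (centromere -> telomere) as given in the text.
# Each locus is treated as a point; gene lengths are ignored (an assumption).
GAPS = [
    ("TFF3", "SML1", 32, 32),
    ("SML1", "BCEI", 12, 12),
    ("BCEI", "D21S19", 40, 65),
]


def positions(gaps):
    """Cumulative (min, max) distance of each locus from the first locus."""
    pos = {gaps[0][0]: (0, 0)}
    for left, right, lo, hi in gaps:
        start_lo, start_hi = pos[left]
        pos[right] = (start_lo + lo, start_hi + hi)
    return pos


for locus, (lo, hi) in positions(GAPS).items():
    print(f"{locus:<7} {lo}-{hi} kb from TFF3")
```

Under these assumptions, SML1 and BCEI lie about 32 and 44 kb from TFF3, and D21S19 lies 84–109 kb from TFF3.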

Analysis of the exon-intron boundaries of SML1 revealed that the two P-domains are encoded by two different exons. The exon structures of the three trefoil peptide genes are very similar. The first exon encodes the secretion signal sequence, the second exon (and the third exon of SML1) encodes the P-domain or trefoil motif, and the third exon (the fourth exon of SML1) encodes three to four residues of the carboxy terminus. This conserved structural organization may have evolved by gene duplication and exon shuffling.

The main data presented here support the idea of coordinated regulation of trefoil peptide gene expression. Besides being clustered, all three genes share the same transcriptional orientation. More interestingly, their 5′-flanking regions share several motifs with almost identical sequences and spacings. This is reminiscent of the situation found for the β-globin genes, which are directed by a cis-located locus control region. In that case, a developmental switch of gene expression is mediated by consensus sequences in the genes' 5′-flanking regions that compete for binding to the distant locus control region [26]. For the currently known promoters of genes coding for mammalian trefoil peptides, two motifs with similar sequences and spacings are identified. They may represent targets for as yet unknown regulatory DNA-binding proteins. In fact, motif I overlaps with a 14-bp homeodomain-like sequence (ATTAAAATACATTT) present at a corresponding position in the 5′-flanking region of the rat intestinal fatty acid binding protein gene [22], with partial homology to the Pit-1 homeodomain [23].
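One way to make the partial overlap between motif I and the 14-bp homeodomain-like sequence concrete is to find the longest suffix of one sequence that is also a prefix of the other. The Python sketch below does this for the two sequences quoted in the text; it is an illustrative string comparison, not the alignment method used by the authors, and `max_overlap` is a hypothetical helper name.

```python
def max_overlap(a, b):
    """Longest suffix of `a` that is also a prefix of `b` ('' if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a[-k:]
    return ""


MOTIF_I = "TTATTAAAA"
HOMEO_LIKE = "ATTAAAATACATTT"  # rat I-FABP 5'-flanking sequence quoted in the text

print(max_overlap(MOTIF_I, HOMEO_LIKE))  # ATTAAAA
print(max_overlap(HOMEO_LIKE, MOTIF_I))  # TT
```

In this arrangement, motif I and the rat sequence share the 7-bp run ATTAAAA, whereas the reverse arrangement shares only TT.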

Transcriptional activity of the trefoil genes is known to be tissue specific. In the normal gastrointestinal mucosa, BCEI and SML1 are expressed in the superficial and glandular areas of the stomach, respectively, whereas TFF3 is expressed in goblet cells of the intestine. Alignment of the genes' 5′-flanking regions reveals two motifs shared exclusively by the stomach-specific genes BCEI and SML1. They provide a first lead for testing stomach-specific gene regulation with reporter gene assays. The transcriptional regulation of the pS2 gene has been studied in greater detail. A variety of factors are known to be involved in its transcriptional regulation [27], but so far, studies have focused on estrogen-responsive MCF7 breast cancer cells. In this respect, it would be of interest to assess the influence of the pyrimidine-rich motif II (TGAGA/CTG/CCTTCCCTTCC), localized close to the ERE, on both SML1 and BCEI transcription in gastrointestinal tumor cell lines.

Finally, the expression of trefoil peptides is associated with changes in the physiological status of the cell and is upregulated in pathological conditions of the gastrointestinal tract, such as mucosal damage, ulcerative lesions and cancer [12, 28]. In several cancers of the gastrointestinal tract, we have previously identified coordinated regulation of pS2 and hSP at the mRNA and protein levels [8–10]. Here we show that different pathological and morphological stages of the stomach mucosa are associated with alterations in the pattern of trefoil peptide expression. In normal mucosa, only pS2 is found, whereas in hyperplastic polyps, a premalignant stage of gastric mucosa, pS2 and, surprisingly, hITF mRNA are present. All three trefoil peptides are expressed in tumors of this tissue as well as in tumor cell lines. Additionally, a role for the intestinal trefoil factor has recently been proposed in the development of the mouse brain [29].

Although the physiological relevance of this switch in expression pattern is not known, our results suggest coordinated gene regulation of the trefoil peptides, which is also supported by the genomic structure and by sequence comparison of the 5′-flanking regions of the corresponding genes. In this respect, probing the promoters by transient transfection in different cell lines should elucidate regulatory steps in inflammatory and preneoplastic processes and contribute to understanding the as yet unclear role of trefoil peptides.