Main

Targeted mutagenesis studies using null alleles, conditional deletions and point mutations of ligand and receptor genes have shown that PDGF signaling has essential roles in diverse developmental contexts1,2, but crucial downstream effectors have not yet been identified in PDGF-responsive cells and tissues. Conventionally, transcriptional target genes are identified by stimulating quiescent cells in culture and then monitoring expression changes using differential display3, serial analysis of gene expression4 or microarrays5. Some PDGF-responsive genes identified in these screens are involved in specific regulated cellular responses, including proliferation, adhesion, chemotaxis and survival6,7,8. One limitation of these RNA-based methods is that physiological validation of the identified targets requires independent genetic approaches, which are typically laborious and time-consuming.

To systematically explore PDGF targets and their in vivo functions, we carried out gene-trap mutagenesis, in which a promoterless reporter is introduced into ES cells to randomly tag and mutate genes, integrating expression monitoring, molecular cloning and functional analysis9,10,11. Transcriptionally responsive genes can be identified by induction trapping based on the altered expression of the gene-trap reporter on stimulation12,13,14,15. In our initial attempts, however, we did not identify any PDGF-regulated genes, probably owing to the low abundance of PDGF receptors in ES cells. In addition, because these cells are not normal targets of growth factor signaling in the developing embryo, responsive genes identified in this manner may have little in vivo relevance.

Therefore, whereas subsequent functional dissection requires that gene trapping be done in ES cells, other physiologically relevant responsive cells are required to identify PDGF transcriptional targets. We circumvented this limitation by coupling gene trapping with microarray technology. Using a retroviral vector optimized for efficient cloning, we amplified trapped transcripts from individual ES cell clones to construct a cDNA microarray. The availability of this gene-trap array allowed us to identify PDGF targets in physiologically relevant cells and to generate mutant mice directly from the corresponding gene-trapped ES cell clones. The same platform can be used to simultaneously identify and validate differentially expressed genes in a broad array of experimental settings, providing higher specificity and efficiency than conventional gene-trap mutagenesis screens.

Results

A microarray-compatible gene-trap vector

We previously developed a Mo-MuLV-based retroviral gene-trap vector, ROSAβgeo, which transduces a lacZ-neo fusion gene downstream of an adenoviral splice acceptor10. On integration into an intron, the promoterless reporter functions as an artificial 3′ terminal exon to intercept and terminate transcription from the endogenous promoter. ROSAβgeo and ROSAβgeo*, which contains a more sensitive reporter14, are used widely and are highly mutagenic, with most gene-trap insertions analyzed occurring at the 5′ end of genes. This may be attributable to preference of retrovirally mediated insertions in transcriptional start regions16, strength of the adenoviral splice acceptor and instability of longer fusion transcripts or proteins. One drawback of the insertion site preference is that cloning of the trapped transcripts with 5′ RACE17 can be technically challenging due to the higher G+C content in start regions18, and cDNA fragments amplified are often too short to be suitable for expression studies.

To increase cloning efficiency and obtain longer cDNAs for array fabrication, we constructed a new retroviral gene-trap vector, called reverse orientation splice acceptor for array (ROSAFARY), by incorporating a poly-A trap cassette into ROSAβgeo* (Fig. 1). The poly-A trap cassette comprises a PGK promoter–driven hygromycin gene and an adenoviral splice donor and can thus function as an artificial 5′ terminal exon to initiate transcription from the insertion site. Gene-trap events are obtained with the promoter trap module by G418 selection alone, and we used the poly-A trap module only to efficiently amplify longer, 3′-enriched cDNAs of trapped genes by 3′ RACE.

Figure 1: The ROSAFARY gene-trap vector.

The gene-trap vector contains a promoter trap module (SAβgeo*pA) and a poly-A trap module (PGKhygSD) in the same orientation in the retroviral backbone between two long terminal repeats (LTRs). The poly-A trap module is flanked by FRT sites and can therefore be removed with Flp recombinase. The viral promoter and enhancer are both deleted in the U3 region of the 3′ long terminal repeat (inverted triangle) to avoid potential interference with gene expression in the resulting provirus. The splice acceptor (SA) and the splice donor (SD) are derived from the intron 1–exon 2 and the exon 1–intron 1 boundaries of the adenoviral type 2 major late transcription unit, respectively. The intron-exon junction of each splice site is indicated with an arrow. After inserting into an intron of an endogenous gene at a permissive site and in the correct orientation, the promoter trap module and the poly-A trap module can be activated to form fusion transcripts with the 5′ or 3′ exons, respectively. Only the exon portion of each splice site is included in fusion transcripts. Gene-trapped clones were obtained with the promoter trap module by G418 selection, and trapped genes were cloned with the poly-A trap module by 3′ RACE for higher cloning efficiency and longer sequences. Boxed arrows indicate the transcription start and direction.

Cloning and analysis of trapped transcripts

We used a high-throughput 3′ RACE procedure to isolate trapped transcripts from 2,880 individual gene-trapped ES cell clones, of which 2,160 (75%) yielded high quality products and were reamplified in preparative PCR reactions for array printing. To assess the spectrum of trapped genes, we analyzed RACE products from 607 randomly selected clones (Supplementary Table 1 online). Homology searching showed that 440 gene-trap sequence tags (72%) matched known mouse genes, 60 (10%) matched mouse expressed-sequence tags (ESTs) and 107 (18%) did not match any known mouse cDNA. The trapped genes were distributed on all chromosomes except the Y, suggesting that a large number of genes across the genome could be accessed by this vector. Some loci seem to be more readily trapped than others, as also found by other large-scale gene trapping efforts using different vectors19,20,21,22, and 40 genes were trapped more than once. This phenomenon may be due to recombination hot spots, more permissive integration sites available in larger genes or a limited pool of genes that could be trapped in the genome. Most trapped sequences that matched known genes aligned with the sense strand (Fig. 2a), but 14 (3%) aligned with the reverse strand of the corresponding transcript, which might indicate trapping of new, naturally occurring antisense genes (Fig. 2b).

Figure 2: Genome mapping of selected gene-trap sequence tags.

(a) A trapped sequence aligned with the sense strand of a known gene (AA960558). The BLAST matches have a splicing pattern that correlates with annotated exons of the transcript. The slight differences in exon boundaries result from homologies in flanking sequences. An alternative exon (marked with an asterisk) is included in the trapped transcript, which corresponds to an EST transcript (ENSMUSESTT00000023065) of the same cluster. (b) A trapped sequence aligned with the antisense strand of a known gene (Txnip). The splicing pattern indicated by BLAST matches differs considerably from that of the sense transcript, suggesting distinct exon boundaries and thus trapping in an antisense gene. A neighboring EST transcript (ENSMUSESTT00000007795) is also antisense to Txnip, which may belong to the same gene that encodes the trapped transcript. (c) A trapped new sequence aligned with an annotated intron of a known gene (NM_145823), which might have originated from alternative or cryptic splicing. (d) A trapped new sequence aligned with new exons predicted by both Ensembl and Genescan. This may correspond to an insertion in the gene encoding the EST transcript ENSMUSESTT00000030315, which has identical 3′ splicing pattern to the computer predictions. (e) A trapped new sequence not aligned with any known or predicted exons but showing a splicing pattern. The precise positions of the gene-trap splice donors within introns are unknown.

Of the 107 new transcripts identified, 91 mapped to the available sequences on the mouse genome. Several sequences matched annotated introns of known genes (Fig. 2c), probably representing previously undiscovered splice variants, although the possibility of cryptic splicing cannot be excluded. Other sequences overlapped with predicted exons (Fig. 2d). Most of the new transcripts aligned with regions without any known or predicted genes in the vicinity, and some had splicing patterns (Fig. 2e). These observations, and the microarray experiments (described below) showing expression of most of these transcripts, suggested that the transcripts probably represent bona fide new genes. Therefore, in addition to including many known genes or ESTs, the gene-trap array may identify new genes that are difficult to capture by conventional cDNA cloning.

We assessed the putative gene-trap insertion site in the endogenous transcript in 392 gene traps that occurred in known genes with full-length sequence available (Fig. 3). The trapping frequency gradually declined from 5′ to 3′, with most insertions (328, 83%) occurring at the 5′ untranslated region (UTR; 123, 31%) and the first half of the coding sequence (205, 52%). The actual insertion site may lie further upstream, owing to 5′ sequence quality trimming, under-representation of 5′ sequences in current databases and potential alternative splicing that could skip the immediate next exon. Therefore, most ROSAFARY insertions should result in little or no endogenous protein attached to the gene-trap reporter and probably constitute null alleles.

Figure 3: Insertion site preference.

The gene-trap insertion site was determined by aligning each trapped sequence with the corresponding endogenous transcript, and the percentages of insertions in the 5′ UTR, each portion of the coding sequence (CDS) and the 3′ UTR were calculated and plotted. Most insertions occurred in the 5′ UTR and the beginning of coding sequence.
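The binning described in this legend can be sketched as follows. This is only an illustrative sketch, not the procedure actually used: the function names, the 0-based coordinate convention and the number of coding-sequence portions (`n_cds_bins`) are assumptions made for the example.

```python
from collections import Counter


def classify_insertion(pos, cds_start, cds_end, n_cds_bins=4):
    """Assign a transcript coordinate to '5UTR', 'CDS_k' or '3UTR'.

    pos       : 0-based transcript position where the trapped tag starts to
                align (hypothetical input from a tag-to-transcript alignment)
    cds_start : 0-based index of the first coding base
    cds_end   : 0-based index one past the last coding base
    n_cds_bins: number of equal portions the coding sequence is split into
                (an assumption; the legend does not state the number)
    """
    if pos < cds_start:
        return "5UTR"
    if pos >= cds_end:
        return "3UTR"
    frac = (pos - cds_start) / (cds_end - cds_start)
    k = min(int(frac * n_cds_bins), n_cds_bins - 1)
    return f"CDS_{k + 1}"


def insertion_percentages(records, n_cds_bins=4):
    """Tally (pos, cds_start, cds_end) records into percentages per region."""
    counts = Counter(classify_insertion(p, s, e, n_cds_bins) for p, s, e in records)
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}
```

Because the tag marks only the first exon downstream of the vector, a position computed this way is a downstream bound on the true insertion site, which lies somewhere in the preceding intron.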

Evaluation of the 2K gene-trap array set

To evaluate the performance of the 2K gene-trap array set, we generated expression profiles of adult mouse brain and testis by paired hybridizations with fluorescently labeled cDNA targets (Fig. 4a). After filtering out low intensity and low quality data, a total of 1,705 (79%) genes remained. Pairwise comparison showed that 254 genes (15%) were differentially expressed by a factor of 2 or more, with 80 (5%) and 174 (10%) more highly expressed in testis or brain, respectively (Fig. 4b). Sequence analysis of selected differentially expressed genes identified several genes known to be abundant in one tissue or the other, as expected (data not shown). Among the 607 annotated genes, the proportion and intensity distribution of expressed known genes (327 of 440, 74%), ESTs (44 of 60, 73%) and new transcripts (73 of 107, 68%) were similar (Fig. 4b), experimentally supporting the idea that most new sequences were transcribed from genes not yet identified.

Figure 4: Expression profiling of adult brain and testis tissues with the 2K gene-trap array set.

Cy3- (green) or Cy5-labeled (red) cDNA targets were prepared from total RNA isolated from the two tissues and hybridized in pairs onto arrays. (a) Global and enlarged views of array images of hybridizations with dye reversal as indicated. Each block contains duplicated spots of RACE products from the same clone arrayed evenly on both sides of the dotted line. Red or green spots indicate that the corresponding genes are more abundantly expressed in one tissue or the other, whereas yellow spots indicate that the expression levels are similar in both tissues. Signal intensities between duplicated spots and hybridizations were very similar. (b) Scatter plot analysis. For each gene, average signal intensities were calculated from duplicated spots of both hybridizations. Spots corresponding to sequence-annotated traps in known genes, ESTs and new transcripts are highlighted in different colors as indicated. The intensity distribution between categories was similar. M, log2(R/G); A, (log2 R + log2 G)/2.
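The M and A values defined in this legend, together with the twofold criterion used in the text, can be computed as in the minimal sketch below. The function names and the labels returned are hypothetical, and the inputs are assumed to be background-corrected, averaged red (R) and green (G) intensities; no normalization step is shown.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = (log2 R + log2 G) / 2."""
    m = math.log2(red / green)
    a = (math.log2(red) + math.log2(green)) / 2
    return m, a


def call_differential(red, green, fold=2.0):
    """Label a spot by whether R and G differ by at least the given factor."""
    m, _ = ma_values(red, green)
    cutoff = math.log2(fold)
    if m >= cutoff:
        return "higher_in_red"
    if m <= -cutoff:
        return "higher_in_green"
    return "similar"
```

Working on the log2 scale makes induction and repression symmetric: a twofold difference in either direction corresponds to |M| of at least 1.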

Gene-trap array screening for transcriptional targets

After confirming the functionality of ROSAFARY and the gene-trap array, we sought to use the platform to search for trapped PDGF targets. Although a number of established cell lines can be used for this purpose, we chose to use low-passage primary mouse embryonic fibroblasts (MEFs) to identify physiologically relevant targets. We first profiled transcriptional responses at multiple time points in wild-type MEFs by stimulating them with PDGF-BB, which interacts with all three forms of PDGF receptor dimers (αα, αβ and ββ). The pairwise comparison of control over reference ('same versus same') hybridizations showed a low level of technical variation. Accordingly, we set an arbitrary cut-off at a 1.5-fold change relative to baseline (>8 s.d.). Fifty-two genes were up- or downregulated within the 4-h observation window and clustered into different groups (Fig. 5a). Most genes showed a gradual change in expression kinetics, providing internal validation for differential expression observed at individual time points.
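One way to relate a fold-change cut-off to technical variation is to express it in units of the standard deviation of 'same versus same' log ratios. The sketch below illustrates that idea only. It assumes the standard deviation is taken over log2 ratios, which the text does not specify, and the function name is hypothetical.

```python
import math
import statistics


def cutoff_in_sd(self_self_log2_ratios, fold=1.5):
    """Express a fold-change cut-off as a multiple of the technical s.d.

    self_self_log2_ratios: log2 ratios from a control-versus-reference
    hybridization, where no real differences are expected.
    """
    sd = statistics.stdev(self_self_log2_ratios)
    return math.log2(fold) / sd
```

Under this assumption, a 1.5-fold cut-off (log2 of about 0.585) exceeds 8 s.d. whenever the s.d. of the self-self log2 ratios is below about 0.073.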

Figure 5: Hierarchical clustering of PDGF transcriptional target genes.

(a) PDGF-BB transcriptional responses in wild-type (WT) MEFs. Only genes present at 75% or more data points and showing a change of expression by a factor of 1.5 or more at one or more data points were selected. Genes were grouped based on the similarity of their expression kinetics in the time course. Red, induced; green, repressed; black, no change; gray, not expressed or low-quality data. Induction or repression was reproducible at neighboring time points (h). (b) PDGF-BB transcriptional responses in wild-type (WT), PDGFRα-deficient (Pdgfra−/−) and PDGFRβ-deficient (Pdgfrb−/−) MEFs. Genes were selected and clustered according to the same criteria as above. The expression kinetics and differences in levels of induction or repression between cells of different genotypes were reproducible. (c) Cluster analysis of PDGF-regulated genes using combined data and higher stringency. Only genes present at 75% or more data points and showing a change of expression by a factor of 1.5 or more at two or more data points were selected. The highest relative change and sequence identity for each gene are indicated.
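The selection criteria stated in this legend can be written as a simple filter. The following is a minimal sketch under stated assumptions: the function name and data layout are hypothetical, missing or low-quality measurements are represented as `None`, and ratios are assumed to be on the log2 scale so that induction and repression are treated symmetrically.

```python
import math


def select_targets(log2_ratios, min_present=0.75, fold=1.5, min_hits=1):
    """Return genes that pass the presence and fold-change criteria.

    log2_ratios: dict mapping gene name -> list of log2(stimulated/control)
                 values, one per data point, with None for missing data.
    min_hits   : 1 for the criteria of panels a and b, 2 for panel c.
    """
    cutoff = math.log2(fold)
    selected = []
    for gene, values in log2_ratios.items():
        present = [v for v in values if v is not None]
        # Require data at 75% or more of the data points.
        if len(present) < min_present * len(values):
            continue
        # Count data points with a change by the given factor or more.
        hits = sum(1 for v in present if abs(v) >= cutoff)
        if hits >= min_hits:
            selected.append(gene)
    return selected
```

Raising `min_hits` from 1 to 2 corresponds to the higher stringency of panel c, where a change must be seen at two or more data points.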

Next, we examined responses of mutant MEFs derived from PDGFRα- or PDGFRβ-deficient embryos to PDGF-BB after 1 h and 4 h. Because they have only homodimers of one receptor, profiling PDGF-BB responses in these mutant cells not only independently validates the general targets found in wild-type cells but also may identify genes preferentially activated through each receptor homodimer. To control for inherent variances in steady-state expression levels of target genes, we carried out paired hybridizations using cDNA targets prepared from stimulated and unstimulated cells of the same genotype. Hierarchical clustering of induced and repressed genes in MEFs of all three genotypes showed markedly similar expression patterns (Fig. 5b). Although most genes showed the greatest change in expression in wild-type cells, suggesting an additive effect, several were more robustly induced or repressed in one or the other mutant. This phenomenon could be due to different baseline expression levels, interchip variance or noise associated with the low cut-off. PDGF responses in general seemed much weaker in PDGFRβ-deficient than in PDGFRα-deficient cells, prompting further investigation described below.

Identification and verification of target genes

Twenty-nine PDGF transcriptional targets were identified as being induced or repressed by a factor of more than 1.5 at two or more data points in the merged data set (Fig. 5c and Supplementary Table 2 online). These include 25 unique genes of various functional categories, including transcription factors, metabolic enzymes, membrane proteins, cytoplasmic signaling molecules and cytoskeletal proteins. Some of these genes are regulated by PDGF or other growth factor signaling pathways, are involved in PDGF-regulated cellular responses or developmental processes, or are implicated in tumorigenesis commonly associated with PDGF overactivity (Table 1). The two independent traps of each of the four genes obtained from recurring gene-trap events (Cdh11, Bteb1, Lmna and AA960558) clustered next to each other (Fig. 5c), showing high reproducibility and stringency. The induction or repression of most target genes was resistant to treatment with cycloheximide, indicating that they do not require additional protein synthesis and are probably involved in the primary PDGF transcriptional response (data not shown).

Table 1 PDGF transcriptional target genes in MEFs identified in gene-trap array screens

We carried out northern-blot analysis using an independent batch of wild-type MEFs, which verified that 23 of the 25 genes were induced or repressed by PDGF-BB with similar kinetics to those observed from the array analysis (Fig. 6). Expression of the other two genes was below the level of detection (Table 1). Because PDGF-BB binds to PDGFRβ with higher affinity than it does to PDGFRα, which might result in lower responses in PDGFRβ-deficient MEFs, we treated wild-type MEFs in parallel with PDGF-AA, which only recognizes the αα receptor homodimer. Most genes examined were responsive to this ligand, but the extent of induction or repression was generally much less than with PDGF-BB (Fig. 6). This was in agreement with the array results and preliminary observations that PDGF-BB stimulation of wild-type MEFs resulted in a higher level of MAPK activation than PDGF-AA treatment (data not shown). Therefore, our results indicated that most PDGF target genes are preferentially regulated by PDGFRβ in MEFs, consistent with evidence from kinase domain swapping studies that PDGFRβ, but not PDGFRα, can functionally substitute for the other receptor in vivo23. Alternatively, although these MEFs express both receptors and respond to both ligands, the expression levels of each receptor or their downstream signaling effectors may vary, which could contribute to the observed differences in transcriptional responses.

Figure 6: Northern-blot hybridizations of selected PDGF target genes identified in the gene-trap array screens.

Independently isolated wild-type MEFs were grown in low serum medium for 48 h and induced with PDGF-BB or PDGF-AA for indicated times (h). Most genes were responsive to both growth factors, but the magnitude of induction or repression was always higher with PDGF-BB than with PDGF-AA, consistent with the array data.

Mutant generation and analysis

The most important advantage of the gene-trap array over conventional microarrays is that mice with mutations in target genes can be directly generated without de novo gene targeting. In a secondary, unbiased phenotype-driven screen, we started systematic blastocyst injections of the corresponding clones. So far, we have obtained high-percentage chimeras from most clones, and 10 of 12 test-bred lines have resulted in germline transmission. One line, ROSA71, was established in an early phase of the study and has been characterized in some detail. The trapped gene encodes serine-threonine kinase receptor associated protein (STRAP), which inhibits TGFβ signaling by recruiting SMAD7 to TGFβRI24. We identified Strap as a PDGF-BB-inducible gene in a pilot gene-trap array screen using MEFs serum-deprived for 24 h rather than 48 h (Fig. 7). Notably, transcripts of inhibitory SMADs, SMAD6 and SMAD7, are both upregulated by epidermal growth factor25, suggesting a possible mechanism by which engagement of a receptor tyrosine kinase antagonizes TGFβ signaling. The gene-trap insertion results in a null allele and recessive embryonic lethality between embryonic day (E) 10.5 and E12.5. Homozygous mutant embryos had defects in angiogenesis, cardiogenesis, somitogenesis, neural tube closure and embryonic turning. The phenotypes observed in the original line were indistinguishable from those found in a subline in which the poly-A trap module was deleted by breeding to ROSA26Flper mice26. We are further investigating the molecular mechanisms underlying mutant phenotypes and their relevance to TGFβ and PDGF signaling pathways in vivo.

Figure 7: Characterization of the ROSA71 gene-trap mutation.

(a) Schematic representation of the Strap locus disrupted in ROSA71. Arrow indicates the gene-trap insertion site. (b) Partial genomic DNA sequence of Strap encompassing intron 1–intron 3. Exon 2 and exon 3 are shaded, and the initiator ATG codon is highlighted in bold. Partial 3′ RACE and 5′ genomic anchoring PCR sequences immediately flanking the gene-trap vector sequences are singly and doubly underlined, respectively. (c) MEFs of indicated genotypes were grown in low-serum medium for 24 h and induced with 30 ng ml−1 of PDGF-BB for 1 h and 4 h before northern-blot analysis. Strap was induced 4 h after treatment in both wild-type (WT) and PDGFRα-deficient (Pdgfra−/−) cells. (d) RT-PCR analysis of Strap and Actb expression in MEFs isolated from wild-type (+/+), heterozygous (+/−) and homozygous (−/−) ROSA71 embryos. Full-length Strap transcript was not detected in homozygotes. (e–h) Gross morphology of wild-type (left) and ROSA71-Strap−/− (right) embryos at E9.5 (e,g) and E10.5 (f,h). Mutants had underdeveloped yolk sac vasculature, arrested neural tube closure and embryonic turning, as well as abnormal hearts and somites.

Discussion

Here we show that PDGF transcriptional targets can be efficiently and reliably identified in physiologically relevant cells with the gene-trap array, and that mutant mice can be directly generated from the corresponding gene-trap clones to determine their in vivo functions. Subsequent genetic analysis may help assess the relevance of the identified targets to the phenotypes observed in PDGF mutants. Although we are just starting to analyze functionally the uncovered PDGF targets, the available mutants of a subset of these genes have already shown that this is a promising approach. For example, disrupting Klf2 in the mouse results in blood vessel destabilization and reduction in the number of vascular smooth muscle cells27, a crucial target of signaling through PDGFRβ. Mutating Arid5b leads to defects in testis development, including small undescended testis, reduced interstitium and arrested spermatogenesis28, indicative of testosterone deficiency and Leydig cell loss as observed in Pdgfa and Pdgfra mutants1. More focused searches could identify target genes specifically expressed in PDGF-regulated cell populations, and genetic epistasis analysis could be done by crossing the gene-trap lines with PDGFR mutants to place target genes in the genetic networks regulated by one or the other receptor. Individual targets may also be subject to regulation by other signaling pathways, however, and a null mutation could lead to early embryonic lethality that could complicate the assessment of their contributions to PDGF functions. This limitation, inherent to any general transcription profiling and gene ablation approach, may be circumvented in the future by constructing gene-trap arrays using vectors that allow conditional mutagenesis.

One technical limitation of gene-trap mutagenesis is that mutations generated in ES cells must be transmitted through the germ line before functional analysis. Because large-scale phenotype-driven gene-trap screens are prohibitively laborious and costly, preselection of gene-trap events of interest is highly desirable. A number of expression-driven screening strategies have been designed to search for genes that are restricted to specific lineages, are responsive to external stimuli or encode secreted proteins29. But all these screens are confined to ES cells or their differentiated derivatives and rely on reporter expression. Sequence-driven screens based on the identity of trapped genes19,20,21,22,30 are intrinsically biased towards highly annotated genes, and limited functional cues can be deduced from the trapped sequences representing ESTs or new genes. With the gene-trap array, virtually any cell type can now be screened for differentially expressed genes. Moreover, the versatility of microarrays allows for diverse experimental designs to find genes of specific biological relevance, including a substantial proportion of new genes currently unavailable elsewhere. Although standard microarrays and available gene-trap lines can be used in combination for the same purpose, reciprocal BLAST analysis comparing the most current Affymetrix mouse array with public gene-trap resources found limited overlap, and many matches through the short gene-trap sequence tags were unreliable (Supplementary Fig. 1 online). As each spot on our array is directly correlated to a physical clone in which the corresponding gene is disrupted, the gene-trap array possesses the combined power of expression profiling and gene mutagenesis, providing a fast track from gene discovery to functional analysis.

Methods

Gene-trap retrovirus.

We assembled the poly-A trap cassette PGKhygSD by three-way ligation of a BamHI-NotI fragment containing the plasmid backbone, the PGK promoter and a Kozak-optimized initiator ATG derived from PGKβgeo; a BglII-XbaI fragment containing the hygromycin coding sequence amplified from pCEP4 (Invitrogen); and an XbaI-NotI fragment containing the splice donor sequence amplified from pIVTQ (a gift from S. Berget, Baylor College of Medicine, Houston, Texas). The splice donor sequence corresponds to nucleotides 6,025–6,195 of the adenovirus type 2 genome (accession number J01917), which includes the entire leader exon 1 and part of intron 1 of the major late transcription unit. To prevent potential activation of trapped genes due to translational readthrough or frameshifting, we introduced amber codons in each reading frame in front of the splice donor. After adding flanking FRT sites, we inserted the poly-A trap cassette into pSAβgeo* using a SalI linker to place it at the same orientation as the promoter trap cassette. We constructed the retroviral vector pGep like pGen (ref. 32), except that the 3′ long terminal repeat did not include the gene supF and we deleted a fragment (NheI-Ecl136II) that removes the viral enhancer and promoter as well as a cryptic splice site31. We then excised the fragment containing both promoter trap and poly-A trap cassettes and inserted it into the unique XhoI site of pGep. We linearized the resulting retroviral gene-trap vector, ROSAFARY, with DraI and electroporated it into GP+E86 cells for packaging32. The titer of the gene-trap retrovirus was about 10^4 colony-forming units per ml as assayed on NIH 3T3 cells for G418-resistant colonies.

ES cell culture.

We maintained AK7.1 ES cells derived from 129S4 in Dulbecco's modified Eagle medium (DMEM) with 15% fetal bovine serum, 50 U ml−1 penicillin, 50 μg ml−1 streptomycin and 0.1 mM β-mercaptoethanol on γ-irradiated SNL76/7 (G418R) feeder cell layers. To obtain gene-trapped clones, we infected log-phase ES cells with medium containing virus at low multiplicity and selected cells in medium containing 200 μg ml−1 G418 for 10–12 d (ref. 33). We picked resistant colonies individually for expansion in 96-well plates and selected quality clones for further culture. We kept frozen stocks in 96-format microtubes (CBL, 2600.mini) to facilitate the retrieval of individual clones.

We used G418 selection to obtain promoter trap events, and our screens are limited to genes expressed in ES cells. Due to the high sensitivity of the βgeo* reporter, however, many weakly expressed genes are also accessible to this vector, as indicated by the lack of X-Gal staining in 40% of trapped clones. We did not isolate poly-A trap events, as hygromycin expression is driven by an internal promoter and can occur regardless of the insertion site in a gene. Because shorter fusion transcripts tend to be more easily stabilized and most mammalian genes have a long 3′ UTR34, poly-A trapping may enrich for insertions at the 3′ end of genes. In addition, in initial tests of the ROSAFARY vector, we obtained roughly equal numbers of resistant clones with G418 and hygromycin selection, but fewer than 20% of clones were resistant to both. This suggests that functional proteins of one or the other selectable marker are usually either not produced or not stabilized, probably because most gene-trap insertions would interfere with the endogenous cis-acting elements required for efficient splicing35 and thus compromise the functionality of one or the other trap module. Therefore, poly-A trapping may not effectively disrupt the trapped genes but instead create hypomorphic or neutral mutations, as have been observed in as many as 30% of mutant lines obtained with a retroviral poly-A trap36.

MEF isolation and culture.

We obtained wild-type, Pdgfra−/− and Pdgfrb−/− embryos from intercrosses of Pdgfra+/− or Pdgfrb+/− mice in a hybrid 129S4 × C57BL/6J genetic background and genotyped them as previously described37,38. We isolated primary MEFs from individual E12.5 embryos and maintained them in DMEM with 10% fetal calf serum, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin. For PDGF induction, we pooled 6–8 individual MEF cultures of the same genotype at passage 3 and expanded them to subconfluence on 15-cm plates. After growing them in low serum medium (0.5% fetal calf serum in DMEM) for 48 h, we treated MEFs with 30 ng ml−1 PDGF-BB or PDGF-AA for desired time periods and collected them for RNA preparation. We used MEFs in untreated plates of the same batch to prepare control or reference RNA at time 0. Wild-type MEFs isolated in this manner express both PDGF receptors at high levels, respond to both PDGF-AA and PDGF-BB and show the expected kinetics in immediate early gene induction after PDGF stimulation, suggesting that they are an appropriate model for studying PDGF signaling through each receptor.

3′ RACE.

We designed a simple and economical procedure to directly isolate poly(A)+ RNA from gene-trapped ES cell clones in a 96-well format. We lysed confluent cultures of ES cell clones in a 96-well plate in 100 μl of lysis buffer (4 M guanidine thiocyanate, 0.5% lauroylsarcosinate, 1% β-mercaptoethanol, 100 mM Tris-HCl, pH 7.2) per well. We then mixed the cell lysate in each well with 20 μg MPG streptavidin-biotinylated oligo(dT)25 complex (CPG) diluted in 200 μl of binding buffer (400 mM LiCl, 20 mM EDTA, 100 mM Tris-HCl, pH 7.2) and transferred it into the corresponding well in a Multiscreen-HV filter plate (Millipore) fitted into a vacuum manifold. After incubating at room temperature for 10 min, we filtered the mixtures and washed them twice in 200 μl of wash buffer I (150 mM LiCl, 1 mM EDTA, 0.1% lauroylsarcosinate, 10 mM Tris-HCl, pH 8.0) and 200 μl of wash buffer II (150 mM LiCl, 1 mM EDTA, 10 mM Tris-HCl, pH 8.0). We then eluted the bound poly(A)+ RNA in each well in 40 μl of elution buffer (0.1 mM EDTA, 1 mM Tris-HCl, pH 8.0) after incubating at 55 °C for 15 min and filtered it into a 96-well plate.

We carried out reverse transcription and 3′ RACE PCR reactions essentially as described33. We primed 10 μl of denatured poly(A)+ RNA from each clone with anchoring oligo QT in a 20-μl reverse transcription reaction containing 12.5 U of RNaseOUT RNase inhibitor and 50 U of Superscript II Reverse Transcriptase (Invitrogen). We diluted the resulting cDNA pools to 1:5 and 1:25 in two sets of 96-well plates. We then carried out paired primary 3′ RACE PCR reactions in a volume of 20 μl using a high-fidelity PCR system with mixed KlenTaq1 (Ab Peptides) and pfu (Stratagene) polymerases and diluted the products to 1:100 in 96-format microtubes for diagnostic nested PCR. We used the hygromycin-specific primer HYGF and the SD exon-specific primer SDEXF in the primary and nested PCR reactions, respectively, paired with anchoring primers QA and QB. Both PCR reactions were done on GeneAmp 9700 thermocyclers using the same cycling parameters as described33. Primer sequences are available on request.

We separated diagnostic nested PCR products on a 1% agarose gel, blotted them and hybridized them with an SD intron-specific oligo to detect products resulting from aberrant splicing events or genomic DNA contamination. We compared products from paired nested PCR reactions to select, for each clone, the template dilution that gave more robust amplification and less complex banding, and we abandoned clones that did not amplify in either nested PCR reaction or that yielded only SD intron-positive products. We reassembled the selected microtubes containing the diluted primary PCR products into 96-well format for preparative nested PCR in a volume of 100 μl using the same primers and cycling conditions, with 10 μl of template in standard PCR buffers and mixes of Taq (Invitrogen) and pfu polymerases.

We observed robust amplification in >90% of the selected clones. 3′ RACE products mostly ranged from 0.5 kb to 2.0 kb in size, comparable to the average size of amplicons used to make conventional cDNA arrays. Most clones yielded only one prominent product, but 20% of clones yielded multiple bands (usually two to three). Sequencing of the differently sized products from representative clones indicated that these mostly resulted from alternative splicing or nonstringent priming during reverse transcription reactions, to a lesser extent from coamplification of unspliced products and rarely from clonal contamination. The adenoviral splice donor was correctly used in fusion transcripts, as indicated by the replacement of vector sequence by the trapped sequence precisely at the exon-intron junction.

Sequencing and bioinformatics.

RACE products were either gel-purified before sequencing or directly sequenced without purification in 96-well format with ABI Big-Dye terminator cycle sequencing in 10-μl reactions using 5 pmol SDEXF as primer. Sequence quality analysis, trimming and masking were done with Phred, Sequencher and RepeatMasker, respectively. We considered masked sequence tags with fewer than 50 consecutive unmasked nucleotides to be uninformative and excluded them from further analysis (47 of 654 in the random set). We used the remaining quality tags to search the NR and EST databases at the National Center for Biotechnology Information with BLASTN to determine homology to known transcripts for gene identification. An expect value of e−20 or less was considered significant, and the transcripts of the best BLAST matches were used to annotate the corresponding tags. We determined putative gene-trap insertion sites in known candidate genes by aligning the sequence tags to the best-matching transcripts with full-length sequence information available. Genome mapping of masked sequence tags was done using the Ensembl genome browser, with BLASTN and an expect cut-off of e−10.
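The two filtering rules above (a minimum length of unmasked sequence and an expect-value cut-off) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that masked bases appear as N or X in the tag, that the 50-nucleotide rule refers to the longest uninterrupted unmasked stretch, and that e−20 is read as 1e-20. The function names and the (accession, expect) tuple layout are hypothetical.

```python
import re

MIN_UNMASKED_RUN = 50   # nucleotides, threshold stated in the text
MAX_EXPECT = 1e-20      # BLASTN expect cut-off for gene identification

def longest_unmasked_run(seq, mask_chars="NX"):
    """Length of the longest stretch containing no mask characters.

    Assumes hard masking (N or X); the sequence is upper-cased first.
    """
    runs = re.split("[" + re.escape(mask_chars) + "]+", seq.upper())
    return max(len(run) for run in runs)

def is_informative(seq):
    """True if the masked tag retains at least MIN_UNMASKED_RUN unmasked bases in a row."""
    return longest_unmasked_run(seq) >= MIN_UNMASKED_RUN

def best_significant_hit(hits):
    """Return the hit with the lowest expect value that passes the cut-off.

    hits: list of (accession, expect) tuples (hypothetical layout).
    Returns None when no hit is significant.
    """
    passing = [hit for hit in hits if hit[1] <= MAX_EXPECT]
    return min(passing, key=lambda hit: hit[1]) if passing else None
```

A tag for which `is_informative` returns False would be dropped before the database search, and a tag for which `best_significant_hit` returns None would remain unannotated.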

Array construction, expression profiling and data analysis.

We constructed the 2K gene-trap array set using standard cDNA microarray methodologies as described elsewhere39. We purified RACE products from the preparative nested PCR reactions with the Multiscreen-PCR filtration system (Millipore) and mechanically spotted them in duplicate onto polylysine-coated microscope slides using an OmniGrid high-precision robotic gridder (GeneMachines). We isolated total RNA from tissues or cultured cells with TRIzol reagent (Invitrogen) according to the instructions provided by the manufacturer and assessed RNA quality with a 2100 Bioanalyzer (Agilent). We labeled and hybridized targets essentially as described40. We generated cDNA targets using a standard aminoallyl labeling protocol, in which 30 μg total RNA was coupled to either Cy3 or Cy5 fluorophores. Pairs of labeled cDNA targets were cohybridized to microarrays for 16 h at 63 °C and sequentially washed at room temperature in 1× saline-sodium citrate (SSC) and 0.03% SDS for 2 min, 1× SSC for 2 min, 0.2× SSC with agitation for 20 min and 0.05× SSC with agitation for 10 min. We immediately centrifuged arrays until dry and scanned them using a GenePix 4000 scanner (Axon Instruments). We analyzed images using GenePix Pro 3.0.

We carried out hybridizations in duplicate with fluorophore reversal to compensate for dye bias. Signal intensities from duplicated spots and duplicated hybridizations were quantified, corrected over background and normalized. To account for sequence-dependent fluorophore incorporation biasing, we generated expression profiles twice for each sample in which both fluorophore-labeled orientations were represented. For each array, we filtered spot intensity signals and removed those values that did not exceed 3 s.d. above the background signal in at least one signal channel and those spots flagged as questionable by the GenePix Pro software. Spot-level ratios (Cy5/Cy3) were log2 transformed and a loess normalization (f = 0.67) strategy was applied using S-Plus (MathSoft) to correct for observed intra-array intensity-dependent ratio biasing. For each sample comparison, we analyzed data by first averaging the spot-level, normalized log ratios of the two reverse-complement arrays and then taking the average of the intra-array, gene-level duplicate features. At the spot-level averaging step, we analyzed only those features possessing both fluorophore-orientation data points. Under this analysis strategy, paired 'same versus same' comparisons using independently cultured and labeled samples resulted in the global array mean log2 (Cy5/Cy3) = 0.00 ± 0.07 (1 s.d.). We carried out average linkage clustering as described41 using the GeneCluster/TreeView software package. To exclude genes expressed at very low levels, we included only genes represented at >75% of all data points in the cluster analysis.
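The spot filter and the two averaging steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code used here: the loess normalization is omitted, the inputs are taken to be already normalized log2(Cy5/Cy3) values, and the log ratios of the fluorophore-reversed array are assumed to be sign-inverted before averaging so that both arrays express the same sample comparison. All function and variable names are hypothetical.

```python
from statistics import mean

def passes_filter(cy5, cy3, bg5, bg3, sd5, sd3, flagged=False):
    """Keep a spot if it is not flagged and at least one channel
    exceeds its background by more than 3 s.d."""
    if flagged:
        return False
    return cy5 > bg5 + 3 * sd5 or cy3 > bg3 + 3 * sd3

def combine_dye_swap(forward, reverse):
    """Average spot-level log ratios from a fluorophore-reversed array pair.

    forward, reverse: dicts mapping spot id -> normalized log2(Cy5/Cy3).
    Only spots with data in both orientations are kept. The reversed
    array's ratio is negated (assumption) to match the forward orientation.
    """
    shared = forward.keys() & reverse.keys()
    return {spot: (forward[spot] - reverse[spot]) / 2 for spot in shared}

def gene_level(spot_values, spot_to_gene):
    """Average the duplicate features that represent the same gene."""
    by_gene = {}
    for spot, value in spot_values.items():
        by_gene.setdefault(spot_to_gene[spot], []).append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}
```

Averaging across the reversed pair first means that a dye-specific bias, which enters the two orientations with opposite sign relative to the biological signal, cancels before duplicate features are combined.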

Northern-blot hybridization.

We separated 5 μg of total RNA in 1.2% denaturing agarose gel, transferred it onto Hybond N+ membranes and fixed it by ultraviolet crosslinking. We prepared radioactive probes from gel-purified RACE products and used a mouse 28S rDNA probe as a loading control. We carried out hybridization in modified Church and Gilbert buffer (7% SDS, 10 mM EDTA, 0.5 M NaH2PO4−Na2HPO4, pH 7.2) for 16 h and washed membranes in 1× SSC, 0.1% SDS for 2 h before exposing to films.

Mutant mouse generation.

We injected ES cells carrying the desired gene-trap mutation into C57BL/6J blastocysts to generate chimeras according to standard procedures. The poly-A trap module could be deleted by breeding with ROSA26Flper mice26. Mice were housed in microisolator racks in a facility accredited by the Association for the Assessment and Accreditation of Laboratory Animal Care and experimentation was reviewed by the Hutchinson Center Institutional Review Committee.

Molecular analysis of ROSA71 gene-trap mutation.

We carried out 5′ genomic anchoring PCR as described using gene trap–specific primers SASP and U5SP33. We genotyped ROSA71 adults and embryos with three-primer PCR using a forward gene-specific primer 5′ to the insertion site, a reverse gene-specific primer 3′ to the insertion site and a β-gal-specific reverse primer. Wild-type, heterozygous and homozygous MEFs were derived from E9.5 embryos and subjected to RT-PCR analysis using Strap primers corresponding to the 5′ and 3′ cDNA sequences flanking the insertion site. Actb primers were described elsewhere42. Primer sequences are available on request.
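The logic of reading a three-primer genotyping PCR can be written as a short Python sketch. It assumes the usual design, in which the forward and reverse gene-specific primers amplify only the wild-type allele and the forward gene-specific primer paired with the β-gal-specific reverse primer amplifies only the trapped allele. The function name and the boolean inputs are hypothetical, and no band sizes are implied.

```python
def call_genotype(wild_type_band, trap_band):
    """Infer genotype from the presence of the two diagnostic PCR products.

    wild_type_band: product of the forward + reverse gene-specific primers.
    trap_band: product of the forward gene-specific + beta-gal reverse primers.
    """
    if wild_type_band and trap_band:
        return "heterozygous"
    if wild_type_band:
        return "wild-type"
    if trap_band:
        return "homozygous"
    return "no call"  # failed reaction; repeat the PCR
```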

URLs.

Further information regarding ROSA-series gene-trap vectors can be found at http://www.fhcrc.org/labs/soriano/trap.html. Supplementary materials and the complete microarray data sets are available at http://parma.fhcrc.org/GTA. The gene-trap array set is now being fully annotated, and the gene-trap sequence tags can be found in the Genome Survey Sequences Database at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/dbGSS). The gene-trap arrays and corresponding ES cell clones are an additional resource of the International Gene Trap Consortium (http://www.igtc.ca).

Note: Supplementary information is available on the Nature Genetics website.