Introduction

Breast cancer is a complex disease involving genetic and lifestyle components. Our current understanding of breast cancer susceptibility is based upon a polygenic model in which several alleles conferring variable risk act in concert. Under this model, it is hypothesized that 88% of all breast cancer cases may be found in 50% of the population more at risk (Pharoah et al. 2002). Thus, finding the underlying genetic components conferring such risk may have a great impact on breast cancer prevention and treatment. However, our knowledge of breast cancer susceptibility remains incomplete. Mutations and/or allelic loss of tumor suppressor genes, such as BRCA1, BRCA2, TP53, as well as other genes, are associated with an increased predisposition to breast cancer tumorigenesis (Antoniou and Easton 2006). However, mutations in the two major predisposition genes BRCA1 and BRCA2 account for less than 25% of families with hereditary breast cancer (Antoniou and Easton 2006), indicating that many of the underlying genetic factors involved in breast cancer susceptibility remain to be uncovered.

GADD45A (Growth Arrest and DNA Damage-induced 45, Alpha) is a p53- and BRCA1-regulated gene acting in the control of the G2/M cell-cycle transition, apoptosis, and DNA repair (Wang et al. 1999; Harkin et al. 1999; Smith et al. 1996). Following genotoxic stress, the p53 protein binds directly to a consensus sequence located in GADD45A third intron and indirectly, through its interaction with transcription factors, to the promoter region, therefore activating GADD45A transcription (Zhan et al. 1998; Jin et al. 2001). In absence of DNA damage, BRCA1 represses GADD45A expression through interaction with the zinc finger protein ZNF350 (ZBRK1), this complex recognizing a DNA binding site also located in the GADD45A third intron (Zheng et al. 2000). Moreover, BRCA1 can activate GADD45A gene expression through interactions with transcription factors binding to the gene promoter (Fan et al. 2002).

Given that GADD45A is a target of two major tumor suppressor proteins involved in breast cancer, namely, BRCA1 and p53, and that specific haplotypes of the gene encoding BRCA1 corepressor ZNF350 have also been recently associated with a modulation of breast cancer risk in our cohort of non-BRCA1/2 high-risk breast cancer families (Desjardins et al. 2008), GADD45A, therefore, represents an attractive candidate gene with regard to breast cancer susceptibility. Although no alterations of GADD45A have been found in previous studies performed on a limited set of breast tumors, human cancer cell lines of various origins, and familial breast carcinomas (Blaszyk et al. 1996; Campomenosi and Hall 2000; Sensi et al. 2004), Gadd45a−/− cells and mice display genomic instability (Hollander et al. 1999, 2005). The absence of Gadd45a has been associated with an acceleration of Ras-driven mammary tumor formation in mice (Tront et al. 2006).

Therefore, we investigated the plausible implication of GADD45A in breast cancer susceptibility by analyzing 96 individuals affected with breast cancer from our cohort of high-risk non-BRCA1/2 breast and ovarian cancer families, along with 95 healthy controls from the same origin. The entire GADD45A coding region, intron–exon boundaries, and the p53- and BRCA1/ZNF350-binding consensus sequences were examined.

Materials and methods

Ascertainment of families and DNA extraction

All 96 non-BRCA1/2 individuals from French Canadian families with a high risk of breast and ovarian cancer, along with 95 healthy controls included in this study, provided written informed consent. The research project has also been reviewed by the ethics committee of each participating institution. Details regarding selection criteria for breast cancer cases, experimental and clinical procedures, and the INHERIT BRCAs research program have been described previously (Simard et al. 2007; Durocher et al. 2006; Desjardins et al. 2008). Control blood DNA was either used directly or subjected to whole-genome amplification using Illustra GenomiPhi V2 DNA amplification kit (GE Healthcare, formerly Amersham Biosciences, Piscataway, NJ, USA) according to the manufacturer’s instructions.

Polymerase chain reaction amplification, mutation analysis, and variant characterization

The GADD45A gene (NM_001924.2) consists of four exons covering more than 3 kb of genomic DNA. Direct sequencing of genomic sequence was performed using sequencing primers listed in Supplemental Table 1. Direct sequencing was performed using an ABI3730XL automated sequencer (Applied Biosystems, Foster City, CA, USA), according to manufacturer’s instructions. Staden preGap4 and Gap4 programs were used for sequence data analysis. Deviation from Hardy–Weinberg equilibrium (HWE) and allelic differences between both series were measured using a X 2 test with 1 degree of freedom. The effect of a given variant on splice-site consensus strength was evaluated with the Splice Site Prediction Program using Neural Networks (SSPNN) (Reese et al. 1997) with default parameters. The putative impact of the exonic variant on exonic splicing enhancers was also assessed using ESE Finder (Cartegni et al. 2003; Smith et al. 2006).

Table 1 GADD45A sequence variations and genotype distribution in familial breast cancer cases and controls

Linkage disequilibrium analysis, haplotype estimation, and tagging single nucleotide polymorphism selection

The Linkage Disequilibrium Analysis (LDA) program (Ding et al. 2003) was used to calculate pairwise LD defined by Lewontin’s |D′| and r 2 measures on cases and controls combined (Lewontin 1964; Devlin and Risch 1995). The PHASE 2.1.1 software (Stephens et al. 2001) was used with default parameters to estimate the total set of haplotypes present among the case and control data sets and to evaluate the global test of significance associated with a case-control comparison. Evaluation of a positive association of a specific haplotype displaying a frequency >5% with breast cancer was further analyzed using the WHAP program (Purcell et al. 2007), implementing a regression-based association test. The determination of haplotype blocks and tagging single nucleotide polymorphisms (tSNPs) for each LD block was performed with genotyping data from both sample sets combined using the Haploview software (Barrett et al. 2005).

Electronic links

UCSC Genome Bioinformatics: http://genome.ucsc.edu/; NCBI dbSNP: http://www.ncbi.nlm.nih.gov/SNP/; SSPNN: http://www.fruitfly.org/seq_tools/splice.html; ESE Finder: http://rulai.cshl.edu/cgi-bin/tools/ESE3/esefinder.cgi; PHASE: http://www.stat.washington.edu/stephens/software.html; WHAP: http://pngu.mgh.harvard.edu/purcell/whap/; Haploview: http://www.broad.mit.edu/mpg/haploview; HapMap: http://www.hapmap.org;

Results

Analysis of GADD45A sequence variations

Although no truncating mutation was found in the GADD45A coding region of our French Canadian breast cancer cases, we identified 12 variants in GADD45A exonic and flanking intronic sequences, including two novel sequence variations (c.147-103G/C and c.385-174C/T) not reported in the NCBI Single Nucleotide Polymorphism Database (dbSNP Build 128) (Table 1). Among these sequence variations, one is a coding silent variant (c.492A/G, p.E164E), whereas the nine remaining sequence changes are intronic nucleotide substitutions. No deviation from HWE was observed for any of the nucleotide changes identified, with the exception of one intronic variation (c.384 + 116T/C), displaying borderline significance (p = 0.045) due to an excess of rare homozygotes among cases (Table 1). When considering all nucleotide variations, seven are common variants with minor allele frequency (MAF) higher than 5%, whereas five are considered as rare sequence variations, as they display an MAF below 5%. Among the rare variants, two (c.492A/G and c.498 + 27C/T) are observed simultaneously within the same breast cancer and control individuals, suggesting that both individuals carry a specific allele. The allele frequency of variants observed in the breast cancer series was also genotyped in healthy French Canadian controls, as denoted in Table 1. No significant difference in MAF was observed. In addition, all variants identified also displayed similar frequencies to those reported in the dbSNP, including the CEPH-Utah residents with ancestry from Northern and Western Europe (CEU) population.

As consensus-binding sites located in the GADD45A third intron are important regulatory sequences, the relevant genomic regions were also analyzed by direct sequencing. No sequence alteration was found directly within the consensus p53-binding region located in the 5′ end of the intron, the closest variation being situated 20 nucleotides downstream (c.384 + 168T/C, Table 1). Likewise, amplification of the BRCA1/ZNF350-binding region near the beginning of exon 4 in the 3′ part of the intron did not lead to identification of any sequence alteration directly within the consensus sequence. However, a rare variant (c.385-205A/G) located two nucleotides apart was identified in only one control and one breast cancer case, but no genomic DNA from other affected family members was available for testing. Another rare nucleotide change, present within the same amplicon (c.385-174C/T), was also identified in only one breast cancer case and one control individual, whereas variant c.385-137T/C (rs3171012) situated 137 nucleotides upstream of the fourth exon of GADD45A was present at an MAF that was similar for both series and within the range of frequencies reported (Table 1).

In silico analysis of the effect of variants on splicing

The possible effect of all intronic variants on splicing consensus sequences was also assessed using in silico analysis with the SSPNN program. None of the genetic variations observed displayed a significant change in the splicing score (data not shown), including the intronic variant most likely to have a potential effect (c.45-23C/T) given its proximal location to a known exon boundary. The strongest effects on the splicing score change were observed for the c.384 + 93A/G (rs3783468) and the c.385-174C/T variants. The c.384 + 93A/G variant could affect a pseudo donor site located 86 nucleotides downstream of the exon 3 3′ boundary by decreasing the splicing score from 0.60 to 0.41, whereas c.385-174C/T could create a weak pseudo acceptor site located 189 nucleotides upstream of exon 4 (splicing score of 0.42). In addition, this variant (c.385-174C/T) could also strengthen another pseudo acceptor site situated 164 nucleotides before exon 4 by increasing the splicing score from 0.68 to 0.76, which would be as strong as the splice site currently used, although no evidence indicates that this pseudo donor site is utilized by the splicing machinery. The putative impact of the c.492A/G exonic variant was also assessed using the ESE Finder program, which examines sequences for exonic splicing enhancer motifs. Although the c.492A/G variant slightly modified the putative ESE motifs identified, all changes were very close to the default thresholds and thus would be expected to act on relatively weak motifs, which would most likely not be used by the splicing machinery (data not shown).

Evaluation of GADD45A intragenic linkage disequilibrium

LD calculations for each SNP pair were performed in both series combined using |D′| and r 2 measures and are represented in Fig. 1. Perfect LD (|D′| = 1) was observed between the two most distant intragenic variants (SNP1: c.45-23C/T and SNP12: C.498 + 27C/T), which indicates that LD at the GADD45A locus does not decline significantly with distance. This is not surprising given that the GADD45A gene covers less than 4 kb of genomic sequence. Indeed, the lowest pairwise LD value involves c.147-53G/T (SNP3) with c.384 + 116T/C (SNP5) (|D′| = 0.158). As expected, r 2 coefficients calculated for GADD45A genomic region displayed lower values, as this measure is sensitive to variation in allelic frequency, which is well represented by the large spectrum of r 2 values ranging from 0.0001 to 1.0. A high r 2 coefficient was observed between c.384 + 118C/T (SNP6) and c.384 + 168T/C (SNP7), which displays high and similar MAF, whereas the majority of lowest r 2 values were observed for the three SNPs displaying an MAF below 1% (c.147-103G/C, c.492A/G and c.498 + 27C/T). As described above, both c.492A/G and c.498 + 27C/T are rare and are always observed within the same individuals. Hence, this pair displays perfect LD (r 2 coefficient of 1.0).

Fig. 1
figure 1

Pairwise linkage disequilibrium (LD) measures of |D′| and r 2 for the 12 single nucleotide polymorphisms (SNPs) identified in the breast cancer and control series. SNP numbers are denoted according to Table 1

Haplotype analysis

Analysis of GADD45A haplotypes with the PHASE program, using the 12 sequence variations genotyped in both series, led to 18 estimated haplotypes (Table 2). Among these, five haplotypes (PH1, 4, 7, 9, and 15) displayed a frequency >5%, which represent 95.5% of all haplotypes estimated in both sample sets combined, whereas the remaining haplotypes displayed frequencies <1%. Estimated p value for case–control permutation did not indicate any significant difference in the global haplotype composition and frequency distribution between both groups (p = 0.94). Among the common haplotypes, only the haplotype PH9 showed a notable difference in frequency, with this haplotype being more represented among the controls. To further ascertain a possible association of this specific haplotype with breast cancer, a second haplotype estimation program, WHAP, was used, allowing for regression-based haplotype association testing. Estimation of haplotypes within the same data sets yielded the same five common haplotypes, displaying remarkably similar estimated frequencies as those obtained with the PHASE program (Table 2). Although the global test for association was not significant (p = 0.16), this haplotype-specific testing indicated a weak but significant association of haplotype WH2 with breast cancer (p = 0.032), with an overrepresentation of this specific haplotype among controls. Using a sliding window, we were able to circumscribe the specificity of this haplotype to the first six genetic variants without loosing the significant association (p = 0.0158, data not shown), with the strongest effect observed when considering the first nine variants (p = 0.0155).

Table 2 Haplotypes as estimated by PHASE and WHAP programs and their estimated frequencies in cases and controls using all single nucleotide polymorphisms (SNPs) genotyped in both case and control series

Tagging SNPs determination

In order to reduce genotyping costs and efforts in future association studies without loss of power, the selection of a small fraction of SNPs, identified as tSNP—which likely represent with a good reliability the French Canadian population and the underlying variation at the GADD45A locus—represents an avenue of choice. Indeed, the selection of tSNPs that would be both frequent (MAF ≥0.05) and in strong LD with the other variants will allow capture of the common alleles of GADD45A. Thus, a causal variant may not be genotyped but will still be represented by the set of tSNPs. Therefore, genotyping data of the 12 variants from both series have been used for final haplotype block as well as tSNP identification. Based on an algorithm of solid block of LD, only one region of strong LD among the French Canadians was identified using the Haploview software (expectation maximization algorithm) (Fig. 2). This LD block encompasses the whole exonic region, given that SNPs 1 and 12 are comprised within the same LD block. Thereafter, considering exclusively haplotypes having a frequency ≥5%, four tSNPs were identified within this LD block, namely, variants c.384 + 93A/G, c.384 + 116T/C, c.384 + 118C/T, and c.385-137T/C (rs3783468, rs3783469, rs681673, and rs3171012). Given that only three genomic variants were available from HapMap data (rs3783468, rs681673, and rs532446) and that they are all located in intron 3, it was not relevant to perform similar analyses using Centre d’Etude du Polymorphisme Humain (CEPH)/CEU sample data. To further test the usefulness of our selected tSNPs in capturing haplotype diversity, a second haplotype analysis was performed using only the tSNPs selected here. These four tSNPs still allowed us to distinguish the five common haplotypes present in our data set, demonstrating the efficacy of the tSNPs identified here to accurately pinpoint all common haplotypes found in our population. However, when using only the three tSNPs present in the HapMap database, just three haplotypes could be determined, delineating the importance of the added tSNPs from the study.

Fig. 2
figure 2

Haploview predictions of haplotype blocks using single nucleotide polymorphisms (SNPs) displaying a major allele frequency (MAF) higher than 5%. The identification of SNPs is performed on a block-by-block basis and are denoted with an asterisk (*) above the SNP number. Tagging SNPs (tSNPs) are selected based on haplotypes showing an estimated frequency higher than 5%. Haploview estimation of haplotype frequencies is displayed on the right

Discussion

In search of additional breast cancer susceptibility genes, we undertook analysis of the GADD45A gene in our cohort of high-risk, non-BRCA1/2 breast cancer families and healthy controls based on the close involvement of GADD45A with the products of the known predisposition genes TP53 and BRCA1. All individuals were drawn from the French Canadian founder population, thus theoretically decreasing the underlying genetic heterogeneity. In addition, breast cancer cases were purposely selected from high-risk families (one per family), which has been demonstrated to increase statistical power (Antoniou and Easton 2003).

Although no truncating mutations were found, a total of 12 sequence variations were identified in our data sets, including two variations not previously reported. No nucleotide change on its own seemed to be significantly associated with breast cancer risk, including c.384 + 118C/T and c.384 + 168T/C, which are both near but not directly within intronic regions of strong conservation observed in GADD45A third intron (from UCSC conservation data, data not shown). Despite this, we can assume they have no functional significance given that these variants displayed similar frequencies in both sample sets. The only noncoding variation present in a region of strong conservation was the rare c.498 + 27C/T nucleotide change located in the 3′ untranslated region (UTR), which displayed similar frequencies in both groups.

Further LD analyses of the genetic variants identified highlighted relatively strong LD between all GADD45A sequence variations. Among these, variants c.385-205A/G, c.492A/G, and c.498 + 27C/T were in complete LD (for both |D′| and r 2 measures), with variants c.492A/G and c.498 + 27C/T being observed in the same case and control individuals, therefore suggesting that these variations are present on the same allele. As for c.385-205A/G, we can expect this variant to be present on the other allele. Indeed, although it was present on the same control individual as c.492A/G and c.498 + 27C/T, it was not present on the breast cancer sample bearing the other two variants. HapMap data within this genomic region displayed a block of strong LD encompassing GADD45A and nearly the entire adjacent GNG12 gene (G-protein gamma 12 subunit, data not shown), which is approximately 14 kb apart in the opposite direction and is implicated in cellular signal transduction. On its 5′ end, GADD45A’s nearest neighbor is the SERBP1 gene involved in the binding of messenger ribonucleic acid (mRNA) 3′ ends. However, as the SERBP1 gene is located more than 260 kb apart, it would not be expected to share LD with GADD45A according to HapMap data, although we could not exclude the possibility of a greater extent of LD in a founder population, such as the French Canadian population (Vézina et al. 2005; Laberge et al. 2005).

Given that the association of a gene with disease can be specific to certain alleles, the haplotype diversity of GADD45A was first estimated with the use of the PHASE software. PHASE estimated that an ensemble of 18 haplotypes solves all case and control genotypes observed, but no significant differences in haplotype structure and frequency were estimated between both series. However, a closer inspection of the attributed haplotypes between both groups highlighted a difference in frequency of the common haplotype PH9. Therefore, to further refine our haplotype analysis, we used the WHAP program, which enables an estimation of a possible haplotype-specific association with breast cancer risk and which confirmed that WH2 (which is identical to PH9) was indeed overrepresented among the control group. An analysis of this same haplotype using only a subset of variants indicated that a strong component of this specific difference is dependent on the 5′ end of the GADD45A gene.

Although none of the identified variants of GADD45A were clearly causative, we cannot rule out the possible effect of yet unidentified variants in other regulatory portions of the gene other than those analyzed here. Indeed, GADD45A is subject to complex multilevel regulation in response to various extracellular signals, and the association of transcriptional inductors and repressors such as p53 and ZNF350/BRCA1 with specific regions of its genomic sequence tightly regulates its expression. However, analysis of the consensus sequences located in intron 3 in both sample sets did not lead to identification of any genomic variants potentially disrupting these motifs.

Nonetheless, GADD45A is also regulated through posttranscriptional events, mainly changes in its mRNA stability through the binding of stabilization (HuR, nucleolin) and destabilization (AUF1) proteins on the 3′ UTR distal sequence. In addition, GADD45A expression has been recently shown to be regulated through a cap-independent and internal ribosome entry site (IRES)-dependent mechanism in response to cellular stress signals such as arsenic-induced cytotoxic and genotoxic damage (Chang et al. 2007). This expression is dependent on an IRES element located in the 5′ UTR region proximal to the start codon. As mutations of IRES elements or impairment of binding between transacting factors and IRES elements have been associated with cancer (Chappell et al. 2000), we carefully screened both sample sets for deleterious mutations that could potentially affect the expression through the IRES element of GADD45A. No sequence variant was identified within the GADD45A IRES sequence. However, it has to be stated that the possibility remains that changes in mRNA stability could be eventually associated with breast cancer risk.

Although GADD45A is part of a cellular pathway that includes several genes demonstrated to be involved in breast tumorigenesis (ATM, TP53, BRCA1), it has not been subjected to an extensive mutation search. Blaszyk et al. (1996) analyzed a series of sporadic breast cancer tumors (with and without p53 mutations) for alterations in GADD45A p53-binding site (n = 53 tumors) and coding sequence (n = 26 tumors) and only identified one polymorphism in intron 3. To our knowledge, the only other study of GADD45A alterations in familial breast cancer was performed on tumors from individuals showing a high incidence of early onset breast cancer (n = 34 families), male breast cancer (n = two families), or breast and ovarian cancer (n = seven families), the majority of which were not previously screened for the presence of a BRCA1 or BRCA2 mutation (Sensi et al. 2004). In agreement with what we found in non-BRCA1/BRCA2 high-risk breast cancer families, they did not identify any alterations of coding sequence, exon/intron boundaries, or the p53-binding domain.

Therefore, although GADD45A represents a very attractive breast cancer susceptibility candidate gene, our analysis and the current knowledge of GADD45A alterations in breast cancer individuals, cell lines, and tumors do not support a strong involvement of this gene with breast cancer risk. Nonetheless, additional investigations will be definitely needed to further ascertain the involvement of variations within the regulatory region of GADD45A with regard to breast cancer. However, given the limited information on sequence variant data available in the HapMap database, this study provides identification of tSNPs, which will be useful for further testing the association of the GADD45A gene in larger cohorts or with another syndrome or disease.