Introduction

Breast cancer is the most common malignant form of cancer among occidental women, and mutations in the two major predisposition genes BRCA1 and BRCA2 have been identified as high-penetrance alleles; however, these alleles account for only 25% of families with hereditary breast cancer (Ponder 2001; Easton 1999). Variants in low-penetrance genes could explain a greater proportion of breast cancer under a polygenic or high-risk genes heterogeneity model (Pharoah et al. 2002; Antoniou and Easton 2003, 2006; Johnson et al. 2007; Weber and Nathanson 2000). Among French Canadians, about two-thirds of the high-risk families tested for mutations in these two high-penetrance genes BRCA1 and BRCA2 will yield an inconclusive result (Simard et al. 2007), supporting the existence of other still unknown high-penetrance susceptibility genes or several weak-to-moderate predisposing alleles contributing to breast cancer risk in this population (Pharoah et al. 2002; Hopper 2001).

The central role of BRCA1 and BRCA2 genes in DNA repair, recombination, cell cycle control and transcription (Welcsh et al. 2000; Kerr and Ashworth 2001; Venkitaraman 2002) has led researchers to investigate the implications of several similarly acting genes on breast and/or ovarian cancer predisposition including ataxia telangiectasia-mutated (ATM), CHEK2, TP53, PTEN, STK11 (for review see Oldenburg et al. 2007) and a few other genes involved in DNA repair, hormone metabolism or detoxification (Dunning et al. 1999; Kuschel et al. 2002), some of which have also been investigated in the French Canadian population (Durocher et al. 2006; Guénard et al. 2007; Desjardins et al. 2007). More recently, specific FANC genes have also been associated with breast cancer susceptibility, namely FANCD1, subsequently identified as BRCA2, FANCN/PALB2 and FANCJ/BRIP1 (Durocher et al. 2005; Seal et al. 2006; Rahman et al. 2007; Erkko et al. 2007; Tischkowitz et al. 2007). Recent genome-wide association studies have also identified common variants conferring susceptibility to breast cancer (Cox et al. 2007; Easton et al. 2007; Stacey et al. 2007; Hunter et al. 2007). However, these genes likely represent less than 5% of the excess risk of familial breast cancer, and therefore, after accounting for all of these genes, at least 75% of the familial risk of breast cancer is still unexplained (Easton 1999).

The poly(ADP-ribose) polymerase (PARP/ADPRT) protein family catalyzes the synthesis of cellular poly(ADP-ribose) following DNA damage. The ADPRT protein is involved in genomic integrity since it regulates cellular response to DNA repair and apoptosis and has been shown to display a protective effect against cancer development (Burkle et al. 2002; Masutani et al. 2003). Although several additional PARPs are expressed in humans (Burkle 2005; Burkle et al. 2005), ADPRT/PARP-1 is the founding family member, showing a high and wide expression in human cells (Yamanaka et al. 1988; Ludwig et al. 1988). Several years ago, it was suggested that the ADPRT gene is involved in Fanconi anemia (FA) (Schweiger et al. 1987); however, no ADPRT abnormalities were found in FA patients of complementation group A (Flick et al. 1992).

ADPRT is involved early in double-strand signaling and repair pathways (Bryant et al. 2005; Farmer et al. 2005; McCabe et al. 2006; Andrabi et al. 2006), through the BRCT (BRCA1 C-terminal) domain situated in the central region within the ADPRT protein, which mediates the protein–protein interactions with DNA repair proteins, while the ADPRT DNA-binding domain binds specifically with high affinity to DNA single- (SSB) and double-strand breaks (DSB) upon DNA damage (Dantzer et al. 1999; Audebert et al. 2006). These properties suggest the implication of the ADPRT protein with base excision repair (BER), SSB and DSB repair, as well as homologous recombination (HR) processes (Trucco et al. 1998; Burkle 2000). In addition, a recent study has shown a functional and physical interaction between ADPRT and ATM, which suggests that ADPRT might act as a DNA damage sensor, playing a key role in early signaling of DNA damage (Haince et al. 2007). Acceptor molecules poly(ADP-ribosyl)ated by ADPRT activity (histones, topoisomerases, p53 or Fos) are proteins predominantly involved in nuclear functions such as DNA synthesis and repair, chromatine structure modulation, transcription and cell cycle regulation (D’Amours et al. 1999), which are cellular processes intimately related to BRCA protein functions.

Overexpression of a dominant negative ADPRT DNA-binding domain sensitizes cells to γ-irradiation and alkylating agents, and increases DNA amplification and mutations, therefore promoting cancer progression (Kupper et al. 1996). ADPRT expression has been related to cancer development in several studies. Shiokawa et al. (2005) recently suggested that the ADPRT gene could be involved in the development of teratocarcinomas, and an overexpression of the ADPRT gene in human Ewing’s sarcoma and high-grade lymphoma has also been demonstrated (Prasad et al. 1990; Menegazzi et al. 1999). On the other hand, a lower expression of ADPRT has been shown to be related to a reduction in genomic stability in human breast cancer, and was also observed in certain human tumor cell lines, combined with a structural alteration of the ADPRT gene (Bieche et al. 1996; Masutani et al. 2004). Furthermore, it has been shown that sensitivity to ADPRT inhibition is induced in cells deficient in DNA damage-signaling proteins involved in HR, including BRCA1, BRCA2 and FANC proteins, which lead ultimately to cell cycle arrest, chromosome instability or cell death (Farmer et al. 2005). ADPRT deficiency has also been associated with mammary tumorigenesis in mice (Tong et al. 2006). Taken together, this emphasizes the role of ADPRT in cancer therapy (Helleday et al. 2005).

Based on the multiple functions of the ADPRT gene in genome stability and cellular response to DNA damage, this gene represented an attractive candidate that might explain a given proportion of the remaining inherited susceptibility to breast cancer. Thus the goal of this study was to evaluate the possible involvement of ADPRT germline mutations or sequence variations in breast cancer susceptibility by analyzing all coding exons and flanking intronic sequences in DNA samples from unrelated individuals affected with breast cancer in which no BRCA1/2 mutations were detected, as well as in healthy individuals from the same population.

Materials and methods

Ascertainment of families and DNA extraction

The recruitment of high-risk French Canadian breast and/or ovarian families started in 1996 through a research project, which thereafter evolved into a large ongoing interdisciplinary research program designated INHERIT BRCAs (Simard et al. 2007; Avard et al. 2006). This integrated clinical research program consisted of a network of referring clinicians across the province of Québec. Moreover, clinicians from seven participating hospitals were directly involved in this translational research program and were responsible for BRCA1/2 test result disclosure to participants. Approval was obtained from ethics committees corresponding to the different institutions participating in the INHERIT BRCAs program. More details regarding ascertainment criteria, experimental and clinical procedures as well as the INHERIT BRCAs research program have been described elsewhere (Simard et al. 2007; Avard et al. 2006; Antoniou et al. 2006; Moisan et al. 2006; Vézina et al. 2005). Subsequently, another component was designed for the “localization and identification of new breast cancer susceptibility loci/genes.” Ethics approval for this latter study was also obtained from the different institutions participating in this research project and each participant that knew their inconclusive BRCA1/2 test results status had to sign a specific informed consent for their participation in this component. All participants had to be at least 18 years of age and mentally capable. A subset of 54 high-risk French Canadian breast/ovarian cancer families were recruited in the present study according to the ascertainment criteria described previously (Durocher et al. 2006). Genomic DNA extraction of the 54 French Canadian breast cancer cases as well as 73 healthy unrelated French Canadian individuals has been performed as previously described (Durocher et al. 2006). For the Pro377Ser nonsynonymous missense variant identified only, the allelic frequency was further ascertained in an additional 44 breast cancer cases recruited according to the same criteria, as well as in 41 additional control individuals (18 females and 23 males).

RNA isolation from immortalized cell lines and cDNA synthesis

Lymphocytes were isolated and immortalized from 7–9 ml of blood samples using the Epstein–Barr virus (EBV) in 15% RPMI media as previously described (Durocher et al. 2006). Total RNA was extracted from EBV-transformed β-lymphoblastoid cell lines using TRI REAGENT® (Molecular Research Center, Inc., Cincinnati, OH, USA) according to the manufacturer’s instructions. The purified RNA was stored at −80 °C until use. Following RNA extraction, reverse transcription of 5 μg of RNA was performed as previously described (Durocher et al. 2006).

PCR amplification, mutation analysis and variant characterization

The intron–exon boundaries of the ADPRT/PARP-1 gene were determined by aligning GenBank mRNA records (NM_001618) with genomic sequence records (NC_000001). The ADPRT gene spans approximately 47 kb and is composed of 23 exons (1q41–q42). ADPRT amplicons covering the entire mRNA-encoding portions and flanking intronic sequences from genomic DNA were obtained by PCR amplification using the primer pairs described and published previously (Shiokawa et al. 2005), with the exception of five additional primers including one in intron 3 (Rev_Ex3: 5′-AAGGCTGGAGGCTCATAACA-3′), given that the published primer was located within a DNA sequence comprising a polymorphism (rs3219034) and two primer pairs since original primers yielded low amplification for exon 8 (For_ex8: 5′-CCTATTAACTCTCGAAGGGAGCAA-3′ and Rev_ex8: 5′-AGAGAGACCCTTGACGGATACTTT-3′) and exon 20 (For_ex20: 5′-GCTGATTACCACTGTAGGTCTT-3′ and Rev_ex20: 5′-CTCTTGGATACACTACCACCA-3′). ADPRT direct sequencing was performed on an ABI3730XL automated sequencer, using version 3.1 of the Big Dye fluorescent method according to the manufacturer’s instructions (Applied Biosystems, Foster City, CA, USA). Sequence data were analyzed using the Staden preGap4 and Gap4 programs.

Each SNP was tested for deviation from Hardy–Weinberg equilibrium (HWE) by means of a chi-square test. All P values were two-sided with one degree of freedom. Allelic frequency was also evaluated in both series by means of a chi-square test. P values of less than 0.05 were considered to be significant.

LD analysis, haplotype estimation and tagging SNP selection (tSNP)

To estimate the pattern of linkage disequilibrium (LD), all 33 SNPs identified in our breast cancer case series were genotyped. The LDA program (Ding et al. 2003) was used to calculate the pairwise LD for each SNP pair. Lewontin’s \( |{\text{D}}'| \) and r 2 measures were used to illustrate a graphical overview of LD between SNPs (Ding et al. 2003; Devlin and Risch 1995).

Haplotype analysis was performed using PHASE 2.1.1 software (Stephens et al. 2001b). This program (PHASE) estimates haplotype frequencies with a Bayesian-based algorithm and then uses a permutation test to determine the significance of differences in inferred haplotypes between both sample sets. All association tests were run under default conditions, with 1,000 permutations. Haplotype frequencies were estimated using the seven coding SNPs, two SNPs located in untranslated regions (c.1-17G/C and c.3045 + 15C/A) as well as three intronic SNPs (c.1-17G–C, c.835-21A–G and c.1543 + 60G–C) genotyped in both sample series (due to the proximal location of an exonic variation identified in breast cancer cases). Haplotype blocks were identified using genotyping data from affected individuals and HapMap data from the CEPH cohort using the Haploview software (Barrett et al. 2005). Tagging SNPs (tSNPs) from each LD block were then identified using the same software.

Splice site prediction scores were evaluated using SSPNN, while protein alignment was performed using data extracted from the UCSC database. The effect of amino acid substitutions was predicted using the SIFT and PolyPhen web-based software.

Electronic databases

PHASE:

: http://stephenslab.uchicago.edu/software.html

HapMap:

: http://www.hapmap.org

Haploview:

: http://www.broad.mit.edu/mpg/haploview

Splice site prediction program using neural networks (SSPNN):

: http://www.fruitfly.org/seq_tools/splice.html

UCSC genome bioinformatics:

: http://genome.ucsc.edu/

SIFT:

: http://blocks.fhcrc.org/sift/SIFT.html

PolyPhen:

: http://genetics.bwh.harvard.edu/pph/

fast DB:

: http://www.fast-db.com/fastdb2/frame.html

Cancer Genome Anatomy Project (CGAP):

: http://snp500cancer.nci.nih.gov

Cancer Genetic Markers of Susceptibility (CGEMS):

: http://cgems.cancer.gov

Results

ADPRT mutation analysis and variant characterization

Although no truncating mutation was found in the ADPRT coding region of our French Canadian breast cancer cases, we identified 34 variants in ADPRT exonic and flanking intronic sequences, including one sequence variation (c.1543 + 22C/T) found exclusively in the control series (Table 1). These sequence variations are composed of seven coding variants and two variations located in the 5′UTR and 3′UTR, respectively, while the remaining 25 represent intronic sequence variations. Among these 34 variants, 32 are nucleotide substitutions while the two remaining nucleotide sequence changes consist of one insertion and one deletion. Out of seven coding variations, two imply an amino acid change and five are silent substitutions. Of the 34 variants observed, seven are novel while 27 are reported in the single nucleotide polymorphism database (dbSNP Build 127). No deviation from HWE is observed for any of the nucleotide changes identified, with the exception of one intronic variation (c.402 + 97del15), for which a homozygote carrier is observed (data not shown). When considering all exonic and intronic nucleotide variations, 23 are common variants with a minor allele frequency (MAF) of >5%, and 11 are considered rare sequence variations since they display a low frequency (<5%), including seven variations (c.96G/C, c.402 + 48G/A, c.618-36insATGG, c.1129C/T, c.1506G/T, c.2506-187G/A and c.3045 + 15C/A) observed only once and in different breast cancer families.

Table 1 Observed coding and intronic sequence variants and genotype frequencies of ADPRT gene in familial breast cancer cases (n = 54)

The variants situated in the coding sequence, and the intronic variations located in the proximal region of coding variants identified in the case series, were also genotyped in healthy French Canadian controls. Corresponding frequencies are denoted in Table 2. Including the additional intronic variant found exclusively in controls (c.1543 + 22C/T), 12 nucleotide changes (seven coding, two UTR and three intronic variants) were genotyped in both series. No significant difference in terms of allelic frequency was observed between both series based on single-marker analysis (Table 2). Of the 12 sequence variations genotyped in the control group, seven were considered common polymorphisms (MAF higher than 5%).

Table 2 ADPRT sequence variations and genotype distribution in familial breast cancer cases and controls

Regarding the two exonic variants resulting in amino acid substitutions, c.2285T > C (Val762Ala) is located in the Parp-like domain; this region encompasses amino acids 662–997. The other amino acid substitution (Pro377Ser) is located in the automodification domain (D’Amours et al. 1999; Bouchard et al. 2003; Hassa et al. 2005). Comparison of these missense substitutions was performed across relevant species in order to obtain a more representative prediction of the importance of a specific residue in protein function. Alignment of ADPRT ortholog sequences, illustrated in Table 3, revealed that both amino acids are conserved in all species, including more distant species like Xenopus tropicalis and Tetraodon nigroviridis (Val762Ala only). Given that Pro377 and Val762 residues are invariant from human to frog, this could suggest that these positions are under strong functional constraint. Considering the Val762Ala substitution, given that valine and alanine are both hydrophobic amino acids with similar structures, such a change is unlikely to have a significant effect on protein folding or cause steric hindrance, and so has a limited impact on protein function. On the other hand, regarding the Pro377Ser variant, proline is a nonpolar residue while serine is a polar residue, which potentially represents an important change in amino acid properties. This variant was observed only once among our breast cancer cases, in a subject diagnosed with breast cancer at the early age of 47 years. No DNA sample was available from the other three first-degree relatives affected with breast and/or ovarian cancer. However, as illustrated in the pedigree displayed in Fig. 1 (panel A), an unaffected sister (66 years) was tested for the presence of this variant and was found to be noncarrier.

Table 3 Nonsynonymous sequence variants detected in human ADPRT and residues found in orthologs
Fig. 1A–B
figure 1

Pedigrees of Pro377Ser carrier’s families. The age of onset of breast cancer and age of death are described below the indicated patient. For each family, the affected case that was sequenced first is designated by an arrow. The genotype call is indicated below the respective individual analyzed for the Pro377Ser variant. d, deceased

Given that the Pro377Ser variant could be of potential interest, the frequency of this nucleotide alteration was further ascertained in an additional 44 non-BRCA1/2 breast cancer cases from 44 additional distinct families, and this variant was observed at the heterozygous state in two additional individuals affected with breast cancer, at the age of 53 and 60 years, respectively. A DNA sample from an unaffected third-degree 61-year-old relative could be genotyped and was found to be noncarrier of the Pro377Ser variant (Fig. 1B). No DNA sample was available from relatives for the other family carrying this variant, where the mother was affected with breast cancer at the age of 37 years. Moreover, this Pro377Ser change has also been analyzed in 82 additional chromosomes, consisting of 18 additional unaffected women and 23 unaffected men. All tested individuals were found to be noncarriers of the Pro377Ser variant. This nonsynonymous nucleotide variation was thus observed in three different breast cancer cases (out of 196 chromosomes), while all unaffected individuals (228 chromosomes) tested in our analysis were found to be noncarriers. Even when the 23 unaffected men were included in the calculation, the allelic distribution of this variant, based on a single-marker analysis between both sample sets, yields a nonsignificant difference (P = 0.06).

Given that the Pro377Ser substitution is not located within the BRCT domain (although at its proximal boundary), it was not possible to visualize the effect of this amino acid change on the tertiary structure of the protein using available web-based protein structure prediction software. However, using Polyphen and SIFT software, both residue substitutions (Val762Ala and Pro377Ser) are predicted to be benign or tolerated amino acid changes, depending of the prediction program used for analysis.

The possible effect of all coding and intronic variants on splicing consensus sequences was also assessed using in silico analysis. None of the genetic variants observed showed a significant change in the splicing score, including the intronic variants most likely to have a potential effect (c.287-6C/A, c.617 + 12G/A, c.835-21A/G and c.1543 + 22C/T), given their location in the proximal region of a splice site junction (data not shown).

LD analysis

Linkage disequilibrium (LD) was calculated for pairs of SNPs using D′ and r 2 measures (Devlin and Risch 1995; Lewontin 1964). A graphical representation of the pairwise LD between all 33 SNPs identified in our breast cancer cases is shown in Fig. 2. Perfect LD (\( |D'| \) = 1) is observed between the two most distantly separated intragenic SNPs (SNP1 and SNP34), which indicates that the LD at the ADPRT locus does not decline significantly with distance. However, three SNPs (SNP22, SNP26 and SNP29) are involved in the majority of the low pairwise LD values observed. Indeed, SNP29 displays a weak LD value of 0.0217 in association with its adjacent SNP28, while SNP24 in association with SNP28 shows the lowest D′ value (0.0156). For several pairwise associations involving SNP22, a large spectrum of LD values is noted, the weaker LD values being observed in association with SNP14 and SNP23 (D′ = 0.217), and with SNP32 (D′ = 0.206). As expected, the r 2 coefficient calculated for the ADPRT gene region displays lower values, as this measure depends on allelic frequency, which is well represented by a large spectrum of r 2 values, ranging from 0.0001 to 1.0. Indeed, most of the lowest r 2 values are observed for the nine SNPs displaying a MAF of below 2%.

Fig. 2
figure 2

Pairwise linkage disequilibrium (LD) measures of \( {\left| {D'} \right|} \) and r 2 for the 33 SNPs identified in our breast cancer cases series. All SNPs are denoted numerically with reference to Table 1. The SNP18 found exclusively in the control series is not included

Haplotype analysis and tSNP identification

Analysis of ADPRT haplotypes estimated with the PHASE program, using the 12 SNPs genotyped in both series, led to 13 proposed haplotypes. Among these, four haplotypes display a frequency of >1%, which represent 95.7% of all haplotypes in cases–controls combined (Fig. 3A). The permutation test of these 12 SNP haplotypes indicates no significant difference in the estimated haplotype frequency distributions between both groups (P = 0.42, data not shown).

Fig. 3A–B
figure 3

A The table denotes the frequencies, using PHASE, of haplotypes using coding SNPs and intronic SNPs located in the proximal region of the coding sequence, estimated in cases and controls. B Haplotype blocks predicted using SNPs identified in case series showing a MAF of higher than 5% (23 SNPs). tSNPs identified on a block-by-block basis are denoted by asterisks above the SNP number and have been selected based on haplotypes showing a frequency of higher than 5%. Population haplotype frequencies are displayed on the right of each haplotype combination, while the level of recombination is displayed above the connections between two blocks. Thick connections represent haplotypes with frequencies of higher than 10%, while frequencies below 10% are represented by thin lines

As extensive resequencing for each individual currently remains time-consuming and expensive, the genetic information is forthcoming among a set of genetic markers that contain information on their neighborhood due to LD. Consequently, it makes sense to select a small fraction of the SNPs, identified as tSNP, to reduce genotyping costs and efforts in future association studies without losing power. However, prior to the identification of ADPRT haplotype blocks and tSNPs that aid well-powered studies performed with the Haploview program, analyses were performed to see if the use of genotyping data from cases only could be justified, given that a higher number of SNPs have been analyzed in this sample set. At first, several independent analyses of haplotype block identification were performed using the 12 SNPs genotyped in both series, in (1) cases alone; (2) controls alone; and (3) cases and controls combined, based on the algorithm of solid spine of LD. Similar results were obtained in all analyses (data not shown), supporting the use of the case series, in which a greater number of SNPs with MAFs of higher than 5% were genotyped, and which likely represent the French Canadian population with good reliability.

Therefore, genotypes of the 23 coding and intronic SNPs with a MAF of ≥ 5% in affected individuals were used for final haplotype block identification. Four regions of strong LD among the French Canadians were identified using the Haploview software (expectation maximization algorithm) (Fig. 3B). The first LD block comprises the region encompassing exons 1–17, while the 3′ exonic region is separated into three different small LD blocks. Thereafter, exclusively considering haplotypes with a frequency of ≥5%, 11 tSNPs were identified within these four LD blocks, namely SNPs 1, 4, 8, 20 and 24 found in block 1, SNPs 26 and 28 and SNPs 29–30, representing blocks 2 and 3, respectively, while block 4 consists of SNPs 31 and 32. This haplotype block analysis was also performed using an alternative program, namely HaploBlockFinder, and similar results were obtained (data not shown).

Corresponding analyses were performed using HapMap data from the CEPH/CEU cohort. When exclusively using SNPs with a MAF of ≥5%, two LD blocks were identified, including a large LD block encompassing exons 3–23 (Fig. 3B). Eight tSNPs could be identified from these analyses. Among these, it should be noted that three tSNPs (rs3219123, rs3219142 and rs752307) have been identified in both the French Canadian and the CEPH/CEU sample sets.

Discussion

Based on its role in genome stability, and given the demonstration of its altered expression in many cancers, ADPRT may be considered to be a candidate gene that could possibly explain a small fraction of the still to be elucidated familial breast cancer. In the present study, we analyzed the nucleotide sequence of the ADPRT gene in a cohort of French Canadian high-risk breast cancer families, considered a founder population, to assess the possible involvement of ADPRT in breast cancer risk. As this study was performed on individuals affected with breast cancer drawn from high-risk families (one case per family) without BRCA1/2 mutations, this allowed us to increase the likelihood of potentially identifying genetic variants associated with breast cancer risk (Antoniou and Easton 2003). Although no deleterious germline mutations leading to the premature termination of the protein were identified in the coding region, 33 sequence variants were identified in breast cancer cases, among which seven were coding variants (two changing an amino acid: Pro377Ser and Val762Ala) and seven were novel changes. As clearly illustrated in Table 1, most variants displaying a low frequency were identified in only one individual in the heterozygous state, and were found in different families, indicating that no specific allele carries these rare sequence variations together.

Two SNPs resulting in amino acid substitutions, namely Pro377Ser and Val762Ala, were observed in our breast cancer cases. Val762Ala is located within the catalytic domain (a.a. 523–1014) and is highly conserved among species, as also observed for Pro377Ser (Table 3). The MAF of the Val762Ala protein alteration observed in our study was higher (cases 13.9%, controls 14.4%) than that reported (5.9%) in the SNP 500 Cancer Database of the NIH Cancer Genomic Anatomy Project, but lower than the value (35%) reported in the NCBI (National Center for Biotechnology Information) dbSNP database. This may be attributed to population-specific differences. Recently, Lockett et al. (2004) showed that the Val762Ala variant in the ADPRT gene was associated with an increased risk for prostate cancer in Caucasian subjects (n = 488), and a similar finding was also reported for lung cancer (n = 1000) (Zhang et al. 2005). However, in this latter study, no significant association of the Val762Ala variant with breast cancer risk was found, in concordance with a recent report in Chinese women conducted by Zhai et al. (2006) (n = 302). In addition, this variant has recently been shown to display a significant reduced enzymatic activity associated with the PARP1-Ala allele (Wang et al. 2007).

As for Pro377Ser, it is situated in the automodification domain (a.a. 372–523), but outside the BRCT domain (a.a. 385–476), although it is located at its proximal boundary (Schreiber et al. 2006). As stated earlier, this nucleotide variation was found exclusively in three unrelated breast cancer cases (three out of 196 alleles) and was not observed in two unaffected close relatives, as well as in healthy control individuals (228 alleles). However, given that only a limited number of additional DNA samples from relatives of these breast cancer families carrying this variant were available, it is not possible to determine whether this variant might be associated with an increase in breast cancer susceptibility in our cohort. In a recent study by Cao et al. (2007), this variant was not observed in their series of French breast cancer cases and controls. Given that this sequence variation is reported with a MAF of 2% in the CEU cohort (NCBI database), this could suggest that this nucleotide change would be a polymorphism. As such, one would have expected Cao et al. to observe this nucleotide change among their controls of European individuals. However, it is worth mentioning that the frequency reported in the CEU cohort is based on a single heterozygote out of only 24 individuals. Therefore, one has to be cautious, as such a frequency cannot be extrapolated to other cohorts and other populations. In addition, this variant is not present in either Affymetrix or Illumina genechips, and was not genotyped in other previously published breast cancer association studies (to our knowledge), nor is it present in CGEMS or CGAP databases. Therefore, a multiethnic cohort would be needed to reliably establish the frequency of this sequence variation and to establish the potential contribution of this variant to breast cancer.

A common variant (c.1-17G/C; MAF >15%) was identified in the 5′-UTR region in both French Canadian sample sets. This sequence variation is located 17 bases upstream of the first ATG and does not overlap with the Kozak’s consensus sequence (Kozak 1984). However, it should be noted that the major transcription site of the mRNA is present 146 bases upstream of this 5′-UTR alteration, and that a putative binding site for the transcription factor Ets-1 could be observed 9 bps downstream of this c.1-17G/C variant (Soldatenkov et al. 1999). Further investigations would therefore be required to fully assess the potential effect of this variant on transcription efficiency.

The possible effect on splicing of all coding or intronic variants located in the splice junctions of ADPRT gene was assessed using the SSPNN website. None of the variants analyzed revealed any significant change in their splicing scores. Regarding the four intronic variants (c.287-6C/A, c.617 + 12G/A, c.835-21A/G and c.1543 + 22C/T) demonstrating a higher potential effect on splicing mechanism, due to their proximal location from a splice junction site, no corresponding alternative splice mRNA could be identified in expressed sequence tags (ESTs) reported in the NCBI database. Besides, no ADPRT alternative splice mRNAs are reported in alternative transcript databases such as fast DB (Friendly Alternative Splicing and Transcripts Database).

It is well recognized that haplotype analysis is more powerful than single marker analysis at revealing a possible association with a given disorder, and this is true both within genes (Daly and Day 2001; Stephens et al. 2001a) as well as in genomic regions (Grindflek et al. 2006). We have thereafter performed a detailed haplotype characterization of the ADPRT gene in the French Canadian population. When using the 12 SNPs commonly genotyped in both series to estimate the haplotype frequencies, no difference was found between both groups, considering all 13 haplotypes identified.

The pairwise LD analysis (Fig. 2) did not identify any distinct LD blocks in the ADPRT gene. This is supported by the observation that SNP1 is in perfect LD with the most distal SNP34 considering the D′ values. The discrepancy between both LD measures (D′ and r 2) can be explained by several SNPs (nine out of 33 SNPs) displaying low MAF (MAF <2%) and the fact that r 2 is more sensitive to allele frequencies than D′ (Devlin and Risch 1995). Although no distinct LD blocks have been identified following LD analysis, when using the Haploview software with 23 SNPs identified in cases and displaying a MAF >5%, four distinct LD blocks could be identified, including one large block encompassing exons 1–17. The breakage of strong LD seems to be located in the regions of exon 17, exon 20, and intron 20, although the last three blocks are particularly small. Moreover, two of the three SNPs showing the lowest pairwise LD values following LDA analysis (Fig. 2), namely SNPs 26 and 29, are located in the vicinity of LD breakage identified with the Haploview program.

Using the same algorithm with SNP data genotyped the in HapMap database (HapMap data rel#22 on NCBIB36 assembly, dbSNP b126) showing a MAF of >5%, two LD blocks were identified. The breakage of strong LD appears to be located in the 5′-region of the ADPRT gene, between exons 2 and 3. However, it should be noted that only 38 out of 123 SNPs reported in the HapMap database displayed a MAF of >5%, which represent common SNPs found in many different populations and therefore possibly exclude SNPs observed only in our French Canadian founder population.

Interestingly, when identifying tagging SNPs, the analysis revealed that 11 tSNPs could be enough to represent the majority of ADPRT haplotypes in our French Canadian individuals. This valuable information will undoubtedly facilitate additional studies. Moreover, using data from the HapMap database, among the 38 SNPs used for the analysis, eight tSNPs were identified by the Haploview software. It should be noted that ten common SNPs were used for the tSNP identification analyses in both cohorts and that three tSNPs have been commonly identified in both CEPH individuals (HapMap data) and the French Canadian cohort. Most SNPs reported in the Hapmap database have not been genotyped in our cohort as they were located in deeper intronic regions (>150–200 bp).

Although no deleterious truncating germline mutations have been identified in French Canadian breast cancer cases, seven novel polymorphisms were identified, in addition to two coding variations. The Pro377Ser amino acid change has been observed in three unrelated breast cancer cases only, in the heterozygous state, and was not found in unaffected healthy individuals from the same population. We have conducted here an exhaustive and detailed mutation and haplotype tagging analysis of the ADPRT gene in breast cancer cases. The data presented in this study clearly identified 11 ADPRT tSNPs, which will be useful for other large-scale association studies. We cannot however exclude that sequence alterations, located in promoter or deep intronic regions leading to altered expression or alternative splicing, respectively, could be responsible for an ADPRT-associated predisposition to breast cancer. Therefore, further analyses of other populations and larger cohorts will be required to definitely exclude the possible involvement of ADPRT, as a low-penetrance gene, in breast cancer susceptibility.