INTRODUCTION

Microsatellite instability (MSI) arises in short repetitive DNA sequences (microsatellites) because of defects in, or loss of function of, the mismatch repair (MMR) gene family (MLH1, MLH3, MSH2, MSH3, MSH6, PMS1, and PMS2), a system that recognizes and repairs errors that occur during DNA replication, as well as some forms of DNA damage.1 MSI is a change in length, caused by insertion or deletion of repeat units, of a microsatellite in tumor DNA that is not seen in the corresponding normal DNA.2, 3

The MSI phenotype was first described in, and constitutes a hallmark of, tumors associated with hereditary non-polyposis colorectal cancer (HNPCC) or Lynch’s syndrome, a hereditary cancer predisposition syndrome caused by inactivating germline mutations in the MMR gene family.2, 4, 5, 6, 7, 8 These patients develop tumors at early ages and frequently have multiple tumors in the colon and rectum, and less frequently have endometrial, ovarian, and stomach cancers.5, 6, 7, 9 MSI is also observed in sporadic colorectal cancer (CRC), but at a lower rate of 10–15%.2, 3, 10 Tumors with the MSI phenotype appear to be associated with particular molecular, histopathological, and clinical features, including a distinct mutation profile, specific location, poor differentiation, high tumor lymphocyte infiltration, low frequency of distant metastasis, and good prognosis.11, 12, 13

At the molecular level, studies have shown that the oncogenic alterations found in MSI tumors are somatic mutational events that affect coding repeat sequences in numerous target genes.13, 14, 15 Several MSI-target genes have been identified in MSI-positive tumors; they are involved in diverse and important cellular functions and pathways such as DNA repair (eg, MRE11, MSH6, BRCA1, and BRCA2), cell growth (eg, TGFbRII and EGFR), and apoptosis (eg, IGFR and BAX).14, 16, 17 In sporadic tumors, MSI is strongly associated with the presence of BRAF oncogene mutations,18 a lack of KRAS mutation, and inactivation of the PTEN tumor suppressor gene.19, 20 Thus, the mutation profile of MSI tumors is fundamentally different from that of other CRCs.14

From a histopathological and clinical point of view, CRCs with MSI have distinct features, such as location in the proximal colon and a higher prevalence in stage II (20%) than in stage III (12%). In stage IV or metastatic CRC, MSI is relatively uncommon (4%).13, 21, 22 It is well accepted that CRC patients with MSI have better overall survival than patients with microsatellite-stable (MSS) tumors.23, 24, 25 Importantly, MSI is being considered as a predictive biomarker of response to 5-fluorouracil (5-FU), irinotecan, and other chemotherapeutic agents.14 Following an initial study by Elsaleh et al23 in which MSI tumors were associated with a better response to a 5-FU regimen, many other reports tried to validate these findings; however, the predictive value of MSI remains controversial. Large retrospective studies confirmed that the effect of 5-FU is restricted to stage II MSI-positive cases24 and is not applicable to stage III MSI-positive ones.25 Yet, a recent meta-analysis examined this issue and did not confirm the predictive value of MSI for 5-FU therapy at any stage.26 Interestingly, a recent study by Dorard et al15 identified HSP110 as a novel MSI-target gene in colorectal tumors. The authors reported that overexpression of mutant HSP110 sensitized cells to oxaliplatin and 5-FU, and raised the hypothesis that MSI-target genes, rather than the MSI phenotype, can be predictive of response to chemotherapy regimens.15

Because of the need for a better understanding of the clinical and histologic manifestations of HNPCC, the National Cancer Institute (NCI) hosted an international workshop on HNPCC in 1997, which culminated in the development of a panel of genetic markers that allows the identification of MSI-positive individuals at risk for HNPCC. This panel is known as the Bethesda panel27 and includes five markers: two mononucleotide (BAT-25 and BAT-26) and three dinucleotide (D5S346, D2S123, and D17S250) repeats. Tumors with instability at two or more of these markers were defined as MSI-high (MSI-H), whereas those with instability in one repeat or showing no instability were defined as MSI-low (MSI-L) and microsatellite stable (MSS), respectively.28 Following this landmark panel, several others using tetra-, di-, and mononucleotide repeats were reported for MSI screening/diagnosis. However, all of these assays required standardization and the inclusion of matched normal DNA as a reference.2, 8, 28

Subsequently, in 2002, another HNPCC workshop was held at the NCI to re-evaluate and improve the Bethesda Guidelines.27, 28 This workshop concluded that the above-mentioned markers had limitations, and recommended that the dinucleotide repeats be replaced by mononucleotide repeats.28 Consequently, an optimized assay of five mononucleotide markers (BAT-25, BAT-26, NR-21, NR-24, and NR-27) was established for MSI screening, avoiding the use of paired normal DNA.8, 29 Since then, this panel of markers has been used by some authors in a pentaplex PCR assay to evaluate MSI status without the need for matched reference DNA.8, 28, 29, 30, 31 There is no consensus on the criterion for classifying a tumor as MSI with this panel. Several authors suggest that at least 2 out of 5 markers must be unstable,29, 32 whereas other studies require 3 out of 5 markers to determine an MSI-H phenotype.27, 28

These five mononucleotide markers show quasi-monomorphic variation that differs among ethnicities: Sub-Saharan Africans showed variant allele frequencies above or approaching 10% for NR-27, BAT-25, and BAT-26, whereas the other populations (European, East Asian, and Native American) did not reach a frequency of 10% variant alleles for any of the markers.28 In Europe, >95% of individuals do not show any variant alleles in the five markers, whereas the remaining 5% present variant alleles in only one marker.28 In the current admixed Brazilian population, these markers have not been evaluated and the quasi-monomorphic variation range (QMVR) is still undetermined.

An accurate estimate of ancestry proportions at the population level is possible using ancestry-informative markers (AIMs).33, 34, 35, 36 Some of these studies focus on insertion-deletion polymorphisms (INDELs), which are likely to be important factors underlying inherited traits in humans.33 Recently, using a set of AIM-INDELs, we estimated the ancestry proportions of individuals from the Amazonas, Belém, and Rio de Janeiro populations in Brazil.33, 37 We concluded that this panel is accurate and suitable for estimating genetic ancestry in highly admixed populations such as the Brazilian population.33, 37

Herein, we aimed to establish the QMVR of the BAT-25, BAT-26, NR-21, NR-24, and NR-27 markers in a healthy Brazilian population, to evaluate the feasibility of determining the MSI status of tumors without the need for matched normal DNA. Furthermore, we aimed to determine the ancestry of Brazilian individuals using specific AIM-INDELs and to correlate it with their QMVR.

MATERIALS AND METHODS

Samples

Normal DNA from blood was obtained from 214 healthy individuals and provided by the BioBank of the Barretos Cancer Hospital, Barretos, São Paulo, Brazil. This study was approved by the local ethics committee (600/2012). The mean age of the individuals was 33 years, 52.3% were male, and 90% of the individuals came from the southeast region of Brazil (São Paulo and Minas Gerais states), whereas the others came from the states of Paraná, Rio Grande do Sul, Bahia, Mato Grosso, Mato Grosso do Sul, Paraíba, Pernambuco, and Rondônia. Blood DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. In addition, cancer cell lines were used as controls. The CRC cell lines Co115, DLD1, and LoVo were used as MSI-positive controls, and CaCo2, DIFI, HT29, SW480, and SW620 were used as MSI-negative controls. The DNA from cancer cell lines was extracted using Trizol reagent (Life Technologies, Gaithersburg, MD, USA) following the manufacturer’s protocol.

MSI analysis

The MSI evaluation was performed using a multiplex PCR comprising five quasi-monomorphic mononucleotide repeat markers (BAT-25, BAT-26, NR-21, NR-24, and NR-27).28, 38 Primer sequences were described elsewhere.28 Each antisense primer was end labeled with a fluorescent dye: FAM (6-carboxyfluorescein) for BAT-26 and NR-21; VIC (2′-chloro-7′-phenyl-1,4-dichloro-6-carboxyfluorescein) for BAT-25 and NR-27; and NED (2,7,8-benzo-5-fluoro-2,4,7-trichloro-5-carboxyfluorescein) for NR-24. PCR was performed using the Qiagen Multiplex PCR Kit (Qiagen), with 0.5 μl of DNA at 50 ng/μl and the following thermocycling conditions: 15 min at 95 °C; 40 cycles of 95 °C for 30 s, 55 °C for 90 s and 72 °C for 30 s; and a final extension at 72 °C for 60 min. PCR products were then submitted to capillary electrophoresis on an ABI 3500 xL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) according to the manufacturer’s instructions and the results were analyzed using GeneMapper v4.1 (Applied Biosystems) software.

The QMVR of each marker was defined as the mean allele size plus or minus three nucleotides, in accordance with the literature.29, 30
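
As an illustration of this definition, the following minimal Python sketch computes a QMVR from a list of allele sizes. It assumes that the average is the arithmetic mean rounded to the nearest whole base pair, which is one possible reading of the text. The function name `qmvr`, the parameter `half_width`, and the example allele sizes are hypothetical and are not taken from the study data.

```python
from statistics import mean


def qmvr(allele_sizes, half_width=3):
    """Return (low, high) bounds of the quasi-monomorphic variation range.

    Assumption: the range is centered on the arithmetic mean allele size,
    rounded to the nearest base pair, and extends half_width nucleotides
    on each side.
    """
    center = round(mean(allele_sizes))
    return center - half_width, center + half_width


# Hypothetical allele sizes (bp) for a single marker, for illustration only.
example_sizes = [85, 85, 84, 86, 85, 85, 81, 85, 85, 85]

low, high = qmvr(example_sizes)

# Alleles that fall outside the computed range are the candidate variants.
outside = [size for size in example_sizes if not low <= size <= high]

print(f"QMVR: {low}-{high} bp; alleles outside: {outside}")
```

In practice the center could equally be taken as the modal allele size; the sketch only shows how a fixed window of three nucleotides on each side translates into range bounds.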

Ancestry analysis

The ancestry analysis was performed using a set of 46 AIMs among the most informative INDELs for each population group as previously described.33 Primer sequences and PCR conditions were according to Pereira et al.33 A 46plex PCR was performed and the amplified products were further submitted to fragment analysis on an ABI 3500 xL Genetic Analyzer (Applied Biosystems), according to the manufacturer’s instructions. The electropherograms were analyzed and genotypes were automatically assigned with GeneMapper v4.1 (Applied Biosystems).

Ancestry proportions were then assessed using the Structure v2.3.3 software,39, 40 considering the four major population groups of Native American, European, African, and East Asian as possible contributors to the current genetic makeup of Brazilians. Using the data available for the HGDP-CEPH panel as a reference for the ancestral populations, a supervised analysis was performed to estimate ancestry membership proportions of the individuals involved in the study. Structure runs considering K=4 consisted of 100 000 burnin steps followed by 100 000 Markov Chain Monte Carlo iterations. The option ‘Use population Information to test for migrants’ was used with the Admixture model, considering allele frequencies correlated, and updating allele frequencies using only individuals with POPFLAG=1.

SPSS 19.0 software (IBM Corp, Armonk, NY, USA) was used for all statistical analyses. A P-value <0.05 was considered statistically significant.

RESULTS

Determination of QMVR using the 5-marker panel

All 214 samples were successfully amplified for the five markers, generating 428 alleles for each marker. The allele sizes and the absolute and relative frequencies for each marker are shown in Table 1 and Figure 1. All raw data are shown in Supplementary Table 1.

Table 1 Sizes of the alleles and the QMVR for each marker

Figure 1 Frequency of allele size distribution (in base pairs) for the five markers from 214 normal DNA specimens. For each marker, the gray shading indicates the QMVR established.

NR-27 demonstrated a stable profile with a QMVR of 82–88 bp, with only one allele (81 bp) outside the QMVR. For NR-21, the size of the quasi-monomorphic alleles ranged from 101 to 107 bp, and five alleles of 100 bp were outside the established QMVR. NR-24 showed a QMVR of 119–125 bp and no allele outside the range was found. Both BAT-25 and BAT-26 exhibited a bimodal distribution: the determined QMVR of BAT-25 was 142–148 bp and 6 individuals (1.38%) harbored variant alleles outside it, while BAT-26 had an established QMVR of 174–180 bp and 14 individuals (3.23%) had alleles outside the QMVR range (Table 1; Figure 1).

Overall, we identified a total of 23 individuals with alleles outside the QMVR. Importantly, none of them showed more than one marker outside the range (Table 2), and regardless of whether the 2/5 or 3/5 cutoff is used for MSI-H definition, none of the healthy individuals would be erroneously classified as MSI. As expected, all colorectal cell lines known to be MSI (HCT15, DLD1, and LoVo) exhibited an MSI-H profile, and the known MSS cell lines (CaCo2, DIFI, HT-29, SW480, and SW620) were correctly classified (Supplementary Table 2).
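
The cutoff argument can be checked with a short Python sketch. It encodes only the counts reported in this study (191 individuals with no marker outside the QMVR and 23 with exactly one) and counts how many healthy individuals would reach each MSI-H cutoff. The names `markers_outside` and `false_positives` are illustrative.

```python
# Number of markers outside the QMVR per healthy individual, as reported:
# 191 individuals with none and 23 individuals with exactly one.
markers_outside = [0] * 191 + [1] * 23


def false_positives(counts, cutoff):
    """Count healthy individuals who would be called MSI-H at this cutoff."""
    return sum(1 for n in counts if n >= cutoff)


# Evaluate both cutoffs discussed in the literature (2/5 and 3/5 markers).
fp_by_cutoff = {cutoff: false_positives(markers_outside, cutoff) for cutoff in (2, 3)}

print(len(markers_outside), fp_by_cutoff)
```

Because the largest count in the cohort is one, both cutoffs give zero misclassified healthy individuals.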

Table 2 Detail of all markers of the individuals with alleles outside the QMVR

Ancestry analysis

The results obtained for the complete AIM-INDEL panel allowed us to estimate the AFR, EUR, EAS, and NAM ancestral proportions of the 214 healthy individuals (all raw data are shown in Supplementary Table 1). The average ancestry proportions for all individuals were 67.5% for EUR, 19.6% for AFR, 6.7% for NAM, and 6.2% for EAS (Table 3; Figure 2).

Table 3 Ancestral membership proportions (average) for the HGDP-CEPH Diversity Panel reference samples and the studied Brazilian population

Figure 2 Individual ancestry estimates for the testing Brazilian population using the HGDP-CEPH diversity panel genetic data as a training set.

We further compared the ancestry estimates of individuals exhibiting alleles outside and within the QMVR. We observed that the 23 individuals with alleles outside the QMVR showed mean ancestry proportions of 49.6% for EUR, 37.2% of AFR, 7.6% of EAS, and 5.6% of NAM, whereas the 191 individuals within the QMVR showed 69.7% for EUR, 17.5% of AFR, 6.8% of NAM, and 6% of EAS (Table 4). These differences were statistically significant for the AFR (P<0.001) and EUR (P=0.001) ancestral contributions (Table 4). In fact, when we consider individuals with essentially European ancestry (EUR membership proportion >90%) only one allele outside the QMVR was observed.

Table 4 Ancestral membership proportions (average) considering the subsets of individuals within and outside the QMVR

DISCUSSION

The determination of MSI status is extremely important for identifying individuals at risk of HNPCC and for the molecular characterization of sporadic CRC.2, 7 In fact, the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group recommends MSI testing for all newly diagnosed CRC patients.41 Of the several panels of genetic markers available, the pentaplex panel of quasi-monomorphic markers (NR-27, NR-21, NR-24, BAT-25, and BAT-26) is one of the most widely used worldwide and is the basis of the Promega MSI kit (Promega, Madison, WI, USA).2, 7, 30 The QMVR of each quasi-monomorphic marker needs to be optimized because allele size estimation can be influenced by several factors, such as the specific reagents used and the sensitivity of the sequencer, as well as population ethnicity. In the current admixed Brazilian population, the variation of the pentaplex quasi-monomorphic markers had not been evaluated and the QMVR was undetermined. In the present study, we analyzed the allelic size variation of the five mononucleotide markers in a series of germline DNA from 214 healthy Brazilian individuals, determined their QMVR, and correlated it with individual ancestry determined by AIMs.

The QMVRs established for each marker were 82–88 bp for NR-27, 101–107 bp for NR-21, 119–125 bp for NR-24, 142–148 bp for BAT-25, and 174–180 bp for BAT-26. The frequency of variant alleles differed between markers: BAT-26 showed the largest proportion of variant alleles, with 14 alleles outside the QMVR. BAT-25 presented six alleles outside the QMVR, followed by NR-21 with four alleles and NR-27 with one allele. For NR-24, no alleles were observed outside the determined QMVR. The QMVR for all markers in our individuals differed by a few base pairs from those reported previously.28, 29 When considering the five markers together, 191 individuals (89.25%) were within the QMVR, and 23 individuals (10.75%) exhibited one allele outside the QMVR. Buhard et al28 established a QMVR by analyzing 1206 individuals from 55 different worldwide populations, including the HGDP-CEPH panel, and found that 87.5% of cases showed alleles within the QMVR,28 a result similar to that observed in our study. Importantly, the authors assessed 45 individuals from 2 native Brazilian populations (Karitiana and Surui) and none of them exhibited a marker outside the QMVR.28 Therefore, by analyzing the current Brazilian population, which results from the admixture of distinct ethnic groups, our study is of great relevance for MSI determination in clinical application.

To correlate marker allele size with individual ancestry, we analyzed an AIM-INDEL panel that can discriminate the contributions of four ancestral groups. The four-group analysis reflects the historical migration of populations: Native Americans underwent admixture with Europeans, followed by Africans, and more recently with a significant East Asian community, mainly in São Paulo state.

Using this AIM-INDEL panel, we showed that the average ancestry proportions of the 214 individuals were 67.5% European, 19.6% African, 6.7% Native American, and 6.2% East Asian. Using the same panel, Pereira et al33 analyzed a Brazilian population from Belém (northern region of Brazil) and identified different ancestry proportions: 53.5% EUR, 14.8% AFR, 22.9% NAM, and 8.8% EAS,33 most probably reflecting the higher Native American ancestry of the region.

Furthermore, Manta et al37 analyzed a Brazilian population from Rio de Janeiro and, using the same AIM panel, observed 55.2% EUR, 31.1% AFR, and 13.7% NAM. Using a different panel of markers, Pena et al42 performed a wider ancestry analysis of the Brazilian population from distinct geographical regions (North, Northeast, Southeast, and South). For the southeast region, the authors described frequencies similar to those obtained in our study population, namely 74.2% European, 17.3% African, and 7.3% Amerindian ancestry.42 The majority of the alleles found outside the QMVR in our population are in the BAT-25 and BAT-26 markers. These loci are described as rather polymorphic in African populations.30 Accordingly, we showed that the Brazilian individuals harboring those variants had a higher African ancestry. Nevertheless, none of the individuals with variant alleles exhibited more than one marker outside the established QMVR.

An important issue in using the present pentaplex panel for MSI testing without reference DNA is the minimum number of unstable markers needed to categorize a tumor as MSI. The literature is inconsistent, varying from 2 to 3 out of 5 markers. In the present study of an admixed population from the Southeast of Brazil, none of the individuals showed more than one marker outside the QMVR. Therefore, we suggest classifying as MSI-H those cases with instability at two or more markers. Since instability in one marker can be due to polymorphic variants, we propose that such cases be further analyzed by MMR immunohistochemistry, or by PCR using paired normal DNA, to accurately determine the MSI-L status of patients. Our results thus demonstrate that this methodology allows the MSI status to be characterized without the need for matched blood from each patient. This constitutes a very important advantage, since molecular diagnostic laboratories often receive only the formalin-fixed paraffin-embedded tumor tissue and do not have access to matched blood from the patient.
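
The proposed decision rule can be summarized in a minimal Python sketch. The QMVR bounds are those established in this study; the function names, the returned labels, and the two example tumor profiles are hypothetical and serve only to illustrate the logic, not to describe a validated diagnostic procedure.

```python
# QMVR bounds (bp) established in this study for each marker.
QMVR = {
    "NR-27": (82, 88),
    "NR-21": (101, 107),
    "NR-24": (119, 125),
    "BAT-25": (142, 148),
    "BAT-26": (174, 180),
}


def unstable_markers(tumor_alleles):
    """Return the markers with at least one allele outside their QMVR.

    tumor_alleles maps a marker name to a list of allele sizes in bp.
    """
    return [
        marker
        for marker, (low, high) in QMVR.items()
        if any(not low <= size <= high for size in tumor_alleles.get(marker, []))
    ]


def triage(tumor_alleles, cutoff=2):
    """Apply the suggested rule: MSI-H at >= cutoff unstable markers,
    confirmatory testing when fewer (but at least one) are unstable."""
    n_unstable = len(unstable_markers(tumor_alleles))
    if n_unstable >= cutoff:
        return "MSI-H"
    if n_unstable >= 1:
        return "confirm by MMR immunohistochemistry or paired-normal PCR"
    return "MSS"


# Hypothetical tumor profiles, for illustration only.
msi_like = {"NR-27": [79], "NR-21": [96], "NR-24": [121], "BAT-25": [138], "BAT-26": [165]}
one_variant = {"NR-27": [85], "NR-21": [104], "NR-24": [122], "BAT-25": [145], "BAT-26": [172, 177]}

print(triage(msi_like))
print(triage(one_variant))
```

The cutoff is kept as a parameter so that the same sketch can be run with the stricter 3/5 criterion used by some authors.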

In conclusion, this is the first study that reports the optimization and establishment of a suitable QMVR for a pentaplex system of MSI markers and characterizes their genetic variation in the Brazilian population for molecular diagnosis purposes. We found a noticeable number of alleles outside the QMVR, which appear directly associated with an important African ancestral component present in the current Brazilian population, as depicted by the use of AIMs. Despite the higher frequency of variant alleles in individuals with African ancestry, no individuals showed more than one allele outside the established QMVR in our study, and therefore our results corroborate that this methodology may be used with confidence to assess MSI status without matched-normal DNA and independently of the ethnicity, even in the highly admixed population of Brazil.