Introduction

Breast cancer metastasis is an intricate process involving the interplay of multiple gene products. Tumor progression is the major cause of breast cancer mortality. Loss of heterozygosity (LOH) was first proposed to cause cancer by Cavenee1 in a study of hereditary retinoblastoma. He proposed that the fertilized egg is Rb/rb in heterozygous individuals and that genetic damage to a somatic cell, resulting in mutation or deletion of the Rb gene and the subsequent formation of rb/rb recessive homozygotes, might lead to tumor formation. This scenario gave rise to the concept of genetic LOH. The mechanisms of LOH include mitotic nondisjunction, chromosome loss, homologous mitotic recombination and recombination (translocation) between two non-homologous chromosomes2,3. Chromosome loss is a major mechanism of LOH that results in an abnormal number of chromosomes; thus, aneuploidy develops. Genetic instability is associated with carcinogenesis and poor prognosis. A region with a high frequency of LOH usually contains one or more tumor suppressor genes.

Currently, there are several methods of searching for disease-causing genes or susceptibility genes, including genome-wide association studies (GWAS), exome sequencing and whole-genome sequencing4. Although gene chips have proven to be effective, they are insensitive to rare variants (MAF < 5%) and to structural variations. An increasing number of studies have shown that many complex diseases originate from such rare mutations5,6. In theory, whole-genome sequencing can identify all human genome sequence variations7; however, its high cost prevents this technique from being widely used. Exome sequencing selectively captures and enriches the exon coding regions of the whole genome for high-throughput sequencing analysis. To a certain extent, the technique overcomes the limitations of genotyping chips and traditional sequencing technology, with substantially lower costs (typically approximately 1/50 of the cost of whole-genome sequencing), a shorter processing time and more accurate data. More importantly, this technique can produce a more detailed analysis of the genetic variation in the coding region; therefore, it has become an important tool in disease-related genetic research8,9.

Despite rapid research and development of the exome sequencing technique, the use of this technology for the study of LOH has rarely been reported. To explore the feasibility of using this technology to study LOH, differences in genetic LOH between primary and metastatic lesions were compared. We conducted a genome-scale study by investigating genetic LOH in the peripheral blood, a primary tumor and a distant metastatic lesion from the same patient. Here, we present the preliminary results.

Results

Basic statistics

Exome sequencing of a total of three sets of samples – a peripheral blood sample, a primary tumor sample and a metastatic lesion sample – was performed. After the removal of low-quality sequence reads, the alignment of the remaining high-quality sequence reads with the human reference genome was performed using Bowtie 2 software. The average coverage in the target region of the different samples was 56–62× with an average of 60×. The sequencing depth is displayed in Table 1.

Table 1 Sequencing reads of samples

Single nucleotide variants identification

Our analytical pipeline identified 66,635 single nucleotide variants with a minimum coverage of 8× in each sample (Table S1). All further LOH analyses were based on this data file. The identification of somatic mutations in the tumor sample was performed subsequently and three non-silent variants were identified, affecting three distinct genes: CUL7 and RLF (in both the primary tumor and the metastatic lesion) and KIAA2026 (in the metastatic lesion only; Table S2).

LOH distribution

For LOH characterization, we calculated a heterozygosity score (HS) for each SNV locus in each sample using the following formula: HS = 1 − absolute(Ca/(Ca + Cb) − 0.5)/0.5, where Ca and Cb represent the number of reads of allele a and allele b, respectively. Under this definition, HS is 1 when the two alleles are equally represented and 0 when only one allele is observed. Then, a genome-wide scan of tumors based on the HS was performed. To ensure fidelity, an LOH region was identified only when the average HS of at least three consecutive loci was less than 0.1 and the probability of observing such a pattern by chance was smaller than 0.0001 (one-way ANOVA). Based on genome-wide scanning, LOH in the primary tumor was found at 30 chromosomal loci involving 54 genes (Table S3), some of which may be associated with tumorigenesis (Table 2). In the metastatic lesion, LOH occurred at 48 chromosomal loci (Fig. 1), affecting 1,157 genes distributed on 19 chromosomes (Table S4).
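As an illustration only, the heterozygosity score defined above can be computed from the two allele read counts as in the following minimal Python sketch. The function name and the error handling for uncovered loci are assumptions made for this example and are not part of the published pipeline.

```python
def heterozygosity_score(ca: int, cb: int) -> float:
    """Return HS = 1 - |Ca/(Ca + Cb) - 0.5| / 0.5 for one SNV locus.

    ca, cb: read counts supporting allele a and allele b.
    Raising on zero coverage is an assumption of this sketch.
    """
    total = ca + cb
    if total == 0:
        raise ValueError("locus has no reads; HS is undefined")
    return 1.0 - abs(ca / total - 0.5) / 0.5


# Balanced alleles give the maximum score.
print(heterozygosity_score(30, 30))  # 1.0
# Reads from a single allele give the minimum score.
print(heterozygosity_score(60, 0))   # 0.0
# A 57:3 split gives a score of approximately 0.1.
print(heterozygosity_score(57, 3))
```

Because the score depends only on the allele fraction, loci with very different sequencing depths are placed on the same 0 to 1 scale.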

Table 2 Selected LOH regions in the primary tumor and genes involved in tumorigenesis
Figure 1

LOH across the genome in a metastatic lesion.

An LOH region was identified when the average heterozygosity score (HS) of at least three consecutive SNVs was less than 0.1. The blue lines show the LOH regions.

Metastasis

All LOH regions detected in the metastatic lesion are listed in Table 3; a comparison shows that ~40% of these regions overlapped with those found in the primary tumor, whereas the remaining ~60% were novel. Some regions observed in this study (3p21.3; 13q12.11; 13q13.2-13q13.3; and 16q22-23) were consistent with previously reported regions10,11,12. Our study found that the incidence of LOH in these regions was 69%, 20–40% and 36–67%, respectively. Additionally, certain hot regions, such as 11q22.2 and 19q13.2, were observed. The heterozygosity score (HS) profile of chromosome 16 is displayed in Fig. 2 and the genes involved in the above-mentioned regions are shown in Table 4.

Table 3 Comparison of genome-wide LOH regions in two tumor sets
Table 4 Selected LOH regions in the metastatic lesion
Figure 2

The heterozygosity score (HS) profile of chromosome 16, comparing the metastatic lesion to the blood sample.

There are six “hot” LOH regions, which are marked by stars (P < 2.2e-16; one-way ANOVA test).

Discussion

SNP chip technology has generally been used for LOH-related studies of particular regions of interest, but this technology is prone to false positives and false negatives and therefore has certain limitations. Today, the cost of exome sequencing is lower than in the past and the processing time is shorter. Moreover, exome sequencing can identify genetic variation at the base-pair level and is regarded as an important tool for disease-related genetic research8,9. We used exome sequencing technology in a genome-scale search to examine genetic LOH in the peripheral blood, a primary tumor and a distant metastatic lesion from the same patient. The results of the study illustrate the credibility of exome sequencing.

Previous work has indicated that breast cancer metastasis is a highly selective process that involves a complex interaction between the tumor and the host13. Metastasis includes cancer cell separation from the primary tumor, tumor-driven matrix invasion, microvascular invasion and metastasis, and colonization and growth at another organ site. The accumulation of genetic alterations increases genetic instability and this process involves a variety of genes, such as oncogenes, tumor or metastasis suppressor genes and genes involved in DNA repair functions14. In this study, a higher incidence of LOH was observed in the metastatic tumor than in the primary tumor. The LOH involved cell differentiation, proliferation, adhesion and motility genes, among which a considerable number were tumor or metastasis suppressor genes.

In the present study, we found that LOH was observed in certain gene families, suggesting that members of a family may act synergistically during metastasis. One example is the carcinoembryonic antigen-related cell adhesion molecule (CEACAM) family, which is located at the 19q13.1-19q13.2 region. The CEACAM family belongs to the immunoglobulin superfamily, which is important for cell adhesion and signal transduction15. It has been reported that CEACAM1 expression levels are significantly lower in breast cancer tissue than in normal breast tissue16. CEACAM1 and other tumor suppressor genes might be jointly involved in apoptosis17. Additionally, it has been reported that CEACAM6 is associated with breast cancer metastasis and recurrence18; however, a relationship to breast cancer has not been observed for other family members. In our study, LOH was observed in CEACAM1, CEACAM3, CEACAM5, CEACAM6, CEACAM7 and CEACAM8, indicating that the structure and function of this family might be closely linked, synergistically regulating carcinogenesis and metastasis.

We also found that the LOH incidence in a zinc finger gene family located at 19q13.1-19q13.3 was the highest, with 40 members of this gene family showing LOH. The gene products of these family members bind to DNA, RNA, double-stranded DNA-RNA hybrid molecules and other zinc finger proteins and they homodimerize19 to regulate gene expression at the transcriptional and translational level. Zinc finger family members play important roles in gene expression regulation, cell differentiation and embryonic development and are associated with many diseases. For example, ZNF303 is associated with the occurrence and metastasis of hepatocellular carcinoma and ZNF202 is associated with leukemia. Additionally, some studies have found that many zinc finger proteins inhibit the transcriptional activity of AP-1 and SRE genes and control the MAPK signaling pathway. Several zinc finger genes, such as ZNF383, ZNF480, ZNF411, ZNF466 and ZNF649, play important roles in tumorigenesis. The LOH of zinc finger genes might cause a loss of gene function, leading to the activation of AP-1 and MMPs. Such activation would, in turn, lead to the degradation of extracellular matrix proteins and, subsequently, to breast cancer recurrence and metastasis19,20,21.

Three members of the matrix metalloproteinase (MMP) family, located at 11q22.3, showed LOH: MMP1, MMP8 and MMP10. The gene products of this family can degrade extracellular matrix proteins, leading to tumor angiogenesis and metastasis. According to a study of breast cancer progression, MMP over-expression degrades the basement membrane and extracellular matrix, leading to angiogenesis and ultimately breast cancer metastasis22.

Consistent with previously reported findings, we found that metastasis-related LOH regions, such as the 3p21.3 locus containing three genes (RBM5, SEMA3F and ALS2CL), showed a corresponding tumor inhibitory effect. The RBM5 gene is not only associated with tumor formation but is also important for tumor proliferation, differentiation and apoptosis. RBM5 is a negative regulator of tumor cell proliferation. Its overexpression can induce apoptosis and G1-phase cell cycle arrest and enhance p53-mediated tumor suppression.

Genes located at the 13q13.2-13q14.3 region include NBEA and ELF1. The downregulation of the NBEA gene has been observed in certain multiple myeloma studies; therefore, the NBEA gene may be a novel tumor suppressor gene23. It was also found that the gene may be associated with breast cancer24. ELF1 expression was found to be associated with breast cancer stage, histological grade and lymph node metastasis and is regarded as an independent factor for invasiveness and prognosis in breast cancer. Although the gene has been considered to be involved in the mechanism of tumor metastasis, more evidence has shown that breast cancer invasion and metastasis are mediated through MMP2 gene activation.

In our study, we discovered that the CDH3 gene, located at chromosome 16q22-23, might be associated with tumor metastasis. The product of the CDH3 gene, also named P-cadherin, and E-cadherin both belong to the calcium-dependent cell adhesion factors; however, they differ in function and in the tissue layers in which they are expressed. It has been found that P-cadherin is more commonly detected in the basal layer of stratified epithelia and on the myoepithelial cells of the adult mammary gland25. Some studies have found that P-cadherin expression is associated with tumor invasiveness and that the stimulation of tumor cell growth by paracrine growth factors is associated with MMP upregulation. P-cadherin is strongly associated with the invasiveness and metastasis of malignant tumors25. In addition, Madhavan et al.26 found that the expression of the P-cadherin gene is negatively associated with lymph node metastasis, suggesting that this gene is essential for maintaining the structural integrity and stability of the breast cancer tissue, as well as for preventing the shedding of tumor cells and metastasis.

Because this was an exploratory, preliminary study, our initial results only indicate the feasibility of exome sequencing for studying LOH. Additional samples should be analyzed in future studies to increase the reliability of these results.

Methods

Subject characteristics

In this study, the breast cancer patient had a pathological diagnosis of luminal A type invasive ductal carcinoma that was ER and PR positive (50% and 20%, respectively), HER2 negative (8%) and Ki-67 positive at a low level (8%). The diagnosis was made after a modified radical mastectomy, which was performed before any distant metastases were observed. Post-operative CAF chemotherapy was administered for a total of 6 cycles. One year after the adjuvant chemotherapy, skin and liver metastases were observed. A skin biopsy was performed to confirm the metastasis. The patient's blood, primary tumor sample and skin metastatic lesion were obtained for analysis.

Sample collection, handling and storage

At the time of the first surgery, 10 mL of peripheral blood from the subject was collected in an EDTA blood tube and stored at −80°C. The surgical tumor samples, once resected, were immediately snap-frozen in liquid nitrogen and stored at −80°C. Tumor specimens were assessed and confirmed for the presence of over 75% of cancer cells in the sample by a pathologist. All samples were collected with the consent of the patient and her family and the study was approved by the hospital's ethics committee.

DNA extraction procedure

DNA was extracted from peripheral blood samples and tumor specimens using the QIAGEN Genomic-Tip in accordance with the user manual.

DNA quality, purity and concentration measurements

The OD260/OD280 values of the extracted samples were measured using a UV spectrophotometer. A value of approximately 1.8 indicates DNA of high purity; a value greater than 1.9 suggests contaminating RNA; and a value less than 1.6 indicates the presence of proteins, phenol and other contaminants. The qualifying DNA samples were sent to Shenzhen BGI for exome sequencing.
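The purity thresholds above can be expressed as a simple check, sketched here in Python for illustration. The function name and the returned labels are hypothetical, and treating every ratio between 1.6 and 1.9 as acceptable is an assumption of this sketch; the text only states that a value of approximately 1.8 indicates high purity.

```python
def classify_purity(od260_od280: float) -> str:
    """Classify a DNA sample by its OD260/OD280 ratio.

    Thresholds follow the text: > 1.9 suggests RNA, < 1.6 suggests
    protein/phenol. The 1.6-1.9 'acceptable' band is an assumption.
    """
    if od260_od280 > 1.9:
        return "possible RNA contamination"
    if od260_od280 < 1.6:
        return "possible protein or phenol contamination"
    return "acceptable"


for ratio in (1.8, 2.0, 1.5):
    print(ratio, classify_purity(ratio))
```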

Exome sequencing procedures and data analysis

First, genomic DNA was randomly fragmented to build a fragment library, which was purified and subsequently enriched using a NimbleGen 2.1M-probe sequence capture array. The enriched library targeting the exome was sequenced on the HiSeq 2000 platform to acquire paired-end reads with a read length of 90 base pairs. After removing reads containing sequencing adapters and low-quality reads with more than five unknown bases, high-quality reads were aligned with the human genome reference sequence (hg19/GRCh37) using Bowtie 2 software27 with default parameters. The PCR duplicates detected in the alignment files were subsequently removed with Picard (http://picard.sourceforge.net/) to improve alignment accuracy. The Genome Analysis Toolkit (GATK)28 was then employed for base quality recalibration, local realignment around potential insertion/deletion (indel) sites and variant calling. The raw single nucleotide variants were filtered for low mapping quality, low coverage, SNP clusters, etc. All loci with more than three alleles were filtered out. Then, the filtered variants were annotated using ANNOVAR29 for the following parameters: function (exonic or splicing); gene; exonic function (synonymous, nonsynonymous, stop gain, non-frameshift or frameshift indels); amino acid change; conservation; dbSNP (version 135) reference number; and allele frequency in the 1000 Genomes Project (February 2012 version).

LOH characterization

The blood sample was used as a control and SNVs with a homozygous genotype in the control were removed. All remaining SNVs with a minimum coverage of 8× in each sample were selected, including the common SNPs reported in dbSNP and the 1000 Genomes Project. We calculated a heterozygosity score (HS) for each SNV locus in each sample using the following formula: HS = 1 − absolute(Ca/(Ca + Cb) − 0.5)/0.5, where Ca and Cb stand for the number of reads of allele a and allele b, respectively. Then, a genome-wide scan of tumors based on the HS was performed. To ensure fidelity, an LOH region was identified only when the average HS of at least three consecutive loci was less than 0.1 and the probability (P) of observing such a pattern by chance was smaller than 0.0001 (one-way ANOVA). Finally, all LOH regions were double-checked manually and annotated using ANNOVAR. The LOH distribution ideogram was generated by Idiographica30.
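The locus selection and scanning steps described above might be sketched as follows in Python. This is an illustrative outline under stated assumptions, not the pipeline actually used. The `Locus` class, the function names, the HS threshold of 0.5 used to call the control heterozygous and the greedy way in which a run of loci is extended are all hypothetical choices made for this example. The one-way ANOVA test and the manual check are omitted, and the sketch handles a single chromosome and a single tumor sample.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Locus:
    pos: int                 # position on one chromosome
    blood: Tuple[int, int]   # (Ca, Cb) read counts in the control
    tumor: Tuple[int, int]   # (Ca, Cb) read counts in the tumor


def hs(ca: int, cb: int) -> float:
    """Heterozygosity score: 1 - |Ca/(Ca + Cb) - 0.5| / 0.5."""
    return 1.0 - abs(ca / (ca + cb) - 0.5) / 0.5


def select_informative(loci: List[Locus], min_cov: int = 8,
                       control_het_hs: float = 0.5) -> List[Locus]:
    """Keep loci covered >= min_cov in both samples and heterozygous
    in the control (control_het_hs is a hypothetical threshold)."""
    kept = []
    for locus in loci:
        if sum(locus.blood) < min_cov or sum(locus.tumor) < min_cov:
            continue  # insufficient coverage in at least one sample
        if hs(*locus.blood) < control_het_hs:
            continue  # control looks homozygous, so LOH is uninformative
        kept.append(locus)
    return kept


def scan_loh(loci: List[Locus], hs_cutoff: float = 0.1,
             min_loci: int = 3) -> List[Tuple[int, int, int]]:
    """Greedily extend runs of position-sorted loci while the mean tumor
    HS stays below hs_cutoff; report (start, end, n_loci) for runs of
    at least min_loci loci."""
    scores = [hs(*locus.tumor) for locus in loci]
    regions = []
    i, n = 0, len(scores)
    while i < n:
        total, best_end, j = 0.0, None, i
        while j < n:
            total += scores[j]
            if total / (j - i + 1) >= hs_cutoff:
                break  # adding this locus pushes the mean over the cutoff
            if j - i + 1 >= min_loci:
                best_end = j
            j += 1
        if best_end is None:
            i += 1
        else:
            regions.append((loci[i].pos, loci[best_end].pos, best_end - i + 1))
            i = best_end + 1
    return regions


# Made-up read counts for illustration only.
example = [
    Locus(100, (20, 20), (40, 0)),
    Locus(150, (40, 0), (40, 0)),    # homozygous in blood: removed
    Locus(200, (22, 18), (35, 1)),
    Locus(250, (10, 10), (3, 2)),    # tumor coverage below 8: removed
    Locus(300, (19, 21), (50, 0)),
    Locus(400, (20, 20), (20, 22)),  # heterozygosity retained in tumor
]
print(scan_loh(select_informative(example)))  # [(100, 300, 3)]
```

Filtering on the control first means that a low tumor HS reflects a loss relative to a heterozygous germline genotype rather than a locus that was homozygous to begin with.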