High prevalence of somatic PIK3CA and TP53 pathogenic variants in the normal mammary gland tissue of sporadic breast cancer patients revealed by duplex sequencing

Kostecka, Anna; Nowikiewicz, Tomasz; Olszewski, Paweł; Koczkowska, Magdalena; Horbacz, Monika; Heinzl, Monika; Andreou, Maria; Salazar, Renato; Mair, Theresa; Madanecki, Piotr; Gucwa, Magdalena; Davies, Hanna; Skokowski, Jarosław; Buckley, Patrick G.; Pęksa, Rafał; Śrutek, Ewa; Szylberg, Łukasz; Hartman, Johan; Jankowski, Michał; Zegarski, Wojciech; Tiemann-Boege, Irene; Dumanski, Jan P.; Piotrowski, Arkadiusz

doi:10.1038/s41523-022-00443-9


Article
Open access
Published: 29 June 2022


npj Breast Cancer volume 8, Article number: 76 (2022)


Abstract

The mammary gland undergoes hormonally stimulated cycles of proliferation, lactation, and involution. We hypothesized that these factors increase the mutational burden in glandular tissue and may explain the high breast cancer incidence in the general population as well as recurrent disease. Hence, we investigated DNA sequence variants in the normal mammary gland, tumor, and peripheral blood of 52 reportedly sporadic breast cancer patients. Targeted resequencing of 542 cancer-associated genes revealed subclonal somatic pathogenic variants of PIK3CA, TP53, AKT1, MAP3K1, CDH1, RB1, NCOR1, MED12, CBFB, TBX3, and TSHR in the normal mammary gland at considerable allelic frequencies (9 × 10⁻² to 5.2 × 10⁻¹), indicating clonal expansion. Further evaluation of the frequently damaged PIK3CA and TP53 genes by ultra-sensitive duplex sequencing revealed a diverse picture of multiple low-level subclonal hotspot pathogenic variants (present in 10⁻² to 10⁻⁴ alleles). Our results raise questions about the oncogenic potential of the non-tumorous mammary gland tissue that remains after breast-conserving surgery.


Introduction

Breast cancer accounts for 24% of all cancer cases in women worldwide and is the leading cause of cancer-related death in women¹. Most breast cancer cases (85–90%) are not associated with inherited mutations in high-penetrance genes, such as BRCA1 (MIM *113705) or BRCA2 (MIM *600185)^2,3. High-throughput genomics technologies have highlighted the molecular complexity of breast tumors, which has led to a molecular classification into four clinically meaningful subtypes: Luminal A, Luminal B, HER2-enriched, and basal-like^4,5. Large cohort studies of breast tumor samples have identified somatic driver mutations in key breast cancer-associated genes, such as PIK3CA (MIM *171834), TP53 (MIM *191170), MAP3K1 (MIM *600982), CDH1 (MIM *192090), AKT1 (MIM *164730), CBFB (MIM *121360), TBX3 (MIM *601621), and RB1 (MIM *614041)^6,7,8. To date, somatic driver pathogenic variants have been inferred only from tumors, without information on the mutational landscape and allelic frequencies of specific variants in the tissue of cancer origin, i.e., the normal tissue of the mammary gland. This is highly relevant because, under physiological conditions, mammary gland tissue is mitotically stimulated by hormones and undergoes cycles of intense proliferation and remodeling during puberty, pregnancy, and lactation⁹. Throughout life, the mammary gland is exposed to estrogen and its metabolites, which damage DNA through single- and double-strand breaks, mutations, or the formation of depurinating adducts^10,11,12. These stress conditions can promote the accumulation of post-zygotic, somatic genetic alterations that create a risk of malignant transformation. Indeed, several studies, including ours, have identified such changes in the uninvolved mammary gland of breast cancer patients, defined as histologically normal glandular tissue distant from the primary tumor site^13,14,15.
The most pronounced genetic alterations were identified in normal tissue from mastectomy patients. These findings had no direct clinical implications per se, because the affected tissue was removed completely during surgery, but they might suggest an increased mutational load in the contralateral breast. At the same time, current clinical management of breast cancer includes breast-conserving surgery (BCS), in which the tumor is removed and the normal breast tissue is spared, as one of the recommended treatments^16,17. The presumed presence of pathogenic genetic alterations in the seemingly normal mammary gland tissue that is not removed during BCS might create a risk of recurrence and could affect future treatment.

Hence, we aimed to screen at unprecedented sensitivity for subclonal somatic pathogenic genetic alterations in breast cancer-related genes in the normal mammary gland of sporadic breast cancer patients (study overview in Supplementary Fig. 1).

Our study demonstrates that structural chromosomal aberrations and clearly pathogenic point variants in crucial breast cancer driver genes are frequent in the normal mammary glandular tissue that remains after breast-conserving surgery.

Results

Patterns of chromosomal aberrations

We analyzed chromosomal rearrangements with SNP arrays to detect DNA copy number alterations (CNAs) as well as copy-number-neutral loss-of-heterozygosity events arising via mitotic recombination. In addition to matched samples of normal uninvolved mammary gland (UM) and primary tumor (PT), we included normal mammary gland samples from 26 age-matched women who underwent breast reduction surgery, who served as the control group (Supplementary Fig. 2). The spectrum of CNAs in the studied cohort is presented in Fig. 1. Hierarchical clustering revealed one cluster containing only PT samples, one containing only control samples, and four additional clusters with a mixed sample distribution (Supplementary Fig. 3). We also compared the type, size, and number of CNAs across the studied sample groups. The PTs stood out in this comparison (Wilcoxon test, p = 0.0094), with only slight differences between normal mammary tissue from breast cancer patients and the control cohort. Nonetheless, on a per-individual basis, the total number of CNAs, the number of gains, the size of deletions, and the size of CNAs in general discriminated between normal mammary tissue from breast cancer patients and the control cohort, surprisingly suggesting a more heterogeneous nature of the control samples (Supplementary Fig. 4).

**Fig. 1: Summary of Copy Number Alterations (CNAs) detected in the studied cohort.**

We identified recurrent chromosomal aberrations in UMs from sporadic breast cancer patients, such as losses of 1p, 16p11.2, and 9p21.3 and gains of 3q25.3, 4q13.1, 8q, and 20q, in line with previous studies^5,18. Loss of heterozygosity (LOH) at chromosome 8p, which is associated with poor outcome in breast cancer, was observed in matched UMs and PTs, but also in the normal mammary gland tissue of healthy controls¹⁹. In the UMs, we also observed events that frequently accompany 8p LOH: 9p loss and 8q gain. ERBB2 gains were observed exclusively in PT samples, except for one control mammary gland sample.

Subclonal somatic pathogenic variants in breast cancer driver genes present in the normal mammary gland tissue

We applied targeted DNA sequencing to sets of UM, peripheral blood (BL), and PT samples from 52 individuals diagnosed with sporadic breast cancer in order to distinguish germline from post-zygotic variants (Supplementary Table 1, Supplementary Table 2).

Four individuals (4/52, 7.7%) were heterozygous for a constitutional pathogenic variant in a known breast cancer-associated gene: c.5179A>T (p.Lys1727Ter) and c.181T>G (p.Cys61Gly) in BRCA1, and c.509_510del (p.Arg170fs) and c.354del (p.Thr119fs) in PALB2 and RAD50, respectively (Supplementary Table 3). These results are in line with other studies in which up to 10% of reportedly sporadic cases turn out to be hereditary after molecular testing^5,7. Individuals with germline pathogenic variants were excluded from further analysis, leaving a total of 48 clearly sporadic breast cancer patients. Constitutional variants of breast cancer-associated genes are listed in Supplementary Table 3.

Somatic variants fulfilling the cut-off criteria in known and candidate breast cancer-associated genes are summarized in Supplementary Tables 4 and 5, respectively. We identified 15 somatic pathogenic or likely pathogenic variants, or variants of uncertain significance with a predicted deleterious effect on the encoded protein, in the normal mammary gland tissue of 19% (9/48) of patients (Fig. 2). The affected genes encode tumor suppressors (TP53⁵, RB1²⁰, CDH1²¹) and oncogenes (PIK3CA²²), or are involved in the regulation of cell death (MAP3K1²³), DNA repair (AKT1²⁴, RAD50²⁵), translation (CBFB²⁶), gene expression (MED12²⁷, TSHR²⁸), and chromatin remodeling (NCOR1⁶). A detailed description of these genes in the context of breast cancer is provided in Supplementary Tables 6 and 7 and Supplementary Fig. 8. All of these variants except PIK3CA c.3140A>G (p.His1047Arg) were detected in BCS patients, in samples from the portion of tissue that was not removed during surgery.

**Fig. 2: Somatic variants detected in the uninvolved mammary gland (UM).**

Heterogeneity of PIK3CA and TP53 pathogenic variants revealed in the normal mammary gland tissue

Two driver genes dominate across all subtypes of invasive breast cancer: PIK3CA and TP53⁵. PIK3CA encodes p110α, the catalytic subunit of phosphatidylinositol 3-kinase (PI3K), which regulates cell proliferation and growth factor receptor signaling. Activating PIK3CA point variants are the most prevalent variants in breast tumors and have been confirmed to drive malignant transformation^22,29. We detected four hotspot PIK3CA somatic variants in the uninvolved mammary gland; all of them have been described in the COSMIC database and reported in breast tumors (Fig. 2, Table 2, Supplementary Fig. 5). The TP53 tumor suppressor encodes a transcription factor that is frequently inactivated in human malignancies, mostly through loss-of-function TP53 variants^30,31,32. We detected an Ile195Thr hotspot variant in the uninvolved mammary gland that affects the central DNA-binding domain (Fig. 2, Table 2, Supplementary Fig. 5).

To enhance the sensitivity and accuracy of rare variant detection, we employed duplex sequencing (Supplementary Fig. 7). We selected four individuals (P10, P28, P51, and P52) based on the presence of PIK3CA and TP53 hotspot variants in PT samples according to standard NGS data (Fig. 3) and screened their normal mammary gland samples for variants with high-sensitivity duplex sequencing. Ultra-deep targeted duplex sequencing of PIK3CA detected the low-level subclonal pathogenic variants c.1093G>A (p.Glu365Lys), c.1358A>G (p.Glu453Gly), c.1633G>A (p.Glu545Lys), c.1634A>C (p.Glu545Ala), c.2164G>A (p.Glu722Lys), and c.3140A>G (p.His1047Arg) in the uninvolved mammary gland samples of three individuals. The detected variants were located in known PIK3CA hotspot regions, have been reported in breast tumors in the COSMIC database, and have been functionally shown to alter PIK3CA activity^7,22 (Fig. 3, Supplementary Table 8). The TP53 screen not only confirmed the presence of the His168Leu variant but also revealed additional hotspot variants: c.527G>T (p.Cys176Phe), c.701A>G (p.Tyr234Cys), c.733G>A (p.Gly245Ser), c.745A>T (p.Arg249Trp), c.818G>A (p.Arg273His), and c.839G>C (p.Arg280Thr). Importantly, all of these pathogenic variants are located in the central DNA-binding domain, which is indispensable for the tumor-suppressive function of p53^7,32 (Fig. 3, Supplementary Table 8).
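The coding (c.) and protein (p.) coordinates listed above are linked by simple arithmetic: coding position N lies in codon ⌊(N − 1)/3⌋ + 1, at position ((N − 1) mod 3) + 1 within that codon. The Python sketch below is an illustrative consistency check, not part of the authors' pipeline; the helper names `codon_of` and `check_substitution` are hypothetical, and only simple single-nucleotide substitutions in HGVS c. notation are handled.

```python
import re
from typing import Tuple

# Simple coding substitution in HGVS c. notation, e.g. "c.3140A>G".
# Spaces are stripped first, so "c.3140 A > G" is accepted as well.
SUBST_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_of(cds_pos: int) -> Tuple[int, int]:
    """Return (codon number, position 1-3 within the codon) for a 1-based CDS position."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def check_substitution(hgvs_c: str, protein_pos: int) -> bool:
    """True if the c. substitution falls inside the codon named by the p. change."""
    m = SUBST_RE.match(hgvs_c.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a simple coding substitution: {hgvs_c!r}")
    codon, _ = codon_of(int(m.group(1)))
    return codon == protein_pos


# Pairs taken from the text (c. change, protein residue number).
for hgvs_c, residue in [("c.3140A>G", 1047), ("c.1633G>A", 545),
                        ("c.745A>T", 249), ("c.839G>C", 280)]:
    print(hgvs_c, residue, check_substitution(hgvs_c, residue))
```

For example, c.3140A>G maps to the second base of codon 1047, consistent with p.His1047Arg.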

**Fig. 3: Somatic *PIK3CA* and *TP53* variants detected in the uninvolved mammary gland (UM) and primary tumor (PT) samples.**

Discussion

Post-zygotic variants contribute to the genetic heterogeneity of an individual, which is reflected in a mosaic pattern of genetic alterations among the cells that make up the human body³³. The mammary gland remains mitotically active throughout life and, under physiological conditions, is exposed to DNA-damaging estrogen metabolites¹¹. Subclonal somatic genetic changes acquired during life pose a risk of cancer development. Hence, we hypothesized that these factors can increase the mutational burden in the mammary gland. Other studies have reported genomic and transcriptomic changes in the normal mammary gland and suggested that histological normalcy does not exclude pathological biological changes^34,35,36. However, these studies were carried out on normal mammary tissue obtained from mastectomies or on cancer-adjacent samples, which limited the clinical relevance of their findings. In this study, we screened for somatic genetic changes in the normal mammary gland tissue of sporadic cancer patients, including tissue biopsies from parts of the breast that would normally not be removed during breast-conserving surgery. We identified widespread genomic structural rearrangements that affect gene dosage, as well as somatic subclonal sequence variants in known breast cancer-associated genes that control proliferation, cell death, metastasis, and genome integrity: PIK3CA, TP53, AKT1, MAP3K1, CDH1, RB1, NCOR1, MED12, CBFB, TBX3, and TSHR (Supplementary Fig. 8). These variants were present in a considerable percentage of cells, suggesting that they arose early in mammary gland development or that the carrier cells gained a growth advantage and underwent clonal expansion. Furthermore, ultra-sensitive duplex sequencing revealed a heterogeneous mosaic landscape of low-level subclonal pathogenic variants in the main breast cancer drivers, PIK3CA and TP53, in the normal mammary gland tissue.
Notably, the spectrum of these variants differed markedly between tumor and normal mammary tissue from the same individuals, which suggests multiple independent mutational events in the mammary gland (Fig. 4).

**Fig. 4: Oncogenic potential of the normal mammary tissue.**

In parallel to sequence variants, we identified recurrent CNAs in the mammary gland of breast cancer patients, but also in the age-matched control group (Fig. 1). This allowed us to detect subtle but noticeable differences between the groups in the total number and length of all CNAs detected per individual (Supplementary Fig. 4). Because the breast cancer and control groups were age-matched, the mammary gland tissue in both groups was exposed to cycles of estrogen for a comparable time, which may explain the accumulation of copy number alterations in both cohorts.

The most important finding from this part of our study is that normal mammary tissue from cancer patients showed DNA copy number alterations as well as evidence of copy-number-neutral loss of heterozygosity. Together with damaging sequence variants, these genomic alterations recapitulate alternative routes of gene inactivation that are typically observed in malignant tumors but not in benign tissue. In this context, our study demonstrates that profiling normal tissue provides direct information on the very origin of the disease and may improve the choice of treatment as well as aid further clinical management of the affected individuals^37,38,39. This contrasts with typical molecular profiling studies, which rely on limited retrospective information inferred from tumors.

PIK3CA and TP53 are the most frequently mutated driver genes in breast malignancies, and accordingly, the most common changes detected in our study were in PIK3CA^5,40. Soysal et al. screened for somatic variants in benign biopsies from patients who subsequently developed breast cancer. PIK3CA and TP53 variants were the most prevalent changes in tumor samples but were not detected in the benign biopsies, possibly because standard massively parallel sequencing has limited sensitivity for rare variants⁴¹. To overcome this limitation, we implemented duplex sequencing to detect PIK3CA and TP53 variants present at very low frequency in the normal mammary gland samples. In the uninvolved mammary gland tissue, we detected known hotspot pathogenic variants that might activate the PIK3CA kinase or target the DNA-binding domain of the TP53 tumor suppressor, disabling its function.

We confirmed that the variants observed in tumor samples were also present in the normal glandular tissue, albeit at lower levels than in the corresponding tumors. Strikingly, in the same samples these changes were accompanied by other PIK3CA and TP53 pathogenic variants that were present in the normal tissue but not in the corresponding tumors. This may point to potential sites of secondary tumor formation. Notably, the majority of somatic pathogenic variants, including these PIK3CA and TP53 hotspot alterations, were found in normal mammary gland samples from tissue that was not removed during breast-conserving surgery, rather than in samples from radical mastectomy patients.

At the same time, the PIK3CA and TP53 variant spectra in the normal glandular tissue were more similar to those reported in a cancer-oriented database (COSMIC) than to those in the general population (gnomAD), suggesting that the studied UM tissues reflect the repertoire of somatic variants seen in tumor samples (Supplementary Fig. 9, Supplementary Fig. 10, Supplementary Table 9). However, because only four individuals were included in the duplex sequencing analysis, these conclusions should be interpreted with caution. Further studies in a larger, well-characterized cohort of sporadic breast cancer patients are needed to understand how specific variants arise and expand during life. Nevertheless, we demonstrate here that an ultra-sensitive duplex sequencing approach might be useful for detecting very low-frequency somatic mosaicism in different tissue samples, with potential clinical implications for molecular diagnostics and prognosis.

After surgical intervention, breast cancer patients remain under clinical surveillance, with a recommended yearly mammogram and a physical examination every 3–4 months for the first two years after surgery⁴². The current diagnostic approach has focused mainly on identifying constitutional pathogenic variants in known breast cancer-associated genes, in order to recognize early those individuals who are at higher risk of developing breast cancer and/or who could be offered personalized targeted therapy. However, over 80% of all breast cancer cases are not associated with inherited changes¹⁷.

Our results demonstrate a complex landscape of mutational burden in the seemingly normal mammary glandular tissue and indicate an oncogenic potential of the tissue that is not removed during surgery. This study provides a rationale for thorough genetic and clinical surveillance of sporadic breast cancer patients who underwent breast-conserving surgery. Including molecular evaluation of the normal glandular tissue of sporadic breast cancer patients could benefit personalized patient care.

Methods

Patient samples and DNA isolation

We analyzed samples from 52 patients diagnosed with reportedly sporadic breast cancer who did not receive neoadjuvant therapy, with an emphasis on breast-conserving surgery (2/3 of the patients studied). In total, 204 uninvolved mammary gland (UM), primary tumor (PT), skin (SK), and peripheral blood (BL) samples were collected via the Oncology Centre in Bydgoszcz and the University Clinical Centre in Gdansk, with the approval of the bioethics committee of the Medical University of Gdansk (MUG). Written informed consent was obtained from all participants. PT, UM, SK, and BL samples from each patient were collected and stored at −80 °C until DNA isolation. An overview of the sample processing workflow is presented in Supplementary Fig. 1. The histological subtype and tumor tissue content of each PT sample were evaluated by pathologists according to the current American Joint Committee on Cancer guidelines⁴³. Tumor samples with less than 50% neoplastic cell content were excluded. The normal mammary gland was preferably sampled from the quadrant opposite the primary tumor site, with a mandatory minimum distance of 3 cm from the tumor in each case, to exclude potential contamination with residual tumor cells. These tissue samples were also evaluated by pathologists to confirm normal histology (Table 1, Supplementary Table 1). All normal mammary gland samples from patients who underwent breast-conserving surgery were derived from the portion of tissue that remained in the patient’s body after surgery. Solid tissues were homogenized in lysis buffer, Proteinase K was added, and samples were incubated at 55 °C for 48 h. DNA was isolated from UM, PT, and SK tissue lysates by phenol–chloroform extraction as previously described¹³. Blood DNA was extracted with the QIAamp DNA Blood Mini Kit according to the manufacturer’s protocol (Qiagen, Germantown, MD).

Table 1 Summarized clinicopathological features of sporadic breast cancer patient cohort.


Copy number alteration detection

SNP array genotyping of UM and PT samples was performed on the Illumina Infinium Global Screening Array according to the manufacturer’s recommendations (Illumina, San Diego, CA). SNP genotyping data from mammary gland tissues of 26 age-matched women who underwent breast reduction surgery were used as control samples (Supplementary Fig. 2). Genotyping data were analyzed using Nexus Copy Number software version 10.0 (BioDiscovery). Sample quality control was performed as described previously^14,44. Briefly, samples with a Log R Ratio (LRR) standard deviation > 0.2 were flagged as poor quality and excluded from the analysis. The analysis was performed with default settings, except that the significance threshold for Copy Number Alteration (CNA) calling was decreased to 5 × 10⁻¹³ (default 5 × 10⁻⁷), the minimal number of probes per segment was increased to 10 (default 3), the high gain and gain thresholds were set to 0.49 and 0.14 (default 0.41 and 0.06), corresponding to approximately 40% and 10% change, respectively, and the loss and high loss thresholds were set to −0.16 and −0.74 (default −0.09 and −1.1), corresponding to approximately −10% and −40% change, respectively. Hierarchical clustering was performed using the Ward2 algorithm⁴⁵.
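The gain and loss thresholds above are log2 copy number ratios, and the quoted percentages follow from change = 2^t − 1 (e.g., 2^0.49 − 1 ≈ 0.40). The minimal Python sketch below reproduces that conversion and the LRR standard deviation filter; it assumes a plain log2 scale, the helper names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, not something the text or Nexus documentation is quoted on here.

```python
import statistics

# Thresholds from the text, expressed as log2 ratios.
THRESHOLDS = {"high gain": 0.49, "gain": 0.14, "loss": -0.16, "high loss": -0.74}


def percent_change(log2_ratio: float) -> float:
    """Copy number change implied by a log2 ratio, in percent (2**t - 1)."""
    return (2.0 ** log2_ratio - 1.0) * 100.0


def lrr_qc_passes(lrr_values, max_sd: float = 0.2) -> bool:
    """Keep a sample only if the SD of its Log R Ratio values is <= max_sd.
    Uses the sample standard deviation (an assumption)."""
    return statistics.stdev(lrr_values) <= max_sd


for label, t in THRESHOLDS.items():
    print(f"{label:>9}: log2 ratio {t:+.2f} -> {percent_change(t):+.1f}%")
```

Rounded, the four thresholds correspond to about +40%, +10%, −10%, and −40%, matching the approximate values given in the text.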

Statistical analysis

All statistical analyses were carried out using R version 3.6.2 and the stats package. The pheatmap and ggpubr packages were used for plotting. The statistical significance of differences between two groups was tested using the Mann–Whitney U test. Differences were considered significant at a two-sided p < 0.05.
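For readers without R, a two-sided Mann–Whitney U test can be sketched in plain Python as below. This is an illustrative reimplementation, not the authors' code: it uses mid-ranks for ties and the tie-corrected normal approximation without continuity correction, whereas R's `wilcox.test` computes an exact p-value by default for small samples without ties and applies a continuity correction otherwise, so p-values can differ slightly. The function names are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with a tie-corrected normal approximation.
    Returns (U statistic for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    r1 = sum(midranks(pooled)[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    # Variance of U, reduced for each group of tied values (t**3 - t terms).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

Usage with placeholder inputs: `u, p = mann_whitney_u(group_a, group_b)`, where each group would hold one value per individual (for example, the number of CNAs).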

Targeted DNA resequencing

The targeted DNA sequencing panel was designed with the Roche NimbleDesign online tool (Roche, https://hyperdesign.com/). The panel included the exons, with ±50 kbp flanking regions, of 542 genes selected based on an in-house database and literature research (Supplementary Table 2). Sequencing libraries were prepared for sets of UM, BL, and PT samples with the capture-based Roche SeqCap EZ system according to the manufacturer’s protocol (Roche, Pleasanton, CA), followed by 150 bp paired-end sequencing on Illumina NextSeq550 and MiniSeq instruments (Illumina, San Diego, CA). Sequencing reads were aligned to the human reference genome (hg38) with the Burrows–Wheeler Aligner (BWA; http://bio-bwa.sourceforge.net/)⁴⁶. Platypus v.0.8.1.1 (https://www.rdm.ox.ac.uk/research/lunter-group/lunter-group/) was used for variant calling⁴⁷. Variants with poor mapping quality (<30), variants supported by high-quality bases (≥30) in fewer than five reads, and variants outside the targeted regions were excluded from the analysis. Variants were annotated with VarAFT software (version 2.17-2)⁴⁸.

For variant selection, only variants with a sequencing depth ≥ 30 and a tissue allele frequency ≥ 0.07 were included in the analysis. All truncating variants were included. Non-truncating variants were filtered by their clinical significance in the ClinVar database (as of June 2021): variants classified as Pathogenic, Likely Pathogenic, Conflicting interpretations of pathogenicity, risk factor, or drug response were included. The remaining non-truncating variants were included based on their frequency in the general population: variants with a minor allele frequency (MAF) ≤ 0.001 across all gnomAD populations (“popmax”) or absent from the database were included. For in silico splicing analysis, the splice prediction algorithms SSF, MaxEntScan, and NNSplice embedded in Alamut Visual software (version 2.14) were used. The variants described in this study were classified according to the American College of Medical Genetics and Genomics and Association for Molecular Pathology recommendations⁴⁹. Based on the literature^2,7,30,50,51, we selected 155 breast cancer-associated genes that were the primary focus of variant analysis (Supplementary Table 2). Somatic variants presented in Fig. 2 and Table 2 were confirmed by Sanger sequencing or High Resolution Melting analysis (Supplementary Fig. 5). Lollipop plots showing variant positions were prepared from images generated with the ProteinPaint application⁵².
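The selection rules above can be expressed as a single predicate. The Python sketch below is one reading of the text, not the authors' implementation: the `Variant` record and its fields are hypothetical, whether a variant is truncating is assumed to be precomputed, and ClinVar classes are matched as plain strings.

```python
from dataclasses import dataclass
from typing import Optional

# ClinVar significance classes kept for non-truncating variants (from the text).
KEEP_CLINVAR = {
    "Pathogenic",
    "Likely Pathogenic",
    "Conflicting interpretations of pathogenicity",
    "risk factor",
    "drug response",
}


@dataclass
class Variant:
    depth: int                             # sequencing depth at the site
    allele_freq: float                     # variant allele frequency in the tissue
    truncating: bool                       # hypothetical flag: nonsense, frameshift, etc.
    clinvar: Optional[str] = None          # ClinVar significance; None if not in ClinVar
    gnomad_popmax: Optional[float] = None  # gnomAD popmax MAF; None if absent


def passes_filters(v: Variant) -> bool:
    # Hard cut-offs applied to every variant.
    if v.depth < 30 or v.allele_freq < 0.07:
        return False
    # All truncating variants are kept.
    if v.truncating:
        return True
    # Non-truncating variants: keep the selected ClinVar classes ...
    if v.clinvar in KEEP_CLINVAR:
        return True
    # ... otherwise keep variants that are rare (popmax MAF <= 0.001) or absent from gnomAD.
    return v.gnomad_popmax is None or v.gnomad_popmax <= 0.001
```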

Table 2 Pathogenicity classification of somatic variants detected in the uninvolved mammary gland (UM) samples.


Duplex sequencing

UM, PT, BL, and SK samples of four individuals (P10, P28, P51, and P52) were selected for variant detection by duplex sequencing based on the presence of PIK3CA or TP53 hotspot variants in PT but not UM tissue according to standard NGS. The protocols used here are based on those described in more detail in Salazar et al.⁵³.
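The general principle of duplex sequencing is that PCR duplicates sharing a molecular tag are first collapsed into a single-strand consensus sequence (SSCS) per strand, and a base is then kept in the duplex consensus sequence (DCS) only where the consensuses of both complementary strands agree. The Python sketch below illustrates that principle only; the 0.7 agreement threshold, the function names, and the assumption that bottom-strand reads are already in top-strand orientation are hypothetical simplifications, not parameters of the pipeline used here.

```python
from collections import Counter
from typing import List, Optional


def single_strand_consensus(reads: List[str], min_agree: float = 0.7) -> Optional[str]:
    """Collapse PCR duplicates from ONE strand of one tagged molecule.
    Each position takes the majority base; if it is supported by fewer than
    min_agree of the reads, the position is masked as 'N'."""
    if not reads:
        return None
    length = min(len(r) for r in reads)
    bases = []
    for i in range(length):
        base, count = Counter(r[i] for r in reads).most_common(1)[0]
        bases.append(base if count / len(reads) >= min_agree else "N")
    return "".join(bases)


def duplex_consensus(top: Optional[str], bottom: Optional[str]) -> Optional[str]:
    """Keep a base only where both strand consensuses agree (and neither is 'N').
    Assumes the bottom-strand consensus is already in top-strand orientation."""
    if top is None or bottom is None:
        return None
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(top, bottom))


# An error copied into every read of one strand family (e.g. an early PCR error)
# survives single-strand consensus but is masked in the duplex consensus.
top = single_strand_consensus(["ACGTA", "ACGTA", "ACGTA"])
bottom = single_strand_consensus(["ACCTA", "ACCTA", "ACCTA"])
print(top, bottom, duplex_consensus(top, bottom))
```

Requiring agreement between two independently copied strands is what lets true low-frequency variants be separated from polymerase and sequencing errors that affect only one strand.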

Random DNA shearing and size selection

DNA was ultrasonicated for 10 min at ≤10 °C in a Bandelin Sonorex Super RK 102 H ultrasonic bath, yielding fragments of 275 bp on average. A double size selection was performed using Sera-Mag Select beads (Cytiva) to exclude fragments outside the 100–400 bp range. The size selection was performed on a mixture of 50 µl of sonicated DNA (2 µg), 20 µl of 10x CutSmart buffer (NEB), and 47.6 µl of PCR-grade water, with 0.7 volumes of beads. The reaction was mixed thoroughly by pipetting and incubated at room temperature (RT) for 10 min. Tubes were then placed on a magnet for 5 min, and 190 µl of the supernatant was transferred to a fresh tube. Next, beads were added to reach a cumulative total of 2.5 volumes (including the initial bead addition) and mixed by pipetting. The mixture was incubated at RT for 10 min. Tubes were placed on a magnet and the supernatant was discarded. The beads were washed twice with 80% ethanol and air-dried at RT, and 23 µl of PCR-grade water was added to resuspend them by pipetting. The resuspended beads were incubated at RT for 5 min and allowed to stand at RT for a further 5 min; the tubes were then placed on a magnet, and the clear supernatant containing the size-selected DNA was transferred to a new tube.
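The bead arithmetic behind this double size selection can be made explicit. The Python sketch below assumes that "volumes" refers to the 117.6 µl DNA mixture (50 + 20 + 47.6 µl) and that the 2.5-volume target is cumulative, as the text states; it ignores the roughly 10 µl of supernatant left behind after the first step, so the actual final ratio is slightly lower. The function name is hypothetical.

```python
def double_size_selection(sample_ul: float, first_ratio: float, total_ratio: float):
    """Return (first, second) bead volumes in µl for a two-step size selection.

    first_ratio: beads added in step 1, as a multiple of the sample volume
                 (large fragments bind to these beads and are discarded).
    total_ratio: cumulative ratio reached after step 2, counting the bead buffer
                 from step 1 that is carried over in the supernatant.
    Simplification: assumes the entire supernatant is carried over.
    """
    first = first_ratio * sample_ul
    second = total_ratio * sample_ul - first
    return first, second


sample_ul = 50 + 20 + 47.6  # sonicated DNA + 10x CutSmart buffer + water (µl)
first, second = double_size_selection(sample_ul, first_ratio=0.7, total_ratio=2.5)
print(f"step 1: {first:.2f} ul beads (reaction ~{sample_ul + first:.0f} ul)")
print(f"step 2: {second:.2f} ul additional beads")
```

Under these assumptions, the first addition is about 82.3 µl, bringing the reaction to roughly 200 µl (consistent with transferring 190 µl of supernatant), and the second addition is about 211.7 µl. The same ratio arithmetic gives 1.2 × 96.5 µl = 115.8 µl for the single-step post-ligation clean-up.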

End-repair, A-tailing, adapter ligation, and bead purification

Size-selected genomic DNA was end-repaired and A-tailed using the NEBNext® Ultra™ II End Repair/dA-Tailing Module (New England Biolabs), followed by adapter ligation with the NEBNext® Ultra™ II Ligation Module (New England Biolabs), both according to the manufacturer’s instructions. The adapters ligated to the A-tailed DNA were synthesized as previously described (Adapter 2)⁵³. The ligation reaction was then purified using 1.2 volumes of Sera-Mag Select beads (Cytiva): 96.5 µl of sample was thoroughly mixed with 115.8 µl of beads by pipetting and incubated at RT for 10 min. Tubes were placed on a magnet and the supernatant was discarded. The beads were washed twice with 80% ethanol, dried at RT, and resuspended in 23 µl of PCR-grade water by pipetting. After the resuspended beads had been incubated at RT for 5 min, the tubes were placed on a magnet and the clear supernatant containing the purified DNA was transferred to a fresh tube.

Pre-capture amplification

Ligated fragments were amplified with the KAPA HiFi HotStart ReadyMix PCR Kit (KAPA Biosystems). Reaction components, primer sequences, and cycling conditions are listed in Supplementary Table 10. For libraries with input DNA above 240 ng, two parallel reactions were prepared and pooled just before purification. The first amplification step consisted of 6 or 12 cycles of single-primer extension, followed by the addition of the NEBNext Universal primer and a standard PCR amplification of 2 cycles. PCR products were purified with 1.2 volumes of Sera-Mag Select beads as described above, followed by two rounds of targeted capture to enrich the templates of interest.

Targeted captures and post-capture amplification

Two rounds of targeted capture followed by PCR amplification were performed as described in Salazar et al., with minor modifications to the post-capture amplification (Supplementary Table 10)⁵³. The biotinylated probes used to target the exonic regions of TP53 and PIK3CA are detailed in Supplementary Table 10.

Duplex sequencing data analysis

FASTQ files were analyzed on the Galaxy platform (on a private server provided by the Medical University of Gdansk) and first processed with Trim Galore! to trim Illumina-specific adapter sequences, including the barcode and spacer sequence at the 3' end of the raw reads. Next, the reads were analyzed with a duplex sequencing (DS)-specific pipeline that includes an error correction tool⁵⁴. After the duplex consensus sequences (DCS) were created, 5 nucleotides were trimmed from both the 5' and 3' ends. The trimmed consensus sequences were then aligned to the human genome assembly hg38 with BWA-MEM, and indels were left-aligned with BamLeftAlignIndels. To avoid false-positive variants arising from partial adapter sequences and barcodes at the 3' end of the consensus sequence that were not removed by the first adapter trimming step, the clipOverlap tool from the BamUtil package was applied. Variant calling was then performed with LoFreq. Finally, the variants (substitutions only) were inspected and assigned to tiers using the Variant Analyzer⁵⁵. Variants with a DCS coverage below 500 and variants outside the probe regions were discarded, and only Tier 1 variants, together with Tier 2 variants detected more than once, were kept. For more details on this analysis, see Povysil et al.⁵⁵. The full Galaxy workflow is publicly available: https://usegalaxy.org/u/jku-itb-lab/w/gdansk-paper---galaxy-workflow.
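The final filtering step can be sketched as follows. The record layout, the tier string format (e.g., "1.1", "2.3"), the probe-region representation, and the modeling of "detected more than once" as a simple counter are all assumptions for illustration; Povysil et al. and the Variant Analyzer documentation give the actual tier definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DSVariant:
    position: int        # genomic position (1-based)
    dcs_coverage: int    # duplex consensus (DCS) depth at this position
    tier: str            # tier label, assumed to look like "1.1" or "2.3"
    times_detected: int  # hypothetical: number of independent observations


def in_probe_regions(pos: int, regions: List[Tuple[int, int]]) -> bool:
    """regions are (start, end) pairs, 1-based and inclusive (an assumption)."""
    return any(start <= pos <= end for start, end in regions)


def keep_variant(v: DSVariant, regions: List[Tuple[int, int]], min_dcs: int = 500) -> bool:
    # Discard low-coverage variants and anything outside the capture probes.
    if v.dcs_coverage < min_dcs or not in_probe_regions(v.position, regions):
        return False
    major_tier = v.tier.split(".")[0]
    # Keep every Tier 1 call; keep Tier 2 only if it was seen more than once.
    return major_tier == "1" or (major_tier == "2" and v.times_detected > 1)
```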

The variant frequency was calculated by dividing the number of DCS carrying the variant by the DCS coverage at the variant position in the library in which it was detected. The frequency of each substitution type (e.g., A > C) was calculated as its count divided by the number of sequenced reference alleles of that type (e.g., the frequency of A's in the reference sequence multiplied by the sum of the mean DCS coverage for that library). The relative count is the count for each substitution type divided by the sum of all variants occurring within the tissue.
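The three quantities defined above can be sketched as plain functions. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and "the frequency of A's in the reference sequence multiplied by the sum of the mean DCS coverage" is read here as the number of occurrences of the reference base in the targeted sequence times the summed per-library mean DCS coverage, i.e. an estimate of how many reference bases of that type were sequenced.

```python
from collections import Counter
from typing import Dict, Iterable


def variant_frequency(variant_dcs: int, dcs_coverage: int) -> float:
    """DCS carrying the variant divided by DCS coverage at that position."""
    if dcs_coverage <= 0:
        raise ValueError("DCS coverage must be positive")
    return variant_dcs / dcs_coverage


def substitution_type_frequency(type_counts: Dict[str, int],
                                reference_seq: str,
                                mean_dcs_coverages: Iterable[float]) -> Dict[str, float]:
    """Count of each substitution type (e.g. 'A>C') divided by the estimated
    number of sequenced reference bases of that type (assumed interpretation):
    occurrences of the reference base x summed mean DCS coverage."""
    base_counts = Counter(reference_seq.upper())
    total_coverage = sum(mean_dcs_coverages)
    freqs = {}
    for sub_type, count in type_counts.items():
        ref_base = sub_type.split(">")[0].strip().upper()
        sequenced = base_counts[ref_base] * total_coverage
        freqs[sub_type] = count / sequenced if sequenced else float("nan")
    return freqs


def relative_counts(type_counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each substitution type among all variants in a tissue."""
    total = sum(type_counts.values())
    return {k: v / total for k, v in type_counts.items()} if total else {}
```

With a toy target `"AACGT"` (two A's) and a mean DCS coverage of 2000, four A > C events give a substitution-type frequency of 4 / (2 × 2000) = 10⁻³, whereas the relative count depends only on how the variants are distributed across types.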

Data availability

Raw microarray, NGS, and duplex sequencing data are available upon request from the European Genome-phenome Archive (EGA), study ID EGAS00001005698.

References

1. Heer, E. et al. Global burden and trends in premenopausal and postmenopausal breast cancer: a population-based study. Lancet Glob. Health 8, e1027–e1037 (2020).
2. Coughlin, S. S. Epidemiology of breast cancer in women. Adv. Exp. Med. Biol. 1152, 9–29 (2019).
3. Kleibl, Z. & Kristensen, V. N. Women at high risk of breast cancer: molecular characteristics, clinical presentation and management. Breast 28, 136–144 (2016).
4. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS 98, 10869–10874 (2001).
5. Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
6. Stephens, P. J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400–404 (2012).
7. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016).
8. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
9. Macias, H. & Hinck, L. Mammary gland development. Wiley Interdiscip. Rev. Dev. Biol. 1, 533–557 (2012).
10. Dall, G. V. & Britt, K. L. Estrogen effects on the mammary gland in early and late life and breast cancer risk. Front. Oncol. 7, 1–10 (2017).
11. Almeida, M., Soares, M., Fonseca-Moutinho, J., Ramalhinho, A. C. & Breitenfeld, L. Influence of estrogenic metabolic pathway genes polymorphisms on postmenopausal breast cancer risk. Pharmaceuticals 14, 1–9 (2021).
12. Yager, J. D. & Davidson, N. E. Estrogen carcinogenesis in breast cancer. N. Engl. J. Med. 354, 270–282 (2006).
13. Ronowicz, A. et al. Concurrent DNA copy-number alterations and mutations in genes related to maintenance of genome stability in uninvolved mammary glandular tissue from breast cancer patients. Hum. Mutat. 36, 1088–1099 (2015).
14. Forsberg, L. A. et al. Signatures of post-zygotic structural genetic aberrations in the cells of histologically normal breast tissue that can predispose to sporadic breast cancer. Genome Res. 25, 1521–1535 (2015).
15. Danforth, D. N. Genomic changes in normal breast tissue in women at normal risk or at high risk for breast cancer. Breast Cancer Basic Clin. Res. 10, 109–146 (2016).
16. Waks, A. G. & Winer, E. P. Breast cancer treatment: a review. JAMA 321, 288–300 (2019).
17. Loibl, S., Poortmans, P., Morrow, M., Denkert, C. & Curigliano, G. Breast cancer. Lancet 397, 1750–1769 (2021).
18. Parris, T. Z. et al. Clinical implications of gene dosage and gene expression patterns in diploid breast carcinoma. Clin. Cancer Res. 16, 3860–3874 (2010).
19. Cai, Y. et al. Loss of chromosome 8p governs tumor progression and drug response by altering lipid metabolism. Cancer Cell 29, 751–766 (2016).
20. Witkiewicz, A. K. & Knudsen, E. S. Retinoblastoma tumor suppressor pathway in breast cancer: prognosis, precision medicine, and therapeutic interventions. Breast Cancer Res. 16, 207 (2014).
21. Christgen, M. et al. Lobular breast cancer: clinical, molecular and morphological characteristics. Pathol. Res. Pract. 212, 583–597 (2016).
22. Martínez-Sáez, O. et al. Frequency and spectrum of PIK3CA somatic mutations in breast cancer. Breast Cancer Res. 22, 1–9 (2020).
23. Pham, T. T., Angus, S. P. & Johnson, G. L. MAP3K1: Genomic alterations in cancer and function in promoting cell survival or apoptosis. Genes Cancer 4, 419–426 (2013).
24. Plo, I. et al. AKT1 inhibits homologous recombination by inducing cytoplasmic retention of BRCA1 and RAD51. Cancer Res. 68, 9404–9412 (2008).
25. Fagan-Solis, K. D. et al. A P53-independent DNA damage response suppresses oncogenic proliferation and genome instability. Cell Rep. 30, 1385–1399.e7 (2020).
26. Malik, N. et al. The transcription factor CBFB suppresses breast cancer through orchestrating translation and transcription. Nat. Commun. 10, 1–15 (2019).
27. Chang, H. Y. et al. MED12, TERT and RARA in fibroepithelial tumours of the breast. J. Clin. Pathol. 73, 51–56 (2020).
28. Liu, Y. C., Yeh, C. T. & Lin, K. H. Molecular functions of thyroid hormone signaling in regulation of cancer progression and anti-apoptosis. Int. J. Mol. Sci. 20, 1–27 (2019).
29. Thorpe, L. M., Yuzugullu, H. & Zhao, J. J. PI3K in cancer: divergent roles of isoforms, modes of activation and therapeutic targeting. Nat. Rev. Cancer 15, 7–24 (2015).
30. Vogelstein, B. et al. Cancer genome landscapes. Science 340, 1546–1558 (2013).
31. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
32. Baugh, E. H., Ke, H., Levine, A. J., Bonneau, R. A. & Chan, C. S. Why are there hotspot mutations in the TP53 gene in human cancers? Cell Death Differ. 25, 154–160 (2018).
33. Mustjoki, S. & Young, N. S. Somatic mutations in “benign” disease. N. Engl. J. Med. 384, 2039–2052 (2021).
34. Gadaleta, E. et al. Characterization of four subtypes in morphologically normal tissue excised proximal and distal to breast cancer. npj Breast Cancer 6, 38 (2020).
35. Aran, D. et al. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat. Commun. 8, 1–13 (2017).
36. Troester, M. A. et al. DNA defects, epigenetics, and gene expression in cancer-adjacent breast: a study from the cancer genome atlas. npj Breast Cancer 2, 16007 (2016).
37. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
38. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
39. Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
40. Berger, A. C. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705.e9 (2018).
41. Soysal, S. D. et al. Genetic alterations in benign breast biopsies of subsequent breast cancer patients. Front. Med. 6, 1–6 (2019).
42. Gradishar, W. J. et al. NCCN clinical practice guidelines in Oncology. Breast Cancer Version 4. 2021. Natl. Compr. Cancer Netw. 16, 310–320 (2021).
43. Amin, M. B. et al. AJCC Cancer Staging Manual (Springer International Publishing, 2017).
44. Rydzanicz, M. et al. Variable degree of mosaicism for tetrasomy 18p in phenotypically discordant monozygotic twins—diagnostic implications. Mol. Genet. Genom. Med. 9, 1–9 (2021).
45. Murtagh, F. & Legendre, P. Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31, 274–295 (2014).
46. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
47. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
48. Desvignes, J. P. et al. VarAFT: A variant annotation and filtration system for human next generation sequencing data. Nucleic Acids Res. 46, W545–W553 (2018).
49. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
50. Polyak, K. & Metzger Filho, O. SnapShot: breast cancer. Cancer Cell 22, 562–562.e1 (2012).
51. Mahdavi, M. et al. Hereditary breast cancer; genetic penetrance and current status with BRCA. J. Cell. Physiol. 234, 5741–5750 (2019).
52. Zhou, X. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat. Genet. 48, 4–6 (2015).
53. Salazar, R. et al. Discovery of an unusually high number of de novo mutations in sperm of older men using duplex sequencing. Genome Res. 32, 499–511 (2022).
54. Stoler, N. et al. Family reunion via error correction: an efficient analysis of duplex sequencing data. BMC Bioinform. 21, 96 (2020).
55. Povysil, G. et al. Increased yields of duplex sequencing data by a series of quality control tools. NAR Genom. Bioinform. 3, lqab002 (2021).


Acknowledgements

This work was supported by the National Science Center, Poland grant (award no. UMO-2015/19/B/NZ2/03216) to A.P. and partially funded by the Foundation for Polish Science (FNP) under the International Research Agendas Program (grant number MAB/2018/6) to J.P.D. and A.P., co-financed by the European Union under the European Regional Development Fund.

Author information

These authors contributed equally: Anna Kostecka, Tomasz Nowikiewicz.

Authors and Affiliations

Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
Anna Kostecka, Magdalena Koczkowska, Piotr Madanecki, Magdalena Gucwa & Arkadiusz Piotrowski
3P Medicine Lab, Medical University of Gdansk, Gdansk, Poland
Anna Kostecka, Paweł Olszewski, Magdalena Koczkowska, Monika Horbacz, Maria Andreou, Jan P. Dumanski & Arkadiusz Piotrowski
Department of Surgical Oncology, Ludwik Rydygier’s Collegium Medicum UMK, Bydgoszcz, Poland
Tomasz Nowikiewicz, Ewa Śrutek, Michał Jankowski & Wojciech Zegarski
Department of Breast Cancer and Reconstructive Surgery, Prof. F. Lukaszczyk Oncology Center, Bydgoszcz, Poland
Tomasz Nowikiewicz
Institute of Biophysics, Johannes Kepler University, Linz, Austria
Monika Heinzl, Renato Salazar, Theresa Mair & Irene Tiemann-Boege
Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Hanna Davies & Jan P. Dumanski
Department of Surgical Oncology, Medical University of Gdansk, Gdansk, Poland
Jarosław Skokowski
Genuity Science Genomics Centre, Dublin, Ireland
Patrick G. Buckley
Department of Pathomorphology, Medical University of Gdansk, Gdansk, Poland
Rafał Pęksa
Department of Tumor Pathology, Prof. F. Lukaszczyk Oncology Center, Bydgoszcz, Poland
Łukasz Szylberg
Department of Perinatology, Gynaecology and Gynaecologic Oncology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, Bydgoszcz, Poland
Łukasz Szylberg
Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
Johan Hartman
Department of Pathology, Karolinska University Hospital, Stockholm, Sweden
Johan Hartman
MedTech Labs, Bioclinicum, Karolinska University Hospital, Stockholm, Sweden
Johan Hartman

Contributions

Study design and conception: A.P., I.T.-B., A.K. Sample collection and preparation: T.N., M.G., H.D., J.S., E.Ś., R.P., M.J., Ł.S., W.Z., J.H. Experiments: A.K., M.H., R.S., M.A., T.M. Data analysis and interpretation: A.K., P.O., A.P., M.K., M.H., I.T.-B. Manuscript writing: A.K., A.P., M.K., I.T.-B., J.P.D. All authors have read and approved the manuscript. A.K. and T.N. contributed equally.

Corresponding authors

Correspondence to Anna Kostecka, Tomasz Nowikiewicz or Arkadiusz Piotrowski.

Ethics declarations

Competing interests

The authors declare no competing financial interests, but the following competing non-financial interests have been declared: J.P.D. is cofounder and shareholder in Cray Innovation AB.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Table 1

Supplementary Table 2

Supplementary Table 3

Supplementary Table 4

Supplementary Table 5

Supplementary Table 6

Supplementary Table 7

Supplementary Table 8

Supplementary Table 9

Supplementary Table 10

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Kostecka, A., Nowikiewicz, T., Olszewski, P. et al. High prevalence of somatic PIK3CA and TP53 pathogenic variants in the normal mammary gland tissue of sporadic breast cancer patients revealed by duplex sequencing. npj Breast Cancer 8, 76 (2022). https://doi.org/10.1038/s41523-022-00443-9


Received: 29 September 2021
Accepted: 10 June 2022
Published: 29 June 2022
DOI: https://doi.org/10.1038/s41523-022-00443-9
