Introduction

Breast cancer is a common malignant disease among women worldwide and is characterized by the highest cancer incidence, a high recurrence rate, morbidity, mortality and poor prognosis1. Globally, breast cancer accounted for 11.7% of the estimated 19.3 million new cancer cases in 2020 in both sexes and about 24.5% of all cancer cases among women2. The International Agency for Research on Cancer has reported that breast cancer ranks as the number one cancer worldwide, with an incidence of 22.5 lakh and a mortality of nearly seven lakh3. The latest cancer registry of the Indian Council of Medical Research has reported that breast cancer ranks as the number one cancer in India, with an incidence rate of 25.8 per lakh and a mortality rate of 12.7 per lakh4. Breast cancer incidence is expected to increase by more than 46% by 20405. As per cancer death cause analysis, it is not the primary breast tumor but metastasis to distant organs that is responsible for the death of over 90% of breast cancer patients6. Breast cancer accounts for 11.6% of distant malignancies in both men and women globally7.

The TNM (tumor, node, metastasis) classification is a globally recognised standard for assigning stages to breast cancer. Stage 0 describes non-invasive disease; stages I, IIA, IIB and IIIA describe early breast cancer, which is confined to the breast with or without axillary lymph node involvement; stages IIIB and IIIC describe locally advanced breast cancer; and stage IV describes breast cancer that has spread to other parts of the body such as the liver, lungs, brain and bones, termed metastatic breast cancer8,9.

Axillary lymph nodes represent the mainstay of lymphatic drainage from the breast, and the first lymph node among them to be affected is called the sentinel lymph node10. Sentinel lymph node biopsy (SLNB) is the standard diagnostic procedure for assessing lymph node status in patients with early-stage breast cancer11. The node is identified by injecting a radioactive tracer into the breast in the vicinity of the tumour, followed by its excision and a biopsy that reflects the histological characteristics and status of the nodes in the axilla12,13. Surgeons carry out axillary lymph node dissection (ALND) in breast cancer cases where SLNB is positive and abandon this intervention if it is negative14,15. Identification and assessment of sentinel lymph node metastasis dictate the surgical approach adopted and the prognosis in patients with early breast cancer16.

Recently, potential biomarkers have been identified to understand lymph node metastasis in early breast cancer17,18,19,20. Unfortunately, each of these has poor sensitivity and specificity, and has therefore not been translated into a diagnostic tool as desired by the surgeon. A number of imaging techniques have also been used to detect axillary lymph node metastasis, but these techniques are neither efficient nor applicable during surgical procedures21,22,23,24,25,26.

Our team has been identifying potential protein biomarkers in various clinical phenotypes in the last few years27,28. In the recent past, we have carried out a gel based proteomics experiment to identify biomarkers that can flag lymph node metastasis in early breast cancer29. In this study, we seek to carry out an isobaric label based proteomic experiment to delineate protein signatures that can accurately reflect the metastatic state of the sentinel lymph nodes in early breast cancer.

Methodology

Ethics and patient recruitment

This study was conducted after approval was obtained from the Institute Ethics Committee at All India Institute of Medical Sciences, New Delhi, India (Ref. No. IECPG- 27/23.01.2019, RT-03/28.02.2019). The procedures followed were in accordance with the ethical standards of the Declaration of Helsinki. Early breast cancer patients were screened and admitted at the Department of Surgery. The study was explained in detail to the recruited patients and informed consent was obtained from them. Bedside examination included clinical history, symptoms, signs and general examination. Patients suspected of having early breast cancer were subjected to mammography imaging. As explained later in this section, sentinel lymph node biopsy tissues from these patients were sent to the Department of Pathology for histopathology, which was the gold standard for assigning the clinical phenotypes of sentinel lymph node with metastasis (SLNM +) and sentinel lymph node without metastasis (SLNM-). For the discovery phase of the proteomic experiment, we took 5 patients with SLNM + , 5 patients with SLNM- and two benign breast tumor tissues as cancer controls. For the validation phase of the experiment, we took 13 patients each with SLNM + and SLNM-.

Patient inclusion and exclusion criteria

Staging of breast cancer was determined as per the American Joint Committee on Cancer (AJCC) cancer staging criteria. The inclusion and exclusion criteria were the same as in our previous study29. Women with early invasive intra-ductal breast cancer as per the WHO classification of tumors, who had not undergone any therapeutic intervention, were recruited into the study. Patients with advanced breast cancer, and patients with early breast cancer who had received either chemotherapy or radiotherapy, were excluded from the study.

Identification and excision of sentinel lymph node

Technetium-tagged sulphur colloid was injected intra-dermally into the lower inner quadrant of the affected breast two hours prior to surgery29. In the operation theatre, 1 ml of 1% fluorescent methylene blue dye diluted in 4 ml saline was injected intra-dermally at multiple sites around the areolar region and in the sub-areolar region. After five minutes of gentle massage, an incision was made on the axillary skin crease at the site of maximum radioactivity. Using blunt and sharp dissection, fluorescent lymphatics were identified using a blue light lamp, and blue lymphatics were identified by direct visualization30. The lymph node with the highest count was considered the sentinel node and was excised.

Sentinel lymph node tissue sample collection

After excision of the sentinel lymph node, adherent fat tissue was carefully removed and blood stains were washed off thoroughly with 1X PBS (pH 7.4)29. The nodes were longitudinally sectioned to obtain 5 mm thick slices. One set of alternate slices was sent to the Department of Pathology for histopathological assessment. The other set of slices was taken to the proteomics facility and stored at − 80 °C for iTRAQ based proteomic experiments.

Histopathology

The histopathological procedures followed were those standardized in our previous study29. Lymph node slices were fixed in formalin and embedded in paraffin blocks. These were further sectioned at 4 μm onto poly-L-lysine-coated slides. The paraffin sections were deparaffinised with three successive washes in xylene and then rehydrated by washing them stepwise in 100% ethanol, 90% ethanol, 70% ethanol and distilled water. The sections were stained with hematoxylin and washed in running water for 5 min. The slides were then stained in eosin solution for two minutes and rinsed with 95% ethanol. The slides were then immersed in 100% ethanol for two minutes, twice. After a final exposure to xylene, a drop of Distyrene Plasticizer Xylene (DPX) was used to mount the tissue on each slide, which was then covered with a glass cover-slip. The slides were examined under the microscope and sentinel lymph node tissue samples were annotated as either Sentinel Lymph Node Metastatic (SLNM +) or Sentinel Lymph Node Non-Metastatic (SLNM-).

Mammography

All recruited patients underwent full-field digital mammography in cranio-caudal and medio-lateral oblique projections. The dose of a four-view mammogram ranged from 4 to 6 milligray (mGy). The mammograms were evaluated according to the Breast Imaging Reporting and Data System (BIRADS) classification.

Sample phenotyping and protein isolation

Phenotyping and protein isolation were done using protocols standardized in our previous study29. The sentinel lymph node tissue sections stored at − 80 °C were annotated as either SLNM + or SLNM- based on the histopathology findings. The tissue samples were minced and the proteins were solubilized in 120 μl of lysis buffer containing 8 M urea, 2 M thiourea and 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS). The tissue was homogenized by sonication at intervals of 3 s and vortexed for 2 min. The samples were then centrifuged at 15,000 rpm for 20 min at 4 °C, the debris was discarded and the supernatant was transferred to a fresh Eppendorf tube. Protein extracted with the lysis solution was buffer exchanged with 100 mM triethylammonium bicarbonate (TEAB) using a 3 kDa cut-off membrane filter to bring the concentration of urea well below 0.1 M. The protein amount was quantified by the Bradford assay using 1 μg/μl bovine serum albumin as a standard.

Isobaric tags for relative and absolute quantitation (iTRAQ) labelling

Five sets of SLNM + tissue samples, five sets of SLNM- tissue samples and two cancer control benign breast tumor tissues were taken for the iTRAQ experiment. The design of the iTRAQ experiment is illustrated in Fig. 1. Each experiment comprised at least one SLNM + sample, one SLNM- sample and one sample of either of these two phenotypes or a cancer control benign breast tumor tissue. An equimolar pool of the three phenotypes served as an internal standard for normalization in each of the four experiments27. From each phenotype sample, 80 μg of protein was reduced with 25 mM DTT for 30 min at 60 °C and alkylated with 55 mM iodoacetamide for 20 min at room temperature. Each of these protein samples was digested for 16 h with trypsin in a 1:10 ratio at 37 °C. Digested peptides were then labelled with iTRAQ 4-plex reagents 114 (sentinel lymph node metastasis), 115 (sentinel lymph node without metastasis), 116 (one of either of these two phenotypes or cancer control benign breast tumor tissue) and 117 (internal standard), following the protocol provided by the manufacturer (AB Sciex, Foster City, CA, USA). In brief, all vials of iTRAQ labelling tags were reconstituted in 70 μl of absolute ethanol. This was then added to each sample and incubated for 2 h at room temperature, and the reaction was quenched using 50 μl Milli-Q water. iTRAQ labelled samples in each experiment were then pooled into a single vial and dried using a SpeedVac. These samples were reconstituted in 8 mM ammonium formate buffer (pH 3) and fractionated by cation exchange using an isotope coded affinity tag cartridge. Peptides were then eluted with 500 μl of ammonium formate (pH 3) in a concentration gradient from 5 mM to 500 mM to obtain a total of eleven fractions from each experiment. These 44 fractions from the four experiments were vacuum dried and taken for mass spectrometry analysis.

Figure 1

Flowchart depicting the methodology used in the study.

Mass spectrometry data acquisition

In-house protocols were used for mass spectrometry data acquisition27. Peptides from the 44 fractions were desalted and concentrated using reversed phase ZipTip, and reconstituted in 0.1% formic acid. The peptide fractions were loaded onto an analytical column (Acclaim PepMap RSLC C18, 2 μm, 100 Å, 50 μm x 15 cm; Thermo Scientific, Rockford, USA) coupled to a trap column (Acclaim PepMap C18, 3 μm, 100 Å, 75 μm x 2 cm; Thermo Scientific, Rockford, USA). Peptide separation was performed using an EASY-nLC 1200 coupled to an Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Scientific, Rockford, USA) for mass spectrometry analysis. The peptide fractions were premixed in loading buffer (mobile phase A: 100% water and 0.1% formic acid) and 1 μg was loaded on the trap column at a flow rate of 300 nl/min. The retained peptides were washed isocratically with loading buffer for 45 min to remove excess salt. The peptides were then resolved on the analytical column with a multi-step linear gradient of loading buffer and elution buffer (mobile phase B: 80% acetonitrile and 0.1% formic acid) at a flow rate of 250 nl/min. The gradient elution was initiated at 5% elution buffer and held for 1 min, followed by a linear increase to 10% for 10 min, 35% for 70 min and 50% for 80 min. The gradient was then held at 80% mobile phase B for 8 min before being re-equilibrated to 5% mobile phase B for 18 min. The mass spectrometer was operated in data dependent acquisition (DDA) mode. The full MS spectra were acquired in positive ion mode over an m/z range of 350–2000 with a 100 millisecond MS accumulation time, whereas the MS/MS product ion scans were performed over the mass range of 100–2000 Da with a 7 millisecond accumulation time in the Orbitrap mass analyzer. The mass spectrometric settings included an ion spray voltage floating of 1900 V. Former target ions were excluded for 3 s and 25 ions were monitored per MS cycle. DDA advanced ‘rolling collision energy’ was applied for subsequent MS/MS scans, with the normalized higher-energy collisional dissociation (HCD) collision energy set to 35%.

Data analysis and reporter ion quantification

Raw files from the Orbitrap Fusion Mass Spectrometer were processed using Proteome Discoverer (version 2.4.1.15) analysis software31. Both MS and MS/MS spectra were searched using the Sequest HT algorithm against a combined UniProt Human proteome database appended with a list of common contaminants provided by Thermo Scientific. Sequest HT parameters were specified as trypsin enzyme, two missed cleavages allowed, a minimum peptide length of 6, a precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.05 Da. The static modification was set to carbamidomethylation (+ 57.021 Da) of cysteine. The dynamic modifications were set to methionine oxidation (+ 15.995 Da) and iTRAQ 4-plex (+ 144.102 Da) modification of the peptide N-terminus and lysine (K) residues. Since the iTRAQ modification was used as a dynamic modification, the unmodified or unlabelled peptides and associated proteins were removed from the analysis. A dynamic modification was also assigned for acetylation (+ 42.011 Da) of the protein N-terminus. Peptide spectral match (PSM) error rates were determined using the target-decoy strategy coupled to the Percolator PSM validation node to distinguish positive from false matches. In the Percolator node, the false discovery rate (FDR) was calculated based on the q-values of the decoy database search. Data were filtered at the peptide spectral match level using a strict FDR cut-off of 0.01 and a relaxed FDR cut-off of 0.05 as determined by Percolator. Contaminant and decoy proteins were removed from all data sets prior to downstream analysis. Reporter ion values were calculated in the “Reporter Ions Quantifier” node using the FTMS mass analyzer setting and HCD activation. Reporter ions were quantified from MS/MS scans using an integration tolerance of 20 ppm with the most confident centroid setting.
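
The two-tier FDR filtering described above can be illustrated with a minimal Python sketch. It assigns a confidence label to each PSM from its q-value using the strict (0.01) and relaxed (0.05) cut-offs stated in the text. The label names, the function name and the example q-values are assumptions made for illustration only; they are not output of Proteome Discoverer or Percolator.

```python
STRICT_FDR = 0.01   # strict cut-off stated in the text
RELAXED_FDR = 0.05  # relaxed cut-off stated in the text


def psm_confidence(q_value):
    """Return a hypothetical confidence label for one PSM q-value."""
    if q_value <= STRICT_FDR:
        return "high"
    if q_value <= RELAXED_FDR:
        return "medium"
    return "low"


# Illustrative q-values only (not data from the study).
example_q_values = [0.001, 0.01, 0.03, 0.20]
labels = [psm_confidence(q) for q in example_q_values]

# Keep PSMs that pass at least the relaxed cut-off.
kept = [q for q in example_q_values if psm_confidence(q) != "low"]
```
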
The following settings were used to obtain the quantification results: the protein ratio type was the ‘weighted’ geometric mean, normalization was by summed intensities and outlier removal was ‘automatic’. The peptide threshold was ‘at least homology’, where the peptide score does not exceed the absolute threshold but is an outlier from the quasi-normal distribution of random scores. A minimum of two unique peptides were required to be the top ranking matches. In the Consensus workflow, the following settings were applied in the “Reporter Ions Quantifier” node to increase the quantification accuracy of the analyzed proteins: (i) only unique peptides were used for protein quantification, (ii) the precursor co-isolation threshold was set to 25%, (iii) the average reporter signal to noise ratio threshold was set to 10 and (iv) peptide normalization was done with respect to the total peptide amount. At the level of protein analysis, further normalization was done in which the protein abundance of each individual sample was scaled by the abundance of the internal standard, which was labelled with channel 117 of the iTRAQ 4-plex reagent. On obtaining the results, multiple filter criteria were applied and only those proteins were considered for differential expression analysis that had: (i) an FDR confidence threshold of medium during identification by Sequest HT, (ii) at least two unique peptides and (iii) a peak found in the sample.
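
As a minimal sketch of the internal-standard scaling step described above, the following Python snippet divides the abundance in each reporter channel by the abundance in channel 117. The function name, the dictionary layout and the abundance values are hypothetical and chosen only for illustration; the actual scaling was performed within Proteome Discoverer.

```python
def scale_by_internal_standard(abundances, standard_channel="117"):
    """Divide each reporter-channel abundance by the internal-standard channel."""
    reference = abundances[standard_channel]
    if reference <= 0:
        raise ValueError("internal standard abundance must be positive")
    return {
        channel: value / reference
        for channel, value in abundances.items()
        if channel != standard_channel
    }


# Hypothetical reporter abundances for one protein in one experiment.
protein = {"114": 300.0, "115": 100.0, "116": 150.0, "117": 200.0}
ratios = scale_by_internal_standard(protein)
```

Because every experiment carries the same pooled standard in channel 117, ratios expressed this way can be compared across the separate 4-plex experiments.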

Proteins with an expression fold change ratio of ≥ 1.5 or ≤ 0.66 were considered up-regulated or down-regulated, respectively. Proteins with a fold change ratio between 0.66 and 1.5 were considered house-keeping proteins. A multiple Student's t-test was applied to the whole set of differentially expressed proteins, and a volcano plot with a p-value cut-off of < 0.05 was generated to graphically represent the up-regulated, down-regulated and house-keeping proteins. Proteins that had a consistent expression pattern in at least four of the six experiments were considered potential biomarker candidates to differentiate SLNM + and SLNM-. For the up-regulated proteins only, the relative ratios of protein abundances were compared between the metastatic group and the cancer control breast tissues to estimate the possible tissue source.
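
The thresholds above can be expressed as a short Python sketch. The fold-change cut-offs (1.5 and 0.66), the p-value cut-off (0.05) and the four-of-six consistency rule are taken from the text; the function names, the way fold change and p-value are combined in a single call, and the label "unchanged" are assumptions made for illustration.

```python
UP_THRESHOLD = 1.5     # fold change at or above this: up-regulated
DOWN_THRESHOLD = 0.66  # fold change at or below this: down-regulated
P_CUTOFF = 0.05        # significance cut-off used for the volcano plot


def classify(fold_change, p_value):
    """Label one protein in one comparison (assumed combination rule)."""
    if p_value < P_CUTOFF and fold_change >= UP_THRESHOLD:
        return "up-regulated"
    if p_value < P_CUTOFF and fold_change <= DOWN_THRESHOLD:
        return "down-regulated"
    return "unchanged"


def is_candidate(calls, minimum=4):
    """True when one direction of change recurs in at least `minimum` comparisons."""
    ups = calls.count("up-regulated")
    downs = calls.count("down-regulated")
    return max(ups, downs) >= minimum


# Hypothetical calls for one protein across six comparisons.
calls = [classify(fc, p) for fc, p in
         [(2.1, 0.01), (1.8, 0.02), (1.6, 0.04), (1.9, 0.03), (1.2, 0.30), (1.7, 0.20)]]
candidate = is_candidate(calls)
```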

KEGG pathway and Gene Ontology analysis

Differentially expressed genes were imported into the DAVID32 (Database for Annotation, Visualization and Integrated Discovery, version 6.8, https://david.ncifcrf.gov/tools.jsp) functional annotation tool and functional enrichment analysis was performed. Homo sapiens was used as the background species and the enrichment analysis was run for Cellular Component in Gene Ontology (GO_CC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways. Only results with FDR adjusted p-values ≤ 0.05 were considered.
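
To make the FDR adjustment concrete, the sketch below implements the Benjamini-Hochberg procedure in plain Python and keeps terms with adjusted p-values ≤ 0.05. That DAVID's adjusted values correspond exactly to this procedure is an assumption here, and the term names and raw p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical enrichment terms with raw p-values.
terms = ["term_a", "term_b", "term_c", "term_d"]
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [t for t, q in zip(terms, adjusted_p) if q <= 0.05]
```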

ELISA

Twelve differentially expressed proteins (α-crystallin B chain, monoamine oxidase, caveolin-1, collagen α-1, desmin, fibrillin-1, long-chain-fatty-acid–coA ligase 1, laminin subunit α-4, heterogeneous nuclear ribonucleoprotein D, non-histone chromosomal protein, cathelicidin antimicrobial peptide and rho GDP-dissociation inhibitor 2) were chosen for the validation phase of the experiment. The concentrations of these proteins were quantified in the validation set of 13 SLNM + and 12 SLNM- patients using ELISA kits (Bioassay Technology Laboratory, Shanghai, China). All determinations were performed in duplicate according to the manufacturer's instructions. Differences between the SLNM + and SLNM- groups of patients were assessed using the independent Student's t-test; values of p < 0.05 were considered significant.
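
For readers unfamiliar with the test statistic, the following sketch computes the independent two-sample Student's t statistic with pooled variance using only the Python standard library. The function name and the concentration values are hypothetical; converting t to a p-value requires the t distribution (for example from a statistics package) and is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance assumes equal variances in the two groups.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical ELISA concentrations (arbitrary units), not study data.
metastatic = [4.0, 5.0, 6.0]
non_metastatic = [1.0, 2.0, 3.0]
t_value, degrees_of_freedom = student_t(metastatic, non_metastatic)
```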

Statistical analysis

Normalization of the proteins was done using MetaboAnalyst (version 5.0) software (https://www.metaboanalyst.ca/MetaboAnalyst/ModuleView.xhtml)33 using the sum of protein abundances. Data were transformed using the generalized logarithm and scaled using the Pareto scaling option. Data analysis for ELISA was carried out using STATA version 16.0. Protein concentrations that could be used as cut-offs to distinguish between the metastatic and non-metastatic states were estimated by Receiver Operating Characteristic (ROC) analysis of the ELISA data. The non-parametric ROC analysis was carried out using the DeLong method. The area under the curve (AUC) was obtained with 95% confidence limits. The optimum cut-off value was that at which the Youden index (sensitivity + specificity - 1) was maximum. Percentage correct classification and likelihood ratio values were computed. A statistical significance level of P < 0.05 was adopted to test the significance of the AUC.
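
The cut-off selection described above can be sketched in a few lines of Python. The snippet scans each observed concentration as a candidate cut-off, computes sensitivity and specificity, and keeps the cut-off with the largest Youden index; it also computes the non-parametric (rank-based) AUC. The function names and concentrations are hypothetical, the rule that values at or above the cut-off are called metastatic is an assumption, and the DeLong confidence limits computed in STATA are not reproduced here.

```python
def youden_cutoff(positives, negatives):
    """Return (cutoff, Youden index, sensitivity, specificity) at the maximum index."""
    best = None
    for cutoff in sorted(set(positives) | set(negatives)):
        sensitivity = sum(v >= cutoff for v in positives) / len(positives)
        specificity = sum(v < cutoff for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        # Strict '>' keeps the lowest cut-off when several tie.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


def auc(positives, negatives):
    """Rank-based AUC: share of positive/negative pairs ranked correctly."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical protein concentrations, not study data.
slnm_positive = [5.0, 6.0, 7.0, 3.5]
slnm_negative = [2.0, 3.0, 4.0, 1.0]
cutoff, j_index, sens, spec = youden_cutoff(slnm_positive, slnm_negative)
area = auc(slnm_positive, slnm_negative)
```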

Results

Clinical profile

Based on the inclusion and exclusion criteria, a total of twenty-eight patients were recruited to this study. Of these, thirteen patients had sentinel lymph node metastasis, thirteen patients did not have sentinel lymph node metastasis, and two patients diagnosed with fibroadenoma were recruited to procure breast tissue that would serve as cancer controls. From among these, five with SLNM + , five with SLNM- and the two cancer control benign breast tumors were chosen for the discovery phase of the proteomic experiments by iTRAQ. The clinical profile of these twelve patients is provided in Table 1. Early breast cancer was confirmed by mammography in all patients, and sentinel lymph node tissue samples were annotated as with or without metastasis by histopathological evaluation. The mammography images of the patients considered for the discovery phase of the study are shown in Fig. 2. The mammograms were assessed according to the scores of the BIRADS (Breast Imaging Reporting and Data System) classification. The hematoxylin and eosin stained slides for the five metastatic and five non-metastatic sentinel lymph nodes and the two cancer control benign breast tumor tissues are shown in Fig. 3.

Table 1 Clinical profile of patients recruited in the discovery phase of the iTRAQ experiment.
Figure 2

Medio-lateral oblique and cranio-caudal mammography images of patients recruited in the discovery phase of the study. (A)–(J) show features of early invasive breast cancer and (K)–(L) show features of benign breast tumor. (1) focal architectural disruption; (2) spiculated mass; (3) micro-calcification; (4) micro-calcification along with subtle focal asymmetry; (5) multiple spiculated masses; (6) macro-lobulated mass; (7) subtle area of focal architectural distortion; (8) circumscribed mass with focal spiculation; (9) architectural distortion; (10) unilateral axillary lymphadenopathy; (11) well circumscribed masses; (12) fat containing masses.

Figure 3

Photomicrographs showing sentinel lymph node with metastasis (a–e), sentinel lymph node without metastasis (f–j) and benign breast tumor tissue (k, l) in hematoxylin and eosin stained sections under light microscopy. (1) complete effacement of architecture by metastatic cancer cells; (2) infiltration by metastatic cancer cells; (3) presence of tumor cell nests in the sub-capsular area with a part of the lymph node seen at one edge; (4) presence of a large area of tumor cell nests replacing the normal lymph node parenchyma; (5) complete effacement of architecture by metastatic cancer cells. (6–10) show prominent reactive centres without any evidence of tumor cells in the lymph node parenchyma and depict paracortical expansion with vascular proliferation indicative of reactive lymphadenopathy. (11–12) depict benign breast lesions consisting of terminal duct lobular units with bi-layered epithelium and scant intervening stroma.

iTRAQ labelling

In our study, proteins isolated from metastatic and non-metastatic sentinel lymph node tissues and from cancer control benign breast tumor tissues were labelled with isobaric tags 114, 115 and 116. An equimolar pool of proteins from each representative phenotype in the experiment was labelled with tag 117 for normalization. Out of 276,572 peptide spectrum matches (PSMs), 269,279 (97.4%) were iTRAQ modified, and out of a total of 42,105 peptides, 40,074 (95.2%) were iTRAQ labelled, reflecting the labelling efficiency.
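
The labelling efficiencies quoted above follow directly from the reported counts, as the short Python check below shows. Only the counts are taken from the text; the helper name is an illustrative choice.

```python
def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Counts reported in the text.
psm_efficiency = percent(269_279, 276_572)      # iTRAQ modified PSMs
peptide_efficiency = percent(40_074, 42_105)    # iTRAQ labelled peptides

print(f"PSM labelling efficiency: {psm_efficiency:.1f}%")
print(f"Peptide labelling efficiency: {peptide_efficiency:.1f}%")
```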

Protein expression

From the 44 fractionated iTRAQ 4-plex labelled samples, 6335 unique protein groups were identified, and after applying stringent filter criteria 2398 proteins were taken up for further analysis. The distribution of these proteins follows a normal curve, indicative of the quality of the normalization (Supplementary Fig. 1). This process adjusted for differences among samples through data transformation and scaling, making individual protein expressions comparable across the metastatic and non-metastatic lymph node groups. A pair-wise multiple Student's t-test with its associated p-values was used to arrive at 109 differentially expressed proteins between SLNM + and SLNM-, of which 49 proteins were up-regulated and 60 proteins were down-regulated, as shown in the volcano plot (Fig. 4). The relative abundance ratios of only the up-regulated proteins were compared between the metastatic group and the cancer control breast tissues. Proteins such as desmin, fibrillin 1, tau-tubulin kinase, transgelin, calponin 1 and myosin 11, which had a ratio of more than one, are native to the lymph node, whereas proteins such as heat shock protein 6, α-crystallin B, amine oxidase 3, caveolin 1, collagen α1, fibrinogen gamma chain, GAPDH, long chain fatty acid Co-A ligase, laminin subunit α4, membrane primary oxidase, microfibril-associated glycoprotein 4, perilipin 1 and redox regulatory protein, which had a ratio of less than one, are derived from breast tissue. Based on their functional annotation and relevance to this study, a few of these proteins are described in Tables 2 and 3.

Figure 4

Volcano plot of proteins identified in the iTRAQ experiment. Red dots represent up-regulated proteins with > 1.5 fold change; green dots represent down-regulated proteins with < 0.66 fold change. House-keeping proteins with expression fold change between 0.66 and 1.5 are shown as grey dots. FC indicates fold change.

Table 2 Proteins up-regulated in tissue of sentinel lymph node with metastasis as compared to sentinel lymph node without metastasis.
Table 3 Proteins down-regulated in tissue of sentinel lymph node metastasis as compared to sentinel lymph node without metastasis.

Validation by bioinformatic analysis and ELISA

KEGG pathway analysis highlights the role of the differentially expressed proteins in various pathways (Fig. 5a). The most interesting feature is the ECM-receptor interaction pathway, which implicates seven proteins that include various isoforms of collagen-α. Apart from ECM-receptor interaction, focal adhesion and PI3K-Akt signalling feature prominently in the pathway analysis. Gene Ontology for Cellular Component (GO_CC) enrichment carried out for the 49 up-regulated proteins shows that the majority of the proteins belong to the extracellular component (Fig. 5b). The component with the next highest number of proteins was the focal adhesion component. Focal adhesions are large macromolecular assemblies through which mechanical force and regulatory signals are transmitted between the extracellular matrix and an interacting cell. The implication of extracellular proteins in breast cancer metastasis is therefore quite evident. GO_CC enrichment carried out for the 60 down-regulated proteins shows the majority of the proteins being confined to the cytoplasmic, cytosol and membrane components (Fig. 5c). Based on the results of the bioinformatic studies, ELISA was performed on up-regulated extracellular proteins in SLNM + . Caveolin 1, Desmin, Microfibrillar associated glycoprotein, Collagen α4 and Fibrillin 1 were confirmed to be elevated in SLNM + as compared to SLNM- (Fig. 6) (Supplementary Table 1). These proteins have a minimum of two-fold higher expression in SLNM + as compared to SLNM-.

Figure 5

KEGG pathway analysis and Gene Ontology of differentially expressed proteins. (a) KEGG pathway analysis showing the various possible pathways regulated by the differentially expressed proteins; (b) Gene Ontology for cellular components showing the distribution of 49 up-regulated proteins across various cellular organelles; (c) Gene Ontology for cellular components showing the distribution of 60 down-regulated proteins across various cellular organelles. Square boxes designate the FDR corrected p-values.

Figure 6

ELISA. Bar diagrams showing the ratio of protein expression of (a) caveolin-1, (b) desmin, (c) microfibrillar associated glycoprotein 4 (d) collagen α − 1, and (e) fibrillin-1 in metastatic patients (black) and non-metastatic patients (grey). *indicates P < 0.05; ****indicates P < 0.0001.

Discussion

The predominant age group of the early breast cancer patients recruited in our study was 36–59 years. Most of the patients presented with a lump, nipple discharge, nipple retraction or pain in one part of the breast. Progesterone, estrogen and HER-2 status did not show any correlation with the lymph node status of early breast cancer. Patients clinically diagnosed with early breast cancer underwent mammography screening to confirm the type of breast cancer. Twenty five patients were diagnosed with early breast cancer on the basis of characteristic features such as abnormal masses and collections of calcification. As normal control tissues could not be procured due to ethical concerns, two fibroadenoma tissues, the closest representations of ‘cancer control’ breast tissue, were used to determine the source of the proteins. Methylene blue was used to map the sentinel lymph node in the axillary region, followed by its excision. Histopathological analysis was done to annotate the tissue phenotype as SLNM + , SLNM- or benign tumor.

Proteins from sentinel lymph node tissues were isolated, quantified, digested with trypsin, labelled with different isobaric tags and combined into one sample mixture for identification and quantification by LC–MS/MS analysis. Isobaric tags for relative and absolute quantitation (iTRAQ) technology relies on the quantitation of low molecular mass reporter ion groups released from isobaric tags that are covalently bound to the primary amines of the tryptic peptides to be quantified. The final experiment was designed to enable: (i) six phenotypic protein profile comparisons, (ii) intra-experimental normalization and (iii) an understanding of the cellular source of the proteins. While the up-regulated proteins in SLNM + are related to tumorigenesis, cell proliferation, motility, cell survival, progression and anti-apoptosis, the up-regulated proteins in SLNM- are involved in suppressing cell motility and decreasing cell growth. Expression of five proteins, caveolin-1, collagen α1, desmin, fibrillin-1 and microfibrillar associated glycoprotein 4, was validated and found to be consistent with the discovery phase results.
The functions of these proteins in the context of sentinel lymph node metastasis are as follows: (a) Caveolin-1 (Cav-1), a 22 kDa oligomeric scaffolding protein encoded by the CAV1 gene, is a major structural protein of the membrane invaginations called caveolae and plays a crucial role in many cellular processes, including endocytosis, receptor internalization, extracellular matrix (ECM) organization, lipid transport and signal transduction86,87; (b) Collagen α-1 type I (COL1A1), a 138 kDa protein encoded by the COL1A1 gene, is the most abundant protein of the ECM; it forms a characteristic triple helix of three polypeptide chains and contributes to the integrity, elasticity and strength of the body's connective tissues and to the entrapment, local storage and delivery of growth factors and cytokines, and therefore plays an important role during organ development, wound healing and tissue repair88,89; (c) Desmin (DES), a 53 kDa protein encoded by the DES gene, is a muscle-specific protein and a key subunit of the intermediate filament in cardiac, skeletal and smooth muscles, and plays an essential role in maintaining extracellular matrix interactions, cytoarchitecture, structural integrity and function of muscles by forming a three-dimensional scaffold across sarcomeres of smooth muscles90,91; (d) Fibrillin-1 (FBN1), a 312 kDa protein encoded by the FBN1 gene, is a large cysteine-rich glycoprotein produced by fibroblasts and is the principal structural component of the microfibrils of the ECM in connective tissue. Interacting with other components of the ECM, this ubiquitous glycoprotein plays pivotal roles in tissue development, homeostasis and repair. In addition to mechanical support, fibrillin networks also exhibit regulatory activities on growth factor signalling, ECM formation, cell behaviour and the immune response92,93; and (e) Microfibrillar-associated glycoprotein 4 (MFAP4), a 36 kDa protein encoded by the MFAP4 gene, is an ECM protein that plays a major role in elastic fibre formation, is associated with ECM remodelling processes during vascular injury, and interacts with other ECM proteins such as FBN1, contributing to cell adhesion, intercellular interactions and the assembly and/or maintenance of elastic fibres94,95.
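The reporter-ion quantitation and intra-experimental normalization described at the start of this section can be illustrated with a minimal sketch. It assumes a four-channel labelling design compared pairwise, which is one way to arrive at six comparisons; the channel names, protein names and intensities below are hypothetical and are not data from this study. Median normalization is likewise an illustrative choice, not necessarily the procedure used here.

```python
import math
from itertools import combinations
from statistics import median

# Hypothetical reporter-ion intensities per protein (made-up values).
channels = ["114", "115", "116", "117"]
intensities = {
    "protein_A": {"114": 1000.0, "115": 2100.0, "116": 950.0, "117": 4000.0},
    "protein_B": {"114": 500.0, "115": 520.0, "116": 480.0, "117": 1000.0},
    "protein_C": {"114": 2000.0, "115": 1900.0, "116": 2100.0, "117": 4200.0},
}


def normalize(table, channels):
    """Divide every channel by its median intensity so that channels
    with different overall loading become comparable."""
    medians = {c: median(row[c] for row in table.values()) for c in channels}
    return {
        protein: {c: row[c] / medians[c] for c in channels}
        for protein, row in table.items()
    }


def log2_ratios(table, channels):
    """Return log2(later channel / earlier channel) for every channel pair."""
    ratios = {}
    for a, b in combinations(channels, 2):
        ratios[(a, b)] = {
            protein: math.log2(row[b] / row[a]) for protein, row in table.items()
        }
    return ratios


normalized = normalize(intensities, channels)
ratios = log2_ratios(normalized, channels)

# Four channels taken two at a time give six pairwise comparisons.
print(len(ratios))
for pair, values in ratios.items():
    print(pair, {p: round(v, 2) for p, v in values.items()})
```

A positive log2 ratio indicates a higher normalized reporter signal in the second channel of the pair, and a negative value a lower one.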

Label-based differential proteomic experiments, pathway analysis, Gene Ontology studies and ELISA experiments clearly establish the role of extracellular matrix proteins in sentinel lymph node metastasis. Extracellular matrices (ECMs) are highly specialized and dynamic three-dimensional (3D) scaffolds in which cells reside in tissues, and their principal components are collagens, glycoproteins and proteoglycans96. Upon physiological and pathological triggers, ECM-degrading enzymes and matrikines are released to remodel the ECM, re-establish an appropriate functional meshwork and maintain cellular homeostasis97,98. In metastasis, however, ECM remodeling is hijacked, and the ECM architecture is perturbed and degraded by matrix metalloproteinases99,100. ECM degradation leads to a loss of endothelial integrity, allowing cancer cells to escape from the primary tumor to other tissues, including lymph nodes101. During this process, cancer cells undergo epithelial-to-mesenchymal transition (EMT), which can be induced by increased deposition of ECM proteins102. EMT alters the phenotypic properties of cells and affects their propensity to escape the primary tumor and cause metastasis103. In addition, up-regulation of ECM proteins through recruitment and activation of cancer-associated fibroblasts (CAFs) results in activation of biophysical and biochemical oncogenic signalling pathways104.
The oncogenic signalling pathways associated with the identified extracellular matrix proteins are: (a) caveolin-1: PI3K/AKT and Ras/Raf signalling through ERK; (b) desmin: PI3K/AKT through caspase; (c) microfibrillar-associated glycoprotein 4: ERK/MMP signalling through FAK and c-Jun; (d) collagen α-1: FAK signalling through PI3K/AKT and MAPK/ERK; and (e) fibrillin-1: SMAD2/3/4 and MEK pathways through ERK. These pathways induce cell proliferation, survival, motility, angiogenesis, hypoxia, cancer stem cell activity, epithelial-to-mesenchymal transition and eventually lymph node metastasis105,106,107,108,109. Caveolin-1, desmin and collagen α-1 mediate their functions through PI3K/AKT signalling, which is represented alongside ECM-receptor interaction in the pathway analysis. An overview of this analysis is represented pictorially in Fig. 7. It is therefore the interplay between the up-regulated extracellular matrix proteins, active growth factors of cancer cells, fibroblasts and signalling pathways that promotes lymph node metastasis.

Figure 7

Diagrammatic representation showing oncogenic signalling pathways in breast cancer. Abbreviations: PI3K, Phosphatidylinositol 3-kinase; PIP3, Phosphatidylinositol (3,4,5)-trisphosphate; AKT, Protein kinase B; FAK, Focal adhesion kinase; GrB2, Growth factor receptor-bound protein 2; Ras, Rat sarcoma; Raf, Rapidly accelerated fibrosarcoma; Mek, Mitogen-activated protein kinase kinase; Erk, Extracellular signal-regulated kinase; mTOR, mechanistic target of rapamycin; PDK2, Pyruvate dehydrogenase kinase 2; Bcl2, B-cell lymphoma 2; NFkB, Nuclear factor kappa-light-chain-enhancer of activated B cells; JAK, Janus kinase; JNK, Jun N-terminal Kinase; SMAD, Small mothers against decapentaplegic.

Caveolin-1, desmin, microfibrillar-associated glycoprotein 4, fibrillin-1 and collagen α-1 have been identified as potential biomarkers that can discriminate metastatic from non-metastatic sentinel lymph nodes in early breast cancer. To assess their ability to differentiate the two clinical phenotypes, receiver operating characteristic (ROC) curves were plotted from the ELISA concentrations (Supplementary Fig. 2). The areas under the curve (AUCs) for the five proteins were between 0.81 and 1.0. Sensitivity, specificity, positive predictive value and negative predictive value were each over 80%, making the proteins fairly accurate as a translational tool (Table 4). Caveolin-1 and desmin, with cut-off values of 17.4 and 28.5 ng/μg of tissue protein respectively, are particularly promising candidates, with 100% values for all the diagnostic parameters. Although sensitivity and specificity are independent of prevalence, the predictive values are not, so the practical utility of these two markers needs to be validated in populations with varying prevalence rates.
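The diagnostic parameters discussed above can be computed as in the following minimal sketch. The concentrations are invented for illustration and are not the ELISA data of this study; calling a sample positive when its value is at or above the cut-off is also an assumption, as is the use of the rank-based (Mann-Whitney) estimate of the AUC. The last function applies Bayes' rule to show how the positive predictive value changes with prevalence while sensitivity and specificity stay fixed.

```python
from typing import Dict, Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-based AUC: probability that a positive sample has a higher
    value than a negative sample, with ties counted as one half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def diagnostic_parameters(
    pos: Sequence[float], neg: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when values at or above
    the cut-off are called positive (an assumed direction)."""
    tp = sum(v >= cutoff for v in pos)
    fn = len(pos) - tp
    tn = sum(v < cutoff for v in neg)
    fp = len(neg) - tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def ppv_at_prevalence(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Bayes' rule: PPV for a given prevalence of the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Invented concentrations in ng/μg of tissue protein (not study data).
slnm_pos = [21.0, 19.5, 25.2, 18.1, 16.0]
slnm_neg = [12.3, 15.8, 17.0, 10.4, 14.9]

print(auc(slnm_pos, slnm_neg))
print(diagnostic_parameters(slnm_pos, slnm_neg, cutoff=17.4))

# Same test characteristics, different prevalence: the PPV falls.
print(ppv_at_prevalence(0.9, 0.9, prevalence=0.5))
print(ppv_at_prevalence(0.9, 0.9, prevalence=0.1))
```

With a sensitivity and specificity of 90%, the positive predictive value drops from about 90% at a prevalence of 50% to about 50% at a prevalence of 10%, which is why validation across populations with different prevalence rates matters.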

Table 4 Diagnostic parameters of identified proteins to differentiate SLNM+ from SLNM- in early breast cancer.

Conclusion

An iTRAQ-based proteomic experiment is an ideal platform for comparative protein profiling to identify potential biomarker candidates for sentinel lymph node metastasis in early breast cancer. The extracellular matrix proteins caveolin-1, collagen α-1, desmin, fibrillin-1 and microfibrillar-associated glycoprotein 4 have been identified as potential biomarkers that can differentiate the two metastatic states of sentinel lymph nodes. Each of these, individually or as a collective panel, offers translational scope for the design of ‘on-table’ diagnostics to flag sentinel lymph node metastasis in early breast cancer.