Definition of a cell surface signature for human cardiac progenitor cells after comprehensive comparative transcriptomic and proteomic characterization

Adult cardiac progenitor/stem cells (CPC/CSC) are multipotent resident populations involved in cardiac homeostasis and heart repair. Assisted by complementary RNAseq analysis, we defined the fraction of the CPC proteome associable with specific functions by comparison with human bone marrow mesenchymal stem cells (MSC), the reference population for cell therapy, and human dermal fibroblasts (HDF), as a distant reference. Label-free proteomic analysis identified 526 proteins expressed differentially in CPC. iTRAQ analysis confirmed differential expression of a substantial proportion of those proteins in CPC relative to MSC, and systems biology analysis defined a clear overrepresentation of several categories related to enhanced angiogenic potential. The CPC plasma membrane compartment comprised 1,595 proteins, including a minimal signature of 167 proteins preferentially or exclusively expressed by CPC. CDH5 (VE-cadherin), OX2G (OX-2 membrane glycoprotein; CD200), GPR4 (G protein-coupled receptor 4), CACNG7 (calcium voltage-gated channel auxiliary subunit gamma 7) and F11R (F11 receptor; junctional adhesion molecule A; JAM-A; CD321) were selected for validation. Their differential expression was confirmed both in expanded CPC batches and in early stages of isolation, particularly when compared against cardiac fibroblasts. Among them, GPR4 demonstrated the highest discrimination capacity between all cell lineages analyzed.

Global scheme of the differential expression analysis. (a) Three independent CPC isolates (CPC1-3; previously referred to as H1, H3 and H4, respectively) were compared with three MSC isolates (19, 33, 45) and three HDF isolates (F1-3), at the indicated cell passages. (b) Flow chart of the different analyses carried out with the indicated samples.
show the isotype control stainings. (b) FACS analysis of CD105 expression (blue peak) in CPC-3 compared with MSC (MSC19) (red peak); negative controls with the isotype control stainings are indicated (white peak). The assays were repeated three times; data shown correspond to a representative experiment.
indicates that the specific marker has also been validated by proteomics techniques; (F) indicates that the specific marker has also been validated by FACS. Main references used for the analysis are the following: hCDC 8, hCPC 9, mCDC 10, mckit-CSC 11, mSca1-CSC 12 and BMi1-CPC 13. All data for murine (m) populations were obtained from freshly purified fractions (Fr), a clear difference with human (h) cells.

Figure S8. Validation of putative markers of CPC in human and porcine tissue.
(a) RT-qPCR analysis of GPR4, IGFBP2, CACNG7, F11R and CDH5 in human cell samples corresponding to CPC isolates (C1-3; blue bars), cardiac fibroblasts (cFib; grey bars), MSC (red bars) and fibroblasts from different origins (Fib; green bars). The assays were performed three times and data are expressed as mean ± SD; black lines indicate the p-value summary (Mann-Whitney test; *** <0.002, ** <0.02, * <0.05, ns = not significant) of CPC vs. HCF.
(b) Comparative expression analysis (RT-qPCR) of F11R and CACNG7 in long-term expanded human (CPC1 and 2) and porcine (pCPC1 and 2) cells.
(c) RT-qPCR analysis of CACNG7 and F11R expression in porcine samples, both in pCPC and in heart tissue during early isolation stages (p2, p5). The assays were performed three times and data are expressed as mean ± SD; black lines indicate the p-value summary (*** <0.002, ** <0.02, * <0.05, ns = not significant) of pCPC vs. heart tissue (one-way analysis of variance followed by the Bonferroni multiple comparison test).
(d) FACS analysis of CDH5 expression in freshly isolated cardiac pig cells (left; p2), compared with long-term expanded pCPC3 (right); CDH5 (black); isotype negative control (red).
Table S1. Complete list of DEG in CPC derived from RNAseq.
Table S2. Complete list of proteins identified by label-free proteomics, organized by preferential expression in CPC, MSC and HDF. Organization based on subcellular location.

Human bone marrow-derived MSC and human fibroblasts
Human bone marrow-derived MSC (hMSC) and human dermal fibroblasts (HDF) were obtained from the Inbiobank Stem Cell Bank (www.inbiobank.org). Briefly, cadaver bone marrow was harvested from brain-dead donors under the supervision of the Spanish National Transplant Organization (Organización Nacional de Trasplantes, ONT). Relatives gave informed consent. Each sample donor was tested and found negative for HIV-1/2, hepatitis B and C, cytomegalovirus and mycoplasma. All cells were processed at Inbiobank following manufacturing procedures based on ISO9001:2000 in GMP conditions. Phenotypes were described previously 3. The hMSC displayed a typical CD29+, CD73+ (SH3 and SH4), CD105+ (SH2), CD166+, CD34-, CD45- and CD31- phenotype. In the presence of specific differentiation factors, these cells were able to differentiate into osteocyte, chondrocyte and adipocyte lineages.
Other fibroblastic cell lines used in the study were as follows: HCF (ScienCell, cat. number 6300), HCF-c (PromoCell, cat. number C-12375) and HPF-c (PromoCell, cat. number C-12360). MSC and all fibroblastic cell lines were cultured in optimal conditions for each, in low-glucose DMEM (Sigma-Aldrich) with 10% FBS.

Immunohistochemical analyses
For immunohistochemistry, heart samples were fixed in 4% paraformaldehyde (PFA) (overnight, 4 °C), cryoprotected in 30% sucrose, frozen in OCT compound, and cut into 8-µm sections on a cryostat. Heart immunohistochemistry and immunocytochemistry (ICC) have been described in detail

Proteomics analysis
Cells from CPC isolates H1, H3 or H4 were cultured; after repeated washing in PBS, cell pellets (5-8 × 10^7 cells) were collected and aliquoted. For protein extract preparation, cell pellets were resuspended in lysis buffer (50 mM Tris-HCl pH 8.5, 4% SDS, 50 mM DTT), boiled (5 min) and incubated (30 min, room temperature) for full protein solubilization. Total protein (~200 µg) was digested using a filter-aided sample preparation protocol (FASP, Protein Digestion Kit, Expedeon) following the manufacturer's instructions. Protein extracts were diluted in buffer UA (8 M urea in 0.1 M Tris-HCl pH 8.5) and loaded onto 30K centrifugal filter devices. Denaturation buffer was replaced by washing with buffer UA. Proteins were alkylated using 50 mM iodoacetamide in buffer UA (20 min in the dark) and excess alkylation reagent was eliminated by three washes with buffer UA and three washes with 50 mM ammonium bicarbonate. Proteins were digested (37 °C, overnight) with modified trypsin (Promega) in 50 mM ammonium bicarbonate at a 40:1 protein:trypsin (w/w) ratio. The resulting peptides were eluted by centrifugation with 50 mM ammonium bicarbonate (twice) and 0.5 M sodium chloride. TFA was added to a final concentration of 1%, and the peptides were desalted on C18 Oasis-HLB cartridges (Waters) and dried for further analysis. The resulting tryptic peptides were dissolved in 0.1% formic acid, loaded onto the nLC-MS/MS system for on-line desalting on C18 cartridges, and analyzed by nLC-MS/MS using a C-18 reverse-phase nanocolumn (75 µm ID x 50 cm, 3 µm particle size, Acclaim PepMap 100 C18, Thermo-Fisher) in a continuous acetonitrile gradient consisting of 0-30% B in 180 min, 30-43% B in 5 min and 43-90% B in 1 min (A = 0.5% formic acid; B = 90% acetonitrile, 0.5% formic acid).
Peptides were eluted at a flow rate of ~200 nL/min from the reverse-phase nanocolumn to an emitter nanospray needle for real-time ionization and peptide fragmentation on an orbital ion trap mass spectrometer (Orbitrap Elite, Thermo-Fisher). To increase proteome coverage, tryptic peptides were fractionated by cation exchange chromatography (Oasis HLB-MCX columns), desalted and analyzed as above.
The enriched membrane fraction from CPC (H4 isolate, Coretherapix), MSC and HDF, obtained using the optimized extraction protocol 4, was processed by off-line fractionation (mixed-mode cation exchange chromatography, MCX) prior to nLC-MS/MS analysis. Six MCX fractions were analyzed by nLC-MS/MS in the Orbitrap XL equipment, using a 3-h gradient. Approximately 20% of the proteins identified were assigned as plasma membrane proteins, including several receptors and proteins with numerous predicted transmembrane domains (TMD).

Database search
To identify peptides, MS/MS spectra were searched with the SEQUEST HT algorithm implemented in Proteome Discoverer 1.4.0.29 (Thermo Scientific). Searches were run against the UniProt database (all human sequences; March 06, 2013; 70024 entries; Mann's lab contaminants) with the following parameters: trypsin digestion with a maximum of 2 missed cleavage sites; precursor and fragment mass tolerances of 800 ppm and 1.2 Da, respectively, for Elite files, or 2 Da and 0.02 Da, respectively, for QExactive files; carbamidomethyl cysteine as a fixed modification; and methionine oxidation as a dynamic modification.
For iTRAQ-labeled peptides, N-terminal and Lys iTRAQ modifications were selected as fixed modifications. Results were analyzed using the probability ratio method with additional filtering for a precursor mass tolerance of 15 ppm; the false discovery rate (FDR) for peptide identification was calculated based on the search results against a decoy database using the refined method. The iTRAQ reporter ion intensities were retrieved from MS/MS scans by QuiXoT software and used as inputs to the weighted spectrum, peptide and protein statistical (WSPP) model to obtain peptide and protein abundance changes.
Using the ontologies and annotations included in the GO database, WSPP was used to assess statistically significant changes at the protein function level.
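The search parameters mix relative (ppm) and absolute (Da) mass tolerances. As an illustration only, the following Python sketch shows how a ppm tolerance translates into an m/z window; the function names and the example m/z values are hypothetical and are not part of the software used in this study.

```python
from typing import Tuple


def ppm_window(mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tol_ppm / 1e6  # 1 ppm = one millionth of the m/z value
    return mz - delta, mz + delta


def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed m/z deviates from theory by at most tol_ppm."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


if __name__ == "__main__":
    # Hypothetical precursor at m/z 1000 filtered at 15 ppm
    low, high = ppm_window(1000.0, 15.0)
    print(f"accepted window: {low:.3f} to {high:.3f}")
    print(within_ppm(1000.01, 1000.0, 15.0))
```

A wide initial search tolerance followed by a narrow post-search filter, as described above, corresponds to calling `within_ppm` with the narrower value on the matches returned by the search.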

Peptide quantification and statistical analysis
Peptides were quantified using QuiXoT quantitative proteomics software. Statistical analyses were based on the WSPP statistical model 5, a random effects model that considers five sources of variance: spectrum-fitting, scan, peptide, protein and functional category levels. A weight $w_{qps}$ was associated with each spectrum, using the maximum intensity of each pair of iTRAQ reporter ions compared. The overall log2 ratio of each peptide, $x_{qp}$, was calculated as a weighted average of the scans matching each peptide. The log2 ratio of each protein, $x_q$, was similarly calculated, using the weighted average of all the peptides that identify the protein studied. The final $x_q$ value was corrected by subtracting the grand mean of each experiment. The weights for spectra, $w_{qps}$, were corrected based on the spectrum-level variance, $\sigma_S^2$. The weight for peptides, $w_{qp}$, was then calculated by adding the weights of all spectra matching that peptide and considering the peptide variance, $\sigma_P^2$, and the weight for the protein, $w_q$, was calculated by adding the weights of all peptides associated with each protein, considering the protein variance, $\sigma_Q^2$. Standardized variables, $z_{qps}$, $z_{qp}$ and $z_q$, were defined at each level as the mean-corrected log2 ratio, expressed in units of the corresponding standard deviation. Further details can be found in previous reports [5][6][7].
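The hierarchical weighted averaging described here can be illustrated with a minimal Python sketch. It is not the QuiXoT implementation: the function names and the example data are hypothetical, and the rule used to combine summed weights with a level variance, 1 / (1 / sum(weights) + variance), is an assumption made for illustration. The grand-mean correction is omitted.

```python
from typing import Dict, List, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of log2 ratios."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def combined_weight(lower_weights: Sequence[float], level_variance: float) -> float:
    """Weight of an upper-level element from its lower-level weights.

    Assumed rule (illustrative): 1 / (1 / sum(weights) + variance).
    """
    return 1.0 / (1.0 / sum(lower_weights) + level_variance)


def protein_log2(peptides: Dict[str, Dict[str, List[float]]], peptide_variance: float) -> float:
    """Average scans into peptides, then peptides into one protein log2 ratio."""
    peptide_ratios, peptide_weights = [], []
    for scans in peptides.values():
        peptide_ratios.append(weighted_mean(scans["log2"], scans["w"]))
        peptide_weights.append(combined_weight(scans["w"], peptide_variance))
    return weighted_mean(peptide_ratios, peptide_weights)


if __name__ == "__main__":
    # Hypothetical scans for two peptides of a single protein
    example = {
        "PEPTIDE_A": {"log2": [1.0, 2.0], "w": [1.0, 3.0]},
        "PEPTIDE_B": {"log2": [0.0], "w": [4.0]},
    }
    print(protein_log2(example, peptide_variance=0.25))
```

Scans with higher reporter-ion intensity receive larger weights and therefore contribute more to the peptide and protein ratios, while the level variance limits how much any single heavily sampled peptide can dominate the protein value.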

Bioinformatics identification
For peptide identification, all spectra were analyzed with Proteome Discoverer (version 1.4.0.29, Thermo Fisher Scientific), using a UniProt database containing all human and chicken protein sequences (November 23, 2011). Search parameters were as follows: trypsin digestion with a maximum of 2 missed cleavage sites; precursor and fragment mass tolerances of 600 ppm and 1200 mmu, respectively, for Elite (2 Da and 0.02 Da, respectively, for QExactive); carbamidomethyl cysteine as a fixed modification; and methionine oxidation as a dynamic modification. Peptide identification was validated using the probability ratio method, and FDR was calculated using inverted databases and the refined method.
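As a simplified illustration of decoy-based FDR estimation, the Python sketch below computes the basic target-decoy ratio at a score threshold. This is the plain decoys/targets estimate, not the refined method used in this study, and the function name and example scores are hypothetical.

```python
from typing import Iterable, Tuple


def decoy_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold.

    Each match is (score, is_decoy); higher scores are assumed to be better.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # no accepted target matches, so nothing to report
    return decoys / targets


if __name__ == "__main__":
    # Hypothetical peptide-spectrum matches: (score, matched the inverted database?)
    matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    print(decoy_fdr(matches, threshold=7.0))
```

In practice the score threshold is lowered until the estimated FDR reaches the chosen cut-off, and only target matches above that threshold are kept.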

RNA-Seq analysis
Sequenced reads were quality-controlled and pre-processed using cutadapt v1.6 to remove adaptor contaminants. The resulting reads were aligned and gene expression was quantified using RSEM v1.1.19, over human reference GRCh37 and Ensembl gene build 65. Only genes with at least 1 count per million in at least three samples were considered for statistical analysis. Data were then normalized and differential expression was tested using the Bioconductor package edgeR v3.0.8. Genes with a Benjamini-Hochberg adjusted p-value ≤0.05 were considered differentially expressed. For the set of differentially expressed genes, functional analysis was performed using the topGO v2.10 Bioconductor R package, with annotations from org.Hs.eg.db and GO.db v2.8. For the functional analysis, genes were classified by subcellular compartment (nucleus, cytoplasm, plasma membrane, extracellular) according to GO annotations. Enrichment was performed using the full list of equally localized genes as reference. Top biological processes and molecular functions were selected using the Weighted Fisher method implemented by topGO with P<0.01.
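The expression filter and the multiple-testing correction described in this section can be sketched in a few lines of Python. The analysis itself was run with edgeR in R; the code below is only an illustration, with hypothetical function names and made-up counts, and it assumes that counts per million are computed from raw library sizes.

```python
from typing import List, Sequence


def keep_gene(counts: Sequence[float], library_sizes: Sequence[float],
              min_cpm: float = 1.0, min_samples: int = 3) -> bool:
    """Keep a gene if it reaches min_cpm in at least min_samples samples."""
    passing = sum(1 for c, lib in zip(counts, library_sizes)
                  if c * 1e6 / lib >= min_cpm)
    return passing >= min_samples


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts for one gene in four samples
    print(keep_gene([5, 0, 12, 3], [1e6, 1e6, 2e6, 1e6]))
    adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
    print([p <= 0.05 for p in adj])
```

Genes passing `keep_gene` would enter the differential expression test, and those whose adjusted p-value is at most 0.05 would be reported as differentially expressed.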