Molecular characterization and integrative genomic analysis of a panel of newly established penile cancer cell lines

Cell line models are essential tools to study the molecular mechanisms underlying tumor initiation and progression. There are limited treatment options for penile squamous cell carcinoma (PSCC), accounting for 1–2% of male tumors in developing countries, and limited progress in preclinical research in PSCC due to lacking available models with identified genomic characteristics. Here, biological and molecular characteristics and whole-genomic alterations were analyzed in a panel of PSCC cell lines newly established in our laboratory. These cell lines were all human papillomavirus (HPV)-negative, epithelial-like, immortalized, and tumorigenic in nude mice, whereas they displayed different proliferation, migration and invasion capacities in vitro, and tumorigenic ability in nude mice. They were all cisplatin sensitive, anti-EGFR therapy resistant, and androgen irresponsive. Whole-genomic sequecing analysis revealed that transition mutations (C:G>T:A and T:A>C:G) were the most common substitution types in these cell lines, whereas ERCC5, TP53, PTH1, CLTCL1, NOTCH2, MAP2K3, CDK11A/B, USP6, ADCH5, BCLAF1, CDKN2A, FANCD2, HRAS, and NOTCH1 were the most frequently altered genes. Amplifications of MYC, PLAG1, NCOA2, RUNX1T1, COX6C, and EGFR and losses of FBXW7, TET2, XPC, and FANCE were frequently observed in cell lines. The exomic variations between cell lines and their corresponding cancer tissues were highly consistent. Genetic variations were mainly involved in the MAPK, Jak-STAT, TGF-beta, Notch, and apoptosis signaling pathways. Conclusively, these panel of PSCC cell lines established in our laboratory harbor some common or specific biological characteristics and genomic variations, and they may serve as optimal models to investigate the molecular mechanisms underlying the progression, metastasis, relapses, and treatment resistance of PSCC and to develop effective treatment strategy.


Introduction
Penile squamous cell carcinoma (PSCC) is a rare malignancy in developed countries with an incidence of (0.3-1)/ 100,000. However, PSCC constitutes up to 1-2% of male cancer in the developing world [1][2][3] . Risk factors for PSCC include poor hygiene, phimosis, smoking, lack of circumcision, and human papillomavirus (HPV) infection 1 . The 5year overall survival in patients with metastatic inguinal lymph nodes (LNs) is approximately 50% and with pelvic LN metastasis is~0%, compared with >90% in patients without LN involvement [4][5][6] . Due to the limited basic and clinical studies, treatment options are very deficient for PSCC patients, especially those with advanced disease.
Previous studies have revealed the mutational landscapes of PSCC using next-generation sequencing, which contributes greatly to our understanding of the tumorigenesis and development of penile cancer [7][8][9] . However, the biological functions of these genetic aberrations have not been clarified by in vitro and animal experiments, due to fewer available PSCC cell lines with various genetic backgrounds. Japanese researchers established three penile cancer cell lines in the last century [10][11][12] , and then a pair of cell lines Ki-PeCa-P1/L1 from a primary lesion and its corresponding metastatic LN 13 and a cell line from human verrucous penile carcinoma 14 were established by German and Brazilian researchers, respectively, in 2012 and 2016. Even so, few findings about molecular mechanisms underlying PeSCC oncogenesis based on these cell lines were reported in subsequent years.
To investigate molecular mechanisms underlying PSCC oncogenesis, we established four other new PSCC cell lines after development of an HPV-negative PSCC cell line with a TP53 mutation in 2016 15 , and compared the biological and genetic characteristics of these five cell lines; some common or specific genomic variations were found, which will contribute to clarifying molecular mechanisms of PSCC tumorigenesis and searching for potential therapeutic targets of PSCC in further studies.

The origins of cell lines
All cell lines were derived from Chinese PSCC patients, whose detailed clinicopathological characteristics are presented in Additional Table S1. Notably, 149RCa and 149RM were two different cell lines generated from the local recurrent lesion or a nearby scrotal invasion lesion in the same patient who suffered a relapse disease after systemic therapy. Cell line LM156 was generated from a patient with poorly differentiated disease.

Morphological features and purity of cell lines
In vitro culture showed that all PSCC cell lines grew well as a well-attached monolayer, displayed a distinct epithelial appearance (with a cobblestone pattern), and exhibited typical characteristics of malignant change: large nucleus, scarce cytoplasm, prominent nucleoli, irregular size and shape (Fig. 1a). Flow cytometry assays revealed that 95.7-99.08% of the cells were positive for epithelial cell marker Pan-cytokeratin (Pan-CK, Fig. 1b), and Western blotting (WB) assays showed that Pan-CK was highly expressed in all PSCC cell lines, whereas fibroblast marker vimentin could not be detected (Fig. 1c). These data suggested that these five cell lines were highly purified, without fibroblast contamination.

Detection for the infection of HPV and mycoplasma
HPV infection is thought to play an important role in the carcinogenesis of PSCC 16 ; however, HPV DNA could not be detected in all cell lines and their corresponding tumor tissues (quantitative real-time PCR, all Ct values > 35) (Additional Table S1). In addition, none of the cell lines were found to contaminate with mycoplasma ( Fig. 1d).

Short tandem repeat (STR) analysis for the validation of cell line
Detailed STR information of the cell lines was presented in Additional Table S2. The STR profiles of all five cell lines did not match any cell lines in the databases from ATCC, DSMZ, or JRCB. Notably, the STR profiles of 149RCa and 149RM were identical, confirming that they were from the same patient.

Cell proliferation in vitro and tumorigenicity in nude mice
All PSCC cell lines were immortalized cells and grew well in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS). The growth curves and doubling times of the cell lines are presented in Fig. 2a. Among the five cell lines, the 149RCa and 149RM cell line had the shortest doubling time (26.75 and 26.39 h), whereas the LM156 cell line had the longest doubling time (34.43 h). These cell lines were all tumorigenic in male/female BALB/C nude mice, but their tumor formation rates were different, ranging from 4/10  Fig. S1). Cell line 149RCa grew faster than the other four cell lines in nude mice ( Fig. 2c and Additional Fig. S1). Hematoxylin and eosin (H&E) staining analysis showed that the xenograft tumors were squamous cell carcinoma, consistent with their original tumors ( Fig. 2d and Additional Fig. S1).

Migration and invasion abilities of PSCC cell lines
Among these five cell lines, 149RM had the highest migration potential (Fig. 3a, c; p < 0.01 or 0.001), whereas Penl1, Penl2, and LM156 that were from metastatic LNs had stronger invasive ability, especially Penl2 (Fig. 3b, d; all p < 0.001). Although the invasion rate of 149RM was lower than three cell lines from metastatic LNs, 149RM had stronger invasion activity than 149RCa (Fig. 3d, p < 0.001), which may be partially because 149RCa and 149RM are derived from the same patient's local recurrent lesion and nearby scrotal invasion lesion, respectively.

Androgen response and treatment sensitivity for therapeutic agents
The progression of some male tumors may be associated with androgen stimulation; however, we found that these PSCC cells expressed very low levels of androgen receptor (AR), and androgens did not affect the growth of these cells (Additional Fig. S2). We next tested their sensitivity to cisplatin, a widely used chemotherapeutic drug. The results demonstrated that all cell lines were highly sensitive to cisplatin, characterized by low values of the half maximal inhibitory concentration (IC 50 ) (0.85-2.21 μg/mL) (Fig. 4c, d).
We next examined the expression of epidermal growth factor receptor (EGFR) by WB assay and found that EGFR highly overexpressed in PSCC cell lines Penl1, Penl2, 149RCa, and 149RM, whereas there was a moderate EGFR expression in LM156 cells (Fig. 4c). These findings suggest a promising EGFR-targeted therapy for PSCC. However, cetuximab, an anti-EGFR antibody, showed no obvious inhibitory activities in these cell lines (all IC 50 s > 1000 μg/mL) (Fig. 4d), whereas cetuximab displayed a slight inhibition in Penl1 cells. Similarly, the five cell lines displayed a strong resistance to other two EGFR tyrosine kinase inhibitors, erlotinib and afatinib (Fig. 4e, f).

Mutation patterns of penile cancer cell lines
Whole-genome sequencing (WGS) analyses revealed that these cell lines exhibited a similar mutational spectrum, containing 1229-1290 single-nucleotide polymorphisms (SNPs) per megabyte compared to the National Center for Biotechnology Information (NCBI) human reference genome (hg19), and transition mutations (C:G>T:A and T:A>C:G) were the most frequent substitution types ( Fig. 5a and Additional Table S3). The single-nucleotide variations (SNVs) and short insertions/deletions (InDels), resulting in protein variations, included missense, splice site, stop-gain and stop-loss mutations, frame-shift and in-frame InDels; among them, missense mutation was the most common mutation type (Fig. 5b). We compared the exomic SNVs/ InDels between cell lines and their corresponding cancer tissues and found that their exomic variations were highly consistent, and the accordance ratios were from 91.4 to 94.7% (Additional Table S4), which suggested that these cell lines were suitable representatives of their corresponding penile cancer tissues.
Notably, cell lines 149RCa and 149RM had similar gene variant calls, except few mutations, such as BCL11A:E539K, CASZ1:R277W in 149RM, and FBXW7:S25X, NOTCH2: S220R/P227H/T235S, BCLAF:G208V/R864C, and NIN: C1036F in 149RCa. Whether these unique variations are relevant to their biological differences (such as motility) requires further investigations. Additionally, we detected several deleterious or possibly damaging variants in PSCC cell lines, which were not indexed in the COSMIC database (Table 1). Detailed SNV/InDel information of important functional genes in cell lines and corresponding tumor tissues was shown in Additional Tables S5-S9.

Identification of prioritized copy number variations (CNVs) in penile cancer cell lines
CNV analyses showed that five cell lines presented different amplification/deletion profiles, the whole-genomic CNVs are shown in Fig. 5c. Further analyses revealed that high-frequency gains of oncogenes MYC (5/ 5), PLAG1, NCOA2, RUNX1T1, COX6C, and CHCHD7 (4/5), HOXA13 (3/5), EGFR, HOXA9, HOXA11, ETV1 and PIK3CA (2/5), and losses of tumor suppressors FBXW7, TET2, XPC, and FANCE (2/5) were frequently observed in these cell lines. Surprisingly, we also found that there were partial losses of certain oncogenes (FGFR1, PDGFRA, and so on) and amplifications of certain tumor suppressors (NBN, EXT1, and so on) in these cell lines. However, due to the complexity of molecular interaction in cancer, the roles that these changes play in penile carcinoma need a further investigation. CNV profiles of critical functional genes in these cell lines are shown as an c Circos plots of CNV profiling (inner circle) for copy number change in the whole genome. Red dots denote copy number gain, and green dots denote copy number loss. d Heatmap of copy numbers (CN) for prioritized functional genes. CN ≥2.8 is considered copy number gain (amplification), 2.8>CN>1.4 is considered normal diploid, and CN ≤1.4 is considered copy number loss (deletion). e Circos plots showing the SV spectrums in these PSCC cell lines Table 1 SNVs and InDels of critical genes in penile cancer cell lines Tumor suppressor S188delinsSD S188delinsSD S188delinsSD S188delinsSD S188delinsSD

MAML3
151_161del -Note: X stop codon, * validated by Sanger sequencing, Δ not validated by Sanger sequencing, other variations (without labels * or Δ ) have not been tested by Sanger sequencing, $ variants that were not indexed in the COSMIC70 database integrative heatmap in Fig. 5d, and detailed CNVs information of important functional genes in each cell lines was shown in Additional Tables S10-S14.

Aberrant pathways in penile cell lines
We combined SNVs, InDels, and CNVs to perform KEGG pathway analysis, and determined that genomic alterations in these cell lines were mainly involed in the MAPK, Jak-STAT, TGF-beta, Notch, and apoptosis signaling pathways. Main aberrant pathways and involved gene variations were showed in Table 2.

Genomic structural variations (SVs) in penile cancer cell lines
The distribution and number of SVs are exhibited in Fig. 5e and Additional Table S15. LM156 cells possessed the largest number of SVs (352), while the other four cell lines had similar SV numbers (from 144 to 200). The inter-chromosomal translocation (CTX) and deletion (DEL) were the most frequent SV types. However, no fusion genes were found in the five cell lines.

Discussion
Cell line models are essential tools to study the molecular mechanisms underlying the tumor progression, metastasis, relapses, and treatment resistance [17][18][19][20] . Various tumors, even same tumor, have been confirmed as being driven from different molecular mechanisms [17][18][19] , so it is necessary to obtain cell line models with identified genetic characteristics for the PSCC study. Although a few PSCC cell lines were developed several years ago [10][11][12][13][14] , little molecular mechanisms underlying PSCC initiation and progression have been clarified using these cell line models.
We have successfully established a panel of PSCC cell lines including Penl1 15 and four additional PSCC cell lines: Penl2, 149RCa, 149RM, and LM156, well characterized them in genomic alterations by WGS assay. All cell lines consist of pure, immortalized, and epithelial-like cells without HPV infection and harbor some common or specific biological characteristics and genomic variations. These cell lines present different growth, migratory, invasive, and tumorigenic abilities. Among them, 149RM has the highest mobility but the lower invasion ability, as well as the shortest doubling time, which may contribute to the rapid wound healing. Three cell lines from metastatic LNs, especially Penl2, has stronger invasion potential, thus these cell lines would be better cell models for the investigation of mechanisms underlying LN metastasis in penile cancer. Although 149RM from a nearby scrotal invasion lesion was expected to be more aggressive, it has a lower invasion rate. We supposed that this scrotal lesion was originated by direct contact of primary tumor, but not by invasion of tumor cells. Intriguingly, 149RM had a similar doubling time as 149RCa, whereas the 149RM xenograft tumors were smaller than 149RCa. We presumed that the microenvironment in vivo play more important role in xenografts' growth, the underlying mechanisms need to be further explored.
The WGS data of cell lines revealed that C:G>T:A and T:A>C:G transitions were the most frequent substitution modes in PSCC cells, which were consistent with previous genomic profiling analysis in penile cancer 21 . Interestingly, this mutation pattern differed from that in other solid tumor 22 . This PSCC-specific mutation signature and the etiology of penile cancer need further investigation. Mutation or loss of CDKN2A gene, TP53 mutation, RAS mutation, amplifications of MYC and EGFR genes have been reported in many human malignancies [23][24][25][26] , including penile carcinoma 9,27 , and the clinical data imply that the loss-of-function mutations of tumor suppressor genes and activating variations of oncogenes play a critical role in the carcinogenesis and invasion of PSCC 9,28 . We found that all five cell lines harbor CDKN2A deletion/ mutation, MYC amplification, and TP53 mutations, which strongly support the above findings and hypothesis. These findings are also in accordance with Ali et al.'s comprehensive genomic profiling analysis, which involved 3769 exons of 236 cancer-related genes plus 47 introns from 19 genes frequently rearranged in cancer 7 .
PSCC is a male-specific malignancy, which may be similar to prostate cancer that correlates with androgen stimulation and AR expression 29 . Marchi et al. found that AR was one of top ten driver candidates in penile cancer by a multidimensional integrative analysis 30 . However, compared to normal glans tissues, AR expression in penile cancer tissue was greatly downregulated by increased microRNAs 30 or promoter hypermethylation 31 , which is in accord with our result that the five cell lines established in our laboratory expressed very low levels of AR. It seems that androgen stimulation plays a negative role in penile cancer. Additionally, we found that penile cancer cells did not response to androgen stimulation in vitro. So we are unable to identify whether AR plays a driver candidate role or a passenger role, which needs to be validated in future functional studies.
Our further investigations revealed that all five PSCC cell lines were sensitive to cisplatin, a DNA-damaging anti-cancer drug used in the clinic 32 . Notably, ERCC5 mutations in all of five PSCC cell line, the mutations of FANCD2 in four of five PSCC cell lines, ATM and MSH6 mutations and BRCA2 loss in Penl2, and loss or stop-gain mutations of FBXW7 in Penl1, 149RM, and 149RCA were found, and these genes are all important DNA damage repair-related genes [33][34][35] , which may well account for the cisplatin sensitivity of these cell lines. Therefore, the status of DNA damage repair-related genes may be a promising marker to identify PSCC patients who can benefit from DNA-damaging chemotherapy.  MYC, IL7R, LIFR, JAK2, IFNA4, IFNA2, IFNA5, IFNA7, IFNA8, IFNA14, IFNA16, IFNA21, IFNW1) Previous studies have shown that EGFR is frequently overexpressed in PSCC, and anti-EGFR therapy may be a potential therapeutic option for PSCC patients 36 . We detected EGFR amplification in two cell lines, and EGFR overexpression in most of these cell lines, but they were all resistant to EGFR inhibitors. Activating HRAS and PI3KCA mutations were found to correlate with resistance to anti-EGFR therapy 37,38 , so we presume that HRAS G13V/G13R mutations in 149RCa, 149RM, and LM156 cells and PI3KCA amplification in Penl2 and 149RM cells may relate with their resistance to anti-EGFR therapy, which need to be verify in further investigations. Thus, these PSCC cell lines may be helpful models for further investigation of the mechanisms underlying anti-EGFR resistance and strategies overcoming this resistance in PSCC.
In summary, we established a panel of cell lines derived from PSCC patients with different clinicopathological characteristics, and these cell lines display diverse biological characteristics. WGS analysis revealed that these cell lines harbor a genomic alteration spectrum similar to that in PSCC patients reported by previous studies, and they   (IFNA1, IFNA2, IFNA4, IFNA5, IFNA6, IFNA7, IFNA8, IFNA10,   IFNA13, IFNA14, IFNA16, IFNA17, IFNA21, IFNB1 display their specific gene alterations and amplification/ loss spectrums, suggesting that this panel of PSCC cell lines might be suitable model systems for PSCC research.
The comprehensive biological and genetic information should aid researchers in the selection of representative penile cancer cell lines for investigating molecular mechanisms and screening potential therapeutics.

Cell line establishment and morphological examination
Fresh tumor specimens were obtained from patients in the Department of Urology, Sun Yat-Sen University Cancer Center (SYSUCC). Four new cell lines Penl2, 149RCa, 149RM, and LM156 were established using the same procedure as Penl1, developed previously in our laboratory 15 . To remove cancer-associated fibroblasts, cells were briefly digested with 0.25% trypsin/ethylenediaminetetraacetic acid (Invitrogen) for 4-5 times as previously reported 15,39 . Cells were cultured in DMEM supplemented with 10% FBS (Gibco, Life Technologies) at 37°C with humidified air and 5% CO 2 , subcultured at a ratio of 1:3 to 1:4 and cryopreserved in complete growth medium supplemented with 10% (v/v) dimethyl sulfoxide at 1 × 10 6 cells/mL in liquid nitrogen. All PSCC cell lines were continuously passaged every 3 days for over 50 passages, without any exogenous transfection or stimulation. Cell lines at passage 15-25 were used for further analyses. Cell morphology of cell lines was examined and photographed under a phase-contrast microscope.

Flow cytometry assay
Cells were harvested by trypsinization, fixed with BD Cytofix/Cytoperm solution (BD Bioscience) and stained with Alexa Fluor ® 488-conjugated anti-Pan-CK antibody or corresponding mouse (DA1E) mAb IgG XP ® isotype control (1:50, Cell Signaling Technology) for 30 min at room temperature, followed by a flow cytometry analysis. The results were analyzed with Kaluza software (Beckman Coulter Inc.). The percentage of cells with Pan-CK positive represented the purity of cell lines 15 . The experiments were independently performed for three times, and representative images were shown.

WB assay
WB assay was performed as previously described 40,41 . Briefly, protein samples from cell lysates were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred to polyvinylidene fluoride membranes. After blocking for non-specific binding, the blots were inoculated with primary antibodies against EGFR (1:2000, Cell Signaling Technology), Vimentin, Pan-CK, AR and glyceraldehyde phosphate dehydrogenase (GAPDH) (1:1000, Santa Cruz Biotechnology), followed by reaction with corresponding HRP-conjugated secondary antibodies. Bands were visualized via Chemidoc Touch (Bio-Rad).

Polymerase chain reaction (PCR)-based assays
Genome DNA was extracted from cell lines and corresponding tumor tissues, and subjected to analyze the infection of HPV and mycoplasma and the STR profiles. HPV infection was analyzed in MyGenostics Inc. (Beijing, China) by quantitative real-time PCR assays with specific primers and probes (Additional Table S16) as described previously, which can detect both free and integrated HPV DNA 42 . Mycoplasma contamination using the PCR Mycoplasma Test Kit (HuaAn Biotech, Hangzhou, China) according to the manufacturer's instruction 15,39 .
The STR profile analysis was performed by PCR amplification 43 using Goldeneye 20A kit (Peoplespot Inc., China) and then processing with the ABI3730xl Genetic Analyzer. The STR profile of each cell line was compared with those from the ATCC, DSMZ, or JRCB databases for reference matching.

Cell proliferation and viability assays
For cell proliferation assay, 5 × 10 3 -1 × 10 4 cells were seeded in 24-well plates in triplicate. Cell were harvested and counted after typan blue staining 44 using a Beckman Z2 Coulter ® Particle Count and Size Analyzer (Beckman Coulter). Cell growth curve was analyzed in an exponential growth phase using Graphpad 5.0 software to obtain cell doubling time.

In vitro migration and invasion assay
The migration ability of cell lines was assessed using a monolayer wound healing assay 45 . Briefly, cells were seeded in 6-well plates and grown to confluence overnight. Cells were slightly scratched with a 200 μL-tip and then washed twice with phosphate buffered saline. Fresh serum-free medium were then added and wound healing was monitored after 12 h of incubation.
Cell invasion was tested using a Transwell assay 46,47 . Briefly, 2 × 10 5 cells in 200 μL serum-free or complete medium were seeded into the Boyden chamber with Matrigel (8-μm pore; BD Falcon, USA). The chambers were put in 24-well plates with medium with 10% FBS to incubate for 24 h. The cells attached to the undersurface of filter membrane were fixed and stained in 0.1% crystal violet. The invaded cells were counted in five random fields (100×) under a microscope.

Tumorigenicity in nude mice
To confirm the tumorigenicity of these cell lines, 5 × 10 6 cells (passage 20-25) suspended in 100 μL 20% Matrigel (BD Biosciences, USA) were injected subcutaneously into the right flanks of 6-week-old female and male nude mice (10 mice per cell line). After 3 weeks, mice were sacrificed and tumors were removed for H&E staining and pathology examination.

WGS and whole-exome sequencing (WES)
WGS analysis for PSCC cell lines (90 Gb PF data for each cell line), WES analysis was performed for cell linecorresponding cancer tissues (Agilent's SureSelect Human All Exon v5 fors exome capture, 10G PF data for each tissue) were performed by Wuxi Apptec (Shanghai, China) using the Illumina HiSeq X sequencing system. The raw data of WGS/WES in this study were deposited at the NCBI (SRA accession number: SRP117294). Highquality reads were aligned against the NCBI human reference genome (hg19) using Burrows-Wheeler Aligner (v0.5.9) with default parameters, followed by passing a hard filter (the detailed description was showed in Additional Methods).
CNVs were calculated based on the read depth of WGS, SVs were analyzed and annotated by the chromosomal locations, while SNVs and InDels were identified by a customized bioinformatics data analysis pipeline. Prioritized functional SNVs and InDels were identified as meet one of the following criteria: (1) Splicing site, frameshift insertion/deletion, stop-gain or stop-loss mutations that are excluded in SNP142common database; (2) Non-synonymous mutations that are excluded in SNP142common database, which functional impacts were "deleterious (D)" predicted by SIFT or "probably damaging/possibly damaging (P/D)" predicted by PolyPhen2; (3) Non-frameshift insertion/deletion mutations that are included in COSMIC (Catalog of Somatic Mutations in Cancer) 70 database and excluded in SNP142common database; (4) SNVs and InDels included in SNP142common database and COSMIC70 database, which functional impacts were "D" predicted by SIFT or "P/D" predicted by PolyPhen2.

Statistical analysis
All of in vitro experiments were performed in triplicate and repeated three times and animal experiments were repeated twice. Heatmap Illustrator (HemI 1.0) 48 , SPSS 20.0, and GraphPad 5.0 software programs were used for statistical analyses in this study. One-way ANOVA was utilized to compare the differences among the five cell lines while Student's t-test was used to compared the difference between two groups. p < 0.05 was considered as significantly different.