Integrated proteogenomic characterization of medullary thyroid carcinoma

Medullary thyroid carcinoma (MTC) is a rare neuroendocrine malignancy derived from parafollicular cells (C cells) of the thyroid. Here we presented a comprehensive multi-omics landscape of 102 MTCs through whole-exome sequencing, RNA sequencing, DNA methylation array, proteomic and phosphoproteomic profiling. Integrated analyses identified BRAF and NF1 as novel driver genes in addition to the well-characterized RET and RAS proto-oncogenes. Proteome-based stratification of MTCs revealed three molecularly heterogeneous subtypes named as: (1) Metabolic, (2) Basal and (3) Mesenchymal, which are distinct in genetic drivers, epigenetic modification profiles, clinicopathologic factors and clinical outcomes. Furthermore, we explored putative therapeutic targets of each proteomic subtype, and found that two tenascin family members TNC/TNXB might serve as potential prognostic biomarkers for MTC. Collectively, our study expands the knowledge of MTC biology and therapeutic vulnerabilities, which may serve as an important resource for future investigation on this malignancy.

About 10~15 mg wet-weight for each tumor sample was used for RNA extraction.
Total RNA was extracted from cryopulverized tumor aliquots using the TRIzol reagent (Invitrogen). RNA purity and quantification were measured with a NanoDrop 2000 spectrophotometer (Thermo Scientific), and RNA integrity was evaluated by Agilent 2100 Bioanalyzer (Agilent Technologies). Samples with RNA integrity number > 6.0 were considered to be of high quality and acceptable for further sequencing.

Whole Exome Sequencing
For WES, qualified genomic DNA from the 102 tumor/non-tumor paired samples (tumor: n = 102; normal thyroid tissue: n = 86; blood: n = 16) were prepared for library construction (560 ng DNA for each sample). The WES libraries were then prepared and captured with an Agilent SureSelect Human All Exon v6 kit (Agilent Technologies) according to the manufacturer's protocol. Afterwards, pooled libraries were sequenced on Illumina NovaSeq 6000 system (Illumina) with 150 bp paired-end reads.

RNA Sequencing
For RNA-Seq, 101 tumor samples in our study cohort had qualified total RNA for subsequent sequencing, and 500 ng RNA of each sample was used for library construction. Ribosomal RNA was depleted from total RNA with Ribo-Zero Gold rRNA Removal Kit (Illumina). Subsequently, the sequencing libraries were constructed with TruSeq Stranded Total RNA Library Prep Kit (Illumina) following the manufacturer's instructions, and library quality was assessed on an Agilent 2100 Bioanalyzer. Then the libraries were sequenced on an Illumina NovaSeq 6000 system with 150 bp paired-end read length.

Illumina Infinium MethylationEPIC BeadChip Array
After WES, a total of 78 tumor samples in our cohort had remaining tissue aliquots sufficient for DNA methylation array. About 10~25 mg wet-weight for each MTC tumor sample was used for DNA extraction, and 500 ng genomic DNA from each tumor was bisulfite treated using the EZ DNA methylation-Gold Kit (Zymo Research).
Bisulfite-converted DNA was analyzed on an 8-sample version Infinium MethylationEPIC 850K BeadChip (Illumina) capturing > 850,000 methylation sites per sample. This EPIC array includes sample plating, bisulfite conversion, and methylation array processing. After scanning, the data were processed through an automated genotype calling pipeline provided by the manufacturer. The raw data were available as IDAT files and a sample sheet.

Protein Extraction and Digestion
For all 102 tumors in our cohort, approximately 25-100 mg of each cryopulverized tumor sample was lysed with 300 μL SDS lysis buffer supplemented with 1mM PMSF, followed by 3 min of sonication (1 second on and 1 second off, 80 Watts power) (Ningbo Scientz Biotechnology). After sonication, the lysate was centrifuged at 15000 g, 4℃ for 15 min twice to remove insoluble particles and precipitations, and protein concentration was determined by BCA assay (Thermo Scientific).
Based on the concentration determined by the BCA assay, 50 µg of protein per sample was used in subsequent process. The protein solution was incubated with 5mM dithiothreitol (DTT) at 55°C for 60 min, and was subsequently alkylated with a final concentration of 10 mM iodoacetamide for 30 min at room temperature (RT) in the dark.
Followed by protein precipitation and collection, the precipitates were redissolved with 100 μL 100 mM triethyl ammonium bicarbonate (TEAB), and then digested by sequencing grade modified trypsin (Promega) at 1:50 enzyme-to-substrate ratio overnight at 37°C. The digested peptides were acidified by 100% formic acid (FA, Thermo Scientific) to 1% of the final concentration of FA to adjust the pH value to ~2.
Finally, the tryptic peptides were desalted on reversed-phase C18-Reverse-Phase SPE Column (Waters) and dried by Speed-Vac (Thermo Scientific). ( Supplementary Fig. S2a, b) The DDA-PASEF spectral library was loaded onto Spectronaut TM (version 15.3.210906.50606). Subsequently, 102 DIA runs for global proteomics and 74 DIA runs for phosphoproteomics were analyzed by Spectronaut TM to search the constructed DDA spectral library, respectively. In addition to the search parameters for the DDA library generation mentioned above, the main parameters of the software also included the following settings: retention time prediction type was dynamic iRT, interference on MS2 level correction was enabled, and cross run normalization was enabled. All results on protein and precursor levels were filtered based on a q-value cutoff of 0.01 (equivalent to FDR < 1%).

Phosphopeptide Enrichment by IMAC
By searching the constructed library with Spectronaut TM , a total of 64305 peptides from 7974 proteins were obtained for the proteomic analysis, 61179 of which were unique peptides. The phosphoproteomic analysis then yielded a total of 7417 phosphopeptides (covering 7566 confidently localized phosphosites), 7214 of which were unique phosphopeptides. The abundances of peptides were all quantified based on the area under the extracted ion chromatogram (XIC) peaks of all assigned fragments that passed filtering. For each protein with at least one matching unique peptide, the abundance was quantified using the Top-3 approach peptides abundance score. To be specific, the Top-3 approach calculated the total protein ion intensity for the respective proteins by using the median intensities from the three most intense peptides matching to a specific protein, if the number of unique peptides from a specific protein was < 3, then all the 1-2 peptides were used for calculation) [2][3][4][5] . For phosphoproteomic analysis, peptide grouping was performed based on modified peptides, and site-specific quantification was counted from the quantity of phosphorylation sequences according