A pan-cancer analysis of synonymous mutations

Synonymous mutations have been viewed as silent mutations, since they only affect the DNA and mRNA, but not the amino acid sequence of the resulting protein. Nonetheless, recent studies suggest their significant impact on splicing, RNA stability, RNA folding, translation or co-translational protein folding. Hence, we compile 659194 synonymous mutations found in human cancer and characterize their properties. We provide the user-friendly, comprehensive resource for synonymous mutations in cancer, SynMICdb (http://SynMICdb.dkfz.de), which also contains orthogonal information about gene annotation, recurrence, mutation loads, cancer association, conservation, alternative events, impact on mRNA structure and a SynMICdb score. Notably, synonymous and missense mutations are depleted at the 5'-end of the coding sequence as well as at the ends of internal exons independent of mutational signatures. For patient-derived synonymous mutations in the oncogene KRAS, we indicate that single point mutations can have a relevant impact on expression as well as on mRNA secondary structure.

Numbered rectangular boxes correspond to regions of predicted local structural accessibility changes as shown in Fig. 6c.

The Synonymous Mutations In Cancer database (SynMICdb) is a curated database of synonymous mutations in cancer. SynMICdb allows biologists to easily extract and download synonymous mutations in cancer as well as orthogonal data using multiple search options. It also integrates the predicted impact of synonymous mutations on structural changes in RNA using structural prediction algorithms.
Several independent search criteria are available in SynMICdb such as the gene name, the genomic coordinates, the position of the mutations within the coding sequence (CDS), their evolutionary conservation, the organ system, organ and tumor type, their link to cancer (Cancer Gene Census) or the SynMICdb score. Each search option is described in detail below.

Search by Gene
This feature allows the user to search for synonymous mutations present in a gene of interest using one of the following nomenclatures (Figure 1):
1. HGNC gene symbol
2. Gene name
3. ENSEMBL ID
Alias names for genes (e.g. P53 for TP53) are allowed and the search is case-insensitive. For example, Figure 2 shows the results page for the gene KRAS. The summary information in Cancer Gene Census 1 for the gene is shown. The link to GeneCards 2 for the gene is also provided.

Figure 2. Example of result page for "Search by Gene" in SynMICdb.
The result columns provide the following information:
• Mutation ID: Unique identifier of each mutation (as present in the COSMIC database).
• Gene Name: Abbreviated name of the gene.
• Transcript ID: ENSEMBL transcript ID for the corresponding mutation.
• Mutation nt: Position and nucleotide change of the mutation, e.g. c.36T>G indicates a change of coding nucleotide number 36 from T to G.
• Mutation genome position: Genomic coordinates of the mutation in human genome assembly GRCh38 (chromosome:start-end).
• SynMICdb Score: The SynMICdb score reflects the probable impact of the synonymous mutation and is based on the mutation frequency, the probability of the mutation arising from mutational bias due to mutational signatures, the average mutation load of the tumors with this mutation, the evolutionary conservation, the listing of the affected gene as a cancer gene in the Cancer Gene Census, the listing of the mutation in the SNPdb, the FATHMM-MKL score, the CADD score and the predicted impact on RNA secondary structure. The score ranges from -4 to +12, and higher numbers indicate a higher likelihood of a functional impact of the synonymous mutation. The distribution of the SynMICdb score is illustrated by the following [distribution table not recovered; only the value 8.08 is legible]. Thus, a SynMICdb score above 4.38 indicates that the synonymous mutation is among the top 1% of synonymous mutations in this study.
• Average Mutation Load: This column indicates the average number of mutations found in the genome-wide analysis of the tumor samples harboring this specific mutation.
• Alternative Events: This column reports alternative events annotated by GENCODE, such as alternative splicing and other events that produce more than one transcript from the same gene, as characterized by the UCSC genome browser 3.
• SNP: This column indicates whether the mutation is listed as a Single Nucleotide Polymorphism (SNP) in the SNP database. y = yes, n = no.
• Conservation: This column lists the conservation scores of human vs. 99 vertebrate genomes (PhastCons100). The score ranges from 0 to 1, with 1 indicating the highest conservation level among the 100 species.
• Structure Change Score (remuRNA): This column gives the structural change prediction score calculated by remuRNA for the respective mutation. The score ranges from -5 to +20, and higher numbers indicate a higher likelihood of a structural change caused by the mutation. Details for this analysis for ESEs and ESSs separately for the two prediction algorithms are provided in the full data table upon "Download Full Results". Please note that 23 motifs were assigned "ESE" as well as "ESS" properties in SpliceAidF and hence are listed separately as "ESE & ESS".
• Signature-normalized Frequency: In this column, the Frequency of the mutation is corrected for the mutation bias due to mutational signatures frequently observed in cancer: the Frequency is multiplied by (1 - p), where p is the probability of the nucleotide change according to the most prevalent mutational signature in cancer.
• Frequency: This column shows the recurrence level of each mutation. The number in this column represents the total number of tumor samples in which the respective mutation was found.
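The notations used in the "Mutation nt" and "Mutation genome position" columns, and the (1 - p) correction behind the Signature-normalized Frequency, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the regular expressions cover just the simple single-nucleotide form shown above (c.36T>G and chromosome:start-end), and nothing here is taken from the SynMICdb implementation.

```python
import re
from typing import NamedTuple, Tuple


class CdsMutation(NamedTuple):
    position: int  # coding nucleotide number, e.g. 36 in c.36T>G
    ref: str       # original nucleotide
    alt: str       # mutated nucleotide


# Assumption: only simple substitutions of the form c.<number><ref>><alt>.
_CDS_PATTERN = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Assumption: coordinates are written as chromosome:start-end.
_GENOME_PATTERN = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_cds_mutation(text: str) -> CdsMutation:
    """Split a 'Mutation nt' entry such as 'c.36T>G' into its parts."""
    match = _CDS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported mutation notation: {text!r}")
    position, ref, alt = match.groups()
    return CdsMutation(int(position), ref, alt)


def parse_genome_position(text: str) -> Tuple[str, int, int]:
    """Split 'chromosome:start-end' into (chromosome, start, end)."""
    match = _GENOME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported genome position: {text!r}")
    chromosome, start, end = match.groups()
    return chromosome, int(start), int(end)


def signature_normalized_frequency(frequency: int, p: float) -> float:
    """Multiply the Frequency by (1 - p), with p the probability of the
    nucleotide change under the most prevalent mutational signature."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return frequency * (1.0 - p)
```

For instance, a mutation seen in 10 samples whose nucleotide change has a hypothetical signature probability of 0.25 would receive a signature-normalized frequency of 7.5.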
By default, the results are grouped by Mutation ID and sorted by their frequency. For each Mutation ID, only one line is given in this view.
Detailed information for each sample can be viewed by clicking on the ⊕ icon. Figure 3 shows an example of sample information for mutation ID COSM253757.

Download Options
The user can download the results using one of the following two options:
• Download Table: This button allows the user to download the displayed results as a csv file.
• Download Full Results: This button allows the user to download the displayed results plus additional information such as the affected codon and amino acid, the mutation load of each affected sample, the position of the mutation within the CDS, and the classification by the Cancer Gene Census (CGC).
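A downloaded csv file can be post-processed with standard tools. The following Python sketch reads such a table and keeps the rows whose SynMICdb score exceeds 4.38, the top-1% threshold given above. It assumes that the csv header uses the column names of the results page ("Mutation ID", "Gene Name", "SynMICdb Score"); the actual header of the exported file may differ, and the sample rows are invented placeholders, not database entries.

```python
import csv
import io

TOP_ONE_PERCENT_SCORE = 4.38  # threshold quoted in the column description


def load_rows(handle):
    """Read a csv table into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def top_scoring(rows, threshold=TOP_ONE_PERCENT_SCORE,
                score_column="SynMICdb Score"):
    """Keep rows whose score is above the threshold.

    Rows with a missing or non-numeric score are skipped.
    """
    selected = []
    for row in rows:
        try:
            score = float(row[score_column])
        except (KeyError, TypeError, ValueError):
            continue
        if score > threshold:
            selected.append(row)
    return selected


# Invented placeholder rows, for illustration only.
SAMPLE = (
    "Mutation ID,Gene Name,SynMICdb Score\n"
    "EXAMPLE1,GENE_A,5.10\n"
    "EXAMPLE2,GENE_B,1.20\n"
    "EXAMPLE3,GENE_C,\n"
)

if __name__ == "__main__":
    for row in top_scoring(load_rows(io.StringIO(SAMPLE))):
        print(row["Mutation ID"], row["SynMICdb Score"])
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an opened file, e.g. `open(path, newline="")`.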

Search by Position in CDS
This option allows the user to search for mutations on the basis of their location within the coding sequence (CDS) of genes (e.g. Figure 4 shows mutations present within the first 20% of the CDS). This lets the user study synonymous mutations within a specific region of interest, for example towards the 5'-end of the coding region, within the translation initiation and ramping region.
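The idea of a relative position within the CDS can be sketched in Python as follows. This is a minimal illustration that assumes the relative position is the coding nucleotide number divided by the CDS length; how SynMICdb computes the percentage internally is not stated here, and the function names and the CDS length used in the example are hypothetical.

```python
def relative_cds_position(position: int, cds_length: int) -> float:
    """Return the position as a fraction of the CDS length (0 < f <= 1).

    Assumption: position is the 1-based coding nucleotide number.
    """
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")
    if not 1 <= position <= cds_length:
        raise ValueError("position must lie within the CDS")
    return position / cds_length


def in_cds_window(position: int, cds_length: int,
                  start_fraction: float = 0.0,
                  end_fraction: float = 0.2) -> bool:
    """Check whether a mutation falls in a window of the CDS.

    The default window is the first 20% of the CDS.
    """
    fraction = relative_cds_position(position, cds_length)
    return start_fraction < fraction <= end_fraction


if __name__ == "__main__":
    # Hypothetical CDS of 600 nucleotides.
    print(in_cds_window(36, 600))   # coding nucleotide 36
    print(in_cds_window(300, 600))  # coding nucleotide 300
```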

Search by Region
This option allows the user to search for mutations present within a region defined by genomic coordinates of human genome assembly GRCh38 (note: chromosome 23 = X, 24 = Y and 25 = M). For example, Figure 5 shows the list of mutations present on chromosome 5 in the region 50000-500000.
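The numeric chromosome convention and the region test can be sketched in Python. The mapping 23 = X, 24 = Y and 25 = M is taken from the note above; the function names are hypothetical, and treating the region boundaries as inclusive is an assumption, not a documented property of the search.

```python
# Numeric codes used for the non-autosomal chromosomes (from the note above).
CHROMOSOME_CODES = {23: "X", 24: "Y", 25: "M"}


def chromosome_label(code: int) -> str:
    """Translate a numeric chromosome code into its usual label."""
    if code in CHROMOSOME_CODES:
        return CHROMOSOME_CODES[code]
    if 1 <= code <= 22:
        return str(code)
    raise ValueError(f"unknown chromosome code: {code}")


def overlaps_region(mutation_start: int, mutation_end: int,
                    region_start: int, region_end: int) -> bool:
    """Check whether a mutation lies within or overlaps a region.

    Assumption: both intervals are inclusive on both ends.
    """
    return mutation_start <= region_end and mutation_end >= region_start


if __name__ == "__main__":
    print(chromosome_label(23))
    # Region from the example above: chromosome 5, 50000-500000.
    print(overlaps_region(60000, 60000, 50000, 500000))
```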

Search by Organ
This option allows the user to search for synonymous mutations in cancer on the basis of their site of origin in a hierarchical manner. Nine organ systems are listed (as depicted in Figure 6): Cardiovascular System, Digestive System, Endocrine System, Genitourinary System, Integumentary System, Lymphatic System, Musculoskeletal System, Nervous System and Respiratory System. After selecting the organ system, the user first selects the primary site and, optionally, the histology of interest. The following example depicts a search for synonymous mutations in the "Digestive System" as organ system, with "Large Intestine" as primary site (Figure 7) and "Adenocarcinoma" as histology (Figure 8).

Advanced Search
This search option allows the combination of multiple search parameters and offers additional search criteria including Gene Names, Cancer Gene Census genes, Conservation, Location within CDS, SynMICdb Score, Organ System, Site, and Histology of synonymous mutations. Here, users can also perform batch searches by providing a list of up to 100 genes (Figure 9). Below is an example of a search for synonymous mutations that are >80% conserved and only present in the first 30% of the CDS (Figure 10). The user can limit the output to genes listed as cancer genes in the Cancer Gene Census (CGC) database by clicking the "Limit to CGC genes" option.
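The same kind of combined query can be reproduced offline on downloaded data. The Python sketch below mirrors the example above (conservation above 80%, first 30% of the CDS, optional restriction to CGC genes, gene list of at most 100 entries). The `Record` type, its field names and the sample values are hypothetical; reading ">80% conserved" as a conservation score above 0.8 is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MAX_BATCH_GENES = 100  # batch searches accept up to 100 genes


@dataclass(frozen=True)
class Record:
    gene: str            # gene symbol
    conservation: float  # conservation score between 0 and 1
    cds_fraction: float  # relative position within the CDS (0 to 1)
    is_cgc_gene: bool    # listed in the Cancer Gene Census


def advanced_filter(records: Iterable[Record],
                    genes: Optional[Iterable[str]] = None,
                    min_conservation: float = 0.8,
                    max_cds_fraction: float = 0.3,
                    limit_to_cgc: bool = False) -> List[Record]:
    """Combine several criteria; a record must satisfy all of them."""
    wanted = None
    if genes is not None:
        # Gene matching is case-insensitive, as in the gene search.
        wanted = {gene.upper() for gene in genes}
        if len(wanted) > MAX_BATCH_GENES:
            raise ValueError("a batch may contain at most 100 genes")
    selected = []
    for record in records:
        if wanted is not None and record.gene.upper() not in wanted:
            continue
        if record.conservation <= min_conservation:
            continue
        if record.cds_fraction > max_cds_fraction:
            continue
        if limit_to_cgc and not record.is_cgc_gene:
            continue
        selected.append(record)
    return selected


# Invented placeholder records, for illustration only.
EXAMPLE_RECORDS = [
    Record("GENE_A", 0.95, 0.10, True),
    Record("GENE_B", 0.95, 0.50, True),
    Record("GENE_C", 0.50, 0.10, False),
    Record("GENE_D", 0.90, 0.25, False),
]

if __name__ == "__main__":
    for record in advanced_filter(EXAMPLE_RECORDS, limit_to_cgc=True):
        print(record.gene)
```

Requiring every criterion to hold corresponds to combining the search parameters with a logical AND, which is the reading assumed here.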