Introduction

Proteins are translated from mRNAs, which are transcribed from genomic DNA. It has long been assumed that DNA sequences and corresponding RNA transcripts are almost identical; a recent discovery, however, revealed widespread differences between them1. Cheung and coworkers sequenced and compared DNA and RNA sequences from B cells of 27 human individuals and found more than 10,000 sites that showed RNA-DNA differences (RDD). Many of these RDDs were observed in other tissues, including primary skin cells and the brain, from other unrelated individuals. Mass spectrometry showed that sequences of many proteins corresponded to the RNA variants, rather than genomic DNA, indicating that the RNA forms were translated into proteins1.

This finding represents a largely unexplored aspect of human genome variation. Traditionally, genetic studies have focused on DNA sequence polymorphisms, but because of the presence of RDDs, future studies will likely also need to include RNA variants. It has been speculated that RDDs can affect disease susceptibility and manifestations1; however, almost nothing is known about how RDDs are related to disease.

Here, we examined whether RDD distributions differ between proto-oncogenes and tumor suppressor genes. A proto-oncogene, which usually encodes proteins that regulate cell growth and differentiation, is a gene that, due to mutations, can become an oncogene and induce tumors2. In contrast, a tumor suppressor gene, or anti-oncogene, which usually encodes proteins that repress the cell cycle or promote apoptosis, is a gene that protects against tumor induction3. We found that proto-oncogenes have significantly fewer RDDs than tumor suppressor genes, and that this difference is especially pronounced for RDDs that lead to non-synonymous amino acid changes.

Results

To examine whether RDDs occur at different frequencies in tumor suppressor genes and proto-oncogenes, we compared the RDD distributions between these 2 kinds of genes. Maizels and coworkers compiled a database of tumor suppressor genes and proto-oncogenes by extensively searching the Online Mendelian Inheritance in Man (OMIM) database as a primary source, followed by confirmation of gene classification based on the published literature4.

A gene can have RDDs in the 5′ untranslated region (UTR), the 3′ UTR and coding exons, and those in coding exons can be either synonymous or non-synonymous. The database contained 55 tumor suppressor genes and 95 proto-oncogenes. We calculated the number of RDDs per gene for the 2 classes of genes. Tumor suppressor genes and proto-oncogenes had 27 and 14 RDDs, respectively, in coding exons. The number of RDDs per gene was more than 3 times higher in the former than in the latter (0.491 vs. 0.147) (Fig. 1). A chi-square test showed that the difference was statistically significant (P < 0.01). Interestingly, the numbers of RDDs in the 3′ UTR and 5′ UTR showed no significant differences (Fig. 1). Because coding regions usually have a more dominant role than the 3′ UTR and 5′ UTR, this result suggests that the difference in RDD number between tumor suppressor genes and proto-oncogenes is biologically meaningful.

Figure 1

RNA-DNA differences (RDDs) are rarer in proto-oncogenes than in tumor suppressor genes.

The number of coding-region RDDs is significantly lower in proto-oncogenes than in tumor suppressor genes, while RDDs at 3′ UTR and 5′ UTR show no significant difference.

If RDDs are indeed related to the biological functions of the genes, we would expect the difference between the 2 kinds of genes to be more pronounced for RDDs that lead to non-synonymous amino acid changes than for synonymous RDDs. The numbers of non-synonymous RDDs for tumor suppressor genes and proto-oncogenes were 20 and 9, respectively (P < 0.01). The number of non-synonymous RDDs per gene in tumor suppressor genes was about 4 times that in proto-oncogenes (0.364 vs. 0.095) (Fig. 2). Therefore, these data are consistent with the notion that RDDs are related to gene functions.
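The per-gene figures quoted above follow directly from the reported counts. As a minimal sketch (the dictionary and function names are illustrative and not part of the original analysis), the arithmetic can be reproduced in Python:

```python
# Counts as reported in the text; all names here are illustrative only.
GENES = {"tumor_suppressor": 55, "proto_oncogene": 95}
RDDS = {
    "coding": {"tumor_suppressor": 27, "proto_oncogene": 14},
    "non_synonymous": {"tumor_suppressor": 20, "proto_oncogene": 9},
}


def rdds_per_gene(category):
    """Return the number of RDDs per gene for each gene class."""
    return {cls: RDDS[category][cls] / GENES[cls] for cls in GENES}


def fold_difference(category):
    """Ratio of the tumor suppressor rate to the proto-oncogene rate."""
    rates = rdds_per_gene(category)
    return rates["tumor_suppressor"] / rates["proto_oncogene"]


for category in RDDS:
    rates = rdds_per_gene(category)
    print(
        category,
        round(rates["tumor_suppressor"], 3),
        round(rates["proto_oncogene"], 3),
        round(fold_difference(category), 2),
    )
```

The ratios computed this way are roughly 3.3 for coding RDDs and 3.8 for non-synonymous RDDs, in line with the "more than 3 times" and "about 4 times" statements in the text.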

Figure 2

Significantly lower number of non-synonymous RDDs in proto-oncogenes than in tumor suppressor genes.

Each gene can have one or more RDDs. In addition to the number of RDDs per gene, we therefore also calculated the number of genes that contain RDDs, and obtained consistent results. The numbers of RDD-containing genes were 36 (65.45%) and 24 (25.26%) among tumor suppressor genes and proto-oncogenes, respectively (P < 0.0001). The numbers of genes containing non-synonymous RDDs were 15 (27.27%) and 7 (7.37%), respectively (P = 0.0015), and those containing synonymous RDDs were 6 (10.91%) and 5 (5.26%), respectively (Fig. 3). Therefore, the proportion of genes containing non-synonymous RDDs was still about 4 times as high in tumor suppressor genes as in proto-oncogenes.
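The gene-level comparison can be framed as a 2×2 contingency table (genes with vs. without RDDs, in each gene class). The text does not state which test or which exact table produced each reported P value, so the following is only a sketch under that assumption. It implements an uncorrected Pearson chi-square test and a two-sided Fisher exact test with the Python standard library; the function names are illustrative, and the P values it returns need not match the reported ones digit for digit.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test: sum the probabilities of all tables
    (with the same margins) that are no more likely than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x "hits" in the first row.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: tumor suppressor genes, proto-oncogenes.
# Columns: genes with the feature, genes without it (counts from the text).
any_rdd = (36, 55 - 36, 24, 95 - 24)
non_syn = (15, 55 - 15, 7, 95 - 7)

for label, table in (("any RDD", any_rdd), ("non-synonymous RDD", non_syn)):
    chi2, p_chi2 = chi_square_2x2(*table)
    print(label, round(chi2, 2), p_chi2, fisher_exact_2x2(*table))
```

Both tests fall below the 0.01 threshold used in the paper for these two tables; small numerical differences from the reported values would be expected if the original analysis applied a continuity correction or a different test.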

Figure 3

Significantly lower percentage of RDD-containing genes in proto-oncogenes than in tumor suppressor genes.

To facilitate research on RDDs, we created a database of RNA-DNA differences (DRDD). The database contains detailed information about RDDs, such as the RDD location, the base change involved, the amino acid change involved (if any), and the sequences and names of RDD-containing genes. The information is stored and managed with an open-source database management system, MySQL, which allows rapid data retrieval. Users can browse and search RDD records and can also run BLAST searches of query genes against the database. The database will be periodically updated to incorporate newly discovered RDDs, e.g., those in reference 5. DRDD can be accessed from the website: http://tubic.tju.edu.cn/drdd.

Discussion

Some mechanisms, such as transcriptional errors6 and RNA editing7,8, are known to cause exceptions to the complete fidelity of transcription from genomic DNA to mRNA. However, transcription errors are very uncommon because of proofreading and repair mechanisms9, and RNA editing mainly involves A-to-G transitions8. The RDDs identified in reference 1 included a large number (on the order of thousands) of RDD events across all 12 possible categories, that is, A to C/G/T, C to A/G/T, G to A/C/T and T to A/C/G. Therefore, these RDDs suggest unknown mechanisms that increase human genetic variation and diversify the human proteome, because many RNA variants are translated into proteins that were identified by mass spectrometry1.

Our results show, for the first time, that RDDs are much rarer in proto-oncogenes than in tumor suppressor genes. The gene database was compiled in 2006 (ref. 4); therefore, the result still needs to be validated by other studies, ideally with larger and more up-to-date datasets. Nevertheless, because the difference in RDD numbers between tumor suppressor genes and proto-oncogenes is relatively large (about 4 times as many non-synonymous RDDs per gene in the former as in the latter), the conclusion is not likely to change.

Our results indicate that proto-oncogenes are much less tolerant of RDDs than tumor suppressor genes. One possible mechanism is that, unlike proto-oncogenes, tumor suppressor genes usually follow the two-hit hypothesis10, which suggests that both alleles of a particular gene must be affected to cause a tumor. If only one allele is affected, the other copy of the gene can still function to protect the cell. In contrast, for oncogenes, mutations in one allele can lead to a tumor. Therefore, the requirement that both alleles be affected in tumor suppressor genes seems to have a ‘buffer’ effect that allows these genes to tolerate more RDDs than proto-oncogenes.

In summary, RDDs, a newly discovered phenomenon, represent a largely unexplored area of human genome variation. Although it has been speculated that RDDs are involved in disease susceptibility and manifestations, no evidence has yet been found relating RDDs to disease. We show here that RDDs are rarer in proto-oncogenes than in tumor suppressor genes; the number of RDDs in coding exons, but not in the 3′ UTR and 5′ UTR, is significantly lower in the former than in the latter, and this trend is especially pronounced for RDDs that cause non-synonymous amino acid changes. This result suggests that proto-oncogenes are less tolerant of RDDs than tumor suppressor genes. A potential mechanism is that the requirement for both alleles of a tumor suppressor gene to be affected to cause a tumor ‘buffers’ these genes, allowing them to tolerate more RDDs than proto-oncogenes.

During proofreading, we noticed a recent publication11 which suggested that, rather than resulting from RNA editing events, these RDDs can be the result of accurate transcription from paralogous genes, making the issue of widespread human RDDs highly controversial. Therefore, the prevalence of human RDDs reported in reference 1 needs to be further confirmed by more studies in more tissues and with more disease conditions. The observation made by Schrider et al.11 appears to explain the majority of the RDDs observed in reference 1. In that case, an alternative explanation for the RDD difference between proto-oncogenes and tumor suppressor genes is that it reflects a different distribution of paralogous genes between the 2 gene classes, rather than RNA editing mechanisms. This possibility, however, needs to be addressed by future studies.

Methods

The 10,210 RDDs, which reside in 4,741 known genes in the human genome, were provided in reference 1. A database of tumor suppressor genes and proto-oncogenes was used to compare RDDs between the 2 classes of genes. The database, which was compiled by extensively searching the Online Mendelian Inheritance in Man (OMIM) database4, contained 55 tumor suppressor genes and 95 proto-oncogenes. HGNC symbols, which are assigned by the HUGO Gene Nomenclature Committee (HGNC) as unique gene symbols and names, were used to link the 4,741 known genes containing RDDs with the tumor suppressor genes and proto-oncogenes. Based on this information, a program was written in C++ to search for the tumor suppressor genes and proto-oncogenes containing RDDs. Detailed information about the tumor suppressor genes and proto-oncogenes containing non-synonymous RDDs is shown in Table 1. Chi-square or Fisher exact tests were used to compare the number of RDDs between the 2 classes of genes, and P values less than 0.01 were considered statistically significant.
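The linking step described above amounts to a lookup of each RDD's gene symbol in the classified gene list. The original C++ program is not shown in the paper, so the following Python sketch only illustrates the idea. The record layout (symbol, region, effect), the function name and the placeholder gene symbols are all assumptions made for this example and do not represent real data.

```python
from collections import Counter


def count_rdds_by_class(rdd_records, gene_classes):
    """Tally RDDs and RDD-containing genes per gene class.

    rdd_records: iterable of (hgnc_symbol, region, effect) tuples (assumed layout).
    gene_classes: dict mapping an HGNC symbol to its class label.
    Returns (Counter of RDDs per class, dict of class -> set of gene symbols).
    """
    rdd_counts = Counter()
    genes_hit = {}
    for symbol, region, effect in rdd_records:
        cls = gene_classes.get(symbol)
        if cls is None:
            continue  # gene is in neither class; ignore this RDD
        rdd_counts[cls] += 1
        genes_hit.setdefault(cls, set()).add(symbol)
    return rdd_counts, genes_hit


# Placeholder symbols and records, invented purely to exercise the function.
example_classes = {
    "GENE_A": "tumor_suppressor",
    "GENE_B": "proto_oncogene",
    "GENE_C": "proto_oncogene",
}
example_records = [
    ("GENE_A", "coding", "non-synonymous"),
    ("GENE_A", "3'UTR", None),
    ("GENE_B", "coding", "synonymous"),
    ("GENE_X", "coding", "synonymous"),  # not in either class
]

counts, hits = count_rdds_by_class(example_records, example_classes)
print(dict(counts))
print({cls: sorted(genes) for cls, genes in hits.items()})
```

Keeping both tallies separate matters here because the Results report two different quantities: the number of RDDs per gene and the number of genes that contain at least one RDD.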

Table 1 Non-synonymous RNA-DNA differences in tumor suppressor genes and proto-oncogenes a,b