Summary

The Cancer Genome Atlas (TCGA) is an excellent resource for interrogating over 30 different cancer types, containing DNA, RNA and protein data from over 11,000 patients. Nevertheless, caution should be used when interpreting results from these studies, with careful consideration of their generalisability to the wider population, especially racial minority populations and the elderly.

The Cancer Genome Atlas

The progress being made in oncology towards true personalised medicine is remarkable. Just a few decades ago, performing whole exome sequencing, RNA sequencing, methylome analyses and broad-coverage proteomic analysis on thousands of tumour samples would have been beyond the dreams of most scientists and clinicians. Through technological advances and backing from the US National Institutes of Health, The Cancer Genome Atlas (TCGA) project was born. TCGA has become a high-quality reference guide to the genetic fingerprint of most cancers, and targetable mutations identified in TCGA have become the focus of clinical exploration. With DNA, RNA and protein analysed from over 11,000 tumours across 33 cancer types, TCGA has gathered an unparalleled amount of data that the global community is still unravelling. However, TCGA will only be useful globally if the catalogued sequences are representative. An article in this issue of BJC explores this question further.1

Generalisability of TCGA

Much of the focus of TCGA has, rightfully, been on genomic alterations across a tumour’s genetic code. Associated clinical variables, such as age, race, gender and stage at diagnosis, are often dismissed as covariates when performing genomic analyses. Yet patients of different ages or races clearly differ in prognosis and tumour biology. Classic examples are the marked differences in the prognosis of thyroid or endometrial cancer by age, and the difference in the prevalence of EGFR mutations between Asian and White patients.2 We must therefore question how well TCGA represents patients with cancer within the USA, let alone the global community. If we pursue medical advances based on genomic sequencing efforts that are not representative of the population, we risk widening many of the already pervasive disparities in oncology.

To investigate this, multiple studies have assessed how representative the TCGA dataset is of the US cancer population, as inferred from the Surveillance, Epidemiology, and End Results (SEER) database. Two such studies were dedicated specifically to disparities in race and age, respectively.3,4 The first, published by our group in 2016, shed light on the low absolute number of racial minorities within TCGA.3 Of the 10 tumour types studied, comprising 5729 samples, 77% (n = 4389) were from White patients, 12% (n = 660) from Black patients, 3% (n = 173) from Asian patients and 3% (n = 149) from Hispanic patients, while less than 0.5% combined were from patients of Native Hawaiian, Pacific Islander, Alaskan Native or American Indian descent. Furthermore, even where the relative racial distribution resembled that of the actual population, sequencing typically fewer than 50 Black patients per tumour type has major implications for the generalisability of results from TCGA. Spratt et al.3 performed a series of power calculations demonstrating that, for many tumour types, TCGA currently lacks the power to detect in Black patients even mutations occurring at a frequency of 10%. Given the limited sample size, a relatively frequent mutation in Black patients could therefore easily remain undetected.
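To make the idea of statistical power concrete, the sketch below computes the power of a one-sided exact binomial test to detect a gene mutated in 10% of tumours when only n patients are sequenced. It is a simplified illustration, not necessarily the method used by Spratt et al.: the background mutation rate `p0`, the significance threshold `alpha` and all function names are hypothetical assumptions chosen for illustration.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); equals 0 when k > n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def critical_count(n: int, p0: float, alpha: float) -> int:
    """Smallest number of mutated tumours c with P(X >= c | p0) <= alpha.

    Terminates because P(X >= n + 1) = 0 <= alpha for any alpha > 0.
    """
    c = 0
    while binom_sf(c, n, p0) > alpha:
        c += 1
    return c


def detection_power(n: int, p_true: float, p0: float, alpha: float) -> float:
    """Probability of reaching the critical count when the true frequency is p_true."""
    return binom_sf(critical_count(n, p0, alpha), n, p_true)


if __name__ == "__main__":
    # Hypothetical assumptions: 2% background mutation rate and a
    # Bonferroni-corrected threshold for testing roughly 20,000 genes.
    P0, ALPHA = 0.02, 0.05 / 20_000
    for n in (25, 50, 100, 250, 500):
        power = detection_power(n, 0.10, P0, ALPHA)
        print(f"n = {n:3d}: power to detect a gene mutated in 10% of tumours = {power:.2f}")
```

Under these illustrative assumptions, power at n = 50 falls far short of the conventional 80% target once the threshold accounts for testing thousands of genes, whereas a few hundred samples raise it well above 80%. The exact figures depend entirely on the assumed background rate and threshold, so the sketch conveys the qualitative point rather than reproducing the published calculations.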

The second study, also by our group, investigated the representation of elderly patients within TCGA.4 It showed that patients aged 80 to 99 years were underrepresented in all 9 cancer types assessed, with a median TCGA underrepresentation of 167% compared with SEER. As in the racial disparities study, no cancer type had enough elderly patients to detect even mutations occurring at a frequency of 10%. This is unlikely to be the fault of TCGA, as elderly patients, especially those over 80 years old, rarely undergo radical surgery, so fewer tissue samples are available for sequencing. However, given the close correlation between age and oncological outcomes in a multitude of tumour types, we are probably missing valuable biological insights by omitting these patients from sequencing efforts. The elderly population is growing rapidly, and understanding the biology of their cancers is increasingly important. In this study, Wahl et al.4 specifically compared gene expression in younger versus older men with prostate cancer and identified multiple differentially expressed genes involved in androgen signalling and DNA repair, further supporting the influence of age on cancer biology.

In the current issue, Wang et al.1 report a well-performed analysis of all cancer types within TCGA. Theirs is the most comprehensive comparison to date of patient characteristics between TCGA and SEER, covering not only race and age but also gender, stage and survival outcomes. They confirm the previous findings and expand upon them, demonstrating that 20 cancer types within TCGA contain significantly younger patients than SEER, 13 have disparities in relative racial distribution, and 25 have different stage distributions. Furthermore, the mean 1-year survival estimates for 27 of the 33 tumour types within TCGA were higher than the corresponding SEER estimates. Together, these studies clearly demonstrate important differences between the patients contained within TCGA and the general cancer population.

Beyond TCGA

The genomic data gathered to date come almost exclusively from individuals of European ancestry, with an estimated 10:1 ratio of genome-wide association studies (GWAS) in European-ancestry populations versus all other groups combined. For example, Need et al. highlighted this point by showing that over 1.5 million patients of European ancestry had been studied in GWAS cohorts, compared with only ~1000 Hispanic patients (0.07%).5 This is due in part to Europe and the United States providing the large majority of genomic samples and data, and in part to the low proportion of minorities taking part in genomic sequencing.

This matters because somatic tumour profiling is increasingly performed without matched germline testing. Garofalo et al. demonstrated that, in analyses without matched germline testing, non-white patients had a higher rate of false-positive results.6 This excess of false-positive variants in non-white patients highlights the limitations of relying on genomic reference databases, in which these minorities are often poorly represented. Disparities in genomic sequencing can therefore affect real-world clinical results from tumour profiling.
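The mechanism behind these false positives can be sketched in a few lines. Without a matched normal sample, a common heuristic is to discard variants that a population reference database lists above some allele-frequency cut-off and to call the remainder somatic. In the toy example below, the variant names, allele frequencies and the 1% cut-off are all hypothetical assumptions; the point is only that an inherited variant common in an under-represented group, but absent from the database, survives the filter and is reported as somatic.

```python
from typing import Dict, List, NamedTuple, Optional


class Variant(NamedTuple):
    name: str             # hypothetical variant identifier
    truly_germline: bool  # ground truth, known here only because the data are invented


def tumour_only_calls(variants: List[Variant],
                      reference_af: Dict[str, float],
                      max_population_af: float = 0.01) -> List[str]:
    """Call a variant 'somatic' unless the reference database lists it above the cut-off.

    Variants missing from the database are treated as novel, i.e. somatic.
    """
    calls = []
    for v in variants:
        af: Optional[float] = reference_af.get(v.name)
        if af is None or af <= max_population_af:
            calls.append(v.name)
    return calls


def false_positives(variants: List[Variant], calls: List[str]) -> List[str]:
    """Germline variants wrongly reported as somatic."""
    germline = {v.name for v in variants if v.truly_germline}
    return [name for name in calls if name in germline]


if __name__ == "__main__":
    # Invented reference database, assumed to be built mostly from
    # European-ancestry samples: only variant_A is catalogued.
    reference_af = {"variant_A": 0.30}

    # One hypothetical tumour with two inherited variants and one true somatic mutation.
    # variant_B is assumed to be common in an under-represented group, so it is
    # missing from the reference database.
    tumour = [
        Variant("variant_A", truly_germline=True),
        Variant("variant_B", truly_germline=True),
        Variant("variant_C", truly_germline=False),
    ]
    calls = tumour_only_calls(tumour, reference_af)
    print("called somatic:", calls)                            # variant_B and variant_C
    print("false positives:", false_positives(tumour, calls))  # variant_B only
```

In this toy setting, cataloguing variant_B above the cut-off, or sequencing matched normal tissue, would remove the false positive, which is why reference databases that under-represent minorities bias tumour-only results against those patients.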

Implications

The implications and downstream impact of these findings are unclear. It is plausible that, in an effort to further develop precision medicine through advanced technology and big data, the scientific community has inadvertently widened an already existing disparity for racial minorities and the elderly. Minorities and the elderly have similarly been under-enrolled in clinical trials, where dedicated efforts have improved participation.7 This is critical, because at the American Society of Clinical Oncology (ASCO) 2018 Annual Meeting this year, two studies comparing Black and White men with advanced prostate cancer enrolled in various clinical trials showed equivalent, if not potentially superior, outcomes for the Black men.8,9 It is possible that further investigation of the populations under-represented within TCGA could likewise generate additional clinically meaningful findings.

Going forward, dedicated initiatives will be required to increase the enrolment of racial minorities and the elderly in genomic sequencing efforts, especially for cancers with known disparities in clinical outcomes by race. The ability to perform next-generation sequencing on limited amounts of biopsy tissue should help to capture these under-represented populations more broadly. New discoveries often lead to more questions and challenges, and we must unite to ensure that their benefits reach everyone.