Characteristics of The Cancer Genome Atlas cases relative to U.S. general population cancer cases

Wang, Xiaoyan; Steensma, Joseph T.; Bailey, Matthew H.; Feng, Qianxi; Padda, Hannah; Johnson, Kimberly J.

doi:10.1038/s41416-018-0140-8

Article
Open access
Published: 21 August 2018

Epidemiology

Xiaoyan Wang¹, Joseph T. Steensma¹, Matthew H. Bailey²,³, Qianxi Feng¹, Hannah Padda¹ & Kimberly J. Johnson¹,⁴

British Journal of Cancer volume 119, pages 885–892 (2018)

Abstract

Background

Despite anecdotal reports of differences in clinical and demographic characteristics of The Cancer Genome Atlas (TCGA) relative to general population cancer cases, differences have not been systematically evaluated.

Methods

Data from 11,160 cases with 33 cancer types were ascertained from the TCGA data portal. Corresponding data from the Surveillance, Epidemiology, and End Results (SEER) 18 and North American Association of Central Cancer Registries databases were obtained. Differences in characteristics were compared using Student’s t, Chi-square, and Fisher’s exact tests. Differences in mean survival months were assessed using restricted mean survival time analysis and a generalised linear model.

Results

TCGA cases were 3.9 years (95% CI 1.7–6.2) younger on average than SEER cases, with a significantly younger mean age for 20/33 cancer types. Although most cancer types had a similar sex distribution, race and stage at diagnosis distributions were disproportional for 13/18 and 25/26 assessed cancer types, respectively. Using 12 months as an end point, the observed mean survival months were longer for 27 of 33 TCGA cancer types.

Conclusions

Differences exist in the characteristics of TCGA vs. general population cancer cases. Our study highlights population subgroups where increased sample collection is warranted to increase the applicability of cancer genomic research results to all individuals.

Introduction

In recent years, progress in genome sequencing technologies and bioinformatics has provided enormous gains in understanding of the molecular aberrations associated with the development of various cancers. The emergence of publicly available cancer genomic datasets, including The Cancer Genome Atlas (TCGA), facilitates the comprehensive understanding of the molecular pathogenesis of cancer and is allowing for the development of new strategies to improve cancer diagnosis, therapy, and prevention. By analysing these publicly available genomic data, many novel disease-associated genes have been uncovered.^1,2

TCGA was formed in 2005 when the U.S. National Cancer Institute and the National Human Genome Research Institute joined together to launch a project to comprehensively map cancer genomic changes. To date, more than 11,000 individuals with 33 cancer types have been included in the cohort.^3,4 These data have thus far contributed to >2000 studies of various cancers indexed in PubMed.

The cohort composition for each cancer type is an important consideration since the results generated from these cases may be used to make inferences about the respective cancer type among the general population. Prior studies have shown that race, which is often used as a proxy for ancestry and social exposures, is related to the pathogenesis of cancer and different genetic backgrounds in common tumour types may influence clinical outcome and response to therapy.^5,6,7 Evidence has shown that somatic mutation frequency differs by race in various cancer types,^8,9,10 implying that factors associated with race can impact the somatic mutation landscape. Other evidence also highlights the implications of sex and age dissimilarities in genetic susceptibility to cancer.^11,12,13 For these reasons and because TCGA data was assembled mainly from an eligible convenience sample of cancer patients with strict sample selection criteria,¹⁴ it is important to understand similarities and differences in the characteristics of individuals who have contributed samples to TCGA relative to those of the general population of individuals diagnosed with cancer. A previous study of TCGA cases found that race/ethnicity disparities exist relative to the U.S. general population for ten cancer types examined comprising 5729 cases.¹⁵ Another study that analysed nine different cancer types in TCGA indicated a dissimilar age distribution in comparison with corresponding cases in the Surveillance, Epidemiology, and End Results (SEER) database.¹⁶ However, differences in demographic and clinical characteristics beyond race/ethnicity and age between members of the TCGA and the general U.S. population of cancer cases have not been systematically characterised.

In this study, we extend the results from previous studies by comparing demographic and clinical characteristics (age at diagnosis, sex, race, stage at diagnosis, and survival months) between TCGA cases with 33 cancer types and cases in two population-based databases: (1) the SEER 18 database that currently covers ~28% of the U.S. population,¹⁷ and (2) the U.S. combined registries of North American Association of Central Cancer Registries (NAACCR) that covers cancer registrations in all 50 states and the District of Columbia.¹⁸

Methods

Population

Three separate data sources were used in this study: TCGA,¹⁹ the SEER 18 database,¹⁷ and the NAACCR public use dataset.²⁰ Data from individuals diagnosed with 33 cancer types were extracted from TCGA. No duplicate cases were found across various cancer types as determined by matching TCGA case IDs. Individuals with corresponding cancer types in SEER were identified using the third edition of the International Classification of Diseases for Oncology (ICD-O-3) by primary site and histology/behavior (Supplementary Table 1). To compare TCGA cases to a contemporary population of individuals diagnosed with cancer, only cases diagnosed with a primary malignancy from 2010 to 2013 in SEER were included. Since SEER intentionally oversamples U.S. minority populations,²¹ we used data from NAACCR to compare race distributions. This public use dataset, published in the annual Cancer in North America (CiNA) Volumes, covers cancer registrations in all 50 states and the District of Columbia, approaching 100% coverage of the U.S. population in the most recent time period.²² The most current five years (2009–2013) of data for U.S. and Canadian individuals diagnosed with cancer were available in this dataset. In this study, only U.S. cancer cases with available race data were included. The corresponding cancer types in NAACCR were defined using the cancer sites as denoted in Supplementary Table 1.

Variables

XML files from TCGA containing data on demographics, cancer variables, and follow-up status were downloaded from the National Cancer Institute Genomic Data Commons data portal¹⁹ on 22 December 2016. Python 3.6.0 was used to parse these files and extract the variables. Demographic data including diagnosis age, sex, and race were extracted from the “clin_shared:age_at_initial_pathologic_diagnosis”, “shared:gender”, and “clin_shared:race” fields. Race was categorised as White, Black (African American), and Other (Asian, American Indian, or Alaska Native). Ethnicity was not included in this analysis due to the large proportion (24%) of cases with missing data for this field. Clinical information was extracted from the “shared_stage:clinical_stage”, “shared_stage:pathological_stage”, “shared_last_contact_days_to”, and “shared_death_days_to” fields. Stage was defined according to American Joint Committee on Cancer (AJCC) staging that includes categories I, II, III, and IV. Survival months were calculated by dividing the number of days by the days in a month (365.24/12),²³ using the “shared_last_contact_days_to” field for cases who were still alive and the “shared_death_days_to” field for cases who died during the follow-up period. Similarly, the demographic and clinical data of the 33 corresponding cancers were extracted from the SEER 18 database using SEER*Stat 8.3.4. Diagnosis age was based on the SEER variable “Age at diagnosis”. Race classifications were based on the “Race recode (W, B, AI, API)” variable and defined the same as above. Stage at diagnosis was defined using the “Derived AJCC Stage Group (7th edition 2010+)” variable. Survival months were defined using the “Survival months” variable. In NAACCR, the race categories were based on the “Race (Includes Hispanic)” variable and defined the same as for TCGA.
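The extraction and survival-month conversion described above can be illustrated with a minimal Python sketch. It is not the authors' script: the toy XML document, its namespace URIs, the element name `last_contact_days_to`, and the helper names `local_name`, `extract_fields`, and `survival_months` are all assumptions made for illustration. Only the 365.24/12 divisor comes from the text.

```python
import xml.etree.ElementTree as ET

# Days-per-month divisor stated in the Variables section.
DAYS_PER_MONTH = 365.24 / 12

# Toy clinical record. The namespace URIs and the element names are
# placeholders for illustration, not the real TCGA schema.
TOY_XML = """<patient xmlns:shared="http://example.org/shared"
         xmlns:clin_shared="http://example.org/clin_shared">
  <clin_shared:age_at_initial_pathologic_diagnosis>61</clin_shared:age_at_initial_pathologic_diagnosis>
  <shared:gender>FEMALE</shared:gender>
  <clin_shared:race>WHITE</clin_shared:race>
  <shared:last_contact_days_to>730</shared:last_contact_days_to>
</patient>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(xml_text, wanted):
    """Return {local tag name: text} for every wanted, non-empty element."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in wanted and elem.text and elem.text.strip():
            found[name] = elem.text.strip()
    return found


def survival_months(days):
    """Convert follow-up time in days to months."""
    return days / DAYS_PER_MONTH


record = extract_fields(TOY_XML, {"gender", "race", "last_contact_days_to"})
months = survival_months(float(record["last_contact_days_to"]))
```

Matching on the local tag name avoids hard-coding namespace URIs, which vary between schema versions; a production parser would more likely pass an explicit namespace map.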

Statistical analysis

Stata version 14 was used for all analyses. Student’s t-test and Cohen’s d, a measure of effect size, were used to identify and quantify the statistical differences and effect sizes in diagnosis age. Cohen’s d is calculated as the difference between two means divided by the pooled standard deviation.²⁴ By convention, Cohen’s d ≥ 0.3 indicates at least a moderate effect size. Ordinary least squares regression was used to estimate the overall mean difference in diagnosis age between TCGA and SEER cases with adjustment for cancer types. Chi-square and Fisher’s exact tests were used to identify proportion differences in sex, race, and stage. Additionally, for race and stage comparisons, adjusted residuals were used to determine categories with the largest difference relative to sample size. An adjusted residual ≥ 2.0 indicates that there was a significantly greater proportion of a particular race or stage category among TCGA cases than in the comparison population (i.e., NAACCR or SEER), while an adjusted residual ≤ −2.0 indicates a significantly lower proportion.

We also quantified the mean all-cause survival months using restricted mean survival time (RMST) analysis²⁵ with 12 months as the end point, to ensure that all included TCGA cases had the same window of observation. Since all TCGA cases were diagnosed prior to 2014, all had at least 12 months of follow-up time except for the cases who died during this period. For cases with over 12 survival months, the survival months were truncated at 12. The RMST approach is valid for any distribution of time to event.^25,26,27,28,29 The between-group difference in mean survival with corresponding 95% confidence intervals (CIs) was estimated at the 12-month horizon, with adjustment for diagnosis age, sex, race, and stage (where available for a specific cancer type), followed by a generalised linear model with robust standard errors. All statistical tests were two-tailed, and the alpha level for all tests was 0.05.
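The effect-size definition given above (difference between two means divided by the pooled standard deviation) can be written as a short Python sketch. The analyses in this study were run in Stata; the function name `cohens_d` and the summary statistics in the example are illustrative assumptions, not study data.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from group summary statistics.

    The pooled standard deviation weights each group's variance by its
    degrees of freedom (n - 1).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Made-up example: two groups of 50 with equal SDs and means one unit apart.
d_example = cohens_d(10.0, 2.0, 50, 9.0, 2.0, 50)
```

A negative value simply means the first group has the lower mean; the text compares the absolute value of d with the 0.3 threshold.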

Results

Of 11,160 TCGA cases with 33 cancer types diagnosed between 1978 and 2013, 1097 cases were diagnosed with breast invasive carcinoma (BRCA) followed by glioblastoma multiforme (GBM, n = 596), ovarian serous cystadenocarcinoma (OV, n = 587), uterine corpus endometrial carcinoma (UCEC, n = 548), kidney renal clear cell carcinoma (KIRC, n = 537), head and neck squamous cell carcinoma (HNSC, n = 528), lung adenocarcinoma (LUAD, n = 522), and brain lower grade glioma (LGG, n = 515). Six cancers including adrenocortical carcinoma (ACC), cholangiocarcinoma (CHOL), lymphoid neoplasm diffuse large B-cell lymphoma (DLBC), mesothelioma (MESO), uterine carcinosarcoma (UCS), and uveal melanoma (UVM) had < 100 cases each. Among the corresponding 33 diagnoses in SEER, the number of cases ranged from 164 (UCS) to 203,828 (BRCA). In NAACCR, the number of cases ranged from 15,705 (MESO) to 1,085,443 (BRCA). Demographic and clinical characteristics of TCGA and SEER cases are shown in Table 1. The race distribution of TCGA and NAACCR cases is shown in Table 2.

Table 1 Demographic and clinical characteristics of TCGA and SEER cases

Table 2 Race distribution of TCGA and NAACCR cases

Age at diagnosis

Overall, the mean diagnosis age of TCGA cases was 3.9 years younger (95% CI: 1.7–6.2, P < 0.001) than SEER cases after adjusting for cancer types (data not shown). The mean diagnosis age of TCGA cancer cases was not significantly different from that of SEER cases for a minority of cancers (CHOL, colon adenocarcinoma (COAD), KIRC, kidney renal papillary cell carcinoma (KIRP), pheochromocytoma and paraganglioma (PCPG), sarcoma (SARC), stomach adenocarcinoma (STAD), thymoma (THYM), and UVM). In contrast, for most cancer types (24/33), there were statistically significant differences in the mean diagnosis age. Among these, the majority (20/24) had a significantly younger mean diagnosis age in TCGA. The exceptions were LGG, rectum adenocarcinoma (READ), UCEC, and UCS, for which TCGA cases had a significantly older mean diagnosis age than SEER cases (Fig. 1). The difference in the mean diagnosis age was especially pronounced for DLBC (8.4 ± 2.4 years younger in TCGA), oesophageal carcinoma (ESCA, 3.8 ± 0.9 years younger), kidney chromophobe (KICH, 7.4 ± 1.3 years younger), LGG (7.5 ± 0.9 years older), liver hepatocellular carcinoma (LIHC, 3.6 ± 0.7 years younger), MESO (8.4 ± 1.3 years younger), prostate adenocarcinoma (PRAD, 4.7 ± 0.4 years younger), and UCS (4.3 ± 1.5 years older) cases, where the absolute effect size for the diagnosis age difference (Cohen’s d) was ≥ 0.3 (Table 3).

Table 3 Differences of demographic and clinical characteristics distribution among TCGA, SEER, and NAACCR cases^a

Sex

For most cancer types (22/27), the observed sex distribution for TCGA cases was similar to SEER cases. Lung squamous cell carcinoma (LUSC), skin cutaneous melanoma (SKCM), and thyroid carcinoma (THCA) had a significantly higher proportion of male cases (74.0% vs. 62.4%, 61.7% vs. 56.6%, and 26.8% vs. 22.8%, respectively), while LIHC and SARC cases had an excess of female cases (32.4% vs. 22.6%, 54.4% vs. 46.7%) in TCGA vs. SEER (Tables 1 and 3).

Race

Overall, compared to the NAACCR cases, individuals whose reported race was Other (Asian, American Indian, or Alaska Native) were over-represented in TCGA. The observed race distribution was disproportional for 13/18 cancer types (Fig. 2a). Among the 13 cancers, eight (bladder urothelial carcinoma (BLCA), BRCA, ESCA, LIHC, pancreatic adenocarcinoma (PAAD), SKCM, STAD, and THCA) had a significantly higher percentage (adjusted residuals ≥ 2) of individuals with reported Other race (Asian, American Indian, or Alaska Native) and eight (cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), ESCA, LIHC, OV, PAAD, PRAD, SARC, and STAD) had a lower percentage (adjusted residuals ≤ −2) of reported Black race in TCGA vs. NAACCR (Table 3).
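The adjusted residuals used to flag over- and under-represented categories can be sketched in Python as follows. This uses the standard definition for a contingency table, (observed − expected) divided by the square root of expected × (1 − row share) × (1 − column share). The function name `adjusted_residuals` and the counts in the example are hypothetical and are not taken from Tables 2 or 3.

```python
import math


def adjusted_residuals(table):
    """Adjusted (standardised) residuals for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per data
    source and one column per race or stage category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = (
                expected
                * (1 - row_totals[i] / n)
                * (1 - col_totals[j] / n)
            )
            out_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


# Hypothetical 2 x 2 example: rows are two data sources, columns two categories.
example = adjusted_residuals([[30, 70], [20, 80]])
```

In this made-up table no cell reaches the ±2.0 threshold used in the text, so neither category would be flagged as disproportionately represented.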

Stage at diagnosis

For the 26 TCGA cancer types with stage information, evidence for stage dissimilarities was observed for most cancer types (25/26) (Fig. 2b). Specifically, compared to SEER cases, 16 cancers had a significantly lower proportion of stage I in the TCGA cohort, 19 cancers had a significantly higher proportion of stage II, 12 cancers had a significantly higher proportion of stage III, and 14 cancers had a significantly lower proportion of stage IV (Table 3).

Survival months

Using 12 months as an end point, the adjusted mean all-cause survival months were significantly longer for cases with 27/33 cancer types in TCGA relative to SEER. For the remaining six cancer types (CESC, KICH, KIRC, OV, testicular germ cell tumours (TGCT), and UVM), no statistically significant difference was found (Fig. 3). It is noteworthy that for CHOL and SARC, TCGA cases lived an average of over 2 months (2.35 and 2.47 months, respectively) longer than SEER cases after 12 months of follow-up (Table 3).
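For readers unfamiliar with RMST, the quantity compared here is the area under the survival curve up to the 12-month horizon. The Python sketch below computes an unadjusted RMST from a Kaplan–Meier estimate. It is a simplified illustration, not the adjusted model fitted in this study, and the function name `rmst` and the example follow-up times are assumptions.

```python
def rmst(times, events, tau=12.0):
    """Unadjusted restricted mean survival time up to `tau`.

    `times` are follow-up times in months; `events` are 1 for death and
    0 for censoring. The result is the area under the Kaplan-Meier
    step function between 0 and tau.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current Kaplan-Meier survival probability
    area = 0.0      # accumulated area under the curve
    prev_t = 0.0
    i = 0
    # Times at or beyond tau do not change the area before tau.
    while i < len(data) and data[i][0] < tau:
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        area += surv * (t - prev_t)
        if deaths:
            surv *= 1 - deaths / n_at_risk
        n_at_risk -= removed
        prev_t = t
    area += surv * (tau - prev_t)
    return area


# Made-up follow-up data: two deaths before 12 months, two cases alive at 12+.
example_times = [3.0, 6.0, 12.0, 20.0]
example_events = [1, 1, 0, 0]
example_rmst = rmst(example_times, example_events)
```

When no case is censored before the horizon, as the Methods describe for TCGA, this area equals the simple mean of the survival months truncated at 12.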

Discussion

In this study, we observed that despite an approximately equal sex distribution for most cancer types included in TCGA vs. SEER data, differences exist in mean diagnosis age, race, stage at diagnosis distributions, and mean survival months. Generally, our analysis indicates that TCGA cases are younger and survive longer than those from SEER.

A previous study comparing the characteristics of TCGA cases to the U.S. general population was conducted by Spratt et al.¹⁵ The authors reported that TCGA cases with 10 cancer types compared to the U.S. population were 77% vs. 64% White, 12% vs. 12% Black, 3% vs. 5% Asian, 3% vs. 16% Hispanic, and 0.5% vs. 1–2% Native Hawaiian, Pacific Islander, Alaskan Native, or American Indian descent. White cases were over-represented and Asian and Hispanic cases were under-represented compared to the general population. However, the Spratt et al. study used the general U.S. population as the comparator, which is different from the composition of U.S. cancer patients who are one of the prime beneficiaries of TCGA results.

Another more recent study compared the distribution of TCGA cases by age to SEER cases for nine cancer types.¹⁶ Similar to our study, the age distributions for cases in the SEER database were skewed older than those in the TCGA data for nearly all cancer types examined. Specifically, TCGA cases < 70 years were well represented across most tumour types, but cases aged 80–99 years were under-represented for all cancers. These data are also consistent with those from clinical trials.³⁰ TCGA specimens are primarily from U.S. academic institutions,^3,15 suggesting that younger patients are more likely to be seen at the academic centres where the samples were acquired and to participate in research. A systematic review on the recruitment of older cancer patients to clinical trials reported that age is a significant barrier to recruitment.³¹ For example, Kemeny et al. found that 68% of younger stage II breast cancer patients were offered a trial vs. 34% of the older patients (P < 0.001).³² It is presumed that older patients may need extra time and resources to access available clinical trials or that they are often excluded because they do not meet eligibility criteria.³¹ Our results emphasise the importance of increasing access of older cancer patients to cancer genomic projects to increase the applicability of the findings to these patients.

Racial disparities in cancer incidence and survival have been well documented among various cancers. Although socioeconomic and cultural factors that differ between racial groups can explain some of the disparities, recent progress in cancer genomic sequencing allows for a molecular understanding.^33,34 Genomic landscape differences that co-vary by race, a marker of ancestry, may influence cancer treatment. For example, one study reported that even after adjusting for smoking status and sex, race was still significantly associated with EGFR mutations.³⁵ EGFR mutations were highly prevalent in Asians at 30% vs. 7% in Whites.³⁶ In addition, results from a meta-analysis of randomised controlled trials have reported that compared with Caucasians, Asians have a higher survival and response rate to chemotherapy.³⁷ In our study, the race distribution was notably dissimilar for 13/18 cancers, with 8/13 cancers having under-representation of individuals with Black race, which may mean that a distinct genomic landscape is under-represented for many cancer types. Notably, 8 of these 13 cancers had higher representation by individuals with Other race (Asian, American Indian, or Alaska Native). This over-representation may be due to small sample sizes for some TCGA cancers, where even a few cases in the Other population can make up a relatively large proportion.

Stage is a well-established predictor of cancer prognosis and survival.³⁸ Studies have also reported notable genetic variation in cancers by stage.^39,40 In our study, stage dissimilarities existed for almost all cancer types (25/26). However, these identified differences between datasets may be due to the fact that only individuals who had a resection procedure were included in TCGA.¹⁴ Individuals with unresectable cancers, such as advanced-stage or metastatic cancers,⁴¹ did not meet the inclusion criteria of the program, which likely led to a lower stage distribution of the cases in TCGA compared to SEER. In addition, other sample eligibility requirements may have contributed to stage differences, including the requirement that samples be fresh frozen and come only from untreated first primary tumours.¹⁴

To our knowledge, this is the largest study to compare clinical characteristics of TCGA cancer cases to a sample of the general population of U.S. cancer cases. However, our study has limitations. No specific diagnosis criteria for each cancer type have been published for TCGA to our knowledge. Thus, the corresponding cancers in SEER were matched by cancer site and histology, and identified by ICD-O-3 primary site and histology/behavior code. Moreover, cases with certain cancers had missing race, stage, and survival months information. Particularly, 6/33 TCGA cancer types (THCA, LUSC, PRAD, COAD, READ, and UVM) had over 15% missing data on race, and 5/26 SEER cancers (BLCA, LIHC, MESO, CHOL, and UVM) had over 15% missing data on stage. In addition, for the race comparison, only 18 cancers in NAACCR were identified with sites matching to those of TCGA cases.

In conclusion, we found dissimilarities in the distributions of demographic and clinical characteristics between TCGA and general population cancer cases for the majority of cancers. Increased awareness of under-represented groups by researchers conducting cancer genomic research will allow for targeted efforts that increase the representativeness of genomic data that is important for precision medicine.

References

1. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
2. Saha, S. K. et al. Corrigendum: mutant IDH inhibits HNF-4alpha to block hepatocyte differentiation and promote biliary cancer. Nature 528, 152 (2015).
3. Tomczak, K., Czerwinska, P. & Wiznerowicz, M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. 19, A68–A77 (2015).
4. National Institute of Health, National Cancer Institute, National Human Genome Research Institute. TCGA program overview. http://cancergenome.nih.gov/abouttcga/overview (2016).
5. Calvo, E. & Baselga, J. Ethnic differences in response to epidermal growth factor receptor tyrosine kinase inhibitors. J. Clin. Oncol. 24, 2158–2163 (2006).
6. Shi, Y. et al. A prospective, molecular epidemiology study of EGFR mutations in Asian patients with advanced non-small-cell lung cancer of adenocarcinoma histology (PIONEER). J. Thorac. Oncol. 9, 154–162 (2014).
7. Kurian, A. W. BRCA1 and BRCA2 mutations across race and ethnicity: distribution and clinical implications. Curr. Opin. Obstet. Gynecol. 22, 72–78 (2010).
8. Cote, M. L. et al. Racial differences in oncogene mutations detected in early-stage low-grade endometrial cancers. Int. J. Gynecol. Cancer 22, 1367–1372 (2012).
9. Keenan, T. et al. Comparison of the genomic landscape between primary breast cancer in African American versus white women and the association of racial differences with tumour recurrence. J. Clin. Oncol. 33, 3621–3627 (2015).
10. Tan, D. S., Mok, T. S. & Rebbeck, T. R. Cancer genomics: diversity and disparity across ethnicity and geography. J. Clin. Oncol. 34, 91–101 (2016).
11. Dresler, C. M. et al. Gender differences in genetic susceptibility for lung cancer. Lung Cancer 30, 153–160 (2000).
12. Hwang, S. J., Lozano, G., Amos, C. I. & Strong, L. C. Germline p53 mutations in a cohort with childhood sarcoma: sex differences in cancer risk. Am. J. Hum. Genet. 72, 975–983 (2003).
13. Liu, L., Zhang, J., Wu, A. H., Pike, M. C. & Deapen, D. Invasive breast cancer incidence trends by detailed race/ethnicity and age. Int. J. Cancer 130, 395–404 (2012).
14. National Institute of Health, National Cancer Institute, National Human Genome Research Institute. TCGA tissue sample requirements: high quality requirements yield high quality data. https://cancergenome.nih.gov/cancersselected/biospeccriteria (2018).
15. Spratt, D. E. et al. Racial/ethnic disparities in genomic sequencing. JAMA Oncol. 2, 1070–1074 (2016).
16. Wahl, D. R. et al. Pan-cancer analysis of genomic sequencing among the elderly. Int. J. Radiat. Oncol. Biol. Phys. 98, 726–732 (2017).
17. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER 9 Regs Research Data, Nov 2017 Sub (1973-2015) <Katrina/Rita Population Adjustment> - Linked To County Attributes - Total U.S., 1969-2016 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2018, based on the November 2017 submission.
18. Weir, H. K. et al. Evaluation of North American Association of Central Cancer Registries’ (NAACCR) data for use in population-based cancer survival studies. J. Natl Cancer Inst. Monogr. 2014, 198–209 (2014).
19. National Institute of Health, National Cancer Institute. NCI genomic data commons data portal. https://portal.gdc.cancer.gov/ (2016).
20. North American Association of Central Cancer Registries. NAACCR fast stats: an interactive tool for quick access to key NAACCR cancer statistics. http://www.naaccr.org/ (2016).
21. Surveillance Epidemiology and End Results (SEER) Program. About the SEER registries. https://seer.cancer.gov/registries/ (2016).
22. North American Association of Central Cancer Registries. Cancer in North America CiNA volumes. https://www.naaccr.org/cancer-in-north-america-cina-volumes/ (2016).
23. Surveillance Epidemiology and End Results (SEER) Program. Survival time calculation. https://seer.cancer.gov/survivaltime/SurvivalTimeCalculation.pdf (2016).
24. Fritz, C. O., Morris, P. E. & Richler, J. J. Effect size estimates: current use, calculations, and interpretation. J. Exp. Psychol. Gen. 141, 2–18 (2012).
25. Royston, P. & Parmar, M. K. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med. Res. Methodol. 13, 152 (2013).
26. Zhao, L. et al. On the restricted mean survival time curve in survival analysis. Biometrics 72, 215–221 (2016).
27. A’Hern, R. P. Restricted mean survival time: an obligatory end point for time-to-event analysis in cancer trials? J. Clin. Oncol. 34, 3474–3476 (2016).
28. Andersen, P. K. & Perme, M. P. Pseudo-observations in survival analysis. Stat. Methods Med. Res. 19, 71–99 (2010).
29. Andersen, P. K., Hansen, M. G. & Klein, J. P. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal. 10, 335–350 (2004).
30. Murthy, V. H., Krumholz, H. M. & Gross, C. P. Participation in cancer clinical trials: race-, sex-, and age-based disparities. J. Am. Med. Assoc. 291, 2720–2726 (2004).
31. Townsley, C. A., Selby, R. & Siu, L. L. Systematic review of barriers to the recruitment of older patients with cancer onto clinical trials. J. Clin. Oncol. 23, 3112–3124 (2005).
32. Kemeny, M. M. et al. Barriers to clinical trial participation by older women with breast cancer. J. Clin. Oncol. 21, 2268–2275 (2003).
33. Burchard, E. G. et al. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 348, 1170–1175 (2003).
34. El-Telbany, A. & Ma, P. C. Cancer genes in lung cancer: racial disparities: are there any? Genes Cancer 3, 467–480 (2012).
35. Bauml, J. et al. Frequency of EGFR and KRAS mutations in patients with non small cell lung cancer by racial background: do disparities exist? Lung Cancer 81, 347–353 (2013).
36. Zhou, W. & Christiani, D. C. East meets West: ethnic differences in epidemiology and clinical behaviors of lung cancer between East Asians and Caucasians. Chin. J. Cancer 30, 287–292 (2011).
37. Soo, R. A. et al. Ethnic differences in survival outcome in patients with advanced stage non-small cell lung cancer: results of a meta-analysis of randomized controlled trials. J. Thorac. Oncol. 6, 1030–1038 (2011).
38. Naruke, T., Goya, T., Tsuchiya, R. & Suemasu, K. Prognosis and survival in resected lung carcinoma based on the new international staging system. J. Thorac. Cardiovasc. Surg. 96, 440–447 (1988).
39. Blaveri, E. et al. Bladder cancer stage and outcome by array-based comparative genomic hybridization. Clin. Cancer Res. 11, 7012–7022 (2005).
40. Richter, J. et al. Marked genetic differences between stage pTa and stage pT1 papillary bladder cancer detected by comparative genomic hybridization. Cancer Res. 57, 2860–2864 (1997).
41. Balaban, E. P. et al. Locally advanced, unresectable pancreatic cancer: American society of clinical oncology clinical practice guideline. J. Clin. Oncol. 34, 2654–2668 (2016).

Author information

Authors and Affiliations

Brown School, Washington University in St. Louis, St. Louis MO, USA
Xiaoyan Wang, Joseph T. Steensma, Qianxi Feng, Hannah Padda & Kimberly J. Johnson
Division of Oncology, Department of Medicine, Washington University in St. Louis, St. Louis MO, USA
Matthew H. Bailey
McDonnell Genome Institute, Washington University in St. Louis, St. Louis MO, USA
Matthew H. Bailey
Siteman Cancer Center, Washington University in St. Louis, St. Louis MO, USA
Kimberly J. Johnson

Contributions

X.W. analysed data, wrote, and revised the paper. J.S. contributed to the method section and revisions on the manuscript. M.H.B. revised the figures and the manuscript. Q.F. and H.P. replicated the results to ensure reproducibility of findings. K.J. supervised the project.

Corresponding author

Correspondence to Kimberly J. Johnson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Availability of data and materials

TCGA: https://portal.gdc.cancer.gov/ SEER: www.seer.cancer.gov NAACCR: https://faststats.naaccr.org/

Additional information

Note: This work is published under the standard license to publish agreement. After 12 months the work will become freely available and the license terms will switch to a Creative Commons Attribution 4.0 International (CC BY 4.0).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Table 1: Cancer Types in TCGA, SEER and NAACCR

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Wang, X., Steensma, J.T., Bailey, M. et al. Characteristics of The Cancer Genome Atlas cases relative to U.S. general population cancer cases. Br J Cancer 119, 885–892 (2018). https://doi.org/10.1038/s41416-018-0140-8

Received: 24 January 2018
Revised: 15 May 2018
Accepted: 16 May 2018
Published: 21 August 2018
Issue Date: 02 October 2018
DOI: https://doi.org/10.1038/s41416-018-0140-8
