Introduction

Most cancer patients are diagnosed after the onset of symptoms relating to their disease. In the UK, more than 90% of all cancer patients are diagnosed symptomatically [1, 2]; the corresponding figure in the US is likely to exceed 80% [3]. Despite ongoing improvements in diagnostic technologies to support asymptomatic cancer detection through screening, most patients are expected to continue to be diagnosed symptomatically in the forthcoming decade [4]. Therefore, among other cancer control strategies, several interventions have been instigated that aim to promote earlier help-seeking (by raising awareness of possible cancer symptoms among members of the public) and to prompt diagnostic investigation or referral of patients presenting with symptoms that raise the suspicion of cancer [1, 4, 5, 6]. However, clinical guidelines supporting referrals currently chiefly relate to specific cancer sites, although many symptoms (particularly vague/non-organ specific ones) relate to a range of different cancers. In England, multi-specialty diagnostic centres have been developed to assess patients with vague/non-organ specific symptoms, but optimal investigation strategies, whether pre-referral within primary care or post-referral, remain unclear [7]. Decisions on the choice of target symptoms in public awareness campaigns have traditionally been based on clinical consensus about associations of a given symptom with different cancers [8]. Attaining empirical evidence on these associations can help to improve the design and evaluation of symptom awareness campaigns, as well as clinical practice recommendations for the use of primary care investigations or referrals. Information from common blood tests can support the diagnostic process in primary care, but such tests are used in fewer than half of all cancer patients, with large variation by presenting symptoms and cancer site [9]. For these reasons, a fuller understanding of the bidirectional relationships between presenting symptoms and cancer sites is needed.

Against this background, we aimed to first examine the relative frequency of presenting symptoms by cancer site (the ‘symptom signature’ of each cancer site), and second to examine the relative frequency of cancer sites by presenting symptom (the ‘cancer site case-mix’ of each symptom), among incident cancer cases.

Methods

Data and study population

Data from the English National Cancer Diagnosis Audit (NCDA) 2018 were analysed. Details of the NCDA methodology have been described previously [2]. Briefly, incident cancer cases diagnosed in 2018, recorded by the NHS England National Disease Registration Service, were assigned to the participating general practice with which the patient was registered at the time of their diagnosis. General practitioners completed a questionnaire about the diagnostic process of each patient. A total of 1878 general practices (26% of all practices) chose to take part in the audit, gathering data on 64,489 malignant tumours (20% of incident cancers in 2018), excluding non-melanoma skin cancer. Participating practices were similar to non-participating practices regarding the characteristics of registered populations, practice performance metrics and quality of patient experience. Patients included in the NCDA had similar characteristics (age and sex), cancer types, and cancer stages compared to the incident cohort of cancer patients in England. Patients whose cancer was screening-detected were excluded from analysis, as were patients aged 24 or younger. In patients with more than one tumour diagnosed in 2018, the tumour with the more advanced stage was chosen, or one was chosen at random if stage category was missing or identical. The derivation of the analysis sample is described in Fig. 1.
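The rule for patients with more than one tumour can be illustrated with a minimal Python sketch. The record layout (`id`, `stage`), the integer stage coding, and the choice to draw at random among tumours tied for the most advanced stage are assumptions made for illustration; they are not taken from the NCDA methodology.

```python
import random


def select_tumour(tumours, rng):
    """Pick one tumour for a patient with several tumours.

    `tumours` is a list of dicts with hypothetical keys `id` and `stage`
    (`stage` is an integer where higher means more advanced, or None
    when the stage category is missing; this coding is an assumption).
    """
    stages = [t["stage"] for t in tumours]

    # Any missing stage: the tumours cannot be ranked, so pick at random.
    if any(stage is None for stage in stages):
        return rng.choice(tumours)

    # Otherwise keep the most advanced stage; break ties at random.
    top_stage = max(stages)
    candidates = [t for t in tumours if t["stage"] == top_stage]
    if len(candidates) == 1:
        return candidates[0]
    return rng.choice(candidates)


# Illustrative usage with a seeded generator so the draw is reproducible.
rng = random.Random(2018)
chosen = select_tumour(
    [{"id": "a", "stage": 2}, {"id": "b", "stage": 4}], rng
)
```

Passing the random generator explicitly keeps any random selection reproducible, which matters when the analysis sample has to be re-derived.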

Fig. 1: Flowchart describing the process of cohort selection.

Diagnostic interval: interval between date of first presentation to GP and date of cancer diagnosis.

Data were available on 38 cancer sites, which were further categorised into the following cancer site groups: head and neck, upper gastrointestinal (GI), lower GI, hepato-pancreato-biliary (HPB), respiratory, urological, haematological, central nervous system (CNS), sarcoma, skin, ocular, breast, gynaecological, and prostate and other male organs. Two further cancer site groups included cancer sites with a small sample size (<35 cases) that could not be incorporated into other groups as ‘other malignant neoplasms’, and cancers of unknown primary sites (ICD-10 codes C77-C80), comprising a total of 16 cancer site groups. A full list of ICD-10 codes can be found in Supplementary Table 1.

Within NCDA, GPs could record one or more presenting symptoms from a drop-down menu of 83 symptom categories (including a ‘not applicable’ (N/A) and a ‘not known’ (N/K) group). Each of the 83 symptom categories was assigned to one of 14 higher-level groups comprising upper abdominal, lower abdominal, breast, CNS, lump/mass/lymph node, musculoskeletal pain, respiratory, skin lesion, ulceration, urological, female-organ specific, male-organ specific, non-specific symptoms, and no symptoms recorded. The non-specific group included symptoms without organ-specificity (e.g. fever, or fatigue) (Supplementary Table 2).

Analysis

First, to examine the symptom signature of each cancer site, the proportion of patients presenting with each individual symptom was calculated by cancer site, ignoring combinations in patients presenting with more than one symptom. To further assess presenting symptom burden in individual cancer sites, the mean number of recorded symptoms per patient was calculated, alongside the number of symptoms that occurred in more than 1% and 50% of cases.
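As a minimal sketch of these calculations for a single cancer site, the Python example below uses invented toy data. It assumes that the denominator for each proportion is all patients with that cancer (including those with no recorded symptom) and, following the note to Fig. 3, that the mean number of symptoms is taken over patients with at least one recorded symptom. The function name and data layout are hypothetical.

```python
from collections import Counter


def symptom_signature(patients):
    """Summarise presenting symptoms for one cancer site.

    `patients` is a list with one set of symptom labels per patient;
    an empty set stands for a patient with no recorded symptom.
    """
    n_patients = len(patients)
    if n_patients == 0:
        raise ValueError("at least one patient is required")

    # Each symptom is counted once per patient; combinations are ignored.
    counts = Counter(s for symptoms in patients for s in set(symptoms))
    proportions = {s: c / n_patients for s, c in counts.items()}

    # Mean number of symptoms among patients with >= 1 recorded symptom.
    symptomatic = [set(symptoms) for symptoms in patients if symptoms]
    mean_symptoms = (
        sum(len(s) for s in symptomatic) / len(symptomatic)
        if symptomatic
        else 0.0
    )

    return {
        "proportions": proportions,
        "mean_symptoms": mean_symptoms,
        "n_over_1pct": sum(1 for p in proportions.values() if p > 0.01),
        "n_over_50pct": sum(1 for p in proportions.values() if p > 0.50),
    }


# Toy data, not NCDA data: four hypothetical patients with one cancer.
toy = [{"hoarseness"}, {"hoarseness", "sore throat"}, {"hoarseness"}, set()]
summary = symptom_signature(toy)
```

In this toy example one symptom exceeds the 50% threshold, which corresponds to a signature dominated by a single presenting symptom.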

Second, to examine cancer site case-mix of each symptom among incident cancer patients, the proportion of patients diagnosed with each cancer site was calculated within each symptom. For both symptom signatures and cancer site case-mix, relevant proportions were calculated alongside their corresponding 95% confidence intervals.
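The case-mix calculation can be sketched in the same way. The method used to derive the 95% confidence intervals is not stated above, so the Python example below uses the Wilson score interval purely as an assumption; the function names, the record layout and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def case_mix(records, symptom):
    """Cancer site case-mix among patients presenting with `symptom`.

    `records` is a list of (symptom, cancer_site) pairs, assumed to hold
    one pair per patient-symptom combination. Returns a dict mapping
    each site to (proportion, lower, upper).
    """
    sites = Counter(site for s, site in records if s == symptom)
    n = sum(sites.values())
    return {
        site: (count / n, *wilson_ci(count, n))
        for site, count in sites.items()
    }


# Toy data, not NCDA data: ten hypothetical patients with haemoptysis.
records = [("haemoptysis", "lung")] * 9 + [("haemoptysis", "laryngeal")]
mix = case_mix(records, "haemoptysis")
```

The Wilson interval stays within 0 and 1 for proportions close to either bound, which is convenient for rare symptom and cancer site combinations; other interval methods would be equally defensible.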

Results

Sample description

Among 55,122 patients included in the analysis (Fig. 1), 29,841 (54%) were men and 21,597 (39%) were 60–74 years old. Sample composition by demographic characteristics and cancer site is shown in Table 1. For 11,066 (20%) patients, no presenting symptom was recorded.

Table 1 Sample characteristics (sex and age) of patients by cancer site.

Symptom signatures

The relative frequencies of presenting symptoms (symptom signatures) of each cancer site are shown in Figs. 2, 3 with exact values in Supplementary Tables 3 and 4.

Fig. 2: Proportion of symptom groups that patients presented with by cancer group (e.g. 66% of patients with upper GI cancer presented with upper abdominal symptoms).

Columns add up to >100% because of multiple symptoms per patient. ‘-’ represents values <= 0.5% of cases in a given cancer. GI Gastrointestinal, HPB Hepato-pancreato-biliary, CNS Central nervous system.

Fig. 3: Proportion of symptoms that patients presented with by cancer site (e.g. 69% of patients with laryngeal cancer presented with hoarseness).

Columns add up to >100% because of multiple symptoms per patient. ‘-’ represents values <= 0.5% of cases in a given cancer. Mean no. symptoms represents the average number of symptoms experienced per patient in each cancer site, excluding patients without recorded symptoms (N/A or N/K); it is calculated as the total number of symptoms recorded divided by the number of patients who had at least 1 symptom recorded. Cervix, ovary, uterus and vulva/vagina cancers only in women. Penile, prostate and testicular cancers only in men. HPB Hepato-pancreato-biliary, CNS Central nervous system, DVT Deep vein thrombosis, LN pain with alcohol Lymph node pain with alcohol, Abdominal pain (NOS) Abdominal pain not otherwise specified, CIBH Change in bowel habit, LUTS Lower urinary tract symptoms, UTI Urinary tract infection, N/A never presented to primary care pre-diagnosis*, N/K patient records do not contain an answer to the question*. *According to the NCDA data collection instrument definitions of these categories.

Considering the 16 cancer site groups, we observed a higher concentration of similar symptoms among cancers relating to organs of the same body system or region (Fig. 2). For example, patients with upper abdominal organ cancers frequently presented with upper abdominal symptoms, and the same was observed for gynaecological cancers and gynaecological symptoms. Patients with haematological cancers were an exception to this pattern, as they tended to present with more diverse symptoms.

Considering the 38 individual cancer sites, three principal patterns are apparent (Fig. 3). First, some cancers have symptom signatures dominated by a single frequent presenting symptom. Examples include laryngeal cancer (69% of patients presenting with hoarseness), melanoma (71% with abnormal mole/lesion), and breast cancer (75% with breast lump/mass). Second, some cancers have a broad range of presenting symptoms. For example, the most frequent presenting symptom among patients with pancreatic cancer (weight loss), renal cancer (haematuria) and multiple myeloma (back pain) was recorded in <25% of patients with each of these cancers. Third, some cancers have relatively high percentages of patients without recorded presenting symptoms. For example, ocular cancer and chronic lymphocytic leukaemia have high proportions of patients without recorded symptoms (60% and 52%, respectively). Other notable cancer sites in this group include liver (36%), renal (36%), oral (29%), prostate (29%) and thyroid (27%) cancers. In contrast, laryngeal, oropharyngeal and uterine cancers had <9% of patients without recorded symptoms.

In total, there were 77,089 symptoms corresponding to the 55,122 analysed patients (1.4 symptoms/patient). The mean number of symptoms per patient ranged from 1 for ocular cancer and 1.1 for melanoma, to 2 for acute leukaemia and 2.1 for pancreatic cancer (Fig. 3, bottom row). A further illustration of the breadth of the symptom signature of each cancer site is provided in Table 2, which describes the number of symptoms occurring in differently sized sub-cohorts pertinent to each cancer site. A higher count of symptoms occurring in >1% of cases indicates that a cancer site has a broader symptom signature, and vice versa. For example, 30 different symptoms were each recorded in >1% of all patients with non-Hodgkin lymphoma (non-HL), compared to 7 symptoms recorded in >1% of women with breast cancer. The number of symptoms occurring in more than half of the cases is an indicator of whether the signature is dominated by specific presenting symptoms (e.g. hoarseness in 69% of patients with laryngeal cancer, in comparison to no single symptom having a relative frequency exceeding 25% among pancreatic cancer patients, Fig. 3). Further descriptions of this nature can be found in Supplementary Table 5.

Table 2 Number of symptoms per cancer site that occurred in more than 1% and 50% of cases.

Cancer site case-mix

The cancer site case-mix of different symptoms is illustrated as spectra (Figs. 4, 5), with exact values in Supplementary Tables 6, 7.

Fig. 4: The spectrum of cancer groups contained within each symptom group (proportion of cancer groups within each symptom group).

“Gynaecological” and “Prostate and other male organs” cancer groups have similar colours because they occur exclusively in women and men, respectively. GI Gastrointestinal, HPB Hepato-pancreato-biliary, CNS Central nervous system, MSK Musculoskeletal.

Fig. 5: The spectrum of cancer sites contained within each symptom (proportion of cancer sites within each symptom).

“Gynaecological” and “Prostate and other male organs” cancer groups have similar colours because they occur exclusively in women and men, respectively. Symptoms with n < 20 by sex were excluded from the visualisation: Lymph node pain with alcohol (both), general lymphadenopathy (men), leukoplakia (both), new onset diabetes (both), clubbing (women), stridor (both), renal colic (women), breast pain (men), nipple changes (men), nipple discharge (men). GI Gastrointestinal, HPB Hepato-pancreato-biliary, CNS Central nervous system, DVT Deep Vein Thrombosis, Abdominal pain (NOS) Abdominal pain not otherwise specified, CIBH Change In Bowel Habit, LUTS Lower urinary tract symptoms, UTI Urinary Tract Infection, N/A never presented to primary care pre-diagnosis*, N/K patient records don’t contain answer to the question.* *According to the NCDA data collection instrument definitions of these categories.

Considering the 13 symptom groups, two principal patterns are apparent. Certain presenting symptoms tend to relate to cancers of the same body system or region (Fig. 4). For example, skin lesions almost solely relate to skin cancer cases, respiratory symptoms to respiratory organ cancers, and urological symptoms to urological or sex-specific cancers. In contrast, the group of non-specific symptoms typically relates to a wide range of cancer sites (see also below). Abdominal symptoms (both upper and lower), although often relating to abdominal cancers, also relate to other cancer sites.

Considering the 83 symptom categories, we observed that some tend to principally relate to specific cancer sites (Fig. 5). Examples include anal mass (anal cancer, diagnosed in 46% of men and 72% of women presenting with this symptom), haemoptysis (lung cancer, in 84% of men and 90% of women with this symptom) and abnormal mole (melanoma, in 99% of both men and women presenting with this symptom). In contrast, patients with other symptoms (such as abdominal pain (not otherwise specified), weight loss, fatigue, and night sweats) were subsequently diagnosed with a wider range of cancer sites. For example, among all female patients presenting with abdominal pain (not otherwise specified), 26% were diagnosed with colon, 12% with pancreatic, 16% with ovarian and 44% with another 27 cancer sites; the corresponding figures in males were 28% for colon, 14% for pancreatic, 8% for prostate cancer, and 50% for another 22 cancer sites. Similarly, among female patients presenting with weight loss, 22% were diagnosed with lung, 16% with colon, and 10% with pancreatic cancer; among male patients presenting with weight loss, 20% were diagnosed with lung, 11% with colon, 12% with prostate, 9% with oesophageal and 9% with pancreatic cancer.

Certain symptoms have a different cancer site case mix by sex (Figs. 4, 5). For example, although musculoskeletal symptoms often present in sarcoma or haematological cancers in both men and women, in men they also frequently relate to prostate cancer. Similarly, the cancer site case mix of patients without recorded symptoms varied by sex, chiefly reflecting a high percentage (38%) of men without recorded presenting symptoms diagnosed with prostate cancer.

Discussion

Summary

Among incident cancer cases, we have mapped the presenting symptom signatures of 38 cancer sites and described the cancer site case-mix of 83 presenting symptoms. Certain presenting symptoms are typically concentrated in cancers of the same body system or region, and vice versa. When the symptom signature of a given cancer is dominated by a single symptom, the cancer site case-mix of that symptom is also dominated by the same cancer; conversely, relationships between presenting symptom and cancer site are much weaker for cancers with a broader symptom signature. The cancer site case-mix of certain symptoms (e.g. musculoskeletal symptoms) varies by sex.

Comparisons with literature

Evidence on the presenting symptom signature of 15 cancer sites has been reviewed previously [10]. Generally, relevant prior evidence is concentrated on single cancer sites; in contrast, we have examined the symptom signature of 38 cancers simultaneously. Acknowledging this difference, our findings concord with prior evidence, although they extend to an additional 23 cancer sites with little or no prior population-based evidence on their presenting symptoms (such as laryngeal, liver, melanoma, mesothelioma, oral, penile, sarcoma, small intestinal, testicular, thyroid, vaginal and vulval) [10].

Consistent with previous literature, we have found breast and bladder cancers to have narrow symptom signatures [10], and haematological [11], pancreatic [12], and renal cancers [10] to have broad symptom signatures. Haemoptysis has a relatively high predictive value for lung cancer, but it only occurs in 20% of patients with lung cancer [13]; consistent with this prior evidence, we found that a fifth of lung cancer patients in our study population presented with haemoptysis. We have found that chronic lymphocytic leukaemia patients typically had no recorded symptoms, which concords with prior evidence indicating that this cancer is often detected asymptomatically [11]. Prostate, renal, liver and thyroid cancers also had high percentages of patients diagnosed without recorded symptoms, consistent with prior knowledge about detection via opportunistic screening or incidental identification in many patients [14]. The observation that ocular and oral cancers also have a high proportion of patients without recorded symptoms is novel, and could indicate detection of those cancers outside primary care (e.g. by opticians and dentists). In keeping with prior evidence, most cancers diagnosed after upper or lower abdominal symptom presentations were cancers of the abdomen (e.g. upper and lower GI, hepato-pancreato-biliary, urological, prostate and other male organs/gynaecological), though around one in five related to non-abdominal sites [15, 16]. Concurring with prior evidence, we have observed a diverse symptom signature for colon and rectal cancers [17].

In brief, the findings concord with prior evidence but amplify it substantially in respect of number of presenting symptoms examined and the range of cancer sites considered.

Strengths and limitations

We covered a wide range of symptoms and cancer sites in a population-based incident cohort of patients with cancer. However, there are several limitations to consider. This was a case-only analysis (only patients with diagnosis of cancer were included). While this is an inherent feature of all epidemiological studies using cancer registry data, it is important that this is borne in mind for interpretation. For example, no inferences can be made about the predictive value of certain symptoms for specific cancers.

By the design of the NCDA questionnaire, GPs were asked to record the first presenting symptom(s) that prompted the suspicion of cancer. Since the questionnaires were completed by the GPs retrospectively, it is possible that symptoms more closely related to the patient’s eventual diagnosis were preferentially recorded in the audit. It is also possible that other symptoms, particularly non-specific ones, were present and did relate to the underlying cancer but were not deemed to do so by the GP, and therefore were not recorded. However, GPs had access to both coded and free-text data in the patients’ records, a unique feature of the NCDA study, which mitigates concerns about reliance on structured (coded) fields in primary care electronic health record data sources [18]. The symptoms recorded relate to the symptoms presented to the GP, which may differ from the symptoms experienced at symptom onset.

Although we present associations between individual symptoms and individual cancer sites, we have also grouped symptoms and cancer sites to provide higher-level summaries. The definition of these groups is normative and chiefly guided by anatomical considerations; by its nature, such grouping includes a degree of heterogeneity.

Implications

Three main translational implications arise from the findings.

Considering research implications, the results provide foundational evidence that can be used to validate the completeness of phenotyping of cancer symptoms in electronic health record sources, or profile associations between presenting symptoms and diagnostic process measures, such as investigation use.

Considering implications for public health or clinical practice, the findings can guide decision-making about the choice of target symptoms in symptom awareness campaigns, and inform their evaluation, regarding the range of cancer sites where changes in diagnostic pathways and intervals may be observable. Similarly, they can guide investigation strategies in patients presenting with specific symptoms, for example, prioritising certain tests over others (e.g. endoscopy over imaging), given differences in the expected probability of specific cancer sites. Further, novel/emerging diagnostic technologies, such as multi-cancer early detection (MCED) tests, could be preferentially deployed in patients presenting with symptoms potentially associated with a wider range of cancer sites. As an example, MCED tests provide information on whether a cancer signal was detected (yes/no) and up to two predicted ‘cancer site origins’, i.e. suspected sites of the underlying cancer [19]. Clinicians may be able to complement such information with evidence on the associations between presenting symptoms and the likely distribution of cancer sites in cases presenting with that symptom, to further inform investigation strategies and test sequencing. Additionally, our study could motivate future studies examining the probability of specific cancer sites in cancer patients conditional on their presenting symptoms, which could further improve the diagnostic accuracy and usefulness of information that can be derived from MCED tests.

Considering implications for public policy, the findings emphasise the importance of considering the overall risk of cancer (across body organs and systems) in symptom-based referral guidelines for suspected cancer. However, current UK guideline recommendations (issued in 2015) chiefly relate to symptoms of specific cancer sites [6], meaning that, as we show, the broader cancer site case-mix of different symptoms (particularly vague symptoms) is not appreciated. There is no equivalent single body of guidelines for presenting symptoms in the US setting, although evidence indicates that diagnostic delays in symptomatic patients subsequently diagnosed with cancer are comparable to those seen in Europe, and that the predictive values of different symptoms among presenters are also similar [20, 21]. Recent evidence demonstrates that the predictive values of three common vague symptoms, i.e. weight loss, fatigue and abdominal pain, do not exceed the 3% normative referral threshold used by NICE when individual cancer sites are considered on their own, but do so when all cancer sites are considered together [22, 23, 24].

Conclusion

The study provides a detailed understanding of bidirectional relationships between presenting symptoms and cancer sites among incident cases, enabling research examining associations between symptomatic presentations and diagnostic process measures. Future clinical practice recommendations for specialist referral ought to encompass a broader range of cancer sites per symptom. The design of symptom awareness campaigns can be appropriately guided regarding choice of target symptoms, and diagnostic strategies can be suitably informed.