Stroke among cancer patients

We identify cancer patients at highest risk of fatal stroke. This is a population-based study using nationally representative data from the Surveillance, Epidemiology, and End Results (SEER) program, 1992-2015. Among 7,529,481 cancer patients, 80,513 died of stroke (with 262,461 person-years at risk); the rate of fatal stroke was 21.64 per 100,000 person-years, and the standardized mortality ratio (SMR) of fatal stroke was 2.17 (95% CI, 2.15-2.19). Patients with cancer of the prostate, breast, and colorectum account for the plurality of cancer patients dying of stroke. Patients with brain and gastrointestinal cancers had the highest SMRs (>2-5) throughout the follow-up period. Among those diagnosed at <40 years of age, the plurality of strokes occurs in patients treated for brain tumors and lymphomas; among those diagnosed at >40 years of age, it occurs in patients with cancers of the prostate, breast, and colorectum. For almost all cancer survivors, the risk of stroke increases with time.

Because the SEER database has captured an increasing proportion of the US population over the years, there are fewer survivors in the early years of the SEER program than in later years, and the proportion of deaths from the index cancer is lower in later years. Further, the count of people living with a given cancer in a calendar year depends on the number of patients living with this cancer from previous years (which depends on cancer prevalence), those diagnosed within the calendar year (which depends on screening and incidence), and those dying during that year (which depends on cancer and treatment aggressiveness, how death is coded, common risk factors among cancers and comorbidities, and patient age). Certain cancers have an indolent course (e.g. prostate), and patients diagnosed in subsequent years are added to the cumulative count, increasing the number of prostate cancer patients relative to all others; for patients with aggressive cancers (e.g. pancreatic), the addition of patients diagnosed in subsequent years has little effect on the cumulative number because of high rates of mortality.
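The accumulation argument above can be illustrated with a toy calculation. This is a minimal sketch, not the method used in the analysis: the function name `prevalent_counts`, the constant number of new cases per year, and the two annual death fractions are all hypothetical values chosen only to contrast an indolent cancer with an aggressive one.

```python
def prevalent_counts(new_cases_per_year, annual_death_fraction, years):
    """Return the number of patients alive at the end of each year.

    Simplifying assumptions (hypothetical, for illustration only):
    a fixed fraction of the patients alive at the start of a year die
    during that year, and a fixed number of new patients is added.
    """
    alive = 0.0
    counts = []
    for _ in range(years):
        alive = alive * (1.0 - annual_death_fraction) + new_cases_per_year
        counts.append(alive)
    return counts


# Hypothetical inputs: same incidence, very different mortality.
indolent = prevalent_counts(new_cases_per_year=1000, annual_death_fraction=0.05, years=20)
aggressive = prevalent_counts(new_cases_per_year=1000, annual_death_fraction=0.80, years=20)

for year in (1, 5, 10, 20):
    print(year, round(indolent[year - 1]), round(aggressive[year - 1]))
```

Under these assumed inputs, the count for the indolent cancer keeps growing as new diagnoses are added, while the count for the aggressive cancer plateaus within a few years at close to the annual number of new cases.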

Age
SEER provides age-standard adult (age ≥15) cancer populations to calculate age-standardized survival, which is used to compare survival across time or between cancer populations with different age distributions. The standards provided are the International Cancer Survival Standard (ICSS), derived for three broad groups of cancer sites with similar patterns of incidence by age. ICSS 1 includes cancer sites with increasing incidence by age (most cancer sites; e.g. prostate). ICSS 2 includes cancer sites with broadly constant incidence by age (e.g. nasopharynx). ICSS 3 includes cancer sites that mainly affect young adults (e.g. testis). When the appropriate standard is used, the age-standardized survival is expected to be similar to the raw (unweighted) survival. For each of the three ICSS populations, SEER*Stat provides weights by 5-year age bins using the age variable, Age recode with <1 year olds, and by five larger age groups using the variable, Age Standard for Survival (15-44, 45-54, 55-64, 65-74, 75+), as described on the SEER website.
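Direct age standardization of this kind amounts to a weighted average of age-specific survival. The sketch below assumes that form; the weights and survival proportions are hypothetical placeholders, not the published ICSS weights, which should be taken from the SEER website.

```python
AGE_GROUPS = ("15-44", "45-54", "55-64", "65-74", "75+")


def age_standardized_survival(survival_by_age, weights_by_age):
    """Weighted average of age-specific survival over the five age groups.

    Weights are normalized here, so they may be given as proportions
    or as percentages.
    """
    total_weight = sum(weights_by_age[group] for group in AGE_GROUPS)
    weighted_sum = sum(
        weights_by_age[group] * survival_by_age[group] for group in AGE_GROUPS
    )
    return weighted_sum / total_weight


# Hypothetical weights and survival proportions, for illustration only.
example_weights = {"15-44": 0.10, "45-54": 0.15, "55-64": 0.25, "65-74": 0.25, "75+": 0.25}
example_survival = {"15-44": 0.90, "45-54": 0.85, "55-64": 0.80, "65-74": 0.70, "75+": 0.50}

standardized = age_standardized_survival(example_survival, example_weights)
print(round(standardized, 4))
```

Because the same weights are applied to every population being compared, differences in the standardized figure reflect differences in age-specific survival rather than differences in age mix.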

Quality assurance and completeness
SEER undergoes quality assurance: a systematic, standardized, and periodic data collection procedure is performed for all members of a defined cohort to avoid surveillance bias. 1 The case-finding audits are performed by a qualified member from each SEER registry under the direction of members of the National Cancer Institute. Auditors create an abstract that contains the primary site and the case-finding source. 2 When performing audits, SEER adheres to two basic principles: auditing high quantity and high risk data. High quantity refers to disease sites that have the highest incidence and prevalence (e.g. breast, prostate, lung, colon), as well as facilities that contribute the greatest percentage of cases to the central database.
Additionally, pathology laboratories are selected to review tissue from patients not seen at that hospital. High risk refers to cases that are likely to be miscoded (e.g. head and neck, hematopoietic diseases), compliance with new rules, and newly reportable diseases.

Defining the cause of death
Mortality codes in SEER are assigned from death certificates, which are completed by the doctor caring for the patient at the time of death. There is no single best method for calculating survival from cancer in the SEER program. 3 Different methods can give different outcomes, but for most variants considered the differences are small. For stroke, there is likely little discrepancy in the cause of death, as compared to a cause of death like heart disease, which may be caused by the cancer treatment, underlying heart disease, or a combination of both.

Data session information
The instructions to access the SEER data are provided below:
(1) Download the SEER*Stat software from the NCI website: https://seer.cancer.gov/seerstat/software/
(2) Open the program.
(3) Click "File", "New", and then one of the following sessions:
"MP-SIR" session to generate the SMRs. Note, this was used in Table 1 and Figure 1 of the current analysis.
"Case Listing" to generate a list of patient cases diagnosed.
"Incidence" to generate a list of the incidence of cancer or cause of death.
Note, this was used in Table 2 and Figure 2 of the current analysis, and to generate the ORs.
(4) Click on the desired registry to use for each of the sessions. For the purposes of this analysis, the following registries and options were selected. In the "MP-SIR" session, select the following:
The output of these sessions is provided in the Source Data file. All other data supporting the findings of this study are available within the article and its supplementary information files, and from the corresponding author upon reasonable request.

Registry Differences
The SEER databases have been evolving over the years. The SEER 21 database was released in 2019 and includes more geographic regions. As data are collected from more regions, the same concepts of patient inclusion over time apply.
SEER*Stat can analyze data by different methods, using its "sessions." The time period covered by these sessions depends on the SEER database chosen (SEER 9, SEER 18, SEER 21, etc.).
The "Standardized Incidence Ratio (SIR) session" provides the incidence of a particular event after diagnosis, as a function of follow-up time or age at diagnosis. When the event of interest is death as a function of follow-up time, the SIRs are actually standardized mortality ratios (SMRs), and they provide the relative risk of death from a particular cause vs. the general population.
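As a rough illustration of what an SMR expresses, the sketch below computes it as observed deaths divided by the deaths expected if the cohort had experienced general-population rates. This is the standard textbook definition and is not a reproduction of SEER*Stat's internal computation; the strata, person-years, reference rates, and observed count are hypothetical.

```python
def expected_deaths(person_years_by_stratum, reference_rate_by_stratum):
    """Sum of person-years times the general-population death rate, per stratum."""
    return sum(
        person_years_by_stratum[stratum] * reference_rate_by_stratum[stratum]
        for stratum in person_years_by_stratum
    )


def smr(observed_deaths, expected):
    """Standardized mortality ratio: observed deaths over expected deaths."""
    return observed_deaths / expected


# Hypothetical cohort person-years and reference rates (deaths per person-year).
person_years = {"50-59": 40000.0, "60-69": 60000.0, "70-79": 30000.0}
reference_rates = {"50-59": 20 / 100000, "60-69": 50 / 100000, "70-79": 150 / 100000}

expected = expected_deaths(person_years, reference_rates)
ratio = smr(observed_deaths=166, expected=expected)
print(round(expected, 2), round(ratio, 2))
```

An SMR above 1 means that more deaths from the cause were observed in the cohort than would be expected in a general population with the same stratum mix and time at risk.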
A "case listing" session is another option for displaying the data. Case listing sessions provide patient-level data, with each patient in a row, and variables (e.g. age, sex, cancer type) in columns. Thus, case listing sessions may be used to calculate odds ratios and generate survival plots.
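To show how patient-level rows can yield an odds ratio, the sketch below tabulates a 2x2 table from a list of records and applies the usual cross-product formula with a Woolf (log-based) 95% confidence interval. The column names `brain_tumor` and `fatal_stroke` and all counts are hypothetical; they are not SEER*Stat variable names and not results from this study.

```python
import math


def odds_ratio(rows, exposure_key, outcome_key, z=1.96):
    """Return (odds ratio, lower, upper) from patient-level rows.

    Each row is a dict with boolean exposure and outcome fields.
    The interval uses the Woolf (log odds ratio) approximation.
    """
    a = b = c = d = 0
    for row in rows:
        exposed, event = bool(row[exposure_key]), bool(row[outcome_key])
        if exposed and event:
            a += 1
        elif exposed:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    if 0 in (a, b, c, d):
        raise ValueError("Every cell of the 2x2 table must be non-zero.")
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical case listing: one dict per patient.
rows = (
    [{"brain_tumor": True, "fatal_stroke": True}] * 20
    + [{"brain_tumor": True, "fatal_stroke": False}] * 80
    + [{"brain_tumor": False, "fatal_stroke": True}] * 10
    + [{"brain_tumor": False, "fatal_stroke": False}] * 90
)

estimate, lower, upper = odds_ratio(rows, "brain_tumor", "fatal_stroke")
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```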

Latency Exclusion Periods in Standardized Mortality Ratios
For SMRs calculated as a function of follow-up time, the SMR in each window of time (e.g. within 1 year after diagnosis, 1-5 years after diagnosis, etc.) depends on the time at risk. With longer time at risk and more observed events, the confidence intervals become narrower, and the estimates are more precise. With a short time at risk (e.g. the first few months after diagnosis), or very few events (e.g. suicide), or among a niche patient cohort (e.g. Hodgkin lymphoma), the confidence intervals can widen dramatically.
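The dependence of interval width on the number of observed events can be seen with a short calculation. The sketch below uses Byar's approximation to the exact Poisson limits, a common choice for SMR confidence intervals; it is an assumption made for illustration and is not necessarily the method SEER*Stat applies. The observed and expected counts are hypothetical.

```python
import math


def smr_with_byar_ci(observed, expected, z=1.96):
    """Return (SMR, lower, upper) using Byar's approximation.

    `observed` is the count of observed deaths (must be at least 1);
    `expected` is the expected number of deaths (must be positive).
    """
    if observed < 1 or expected <= 0:
        raise ValueError("Need at least one observed event and a positive expectation.")
    point = observed / expected
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3
    shifted = observed + 1
    upper_count = shifted * (
        1 - 1 / (9 * shifted) + z / (3 * math.sqrt(shifted))
    ) ** 3
    return point, lower_count / expected, upper_count / expected


# Hypothetical windows with the same SMR but very different event counts.
few_events = smr_with_byar_ci(observed=4, expected=2.0)
many_events = smr_with_byar_ci(observed=400, expected=200.0)

for label, (point, lower, upper) in (("few", few_events), ("many", many_events)):
    print(label, round(point, 2), round(lower, 2), round(upper, 2))
```

Both hypothetical windows have the same point estimate, but the interval based on 4 events is many times wider than the one based on 400 events.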
In the first few months after diagnosis of cancer, patients often have an "introduction to the medical system"; e.g. a patient living in a rural area comes to a hospital where they are diagnosed with cancer, as well as other comorbidities such as heart disease, lung dysfunction, or kidney failure. The patient may die of any of these within a few months, but estimating the observed versus expected rate of death becomes difficult, and the confidence intervals for an SMR naturally widen. Thus, some researchers, including our team, sometimes elect to exclude the first 2 months from the SMR calculations. While SMRs may actually be very high during this time, the confidence intervals are so wide that the estimate is not meaningful. Moreover, the absolute number of observed events in this time may be rather low, especially when the event of interest is rare. As a result, the overall SMRs for the entire follow-up period (with or without the latency period) tend to be relatively similar.
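The effect of a latency exclusion on the overall SMR can be sketched by pooling observed and expected deaths across follow-up windows with and without the first window. The window labels, counts, and the helper `pooled_smr` are hypothetical and serve only to illustrate why the two overall figures tend to be close when early events are few.

```python
def pooled_smr(windows, exclude_labels=()):
    """Overall SMR: total observed over total expected across the kept windows."""
    kept = [w for w in windows if w["label"] not in exclude_labels]
    observed = sum(w["observed"] for w in kept)
    expected = sum(w["expected"] for w in kept)
    return observed / expected


# Hypothetical follow-up windows; the first has a high SMR but few events.
windows = [
    {"label": "0-2 months", "observed": 12, "expected": 3.0},
    {"label": "2-12 months", "observed": 90, "expected": 45.0},
    {"label": "1-5 years", "observed": 400, "expected": 200.0},
    {"label": "5+ years", "observed": 600, "expected": 280.0},
]

early_smr = windows[0]["observed"] / windows[0]["expected"]
with_latency_window = pooled_smr(windows)
without_latency_window = pooled_smr(windows, exclude_labels=("0-2 months",))

print(round(early_smr, 2), round(with_latency_window, 3), round(without_latency_window, 3))
```

In this made-up example the first window has an SMR of 4.0, yet dropping it changes the overall SMR only in the second decimal place, because that window contributes a small share of the total observed and expected deaths.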