Method to estimate relative risk using exposed proportion and case group data

Yada, Yoichi

doi:10.1038/s41598-017-02302-1

Article
Open access
Published: 18 May 2017

Scientific Reports volume 7, Article number: 2131 (2017)

Abstract

A change in the risk of an event associated with a factor is a common concern in many research fields, and relative risk is widely used because it is intuitive to interpret. Estimating relative risk has required data from two follow-up groups and can thus be costly and time consuming. Subjects for whom an event occurred (case group) are often observed but are generally analyzed in comparison with those for whom the event did not occur (control group); however, estimating relative risk using case group data without approximation is hindered. In this study, the obstacle to estimating relative risk using case control data is clarified as a mathematical expression, and a new equation to estimate relative risk using the exposed proportion and case group data is proposed. The proposed equation is derived without using Bayesian methods. A method to estimate the confidence interval for the proposed estimator is also provided. The usefulness of the proposed equation, which requires neither control nor follow-up groups, is demonstrated with both theoretical and real-life examples.

Introduction

A change in risk of an event occurring associated with exposure to a factor is generally studied in many fields, such as medicine and social science^{1, 2}. Relative risk (RR), also known as “rate ratio”, is widely used as a measure of association and can be interpreted intuitively^{3, 4} because of its simple definition:

$$RR\equiv \frac{{\pi }_{1}}{{\pi }_{0}},$$

(1)

where π ₁ and π ₀ are the probabilities of an event occurring (i.e., risks) for subjects exposed and unexposed to a factor. Estimating RR requires the estimators of both π ₁ and π ₀, such as the prevalence or cumulative incidence rate.

The probability estimators can be calculated using existing data from large-scale epidemiological studies or must be obtained from a smaller study designed for the estimation. Let N be the total number of subjects to be studied (e.g., a population), and let N ₁ and N ₀ be the numbers of exposed and unexposed subjects among N. Then N ₁ can be written as

$${N}_{1}=N-{N}_{0}=E\cdot N,$$

(2)

where E is the exposed proportion. The probabilities of an event occurring can be written as

$${\pi }_{1}=\frac{{N}_{11}}{{N}_{1}}\,{\rm{and}}\,{\pi }_{0}=\frac{{N}_{01}}{{N}_{0}},$$

(3)

where N ₁₁ and N ₀₁ are the numbers of subjects for whom an event occurred among N ₁ and N ₀. When p ₁ and p ₀ are the estimators of π ₁ and π ₀, they should be defined as

$${p}_{1}\equiv \frac{{n}_{11}}{{n}_{1}}\,{\rm{and}}\,{p}_{0}\equiv \frac{{n}_{01}}{{n}_{0}},$$

(4)

where n ₁ and n ₀ are the observed numbers of exposed and unexposed subjects and n ₁₁ and n ₀₁ are the numbers of subjects for whom the event occurred among n ₁ and n ₀. Thus, eRR, which is defined as

$$eRR\equiv \frac{{n}_{11}\cdot {n}_{1}^{-1}}{{n}_{01}\cdot {n}_{0}^{-1}},$$

(5)

is used as the estimator of relative risk. The groups counted by n ₁₁ and n ₀₁ are found among groups of exposed and unexposed subjects who are followed until the event occurs (called “cohorts”). However, appropriate cohorts are found only occasionally in existing epidemiological survey results; otherwise, they must be obtained from a new study designed for the purpose (i.e., a cohort study).
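
As a minimal sketch of equation (5), the following Python function computes eRR from cohort counts. The function name `estimate_rr_cohort` and the example counts are illustrative assumptions and are not taken from the study.

```python
def estimate_rr_cohort(n11: int, n1: int, n01: int, n0: int) -> float:
    """Estimator of relative risk, eRR, from cohort counts (equation (5)).

    n1, n0   : observed numbers of exposed and unexposed subjects
    n11, n01 : numbers among them for whom the event occurred
    """
    if n1 <= 0 or n0 <= 0 or n01 <= 0:
        raise ValueError("n1, n0 and n01 must be positive")
    p1 = n11 / n1  # estimator of pi_1, equation (4)
    p0 = n01 / n0  # estimator of pi_0, equation (4)
    return p1 / p0


# Hypothetical counts chosen only for illustration.
print(estimate_rr_cohort(n11=24, n1=800, n01=12, n0=1200))
```

The guard on `n01` reflects that the ratio is undefined when no event is observed among the unexposed subjects.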

Unfortunately, few existing results provide appropriate cohorts, and long-term observation of cohorts, for example over several years or decades, is likely to be costly and time consuming; thus, estimating relative risk can be burdensome for researchers. Meanwhile, because case groups are commonly observed, studies that compare them with a control group (case control studies) to estimate the change in risk tend to be less costly and time consuming. Although case control studies are often conducted, estimating relative risk using case control data is hindered. To demonstrate, let m ₁ and m ₀ be the numbers of observed subjects in a case group and a control group, and let m ₁₁ and m ₁₀ be the numbers of exposed subjects in the case and control groups (see Table 1). When meRR is defined similarly to the estimator of relative risk as

$$meRR\equiv \frac{{m}_{11}\cdot {({m}_{11}+{m}_{10})}^{-1}}{({m}_{1}-{m}_{11})\cdot {({m}_{1}-{m}_{11}+{m}_{0}-{m}_{10})}^{-1}},$$

(6)

meRR may be misused as an estimator of relative risk, but it varies widely with observation conditions that researchers can set, such as the size of m ₁. Moreover, researchers cannot perceive the effects of those conditions. Thus, meRR is not appropriate for the estimation. Although this obstacle to estimating relative risk is well known to epidemiologists¹, few studies have expressed the effects of observation conditions mathematically.

Table 1 Contingency tables for all subjects, cohort, case control, and random sample data.

According to Cornfield (1951), relative risk can be approximated using an odds ratio (OR)⁵, which is defined as

$$OR\equiv \frac{{\pi }_{1}\cdot {(1-{\pi }_{1})}^{-1}}{{\pi }_{0}\cdot {(1-{\pi }_{0})}^{-1}}=\frac{{N}_{11}\cdot {({N}_{1}-{N}_{11})}^{-1}}{{N}_{01}\cdot {({N}_{0}-{N}_{01})}^{-1}},$$

(7)

when π ₀ is small (so-called “rare disease assumption”). Thus, the estimator of OR (eOR), which is defined as

$$eOR\equiv \frac{{m}_{11}\cdot ({m}_{0}-{m}_{10})}{{m}_{10}\cdot ({m}_{1}-{m}_{11})},$$

(8)

is often computed instead of estimating relative risk. However, OR always overstates the association, and the degree of overstatement depends on RR and π ₀ ^{6, 7}; thus, using eOR may be misleading.
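
The overstatement can be illustrated with a short Python sketch that evaluates equation (7) for two pairs of risks with the same relative risk. The pairs (0.03, 0.01) and (0.3, 0.1) are the ones used later in the simulation; the function name `odds_ratio` is an assumption made for this example.

```python
def odds_ratio(pi1: float, pi0: float) -> float:
    """Odds ratio of equation (7), computed from the two risks."""
    return (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))


# Each pair of risks has a relative risk of 3.
for pi1, pi0 in [(0.03, 0.01), (0.3, 0.1)]:
    rr = pi1 / pi0
    print(f"pi0={pi0}: RR={rr:.2f}, OR={odds_ratio(pi1, pi0):.2f}")
```

With these inputs the odds ratio is about 3.06 when π ₀ = 0.01 and about 3.86 when π ₀ = 0.1, so the approximation of RR by OR degrades as the event becomes more common.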

In addition, some study designs that reduce costs while estimating relative risk have been proposed^{8, 9, 10}, although they still require cohorts or the like. Few studies have focused on deriving the above equations. Zhang and Yu (1998) proposed an equation that computes relative risk from the odds ratio¹¹ as follows:

$$RR=\frac{OR}{1+{\pi }_{0}\cdot (OR-1)}=OR+{\pi }_{1}\cdot (1-OR).$$

(9)

This equation served as a new method to estimate relative risk using case control data; however, the estimator of π ₀ or π ₁ is still required to perform the calculation.
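
A minimal sketch of the first form of equation (9) is given below. The function name `rr_from_or` and the input risks are assumptions for illustration; the point is that π ₀ must be supplied in addition to the odds ratio.

```python
def rr_from_or(odds_ratio: float, pi0: float) -> float:
    """Relative risk from the odds ratio and the unexposed risk (equation (9))."""
    return odds_ratio / (1 + pi0 * (odds_ratio - 1))


pi1, pi0 = 0.03, 0.01  # assumed risks, so the true relative risk is 3
odds = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))
print(rr_from_or(odds, pi0))
```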

Apart from the above, Bayesian methods also provide an equation for relative risk. When P_o and P_e are the probabilities of finding subjects for whom an event occurred and subjects who were exposed to a factor, Bayes’ theorem¹² can be written as

$${\pi }_{1}=\frac{{{\rm{P}}}_{{\rm{eo}}}\cdot {{\rm{P}}}_{{\rm{o}}}}{{{\rm{P}}}_{{\rm{e}}}},$$

(10)

where P_eo is the probability of finding subjects who were exposed to a factor among subjects for whom an event occurred. Because π ₀ can be written as

$${\pi }_{0}=\frac{(1-{{\rm{P}}}_{{\rm{eo}}})\cdot {{\rm{P}}}_{{\rm{o}}}}{1-{{\rm{P}}}_{{\rm{e}}}},$$

(11)

then RR is

$$RR=\frac{1-{{\rm{P}}}_{{\rm{e}}}}{{{\rm{P}}}_{{\rm{e}}}}\cdot \frac{{{\rm{P}}}_{{\rm{eo}}}}{1-{{\rm{P}}}_{{\rm{eo}}}}\mathrm{.}$$

(12)
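
For reference, the step from equations (10) and (11) to equation (12) can be written out as follows; P_o appears in both risks and cancels in the ratio:

$$RR=\frac{{\pi }_{1}}{{\pi }_{0}}=\frac{{{\rm{P}}}_{{\rm{eo}}}\cdot {{\rm{P}}}_{{\rm{o}}}\cdot {{\rm{P}}}_{{\rm{e}}}^{-1}}{(1-{{\rm{P}}}_{{\rm{eo}}})\cdot {{\rm{P}}}_{{\rm{o}}}\cdot {(1-{{\rm{P}}}_{{\rm{e}}})}^{-1}}=\frac{1-{{\rm{P}}}_{{\rm{e}}}}{{{\rm{P}}}_{{\rm{e}}}}\cdot \frac{{{\rm{P}}}_{{\rm{eo}}}}{1-{{\rm{P}}}_{{\rm{eo}}}}.$$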

However, because P_eo and P_e vary depending on the method of observation, precise estimation using this equation requires follow-up data on all subjects or a carefully collected random sample of them. Moreover, because of differences in the definition of probability, such as using “the probability of finding exposed subjects” rather than “the exposed proportion”, there is resistance toward Bayesian methods among some researchers, such as traditional statisticians.

This study illustrates the obstacle that prevents relative risk from being estimated using case control data as a mathematical expression of inconsistency in the observations, and proposes a new equation to estimate relative risk that requires only case group data and the exposed proportion. The proposed equation is derived without Bayesian methods and does not require the probability estimators; that is, neither control groups nor cohorts are needed. Theoretical and real-life examples that demonstrate the validity and wide applicability of the proposed equation are also provided.

Results

To clarify the obstacle to estimating relative risk using case control data and to derive an equation to estimate relative risk, let us introduce the proportion of observed subjects among all subjects of interest (hereinafter, “observed proportion”). For example, the number of observed individuals exposed to a factor divided by the exposed population constitutes the observed proportion of exposed individuals. As an expression, the observed proportion is the same as “the sampling proportion”, which is the proportion of a sample among all subjects of interest. However, the observed proportion cannot be estimated, whereas the sampling proportion can even be assigned by researchers.

In cohort studies, the observed proportions can be defined as follows:

$$O{P}_{\exp }\equiv \frac{{n}_{1}}{{N}_{1}}=\frac{{n}_{11}}{{N}_{11}}+{d}_{\exp }$$

(13)

and

$$O{P}_{{\rm{unexp}}}\equiv \frac{{n}_{0}}{{N}_{0}}=\frac{{n}_{01}}{{N}_{01}}+{d}_{{\rm{unexp}}},$$

(14)

where OP _exp and OP _unexp are the observed proportions of exposed and unexposed subjects and d _exp and d _unexp are constants. Cohort studies must be designed as follows:

$$\frac{{n}_{11}}{{n}_{1}}=\frac{{N}_{11}}{{N}_{1}}\,(\iff \frac{{n}_{1}}{{N}_{1}}=\frac{{n}_{11}}{{N}_{11}})$$

(15)

and

$$\frac{{n}_{01}}{{n}_{0}}=\frac{{N}_{01}}{{N}_{0}}\,(\iff \frac{{n}_{0}}{{N}_{0}}=\frac{{n}_{01}}{{N}_{01}}),$$

(16)

such that d _exp and d _unexp are sufficiently small to be ignored. Inserting equations (13) and (14) into equation (5), we obtain

$$eRR=\frac{\{(O{P}_{\exp }-{d}_{\exp })\cdot {N}_{11}\}\cdot {(O{P}_{\exp }\cdot {N}_{1})}^{-1}}{\{(O{P}_{{\rm{unexp}}}-{d}_{{\rm{unexp}}})\cdot {N}_{01}\}\cdot {(O{P}_{{\rm{unexp}}}\cdot {N}_{0})}^{-1}}\mathrm{.}$$

(17)

When d _exp = 0 and d _unexp = 0,

$$eRR=\frac{{N}_{11}\cdot {N}_{1}^{-1}}{{N}_{01}\cdot {N}_{0}^{-1}}=\frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$

(18)

Therefore, eRR can be used to estimate the relative risk in cohort studies.

In case control studies, the observed proportions may be defined as follows:

$$O{P}_{{\rm{case}}}\equiv \frac{{m}_{11}}{{N}_{11}}=\frac{{m}_{1}-{m}_{11}}{{N}_{01}}+{d}_{{\rm{case}}}$$

(19)

and

$$O{P}_{{\rm{cont}}}\equiv \frac{{m}_{10}}{{N}_{1}-{N}_{11}}=\frac{{m}_{0}-{m}_{10}}{{N}_{0}-{N}_{01}}+{d}_{{\rm{cont}}},$$

(20)

where OP _case and OP _cont are the observed proportions of case group and control group and d _case and d _cont are constants. Case control studies must be designed as

$$\frac{{m}_{11}}{{m}_{1}-{m}_{11}}=\frac{{N}_{11}}{{N}_{01}}\,(\iff \frac{{m}_{11}}{{N}_{11}}=\frac{{m}_{1}-{m}_{11}}{{N}_{01}})$$

(21)

and

$$\frac{{m}_{10}}{{m}_{0}-{m}_{10}}=\frac{{N}_{1}-{N}_{11}}{{N}_{0}-{N}_{01}}\,(\iff \frac{{m}_{10}}{{N}_{1}-{N}_{11}}=\frac{{m}_{0}-{m}_{10}}{{N}_{0}-{N}_{01}}),$$

(22)

such that d _case and d _cont should be sufficiently small to be ignored. Substituting equations (19) and (20) in equation (8), we obtain

$$eOR=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}\cdot \{(O{P}_{{\rm{cont}}}-{d}_{{\rm{cont}}})\cdot ({N}_{0}-{N}_{01})\}}{O{P}_{{\rm{cont}}}\cdot ({N}_{1}-{N}_{11})\cdot \{(O{P}_{{\rm{case}}}-{d}_{{\rm{case}}})\cdot {N}_{01}\}}\mathrm{.}$$

(23)

When d _case = 0 and d _cont = 0,

$$eOR=\frac{{N}_{11}\cdot {({N}_{1}-{N}_{11})}^{-1}}{{N}_{01}\cdot {({N}_{0}-{N}_{01})}^{-1}}=\frac{{\pi }_{1}\cdot {(1-{\pi }_{1})}^{-1}}{{\pi }_{0}\cdot {(1-{\pi }_{0})}^{-1}}.$$

(24)

Therefore, eOR can be used to estimate the odds ratio.

However, inserting equations (19) and (20) into equation (6), we obtain

$$meRR=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}\cdot {\{O{P}_{{\rm{case}}}\cdot {N}_{11}+O{P}_{{\rm{cont}}}\cdot ({N}_{1}-{N}_{11})\}}^{-1}}{O{P}_{{\rm{case}}}\cdot {N}_{01}\cdot {\{O{P}_{{\rm{case}}}\cdot {N}_{01}+O{P}_{{\rm{cont}}}\cdot ({N}_{0}-{N}_{01})\}}^{-1}}$$

(25)

when d _case = 0 and d _cont = 0. Thus, meRR estimates the relative risk only if OP _case is equal to OP _cont. Unfortunately, because the observed proportions are unavailable to researchers, this equality cannot be confirmed from case control data.

Equation (25) is a mathematical expression of the obstacle to estimating relative risk using case control data. Thus, excluding both OP _case and OP _cont would clearly remove this obstacle.
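
The dependence of meRR on the two observed proportions can be sketched numerically from equation (25). The population counts below (30000 exposed subjects with 900 events and 70000 unexposed subjects with 700 events, so that the relative risk is 3) are assumed for illustration, as are the function name `me_rr` and the chosen proportions.

```python
def me_rr(op_case, op_cont, n11, n1, n01, n0):
    """meRR of equation (25), assuming d_case = d_cont = 0."""
    exposed_cases = op_case * n11
    exposed_controls = op_cont * (n1 - n11)
    unexposed_cases = op_case * n01
    unexposed_controls = op_cont * (n0 - n01)
    top = exposed_cases / (exposed_cases + exposed_controls)
    bottom = unexposed_cases / (unexposed_cases + unexposed_controls)
    return top / bottom


# Assumed population with a true relative risk of 3.
population = dict(n11=900, n1=30_000, n01=700, n0=70_000)
for op_case, op_cont in [(1 / 5, 1 / 300), (1 / 50, 1 / 300), (1 / 5, 1 / 5)]:
    value = me_rr(op_case, op_cont, **population)
    print(f"OP_case={op_case:.4f} OP_cont={op_cont:.4f} meRR={value:.2f}")
```

Only the last pair, in which the two observed proportions are equal, returns the true relative risk; the other pairs give different values for the same population.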

Here, let us focus on the exposure odds, which is the ratio of exposed subjects to unexposed ones. Let EOC be the exposure odds in a case group and defined as

$$EOC\equiv \frac{{m}_{11}}{{m}_{1}-{m}_{11}}\mathrm{.}$$

(26)

Inserting equation (19) into equation (26) gives

$$EOC=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}}{(O{P}_{{\rm{case}}}-{d}_{{\rm{case}}})\cdot {N}_{01}}\mathrm{.}$$

(27)

When d _case = 0, substituting equations (2) and (3) into equation (27) gives

$$EOC=\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$

(28)
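
The intermediate steps can be spelled out as follows. With d _case = 0, OP _case cancels in equation (27); equation (3) gives N ₁₁ = π ₁ · N ₁ and N ₀₁ = π ₀ · N ₀, and equation (2) gives N ₁ = E · N and N ₀ = (1 − E) · N, so that

$$EOC=\frac{{N}_{11}}{{N}_{01}}=\frac{{\pi }_{1}\cdot {N}_{1}}{{\pi }_{0}\cdot {N}_{0}}=\frac{{\pi }_{1}\cdot E\cdot N}{{\pi }_{0}\cdot (1-E)\cdot N}=\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}}.$$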

Assume that a random sample is selected from all subjects and eE is the proportion of exposed subjects among the sample. Thus, eE can be written as

$$eE\equiv \frac{{l}_{1}}{l},$$

(29)

where l is the size of a random sample and l ₁ is the number of exposed subjects among the sample. The observed proportion of a random sample (that is, the sampling proportion) may be defined as

$$O{P}_{{\rm{sample}}}\equiv \frac{l}{N}=\frac{{l}_{1}}{{N}_{1}}+{d}_{{\rm{sample}}},$$

(30)

where d _sample is a constant. Inserting equation (30) into equation (29),

$$eE=\frac{(O{P}_{{\rm{sample}}}-{d}_{{\rm{sample}}})\cdot {N}_{1}}{O{P}_{{\rm{sample}}}\cdot N}$$

(31)

Because the random sampling should provide

$$\frac{{N}_{1}}{N}=\frac{{l}_{1}}{l},$$

(32)

then d _sample is sufficiently small to be ignored. When d _sample = 0, inserting equation (2) into equation (31) gives

$$eE=E\mathrm{.}$$

(33)

Thus, let PRR be defined as

$$PRR\equiv \frac{l-{l}_{1}}{{l}_{1}}\cdot \frac{{m}_{11}}{{m}_{1}-{m}_{11}}\mathrm{.}$$

(34)

Substituting equations (26) and (29) into equation (34) gives

$$PRR=\frac{1-eE}{eE}\cdot EOC\mathrm{.}$$

(35)

Both d _case and d _sample should be sufficiently small to be ignored when the random sample is selected from the same set of subjects whose event-occurring part the case group represents. When d _case = 0 and d _sample = 0, combining equations (28), (33), and (35), we obtain

$$PRR=\frac{1-E}{E}\cdot (\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}})=\frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$

(36)

Therefore, PRR is an estimator of relative risk when the subjects among whom the case group is observed and the subjects from whom the random sample is selected are the same.

This estimator is computed from the exposure odds in a case group and those in all subjects to be studied, and thus, no control group is required. In addition, the estimation is performed without a cohort.

Equation (34) is quite similar to equation (12), but note that PRR was derived without using Bayesian methods and is applicable to more general data: data from a case group and a random sample.

Therefore, by considering the observed proportions, an observational inconsistency preventing relative risk from being estimated in the case control studies was clarified as a mathematical expression, and a new equation to estimate relative risk using the exposed proportion and a case group was proposed; the proposed equation requires neither control groups nor cohorts.

Application to Model Data

Suppose the probabilities of disease Y developing among people exposed and unexposed to chemical compound X are 0.03 and 0.01 (i.e., relative risk is 3).

When the proportion of exposed people in a city with a population of 100000 is 30%, researchers would observe data such as the following: 30 patients among 1000 exposed participants and 10 patients among 1000 unexposed participants during a follow-up period; 180 exposed patients in a case group of 320 and 97 exposed participants in a control group of 328; and 300 exposed people in a random sample of 1000 participants (see Table 2). The observed proportions of the case and control groups, which are unavailable to the researchers, are then 1/5 and 1/300.

Table 2 Model data: population, cohort, case control, and census data.

Thus, the estimate of relative risk from cohort data is

$$eRR=\frac{30/1000}{10/1000}=\mathrm{3.00.}$$

(37)

The estimate of the odds ratio from case control data is

$$eOR=\frac{180\times 231}{140\times 97}=3.06$$

(38)

and meRR is

$$meRR=\frac{180/(180+97)}{140/(140+231)}=1.72.$$

(39)

Finally, the proposed estimator PRR can be computed as

$$PRR=\frac{1000-300}{300}\times \frac{180}{140}=\mathrm{3.00.}$$

(40)

Note that, in this example, the proposed equation yields the same estimate as the cohort study but does not require follow-up group data, such as cohort data.
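
The four estimates above can be reproduced with a few lines of Python. This is only a sketch: the variable names follow the notation of equations (5), (6), (8), and (34), and the counts are those of the model data described above.

```python
# Counts from the model data described above.
n11, n1, n01, n0 = 30, 1000, 10, 1000  # cohort data
m11, m1, m10, m0 = 180, 320, 97, 328   # case control data
l1, l = 300, 1000                      # random sample

e_rr = (n11 / n1) / (n01 / n0)                  # equation (5)
e_or = (m11 * (m0 - m10)) / (m10 * (m1 - m11))  # equation (8)
me_rr = (m11 / (m11 + m10)) / (
    (m1 - m11) / (m1 - m11 + m0 - m10)
)                                               # equation (6)
prr = ((l - l1) / l1) * (m11 / (m1 - m11))      # equation (34)

print(f"eRR={e_rr:.2f} eOR={e_or:.2f} meRR={me_rr:.2f} PRR={prr:.2f}")
```

Rounded to two decimals, the printed values agree with equations (37) to (40).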

Confidence Interval

The proposed estimator PRR is the ratio of two odds.

When the odds ratio is estimated as $eOR={m}_{11}\cdot ({m}_{0}-{m}_{10})\cdot {m}_{10}^{-1}\cdot {({m}_{1}-{m}_{11})}^{-1}$, the following eSE(ln eOR) is known as the maximum likelihood estimator for the standard deviation of ln eOR ¹³:

$$eSE(\mathrm{ln}\,eOR)=\sqrt{\frac{1}{{m}_{11}}+\frac{1}{{m}_{10}}+\frac{1}{{m}_{1}-{m}_{11}}+\frac{1}{{m}_{0}-{m}_{10}}}.$$

(41)

Let us apply this formula to PRR to estimate its confidence interval (CI).

When these two odds are nonzero, the estimator of the standard deviation of the logarithm of PRR will be

$$eSE(\mathrm{ln}\,PRR)=\sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}}\mathrm{.}$$

(42)

Thus, the following formulas would provide the 100(1 − α)% confidence limits for PRR.

$$LCL=\exp (\mathrm{ln}\,PRR-{Z}_{\alpha /2}\cdot \sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}})$$

(43)

and

$$UCL=\exp (\mathrm{ln}\,PRR+{Z}_{\alpha /2}\cdot \sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}}),$$

(44)

where LCL and UCL are the lower and upper limits of the CI and Z _α/2 represents the upper α/2 point of the standard normal distribution, such as 1.96 for a 95% interval.
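
Equations (34) and (42) to (44) can be combined into one small function, sketched below in Python. The function name `prr_with_ci`, the default `z=1.96`, and the check for non-positive counts are assumptions of this sketch; the example call uses the model data from the previous section.

```python
import math


def prr_with_ci(l, l1, m1, m11, z=1.96):
    """Return (PRR, LCL, UCL) following equations (34) and (42)-(44).

    l, l1   : size of the random sample and number of exposed subjects in it
    m1, m11 : size of the case group and number of exposed subjects in it
    z       : normal quantile, 1.96 for a 95% interval
    """
    counts = (l - l1, l1, m11, m1 - m11)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    prr = ((l - l1) / l1) * (m11 / (m1 - m11))
    se_log = math.sqrt(sum(1 / c for c in counts))  # equation (42)
    lcl = math.exp(math.log(prr) - z * se_log)      # equation (43)
    ucl = math.exp(math.log(prr) + z * se_log)      # equation (44)
    return prr, lcl, ucl


print(prr_with_ci(l=1000, l1=300, m1=320, m11=180))
```

Because the interval is built on the logarithmic scale, it is asymmetric around PRR and its limits are always positive.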

To validate these estimators of the CI, a computer simulation was conducted. It was assumed that 30% of a population of 100000 was exposed. The total numbers of exposed and unexposed people for whom an event occurred were determined using two sets of risks, in each of which the relative risk is 3: π ₁ = 0.03 and π ₀ = 0.01, or π ₁ = 0.3 and π ₀ = 0.1. Samples, exposed case groups, and unexposed case groups were drawn from the corresponding people based on each of six sets of observed proportions, and the CI was computed each time. Each set of proportions was chosen so that each group would be close to the sizes generally used in research.

Table 3 shows the number of times the true relative risk was included in the 95% CI in each set of one million trials. The true value (relative risk: 3) was included at a rate of approximately 95%; thus, this method estimates the CI well.

Table 3 Number of times the true value (relative risk: 3.0) was included in 95% confidence interval in each one million trials.
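
A much smaller simulation in the same spirit can be sketched in Python as follows. It is not the author's simulation code: the binomial draws, the single observed proportion of 1/5, the sample size of 1000, the number of trials, and the seed are all assumptions made for this illustration.

```python
import math
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))


def coverage(trials=2000, seed=0, z=1.96):
    """Fraction of simulated 95% intervals that contain the true relative risk."""
    rng = random.Random(seed)
    exposed_proportion = 0.3            # E
    n11, n01 = 900, 700                 # events among 30000 exposed, 70000 unexposed
    true_rr = 3.0
    op_case, sample_size = 1 / 5, 1000  # assumed observed proportion and sample size
    hits = 0
    for _ in range(trials):
        l1 = binomial(sample_size, exposed_proportion, rng)  # exposed in the sample
        m11 = binomial(n11, op_case, rng)                    # observed exposed cases
        m01 = binomial(n01, op_case, rng)                    # observed unexposed cases
        counts = (sample_size - l1, l1, m11, m01)
        if min(counts) == 0:
            continue  # estimator undefined; the trial is counted as a miss
        prr = ((sample_size - l1) / l1) * (m11 / m01)
        half_width = z * math.sqrt(sum(1 / c for c in counts))
        lower = math.exp(math.log(prr) - half_width)
        upper = math.exp(math.log(prr) + half_width)
        hits += lower <= true_rr <= upper
    return hits / trials


print(coverage())
```

The function returns the share of trials in which the interval of equations (43) and (44) contains the true value of 3; increasing `trials` reduces the Monte Carlo error at the cost of run time.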

Application to Real-Life Data

The suicide rate among the youth of Japan is considerably high, and suicide accounts for nearly half of deaths among those in their twenties¹⁴. Meanwhile, unemployment has been suggested to increase suicide risk^{2, 15}.

The proposed equation was applied to the latest suicide and employment data in Japan as real-life data, and confidence intervals at 95% were also estimated. The prevalence of suicide and employment among individuals in their twenties in 2015 was obtained from a statistics report published by the Ministry of Health, Labour and Welfare¹⁶ and the Labour Force Survey¹⁷. The data used are presented in Table 4. Suicide victims who were unemployed are treated as “No occupation”. Although the Labour Force Survey was conducted in a specific month in 2015 using random sampling, the indicators should represent the characteristics of the Japanese population in that year.

Table 4 Employment (A) and suicide rate (B) among population aged 20–29 years in Japan, 2015.

The estimation of relative risk for unemployed women is

$$PRR=\frac{(6.21-0.23)\times 1\,000\,000}{0.23\times 1\,000\,000}\times \frac{19}{621-19}=0.82,$$

(45)

and the 95% confidence interval for this relative risk can be estimated as follows:

$$LCL=\exp (\mathrm{ln}\,0.82-1.96\cdot \sqrt{\frac{1}{(6.21-0.23)\times 1\,000\,000}+\frac{1}{0.23\times 1\,000\,000}+\frac{1}{19}+\frac{1}{621-19}})=0.52$$

(46)

and

$$UCL=\exp (\mathrm{ln}\,0.82+1.96\cdot \sqrt{\frac{1}{(6.21-0.23)\times 1\,000\,000}+\frac{1}{0.23\times 1\,000\,000}+\frac{1}{19}+\frac{1}{621-19}})=1.30.$$

(47)

The estimation for men can be done in the same way. Thus, the estimated relative risk is 0.82 (95% CI: 0.52–1.30) for women and 0.78 (95% CI: 0.60–1.00) for men. In these data, unemployment was not found to increase the risk of suicide.

Incidentally, the proportions of victims classified under “No occupation” are comparatively large for both women and men; thus, having no occupation might increase risk. Let us tentatively assume that a person who is neither employed nor attending school is the same as an individual with no occupation. The number of women with no occupation is then 1.04 million (6.21 − 4.40 − 0.77 = 1.04); the estimates of the relative risk and confidence limits for women with no occupation can be computed as follows:

$$PRR=\frac{(6.21-1.04)\times 1\,000\,000}{1.04\times 1\,000\,000}\times \frac{290}{621-290}=4.36,$$

(48)

$$LCL=\exp (\mathrm{ln}\,4.36-1.96\cdot \sqrt{\frac{1}{(6.21-1.04)\times 1\,000\,000}+\frac{1}{1.04\times 1\,000\,000}+\frac{1}{290}+\frac{1}{621-290}})=3.72$$

(49)

and

$$UCL=\exp (\mathrm{ln}\,4.36+1.96\cdot \sqrt{\frac{1}{(6.21-1.04)\times 1\,000\,000}+\frac{1}{1.04\times 1\,000\,000}+\frac{1}{290}+\frac{1}{621-290}})=5.10.$$

(50)

For men, the number is 0.53 million (6.56 − 5.02 − 1.01 = 0.53); the estimation can be done in the same way. Thus, the relative risk would be estimated to be 4.36 (95% CI: 3.72–5.10) for women and 4.20 (95% CI: 3.78–4.67) for men.
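
The two estimates for women in equations (45) to (50) can be checked with the following Python sketch. Only the counts that appear in those equations are used; the function name `prr_ci` and its parameter names are assumptions of this example, and the unrounded PRR is used inside the logarithm.

```python
import math


def prr_ci(total, exposed, cases, exposed_cases, z=1.96):
    """PRR with confidence limits, equations (34), (43) and (44), rounded to 2 decimals."""
    counts = (total - exposed, exposed, exposed_cases, cases - exposed_cases)
    prr = (counts[0] / counts[1]) * (counts[2] / counts[3])
    half_width = z * math.sqrt(sum(1 / c for c in counts))
    # Signs 0, -1, +1 give the point estimate, the lower limit and the upper limit.
    return tuple(round(math.exp(math.log(prr) + s * half_width), 2) for s in (0, -1, 1))


# Women aged 20-29: population of 6.21 million and 621 suicide victims.
unemployed = prr_ci(6.21e6, 0.23e6, 621, 19)
no_occupation = prr_ci(6.21e6, 1.04e6, 621, 290)
print(unemployed, no_occupation)
```

Rounded to two decimals, the results agree with 0.82 (0.52–1.30) for unemployed women and 4.36 (3.72–5.10) for women with no occupation.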

Although the calculations were not adjusted and the definition of no occupation is tentative, these results suggest that being neither employed nor in education may substantially increase the risk of suicide among the young Japanese population. It might also be suggested that the Japanese government should consider the indicator of unemployment.

Note that relative risks were estimated without a fresh cohort study, which is generally difficult to conduct.

Discussion

Evaluating a change in the risk of an event caused by exposure to (or the presence of/occupation as) a factor is commonly attempted in many research fields, such as epidemiology, medicine, social science, politics, and product development. Relative risk, which is the ratio of the risks, is easily interpreted and widely used, but has been believed to require large-scale epidemiological research or a smaller cohort study designed for the estimation. A case control study, which compares case and control groups, is more convenient than a cohort study, but relative risk cannot be estimated using case control data. The estimator of the odds ratio, which can be calculated using case control data, is often used instead of relative risk because the former can sometimes approximate the latter. A method to calculate relative risk using the odds ratio was also proposed. Unfortunately, the odds ratio may be misleading when interpreting the change in risk, and calculating relative risk from the odds ratio still requires an estimator of either risk. Furthermore, control group data are still required, burdening researchers in terms of cost and effort.

In this study, by introducing the observed proportion, an observational inconsistency preventing relative risk from being estimated in case control studies was clarified as a mathematical expression; by excluding this inconsistency, a new equation that estimates relative risk using case group data was proposed. The proposed equation, which serves as an estimator of relative risk itself without approximation, requires only the exposure odds of a case group and those of all subjects to be studied; thus, no control group is needed. The calculation is done without using risk estimators, and thus, cohorts are also not needed. Therefore, a change in risk can be evaluated without the additional cost, effort, and time generally needed for a new study. Moreover, the proposed equation was derived without using Bayesian probabilities or Bayes’ theorem and is therefore free from researchers’ resistance toward Bayesian methods.

A method of estimating confidence limits for the proposed estimator was also presented and was shown by simulation to perform well. Although there may be a more appropriate method of estimating the confidence interval, pursuing the best method is beyond the scope of this paper.

Once the exposed proportions by various characteristics are investigated, changes in every risk associated with the exposure can be estimated by applying the proposed equation to appropriate case group data. Even changes in risk whose estimation has been believed to be impossible can be estimated, such as the adverse effect of a social situation on the suicide rate, the effect of a policy on the birthrate, or the impact of a new drug for a pandemic on the survival rate. There are two caveats: the case group must comprise subjects from the population for which the exposed proportion was computed, and the exposure to the factor must precede the event. Existing statistical methods, such as adjustment for confounding factors, should also be applicable to the proposed estimator.

Although the proposed equation is quite simple, its advantages will not only reduce the costs of epidemiological studies but may also make it a powerful tool in almost all research fields that deal with risks.

References

1. Andrade, C. Understanding relative risk, odds ratio, and related terms: as simple as it can get. J Clin Psychiatry. 76, 857–861, doi:10.4088/JCP.15f10150 (2015).
2. Milner, A., Page, A. & LaMontagne, A. D. Long-term unemployment and suicide: A systematic review and meta-Analysis. PLoS One. 8, e51333, doi:10.1371/journal.pone.0051333 (2013).
3. Rothman, K. J., Greenland, S., & Lash, T. L. Definition in Modern epidemiology. (3rd ed.) 53–54 (Lippincott Williams & Wilkins, 2008).
4. Agresti, A. Definition and expression in Categorical Data Analysis. (3rd ed.) 44–45 (John Wiley & Sons, 2011).
5. Cornfield, J. A method of estimating comparative rates from clinical data; applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst. 11, 1269–1275 (1951).
6. Davies, H. T., Crombie, I. K. & Tavakoli, M. When can odds ratios mislead? BMJ. 316, 989–991, doi:10.1136/bmj.316.7136.989 (1998).
7. McNutt, L. A., Wu, C., Xue, X. & Hafner, J. P. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol. 157, 940–943, doi:10.1093/aje/kwg074 (2003).
8. Liddell, F. D. K., McDonald, J. C., Thomas, D. C. & Cunliffe, S. V. Methods of cohort analysis: appraisal by application to asbestos mining. J R Stat Soc Ser A. 140, 469–491, doi:10.2307/2345280 (1977).
9. Maclure, M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol. 133, 144–153, doi:10.1093/oxfordjournals.aje.a115853 (1991).
10. Prentice, R. L. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 73, 1–11, doi:10.1093/biomet/73.1.1 (1986).
11. Zhang, J. & Yu, K. F. What’s the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 280, 1690–1691, doi:10.1001/jama.280.19.1690 (1998).
12. Gelman, A., Carlin, J. B., Stern, H. B., & Rubin, D. B. An equation in Bayesian data analysis (3rd ed.) 6–7 (Chapman & Hall/CRC, 2014).
13. Sahai, H & Khurshid, A. Equations in Statistics in Epidemiology: Methods, Techniques and Applications 21–22 (CRC Press LCC, 1995).
14. Ministry of Health, Labour and Welfare. Annual Health, Labour and Welfare Report 2013–2014 (Summary) http://www.mhlw.go.jp/english/wp/wp-hw8/dl/summary.pdf (2015).
15. Maki, N. & Martikainen, P. A register-based study on excess suicide mortality among unemployed men and women during different levels of unemployment in Finland. J Epidemiol Community Health. 66, 302–307, doi:10.1136/jech.2009.105908 (2012).
16. Ministry of Health, Labour and Welfare. Heisei 27-nenn chu ni okeru jisatsu no jyoukyou [Statistics of suicide in Japan 2015] http://www.mhlw.go.jp/file/06-Seisakujouhou-12200000-Shakaiengokyokushougaihokenfukushibu/h27kakutei-2syou_2.pdf (2016) [in Japanese].
17. Ministry of Internal Affairs and Communications. Annual Report on the Labour Force Survey 2015 http://www.stat.go.jp/english/data/roudou/report/2015/index.htm (2016).

Acknowledgements

The author would like to thank Editage (www.editage.jp) for English language editing and also to thank anonymous reviewers for helpful comments.

Author information

Authors and Affiliations

Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, 30-1 Oyaguchi-kamicho, Itabashi City, Tokyo, 173-8610, Japan
Yoichi Yada

Contributions

Y.Y. conducted the study and wrote the paper.

Corresponding author

Correspondence to Yoichi Yada.

Ethics declarations

Competing Interests

The author declares that he has no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yada, Y. Method to estimate relative risk using exposed proportion and case group data. Sci Rep 7, 2131 (2017). https://doi.org/10.1038/s41598-017-02302-1

Received: 19 December 2016
Accepted: 10 April 2017
Published: 18 May 2017
DOI: https://doi.org/10.1038/s41598-017-02302-1
