Introduction

The change in the risk of an event associated with exposure to a factor is studied in many fields, such as medicine and social science1, 2. Relative risk (RR), also known as “risk ratio”, is widely used as a measure of association and can be interpreted intuitively3, 4 because of its simple definition:

$$RR\equiv \frac{{\pi }_{1}}{{\pi }_{0}},$$
(1)

where π 1 and π 0 are the probabilities of an event occurring (i.e., risks) for subjects exposed and unexposed to a factor, respectively. Estimating RR requires estimators of both π 1 and π 0, such as the prevalence or cumulative incidence rate.

The probability estimators can be calculated from existing data of large-scale epidemiological studies or must otherwise be obtained from a smaller study designed for the estimation. Let N be the total number of subjects to be studied (e.g., a population), and let N 1 and N 0 be the numbers of exposed and unexposed subjects among N. N 1 can be written as

$${N}_{1}=N-{N}_{0}=E\cdot N,$$
(2)

where E is the exposed proportion. The probabilities of an event occurring can be written as

$${\pi }_{1}=\frac{{N}_{11}}{{N}_{1}}\,{\rm{and}}\,{\pi }_{0}=\frac{{N}_{01}}{{N}_{0}},$$
(3)

where N 11 and N 01 are the numbers of subjects for whom an event occurred among N 1 and N 0, respectively. Let p 1 and p 0, the estimators of π 1 and π 0, be defined as

$${p}_{1}\equiv \frac{{n}_{11}}{{n}_{1}}\,{\rm{and}}\,{p}_{0}\equiv \frac{{n}_{01}}{{n}_{0}},$$
(4)

where n 1 and n 0 are the observed numbers of exposed and unexposed subjects and n 11 and n 01 are the numbers of subjects for whom the event occurred among n 1 and n 0. Thus, eRR, which is defined as

$$eRR\equiv \frac{{n}_{11}\cdot {n}_{1}^{-1}}{{n}_{01}\cdot {n}_{0}^{-1}},$$
(5)

is used as the estimator of relative risk. The counts n 11 and n 01 are obtained from groups of exposed and unexposed subjects who were followed until the event occurred (called a “cohort”). Appropriate cohorts can occasionally be found in existing epidemiological survey results; otherwise, they must be obtained from a new study designed for the purpose (i.e., a cohort study).

Unfortunately, few existing results provide appropriate cohorts, and long-term observation of cohorts, for example over several years or decades, is likely to be costly and time consuming; thus, estimating relative risk can be burdensome for researchers. Meanwhile, because case groups are commonly observed, studies that compare a case group with a control group (case control studies) and estimate the change in risk tend to be less costly and time consuming. Although case control studies are often conducted, relative risk cannot be estimated directly from case control data. To demonstrate, let m 1 and m 0 be the numbers of observed subjects in a case group and a control group, and let m 11 and m 10 be the numbers of exposed subjects in the case and control groups, respectively (see Table 1). When meRR is defined similarly to the estimator of relative risk as

$$meRR\equiv \frac{{m}_{11}\cdot {({m}_{11}+{m}_{10})}^{-1}}{({m}_{1}-{m}_{11})\cdot {({m}_{1}-{m}_{11}+{m}_{0}-{m}_{10})}^{-1}},$$
(6)

meRR may be misused as an estimator of relative risk, but it varies greatly with observing conditions that researchers can designate, such as the size of m 1. Moreover, researchers cannot perceive the effects of those observing conditions. Thus, meRR is not appropriate for the estimation. Although this obstacle to estimating relative risk is well known to epidemiologists1, few studies have expressed the effects of observing conditions mathematically.

Table 1 Contingency tables for all subjects, cohort, case control, and random sample data.

According to Cornfield (1951), relative risk can be approximated using an odds ratio (OR)5, which is defined as

$$OR\equiv \frac{{\pi }_{1}\cdot {\mathrm{(1}-{\pi }_{1})}^{-1}}{{\pi }_{0}\cdot {\mathrm{(1}-{\pi }_{0})}^{-1}}=\frac{{N}_{11}\cdot {({N}_{1}-{N}_{11})}^{-1}}{{N}_{01}\cdot {({N}_{0}-{N}_{01})}^{-1}},$$
(7)

when π 0 is small (so-called “rare disease assumption”). Thus, the estimator of OR (eOR), which is defined as

$$eOR\equiv \frac{{m}_{11}\cdot ({m}_{0}-{m}_{10})}{{m}_{10}\cdot ({m}_{1}-{m}_{11})},$$
(8)

is often computed instead of estimating relative risk. However, OR always overstates the association, and the degree of overstatement depends on RR and π 0 6, 7; thus, using eOR may be misleading.
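
The size of this overstatement can be checked numerically. The following Python sketch fixes RR at 3 and varies π 0; the function names `odds_ratio` and `overstatement` and the chosen values of π 0 are illustrative assumptions, not part of the original analysis.

```python
def odds_ratio(pi1: float, pi0: float) -> float:
    """Odds ratio of equation (7), computed from the two risks."""
    return (pi1 / (1.0 - pi1)) / (pi0 / (1.0 - pi0))


def overstatement(rr: float, pi0: float) -> float:
    """Return OR / RR for a given relative risk and unexposed risk pi0."""
    pi1 = rr * pi0  # risk among the exposed implied by RR and pi0
    return odds_ratio(pi1, pi0) / rr


# Illustrative values of pi0 (assumption); RR is held at 3 throughout.
for pi0 in (0.001, 0.01, 0.1, 0.3):
    print(f"pi0={pi0:<6} OR/RR={overstatement(3.0, pi0):.3f}")
```

Because OR/RR = (1 − π 0)/(1 − π 1), the ratio is close to 1 only when both risks are small, which is the rare disease assumption in numerical form.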

In addition, some study designs that reduce costs while still estimating relative risk have been proposed8,9,10, although they still require cohorts or the like. Few studies have focused on deriving the above equations. Zhang and Yu (1998) proposed an equation that computes relative risk from the odds ratio11 as follows:

$$RR=\frac{OR}{1+{\pi }_{0}\cdot (OR-\mathrm{1)}}=OR+{\pi }_{1}\cdot \mathrm{(1}-OR\mathrm{).}$$
(9)

This equation served as a new method to estimate relative risk using case control data; however, the estimator of π 0 or π 1 is still required to perform the calculation.
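
As a quick check of equation (9), the sketch below evaluates both forms for one pair of risks. The function names and the risks π 1 = 0.03 and π 0 = 0.01 are assumptions chosen for illustration.

```python
def rr_from_or_unexposed(odds_ratio: float, pi0: float) -> float:
    """Equation (9), first form: RR = OR / (1 + pi0 * (OR - 1))."""
    return odds_ratio / (1.0 + pi0 * (odds_ratio - 1.0))


def rr_from_or_exposed(odds_ratio: float, pi1: float) -> float:
    """Equation (9), second form: RR = OR + pi1 * (1 - OR)."""
    return odds_ratio + pi1 * (1.0 - odds_ratio)


# Illustrative risks (assumption): the true relative risk is 3.
pi1, pi0 = 0.03, 0.01
true_or = (pi1 / (1.0 - pi1)) / (pi0 / (1.0 - pi0))

print(rr_from_or_unexposed(true_or, pi0))
print(rr_from_or_exposed(true_or, pi1))
```

Both calls need one of the risks as an input, which is the limitation noted above.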

Apart from the above, Bayesian methods also provide an equation for relative risk. Let Po be the probability of finding a subject for whom an event occurred and Pe be the probability of finding a subject who was exposed to a factor. Bayes’ theorem12 can then be written as

$${\pi }_{1}=\frac{{{\rm{P}}}_{{\rm{eo}}}\cdot {{\rm{P}}}_{{\rm{o}}}}{{{\rm{P}}}_{{\rm{e}}}},$$
(10)

where Peo is the probability of finding subjects who were exposed to a factor among subjects for whom an event occurred. Because π 0 can be written as

$${\pi }_{0}=\frac{\mathrm{(1}-{{\rm{P}}}_{{\rm{eo}}})\cdot {{\rm{P}}}_{{\rm{o}}}}{1-{{\rm{P}}}_{{\rm{e}}}},$$
(11)

then RR is

$$RR=\frac{1-{{\rm{P}}}_{{\rm{e}}}}{{{\rm{P}}}_{{\rm{e}}}}\cdot \frac{{{\rm{P}}}_{{\rm{eo}}}}{1-{{\rm{P}}}_{{\rm{eo}}}}\mathrm{.}$$
(12)

However, because Peo and Pe vary depending on the method of observation, precise estimation using this equation requires follow-up data for all subjects or a carefully collected random sample of them. Moreover, because of differences in the definition of probability, such as using “the probability of finding exposed subjects” rather than “the exposed proportion”, there is resistance to Bayesian methods among some researchers, such as traditional statisticians.

This study expresses the obstacle that prevents relative risk from being estimated using case control data as a mathematical expression of inconsistency in the observations, and proposes a new equation to estimate relative risk that requires only case group data and the exposed proportion. The proposed equation is derived without Bayesian methods and does not require the probability estimators; that is, neither control groups nor cohorts are needed. Theoretical and real-life examples that demonstrate the validity and wide applicability of the proposed equation are also provided.
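
Equations (10) to (12) can be verified on a fully known population. The sketch below assumes a hypothetical population of 100000 with 30% exposed, 900 events among the exposed, and 700 among the unexposed; these counts and the name `rr_bayes` are illustrative assumptions.

```python
def rr_bayes(p_e: float, p_eo: float) -> float:
    """Equation (12): RR = (1 - Pe) / Pe * Peo / (1 - Peo)."""
    return (1.0 - p_e) / p_e * p_eo / (1.0 - p_eo)


# Hypothetical, fully observed population (assumption).
n_total, n_exposed = 100_000, 30_000
events_exposed, events_unexposed = 900, 700
events_total = events_exposed + events_unexposed

p_e = n_exposed / n_total             # probability of finding an exposed subject
p_o = events_total / n_total          # probability of finding a subject with the event
p_eo = events_exposed / events_total  # probability of exposure among subjects with the event

pi1 = p_eo * p_o / p_e                   # equation (10)
pi0 = (1.0 - p_eo) * p_o / (1.0 - p_e)   # equation (11)

print(pi1, pi0, rr_bayes(p_e, p_eo))
```

Here every probability is computed from the whole population, which is exactly the kind of data that is rarely available in practice.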

Results

To clarify the obstacle to estimating relative risk using case control data and to derive an equation to estimate relative risk, let us introduce the proportion of observed subjects among all subjects of interest (hereinafter, “observed proportion”). For example, the number of observed individuals exposed to a factor divided by the exposed population constitutes the observed proportion of exposed individuals. As an expression, the observed proportion is the same as “the sampling proportion”, which is the proportion of a sample among all subjects of interest. However, the observed proportion cannot be estimated, whereas the sampling proportion can even be assigned by researchers.

In cohort studies, the observed proportions can be defined as follows:

$$O{P}_{\exp }\equiv \frac{{n}_{1}}{{N}_{1}}=\frac{{n}_{11}}{{N}_{11}}+{d}_{\exp }$$
(13)

and

$$O{P}_{{\rm{unexp}}}\equiv \frac{{n}_{0}}{{N}_{0}}=\frac{{n}_{01}}{{N}_{01}}+{d}_{{\rm{unexp}}},$$
(14)

where OP exp and OP unexp are the observed proportions of exposed and unexposed subjects and d exp and d unexp are constants. Cohort studies must be designed as follows:

$$\frac{{n}_{11}}{{n}_{1}}=\frac{{N}_{11}}{{N}_{1}}\,(\iff \frac{{n}_{1}}{{N}_{1}}=\frac{{n}_{11}}{{N}_{11}})$$
(15)

and

$$\frac{{n}_{01}}{{n}_{0}}=\frac{{N}_{01}}{{N}_{0}}\,(\iff \frac{{n}_{0}}{{N}_{0}}=\frac{{n}_{01}}{{N}_{01}}),$$
(16)

such that d exp and d unexp are sufficiently small to be ignored. Inserting equations (13) and (14) into equation (5), we obtain

$$eRR=\frac{\{(O{P}_{\exp }-{d}_{\exp })\cdot {N}_{11}\}\cdot {(O{P}_{\exp }\cdot {N}_{1})}^{-1}}{\{(O{P}_{{\rm{unexp}}}-{d}_{{\rm{unexp}}})\cdot {N}_{01}\}\cdot {(O{P}_{{\rm{unexp}}}\cdot {N}_{0})}^{-1}}\mathrm{.}$$
(17)

When d exp = 0 and d unexp = 0,

$$eRR=\frac{{N}_{11}\cdot {N}_{1}^{-1}}{{N}_{01}\cdot {N}_{0}^{-1}}=\frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$
(18)

Therefore, eRR can be used to estimate the relative risk in cohort studies.

In case control studies, the observed proportions may be defined as follows:

$$O{P}_{{\rm{case}}}\equiv \frac{{m}_{11}}{{N}_{11}}=\frac{{m}_{1}-{m}_{11}}{{N}_{01}}+{d}_{{\rm{case}}}$$
(19)

and

$$O{P}_{{\rm{cont}}}\equiv \frac{{m}_{10}}{{N}_{1}-{N}_{11}}=\frac{{m}_{0}-{m}_{10}}{{N}_{0}-{N}_{01}}+{d}_{{\rm{cont}}},$$
(20)

where OP case and OP cont are the observed proportions of case group and control group and d case and d cont are constants. Case control studies must be designed as

$$\frac{{m}_{11}}{{m}_{1}-{m}_{11}}=\frac{{N}_{11}}{{N}_{01}}\,(\iff \frac{{m}_{11}}{{N}_{11}}=\frac{{m}_{1}-{m}_{11}}{{N}_{01}})$$
(21)

and

$$\frac{{m}_{10}}{{m}_{0}-{m}_{10}}=\frac{{N}_{1}-{N}_{11}}{{N}_{0}-{N}_{01}}\,(\iff \frac{{m}_{10}}{{N}_{1}-{N}_{11}}=\frac{{m}_{0}-{m}_{10}}{{N}_{0}-{N}_{01}}),$$
(22)

such that d case and d cont are sufficiently small to be ignored. Substituting equations (19) and (20) into equation (8), we obtain

$$eOR=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}\cdot \{(O{P}_{{\rm{cont}}}-{d}_{{\rm{cont}}})\cdot ({N}_{0}-{N}_{01})\}}{O{P}_{{\rm{cont}}}\cdot ({N}_{1}-{N}_{11})\cdot \{(O{P}_{{\rm{case}}}-{d}_{{\rm{case}}})\cdot {N}_{01}\}}\mathrm{.}$$
(23)

When d case = 0 and d cont = 0,

$$eOR=\frac{{N}_{11}\cdot {({N}_{1}-{N}_{11})}^{-1}}{{N}_{01}\cdot {({N}_{0}-{N}_{01})}^{-1}}=\frac{{\pi }_{1}\cdot {\mathrm{(1}-{\pi }_{1})}^{-1}}{{\pi }_{0}\cdot {\mathrm{(1}-{\pi }_{0})}^{-1}}\mathrm{.}$$
(24)

Therefore, eOR can be used to estimate the odds ratio.

However, inserting equations (19) and (20) into equation (6), we obtain

$$meRR=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}\cdot {\{O{P}_{{\rm{case}}}\cdot {N}_{11}+O{P}_{{\rm{cont}}}\cdot ({N}_{1}-{N}_{11})\}}^{-1}}{O{P}_{{\rm{case}}}\cdot {N}_{01}\cdot {\{O{P}_{{\rm{case}}}\cdot {N}_{01}+O{P}_{{\rm{cont}}}\cdot ({N}_{0}-{N}_{01})\}}^{-1}}$$
(25)

when d case = 0 and d cont = 0. Thus, meRR estimates the relative risk only if OP case is equal to OP cont. Unfortunately, the equivalence of OP case and OP cont cannot be estimated but must be tested.

Equation (25) is a mathematical expression of the obstacle to estimating relative risk using case control data. Thus, eliminating both OP case and OP cont from the estimator would remove this obstacle.
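
The contrast between equations (24) and (25) can be illustrated numerically. The sketch below builds the expected case control cells from equations (19) and (20) with d case = d cont = 0 and evaluates eOR and meRR for several pairs of observed proportions. The default population (N = 100000, E = 0.3, π 1 = 0.03, π 0 = 0.01), the helper names, and the proportion pairs are assumptions made for illustration.

```python
def expected_counts(op_case, op_cont, n1=30_000, n0=70_000, n11=900, n01=700):
    """Expected case control cells from equations (19)-(20) with d_case = d_cont = 0.

    The default population is an illustrative assumption.
    """
    exposed_cases = op_case * n11              # m11
    unexposed_cases = op_case * n01            # m1 - m11
    exposed_controls = op_cont * (n1 - n11)    # m10
    unexposed_controls = op_cont * (n0 - n01)  # m0 - m10
    return exposed_cases, unexposed_cases, exposed_controls, unexposed_controls


def e_or(a, b, c, d):
    """Equation (8) with a = m11, b = m1 - m11, c = m10, d = m0 - m10."""
    return (a * d) / (c * b)


def me_rr(a, b, c, d):
    """Equation (6) with the same cell labels."""
    return (a / (a + c)) / (b / (b + d))


for op_case, op_cont in [(0.1, 0.1), (1 / 5, 1 / 300), (1.0, 0.01)]:
    cells = expected_counts(op_case, op_cont)
    print(f"OP_case={op_case:.4f} OP_cont={op_cont:.4f} "
          f"eOR={e_or(*cells):.3f} meRR={me_rr(*cells):.3f}")
```

In this sketch eOR is unchanged across the pairs because the observed proportions cancel, whereas meRR equals the relative risk only for the pair with OP case = OP cont.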

Here, let us focus on the exposure odds, which is the ratio of the number of exposed subjects to the number of unexposed ones. Let EOC be the exposure odds in a case group, defined as

$$EOC\equiv \frac{{m}_{11}}{{m}_{1}-{m}_{11}}\mathrm{.}$$
(26)

Inserting equation (19) into equation (26) leads to

$$EOC=\frac{O{P}_{{\rm{case}}}\cdot {N}_{11}}{(O{P}_{{\rm{case}}}-{d}_{{\rm{case}}})\cdot {N}_{01}}\mathrm{.}$$
(27)

When d case = 0, substituting equations (2) and (3) into equation (27) leads to

$$EOC=\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$
(28)
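
The step from equation (27) to equation (28) can be spelled out. With d case = 0, OP case cancels in equation (27); equation (3) gives N 11 = π 1 · N 1 and N 01 = π 0 · N 0, and equation (2) gives N 1 = E · N and N 0 = (1 − E) · N, so that

$$EOC=\frac{{N}_{11}}{{N}_{01}}=\frac{{\pi }_{1}\cdot {N}_{1}}{{\pi }_{0}\cdot {N}_{0}}=\frac{{\pi }_{1}\cdot E\cdot N}{{\pi }_{0}\cdot (1-E)\cdot N}=\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$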

Assume that a random sample is selected from all subjects and eE is the proportion of exposed subjects among the sample. Thus, eE can be written as

$$eE\equiv \frac{{l}_{1}}{l},$$
(29)

where l is the size of a random sample and l 1 is the number of exposed subjects among the sample. The observed proportion of a random sample (that is, the sampling proportion) may be defined as

$$O{P}_{{\rm{sample}}}\equiv \frac{l}{N}=\frac{{l}_{1}}{{N}_{1}}+{d}_{{\rm{sample}}},$$
(30)

where d sample is a constant. Inserting equation (30) into equation (29) gives

$$eE=\frac{(O{P}_{{\rm{sample}}}-{d}_{{\rm{sample}}})\cdot {N}_{1}}{O{P}_{{\rm{sample}}}\cdot N}$$
(31)

Because the random sampling should provide

$$\frac{{N}_{1}}{N}=\frac{{l}_{1}}{l},$$
(32)

then d sample is sufficiently small to be ignored. When d sample = 0, inserting equation (2) into equation (31) leads to

$$eE=E\mathrm{.}$$
(33)

Thus, let PRR be defined as

$$PRR\equiv \frac{l-{l}_{1}}{{l}_{1}}\cdot \frac{{m}_{11}}{{m}_{1}-{m}_{11}}\mathrm{.}$$
(34)

Substituting equations (26) and (29) into equation (34) leads to

$$PRR=\frac{1-eE}{eE}\cdot EOC\mathrm{.}$$
(35)

Both d case and d sample should be sufficiently small to be ignored when the random sample is selected from the same set of subjects whose event-occurring part the case group represents. When d case = 0 and d sample = 0, combining equations (28), (33), and (35), we obtain

$$PRR=\frac{1-E}{E}\cdot (\frac{E}{1-E}\cdot \frac{{\pi }_{1}}{{\pi }_{0}})=\frac{{\pi }_{1}}{{\pi }_{0}}\mathrm{.}$$
(36)

Therefore, PRR is an estimator of relative risk, provided that the case group is observed among the same subjects from whom the random sample is selected.

This estimator is computed from the exposure odds in a case group and those in all subjects to be studied, and thus, no control group is required. In addition, the estimation is performed without a cohort.
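
Equation (34) translates directly into code. The sketch below also checks, on expected counts with d case = d sample = 0, that the result does not depend on the observed proportions. The function name `prr`, the hypothetical population, and the proportions tried are assumptions made for illustration.

```python
def prr(l: float, l1: float, m1: float, m11: float) -> float:
    """Equation (34): PRR = (l - l1) / l1 * m11 / (m1 - m11)."""
    if l1 <= 0 or l1 >= l or m11 <= 0 or m11 >= m1:
        raise ValueError("both exposure odds must be finite and nonzero")
    return (l - l1) / l1 * m11 / (m1 - m11)


# Hypothetical population (assumption): N = 100000, E = 0.3, pi1 = 0.03, pi0 = 0.01.
N, N1, N11, N01 = 100_000, 30_000, 900, 700

for op_case in (0.05, 0.2, 1.0):
    for op_sample in (0.001, 0.01, 0.1):
        l, l1 = op_sample * N, op_sample * N1            # random sample
        m1, m11 = op_case * (N11 + N01), op_case * N11   # case group
        print(op_case, op_sample, round(prr(l, l1, m1, m11), 6))
```

The guard at the top reflects that the estimator is undefined when either exposure odds is zero or infinite.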

Equation (34) is quite similar to equation (12), but note that PRR was derived without using Bayesian methods and is applicable to more general data: data from a case group and a random sample.

Therefore, by considering the observed proportions, an observational inconsistency preventing relative risk from being estimated in the case control studies was clarified as a mathematical expression, and a new equation to estimate relative risk using the exposed proportion and a case group was proposed; the proposed equation requires neither control groups nor cohorts.

Application to Model Data

Suppose the probabilities of disease Y developing among people exposed and unexposed to chemical compound X are 0.03 and 0.01 (i.e., relative risk is 3).

When the proportion of exposed people in a city with a population of 100000 is 30%, researchers might observe the following data: 30 patients are found among 1000 exposed participants and 10 patients among 1000 unexposed participants during a follow-up period; 180 exposed patients are observed in a case group of 320 and 97 exposed participants are observed in a control group of 328; and 300 exposed people are found in a random sample of 1000 participants (see Table 2). The observed proportions of the case and control groups, which are unavailable to the researchers, are then 1/5 and 1/300, respectively.

Table 2 Model data: population, cohort, case control, and census data.

Thus, the estimate of relative risk from the cohort data is

$$eRR=\frac{30/1000}{10/1000}=\mathrm{3.00.}$$
(37)

The estimate of the odds ratio from the case control data is

$$eOR=\frac{180\times 231}{140\times 97}=3.06$$
(38)

and meRR is

$$meRR=\frac{180/\mathrm{(180}+\mathrm{97)}}{140/\mathrm{(140}+\mathrm{231)}}=\mathrm{1.72.}$$
(39)

Finally, the proposed estimator PRR can be computed as

$$PRR=\frac{1000-300}{300}\times \frac{180}{140}=\mathrm{3.00.}$$
(40)

Note that the proposed equation estimates the relative risk as precisely as a cohort study does, without requiring follow-up data such as cohort data.
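
The four estimates in equations (37) to (40) can be reproduced from the counts quoted above. The following Python sketch uses only those counts; the variable names are illustrative.

```python
# Counts quoted in the model data (cohort, case control, random sample).
n1, n11, n0, n01 = 1000, 30, 1000, 10   # cohort
m1, m11, m0, m10 = 320, 180, 328, 97    # case group and control group
l, l1 = 1000, 300                       # random sample

e_rr = (n11 / n1) / (n01 / n0)                                 # equation (5)
e_or = (m11 * (m0 - m10)) / (m10 * (m1 - m11))                 # equation (8)
me_rr = (m11 / (m11 + m10)) / ((m1 - m11) / (m1 - m11 + m0 - m10))  # equation (6)
prr = (l - l1) / l1 * m11 / (m1 - m11)                         # equation (34)

print(f"eRR={e_rr:.2f} eOR={e_or:.2f} meRR={me_rr:.2f} PRR={prr:.2f}")
```

Rounded to two decimals, these values agree with equations (37) to (40).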

Confidence Interval

The proposed estimator PRR is the ratio of two odds.

On estimating the odds ratio as \(eOR={m}_{11}\cdot ({m}_{0}-{m}_{10})\cdot {m}_{10}^{-1}\cdot {({m}_{1}-{m}_{11})}^{-1}\), the following eSE(ln eOR) is known as the maximum likelihood estimator for the standard deviation of ln eOR 13:

$$eSE({\rm{l}}{\rm{n}}eOR)=\sqrt{\frac{1}{{m}_{11}}+\frac{1}{{m}_{10}}+\frac{1}{{m}_{1}-{m}_{11}}+\frac{1}{{m}_{0}-{m}_{10}}}.$$
(41)

Let us apply this formula to PRR to estimate the confidence interval (CI).

When both odds are nonzero, the estimator of the standard deviation of ln PRR is

$$eSE(\mathrm{ln}\,PRR)=\sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}}\mathrm{.}$$
(42)

Thus, the following formulas would provide the 100(1 − α)% confidence limits for PRR.

$$LCL=\exp (\mathrm{ln}\,PRR-{Z}_{\alpha \mathrm{/2}}\cdot \sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}})$$
(43)

and

$$UCL={\rm{e}}{\rm{x}}{\rm{p}}({\rm{l}}{\rm{n}}PRR+{Z}_{\alpha \mathrm{/2}}\cdot \sqrt{\frac{1}{l-{l}_{1}}+\frac{1}{{l}_{1}}+\frac{1}{{m}_{11}}+\frac{1}{{m}_{1}-{m}_{11}}}),$$
(44)

where LCL and UCL are the lower and upper limits of the CI and Z α/2 represents the upper α/2 point of the standard normal distribution, such as 1.96 for a 95% interval.

To validate these estimators of the CI, a computer simulation was conducted. It was assumed that 30% of a population of 100000 was exposed. The total numbers of exposed and unexposed people for whom an event occurred were determined using two sets of risks, in each of which the relative risk is 3: π 1 = 0.03 and π 0 = 0.01, or π 1 = 0.3 and π 0 = 0.1. Samples, exposed case groups, and unexposed case groups were drawn from the corresponding people according to each of the six sets of observed proportions, and the CI was computed each time. Each set of proportions was chosen so that each group would be close to the sizes generally used in research.
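
Equations (42) to (44) can be sketched in Python as follows. The function name `prr_confidence_interval` is an assumption, and the example call reuses the model counts (l = 1000, l 1 = 300, m 1 = 320, m 11 = 180) purely for illustration; the normal quantile comes from the standard library.

```python
import math
from statistics import NormalDist


def prr_confidence_interval(l, l1, m1, m11, alpha=0.05):
    """Return (PRR, LCL, UCL) following equations (34) and (42)-(44)."""
    point = (l - l1) / l1 * m11 / (m1 - m11)
    # Equation (42): standard error of ln PRR from the four counts.
    se = math.sqrt(1 / (l - l1) + 1 / l1 + 1 / m11 + 1 / (m1 - m11))
    # Upper alpha/2 point of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point, point * math.exp(-z * se), point * math.exp(z * se)


point, lower, upper = prr_confidence_interval(1000, 300, 320, 180)
print(f"PRR={point:.2f}, 95% CI: {lower:.2f} to {upper:.2f}")
```

Because the limits are computed on the logarithmic scale, they are symmetric around PRR in the multiplicative sense: LCL · UCL = PRR².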

Table 3 shows the number of times the true relative risk was included in the 95% CI in each set of one million trials. The true value (relative risk: 3) was included at a rate of approximately 95%; thus, this method estimates the CI well.

Table 3 Number of times the true value (relative risk: 3.0) was included in 95% confidence interval in each one million trials.
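
A much smaller simulation in the same spirit can be sketched as follows. It is a simplified stand-in, not the simulation reported in Table 3: the group sizes (a random sample of 1000 and a case group of 320), the drawing of fixed-size groups without replacement, the number of trials, and the function names are all assumptions made for illustration.

```python
import math
import random


def covers(l, l1, m1, m11, true_rr, z=1.96):
    """True if the CI of equations (43)-(44) contains true_rr."""
    point = (l - l1) / l1 * m11 / (m1 - m11)
    se = math.sqrt(1 / (l - l1) + 1 / l1 + 1 / m11 + 1 / (m1 - m11))
    return point * math.exp(-z * se) <= true_rr <= point * math.exp(z * se)


def coverage(trials=2000, seed=0):
    """Fraction of trials in which the 95% CI contains the true relative risk."""
    rng = random.Random(seed)
    n_total, n_exposed = 100_000, 30_000      # 30% of the population exposed
    cases_exposed, cases_unexposed = 900, 700  # pi1 = 0.03, pi0 = 0.01
    n_cases = cases_exposed + cases_unexposed
    l, m1 = 1000, 320                          # assumed group sizes
    hits = 0
    for _ in range(trials):
        # Indices below the threshold stand for exposed subjects.
        l1 = sum(i < n_exposed for i in rng.sample(range(n_total), l))
        m11 = sum(i < cases_exposed for i in rng.sample(range(n_cases), m1))
        hits += covers(l, l1, m1, m11, true_rr=3.0)
    return hits / trials


print(f"coverage: {coverage():.3f}")
```

With far fewer trials than in Table 3, the printed rate is only a rough indication of coverage.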

Application to Real-Life Data

The suicide rate among the youth of Japan is considerably high and suicide accounts for nearly half of the causes of death among those in their twenties14. Meanwhile, unemployment is suggested to increase suicide risk2, 15.

The proposed equation was applied to the latest suicide and employment data in Japan as real-life data, and confidence intervals at 95% were also estimated. The prevalence of suicide and employment among individuals in their twenties in 2015 was obtained from a statistics report published by the Ministry of Health, Labour and Welfare16 and the Labour Force Survey17. The data used are presented in Table 4. Suicide victims who were unemployed are treated as “No occupation”. Although the Labour Force Survey was conducted in a specific month in 2015 using random sampling, the indicators should represent the characteristics of the Japanese population in that year.

Table 4 Employment (A) and suicide rate (B) among population aged 20–29 years in Japan, 2015.

The estimation of relative risk for unemployed women is

$$PRR=\frac{\mathrm{(6.21}-\mathrm{0.23)}\times 1\,000\,000}{0.23\times 1\,000\,000}\times \frac{19}{621-19}=\mathrm{0.82,}$$
(45)

and the 95% confidence interval for this relative risk can be estimated as follows:

$$LCL=\exp (\mathrm{ln}\,0.82-1.96\cdot \sqrt{\frac{1}{\mathrm{(6.21}-\mathrm{0.23)}\times 1\,000\,000}+\frac{1}{0.23\times 1\,000\,000}+\frac{1}{19}+\frac{1}{621-19}})=0.52$$
(46)

and

$$UCL=\exp (\mathrm{ln}\,0.82+1.96\cdot \sqrt{\frac{1}{\mathrm{(6.21}-\mathrm{0.23)}\times 1\,000\,000}+\frac{1}{0.23\times 1\,000\,000}+\frac{1}{19}+\frac{1}{621-19}})=\mathrm{1.30.}$$
(47)

The estimation for men can be done in the same way. Thus, the estimated relative risk is 0.82 (95% CI: 0.52–1.30) for women and 0.78 (95% CI: 0.60–1.00) for men. These estimates give no indication that unemployment increased the risk of suicide.

Incidentally, the proportions of victims classified under “No occupation” are comparatively large for both women and men; thus, having no occupation might increase the risk. Let us tentatively assume that a person who is neither employed nor attending school is the same as an individual with no occupation. The number of women with no occupation is then 1.04 million (6.21 − 4.40 − 0.77 = 1.04); the estimates of the relative risk and confidence limits for women with no occupation can be computed as follows:

$$PRR=\frac{\mathrm{(6.21}-\mathrm{1.04)}\times 1\,000\,000}{1.04\times 1\,000\,000}\times \frac{290}{621-290}=\mathrm{4.36,}$$
(48)

$$LCL=\exp (\mathrm{ln}\,4.36-1.96\cdot \sqrt{\frac{1}{\mathrm{(6.21}-\mathrm{1.04)}\times 1\,000\,000}+\frac{1}{1.04\times 1\,000\,000}+\frac{1}{290}+\frac{1}{621-290}})=3.72$$
(49)

and

$$UCL=\exp (\mathrm{ln}\,4.36+1.96\cdot \sqrt{\frac{1}{\mathrm{(6.21}-\mathrm{1.04)}\times 1\,000\,000}+\frac{1}{1.04\times 1\,000\,000}+\frac{1}{290}+\frac{1}{621-290}})=\mathrm{5.10.}$$
(50)

For men, the number is 0.53 million (6.56 − 5.02 − 1.01 = 0.53); the estimation can be done in the same way. Thus, the relative risk is estimated to be 4.36 (95% CI: 3.72–5.10) for women and 4.20 (95% CI: 3.78–4.67) for men.

Although the calculations were not adjusted and the definition of no occupation is tentative, these results suggest that being neither employed nor in education may substantially increase the risk of suicide among the young Japanese population. They may also suggest that the Japanese government should consider the indicator of unemployment.
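
The two calculations for women can be reproduced from the figures quoted in the text (6.21 million women in total, 0.23 million unemployed, 1.04 million with no occupation, and 621 suicide victims of whom 19 were unemployed and 290 had no occupation). The function name `prr_with_ci` is an assumption, and the sketch uses the unrounded PRR inside the logarithm, so a limit could differ from the published value in the last digit.

```python
import math


def prr_with_ci(pop_total, pop_exposed, cases_total, cases_exposed, z=1.96):
    """Return (PRR, LCL, UCL) from equations (34), (43), and (44)."""
    pop_unexposed = pop_total - pop_exposed
    cases_unexposed = cases_total - cases_exposed
    point = pop_unexposed / pop_exposed * cases_exposed / cases_unexposed
    se = math.sqrt(1 / pop_unexposed + 1 / pop_exposed
                   + 1 / cases_exposed + 1 / cases_unexposed)
    return point, point * math.exp(-z * se), point * math.exp(z * se)


# Women aged 20-29, figures as quoted in the text.
unemployed = prr_with_ci(6.21e6, 0.23e6, 621, 19)
no_occupation = prr_with_ci(6.21e6, 1.04e6, 621, 290)

for label, (point, lower, upper) in [("unemployed", unemployed),
                                     ("no occupation", no_occupation)]:
    print(f"{label}: PRR={point:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

Because the population counts are in the millions, their contribution to the standard error is negligible, and the width of the interval is governed almost entirely by the numbers of cases.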

Note that relative risks were estimated without a fresh cohort study, which is generally difficult to conduct.

Discussion

Evaluating the change in the risk of an event caused by exposure to (or the presence of/occupation as) a factor is attempted in many research fields, such as epidemiology, medicine, social science, politics, and product development. Relative risk, which is the ratio of the risks, is easily interpreted and widely used, but has been believed to require large-scale epidemiological research or a smaller cohort study designed for the estimation. A case control study, which compares a case group and a control group, is more convenient than a cohort study, but relative risk cannot be estimated using case control data. The estimator of the odds ratio, which can be calculated using case control data, is often used instead of relative risk because the former can sometimes approximate the latter. A method to calculate relative risk from the odds ratio was also proposed. Unfortunately, the odds ratio may be misleading when interpreting the change in risk, and calculating relative risk from the odds ratio still requires an estimator of either risk. Furthermore, control group data are still required, burdening researchers in terms of cost and effort.

In this study, by introducing the observed proportion, the observational inconsistency that prevents relative risk from being estimated in case control studies was clarified as a mathematical expression; by excluding this inconsistency, a new equation that estimates relative risk using case data was proposed. The proposed equation, which serves as an estimator of relative risk itself without approximation, requires only the exposure odds of a case group and those of all subjects to be studied; no control group is needed. The calculation is done without risk estimators, and thus cohorts are also not needed. Therefore, a change in risk can be evaluated without the additional cost, effort, and time generally needed for a new study. Moreover, the proposed equation was derived without using either Bayesian probabilities or Bayes’ theorem and is free from researchers’ resistance to Bayesian methods.

A method of estimating the confidence limits of the proposed estimator was also presented and was shown by simulation to perform well. Although there may be a more appropriate method of estimating the confidence interval, pursuing the best method is beyond the scope of this paper.

Once the exposed proportions by various characteristics have been investigated, the change in any risk associated with the exposure can be estimated by applying the proposed equation to appropriate case group data. Even changes in risk whose estimation has been believed to be impossible can be estimated, such as the adverse effect of a social situation on the suicide rate, the effect of a policy on the birthrate, or the impact of a new drug for a pandemic on the survival rate. There are two caveats: the case group must comprise subjects from the population for which the exposed proportion was computed, and the exposure to the factor must precede the event. Existing statistical methods, such as adjustment for confounding factors, should also be applicable to the proposed estimator.

Although the proposed equation is quite simple, its advantages will not only reduce the costs of epidemiological studies but may also make it a powerful tool in almost all research fields that deal with risks.