
Imputation of missing values in a large job exposure matrix using hierarchical information


Job exposure matrices (JEMs) represent a useful and efficient approach for estimating occupational exposures. This study uses a large dataset of full-shift measurements and employs imputation strategies to develop noise exposure estimates for almost all broad level standard occupational classification (SOC) groups in the US. The JEM was constructed using 753,702 measurements from the government, private industry, and the published literature. Parametric Bayes imputation was used to take advantage of the hierarchical structure of the SOCs and the mean occupational noise exposures were estimated for all broad level SOCs, except those in major group 23-0000, for which no data were available. The estimated posterior mean for all broad SOCs was found to be 82.1 dBA with within- and between-major SOC variabilities of 22.1 and 13.8, respectively. Of the 443 broad SOCs, 85 were found to have an estimated mean exposure >85 dBA while 10 were >90 dBA. By taking advantage of the size and structure of the dataset, we were able to employ imputation techniques to estimate mean levels of noise exposure for nearly all SOCs in the US. Possible sources of errors in the estimates include misclassification of job titles due to limited data, temporal variations that were not accounted for, and variation in exposures within the same SOC. Our efforts have resulted in an almost completely populated noise JEM that provides a valuable tool for the assessment of occupational exposures to noise. Imputation techniques can lead to maximal use of available information that may be incomplete.





This work was supported by the National Institute for Occupational Safety and Health, grant R21OH010482: Development of a US/Canadian Job Exposure Matrix (JEM) for Noise.

Author information


Corresponding author

Correspondence to Richard L Neitzel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendix 1

The imputation procedure

The unknown quantities in our system include the broad SOC means \(\theta _{ij}^{{\rm obs}},\,i = 1, \ldots ,I;\,j = 1, \ldots ,n_i^{{\rm obs}}\) (observed) and \(\theta _{ik}^{{\rm mis}},\,i = 1, \ldots ,I;\,k = 1, \ldots ,n_i^{{\rm mis}}\) (missing), the major SOC means \(\theta _i,\,i = 1, \ldots ,I\), the population mean \(\mu \), the within-major SOC sampling variance \(\sigma ^2\), and the between-major SOC sampling variance \(\tau ^2\). Joint posterior inference for these parameters can be made by constructing a Gibbs sampler, which approximates the joint posterior distribution of all of these quantities given the observed data:

$$\begin{array}{l}p\left( \theta _{11}^{{\rm obs}}, \ldots ,\theta _{In_I}^{{\rm obs}},\theta _{11}^{{\rm mis}}, \ldots ,\theta _{In_I}^{{\rm mis}},\theta _1, \ldots ,\theta _I,\mu ,\tau ^2,\sigma ^2\,|\,{\rm observed}\,{\rm data} \right) \\ \propto p\left( {\rm observed}\,{\rm data}\,|\,\theta _{11}^{{\rm obs}}, \ldots ,\theta _{In_I}^{{\rm obs}},\theta _{11}^{{\rm mis}}, \ldots ,\theta _{In_I}^{{\rm mis}},\theta _1, \ldots ,\theta _I,\mu ,\tau ^2,\sigma ^2 \right) \\ \cdot \left\{ \prod\limits_{i = 1}^I \prod\limits_{j = 1}^{n_i^{{\rm obs}}} p(\theta _{ij}^{{\rm obs}}|\theta _i,\sigma ^2) \right\} \cdot \left\{ \prod\limits_{i = 1}^I \prod\limits_{k = 1}^{n_i^{{\rm mis}}} p(\theta _{ik}^{{\rm mis}}|\theta _i,\sigma ^2) \right\} \\ \cdot \left\{ \prod\limits_{i = 1}^I p(\theta _i|\mu ,\tau ^2) \right\} \cdot \pi (\mu ) \cdot \pi (\tau ^2) \cdot \pi (\sigma ^2).\end{array}$$

Collecting the terms that depend on \(\theta _{ij}^{{\rm obs}}\) shows that the full conditional distribution of \(\theta _{ij}^{{\rm obs}}\) must be proportional to

$$p\left( \theta _{ij}^{{\rm obs}}\,|\,{\rm observed}\,{\rm data},{\rm all}\,{\rm other}\,{\rm parameters} \right) \propto \exp \left( - \frac{\left( Y_{ij}^{{\rm obs}} - \theta _{ij}^{{\rm obs}} \right)^2}{2\frac{\left( s_{ij}^{{\rm obs}} \right)^2}{n_{ij}^{{\rm obs}}}} \right) \cdot \exp \left( - \frac{\left( \theta _{ij}^{{\rm obs}} - \theta _i \right)^2}{2\sigma ^2} \right).$$

After some calculation, we find that, conditional on \(\sigma ^2\) and \(\theta _i\), each \(\theta _{ij}^{{\rm obs}}\) is independent of the other broad SOC means and of the data from broad SOCs other than ij, with full conditional distribution

$$\theta _{ij}^{{\rm obs}}\sim N\left( {\mu _{ij}^{{\rm obs}},(\sigma _{ij}^{{\rm obs}})^2} \right),$$

where \(\mu _{ij}^{{\rm obs}} = \frac{{Y_{ij}^{{\rm obs}}\sigma ^2 + \theta _i\frac{{\left( {s_{ij}^{{\rm obs}}} \right)^2}}{{n_{ij}^{{\rm obs}}}}}}{{\sigma ^2 + \frac{{\left( {s_{ij}^{{\rm obs}}} \right)^2}}{{n_{ij}^{{\rm obs}}}}}}\) and \(\left( {\sigma _{ij}^{{\rm obs}}} \right)^2 = \frac{{\frac{{\left( {s_{ij}^{{\rm obs}}} \right)^2}}{{n_{ij}^{{\rm obs}}}}\sigma ^2}}{{\left( {\sigma ^2 + \frac{{\left( {s_{ij}^{{\rm obs}}} \right)^2}}{{n_{ij}^{{\rm obs}}}}} \right)}}.\)
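The calculation behind this result is a completion of the square in \(\theta _{ij}^{{\rm obs}}\). A brief sketch follows, using the shorthand \(v_{ij} = (s_{ij}^{{\rm obs}})^2/n_{ij}^{{\rm obs}}\), which is introduced here for readability and is not part of the original notation; \(C\) and \(C'\) collect terms that do not involve \(\theta _{ij}^{{\rm obs}}\).

```latex
% Shorthand (assumption, not in the original): v_{ij} = (s_{ij}^{obs})^2 / n_{ij}^{obs}
\begin{align*}
-\frac{\left(Y_{ij}^{\rm obs}-\theta_{ij}^{\rm obs}\right)^2}{2v_{ij}}
-\frac{\left(\theta_{ij}^{\rm obs}-\theta_i\right)^2}{2\sigma^2}
&= -\frac{1}{2}\left(\frac{1}{v_{ij}}+\frac{1}{\sigma^2}\right)\left(\theta_{ij}^{\rm obs}\right)^2
   +\left(\frac{Y_{ij}^{\rm obs}}{v_{ij}}+\frac{\theta_i}{\sigma^2}\right)\theta_{ij}^{\rm obs} + C \\
&= -\frac{\sigma^2+v_{ij}}{2\,v_{ij}\,\sigma^2}
   \left(\theta_{ij}^{\rm obs}-\frac{Y_{ij}^{\rm obs}\sigma^2+\theta_i v_{ij}}{\sigma^2+v_{ij}}\right)^2 + C'
\end{align*}
```

Reading off the centre and the inverse of the leading coefficient gives the mean \(\mu _{ij}^{{\rm obs}}\) and variance \((\sigma _{ij}^{{\rm obs}})^2\) stated above: a precision-weighted average of the observed mean \(Y_{ij}^{{\rm obs}}\) and the major SOC mean \(\theta _i\).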

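Because no measurements are available for the missing broad SOCs, the full conditional distribution of \(\theta _{ik}^{{\rm mis}}\) depends only on \(\theta _i\) and \(\sigma ^2\) and is normal: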

$$\theta _{ik}^{{\rm mis}}\sim N(\theta _i,\sigma ^2).$$

The conditional distribution of \(\theta _i\) is also normal:

$$\theta _i\sim N(\mu _i,\tau _i^2),$$

where \(\mu _i = \frac{{\mu \sigma ^2 + \mathop {\sum}\nolimits_{j = 1}^{n_i^{{\rm obs}}} \theta _{ij}^{{\rm obs}}\tau ^2 + \mathop {\sum}\nolimits_{k = 1}^{n_i^{{\rm mis}}} \theta _{ik}^{{\rm mis}}\tau ^2}}{{n_i^{{\rm obs}}\tau ^2 + n_i^{{\rm mis}}\tau ^2 + \sigma ^2}}\) and \(\tau _i^2 = \frac{{\sigma ^2\tau ^2}}{{n_i^{{\rm obs}}\tau ^2 + n_i^{{\rm mis}}\tau ^2 + \sigma ^2}}.\)

The conditional distribution of \(\mu \) is normal. Writing \(\mu _0\) and \(\gamma _0^2\) for the mean and variance of the normal prior \(\pi (\mu )\),

$$\mu \sim N\left( {\frac{{\mathop {\sum}\nolimits_{i = 1}^I \theta _i\gamma _0^2 + \mu _0\tau ^2}}{{I\gamma _0^2 + \tau ^2}},\frac{{\tau ^2\gamma _0^2}}{{I\gamma _0^2 + \tau ^2}}} \right).$$

The conditional distribution of \(\tau ^2\) is inverse gamma, where \(\eta _0\) and \(\tau _0^2\) are the hyperparameters of the prior \(\pi (\tau ^2)\):

$$\tau ^2\sim {\rm Inv} - {\rm Gamma}\left( {\frac{{I + \eta _0}}{2},\frac{{\mathop {\sum}\nolimits_{i = 1}^I (\theta _i - \mu )^2 + \eta _0\tau _0^2}}{2}} \right).$$

The conditional distribution of \(\sigma ^2\) is also inverse gamma, where \(\upsilon _0\) and \(\sigma _0^2\) are the hyperparameters of the prior \(\pi (\sigma ^2)\):

$$\sigma ^2\sim {\rm Inv} - {\rm Gamma}\left( \frac{\sum\nolimits_{i = 1}^I n_i^{{\rm obs}} + \sum\nolimits_{i = 1}^I n_i^{{\rm mis}} + \upsilon _0}{2},\frac{\sum\nolimits_{i = 1}^I \sum\nolimits_{j = 1}^{n_i^{{\rm obs}}} \left( \theta _{ij}^{{\rm obs}} - \theta _i \right)^2 + \sum\nolimits_{i = 1}^I \sum\nolimits_{k = 1}^{n_i^{{\rm mis}}} \left( \theta _{ik}^{{\rm mis}} - \theta _i \right)^2 + \upsilon _0\sigma _0^2}{2} \right).$$
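The two variance updates can be drawn with a standard gamma generator, since the reciprocal of a Gamma(shape, rate) variate is Inv-Gamma(shape, scale = rate). The sketch below is illustrative only and uses Python's standard library; the function names and argument layout are hypothetical and are not taken from the authors' implementation.

```python
import random


def draw_inv_gamma(rng, shape, scale):
    """Draw from Inv-Gamma(shape, scale).

    If G ~ Gamma(shape, rate=scale) then 1/G ~ Inv-Gamma(shape, scale).
    random.gammavariate(alpha, beta) treats beta as a *scale*, so we pass 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def draw_tau2(rng, theta_major, mu, eta0, tau0_sq):
    """Between-major-SOC variance update: theta_major holds theta_1..theta_I."""
    shape = (len(theta_major) + eta0) / 2.0
    scale = (sum((t - mu) ** 2 for t in theta_major) + eta0 * tau0_sq) / 2.0
    return draw_inv_gamma(rng, shape, scale)


def draw_sigma2(rng, theta_broad, theta_major, nu0, sigma0_sq):
    """Within-major-SOC variance update.

    theta_broad[i] lists every broad SOC mean (observed and imputed alike)
    under major SOC i; theta_major[i] is the matching major SOC mean.
    """
    n_total = sum(len(group) for group in theta_broad)
    sum_sq = sum((t - theta_major[i]) ** 2
                 for i, group in enumerate(theta_broad) for t in group)
    return draw_inv_gamma(rng, (n_total + nu0) / 2.0,
                          (sum_sq + nu0 * sigma0_sq) / 2.0)
```

Observed and imputed broad SOC means enter the \(\sigma ^2\) update in the same way, which is why the sketch takes a single list per major SOC rather than separate observed and missing lists.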

Posterior approximation proceeds by iteratively sampling each unknown quantity from its full conditional distribution. We set the number of iterations S to 10,000 and choose starting values for each parameter. Given a current state of the unknowns \(\left\{{\theta _{11}^{{\rm obs}(s)}, \ldots ,\theta _{In_I}^{{\rm obs}\left( s \right)},\theta _{11}^{{\rm mis}\left( s \right)}, \ldots ,\theta _{In_I}^{{\rm mis}(s)},\theta _1^{(s)}, \ldots ,\theta _I^{(s)},\mu ^{(s)},\tau ^{2(s)},\sigma ^{2(s)}} \right\}\), a new state is generated as follows:

  1. Posterior step: sample \(\theta _i^{(s + 1)},i = 1, \ldots ,I\) from \(\theta _i|\mu ^{\left( s \right)},\theta _{i1}^{{\rm obs}\left( s \right)}, \ldots ,\theta _{in_i}^{{\rm obs}\left( s \right)},\theta _{i1}^{{\rm mis}\left( s \right)}, \ldots ,\theta _{in_i}^{{\rm mis}(s)},\tau ^{2(s)},\sigma ^{2(s)}\) based on its full conditional distribution

  2. Posterior step: sample \(\mu ^{(s + 1)}\) from \(\mu |\theta _1^{\left( {s + 1} \right)}, \ldots ,\theta _I^{\left( {s + 1} \right)},\tau ^{2(s)}\)

  3. Posterior step: sample \(\tau ^{2(s + 1)}\) from \(\tau ^2|\theta _1^{\left( {s + 1} \right)}, \ldots ,\theta _I^{\left( {s + 1} \right)},\,\mu ^{(s + 1)}\)

  4. Posterior step: sample \(\sigma ^{2(s + 1)}\) from \(\sigma ^2|\theta _{11}^{{\rm obs}\left( s \right)}, \ldots ,\theta _{In_I}^{{\rm obs}\left( s \right)},\theta _{11}^{{\rm mis}\left( s \right)}, \ldots ,\theta _{In_I}^{{\rm mis}\left( s \right)},\theta _1^{\left( {s + 1} \right)}, \ldots ,\theta _I^{(s + 1)}\)

  5. Posterior step: sample \(\theta _{ij}^{{\rm obs}(s + 1)},i = 1, \ldots ,I,j = 1, \ldots ,n_i^{{\rm obs}}\) from \(\theta _{ij}^{{\rm obs}}|\theta _i^{(s + 1)},\sigma ^{2\left( {s + 1} \right)}\)

  6. Imputation step: sample \(\theta _{ik}^{{\rm mis}(s + 1)},i = 1, \ldots ,I,k = 1, \ldots ,n_i^{{\rm mis}}\) from \(\theta _{ik}^{{\rm mis}}|\theta _i^{(s + 1)},\sigma ^{2\left( {s + 1} \right)}.\)
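The six steps can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the data layout (one `(Y, s2, n)` tuple per observed broad SOC, read here as sample mean, sample variance, and number of measurements), the starting values, the prior hyperparameters, and the toy inputs are all hypothetical.

```python
import math
import random


def gibbs_sampler(obs, n_mis, priors, n_iter=10000, seed=0):
    """obs[i]: list of (Y_ij, s2_ij, n_ij) for observed broad SOCs in major SOC i.
    n_mis[i]: number of broad SOCs without data in major SOC i.
    priors: (mu0, gamma0_sq, eta0, tau0_sq, nu0, sigma0_sq)."""
    rng = random.Random(seed)
    mu0, g0sq, eta0, tau0sq, nu0, sig0sq = priors
    I = len(obs)
    # Starting values (an arbitrary choice): sample means and prior scales.
    th_obs = [[y for (y, _, _) in grp] for grp in obs]
    all_y = [y for grp in th_obs for y in grp]
    mu = sum(all_y) / len(all_y)
    theta = [sum(g) / len(g) if g else mu for g in th_obs]
    th_mis = [[theta[i]] * n_mis[i] for i in range(I)]
    tau2, sigma2 = tau0sq, sig0sq
    draws = []
    for _ in range(n_iter):
        # Step 1: major SOC means theta_i.
        for i in range(I):
            n_i = len(th_obs[i]) + n_mis[i]
            denom = n_i * tau2 + sigma2
            mean = (mu * sigma2 + tau2 * (sum(th_obs[i]) + sum(th_mis[i]))) / denom
            theta[i] = rng.gauss(mean, math.sqrt(sigma2 * tau2 / denom))
        # Step 2: population mean mu.
        denom = I * g0sq + tau2
        mu = rng.gauss((sum(theta) * g0sq + mu0 * tau2) / denom,
                       math.sqrt(tau2 * g0sq / denom))
        # Step 3: between-major variance tau^2 (1/Gamma draw = inverse gamma).
        rate = (sum((t - mu) ** 2 for t in theta) + eta0 * tau0sq) / 2.0
        tau2 = 1.0 / rng.gammavariate((I + eta0) / 2.0, 1.0 / rate)
        # Step 4: within-major variance sigma^2, using the current broad means.
        n_tot = sum(len(th_obs[i]) + n_mis[i] for i in range(I))
        ss = sum((t - theta[i]) ** 2
                 for i in range(I) for t in th_obs[i] + th_mis[i])
        sigma2 = 1.0 / rng.gammavariate((n_tot + nu0) / 2.0,
                                        2.0 / (ss + nu0 * sig0sq))
        # Step 5: observed broad SOC means (precision-weighted shrinkage).
        for i in range(I):
            for j, (y, s2, n) in enumerate(obs[i]):
                v = s2 / n
                th_obs[i][j] = rng.gauss(
                    (y * sigma2 + theta[i] * v) / (sigma2 + v),
                    math.sqrt(v * sigma2 / (sigma2 + v)))
        # Step 6: imputation of broad SOC means with no data.
        for i in range(I):
            th_mis[i] = [rng.gauss(theta[i], math.sqrt(sigma2))
                         for _ in range(n_mis[i])]
        draws.append((mu, tau2, sigma2, list(theta), [list(g) for g in th_mis]))
    return draws


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration (dBA-like values).
    toy_obs = [[(85.0, 16.0, 40), (88.0, 25.0, 12)], [(72.0, 9.0, 30)]]
    toy_draws = gibbs_sampler(toy_obs, [1, 2],
                              (80.0, 100.0, 1.0, 25.0, 1.0, 25.0), n_iter=500)
    print(len(toy_draws), "draws; last imputed values:", toy_draws[-1][4])
```

Averaging the stored imputed values over iterations, typically after discarding an initial burn-in, would give posterior mean estimates for the broad SOCs that have no measurements. The sketch omits burn-in and convergence diagnostics.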

Appendix 2

Table 6 Predicted noise exposure based on model results

Figures 6 and 7

Fig. 6 Major SOCs that are decreasing over time

Fig. 7 Major SOCs that are not decreasing over time

Appendix 3 – Full Time Trend Results

Table 7 Full model results for the temporal analysis for major SOCs


Cite this article

Roberts, B., Cheng, W., Mukherjee, B. et al. Imputation of missing values in a large job exposure matrix using hierarchical information. J Expo Sci Environ Epidemiol 28, 615–648 (2018).
