Bayesian deconvolution and quantification of metabolites in complex 1D NMR spectra using BATMAN

Journal name: Nature Protocols
Volume: 9
Pages: 1416–1427
DOI: 10.1038/nprot.2014.090

Abstract

Data processing for 1D NMR spectra is a key bottleneck for metabolomic and other complex-mixture studies, particularly where quantitative data on individual metabolites are required. We present a protocol for automated metabolite deconvolution and quantification from complex NMR spectra by using the Bayesian automated metabolite analyzer for NMR (BATMAN) R package. BATMAN models resonances on the basis of a user-controllable set of templates, each of which specifies the chemical shifts, J-couplings and relative peak intensities for a single metabolite. Peaks are allowed to shift position slightly between spectra, and peak widths are allowed to vary by user-specified amounts. NMR signals not captured by the templates are modeled non-parametrically by using wavelets. The protocol covers setting up user template libraries, optimizing algorithmic input parameters, improving prior information on peak positions, quality control and evaluation of outputs. The outputs include relative concentration estimates for named metabolites together with associated Bayesian uncertainty estimates, as well as the fit of the remainder of the spectrum using wavelets. Graphical diagnostics allow the user to examine the quality of the fit for multiple spectra simultaneously. This approach offers a workflow to analyze large numbers of spectra and is expected to be useful in a wide range of metabolomics studies.
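The following Python sketch makes the template idea concrete by rendering one template multiplet on a p.p.m. axis. It assumes first-order coupling to equivalent spins (binomial intensities), a Lorentzian line shape and a spectrometer frequency supplied by the caller. The function names are hypothetical and are not part of the BATMAN R package, whose template files and peak model may differ in detail.

```python
import math


def lorentzian(x, center, half_width, height=1.0):
    """Lorentzian line of the given height at `center`; `half_width` is the half-width at half-maximum (p.p.m.)."""
    return height * half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


def first_order_multiplet(center_ppm, j_hz, n_neighbours, spectrometer_mhz):
    """Line positions and relative intensities for a first-order multiplet.

    Assumes coupling to `n_neighbours` equivalent spin-1/2 nuclei (n + 1 rule),
    so intensities follow binomial coefficients and are normalised to sum to 1.
    """
    j_ppm = j_hz / spectrometer_mhz  # a coupling in Hz divided by the frequency in MHz gives p.p.m.
    weights = [math.comb(n_neighbours, k) for k in range(n_neighbours + 1)]
    total = sum(weights)
    return [
        (center_ppm + (k - n_neighbours / 2) * j_ppm, w / total)
        for k, w in enumerate(weights)
    ]


def render_template(lines, ppm_axis, half_width, amplitude=1.0):
    """Evaluate amplitude * (sum of Lorentzians) for a template on a p.p.m. axis."""
    return [
        amplitude * sum(rel * lorentzian(x, pos, half_width) for pos, rel in lines)
        for x in ppm_axis
    ]


if __name__ == "__main__":
    # Illustrative values only: a doublet at 1.32 p.p.m., J = 6.0 Hz, 600 MHz instrument.
    doublet = first_order_multiplet(1.32, 6.0, 1, 600.0)
    axis = [1.30 + 0.0005 * i for i in range(81)]
    spectrum = render_template(doublet, axis, half_width=0.001, amplitude=2.0)
    print(doublet, max(spectrum))
```

In this sketch the amplitude is the only quantity tied to concentration; letting the centre move slightly and the half-width vary corresponds to the shift and width flexibility described above.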


Figures

    Figure 1: Deconvolution and peak quantification in metabolomic NMR spectra is complicated by peak overlap and shift.

    Three typical spectra of bacterial supernatants are shown. All spectra show clear overlap (e.g., lysine/leucine at 3.7 p.p.m.), and peaks shift positions between spectra (e.g., histidine at ∼7–7.1 p.p.m.), sometimes swapping relative frequency ordering with other peaks (e.g., threonine/glycine at ∼3.55 p.p.m.). His, histidine; Tyr, tyrosine; Leu, leucine; Lys, lysine; Ile, isoleucine; Val, valine; Thr, threonine; Gly, glycine; Glc, glucose; Ala, alanine; Lac, lactate; BCAA, branched chain amino acids.
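Overlap of the kind shown in Figure 1 is what deconvolution has to undo. As a simplified illustration (not BATMAN's Bayesian procedure), if the positions and widths of two overlapping peaks are held fixed, their amplitudes enter the model linearly and can be recovered by ordinary least squares. The helper names below are hypothetical.

```python
def lorentzian(x, center, half_width):
    """Unit-height Lorentzian with half-width at half-maximum `half_width`."""
    return half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


def fit_two_amplitudes(ppm, y, center1, center2, half_width):
    """Least-squares amplitudes (a1, a2) for y ~ a1*L(center1) + a2*L(center2).

    Centres and width are fixed, so the model is linear in the amplitudes and
    the 2x2 normal equations can be solved directly (Cramer's rule).
    """
    b1 = [lorentzian(x, center1, half_width) for x in ppm]
    b2 = [lorentzian(x, center2, half_width) for x in ppm]
    s11 = sum(u * u for u in b1)
    s22 = sum(v * v for v in b2)
    s12 = sum(u * v for u, v in zip(b1, b2))
    t1 = sum(u * w for u, w in zip(b1, y))
    t2 = sum(v * w for v, w in zip(b2, y))
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:
        raise ValueError("peaks overlap too strongly to separate with fixed positions")
    a1 = (t1 * s22 - s12 * t2) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


if __name__ == "__main__":
    # Synthetic, noise-free example: two peaks 0.005 p.p.m. apart.
    axis = [3.68 + 0.0005 * i for i in range(81)]
    data = [3.0 * lorentzian(x, 3.700, 0.002) + 1.5 * lorentzian(x, 3.705, 0.002) for x in axis]
    print(fit_two_amplitudes(axis, data, 3.700, 3.705, 0.002))
```

The singular case shows why peak shift matters: if the assumed positions are wrong or coincide, the amplitudes cannot be separated, which is why BATMAN also estimates positions rather than fixing them.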

    Figure 2: Workflow of a single BATMAN run.

    BATMAN takes as inputs the NMR spectroscopy data, a library of metabolite templates, a list of target metabolites and the parameters controlling the Bayesian model. Its main outputs are the estimated relative concentrations of the target metabolites, the wavelet fit of the NMR signal not covered by the targets, and diagnostic plots for examining the quality of the fit.
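The wavelet component absorbs signal that no target template explains. BATMAN places a Bayesian prior on wavelet coefficients; the sketch below swaps in a much simpler stand-in, a Haar transform with hard thresholding of detail coefficients applied to the residual (observed spectrum minus template fit), only to show how a non-parametric wavelet fit can follow unassigned signal while suppressing small fluctuations. Function names and the threshold are illustrative assumptions, and the input length must be a power of two.

```python
import math

SQRT2 = math.sqrt(2.0)


def residual(observed, template_fit):
    """Signal left after subtracting the summed metabolite-template fit."""
    return [o - f for o, f in zip(observed, template_fit)]


def haar_forward(signal):
    """Full Haar decomposition: (coarsest approximation, details from finest to coarsest)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level upwards."""
    current = list(approx)
    for d in reversed(details):
        nxt = []
        for a, dd in zip(current, d):
            nxt.extend([(a + dd) / SQRT2, (a - dd) / SQRT2])
        current = nxt
    return current


def wavelet_smooth(signal, threshold):
    """Zero detail coefficients smaller than `threshold` (hard thresholding), then reconstruct."""
    approx, details = haar_forward(signal)
    kept = [[c if abs(c) >= threshold else 0.0 for c in d] for d in details]
    return haar_inverse(approx, kept)
```

With a threshold of zero the residual is reproduced exactly; with a very large threshold only its mean survives. A real analysis would use smoother wavelets and a data-driven threshold.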

    Figure 3: Typical fit results for two spectra of bacterial supernatants.

    Insets show magnified regions of the fit for examples of smaller peaks. Note the different chemical shift positions of some resonances in the two spectra, e.g., threonine at 3.56 p.p.m. and 1.32 p.p.m.

    Figure 4: Chemical-shift sorting and spline fitting.

    Graphical user interface for the spline-fitting tool (top). Illustration of how the positions of shifting peaks can be estimated by fitting splines to user-selected resonances, where the position within a given range is defined either by the maximum intensity (bottom left) or by the intersection points between the spline and the spectral baseline (bottom right).
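The maximum-intensity rule (bottom left) can be sketched as follows. This is a hypothetical Python helper, not the interface of the spline-fitting tool: it locates the highest point within a user-chosen p.p.m. range, refines it with a three-point parabolic interpolation (our addition, assuming a uniformly spaced axis), and then orders spectra by the estimated position, which is the idea behind chemical-shift sorting. Spline fitting and the baseline-intersection rule are not implemented here.

```python
def peak_position(ppm, intensity, lo, hi):
    """Estimate a resonance position as the maximum intensity inside [lo, hi] (p.p.m.).

    The maximum is refined with a parabola through the highest point and its two
    neighbours; this assumes a uniformly spaced axis (increasing or decreasing).
    """
    inside = [i for i, x in enumerate(ppm) if lo <= x <= hi]
    if not inside:
        raise ValueError("no data points inside the chosen range")
    i = max(inside, key=lambda k: intensity[k])
    if 0 < i < len(ppm) - 1:
        a, b, c = intensity[i - 1], intensity[i], intensity[i + 1]
        denom = a - 2 * b + c
        if denom != 0:
            offset = 0.5 * (a - c) / denom  # vertex offset in units of grid steps
            return ppm[i] + offset * (ppm[i + 1] - ppm[i])
    return ppm[i]


def sort_by_shift(spectra, lo, hi):
    """Order spectra (id -> (ppm, intensity)) by the estimated position of one resonance."""
    positions = {sid: peak_position(p, y, lo, hi) for sid, (p, y) in spectra.items()}
    return sorted(positions, key=positions.get), positions
```

Sorting spectra this way puts neighbouring positions next to each other in a stack plot, which makes it easier to pick the resonance consistently across spectra.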

    Figure 5: Fitting complex multiplets with empirical and raster templates.

    The multiplicity of the 2.35 p.p.m. resonance of glutamic acid is not well defined (it is listed as type 'm' in the HMDB). In the bottom graph, we attempt a fit using a pattern of peaks whose p.p.m. offsets and relative intensities are specified by the user. The top graph shows the fit using a piece of the spectrum of the pure standard. Although neither fit is perfect, both approaches allow the user to capture metabolites that could not otherwise be fitted.
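A template built from a piece of a pure-standard spectrum, as in the top graph, can be illustrated by shifting that piece and scaling it to the data. The sketch below assumes an increasing p.p.m. axis, linear interpolation, and a grid search over candidate shifts with a non-negative least-squares amplitude; BATMAN instead samples positions and intensities within its Bayesian model, and all names here are hypothetical.

```python
import bisect


def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; xs must be increasing. Returns 0 outside the segment."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    j = bisect.bisect_right(xs, x)
    if j == len(xs):
        return ys[-1]
    i = j - 1
    t = (x - xs[i]) / (xs[j] - xs[i])
    return ys[i] + t * (ys[j] - ys[i])


def fit_raster(ppm, y, std_ppm, std_y, shifts):
    """Grid search over candidate shifts of a pure-standard segment.

    For each shift s the standard is moved by s p.p.m. (evaluated at x - s), the
    non-negative amplitude minimising squared error is computed in closed form,
    and the (shift, amplitude, sse) with the smallest error is returned.
    """
    best = None
    for s in shifts:
        template = [interp(x - s, std_ppm, std_y) for x in ppm]
        tt = sum(v * v for v in template)
        if tt == 0:
            continue  # segment shifted completely out of the data window
        amp = max(0.0, sum(v * w for v, w in zip(template, y)) / tt)
        sse = sum((w - amp * v) ** 2 for v, w in zip(template, y))
        if best is None or sse < best[2]:
            best = (s, amp, sse)
    return best
```

The advantage of this kind of template is that it reproduces the exact multiplet shape recorded for the standard, at the cost of inheriting that spectrum's line widths and any noise in it.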

    Figure 6: Diagnostic scatter plots for threonine in the bacterial supernatant data.

    Plots indicate a poor fit (top) and a better fit (bottom) for the metabolite component (left) and the wavelet component (right). Numbers correspond to sample identifiers, and colors indicate different multiplets. Bin intensities are scaled to a maximum of unity, and the maximum integrated values are given in the plot legend. These plots correspond to the poor- and better-fit scenarios of Figure 7.
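One way to compute points of this kind is sketched below, under the assumption that each point pairs the integral of the observed spectrum with the integral of a fitted component over the same multiplet bin for one sample. Both sets of integrals are divided by their maximum, as the caption describes, and the returned maxima correspond to the values reported in the legend. The Pearson correlation is an extra single-number summary added for illustration; it is not part of the published diagnostics, and the function names are hypothetical.

```python
import math


def bin_integral(ppm, y, lo, hi):
    """Trapezoidal integral of y over the axis segments lying inside [lo, hi] (p.p.m.)."""
    total = 0.0
    for i in range(len(ppm) - 1):
        x0, x1 = ppm[i], ppm[i + 1]
        if lo <= min(x0, x1) and max(x0, x1) <= hi:
            total += 0.5 * (y[i] + y[i + 1]) * abs(x1 - x0)
    return total


def scale_to_unity(values):
    """Divide by the maximum so the largest value is 1; also return that maximum (for the legend)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("maximum integral must be positive")
    return [v / peak for v in values], peak


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def scatter_points(samples, lo, hi):
    """samples maps id -> (ppm, observed, fitted); returns ids, scaled points, both maxima and r."""
    ids = sorted(samples)
    observed = [bin_integral(samples[k][0], samples[k][1], lo, hi) for k in ids]
    fitted = [bin_integral(samples[k][0], samples[k][2], lo, hi) for k in ids]
    obs_scaled, obs_max = scale_to_unity(observed)
    fit_scaled, fit_max = scale_to_unity(fitted)
    points = list(zip(obs_scaled, fit_scaled))
    return ids, points, obs_max, fit_max, pearson(obs_scaled, fit_scaled)
```

Points far from the diagonal single out samples in which the fitted component does not track the observed signal in that bin.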

    Figure 7: Diagnostic mirrored stack plots.

    Plots show a poorly fitted threonine doublet around 3.56 p.p.m. (left) and an improved fit of the same doublet (right). In each plot, the wavelet fit is inverted to accentuate areas of poor fit. Such plots can be used to diagnose poorly fitting multiplets across up to several tens of spectra.
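A minimal sketch of the mirrored layout, plus a crude numerical flag, is given below. The flag (the share of fitted signal inside a multiplet region that is carried by the wavelet component) is a hypothetical heuristic rather than a BATMAN output, and the 0.3 threshold is arbitrary. The reasoning is that when a template misses part of a multiplet, the wavelet fit tends to pick up the leftover signal.

```python
def mirrored_traces(metabolite_fit, wavelet_fit):
    """Traces for a mirrored stack plot: metabolite fit upwards, wavelet fit inverted."""
    return list(metabolite_fit), [-w for w in wavelet_fit]


def wavelet_fraction(ppm, metabolite_fit, wavelet_fit, lo, hi):
    """Share of absolute fitted signal inside [lo, hi] that is carried by the wavelet component."""
    met = sum(abs(m) for x, m in zip(ppm, metabolite_fit) if lo <= x <= hi)
    wav = sum(abs(w) for x, w in zip(ppm, wavelet_fit) if lo <= x <= hi)
    total = met + wav
    return wav / total if total > 0 else 0.0


def flag_poor_fits(spectra, lo, hi, threshold=0.3):
    """Return ids of spectra (id -> (ppm, metabolite_fit, wavelet_fit)) whose wavelet share exceeds `threshold`."""
    return sorted(
        sid
        for sid, (ppm, met, wav) in spectra.items()
        if wavelet_fraction(ppm, met, wav, lo, hi) > threshold
    )
```

A numerical flag like this only shortlists spectra; the mirrored plots themselves remain the way to see which part of a multiplet is missed.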

    Figure 8: Chemical shift distribution plots.

    (a) Estimated chemical shift histogram for the acetate singlet at 1.905 p.p.m., showing a bimodal distribution. (b) Mirrored stack plot showing that the estimated positions near −0.008 p.p.m. (marked with *) correspond to spectra in which the acetate concentration is below the effective limit of detection, so the peak is not present. (c,d) Results after restricting the shift range (rdelta) to 0.004 p.p.m.: the unimodal chemical shift histogram (c) corresponds to the better fits seen in d. Note that these histograms show the frequency of estimated chemical shifts across a series of spectra. It is also possible to plot the MCMC samples for the estimated position of a multiplet in a single spectrum (not shown).
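The checks in panels a and c can be approximated in a few lines. The sketch bins estimated shift offsets into a histogram, counts separated runs of non-empty bins as a rough sign of multimodality, and lists estimates lying beyond a limit. Treating rdelta as a symmetric limit of ±rdelta around the template position is an assumption made for this illustration, and the function names are hypothetical.

```python
def shift_histogram(values, lo, hi, n_bins):
    """Counts of values in n_bins equal-width bins over [lo, hi]; values outside are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def count_clusters(counts):
    """Number of runs of consecutive non-empty bins; more than one hints at a multimodal distribution."""
    clusters, inside = 0, False
    for c in counts:
        if c > 0 and not inside:
            clusters += 1
        inside = c > 0
    return clusters


def outside_rdelta(offsets, rdelta):
    """Estimated offsets (p.p.m., relative to the template position) lying beyond +/- rdelta."""
    return [d for d in offsets if abs(d) > rdelta]


if __name__ == "__main__":
    # Synthetic offsets, not data from the study: a main mode near 0 and a stray group near -0.008.
    offsets = [0.0003, 0.0007, -0.0005, -0.0081, -0.0077]
    counts = shift_histogram(offsets, -0.01, 0.01, 10)
    print(counts, count_clusters(counts), outside_rdelta(offsets, 0.004))
```

Estimates flagged this way are candidates for spectra in which the peak is absent, as in panel b, rather than genuine shifts.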


Author information

Affiliations

  1. Computational and Systems Medicine, Department of Surgery and Cancer, Imperial College London, London, UK.

    • Jie Hao,
    • Manuel Liebeke,
    • Jacob G Bundy &
    • Timothy M D Ebbels
  2. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada.

    • William Astle
  3. Department of Statistical Science, University College London, London, UK.

    • Maria De Iorio
  4. Present address: Max Planck Institute for Marine Microbiology, Bremen, Germany.

    • Manuel Liebeke

Contributions

T.M.D.E. and M.D.I. conceived the project and supervised the development of the Bayesian model. W.A. developed the Bayesian model and J.H. implemented the corresponding R package. J.G.B. and M.L. provided theoretical and practical input on NMR spectroscopy, including software testing and ideas for new features. T.M.D.E. and J.H. wrote the manuscript. All authors read and approved the final manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

  1. Supplementary Data

    Comparison of BATMAN and Chenomx NMR Suite results.