Introduction

The importance of vitamin A compounds for homeostasis and normal physiology is well-established1, and their key biological functions include critical roles in embryonic development and growth, cell differentiation, tissue remodeling, reproduction, integrity of the immune system, vision, maintenance of skin and membranes, and hematopoiesis2,3,4,5,6. Vitamin A deficiency can lead to xerophthalmia, blindness, infections, and even death, especially in low-income settings7. Retinol and vitamin A compounds including retinoic acid are also integral to lipid metabolism, insulin signaling, and energy balance4,5,6, and they exert pleiotropic effects by regulating or co-regulating the expression of over 500 genetic response elements after binding to their nuclear receptors, retinoic acid receptor (RAR), retinoid X receptor (RXR), and peroxisome proliferator-activated receptor β/δ (PPAR β/δ), as well as heterodimerizing with the vitamin D and thyroid hormone receptors8. These biological actions are thought to be responsible for their experimental anti-carcinogenic effects9,10,11,12, although increased cell proliferation and decreased cell differentiation have also been observed13.

The role of vitamin A in the prevention of common chronic diseases is less clear, however14, 15. Early pre-clinical and population-based observational studies10 suggested protective effects of retinol on cancer11, 16 and cardiovascular disease17,18,19. By contrast, subsequent evidence from randomized trials and meta-analyses20 has not supported many of the observational findings21, instead showing increased risk for some outcomes such as cancer of the prostate22, 23 and lung24, cardiovascular disease25, and even overall mortality24 for individuals with high circulating retinol concentration or following supplementation with vitamin A or β-carotene14, 23, 24, 26. For example, a recent pooled analysis of 15 cohort studies that included more than 11,000 prostate cancer cases found 13% higher prostate cancer risk in the highest versus lowest category of serum retinol23. Such data have led the U.S. Preventive Services Task Force to question the public health benefits of supplementation with vitamin A in the absence of deficiency14, 27.

Despite evidence from molecular and laboratory studies that metabolic derivatives of retinol could promote carcinogenesis28, 29, the relevant biologic pathways are not understood. Elucidating the biological mechanisms underlying these vitamin A associations would have implications for any future prevention trials, selection of their target populations, and general population vitamin A supplementation30.

Here, we hypothesized that serum retinol in an un-supplemented state may be associated with other small, low-molecular-weight metabolites in circulation, and conducted an agnostic metabolomic analysis to identify biologically relevant metabolites related to vitamin A status.

Results

Characteristics at study entry for the 1,282 participants included in this analysis are shown in Table 1. The median serum concentration of retinol was 579 μg/L. The median number of cigarettes smoked per day was 20 (interquartile range, IQR, 14–25), and 21% of participants exercised 3 times per week or more. Serum retinol <0.7 µmol/L (or 200 µg/L) is an accepted definition of vitamin A deficiency in most age groups31, 32. In the present subset of 1,282 men, only one had a borderline-deficient serum retinol value (i.e., 192 µg/L).
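As a quick check of the unit equivalence quoted above, the conversion between µmol/L and µg/L can be sketched as follows. The molar mass of retinol (about 286.45 g/mol) is not stated in the text and is supplied here as an assumption; the function names are illustrative only.

```python
# Assumed molar mass of retinol (C20H30O), in g/mol; not given in the text.
RETINOL_MOLAR_MASS = 286.45


def umol_per_l_to_ug_per_l(concentration_umol_l):
    """Convert a retinol concentration from µmol/L to µg/L."""
    return concentration_umol_l * RETINOL_MOLAR_MASS


def ug_per_l_to_umol_per_l(concentration_ug_l):
    """Convert a retinol concentration from µg/L to µmol/L."""
    return concentration_ug_l / RETINOL_MOLAR_MASS


# Deficiency cut-off quoted in the text (0.7 µmol/L) and the reported median (579 µg/L).
cutoff_ug_l = umol_per_l_to_ug_per_l(0.7)
median_umol_l = ug_per_l_to_umol_per_l(579)
```

Under this assumption the 0.7 µmol/L cut-off corresponds to roughly 200 µg/L, in line with the value quoted in the text, and the reported median of 579 µg/L lies well above it.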

Table 1 Pre-randomization characteristics of 1,282 Finnish male smokers in the ATBC Study

In multivariable linear regression models, 263 metabolites were associated with retinol concentrations at p < 0.05 (Supplemental Table 1), of which 63 (Table 2) were significant after Bonferroni correction (p < 5.3 × 10⁻⁵). All of them were positively associated with a standard deviation (SD) increase in retinol, except for the peptide metabolite adsgegdfxaegggvr (fibrinopeptide A) that showed an inverse association (β = −0.16). The strongest association with serum retinol was seen for N-acetyltryptophan (β = 0.27 per SD increase in retinol concentration; p = 9.9 × 10⁻¹⁷), followed by the lipids myo-inositol and 1-palmitoylglycerophosphoethanolamine, the amino acid 4-acetamidobutanoate, and the purine N6-carbamoylthreonyladenosine (β = 0.23, p = 9.8 × 10⁻¹³; β = 0.22, p = 3.2 × 10⁻¹²; β = 0.22, p = 8.9 × 10⁻¹²; β = 0.21, p = 2.9 × 10⁻¹¹, respectively). The findings were not altered by adjustment for serum creatinine. Also, results were the same after model adjustment for storage time (from blood collection to metabolomics measurement) (data not shown).

Table 2 Metabolites associated with serum retinol concentration at 5.3 × 10⁻⁵ level of statistical significance.

Of the metabolites that remained statistically significant after Bonferroni correction, the majority were lipids (n = 30 of 264 lipids) and amino acids (n = 20 of 157 amino acids). The remaining were nucleotides (n = 4 of 31), xenobiotics (n = 4 of 124), carbohydrates (n = 2 of 26), energy metabolites (n = 1 of 9), and peptides (n = 1 of 91). The metabolite chemical classes of amino acids, lipids, and cofactors/vitamins had strong positive associations with serum retinol (p = 1.6 × 10⁻¹⁰, 3.3 × 10⁻⁷ and 3.3 × 10⁻⁷, respectively; Table 3, Supplemental Table 2). In addition, 23 out of 59 metabolite chemical sub-classes exceeded the Bonferroni threshold (p < 8.5 × 10⁻⁴), with the top signals being for inositol, creatine, valine/leucine/isoleucine, and purine/urate pathways (p = 2.0 × 10⁻¹⁴, 1.4 × 10⁻¹², 2.8 × 10⁻¹¹, and 2.4 × 10⁻¹⁰, respectively; Table 4, Supplemental Table 3).

Table 3 Gene set analysis for chemical class of metabolites for serum retinol
Table 4 Gene set analysis for chemical sub-class of metabolites for serum retinol at 8.5 × 10⁻⁴ level of statistical significance.

Discussion

This large-scale agnostic investigation of nearly 1,300 men identified 63 metabolites in several chemical classes associated with serum retinol concentrations after stringent correction for 947 retinol-metabolite comparisons. Amino acids and lipids were most highly represented among the top metabolites, with N-acetyltryptophan and myo-inositol (followed by the lipid 1-palmitoylglycerophosphoethanolamine) ranked first within these two chemical classes, respectively. In addition, N6-carbamoylthreonyladenosine, erythronate, erythritol, adsgegdfxaegggvr, and succinylcarnitine were represented as the top metabolites within the nucleotide, carbohydrate, xenobiotic, peptide, and energy metabolite classes.

N-acetyltryptophan prevents oxidative degradation of proteins33, and the tryptophan/kynurenine pathway influences the regulation of inflammation, oxidative stress and immune activation34, 35. The relevance of several other N-acetyl amino acids among the top metabolites (e.g., N-acetyl-3-methylhistidine, -valine, -1-methylhistidine, -lysine, and -threonine) is unclear, but may indicate perturbations in acetylation activity and influence on cell growth through histone-chromatin function and gene regulation36,37,38.

Retinol plays an important role in lipid metabolism through activation of RAR and PPARβ/δ signaling and gene expression, enhancing lipid accumulation through control of adipocyte differentiation, fatty acid oxidation, and lipolysis39,40,41,42,43. Myo-inositol ranked first among lipid metabolites associated with retinol. Along with its phosphorylated derivatives, including inositol 1-phosphate (I1P), it may be related to mediation of retinoic acid signaling and cellular functions including adhesion, growth, vesicular trafficking, and cell survival44, 45, and it is involved in regulation of enzyme activity and hormone secretion46. Through their role in phosphatidylinositol-4-phosphate (IP4) and phosphatidylinositol-4,5-bisphosphate (IP4,5) biosynthesis, myo-inositol and I1P are plasma membrane mediators of protein phosphorylation and second messenger signaling47.

Several lysolipids and androgen metabolites were associated with serum retinol. The former glycerophospho-fatty acids are also important plasma membrane cell-signaling molecules48, 49 that influence cell differentiation, growth, proliferation, and invasion50,51,52,53,54,55. The four serum androstenediol (adiol) sulfate metabolites positively related to serum retinol could serve as androgenic mediators and confer the vitamin A-prostate cancer association56.

The present analysis also shows retinol is correlated with histidine pathway metabolites, including N-acetyl-3-methylhistidine, N-acetyl-1-methylhistidine, 1-methylhistidine and 3-methylhistidine. Experimental data indicate that biosynthesis of histidine metabolites may be mediated through the 5-phosphoribosyl-1-pyrophosphate signaling pathway57. 1- and 3-Methylhistidine are abundant in actin, formed by post-translational methylation58, 59, and are considered markers of myofibrillar protein degradation60. Excessive vitamin A intake may stimulate 3-methylhistidine excretion both in vivo and in vitro, suggesting accelerated myofibrillar protein breakdown and protein turnover61. Aside from myofibrillar protein degradation, aberrations in histidine metabolites may also be indicative of tissue repair, tissue damage, alterations in the inflammatory signals and oxidative stress62,63,64,65.

This cross-sectional analysis was conducted both to contribute to elucidation of the molecular basis of retinol’s diverse biological activities and to provide mechanistic clues related to how higher vitamin A status might adversely impact prostate (and other) cancer risk. In this regard, the present findings regarding inositol and lysolipid metabolites are of interest in that they (e.g., I1P) were associated with aggressive prostate cancer in our recent serum metabolomic analysis66. In addition, findings from our group based on size and extent of primary prostate cancer cases demonstrated a positive association with histidine metabolites including N-acetyl-3-methylhistidine, and 1- and 3-methylhistidine in T2 prostate cancers67. Some of the sex steroids related to serum retinol here were associated with non-aggressive prostate cancer in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial68.

Fibrinopeptide A is the only metabolite we found to be inversely associated with serum retinol in this study. Fibrinopeptide A is produced from fibrinogen by thrombin during blood coagulation, and it is elevated in several malignancies, with persistent coagulation activation, thrombosis and disseminated intravascular coagulation being common complications69,70,71,72. Given that some cancers have been related to vitamin A status, the inverse fibrinopeptide A-retinol association observed here should be examined in other studies.

To our knowledge, this is the first metabolomic profiling analysis of vitamin A status. Strengths of the analysis include its relatively large sample size, assaying of serum collected after an overnight fast, and high laboratory quality control. Some limitations should also be acknowledged, including that our study was cross-sectional, which limits our ability to make causal inferences about the association between metabolites and retinol levels. In addition, the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) cohort constitutes a relatively homogeneous population of Finnish male smokers of European ancestry, which may limit the generalizability of our findings to other populations. Residual confounding by unidentified metabolic or other factors such as chronic health conditions is possible, although we extensively adjusted our models for several potential confounders including diabetes, body mass index (BMI), serum cholesterol, smoking intensity, and alcohol consumption, and the findings were unchanged after adjusting for serum creatinine, an important indicator of kidney function.

Our study identified a large number of metabolites in various metabolic pathways associated with serum retinol concentration, some of which directly relate to prostate cancer risk associations and other known biological functions. Lipid and amino acid biochemical classes are heavily represented, which may provide some molecular insights into the role of retinol in human health and disease. In particular, lipid bioactivity related to inositol, lysolipids and sterol/steroid compounds may provide biological clues relevant to the positive association between circulating retinol and the development of prostate cancer. Further studies in diverse populations are warranted to re-examine these findings.

Materials and methods

Study population

The ATBC Study has been described in detail elsewhere73. In brief, it is a 2 × 2 factorial randomized controlled trial in which 29,133 Finnish male smokers aged 50–69 years were assigned to receive either alpha-tocopherol (dl-α-tocopheryl-acetate, 50 mg/day), beta-carotene (20 mg/day), both vitamins, or placebo. Participants received the study capsules for a median of 6.1 years (range 5–8 years) until the trial ended on 30 April 1993. Information on general risk factors, smoking, and medical history was obtained through a questionnaire at study entry along with a validated food-frequency questionnaire. Serum was collected at baseline (i.e., pre-supplementation and years prior to cancer diagnoses) after an overnight fast, protected from light, and stored at −70 °C until assayed. Incident cancers diagnosed during 20 years of follow-up were identified from the Finnish Cancer Registry. All participants provided written informed consent. The trial was approved by institutional review boards at the U.S. National Cancer Institute and the Finnish National Public Health Institute. All methods were performed in accordance with the relevant guidelines and regulations.

The present analysis is based on participants included in seven on-going or completed analyses (case-control and other, subsequently referred to as “metabolomic sets”) nested within the ATBC Study66, 74, 75. These included cases of prostate, lung, esophagus, and stomach cancer, along with their controls, as well as controls from a pancreatic cancer study. After excluding duplicate participants, 1,282 were included in the present analysis.

Laboratory analysis

Baseline serum retinol concentration was measured using an isocratic high-performance liquid chromatography platform76. Assays were carried out from 1986 to 1988 in a dedicated laboratory at the National Public Health Institute in Helsinki, Finland, which was certified by a National Institute of Standards and Technology quality-control testing program. Serum samples were subsequently analyzed at Metabolon, Inc. (Durham, N.C.) using ultrahigh performance liquid chromatography (LC)/mass spectrometry (MS) and gas chromatography (GC)/MS as described before66, 74, 75. The methods of sample accessioning, sample preparation, quality control, data extraction and compound identification have also been described in detail77. Briefly, each sample was analyzed under three different analytical conditions: ultrahigh performance LC-MS/MS with positive electrospray ionization (+ESI), ultrahigh performance LC-MS/MS with negative ESI (−ESI), and GC-MS. To remove proteins, dissociate small molecules bound to protein or trapped in the precipitated protein matrix, and recover chemically diverse metabolites, each sample was extracted using an aqueous methanol extraction procedure. The methanol contained four recovery standards (DL-2-fluorophenylglycine, tridecanoic acid, d6-cholesterol and 4-chlorophenylalanine) to allow confirmation of extraction efficiency. For each sample, four aliquots were obtained from the extract and dried. Two aliquots of each sample were then reconstituted in 50 µl of 6.5 mM ammonium bicarbonate in water (pH 8) for the negative ion analysis, and the other two aliquots were reconstituted in 50 µl of 0.1% formic acid in water (pH ~3.5) for the positive ion analysis. The resulting extracts were divided into fractions for analysis by ultrahigh performance LC/MS/MS (positive mode), ultrahigh performance LC/MS/MS (negative mode), and GC/MS. Samples were placed briefly on a TurboVap® (Zymark) to remove the organic solvent. Each sample was then frozen and dried under vacuum. The samples were then prepared for the appropriate instrument, either ultrahigh performance LC/MS/MS or GC/MS.

The raw data were further extracted, peak-identified and quality control (QC) processed using Metabolon’s hardware and software. Internal controls included extraction process (5 recovery standards), injection (up to 11 standards), and alignment standards for quality assurance/quality control procedures to control for experimental variability. Compounds were identified by comparison to library entries of purified standards or recurrent unknown entities. Metabolon maintains a library based on authenticated standards that contains the retention time/index (RI), mass to charge ratio (m/z), and chromatographic data (including MS/MS spectral data) on all molecules present in the library. Biochemical identifications are based on three criteria: retention index within a narrow RI window of the proposed identification, accurate mass match to the library within ±0.005 amu, and the MS/MS forward and reverse scores between the experimental data and authentic standards. The MS/MS scores are based on a comparison of the ions present in the experimental spectrum to the ions present in the library spectrum. More than 2,400 commercially available purified standard compounds have been acquired and used in both the LC and GC platforms for determination of their analytical characteristics78.

Batch variability was standardized by dividing the signal strength by the batch median value for each metabolite and subject. Metabolite signal strengths were then log-transformed and centered following normalization. Within each metabolomic set, metabolite values below the limit of detection were imputed as the minimum of all non-missing values.
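The normalization steps described above can be sketched for a single metabolite within one metabolomic set as follows. This is a minimal illustration, not the pipeline actually used: the function name, the use of `None` to mark values below the limit of detection, and the natural-log base are assumptions. Because the logarithm is monotone, imputing the minimum before or after the log transform gives the same result.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_metabolite(values, batches):
    """Batch-median scale, impute, and log-transform one metabolite.

    values  : raw signal strengths; None marks a value below the limit of detection
    batches : batch label for each sample, in the same order as values
    """
    # Collect the observed (non-missing) values of each batch.
    observed_by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        if value is not None:
            observed_by_batch[batch].append(value)
    batch_median = {b: median(vs) for b, vs in observed_by_batch.items()}

    # Divide each observed value by the median of its own batch.
    scaled = [
        None if value is None else value / batch_median[batch]
        for value, batch in zip(values, batches)
    ]

    # Impute missing values with the minimum of all non-missing scaled values.
    floor = min(s for s in scaled if s is not None)
    imputed = [floor if s is None else s for s in scaled]

    # Log-transform; a value equal to its batch median maps to 0.
    return [math.log(s) for s in imputed]


# Hypothetical signal strengths for six samples in two batches.
example = normalize_metabolite(
    [10.0, 20.0, 40.0, None, 2.0, 4.0],
    ["a", "a", "a", "a", "b", "b"],
)
```

Dividing by the batch median already centers each batch around zero on the log scale, which is one plausible reading of the centering step; any further centering used in the study is not specified in the text.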

After excluding metabolites that had fewer than 10 non-missing values across all metabolomic sets, or single-value metabolites in an individual metabolomic set, 947 identified compounds remained eligible for analysis. Based upon a standard chemical classification scheme, 920 of the metabolites were categorized in eight mutually exclusive chemical classes: amino acids and amino acid derivatives (referred to as “amino acids”), carbohydrates, cofactors and vitamins, energy metabolites, lipids, nucleotides, peptides, and xenobiotics. Blinded quality control samples (9%) from a pooled sample were included in each batch to assess the technical reliability of the data, and the coefficient of variation (CV) was calculated for each metabolite. The median and interquartile range of the CV% across the metabolites was 9% (4%–20%). These CVs are similar to those previously observed for blood samples analyzed by the same laboratory77, 78, and previous studies have reported a high reliability for the metabolomics platform used in the present study79.
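The reliability summary above (a CV per metabolite, then the median and interquartile range across metabolites) can be sketched as follows. The data are hypothetical, the function names are illustrative, and computing the CV on the untransformed signal with the sample standard deviation is an assumption, since the text does not specify either choice.

```python
from statistics import mean, median, quantiles, stdev


def cv_percent(replicates):
    """Coefficient of variation (%) of replicate QC measurements of one metabolite."""
    return 100.0 * stdev(replicates) / mean(replicates)


def summarize_cv(qc_by_metabolite):
    """Return (median CV%, (Q1, Q3)) across metabolites.

    qc_by_metabolite maps a metabolite name to its list of QC replicate values.
    """
    cvs = sorted(cv_percent(reps) for reps in qc_by_metabolite.values())
    q1, _, q3 = quantiles(cvs, n=4, method="inclusive")
    return median(cvs), (q1, q3)


# Hypothetical QC replicates for three metabolites.
qc = {
    "metabolite_1": [9.0, 10.0, 11.0],
    "metabolite_2": [10.0, 10.0, 10.0],
    "metabolite_3": [8.0, 10.0, 12.0],
}
median_cv, (q1_cv, q3_cv) = summarize_cv(qc)
```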

Statistical analysis

We investigated the association between serum levels of retinol and each metabolite, using linear regression adjusted for age at randomization (continuous), BMI (continuous), case status (binary, case or control status in each metabolomic set), metabolomic set (categorical), serum cholesterol (continuous), number of cigarettes per day (continuous), and baseline alcohol consumption (continuous). We performed an additional analysis that also adjusted for creatinine. For each model, we computed the standardized beta-coefficient, which represents the change in the level of each metabolite per 1-SD increase in the level of retinol.
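A covariate-adjusted standardized coefficient of this kind can be sketched with ordinary least squares as below. This is an illustration only: the analyses were run in SAS and R, and the choice to z-score both retinol and the log-transformed metabolite, the helper names, and the toy data are all assumptions. The small solver has no handling for collinear covariates.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Center to mean 0 and scale to SD 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_beta(metabolite, retinol, covariates):
    """OLS coefficient of z-scored retinol on the z-scored metabolite level.

    covariates is a list of columns (one list per adjustment variable).
    """
    y = zscore(metabolite)
    columns = [[1.0] * len(y), zscore(retinol)] + [list(c) for c in covariates]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)[1]


# Hypothetical toy data: six participants, one covariate (age).
retinol = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
age = [50.0, 61.0, 55.0, 68.0, 52.0, 59.0]
metabolite = [2.0 * r + 0.1 * a for r, a in zip(retinol, age)]
beta = standardized_beta(metabolite, retinol, [age])
```

In practice, categorical adjustments such as metabolomic set would enter as indicator columns, and a statistics package would also supply the standard errors and p values that this sketch omits.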

For all analyses, we used log-transformed metabolite concentrations. To control for multiple comparisons (n = 947 retinol-metabolite associations), we applied the Bonferroni correction80 using p = 0.05/947 = 5.3 × 10⁻⁵ as the adjusted significance threshold. We further performed sensitivity analyses additionally adjusted for storage time (from blood collection to metabolomics measurement).
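The Bonferroni arithmetic can be checked with a short sketch. The counts (947 metabolites, 59 sub-classes) and the N-acetyltryptophan p value come from the text; the second metabolite and the function name are hypothetical and included only to show a non-significant case.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


metabolite_threshold = bonferroni_threshold(0.05, 947)  # about 5.3e-5
subclass_threshold = bonferroni_threshold(0.05, 59)     # about 8.5e-4

p_values = {
    "N-acetyltryptophan": 9.9e-17,     # reported in the Results
    "hypothetical_metabolite": 0.01,   # illustrative value, not from the study
}
significant = [
    name for name, p in p_values.items() if p < metabolite_threshold
]
```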

Within each metabolomic set, we applied Gene-Set Analysis (GSA), a pathway analysis method, to examine whether pre-defined metabolic pathways, including super- and sub-pathways, were related to retinol81. For the Z values (i.e., Z1 to ZS) of the S metabolites in a pre-defined pathway, GSA examines the “maxmean” statistic, which is the larger of the average positive part and the average negative part of the Z values (each averaged over all S metabolites), and calculates p values from 10,000 permutations. We then used Fisher’s method (the sum-of-logs method) to combine the p values for each pre-defined pathway across the seven individual metabolomic sets.
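The two computational pieces of this step, the maxmean statistic and Fisher's combination of p values, can be sketched as follows. This is a simplified illustration rather than the GSA implementation used in the study: the permutation step that turns the statistic into a p value is not shown, the function names are illustrative, and the unsigned form of the statistic is an assumption. Fisher's statistic is referred to a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math


def maxmean(z_values):
    """Maxmean statistic: the larger of the mean positive part and mean negative part."""
    n = len(z_values)
    mean_positive_part = sum(max(z, 0.0) for z in z_values) / n
    mean_negative_part = sum(max(-z, 0.0) for z in z_values) / n
    return max(mean_positive_part, mean_negative_part)


def fisher_combined_p(p_values):
    """Combine k independent p values (each > 0) with Fisher's sum-of-logs method.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom under the null; its upper tail is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical Z values for a pathway with four metabolites.
pathway_score = maxmean([2.0, -1.0, 1.0, 0.0])

# Hypothetical per-set p values for one pathway across seven metabolomic sets.
combined = fisher_combined_p([0.01] * 7)
```

Averaging the positive and negative parts over all S metabolites, rather than over only the positive or only the negative ones, is what keeps a pathway with a single extreme value from dominating the statistic.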

Analyses were performed in SAS 9.4 and R 3.2.3. All statistical tests and reported p values are two-sided.