Introduction

Fresh Navel oranges are an important part of California life. With production valued in the tens of millions of dollars per year,1 the fruit plays a large nutritional and cultural role and is widely consumed throughout the state. With the recent decrease in citrus production in Florida,2 oranges from California currently command a premium.

Traditional breeding efforts have numerous goals, including improved fruit appearance, enhanced storability, and improved texture and flavor.3 However, deciding how to enhance any sensory property of the fruit is subjective and depends mainly on the parties involved in creating the new variety. Growers and breeders often select and breed fruit that suits their own preferences. Yet the economic viability of a new cultivar relies on consumer purchasing and liking, which may be disconnected from decisions driven by ease of cultivation. Because end users drive purchasing, they should have a say in what sensory profiles new varieties exhibit.

Citrus flavor is complex and difficult to characterize.4 The Citrus genus contains at least 100 unique volatile components,5 with dozens important to oranges.6 Volatiles known to be important in oranges include limonene, ethyl butanoate, octanal, decanal, hexanal, (S)-linalool, and many other hydrocarbons, alcohols, aldehydes, and esters.4,6,7 Together with non-volatile compounds such as sugars and acids, these aroma compounds combine to evoke orange flavor. The flavor of any specific orange also depends on numerous confounding factors, including ripeness, waxing, and storage.8,9,10,11 Production practices, shipping, and other handling steps mean that fruit arriving at market can differ markedly from fruit freshly picked from the tree.9,12,13,14 Few studies have investigated Navel oranges obtained directly from the marketplace, which best represent the fruit that consumers typically purchase.

Although consumer preference studies for oranges are uncommon, past work has proved effective in ushering in change for the industry. Formal research on improving oranges dates back at least to 1917, when the Department of Agriculture introduced the ratio of soluble solids concentration to titratable acidity (SSC/TA) to the California citrus industry.15 Since then, work has continued to improve chemical quality standards, leading to the BrimA measurement,16 which subtracts acid content from total soluble solids rather than using a ratio. Other work has addressed flavor from an expert standpoint,9 but few studies have evaluated flavor with naive consumers.
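To illustrate why a difference-based index can separate fruit that a ratio cannot, the short Python sketch below compares SSC/TA with a BrimA-style index of the form SSC − k × TA. The weighting constant k = 4 and the two fruit are assumptions chosen for illustration, not values from this study.

```python
def ssc_ta_ratio(ssc_brix: float, ta_percent: float) -> float:
    """Soluble solids (degrees Brix) divided by titratable acidity (%)."""
    if ta_percent <= 0:
        raise ValueError("titratable acidity must be positive")
    return ssc_brix / ta_percent


def brima(ssc_brix: float, ta_percent: float, k: float = 4.0) -> float:
    """BrimA-style index: soluble solids minus k times titratable acidity.

    k = 4.0 is an assumed default; the weighting constant differs between sources.
    """
    return ssc_brix - k * ta_percent


# Two hypothetical fruit: same SSC/TA ratio, very different sugar and acid levels.
fruit = {"rich": (12.0, 1.0), "dilute": (6.0, 0.5)}
for name, (ssc, ta) in fruit.items():
    print(name, ssc_ta_ratio(ssc, ta), brima(ssc, ta))
```

Both hypothetical fruit share a ratio of 12, but the BrimA-style index separates them (8.0 versus 4.0), which is the kind of distinction a ratio alone cannot express.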

Preference mapping is a product optimization technique in which trained panelists and consumers evaluate a set of products representative of a product category, in order to determine how preferences segment within that category and to identify the sensory drivers of liking for each preference segment.17 In a past study of Navel oranges, we found preference segmentation in both adult and child consumers from Northern California,18 and samples high in sweetness, overall flavor, orange flavor, and juiciness were preferred by both groups.

The main objective of this work was to identify chemical and sensory markers of consumer preference for Navel oranges using chemical (flavoromic and metabolomic) and sensory (descriptive analysis and consumer testing) methods. Additionally, we combined penalty analysis of Just-About-Right ratings19 with consumer descriptive classification using Check-All-That-Apply questions20 to explain how consumers choose and appreciate Navel oranges.

Results

Chemical analyses

Concentrations of volatile and non-volatile compounds in the Navel oranges are shown in Table 1. Nearly all non-volatile compounds differed significantly among the oranges. Citrate levels were lower in Navel B than in the others. Fructose and glucose concentrations were highest in samples A, OL, S, and F. Sucrose concentrations were highest in samples B and F. Only four volatile compounds differed significantly among the orange samples: hexanal, ethyl hexanoate, octanal, and linalool. Octanal and linalool showed a similar trend, with the highest values in Navels OL and SW. For ethyl hexanoate, Navels S and A had the highest values. Navels OL and W had the highest and lowest hexanal concentrations, respectively. Many of the compounds could not be compared using ANOVA because at least one sample replicate was below the detection limit.

Table 1 Chemical compound concentrations (µM), detection method, and chemical cluster of the seven tested Navel oranges

A distance matrix computed from the scaled (to a standard deviation of 1) and mean-centered concentrations was used for hierarchical clustering with Ward’s method. The dendrogram (Fig. 1) indicated eight chemical clusters; the cluster assigned to each compound is given in Table 1. Cluster 1 consisted of a few aroma compounds, such as octanal (fruity) and linalool (floral). Cluster 5 contained fructose and glucose along with several amino acids. Ethanol and sucrose grouped into cluster 6. Citrate and ascorbate alone composed cluster 7.
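The clustering step can be sketched in plain Python as a minimal illustration, not the authors’ R code: each compound’s concentrations across the oranges are centered and scaled to a standard deviation of 1, and the compounds are then merged with Ward’s minimum-variance criterion on squared Euclidean distances via the Lance–Williams update. The four compounds and their values are invented; in the study the profiles would span the seven oranges and eight clusters were retained.

```python
import math
from itertools import combinations


def zscore(values):
    """Center to mean 0 and scale to SD 1 using the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ward_clusters(profiles, n_clusters):
    """Ward agglomerative clustering of named profiles into n_clusters groups.

    Starts from squared Euclidean distances and applies the Lance-Williams
    update for Ward's minimum-variance criterion after every merge.
    """
    names = list(profiles)
    vecs = [profiles[name] for name in names]
    clusters = {i: [name] for i, name in enumerate(names)}
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(vecs[i], vecs[j]))
        for i, j in combinations(range(len(names)), 2)
    }

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    next_id = len(names)
    while len(clusters) > n_clusters:
        # Merge the pair whose fusion least increases within-cluster variance.
        i, j = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        for k, members in clusters.items():
            nk = len(members)
            dist[(k, next_id)] = (
                (ni + nk) * d(k, i) + (nj + nk) * d(k, j) - nk * d(i, j)
            ) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())


# Hypothetical concentrations of four compounds across four oranges.
raw = {
    "fructose": [250, 270, 240, 300],
    "glucose": [230, 255, 220, 285],
    "citrate": [60, 40, 62, 58],
    "ascorbate": [3.0, 2.1, 3.1, 2.9],
}
scaled = {name: zscore(values) for name, values in raw.items()}
print(ward_clusters(scaled, n_clusters=2))
```

Scaling first means compounds group by the shape of their profile across the oranges rather than by absolute concentration, so a sugar present at hundreds of µM and a trace compound can still land in the same cluster.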

Fig. 1

Cluster dendrogram of chemical compounds detected in the seven Navel oranges. Concentrations were scaled and centered (to mean = 0 and standard deviation = 1). A distance matrix between the compounds was used to perform hierarchical clustering with Ward’s method

Descriptive analysis

Mean intensity ratings of the significant descriptive variables across oranges are presented in Table 2. The most intensely flavored oranges were Navels A, F, and OL while Navels B and W were the least flavorful. The highest sweetness values were given to Navels F, A, and B. The least sour samples were Navels B, W, and F.

Table 2 Descriptive analysis mean ratings of significantly different attributes for the seven Navel orange samples

Consumer tests

In total, 193 adults and 69 children participated in the tasting sessions. In general, adults liked the samples (Table 3): all samples were rated above 5 = “neither like nor dislike” on the 9-point hedonic scale and, except for Navel W, all were rated above 6 = “like slightly”. Across all adults, the most liked sample was Navel F, rated around 7 = “like moderately”; no sample’s mean rating exceeded 7. The children showed a similar trend to the adults (Table 3). No samples differed significantly except Navel W, which was liked less than the others and was not rated above 4 = “I like it a bit” on the 7-point hedonic scale used with the children. Both adults and children also rated their liking of attributes such as flavor, texture, and appearance. These ratings were all significantly correlated with overall liking (Supplementary Tables 6 and 7), but not to the same degree: liking of appearance showed the lowest correlations with overall liking, and liking of flavor the highest.
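Correlations of this kind are ordinary Pearson coefficients between attribute liking and overall liking. The sketch below assumes ratings paired by consumer for a single sample; how consumers and samples were pooled in the study is not stated here, and the six sets of ratings are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 9-point ratings of one sample by six adult consumers.
overall = [7, 8, 6, 5, 7, 9]
flavor = [7, 8, 6, 4, 7, 9]
appearance = [6, 7, 7, 6, 5, 7]
print(round(pearson_r(flavor, overall), 2))      # flavor vs overall liking
print(round(pearson_r(appearance, overall), 2))  # appearance vs overall liking
```

In this invented example flavor liking tracks overall liking far more closely (r ≈ 0.99) than appearance liking does (r ≈ 0.35), the same ordering reported above.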

Table 3 Overall liking values for the seven Navel orange samples by both clustered and total populations of children (n = 69 children, age 7–12) and adult (n = 193 adults, age 18+) consumer tasters—the consumers were provided with a paper ballot and randomized samples

Just-About-Right ratings

The Just-About-Right ratings for the fruit were aggregated for all consumers on a percentage basis and are shown in Table 4 for the adults and Table 5 for the children.

Table 4 Just-About-Right (JAR) rating proportions for the seven Navel oranges for different taste and texture modalities as rated by the adult consumers (n = 193, age 18+)
Table 5 Child Just-About-Right (JAR) rating proportions for each fruit for different taste and texture modalities

For the adults, the samples with the highest proportions of Just-About-Right sweetness ratings were Navels A and F, both over 60%. The sweetness ratings were skewed right; even the sweetest fruit received only 8% “Too Sweet” ratings. A similar trend was seen with JAR sourness. The samples with the highest proportions of JAR sourness ratings were Navels A, S, and F. For all fruit, however, at least 33% of the adults found them not sour enough. For the most sour fruit (as rated by the descriptive panel), only 11% of the adults found it too sour.

The children’s ratings were closer to normally distributed, with higher proportions finding the fruit just right in sourness (Navels A and S). The other Navels tended not to be sour enough for them; Navel W, for example, was rated “Not sour enough” by 46% of the children.

For JAR firmness and juiciness, the trends were similar in adults and children. The ratings were right-skewed, indicating that consumers found the fruit not firm enough and not juicy enough. The proportions of Just-About-Right ratings were, however, higher than for sourness and sweetness. This indicates that the texture was acceptable but that all of the oranges could have been somewhat juicier and firmer.

Penalty analysis

Mean drop penalty analysis was performed using the JAR ratings and the overall liking scores. For the adults, too little sweetness was selected frequently and had the largest impact on overall liking. Too little sourness was selected slightly more often but carried a smaller penalty, just over 1.2 points on the 9-point hedonic scale. Too much sourness also carried a large penalty but was selected much less frequently.

For the children, too little sourness was selected most frequently but carried a relatively small penalty compared with too little sweetness. The children also rated much higher proportions of samples as JAR. Too little sweetness and too much sourness had the highest penalties, at just over 1.5 points (on the 7-point hedonic scale).
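A mean drop penalty analysis of this kind can be sketched for one attribute as follows. We assume JAR codes run from “too little” to “too much” with the middle category as just about right, and that JAR and liking responses are paired by consumer; whether the study pooled responses across samples before computing the drops is not specified here, and the data are invented.

```python
from statistics import mean


def penalty_analysis(jar, liking, jar_point=3):
    """Mean drop penalty analysis for one JAR attribute.

    jar: JAR responses, with jar_point as "just about right" (3 on a 5-point
         scale, 2 on a 3-point scale); lower codes mean "too little".
    liking: overall liking from the same consumers, in the same order.
    Returns {level: (proportion of consumers, mean drop in liking)}.
    """
    groups = {"too little": [], "jar": [], "too much": []}
    for code, score in zip(jar, liking):
        if code < jar_point:
            groups["too little"].append(score)
        elif code == jar_point:
            groups["jar"].append(score)
        else:
            groups["too much"].append(score)
    jar_mean = mean(groups["jar"])
    result = {}
    for level in ("too little", "too much"):
        scores = groups[level]
        if scores:  # a level nobody used has no penalty to report
            result[level] = (len(scores) / len(jar), jar_mean - mean(scores))
    return result


# Hypothetical sweetness JAR codes (5-point) and 9-point overall liking.
sweet_jar = [3, 3, 2, 1, 3, 4, 2, 3, 3, 2]
liking = [8, 7, 5, 4, 8, 6, 6, 7, 9, 5]
for level, (share, drop) in penalty_analysis(sweet_jar, liking).items():
    print(f"{level}: {share:.0%} of consumers, mean drop {drop:.2f}")
```

Plotting each level’s proportion against its mean drop gives the usual penalty plot; for the children’s 3-point JAR scale, the same function would be called with jar_point=2.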

Check-All-That-Apply

The consumers also characterized the fruit using Check-All-That-Apply attributes that were generated from a mix of hedonic terms and descriptive attributes from the descriptive panel (Supplementary Tables 1 and 2). All of the CATA attributes showed significant differences between the products as shown for the adults in Supplementary Table 8.

Based on the overall liking values (Table 3), the most liked oranges were Navels A and F, while the least liked were Navels B and W. Navels A and F were most often described with the terms “Aromatic”, “Sweet Tasting”, “Flavorful”, “Tropical Flavor”, “Fresh”, “Complex Flavor”, “Balanced Flavor”, and “Juicy”. Navels B and W were least often described as “Sour Tasting”, “Flavorful”, “Typical Orange Flavor”, “Fresh”, “Balanced”, and “Juicy”, and most often described as “Bland”.

Consumer clusters and descriptive attributes

The consumers were clustered according to their overall liking scores (Table 3). These clusters were related to the descriptive attributes through partial least-squares (PLS) regression (Fig. 2). The children formed two preference clusters. The larger group of children (n = 43) preferred sweeter fruit with less chemical, pine, lemon/lime, and woody aroma, such as Navels A, B, and F. The smaller group (n = 26) preferred the sour samples, Navels OL, S, and SW.

Fig. 2

Partial least-squares 1 regression of consumer clusters onto significantly different descriptive analysis ratings. The product scores were rescaled for display in the biplot. a Adult cluster 1 (n = 62) average ratings (R2 = 57%, 38% for t1, t2, respectively). b Adult cluster 2 (n = 84) average ratings (R2 = 85%, 10% for t1, t2, respectively). c Adult cluster 3 (n = 29) average ratings (R2 = 74%, 20% for t1, t2, respectively). d Adult cluster 4 (n = 18) average ratings (R2 = 63%, 26% for t1, t2, respectively). e Child cluster 1 (n = 26) average ratings (R2 = 61%, 26% for t1, t2, respectively). f Child cluster 2 (n = 43) average ratings (R2 = 76%, 15% for t1, t2, respectively)

Adult Cluster 1 (n = 62) loaded positively on the first and second dimensions. These consumers preferred Navels A and F, two of the samples with the highest overall flavor. The largest consumer cluster, Adult Cluster 2 (n = 84), was clearly driven by overall flavor, juiciness, and orange flavor; their preferences were negatively correlated with fibrousness and woody flavor. There were no easily identified sensory drivers of liking for the smaller clusters 3 and 4. In follow-up analyses, clusters 3 and 4 were combined, but no significant relationship was found between their pooled preferences and the sensory attributes measured by the descriptive panel.

Consumer clusters and chemical clusters

One of the main goals of this work was to relate the chemical measurements performed on the oranges to the consumer preference clusters uncovered in the analyses. Partial least-squares regression was therefore performed on the consumer clusters with the chemical cluster averages as predictors. This was done for the adults (Fig. 3a–d) and for the children (Fig. 3e, f).

Fig. 3

Partial least squares 1 regression of consumer clusters on scaled and centered chemical cluster averages. a Adult cluster 1 (n = 62) average ratings (R2 = 47%, 26% for t1, t2, respectively). b Adult cluster 2 (n = 84) average ratings (R2 = 76%, 4% for t1, t2, respectively). c Adult cluster 3 (n = 29) average ratings (R2 = 20%, 5% for t1, t2, respectively). d Adult cluster 4 (n = 18) average ratings (R2 = 59%, 7% for t1, t2, respectively). e Child cluster 1 (n = 26) average ratings (R2 = 84%, 7% for t1, t2, respectively). f Child cluster 2 (n = 43) average ratings (R2 = 69%, 4% for t1, t2, respectively)

In Fig. 3, liking for the majority of consumers was strongly correlated with higher relative concentrations of fructose and glucose (chemical cluster 5). These compounds made up the majority of the sugar detected in the Navel oranges (Table 1). Liking was unrelated to sucrose content. Aspartate, proline, and alanine were also correlated with fructose and glucose. Taken together with the descriptive analysis, these chemical relationships indicate that the preferred sensory profiles included sweetness, overall flavor, orange flavor, and occasionally sourness.

Descriptive attributes and chemical clusters

The chemical composition was reflected in the ratings given by the descriptive panel. Sweetness was strongly correlated with chemical cluster 5 (Fig. 4a), the cluster containing fructose, glucose, aspartate, proline, and proline betaine. Sourness was positively correlated with chemical cluster 7 (citrate and ascorbate, Fig. 4b) and negatively correlated with compounds in chemical cluster 8 (malate, galactose, trigonelline). Overall flavor was most strongly related to chemical cluster 5 (Fig. 4c). Fruity flavor was strongly associated with chemical clusters 4 and 6 and negatively associated with chemical cluster 7 (Fig. 4d).

Fig. 4

Partial least squares 1 regression of selected descriptive attributes average ratings for the seven Navel oranges on scaled and centered chemical cluster averages. a Sweetness (R2 = 73%, 15% for t1, t2, respectively). b Sourness (R2 = 80%, 15% for t1, t2, respectively). c Overall flavor (R2 = 83%, 4% for t1, t2, respectively). d Fruity flavor (R2 = 90%, 5% for t1, t2, respectively)

Discussion

In a previous study, Californian consumers tended to prefer Navel oranges that were high in overall flavor, sweetness, and juiciness.18 The four adult clusters identified in this study should be interpreted in context and together with that study, as consumer cluster stability is difficult to establish in sensory studies. The lack of identifiable drivers of liking for adult clusters 3 and 4 may be due to their smaller size and the overall similarity of the tested Navel oranges. However, the two largest adult consumer clusters (clusters 1 and 2) preferred oranges rated highly in overall flavor, orange flavor, sweetness, and juiciness (Fig. 2a, b), in agreement with the past study.18 The children were more clearly split between sweetness and sourness: the first cluster preferred the more sour fruit, and the other preferred fruit that was sweet and fruity (Fig. 2e, f). These findings also support past work showing that key attributes such as sweetness and orange flavor are often predictive of consumer liking.21,22,23

The relationship between chemistry and the descriptive attributes was clearest for fruity flavor, sourness, and sweetness. Fruity flavor was related to compounds that clustered with ethanol (Fig. 4d). Past work has found that flavor degradation can be traced to increasing concentrations of ethanol, with fruit transitioning from tart and citrusy to fruity with off-flavors.24,25,26 Sweetness and sourness are usually well explained by sugars and acids in citrus.4 Sourness was related to citrate and ascorbate, which composed chemical cluster 7 (Fig. 4b). Total soluble solids are often used to help predict sweetness.16 The findings presented here suggest that fructose and glucose contributed more to sweetness than sucrose (Fig. 4a), even though sucrose was found at a concentration similar to that of each monosaccharide. In addition, the compounds that clustered with glucose and fructose, such as proline, aspartate, and alanine, are amino acids whose levels increase in response to plant stress.27 Stress, such as that caused by drought,28 can increase the synthesis of flavor compounds as well as flavonoids, which are key nutrients in oranges. These stress-induced flavor compounds may also have influenced the perceived sweetness of the fruit through taste–odor interactions, as has been shown with fruity compounds.29

Compounds responsible for orange flavor and overall flavor intensity were elusive; orange flavor was not well described by the statistical models presented here. No cluster of compounds was positively related to those attributes, although chemical cluster 8, consisting of malate, trigonelline, and galactose, was negatively related to orange flavor. Trigonelline has been found at higher levels in salt-stressed citrus fruits.30

The purpose of this study was to relate the sensory profile and consumer liking to chemical compounds. Sugars, acids, and volatiles are at the core of citrus flavor.4,6,7,9,12,22 The key compounds studied here affected the perceived citrus flavor, as shown by the descriptive panel ratings, and these flavor differences were reflected in the heterogeneous consumer preference clusters. While the chemical drivers of liking varied by consumer cluster, there were clear trends. Sweetness, driven by compounds in cluster 5 such as fructose, glucose, proline, aspartate, alanine, and proline betaine, was a major driver of liking for nearly all of the consumers. Sourness, associated with chemical cluster 7, was also a strong driver of liking for child cluster 1. The cluster 1 compounds hexanal, octanal, and linalool were drivers of liking for adult cluster 1. These compounds are known to be important to citrus flavor31 but were not strongly related to any descriptive attribute in this study.

To market Navel oranges more effectively, the positive words that consumers selected in the CATA task could be used in messaging and advertising. For example, adult cluster 1 liked Navel A (Table 3) and described it with the terms “Good Appearance”, “Aromatic”, “Sweet Tasting”, “Flavorful”, “Typical Orange Flavor”, “Fresh”, and “Juicy” (Supplementary Table 8). Successful marketing might leverage these key words while avoiding more confusing terms such as “complex flavor” or “floral flavor”.

The consistently high quality and flavor homogeneity of oranges produced in California make it difficult to study how consumer preferences segment according to flavor profile. However, improvement in internal fruit quality is still possible and would boost consumer liking. Our results indicate that commercially produced Navel oranges could benefit from more sugar and acidity, based on consumer JAR (Tables 4 and 5) and CATA (Supplementary Table 8) ratings. Changes during processing, waxing, shipping, and storage mean that fruit at retail differs from fruit at the packinghouse.14,25,32 As the fruit presented here was not stored under true supermarket conditions, a future study of fruit throughout the entire supply chain may identify key control points for sustaining high acceptance of Navel oranges.

Methods

Fruit

Seven lots of commercial Navel oranges from various producers in California, harvested in February 2017, were obtained from growers and packers through the California Citrus Research Board (CRB) or purchased at local grocery stores and produce wholesalers. The fruit had been treated according to current industrial processing practices, including washing, rinsing, waxing, grading, and boxing.3 Upon receipt at the University of California, Davis, the fruit was stored at 4 °C and 85% relative humidity. All fresh samples were stored for less than one month, and all sensory experiments were performed within a 10-day span (3/3/17–3/10/17) to prevent sensory changes to the fruit. Samples were taken out of cold storage approximately 12 h before the descriptive analysis and consumer tests to equilibrate them to room temperature. For NMR analysis, samples were peeled, juiced using a handheld citrus press, and immediately frozen at −80 °C until processing. For gas chromatography–mass spectrometry (GC-MS) analysis, unpeeled samples were squeezed manually using a hand blender, without applying heat or any solvent, to obtain juice; unused peel, pulp, and seeds were discarded. The number of lots (seven) was set by the number of Navel oranges commercially available on the testing dates and by the need to limit the consumers’ sensory fatigue.

Descriptive analysis

Generic descriptive analysis17 was performed using 13 judges (9 females, 4 males, ages 25–75), many of whom had been part of a previous citrus descriptive panel.33 Panelists completed seven training sessions on Navel oranges. The first two sessions involved term generation based on citrus found in local retail outlets. The following sessions focused on aligning attributes with the aid of references, finalizing the list of descriptive terms (Table 6), and ensuring that the judges rated the attributes in a similar manner. During training, judges used a sample ballot that listed each term next to a 10 cm line scale with anchors placed 1 cm in from each end to limit end-of-scale effects.17 An electronic ballot was designed in FIZZ (v2.47B, Biosystèmes, Couternon, France) for data collection in the actual descriptive analysis. For evaluation, the judges were first presented with a group of seven Navel oranges to rate the visual attributes of the samples; this was done because the panelists felt that they could not accurately score the appearance attributes from a single piece of fruit. For the remaining terms, the judges were presented with one half of an orange cut through the stem end, instructed to peel the half, and asked to evaluate it. Unsalted crackers (Mondelez, East Hanover, NJ) and water were provided to cleanse their palates between samples. The samples were identified using random three-digit codes and evaluated in triplicate under white light. Presentation order was randomized using a Williams Latin square design provided by the FIZZ system.
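The Williams Latin square design used for presentation order can be sketched in a few lines. This follows the standard cyclic construction and is not FIZZ’s own generator; mapping the indices to the seven Navel codes (and to judges and replicates) is an assumption for illustration. Because seven is odd, the square and its mirror image are both needed, giving 14 serving orders.

```python
def williams_design(n):
    """Serving orders (0-based sample indices) from a Williams Latin square.

    Every sample occupies every serving position once per square, and every
    sample immediately follows every other sample equally often. For an odd
    number of samples the mirror-image square is appended (2n orders).
    """
    base, low, high = [0], 1, n - 1
    for position in range(1, n):
        # Alternate 1, n-1, 2, n-2, ... to build the first serving order.
        if position % 2 == 1:
            base.append(low)
            low += 1
        else:
            base.append(high)
            high -= 1
    orders = [[(s + shift) % n for s in base] for shift in range(n)]
    if n % 2 == 1:
        orders += [order[::-1] for order in orders]
    return orders


samples = ["A", "B", "F", "OL", "S", "SW", "W"]
for order in williams_design(len(samples)):
    print(" ".join(samples[i] for i in order))
```

Balancing first-order carryover in this way keeps any one sample from systematically following another, which matters when a strongly sour or bland orange could shift ratings of the next one.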

Table 6 Descriptive analysis attributes and references—13 judges (9 females, ages 25–75) were trained to perform generic descriptive analysis on market-available Navel oranges using a 10 cm line scale

Consumer testing

Adults and children (7–12 years old) from the local community (Davis, Woodland, and Sacramento) were recruited for the tests. Potential participants were screened for appropriate age, absence of allergies, and citrus consumption frequency. The children’s age range was chosen based on the cognitive abilities required to use hedonic scales, intensity scales, and other sensory measures.34 Children were accompanied by a parent or guardian at all times but were seated in their own booth. The adult participants completed an online questionnaire that collected information on their demographics, consumption habits, and psychographics.

The tasting sessions took place in the Silverado Vineyards Sensory Theater of the Robert Mondavi Institute for Wine and Food Science at the University of California, Davis. In total, 193 adults and 69 children participated in the tasting. Each consumer was given a double-sided test ballot for evaluation of each sample, plain crackers, water, and napkins. The fruit were served as two one-sixth wedges cut from the same fruit, in soufflé cups placed on the tray in a fully balanced sequential monadic order. Arrows on each tray, matched to the ballots, ensured that consumers tasted the samples in the designed order. This study was approved for the use of human subjects by the Institutional Review Board of the University of California, Davis, and all consumers consented to participate in the tasting session.

Adult consumers rated their degree of liking on the 9-point hedonic scale for appearance, overall liking, flavor, and texture, as well as the adequacy of sweetness, sourness, firmness, and juiciness on a 5-point Just-About-Right scale. Child consumers rated their overall liking and their liking of the appearance, taste, and texture of the fruit on a smaller, 7-point hedonic scale, as well as the adequacy of sweetness, sourness, firmness, and juiciness on a 3-point Just-About-Right scale. Check-All-That-Apply questions combining hedonic and descriptive attributes were also presented to the consumers; the CATA terms are shown in Supplementary Tables 1 and 2. Ballots were based on past consumer citrus evaluations.18,33 The mix of liking, JAR, and CATA questions allowed mixed consumer analyses and comparisons with the descriptive panel, and such combinations have proven effective in the past.19,35

1H nuclear magnetic resonance of non-volatile components

The frozen juices were thawed at room temperature and centrifuged. A 500 µL aliquot of juice supernatant was filtered using an Amicon Ultra-0.5 Centrifugal Filter Unit with a 3 kDa cut-off (MilliporeSigma, Burlington, MA), which had been previously washed with deionized water. Two hundred and seven microliters of filtrate were mixed with 27 µL of an internal standard of 5 mM 3-(trimethylsilyl)-1-propanesulfonic acid-d6 (DSS-d6) in >98% D2O (Chenomx, Edmonton, AB, Canada). The pH of the samples was adjusted to 6.8 ± 0.1, and 180 µL of sample was transferred to a 3 mm NMR tube. 1H-NMR spectra were acquired on a Bruker Avance 600 NMR spectrometer at 298 K using the Bruker “noesypr1d” pulse program as previously described.36 The resulting spectra were processed and profiled using Chenomx NMR Suite v8.3 as described.36 Each compound was quantified as described in Weljie et al.,37 based on the concentration of the DSS-d6 internal standard.
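The arithmetic behind internal-standard quantitation can be illustrated with a simplified integral-ratio calculation. This is only a sketch: Chenomx profiling fits reference spectra rather than integrating single peaks, the 9-proton DSS methyl signal is assumed as the reference, and the small volume added during pH adjustment is ignored. Only the 207 µL and 27 µL volumes and the 5 mM stock come from the text; the peak areas are invented.

```python
DSS_PROTONS = 9  # equivalent methyl protons of the DSS reference signal at 0 ppm


def diluted_mM(stock_mM, added_uL, other_uL):
    """Concentration of a stock after mixing added_uL of it with other_uL of liquid."""
    return stock_mM * added_uL / (added_uL + other_uL)


def tube_mM(area, n_protons, dss_area, dss_mM):
    """Analyte concentration in the NMR tube from its per-proton area relative to DSS."""
    return (area / n_protons) / (dss_area / DSS_PROTONS) * dss_mM


FILTRATE_UL, STANDARD_UL = 207.0, 27.0
dss_mM = diluted_mM(5.0, STANDARD_UL, FILTRATE_UL)  # about 0.58 mM in the tube

# Hypothetical 1-proton signal integrating to 2/9 of the DSS signal.
in_tube = tube_mM(area=2.0, n_protons=1, dss_area=9.0, dss_mM=dss_mM)
in_juice = in_tube * (FILTRATE_UL + STANDARD_UL) / FILTRATE_UL  # undo dilution by the standard
print(round(dss_mM, 3), round(in_tube, 3), round(in_juice, 3))
```

The last step matters because adding 27 µL of standard dilutes the filtrate by a factor of 234/207, so concentrations read in the tube would otherwise understate those in the juice by about 12%.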

Chemicals

Authentic GC standards were purchased from Sigma-Aldrich (St. Louis, MO), Molekula Group, LLC (Santa Ana, CA), Acros Organics (Pittsburgh, PA), and TCI America (Portland, OR).

HS-SPME-GC-MS-O analysis of volatile components

Volatile compounds in the orange juice samples were extracted using headspace solid-phase microextraction (HS-SPME). A 5 mL aliquot of freshly squeezed orange juice was transferred into a 40 mL vial, and 1.80 g NaCl and 2 µL octyl acetate (0.05 µg/µL in methanol, internal standard) were added to each sample. The juice samples were gently agitated using a stir bar and placed in a 40 °C water bath (Baxter Scientific Products, Cincinnati, OH, USA) at 200 rpm for 30 min. After equilibration, a StableFlex fiber (2 cm, 50/30 µm, divinylbenzene/Carboxen™/polydimethylsiloxane; Supelco, Bellefonte, PA, USA) was exposed to the vial headspace at 40 °C for 30 min. Volatile compounds were analyzed using a PerkinElmer Clarus 680 gas chromatograph (PerkinElmer, Waltham, MA) equipped with a PerkinElmer Clarus SQ 8T mass spectrometer (PerkinElmer). The mass spectrometer was operated in electron impact ionization mode with an ionizing energy of 70 eV. Helium was used as the carrier gas at a constant pressure of 30 psi, as calculated using PerkinElmer Swafer utility software (PerkinElmer). Chromatographic separation was achieved on a TR-FFAP capillary column (30 m × 0.32 mm i.d., 0.25 µm film thickness; Chrompack, Mühlheim, Germany). The injection port temperature was set at 280 °C, and the helium flow rate was 1.5 mL/min. The initial oven temperature of 40 °C was held for 2.0 min, increased to 230 °C at 5 °C/min, and finally held at 230 °C for 10 min. The mass spectrometer scan range was m/z 50 to 300. Aroma-active compounds were detected at the sniffing port of the GC-MS-O. Data collection was performed using TurboMass (v6.1.0, PerkinElmer). Volatile compounds were identified by comparing their linear retention index (LRI) values and mass spectra with the NIST library (National Institute of Standards and Technology, Gaithersburg, MD, USA). A mixture of n-alkane standards (C7–C30) was also analyzed to calculate retention indices. Additionally, authentic standards were run on the TR-FFAP column, and their retention times were used to confirm compound identity. To estimate the amount of each volatile in the Navel oranges, a semi-quantification method was used based on the analyte to internal standard peak area ratio and the known concentration of the internal standard.
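The retention index and semi-quantification steps reduce to two short formulas, sketched below: the van den Dool and Kratz index for temperature-programmed runs, and an analyte-to-internal-standard area ratio that assumes a response factor of 1 for every compound. The alkane retention times and peak areas are invented; the 0.1 µg of octyl acetate in 5 mL of juice follows from the amounts stated above. Dividing µg/L by molar mass (g/mol) gives µM, the unit of Table 1; the octanal molar mass is used only as an example.

```python
def linear_retention_index(t_x, alkanes):
    """Van den Dool and Kratz retention index for temperature-programmed GC.

    alkanes: (carbon_number, retention_time) pairs for the n-alkane ladder,
    sorted by retention time. With consecutive alkanes (n_next = n + 1) this
    reduces to 100 * (n + (t_x - t_n) / (t_next - t_n)).
    """
    for (n, t_n), (n_next, t_next) in zip(alkanes, alkanes[1:]):
        if t_n <= t_x <= t_next:
            return 100 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))
    raise ValueError("retention time lies outside the alkane ladder")


def semi_quant_ug_per_L(area_x, area_is, is_ug, sample_mL):
    """Analyte level assuming the same detector response as the internal standard."""
    is_ug_per_L = is_ug / (sample_mL / 1000.0)
    return area_x / area_is * is_ug_per_L


IS_UG = 2.0 * 0.05  # 2 uL of octyl acetate at 0.05 ug/uL = 0.1 ug per vial
ladder = [(12, 10.0), (13, 12.0), (14, 14.0)]  # hypothetical retention times, min
print(linear_retention_index(11.8, ladder))
conc = semi_quant_ug_per_L(area_x=30000, area_is=60000, is_ug=IS_UG, sample_mL=5.0)
print(conc, conc / 128.21)  # ug/L, then umol/L (uM) using octanal's molar mass, g/mol
```

Because the response factor is assumed rather than measured for each compound, values obtained this way are best read as relative amounts for comparing oranges, which is how they are used in the clustering and regression.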

Data analysis

The significance level (α) was set at 0.05 for all statistical tests. All data analysis was performed using R (Version 3.5.2, R Core Team, Vienna, Austria).

Chemical data for volatile and non-volatile compounds were evaluated individually by analysis of variance (ANOVA). Any compound that did not differ significantly (P ≤ 0.05) across the fruit was removed from further analysis. To compare with consumer cluster preferences, the compound intensities were scaled, centered, and hierarchically clustered using Euclidean distances and Ward’s method.38 This grouped the chemical compounds into clusters and reduced multicollinearity in the dataset. The method of clustering chemical attributes is based on a past analysis of red wine.39 Principal component analysis was also performed on the significant, scaled chemical values. More detailed information regarding the F-values and effect sizes of the chemical compounds is shown in Supplementary Table 4.
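The screening step can be sketched as a one-way ANOVA per compound. To stay within the Python standard library, the sketch compares F with a critical value looked up for the relevant degrees of freedom instead of computing a P value, which a statistics package would normally do; the three oranges, duplicate measurements, and compound values are invented. Compounds with a replicate below the detection limit are skipped, mirroring the note in the Results that such compounds could not be compared.

```python
def one_way_anova_f(groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA.

    groups: one list of replicate measurements per orange sample.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def significant_compounds(data, f_critical):
    """Names of compounds whose F statistic reaches the supplied critical value.

    Compounds with any replicate below the detection limit (None) are skipped,
    because the ANOVA cannot be computed for them.
    """
    kept = []
    for name, groups in data.items():
        if any(v is None for g in groups for v in g):
            continue
        f_stat, _, _ = one_way_anova_f(groups)
        if f_stat >= f_critical:
            kept.append(name)
    return kept


# Hypothetical duplicate measurements for three oranges.
data = {
    "glucose": [[100, 102], [150, 148], [120, 121]],
    "hexanal": [[1.0, 1.4], [1.2, 1.1], [1.3, 1.0]],
    "decanal": [[0.2, None], [0.3, 0.4], [0.2, 0.3]],
}
# Critical F for alpha = 0.05 with 2 and 3 degrees of freedom, from an F table.
print(significant_compounds(data, f_critical=9.55))
```

Only compounds passing this filter would go on to scaling and clustering, so the chemical clusters describe variation that actually distinguishes the oranges.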

For the descriptive analysis data, a three-factor MANOVA (judges, replications, products) was performed on all attributes, followed by three-factor ANOVAs on each attribute using a pseudo-mixed model to test for product significance.40 The pseudo-mixed model uses the judge-by-product and replication-by-product interactions as the denominator when testing the significance of the product effect. Fisher’s least significant difference (LSD) test followed ANOVA to separate the means of the different Navel oranges, using the agricolae package (v1.2-8). More detailed statistical information regarding F values and effect sizes is shown in Supplementary Table 3.

For the consumer data, univariate statistics were computed for the hedonic questions. As is often done with consumer liking data measured on the 9-point hedonic scale, ANOVA and Fisher’s LSD were used to determine differences in overall liking,17 and principal component and cluster analyses were then performed for preference mapping. Just-About-Right data were compared across clusters or products using rating proportions and mean drop penalty analysis.19 CATA scores were analyzed across products with Cochran’s Q test41 using the RVAideMemoire package (v0.9-63-3). More detail regarding the Q values for the CATA attributes for the adult consumers is shown in Supplementary Table 5. For preference clustering, overall liking values were scaled and a Euclidean distance matrix was computed between the consumers. The consumers were then clustered using Ward’s method.38 The clusters were validated using a two-way ANOVA with cluster and product as main effects.
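Cochran’s Q, applied above to each CATA term, has a closed form: Q = (k − 1)(k ΣC_j² − N²) / (kN − ΣR_i²), where C_j are the product totals, R_i the consumer totals, N the grand total, and k the number of products. The study used RVAideMemoire in R; the Python function below is an independent sketch of this textbook statistic with invented data, and it stops at Q rather than computing a P value.

```python
def cochran_q(table):
    """Cochran's Q statistic for one CATA term.

    table: one row per consumer and one 0/1 column per product (1 = term checked).
    Returns (Q, degrees of freedom); under the null hypothesis of equal citation
    rates, Q is approximately chi-square distributed with k - 1 degrees of freedom.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when each consumer checked the term for all or none of the products")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / denominator
    return q, k - 1


# Hypothetical "Juicy" citations by four consumers for three oranges.
juicy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(cochran_q(juicy))
```

Here Q = 3.5 on 2 degrees of freedom, below the 0.05 chi-square critical value of about 5.99, so this invented term would not differ significantly among the products. Consumers who checked a term for every product or for none contribute nothing to the denominator, which is why the test focuses on within-consumer differences.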

PLS regressions (PLS1 and PLS2) were used to model consumer liking as a function of the descriptive and chemical data, and descriptive ratings as a function of the chemical data; they were generated using the plsdepot package (v0.1.17) with a ggplot2 (v3.1.0) wrapper.
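The percentages reported for t1 and t2 in the figure captions are, for PLS1, the share of the response variance explained by each latent component. The NIPALS sketch below shows that computation in plain Python, assuming autoscaled predictors and a centered response. We have not checked that plsdepot uses exactly these scaling choices, and the five products and three predictor columns are invented.

```python
import math


def autoscale(column):
    """Center a column to mean 0 and scale it to SD 1 (sample SD)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def pls1(X, y, n_components=2):
    """PLS1 regression by NIPALS.

    X: rows are products, columns are predictors (autoscaled here).
    y: one response per product (centered here).
    Returns the score vector t and the share of y variance explained
    for each extracted component.
    """
    n, p = len(X), len(X[0])
    cols = [autoscale([row[j] for row in X]) for j in range(p)]
    E = [[cols[j][i] for j in range(p)] for i in range(n)]  # X residuals
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]  # y residuals
    ss_y = sum(v * v for v in f)
    scores, r2 = [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # weights ~ X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # y is already fully explained
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by what this component explains.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        scores.append(t)
        r2.append(q * q * tt / ss_y)
    return scores, r2


# Hypothetical chemical cluster averages (columns) and mean liking for five products.
X = [[1.0, 0.2, 3.1], [2.0, 0.1, 2.9], [3.0, 0.4, 2.2], [4.0, 0.3, 2.0], [5.0, 0.5, 1.5]]
y = [5.8, 6.1, 6.6, 6.9, 7.3]
scores, r2 = pls1(X, y, n_components=2)
print([round(v, 3) for v in r2])
```

Each component removes the part of y it explains before the next is extracted, so the per-component shares add up to at most 100%, as in the captioned pairs such as 57% and 38%.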

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.