## Introduction

Olive oil (OO) is one of the oldest and most essential edible oils traded commercially in human history. Olive oils are commonly classified as extra-virgin olive oils (EVOOs), virgin olive oils (VOOs), or oils mixed with refined olive oils (refined OOs) (Fig. 1A), depending on, among other factors, their fatty acid (FA) profiles and trace compounds (e.g. the concentration of free fatty acids (FFA) or acid value (AV)1,2,3,4, and phenolic compounds5). FAs are predominantly defined by their saturation levels, i.e. saturated fatty acids (SAFA), monounsaturated fatty acids (MUFA), and polyunsaturated fatty acids (PUFA) (Fig. 1B). The FFA content is influenced by a number of phytosanitary factors and extraction processes6,7,8. As a consequence of variation in processing (e.g. poor olive quality or an inadequate extraction process), structural breakdown of triacylglycerols may occur (for example, through hydrolysis induced by high temperature and moisture9), resulting in an increase in the final acidity of the oils3,4 (Fig. 1C, D).

The high demand for OO stems from its multiple nutritional benefits and its irreplaceable organoleptic properties10,11. Olive oil is one of the most frequently adulterated food products because of its high consumer appeal and large profit margin12,13. The highly desired and expensive EVOO is frequently diluted with cheaper oils, leading to indirect economic consequences and health concerns. Hence, olive oil has been the subject of rigorous quality regulation, with its standard characteristics set by tight legislation.

Laboratory-based methods, such as chromatography13,14,15,16, spectroscopy12,17,18,19,20,21, or DNA analysis13,22,23, have been extensively developed to reduce cases of adulteration. Nuclear magnetic resonance (NMR) spectroscopy in the high-field frequency domain has also been proposed as an effective method for authentication, quality control, and adulteration detection of the oils. High-field NMR, however, has a number of drawbacks, such as the requirement for large, dedicated laboratory facilities with costly cryogenic cooling gases, complicated pre-analysis steps, and the need for a highly specialized workforce17,24,25. None of the above-mentioned detection methods is simple to use, requires minimal sample preparation, or offers a short turn-around time.

We have recently demonstrated that two-dimensional time-domain NMR can be used to classify edible oils based on their physicochemical composition (e.g. saturation levels) with much higher accuracy than conventional methods26. The low-field NMR-based point-of-care (PoC)27,28,29,30,31,32 analysis pairs the longitudinal (T1) and transverse (T2) relaxation times, which significantly improves the sensitivity and specificity of detection. It works on the rationale that the cumulative characteristics of each dimension form a specific and unique signature, in a way similar to the radiomics technique developed in the field of radiology.

In this work, we demonstrate that NMR-based phenotypic traits in the time domain (at the molecular level) can be used to classify OOs. Using just a single droplet and a benchtop-sized NMR system33,34,35, olive oils can be rapidly classified (into EVOOs, VOOs, or refined OOs) in a non-destructive manner (i.e. label-free and without sample pre-treatment). The subtle differences in the physicochemical composition and molecular microenvironment of the olive oils induce substantial changes in the relaxation mechanism in the time-domain NMR regime (Fig. 1E, F). With the aid of machine learning, classification using T1 and T2 relaxation reached AUC = 0.95, higher than the current gold standards, near-infrared spectroscopy (NIRS, AUC = 0.84) and ultraviolet-visible spectroscopy (UV-Vis, AUC = 0.73) (Table 1), with comparable or slightly better performance in the identification of regions of origin (Table 2). In addition, the proposed NMR-based detection method is much cheaper per assay, user-friendly, and usable at the point of detection (Table 3). This work demonstrates the spirit of combining an (old-fashioned) machine with (new-wave) machine learning to produce an ‘intelligent machine’30,36,37, an attractive scientific solution for the food science community.

## Results

### Rapid identification and characterization of olive oils with NMR-based PoC

To demonstrate an industrial application, we used the proposed technique to distinguish authentic EVOOs from VOOs and refined OOs (Fig. 2). The relaxometry measurements and acidity determination (details in Methods) were performed on thirty-six types of OOs (i.e. 21 EVOOs, 8 VOOs, and 7 refined OOs) without disclosing the manufacturer's label or country of origin. For each sample, the relaxation measurements were carried out using five different samplings, with the refined OOs serving as the control experiment.

A two-dimensional T1-T2 magnetic state diagram was used to map the clustering of the oils based on their composite intrinsic relaxation properties, thereby forming a calibration standard for the EVOOs, VOOs, and refined OOs, with (T2, T1) coordinates of (150.5 ms, 168.0 ms), (153.2 ms, 174.4 ms), and (146.3 ms, 162.8 ms), respectively (Fig. 2A and details in Supplementary Fig. 1). The oil types were significantly clustered (P < 0.005), indicating that the variation within samplings was much smaller than the variation between OO types (Fig. 2B). The detailed breakdown for each commercial brand is shown as a heatmap (Fig. 2C). In addition, the receiver operating characteristic (ROC) analysis (Fig. 2D and Supplementary Fig. 2) indicated that the relaxometry measures have excellent detection sensitivity and specificity, with an area under the curve (AUC) of 0.95, compared with 0.84 for NIRS and 0.73 for UV-Vis (Table 1).
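As an illustration of how such a calibration standard could be used, the following minimal sketch assigns a new measurement to the class whose quoted coordinates are nearest. It assumes the three coordinate pairs above are ordered (T2, T1), as in the Methods, and it uses a plain Euclidean nearest-centroid rule, which is a stand-in chosen for illustration and not the classifier reported in this work. The function name `nearest_class` and the example measurement are hypothetical.

```python
import math

# (T2, T1) coordinates in ms quoted in the text for each oil class.
# Assumption: the pairs are ordered (T2, T1), as in the Methods.
CENTROIDS = {
    "EVOO": (150.5, 168.0),
    "VOO": (153.2, 174.4),
    "refined OO": (146.3, 162.8),
}


def nearest_class(t2_ms, t1_ms):
    """Return the class whose quoted (T2, T1) point is closest (Euclidean)."""
    return min(
        CENTROIDS,
        key=lambda name: math.dist((t2_ms, t1_ms), CENTROIDS[name]),
    )


# Hypothetical measurement, not taken from the paper's data.
print(nearest_class(151.0, 169.0))  # prints "EVOO"
```

A nearest-centroid rule ignores the spread of each cluster, so it should be read only as a way to visualise the calibration idea on the T1-T2 map.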

### Identification of OO based on the regions of origin

We next assessed the feasibility of using the proposed NMR analysis to identify OOs by their country (or region) of origin. Apart from genotypic variation, the variation in phenotypic traits is governed by a number of factors, such as migration drift (e.g. diversification and domestication events)38 and abiotic stress (e.g. local climate, soil conditions)39,40. To identify the regions of origin, a data subset encompassing four European countries (i.e. 3 oils from Greece, 4 from Italy, 9 from Portugal, and 5 from Spain) was mapped using the two-dimensional T1-T2 magnetic state diagram (Fig. 3; details of each oil in Supplementary Fig. 3 and Supplementary Table 1).

The mean T1 relaxation times were (166.3, 166.7, 168.9, and 168.9) ms and the mean T2 relaxation times were (147.7, 150.1, 150.2, and 151.0) ms for (Greece, Spain, Italy, Portugal), respectively (Fig. 3, and details in Supplementary Fig. 3). The region-based identification with the NMR technique gave AUC = 0.71, comparable to NIRS (AUC = 0.70) and UV-Vis (AUC = 0.69) (details in Table 2). Interestingly, when a pair-wise comparison matrix (i.e. pair-wise ROC-AUC evaluation) is built from the NMR-based traits (e.g. T1, T2, A-ratio), it resembles the geographical arrangement of the countries (Fig. 4A). For example, the pair-wise AUCs, shown as a heatmap (Fig. 4B), were 0.74 for Greece-Italy, 0.84 for Greece-Spain, and 0.89 for Greece-Portugal. The Iberian pair (i.e. Spain-Portugal) and the Italy-Greece pair displayed stronger similarities, with AUCs of 0.69 and 0.74, respectively. This is to be expected, as neighbouring countries are likely to have a much higher rate of species exchange owing to their geographical proximity. Each purchased oil also carried unique information about its location (Fig. 4C and D).

### Limit of detection of NMR-based traits technique

We evaluated the limit of detection of the NMR-based traits by mixing sunflower oil into a selected EVOO to mimic cases of adulteration. For each sample, the relaxation measurements were conducted in duplicate using five different samplings, covering from 0% (as control) to 100% of OO in the mixed edible oil (Supplementary Fig. 4). As indicated in the T1-T2 magnetic state diagram, a linear relation (r² = 0.93) was observed between the NMR-based traits and the concentration of sunflower oil (PUFA-rich) added to the EVOO (MUFA-rich), with the relaxation effect becoming clearer as the saturation level decreased. The (T2, T1) coordinates were (188.3, 202.9) ms and (155.3, 174.6) ms for the sunflower oil and the EVOO (controls), respectively. The limit of detection for the NMR-based traits was approximately 1%, comparable to NIRS (1%) and much better than UV-Vis spectroscopy (5%) (details in Supplementary Fig. 4).
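The reported linear relation suggests a simple way to read an adulteration level off the T1-T2 map. The sketch below is illustrative only: it assumes that a mixture lies on the straight line joining the two end points quoted above (in ms) and that position along this line scales with the sunflower oil fraction. The function name `sunflower_fraction` is hypothetical, and the actual calibration in Supplementary Fig. 4 may differ.

```python
# (T2, T1) end points in ms, as quoted in the text.
EVOO = (155.3, 174.6)        # control EVOO
SUNFLOWER = (188.3, 202.9)   # pure sunflower oil


def sunflower_fraction(t2_ms, t1_ms):
    """Project a measured (T2, T1) point onto the EVOO-sunflower line.

    Returns the position along the line, clipped to the range 0..1.
    Assumption: the relaxation times vary linearly with the mixing fraction.
    """
    dx = SUNFLOWER[0] - EVOO[0]
    dy = SUNFLOWER[1] - EVOO[1]
    t = ((t2_ms - EVOO[0]) * dx + (t1_ms - EVOO[1]) * dy) / (dx * dx + dy * dy)
    return min(1.0, max(0.0, t))


# Hypothetical reading halfway between the two end points.
half = sunflower_fraction(171.8, 188.75)
```

Under this linear assumption, a 1% admixture shifts T2 by about 0.33 ms (1% of the 33.0 ms span between the end points), which gives a sense of the measurement repeatability needed to reach the quoted limit of detection.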

## Discussion

We report an NMR-based point-of-care technology for fast, label-free, and distinctive OO profiling and quality assurance, which can be used to deter adulteration attempts. The NMR-based phenotypic traits represent the intrinsic molecular relaxation dynamics (or molecular mobility) arising from the composite effect of the FA profiles (e.g. saturation level) and the concentration of FFA (e.g. acid value). Firstly, although OOs consist predominantly of monounsaturated fat (more than 70%), we found in this work that the overall saturation levels (e.g. increasing PUFA/MUFA ratio, lower SAFA content) have a profound impact on the NMR traits (details in Supplementary Fig. 5). Secondly, we observed that the FFA concentration has a direct effect on the NMR-based phenotypic traits. We hypothesize that both act through a similar mechanism, i.e. the saturation levels and FFA concentration disrupt the packing41,42 ‘efficiency’ (i.e. weakening of van der Waals forces), leading to a change in molecular mobility and hence to much longer relaxation pathways (i.e. longer T1 and T2). This is in agreement with the recent work reported by Cistola43.

Conventional chromatography-based techniques are slow and require multiple complicated sample preparation steps with expensive laboratory equipment, while spectroscopic techniques (e.g. vibrational or Raman spectroscopy) require complicated chemometric analysis for in-depth data interpretation, in comparison with the proposed NMR-based detection method and other state-of-the-art technologies (refer to the SWOT-like Table 3). Interpreting the information derived from analytical instruments is one of the major challenges faced by food scientists in identifying and classifying pure and adulterated food samples. With the introduction of the EU Protected Designation of Origin registration and its equivalents in other geographical locations, rapid classification (preferably in a non-destructive manner) of EVOOs will be invaluable to industry and regulatory agencies alike.

On the other hand, the proposed NMR-based technology provides rapid, precise, low-cost, label-free, and accurate analysis for grading olive oil quality using NMR-based phenotypic traits in the time domain. The central hypothesis of radiomics is that tissue characteristics and pathology can be decoded by examining the textural features in medical images. Similarly, clustering NMR works on the rationale that the cumulative characteristics of each dimension form a specific and unique signature (‘molecular fingerprint’), which is extremely powerful for rapid and accurate classification of OOs. In addition, with the introduction of machine learning, it is now inexpensive to process large datasets in almost real time, opening the door to intelligent machines that can make interpretations with much higher sensitivity and specificity.

## Methods

### Details and sample preparation of the OOs

The OOs analyzed were cooking oils bought locally in Braga, Portugal, or purchased online (e.g. international brands). The commercial brand names are disclosed in Supplementary Table 2. No further processing was carried out before the NMR or any other measurements.

### NMR measurements and parameters

The 1H magnetic resonance measurements of the olive oils were acquired at a resonance frequency of 21.7579 MHz, with polarization provided by a portable permanent magnet (Metrolab Instruments, Switzerland), B0 = 0.5 T, using a benchtop-type console (Kea Magritek, New Zealand). A temperature controller maintained the measurement chamber at 30 °C. The T1 and T2 relaxation times were acquired using standard inversion recovery (IR) and Carr-Purcell-Meiboom-Gill (CPMG) pulse train sequences, respectively. The experimental parameters were echo time = 200 μs, number of echoes = 10,000, and signal averaging = 32. A recycle delay of 2 s was set between experiments to allow all molecular spins sufficient time to return to thermal equilibrium. The (T1, T2) relaxation measurements were carried out on commercial EVOOs, VOOs, and refined OOs. NMR measurements were performed blindly on each oil ten times, giving a total of 360 points for olive type classification and 210 points for origin assessment. The clustering NMR methodology uses a pair of relaxation times (T2, T1) for each object (oils in this case) to construct a (pseudo) two-dimensional map (Figs. 2A and 3).
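To make the link between a CPMG echo train and a single T2 value concrete, the sketch below fits a mono-exponential decay to a synthetic, noise-free echo train generated with the echo time and number of echoes given above. The mono-exponential model, the log-linear least-squares fit, the function name `fit_t2`, and the 150 ms test value are all assumptions made for illustration; the fitting procedure actually used for the reported data is not specified here.

```python
import math

ECHO_TIME_S = 200e-6   # echo time from the Methods
N_ECHOES = 10_000      # number of echoes from the Methods


def fit_t2(times_s, amplitudes):
    """Least-squares line through ln(amplitude) versus time.

    Assumes S(t) = S0 * exp(-t / T2), so the slope equals -1 / T2.
    Returns T2 in seconds. All amplitudes must be positive.
    """
    n = len(times_s)
    logs = [math.log(a) for a in amplitudes]
    mean_t = sum(times_s) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return -sxx / sxy


# Synthetic, noise-free echo train with a hypothetical T2 of 150 ms.
times = [(k + 1) * ECHO_TIME_S for k in range(N_ECHOES)]
signal = [math.exp(-t / 0.150) for t in times]
t2_est = fit_t2(times, signal)
print(f"estimated T2 = {t2_est * 1e3:.1f} ms")
```

With real, noisy data the logarithm becomes unreliable once the echoes approach the noise floor, so a non-linear least-squares fit of the exponential itself would usually be preferred.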

### UV-Vis and NIR measurements and detection

UV-Vis measurements were performed on a SHIMADZU UV-2550 spectrophotometer (Kyoto, Japan), while a PerkinElmer LAMBDA 950 instrument was used for the NIR measurements. All samples were measured in matched 1 cm path length quartz or optical glass cells, with an empty cell run as a reference. UV-Vis spectra were measured over the 200–800 nm spectral range at 1 nm spectral resolution, while NIR spectra were obtained over 500–2200 nm in 5 nm steps. Spike removal algorithms44 were applied to the NIR spectra (cut-off = 6, threshold = 10). Every sample was measured three times and the mean values were taken as representative.

### Acid value measurements

The acid value was determined following the EN ISO 660:200945 protocol for oleic acid quantification. Briefly, 10 mL of edible oil was weighed and diluted in 20 mL of ethanol (φ = 99%) with a small amount of phenolphthalein. Titrations with 0.1 mol/L potassium hydroxide (KOH) were carried out under magnetic stirring until a slight colour change appeared (and persisted for more than 10 s). Measurements were performed twice per sample. The acid value, defined as the mass of KOH (in mg) required to neutralize 1 g of the substance, was calculated from the amount of KOH consumed by each sample using the following formula:

$$w_{AV} = \frac{{56.1 \times cV}}{m}$$
(1)

where c is the exact concentration of the standard KOH solution (mol/L), V is the volume of KOH added (mL), and m is the mass (g) of the test portion. Acidity, or the free fatty acid content, can be estimated by:

$$w_{{\mathrm{FFA}}} = \frac{{VcM}}{{10 \times m}} \approx 0.5 \times w_{AV}$$
(2)

where M is the molar mass (g/mol) of the predominant fatty acid in the edible oil, in this case oleic acid (282.47 g/mol).
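Equations (1) and (2) can be checked numerically with the short sketch below. The function names and the titration values (1.5 mL of 0.1 mol/L KOH for a 9.1 g test portion) are hypothetical and serve only to show the arithmetic and the approximate factor of 0.5 between the two quantities.

```python
KOH_MOLAR_MASS = 56.1       # g/mol, as used in Eq. (1)
OLEIC_MOLAR_MASS = 282.47   # g/mol, as quoted in the text


def acid_value(c_mol_per_l, v_ml, m_g):
    """Eq. (1): acid value in mg KOH per g of oil."""
    return KOH_MOLAR_MASS * c_mol_per_l * v_ml / m_g


def ffa_percent(c_mol_per_l, v_ml, m_g, molar_mass=OLEIC_MOLAR_MASS):
    """Eq. (2): free fatty acid content in % by mass, expressed as oleic acid."""
    return v_ml * c_mol_per_l * molar_mass / (10.0 * m_g)


# Hypothetical titration, not taken from the paper's data.
av = acid_value(0.1, 1.5, 9.1)
ffa = ffa_percent(0.1, 1.5, 9.1)
print(f"AV = {av:.3f} mg KOH/g, FFA = {ffa:.3f} %, ratio = {ffa / av:.4f}")
```

The ratio depends only on the two molar masses, 282.47 / (10 × 56.1) ≈ 0.5035, which is the origin of the factor 0.5 in Eq. (2).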

### Machine learning algorithm and workflows

Using statistical software (e.g. Orange 3.1.246 or R), the raw datasets were processed with supervised and unsupervised learning techniques. The machine learning algorithms were written and run on a personal laptop (Intel Core Pentium i7 CPU @ 2.70 GHz, 8.00 GB RAM). Once the machine learning model was built, all the tasks ran simultaneously and were typically completed in less than 1 min. Using unsupervised learning, the relationship between objects was rapidly constructed by clustering analysis (e.g. hierarchical clustering), and the quantitative linkages (e.g. inter-/intra-cluster similarity) were shown on a dendrogram and a heatmap. Supervised learning models (i.e. Neural Network, kNN, Logistic Regression, Naive Bayes, and Random Forest) were trained on the datasets, and the model with the highest accuracy was chosen to predict the object classification (e.g. oil classification) from the training datasets.
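As a minimal sketch of the supervised step, the following code runs a k-nearest-neighbour classifier with leave-one-out validation on synthetic (T2, T1) points. Everything in it is an assumption made for illustration: the data are generated around the class coordinates reported in the Results with a hypothetical spread of 0.5 ms, only kNN is shown, and the accuracy it prints says nothing about the accuracy obtained on the real oils. The original analysis was carried out in Orange or R, not with this code.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of ((t2, t1), label). Majority vote among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Classify each point using all other points; return fraction correct."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)


# Synthetic data: 10 points per class, Gaussian spread of 0.5 ms (assumed).
rng = random.Random(0)
centres = {
    "EVOO": (150.5, 168.0),
    "VOO": (153.2, 174.4),
    "refined OO": (146.3, 162.8),
}
data = [
    ((rng.gauss(c[0], 0.5), rng.gauss(c[1], 0.5)), name)
    for name, c in centres.items()
    for _ in range(10)
]
accuracy = leave_one_out_accuracy(data, k=3)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Repeating the same validation loop for each candidate model and keeping the one with the highest accuracy mirrors the model selection described above.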

### Statistical analysis

A separation between any two groups was considered statistically significant when P < 0.05, and otherwise denoted as non-significant (n.s.). Student's unpaired t-test was used throughout this study. One-tailed or two-tailed tests were used as stated in the figure captions. OriginPro 8 (OriginLab) was used for all graph plotting.
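For reference, the statistic behind Student's unpaired t-test can be written in a few lines. The sketch below computes the pooled-variance t statistic and its degrees of freedom for two hypothetical sets of T1 values; the P value would then be read from the t distribution with the returned degrees of freedom (one- or two-tailed, as appropriate). The function name `student_t` and the sample values are illustrative assumptions.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance (Student) unpaired t statistic and degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical T1 values in ms for two oil groups, not from the paper's data.
group_a = [168.1, 167.6, 168.4, 167.9, 168.0]
group_b = [174.2, 174.9, 174.0, 174.6, 174.3]
t_stat, dof = student_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof}")
```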

### Receiver operating characteristic

ROC analyses were used to evaluate the specificity and sensitivity of the diagnostic techniques. Various supervised models were used for the ROC tests, namely kNN, Logistic Regression, Naïve Bayes, Neural Network, and Random Forest. A power function, $y = ax^{b}$, was fitted throughout the study. Iterations were run with the Levenberg–Marquardt algorithm until a chi-squared tolerance of $10^{-9}$ was reached. The AUC of the final fitted function was compared with the actual AUC averaged over all supervised models (details in Supplementary Fig. 2).
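The two AUC quantities compared here can be illustrated as follows. The sketch assumes the standard rank-based (Mann-Whitney) definition of the empirical AUC, and uses the fact that the area under $y = ax^{b}$ on the interval [0, 1] is $a/(b+1)$ for $b > -1$. The function names and classifier scores are hypothetical, and the Levenberg–Marquardt fit itself is not reproduced.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def power_curve_auc(a, b):
    """Area under y = a * x**b on [0, 1]; valid for b > -1."""
    return a / (b + 1.0)


# Hypothetical classifier scores, not taken from the paper's data.
auc = empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"empirical AUC = {auc:.3f}")
print(f"power-curve AUC (a = 1, b = 1) = {power_curve_auc(1.0, 1.0):.2f}")
```

With a = 1 and b = 1 the fitted curve is the chance diagonal and the area is 0.5, while exponents below 1 bow the curve towards the top-left corner and raise the area.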

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.