Striking essential oil: tapping into a largely unexplored source for drug discovery

Essential oils (EOs) have been used therapeutically for centuries. In recent decades, randomized controlled (clinical) trials have supported efficacy in specific therapeutic indications for a few of them. Some EOs, their components or derivatives thereof have been approved as drugs. Nevertheless, they are still considered products that are mainly used in complementary and alternative medicine. EO components occupy a special niche in chemical space, that offers unique opportunities based on their unusual physicochemical properties, because they are typically volatile and hydrophobic. Here we evaluate selected physicochemical parameters, used in conventional drug discovery, of EO components present in a range of commercially available EOs. We show that, contrary to generally held belief, most EO components meet current-day requirements of medicinal chemistry for good drug candidates. Moreover, they also offer attractive opportunities for lead optimization or even fragment-based drug discovery. Because their therapeutic potential is still under-scrutinized, we propose that this be explored more vigorously with present-day methods.


Analysis of EOCs: introducing the (unique) Core Molecular Constitution of EOCs: (u-)cmcEOCs.
EO analysis by Gas Chromatography with Mass Spectrometry and Flame-Ionization Detection (GC-MS-FID) identified a total of 6,142 EOCs (≥0.10%; n EO = 175), at least at the level of their Core Molecular Constitution (CMC; Box 1); they are further referred to as cmcEOCs and were retained for further analysis. A total of 764 EOCs (≥0.10%; n EO = 175) could not be identified even at their CMC level, resulting in incomplete or no data on the DDPs being studied; therefore, they were not included in further analyses. Because there is an overlap in composition between EOs, many of the 6,142 cmcEOCs are identical, and the whole set can be described with only 627 different InChIKeys-14; these are further referred to as unique-cmcEOCs (u-cmcEOCs; Box 2; SI 2). Approximately 35% (n = 218) of the u-cmcEOCs appear in only one EO. This is in strong contrast with the five most frequently found u-cmcEOCs in our EO set (Fig. 1), i.e. limonene (identified in 153 EOs: n EO = 153; see also SI 2 no 551), alpha-pinene (n EO = 149; SI 2 no 137), beta-myrcene (n EO = 141; SI 2 no 457), beta-caryophyllene (n EO = 139; SI 2 no 319) and beta-pinene (n EO = 133; SI 2 no 523). Together, these results largely correspond with previous findings 57. As a robustness check, the EOC composition of three subsets (A-C) of our EO set was analyzed, i.e. (A) EOs of conventional cultivation (n = 101), (B) EOs of certified-organic cultivation (n = 74), and (C) all EOs of conventional cultivation, complemented with those EOs of certified-organic cultivation that originated from other plant species, the same plant species but from other plant parts, or the same plant species but a different chemotype (n = 141; SI 1). No differences in rank order were found for the five most frequent u-cmcEOCs for the defined EO (sub)sets (SI 3). This indicates that a typical EO consists of a combination of common and rare EOCs, and that the most common EOCs are present in a majority of EOs.
Suitability of EOCs for drug discovery and development: DDPs. DDPs are (derivatives of) common physicochemical parameters. DDFs may use the same DDPs, but possibly with different value ranges. To evaluate the potential lead- and drug-likeness of the u-cmcEOCs present in our EO set, we calculated the values of 13 DDPs, and determined how many times these satisfy the criteria for one or more of the six standard DDFs in our study (Fig. 2 and Table 2). We also show that some DDPs, e.g. log P and log D (pH 7.4) or Muegge's atoms and polar surface area, are, unsurprisingly, very highly correlated and may be interchangeable (SI 4).
More than 90% of the u-cmcEOCs had values within the criteria limits for at least nine out of 13 DDPs, irrespective of the DDF. For only two DDP criteria of one DDF, i.e. the Muegge filter, more than half of the u-cmcEOC DDP values were outside the limits for these criteria.

Box 1
The Core Molecular Constitution (CMC) of a molecule describes i.a. its chemical formula, connectivity and hydrogen positions 83. It is encoded by the first block of 14 (out of 27 total) characters of an InChIKey, subsequently referred to as InChIKey-14.
For example, the EOCs trans-α-bisabolene (InChIKey-14 = YHBUQBJHSRGZNF), α-bisabolene (InChIKey-14 = YHBUQBJHSRGZNF), β-bisabolene (InChIKey-14 = XZRVRYFILCSYSP) and γ-bisabolene (InChIKey-14 = XBGUIVFBMBVUEG) are all identified at least at their CMC level. The InChIKey-14 of the first two EOCs is the same, as stereochemical information such as geometric isomerism is not part of the CMC. This is because information on isomerism is encoded not in the first but in the second block of the InChIKey, which is indeed different for all four aforesaid EOCs (data not shown). In contrast, for an EOC identified at a lower level of granularity, e.g. bisabolene without an α-, β- or γ-prefix, no InChIKey exists, as bisabolene refers to several closely related compounds with a different CMC.

DDFs for bioavailability. For the Ro5 benchmark DDF, all (n = 627) u-cmcEOCs passed when it is assumed that an orally active drug usually has no more than one criterion violation 21, and more than 94% (n = 591) of the u-cmcEOCs passed all four criteria (Tables 1 and 2). According to Lipinski, candidate drugs that meet the Ro5 criteria tend to have a lower attrition rate during clinical development, and therefore have an increased chance of reaching the market 29,58. Approximately 98% (n = 607) of the u-cmcEOCs passed the Veber DDF, which again implies that u-cmcEOCs generally should have good oral bioavailability, assuring good intestinal absorption 22,59. Veber et al. 22 reported that the DDPs polar surface area and rotatable bonds probably discriminated better than the DDP molecular mass between compounds that are orally bioavailable and those that are not. For our sample, we found no significant correlation between the DDPs polar surface area and molecular mass (ρ = 0.07, p > 0.05). We found that the DDPs rotatable bonds and molecular mass are very weakly correlated (ρ = 0.10; p < 0.05), indicating that the Veber DDF captures mostly different information compared to the Ro5 DDF (SI 4). Furthermore, because the DDPs polar surface area and rotatable bonds are only moderately correlated (ρ = 0.46; p < 0.0001), these DDPs capture partially different information (SI 4). Therefore, the DDFs Ro5 and Veber can be considered complementary DDFs, at least for our EO set.
Also, approximately 98% (n = 607) of the u-cmcEOCs passed the Bioavailability DDF (Table 2). This is not entirely surprising, because the Bioavailability DDF is essentially a merger of the Ro5 and Veber DDFs, complemented with the fused aromatic rings DDP, and for this filter only six out of seven criteria must be met (Table 1).
DDF for lead-likeness. DDFs are most often applied to hits from high-throughput screens. However, in order to improve affinity and selectivity of a drug candidate, additional chemical groups are usually added, so that molecular mass and lipophilicity often increase during lead optimization. The Lead Likeness DDF, for example, is biased towards lower lipophilicity and molecular mass, so that interesting lead candidates can be further optimized towards drug-like candidates (Table 1). The standard Lead Likeness DDF uses the DDPs Log D at pH 7.4 or alternatively Log P; approximately 73% (Table 2) or 87% (n = 544; not shown in Table 2), respectively, of the u-cmcEOCs pass this DDF.
DDF for fragment-based drug discovery. Furthermore, a DDF derived from Ro5 appears useful for efficient lead discovery in a fragment-based drug discovery approach, i.e. the Ro3 DDF 24 . About 32% (n = 202) of the u-cmcEOCs pass all 5 DDP criteria of the Ro3 (for criteria see Materials and Methods section). Most u-cmcEOCs that do not pass this DDF fail because the criteria limit(s) of DDP log P and/or DDP rotatable bonds were exceeded in 55.5% (n = 348) and 32.7% (n = 205) of the cases, respectively.
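The Ro3 criteria listed in the Materials and Methods section can be expressed as a simple filter. The following is a minimal sketch and not the JChem (for Office) workflow used in this study; the Descriptors container, the function names and the example descriptor values are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Pre-computed DDP values for one compound (hypothetical container)."""
    log_p: float
    molecular_mass: float
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


# Upper limits of the five Ro3 criteria as given in Materials and Methods.
RO3_LIMITS = {
    "log_p": 3,
    "molecular_mass": 300,
    "h_bond_donors": 3,
    "h_bond_acceptors": 3,
    "rotatable_bonds": 3,
}


def ro3_violations(d: Descriptors) -> list[str]:
    """Return the names of the Ro3 criteria whose upper limit is exceeded."""
    return [name for name, limit in RO3_LIMITS.items() if getattr(d, name) > limit]


def passes_ro3(d: Descriptors) -> bool:
    """A compound passes the Ro3 DDF only if all five criteria are met."""
    return not ro3_violations(d)


# Illustrative values only; they are not taken from the EO data set.
small_polar = Descriptors(1.5, 150.0, 1, 2, 2)
lipophilic = Descriptors(4.2, 204.0, 0, 0, 4)

print(passes_ro3(small_polar), ro3_violations(small_polar))
print(passes_ro3(lipophilic), ro3_violations(lipophilic))
```

Returning the list of violated criteria, rather than only a pass/fail flag, makes it straightforward to tally which DDP is most often responsible for a failure, as done above for log P and rotatable bonds.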
DDFs for drug-likeness. The DDP criteria of the high drug-likeness Ghose DDF are based on an analysis of known drugs from the Comprehensive Medicinal Chemistry database 60, and approximately 60% (n = 377) of the u-cmcEOCs passed this DDF (Table 2). In contrast, less than 10% (n = 59) of the u-cmcEOCs passed the Muegge filter, which tries to differentiate between drugs and non-drugs based on the observation that non-drugs are often under-functionalized 61. The reasons for failing the Ghose or Muegge DDFs were the lower limit of the DDP molecular mass for both DDFs, combined with the DDP Muegge's atoms (i.e. the total number of atoms of a molecule minus the total number of carbon and hydrogen atoms) for the latter (Table 1). Only about 32% of the u-cmcEOCs passed Muegge's atoms criterion (Table 2).
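Muegge's atoms DDP, as defined above, can be computed directly from a molecular formula. The sketch below is only an illustration and assumes a plain formula string without brackets, charges or isotope labels; the function name is hypothetical and this is not the JChem (for Office) calculation used in this study.

```python
import re
from collections import Counter


def muegge_atoms(formula: str) -> int:
    """Total number of atoms minus the number of carbon and hydrogen atoms."""
    counts = Counter()
    # Each match is an element symbol followed by an optional atom count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    total = sum(counts.values())
    return total - counts["C"] - counts["H"]


print(muegge_atoms("C10H16"))    # limonene, a hydrocarbon
print(muegge_atoms("C10H12O2"))  # eugenol
print(muegge_atoms("C10H20O"))   # menthol
```

Pure hydrocarbons such as limonene yield a value of zero, which illustrates why under-functionalized EOCs score low on this DDP.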
In general, only nine out of 627 u-cmcEOCs (SI 2; nos. 33, 99, 159, 165, 226, 233, 251, 571, 584) did not pass any of the DDFs under study, including variants, with the exception of the DDF Ro5 variant, where only three out of four criteria suffice, so that all u-cmcEOCs pass this latter DDF variant. In contrast, eight u-cmcEOCs (SI 2; nos. 8, 73, 371, 407, 490, 510, 543 and 560) passed all ten DDFs (variants) in our study, implying that these u-cmcEOCs passed the most stringent criteria of each DDP in our study.
Comparing EOCs in our sample with approved drugs in DrugBank. The results of the standard DDF analyses of the u-cmcEOCs in our sample and the u-cmcADs (Box 2) in DrugBank were compared (Tables 2 and 3). Except for the Muegge DDF, proportionally more u-cmcEOCs than u-cmcADs passed the individual standard DDFs (Table 2), indicating that overall, EOCs meet the combined criteria of most DDFs at least as well as the drugs on the market. However, proportionally about six times more approved drugs (35.4%) passed all the DDFs compared to the u-cmcEOCs (6.2%). Nonetheless, about 94% of all u-cmcEOCs passed at least four out of six DDFs compared with only 67.5% of the approved drugs. It should be noted that a relatively large proportion of the approved drugs (13.8%) did not pass any of the DDFs (Table 3).
Some EO(C)s made it into DrugBank. DrugBank lists a few u-cmcEOCs approved as drugs in at least one jurisdiction (SI 5). Eugenol, for instance, found routine use as a topical antiseptic in dentistry, as a counter-irritant and for pain control; it is the major EOC of the EO of Syzygium aromaticum, also known as Eugenia caryophyllus. Menthol is used as a local anesthetic, has counter-irritant qualities, and relieves minor throat irritation; it is a major EOC of the EO of Mentha x piperita 62. Moreover, none of the EOCs approved as drugs have ever been withdrawn (SI 5), though a limited number of u-cmcADs (n = 138; 5.8%) have been withdrawn to date. However, it is difficult to draw conclusions from this because the sample of EOCs approved as drugs is too small. We can, nevertheless, conclude that, once approved as a drug, EOCs have stood the test of time. In addition, of the 180 approved drugs in DrugBank for which no InChIKey could be defined, e.g. because they were (among others) complex mixtures of components, we identified at least seven EOs that are approved as drugs, i.e. the EOs of Eucalyptus, turpentine, sage, tea tree, Pinus mugo (needle), Curcuma aromatica (root) and Atractylodes japonica (root). None of these EOs have been withdrawn to date, although this does not imply their efficacy or lack of toxicity.

Discussion
A diverse set of 175 commercially available and chemically defined EOs from a multinational company specialized in scientific aromatherapy was selected for analysis. Possible advantages are that (i) they must meet a minimum number of (quality) requirements, and (ii) when they are purchased from the same reliable source, analysis procedures and handling of the EOs have been standardized where possible, hence minimizing variation. A possible disadvantage is selection bias, i.e. the exclusion of non-commercial EOs, or bias against some EOs because they are e.g. toxic, insufficiently marketed, locally regulated, not available in sufficient quantities, or considered insufficiently interesting from the company's viewpoint. The quality control parameters of our set of commercial EOs (SI 6) coincide with previous findings where no distinction was made between commercial and non-commercial EOs 36,49,63–65. From a drug discovery perspective, however, the investigation of non-commercial, uncommon and toxic EOs merits attention as they are potentially an interesting source of possibly unknown and rare lead- and drug-like EOCs.
We found it useful to define the CMC of an EOC as its InChIKey-14 (Box 1); this permitted e.g. rapid and effective deduplication. This CMC contains in a coded manner essential information about the structure and composition of a molecule, but without data on isomerism (which are encoded by the remainder of the InChIKey). We are not aware of earlier uses of the CMC as we defined it, and believe that it may be useful for the chemoinformatic analysis of other compounds.
In this study, each component with an InChIKey-14 (n = 627) was considered an EOC. Two of these InChIKeys-14, however, belong to molecules that are not considered EOCs, but are commonly found in EOs due to procedural contamination or fermentation, i.e. acetone (SI 2; no. 58) and ethanol (SI 2; no. 254). Eliminating these two molecules would not have influenced our conclusions, as their number is small (0.32%) compared to the entire sample.
discriminate drugs from non-drugs very well. The requirement for multiple pharmacophore points fits with the idea that these groups confer binding specificity. Although the clinical effects of EO(C)s often suggest sufficient therapeutic specificity, functional groups could be added to lead-like EOCs to increase their specificity for the desired molecular target. As these groups will also increase the molecular mass, this will also help to satisfy the second of Muegge's DDPs. Since most EOCs tend to be small, they provide ample room for adding functional groups before running foul of other DDP criteria based on molecule size. Therefore, we think that the Muegge DDF should not be a limiting filter for the evaluation of EOCs. The fact that relatively few EO(C)s made it into approved drugs could be due to their unusual properties. However, when these properties are benchmarked against various measures of drug-likeness, most EOCs pass with flying colors. For example, all u-cmcEOCs passed the Ro5 when only three of the four criteria had to be met. In addition, almost 94% of the EOCs passed at least any four out of six standard DDFs. Because DDFs are based on marketed drugs, it was expected that many approved drugs in DrugBank would pass most, but not necessarily all DDFs. The Lead Likeness and Ro3 DDFs, for example, were developed to search for lead molecules, and therefore not necessarily for marketed drugs that have already passed this development stage. Conversely, it was expected that most u-cmcEOCs would not pass all six DDFs.
One of the major drawbacks, however, in the transition of a natural compound from a hit to a drug is the increased amount of compound required, which often cannot be met by re-isolation from the relevant plant sources 34,66. EO production and distribution, however, is a mature industry, and EOs were the 446th most-traded product in the world in 2017, with a total export value of $5.44 billion (SI 7) 67. Therefore, if necessary, EO production can be relatively easily scaled up, with or without the use of biotechnology 68, while medicinal chemists 69 can find a way to synthesize the EOC and derivatives thereof 70.
In the end, this suggests that EOCs are promising (sources of) new drugs and deserve more attention in the future. EOCs also have unique properties that might be useful for some therapeutic applications, e.g. for lung or airway diseases 71–74, for transdermal administration 75,76 and for diseases of the central nervous system 77–79.

Materials and Methods
Essential oils (EOs).
A set of 175 EOs, representing a cross-section of what is currently commercially available, was retained for further data analysis from a sample of 188 chemotypically defined EOs, obtained from Pranarôm International S.A. (Belgium). Chemical composition, quality and origin of the EOs were certified by the company. The reduction from 188 EOs to our final set of 175 is essentially due to deduplication 54. EOs were considered different when originating from (i) other plant species, (ii) the same plant species but from other plant parts, (iii) the same plant species but a different chemotype, and (iv) certified-organic versus conventional cultivation. Chemical analyses of the EOs were performed by GC-MS-FID using the NF ISO 11024-1/2 standard (Pranarôm International S.A., personal communication). The residue levels of organophosphorus and organochlorine pesticides were in compliance with the relevant EU legislation, and maximum permitted levels were never exceeded 56. The chemical composition (≥10%) and metadata of the EOs used in this study were reported previously; see www.nature.com/articles/s41598-018-22395-6 under the heading electronic supplementary material 54. More detailed analysis certificates and the methodology used (in French) can be consulted at www.inula-group.com/fr/pranaquality (see also SI 1).

Data preparation, calculations and visualisation of the EO set.
Initially, all GC-FID peaks ≥ 0.01% from the EO set (n = 175) were considered. After a preliminary evaluation, only peaks ≥ 0.10% were retained for further analysis because many GC peaks < 0.10% were not or only partially identifiable. Subsequently, any EOC that was at least identifiable at its core molecular constitution (CMC; see also Box 1) was retained for further analysis, and a standard International Chemical Identifier (InChI) along with the corresponding hashed 27-character counterpart, i.e. InChIKey, was assigned using publicly accessible databases, e.g. ChemSpider, PubChem or Chemistry WebBook 80,81. Only the first 14 characters of the InChIKey (InChIKey-14) of each EOC were retained, thereby removing additional layers of information other than the CMC of the EOC (cmcEOC). After deduplication of the cmcEOCs (u-cmcEOCs), the unique InChIKeys-14 of the u-cmcEOCs were retained for further analysis. To display the molecular structures of the u-cmcEOCs with Marvin (SI 2), a structure-data file was created with ChemMine Tools using the Simplified Molecular-Input Line-Entry System (SMILES) notation of the u-cmcEOCs. To this end, each unique InChIKey-14 was first complemented with an information-neutral second InChIKey block, i.e. UHFFFAOYSA, to re-establish a full InChIKey that was then translated with JChem (for Office) into SMILES 81,82. RapidMiner Studio was used for data preparation and data blending.
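The truncation and deduplication steps described above can be sketched as follows. This is a minimal illustration and not the RapidMiner Studio workflow used in this study; the function names are hypothetical. The first blocks are those quoted in Box 1, whereas the block "ZZZZZZZZSA" is a made-up placeholder standing in for a stereo-specific second block. The trailing "-N" (the protonation character of a standard 27-character InChIKey) is our assumption for neutral molecules.

```python
def inchikey14(inchikey: str) -> str:
    """Return the first block (14 characters) of a 27-character InChIKey."""
    first_block = inchikey.split("-")[0]
    if len(inchikey) != 27 or len(first_block) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return first_block


def unique_cmc(inchikeys):
    """Deduplicate on InChIKey-14 while preserving first-seen order."""
    return list(dict.fromkeys(inchikey14(key) for key in inchikeys))


def neutral_inchikey(key14: str) -> str:
    """Re-establish a full InChIKey using the information-neutral second block."""
    return f"{key14}-UHFFFAOYSA-N"  # "-N" assumed: neutral protonation state


keys = [
    "YHBUQBJHSRGZNF-UHFFFAOYSA-N",  # alpha-bisabolene, no stereo layer
    "YHBUQBJHSRGZNF-ZZZZZZZZSA-N",  # placeholder second block (not a real key)
    "XZRVRYFILCSYSP-UHFFFAOYSA-N",  # beta-bisabolene
    "XBGUIVFBMBVUEG-UHFFFAOYSA-N",  # gamma-bisabolene
]

unique_keys = unique_cmc(keys)
print(unique_keys)
print([neutral_inchikey(k) for k in unique_keys])
```

The first two entries collapse into a single u-cmcEOC because they share the same InChIKey-14, which mirrors the bisabolene example in Box 1.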
Data preparation of, and calculations on, the DrugBank sample. The CSV file listing all drugs marked 'approved' in the 'drug group' column (n entries = 2,594) was downloaded from DrugBank; it contains the names of all drugs that were once approved in any jurisdiction at any given time and, for most of them (n entries = 2,414), structure information in the form of, e.g., InChI, InChIKey or SMILES. All approved drugs with no structure information (n entries = 180) were initially not considered and were therefore removed from the sample. Subsequently, the second block of the InChIKeys was removed, resulting in an InChIKey-14 for each drug. After deduplication, a total of 2,359 unique InChIKeys-14, corresponding to the unique CMCs (Box 1), of all approved drugs (u-cmcADs) in DrugBank were retained for further analysis.
Drug Discovery parameters (DDPs). JChem (for Office) and Excel were used for (i) chemical database access, (ii) structure-based property calculations (Fig. 2 and Table 2) and (iii) searching and reporting the chemical structures, i.e. u-cmcEOCs (SI 2). Briefly, to estimate the octanol/water partition and distribution coefficients of the EOCs, the consensus model of ChemAxon, based on the Viswanadhan et al. 83 [...] partition coefficient, P, only the un-ionized form was considered, whereas the distribution coefficient, D, also considers, if applicable, all charged forms of the molecule for a given pH; thus we obtained DDPs (i) Log P = log10(octanol/water partition coefficient) and (ii) Log D = log10(octanol/water distribution coefficient). (iii) The molar refractivity was calculated based on the atomic method described by Viswanadhan et al. 83, and to estimate (iv) the polar surface area of the EOCs, the topological polar surface area method as described by Ertl et al. 88 was used by JChem (for Office). (v) Muegge's atoms DDP is equal to the total number of atoms of a molecule minus the total number of carbon and hydrogen atoms.
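The difference between Log P and Log D described above can be illustrated for the simplest case. The sketch below assumes a monoprotic acid or base of which only the un-ionized form partitions into octanol; this is a textbook approximation and not the ChemAxon consensus model used in this study. The function name and the example values are hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Approximate Log D of a monoprotic compound from its Log P and pKa.

    Assumes that only the un-ionized form enters the octanol phase, so that
    D equals P multiplied by the un-ionized fraction at the given pH.
    For a base, pka is the pKa of the conjugate acid.
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with pKa equal to the pH: half of it is ionized,
# so Log D is lower than Log P by log10(2).
print(round(log_d(3.0, pka=7.4), 3))

# Hypothetical compound that is practically non-ionizable at pH 7.4:
# Log D is essentially equal to Log P.
print(round(log_d(3.0, pka=20.0), 3))
```

Under this approximation, Log D and Log P coincide for compounds that are not ionized at pH 7.4, which is one way in which a high correlation between the two DDPs can arise.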
Drug Discovery filters (DDFs). JChem (for Office) and Excel were used for calculating the number of u-cmcEOCs and u-cmcADs that passed the different DDFs. The six DDFs supported by JChem (for Office) are referred to as standard DDFs (Tables 1 and 2). Three of the six standard DDFs each have two variants: (i) for the Ro5 DDF, three or four out of four criteria have to be met to pass this filter, and the latter, more conservative variant was considered standard 87. For the (ii) Lead Likeness and (iii) Veber DDFs, the combinations of DDPs supported by JChem (for Office) were considered standard, whereas the alternative combinations of DDPs mentioned in the respective publications were considered non-standard variants (see also Tables 1 and 2) 22,87,89. We added one DDF not included in JChem (for Office); it is derived from the Ro5 (i.e. the Ro3 DDF with the following DDPs: (i) log P ≤ 3, (ii) molecular mass ≤ 300, (iii) hydrogen bond donors ≤ 3, (iv) hydrogen bond acceptors ≤ 3 and (v) rotatable bonds ≤ 3) 24. In all, we use 10 DDF(s) (variants) i.e. Ro5 (2 variants), Lead Likeness (2 variants