Advances in biocultural geography of olive tree (Olea europaea L.) landscapes by merging biological and historical assays

The olive tree is a vector of cultural heritage in the Mediterranean. This study explored the biocultural geography of extra virgin olive oil (EVOO) from the cultivar Ogliarola campana in the Campania region, Italy. Here, the rich cultural elements related to the olive tree and its oil represent a suitable case study for a biocultural analysis. We combined analytical techniques, based on stable isotopes and trace elements of EVOOs, with humanistic analyses, based on toponymy and historical data. In order to provide a science-based assessment of the terroir concept, we set up a new method of data analysis that takes heterogeneous data from analytical and anthropic variables as input and outputs an original global evaluation score, named terroir score, as a measure of the biocultural distinctiveness of the production areas. The analysis highlighted two distinct cultural sub-regions in the production area of Ogliarola campana: a continental cluster in the inner area of Irpinia and a coastal one around Salerno province. Finally, a biocultural map displays the diversity of heterogeneous variables and may support science-based decision making for territory valorisation. This novel biocultural analysis is a promising approach to substantiate the terroir concept with science-based elements and appears suitable to characterize local agri-food products with a long tradition and historical data.


Introduction
In this document, we present a part of the study that addresses the biocultural diversity of extra virgin olive oil (EVOO) production areas of the Italian region Campania. This region, owing to its geographic and historical complexity, represents an ideal study area to develop a novel paradigmatic interpretation of the EVOO biocultural diversity, which considers the oil analytical features and the cultural and historical elements of the territory (Agnoletti, 2006). The olive cultivar Ogliarola campana is well represented in four EVOO Protected Designations of Origin (PDO) of Campania region: Irpinia-Colline dell'Ufita, Penisola Sorrentina, Colline Salernitane and Cilento (Di Vaio, 2012). Campania region has a structured EVOO value chain and the historical importance of olive oil production is well documented since the Roman Age, as reported by Pliny the Elder in the Naturalis Historia. Later, Christian monasticism drove the transformation of uncultivated lands, reviving olive cultivation after the instability due to the invasions and the wars of the Early Middle Ages (Dalena, 2010). In medieval southern Italy, the strong influence of monasticism on the agriculture sector is attested by ancient sources and toponymy. Campania region is rich in toponyms related to agriculture, which form a complex "semantic network" revealed by their etymology (Aversano, 2001). The local toponyms with archaic origin reveal a background related to late Latin or to the vernacular of the Early Middle Ages (Pellegrini, 1990; Pellegrini, 2008), when monasticism promoted the development of olive oil production in Campania. This rich geographic and historical information could aid in dissecting and interpreting the present biocultural diversity of the EVOO landscape.
The concept of biocultural diversity is based on the assumption of an inextricable link between biological (environment, fauna and flora) and cultural (history, religion, ethnicity and language) diversity (Maffi, 2007). The UNESCO Florence Declaration (2014) recognizes the social importance of biocultural diversity and recommends actions at regional level to implement certification and product labelling, and to promote the competitiveness of local productions and rural landscape. However, due to their limited extension, the local olive production systems cannot be interpreted according to the standard requirements of the biocultural diversity concept, which includes variables of human richness (e.g., languages, ethnicity, religion, technology) at a large geographic scale (Loh and Harmon, 2005). In contrast, toponymy can characterize production areas at a small scale and reveal the diversity of the local intangible heritage in terms of ancient practices of local land use. The bio-environmental and cultural complexity of Campania region appears adequate to overcome the intrinsic limitations of biocultural studies at a small geographical scale (Roßmann, 2013) and to adopt an analytical-classificatory-semantic study of toponyms (Siniscalchi, 2008).

Methods
We set up a method to analyse toponymy and historical sources. Some olive-related elements were taken as biocultural indicators, assuming that they are preserved in a semantic frame. This method combines information from medieval sources and from toponymy, in order to characterize the contemporary biocultural landscape. Starting from the etymology of the word Ogliarola, we carried out a bibliographic search in the agricultural contracts of the Middle Ages within the current production areas of Ogliarola in Campania. However, in some cases, we examined the relevant terms occurring in the history of certain areas (e.g., Montis Corvini) that do not correspond to the current localization of olive groves. This historical period is attested by abundant written documentation, which highlights a diversity of agricultural landscapes. The keywords and word roots used to investigate the toponyms of vegetation (phyto-toponymy) (Pellegrini, 1990) were related to Ogliarola (Ogli-) and to the olive plant (oliv-, olib-, olev-, etc.) (see the etymology by Rhizopolou, 2007). We also investigated the terms linked to practices of olive cultivation and oil production. For instance, the toponyms Zapino and Torchiara refer to the orchard cultivation practices and to the oil extraction method, respectively. The historical search was limited to the sources dating back to the medieval period, from the IX to the XIV century Current Era (CE). A cultural-historical database was set up for each current production area of Ogliarola and allowed the count of the following elements, also included in the evaluation of the terroir score:
• agriculture contracts,
• historical toponyms,
• historical places with past presence of olive orchards,
• present toponyms,
• medieval cities or stations (stationes),
• medieval structures for care of poor people (hospitia et hospitalia).
We considered the present toponyms when they referred to cultivation practices specifically associated with olive orchards (e.g., Torchiara and Sanza, close to Salerno), but we ignored those related to generic agricultural practices (e.g., Pastena, from the Latin verb pastenare, which means to plough).
Furthermore, we took into account all the toponyms tightly linked to the cultivar Ogliarola, e.g., Civita di Ogliara (Irpinia) and Ogliara (North of Salerno).

Results and Discussions
We characterized the biocultural identity of the Ogliarola campana mono-varietal EVOO in four production areas of Campania region, according to the biocultural postulate (Maffi, 2007), which integrates science and humanities as aspects of a whole. Therefore, we analysed the cultural and historical elements from the Ogliarola production zones.
We started with the etymological analysis of the word "Ogliarola", which was not previously reported in the specialized literature (Pellegrini, 1990; Pellegrini, 2008). Its origin could be explained by five hypotheses: 1) palatalization of the original term "druppa oleariola", meaning "little olive fruit for oil production"; 2) the Latin olea, which is the ancient name of the olive tree (Gledhill, 2008), whilst oleum is the oil, -ara indicates a collective and -ola is a diminutive; thus, the overall etymological meaning of Ogliarola could be "little plantation of olive"; 3) the dialectal word ugghialoru, which indicates a "little pot usually made of polished or tinned terracotta, used to contain oil for daily consumption" (Cortelazzo and Marcato, 2005); 4) a woman working in the olive mill (oliaria or olearia) is nicknamed oliarola in an ancient document (Di Muro, 2005); 5) oliarolo was the name given to the area around Bari in the XV century, with reference to a district specialized in olive oil production, with relevant olive cultivation and commerce. All five hypotheses about the etymology of Ogliarola equally suggest the existence of a well-established olive oil supply chain in the medieval Campania region. The etymological analysis substantiates the biocultural approach and provides semantic tools for the search of historical terms by appropriate keywords.
The historical and toponymy analysis focused on the medieval period (IX to XIV century CE).
The toponym search identified a total of 151 terms with semantic correspondence to Ogliarola and olive cultivation: 49 terms in the Irpinia zone, 48 terms in Colline Salernitane, 30 terms in Cilento, and 24 terms in Penisola Sorrentina (Tab. S1). The highest number of results was retrieved in the category agricultural contracts (46), while far fewer elements were found in the categories historical toponyms and hospitia et hospitalia, with only 10 and five occurrences, respectively. The four production zones were heterogeneous in the partitioning of the six categories. In Irpinia, Penisola Sorrentina, and Colline Salernitane, the richest results were in agriculture contracts and historical places, whereas in Cilento the most represented terms were in present toponyms and stationes. The relatively high number of agricultural contracts associated with olive groves of the Early Middle Ages indicates an ancient and solid tradition of olive cultivation in Campania region. This information is relevant for the cultural geography of the present olive cultivation. The period from the IX to the XIV century was highly dynamic in this region, which was subjected to different dominations. In such a troubled historical frame, the development of notary deeds about properties, including olive orchards, represented a social warranty for the land owners (Di Muro, 2013). The number of agricultural contracts found in the medieval Codex was particularly high in Irpinia and Colline Salernitane, the areas that currently produce most of Ogliarola. These areas also presented a high number of present toponyms and historical places quoted in the historical sources, testifying to the relevance of olive oil production in the past.
The search for toponyms also provided remarkable elements related to Ogliarola campana, suggesting that this cultivar is a vector of heritage for the biocultural geography of olive cultivation. The relatively high number of toponyms found in this small area constitutes a semantic network related to Ogliarola and the olive tree (Aversano, 2001). We found only two toponyms matching the keyword "Ogliarola": Civita di Ogliara in Irpinia and Ogliara in Colline Salernitane. Civita di Ogliara is the most important toponym corresponding to Ogliarola found within the semantic network. The toponym Civita di Ogliara identifies an archaeological site located in Serino (lat. 40.83°N; long. 14.90°E; elev. 615 m a.s.l.). Here we report the key facts about the history of this site. Civita di Ogliara was built in 839 CE as a defensive structure, during the war that opposed the Longobardian Principality of Count Siconolfo and Count Radelchi (Baranowski, 2002). This political event strongly affected the biocultural evolution of Campania region in the following centuries. Civita di Ogliara also represented an outpost along a commercial route in the Longobardian Age, and the history of the local routes is relevant to understand the past evolution and the present geography of agricultural land use, including olive cultivation.
Contrary to the rich documentation about Civita di Ogliara, the toponym Ogliara found in Colline Salernitane matches just a few historical references. Although the origin of this settlement is Etruscan (Greco Pontrandolfo, 1980), the tradition of olive cultivation in this area is relatively recent and is not attributed to any medieval origin (De Crescenzo, 1949). The interpretation of old regional maps helped to reconstruct the history of the two toponyms Ogliara found in our survey. Giustiniani (1805) reported the term Ogliara in Irpinia only. Nevertheless, olive plantations were not present in the ancient site of Civita di Ogliara, but rather oak, beech and chestnut woods characterized the area, as they currently do (Bellabona, 1642). Therefore, we can hypothesize different origins for the two toponyms Ogliara. In Irpinia, Civita di Ogliara would be related to a well-established olive oil supply chain in the past, particularly oriented to marketing and trade. In contrast, the toponym Ogliara found in Colline Salernitane would not represent a trace of an ancient olive oil value chain but is likely related to a recent land use change in favour of olive orchards.
The documents of the Middle Ages referring to the old civilizations and their socio-political dynamics are a relevant source to understand the history and geography of olive cultivation. In Campania, coastal olive groves are widespread and related to the history of diverse civilizations: Byzantines, Ecclesia and Normans (Dalena, 2010). In the transition from the Roman Age to the Middle Ages, olive oil represented a symbol of the Roman cultural heritage. As such, it was preserved in its fundamental nutritional and commercial values by the new Longobardian rulers in Southern Italy. Furthermore, they extended the use of olive oil to liturgical and religious ceremonies, in addition to its traditional role in food preparation (Montanari, 2007). In the last part of the historical period investigated (XIII century CE), the political power shifted from the Longobardian to the Christian Church Ecclesia salernitana (the synod of the salernitanae diocesae). As a consequence, the ancient route linking the neighbouring Principalities of Benevento (inner mountain area) and Salerno (coastal area) decreased in importance, with a relocation of the route towards the Northwest, to cover other areas of interest (Di Muro, 2013). This change coincided with the development of new olive cultivations by the Benedictine monks of Cava de' Tirreni (Dalena, 2010), the abbey that provided the Codex Diplomaticus Cavensis (CDC).
The analysis of historical elements included cities and localities that in the past were known as stationes and hospitia et hospitalia. The stationes, an evolution of the Roman mansiones, were the posts supporting the connectivity among cities and villages, where travellers found rest and services (Augenti, 2006). Hospitia et hospitalia, known as xenodochia in the Roman and Early Middle Ages, were structures dedicated to the care of poor and infirm people (pauperes) (Dalena, 2003). The historical data related to these "social" variables can provide useful information about the flux of people, including pilgrims, armies, merchants and travellers. The routes (itinera) were well-established since the Roman Age, often indicated by maps (e.g. the Tabula Peutingeriana) and chronicles. The similarity in the number of hospitia et hospitalia indicates similar social relevance of these structures among the four production zones of this study. Conversely, the high number of stationes in Cilento could be explained by the wildness of that area in the Middle Ages. This is in agreement with the prevalence of modern, rather than ancient, toponyms and places found in the same area (Tab. S1).
The multivariate analysis by PCA summarized the historical information and provided a global description of its diversity (Fig. S1). The first two dimensions of PCA explain a cumulative 82.6% of the total variance. Therefore, they largely describe the data variability. The score plot of samples ([…]). However, the number of hospitia et hospitalia was very limited (1 or 2) for all the 4 territories. Therefore, the observed variation pattern of this variable is not robust against small random fluctuations. Overall, the PCA provides a statistical description of qualitative data of historical, geographic and cultural nature. It clustered Irpinia and Colline Salernitane, according to their similar features, whereas Penisola Sorrentina and Cilento were well separated. The PCA also highlighted the agricultural contracts as a very informative variable, which describes well the intangible heritage in the area of production influenced by cultural diversity. The analysis of toponyms and historical information suggests that Irpinia and other areas in Campania differ according to the succession of religious and political powers, which left different cultural footprints and affected the history and geography of olive cultivation. In combination with physical and analytical features (elemental composition and carbon stable isotope ratio of EVOOs), the historical and geographical variables contribute to characterize the identity and geography of local olive oil production areas.
Table S1. Historical analysis on four production areas of the olive cultivar Ogliarola campana in Campania region.
Results of search for keywords related to Ogliarola and olive cultivation.
In short, the algorithm goes as follows; for each area to be evaluated:
• The anthropic variables are scaled to 0…1 dividing the area value by the maximum one. Zero means no observation at all and one represents the greatest number of observations.
• The physical variables are used to derive the associated information content, or syntropy, a sort of reverse-entropy² that measures how typical the value is in a certain area with respect to the general distribution of the variable across all the areas. Also these values are scaled according to the maximum one. Any area has an associated distribution of values that can be associated to an entropy. Zero means a totally uninformative distribution of the variable, while one is associated to the highest level of typicality.
• All the transformed values are placed along the arms of a radar diagram, as in figure S2. The border of the plot is a regular polygon with a number of sides equal to the number of variables.
• The area of the polygon (shaded in fig. S2) is taken as the measure of the score. Zero means a totally shrunk area, while one is associated to the top value of all the variables.³
• The score is finished with a small logarithmic transformation of the area value,⁴ compensating for some saturation of the highest values, followed by a multiplication by 100. The integer part is the terroir score.
Of course, zero means no significant terroir-related anthropic-historical observations and no typical values of the physical variables (i.e. maximum entropy), while the maximum score is only associated with a top placement of the area for every variable. The scores can be shown along the accompanying radar plots as in figure S3, which depicts an example of seven variables for three areas.
[Figure S2]
² See the seminal paper C. E. Shannon: A mathematical theory of communication, Bell System Technical Journal, 27:379-423 (1948); for an updated introduction to the subject see R. Gray: Entropy and Information Theory, Springer (2011).
³ The diagram perimeter gives some information, too: a small perimeter is associated with low variables values but a large perimeter can be associated to a large area or to an irregular shape as well, so that the perimeter alone is not enough, but it can be useful for a finer analysis of the terroir score. The area-to-perimeter ratio, for instance, is a measure of regularity of the shape, which can be interpreted as a robustness indicator of the index. Notice, however, that the maximum perimeter is not constrained to 1, using as a scaling factor the perimeter of the regular N-polygon.
⁴ About 0.1 at most, on a 0…1 scale, thus promoting the lower-middle scores up to 10 points, keeping unchanged the lowest and highest scores.
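The first step of the summary (scaling an anthropic variable by its maximum across areas) can be sketched as follows. This is only an illustration: the function name `scale_by_max` is hypothetical, and the input uses the total numbers of terms per area quoted in the Results, whereas the procedure described here applies the same scaling to each of the six categories separately.

```python
def scale_by_max(values):
    """Scale non-negative counts to the 0..1 range by dividing by the maximum.

    Hypothetical helper: 0 means no observation, 1 marks the area with
    the greatest number of observations.
    """
    top = max(values)
    if top == 0:
        # No observation in any area: every scaled value is zero.
        return [0.0 for _ in values]
    return [v / top for v in values]


# Illustrative input only: total olive-related terms per production area,
# as quoted in the text (49, 48, 30 and 24).
terms = {
    "Irpinia": 49,
    "Colline Salernitane": 48,
    "Cilento": 30,
    "Penisola Sorrentina": 24,
}
scaled = dict(zip(terms, scale_by_max(list(terms.values()))))

for area, value in scaled.items():
    print(f"{area}: {value:.2f}")
```

The area with the largest count is mapped to 1 and all the others to their fraction of that maximum, so the scaled values are directly comparable with the rescaled physical variables.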

Terroir score algorithm
This rather technical section contains all the mathematical details of the algorithm. The procedure can be split into two steps: 1. Variables pre-processing and 2. Score evaluation. The accompanying source file radar.py contains a Python implementation of the score evaluation part.
Let's suppose that we have K geographical areas $A_1, \dots, A_K$. For any area $A_k$, $k = 1, \dots, K$, we have M anthropic variables $X_m$, whose values are $x_m^k$, and N physical variables $Y_n$, whose distributions are $D_n^k$. […] where $m(A_k)$ is the measure of the weighting parameter.⁵ The rescaling then proceeds as before.

1B. PRE-TREATMENT OF THE PHYSICAL VARIABLES
This part of the procedure transforms a distribution into a variable in the 0…1 range. It is the most involved part of the procedure, but it can be implemented in any statistical package or spreadsheet, according to the user's taste. For any variable $Y_n$, let's consider the set of all the observations of the variable in all the areas: from this pooled set of single observations we derive a distribution according to the three quartiles: 25%, 50% (the median) and 75%. This division into four classes is a matter of convenience: it has been chosen since the total number of observations can be quite small, so the use of finer classes leads to empty or singly-populated bins of the distribution histogram.
For the sake of generality, let H be the number of bins of the histogram. Every distribution $D_n^k$ has an associated set of frequencies $D_n^k = \{y_{n,i}^k\} \to (f_{n,1}^k, f_{n,2}^k, \dots, f_{n,H}^k)$; from these we evaluate the corresponding Shannon entropy for the n-th variable in the k-th area as
$$S_n^k = -\sum_{h=1}^{H} f_{n,h}^k \log f_{n,h}^k .$$
The base of the logarithm is not really important, since the resulting values will eventually be rescaled.⁶ […]
⁶ Note that the expression for the entropy is well-defined even if one has one or more empty histogram classes, setting $0 \log 0 = 0$, as customary, since $\lim_{x \to 0^+} x \log x = 0$. Care should be taken, however, not to evaluate log 0 directly, especially in a spreadsheet. Some programming languages can handle infinities, but not indeterminate forms like 0 · ∞.
⁷ See Gray 2011 (cit.)
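A minimal Python sketch of this pre-treatment is given below, under stated assumptions. The binning by pooled quartiles and the Shannon entropy with the convention 0 log 0 = 0 follow the description above; the function names are hypothetical, and the conversion of entropy into a 0…1 "syntropy" as one minus the entropy divided by its maximum, log H, is an assumption made for illustration, since the exact expression is not legible in this text.

```python
import math
from statistics import quantiles


def quartile_frequencies(area_obs, all_obs):
    """Relative frequencies of one area's observations in the four classes
    delimited by the quartiles of the pooled observations (all areas)."""
    q1, q2, q3 = quantiles(all_obs, n=4)
    counts = [0, 0, 0, 0]
    for y in area_obs:
        # Class index = number of quartile thresholds strictly below y.
        idx = (y > q1) + (y > q2) + (y > q3)
        counts[idx] += 1
    total = len(area_obs)
    return [c / total for c in counts]


def shannon_entropy(freqs):
    """Shannon entropy; empty classes are skipped, i.e. 0 log 0 = 0,
    so log(0) is never evaluated."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def syntropy(freqs):
    """ASSUMPTION (not from the source): 1 - S / log(H), so that a uniform
    distribution gives 0 and a single populated class gives 1."""
    return 1.0 - shannon_entropy(freqs) / math.log(len(freqs))


pooled = list(range(1, 9))  # made-up observations from all the areas
freqs = quartile_frequencies([1, 2], pooled)  # one area's observations
print(freqs, syntropy(freqs))
```

An area whose observations all fall in the same quartile class is maximally typical under this assumed definition, while an area spread evenly over the four classes carries no information.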

SCORE EVALUATION
To each area $A_k$ we associate a vector $w^k$ of Q = M + N variables, obtained by joining the M rescaled anthropic values and the N rescaled physical values. This "gluing" is algebraically consistent since all the quantities involved have homogeneous values in the [0,1] closed interval. Once an order for the variables is chosen, the components of $w^k$ are plotted along the arms of a radar diagram, as in figure S4.
The radar area is evaluated as the sum of all the triangles whose sides are formed by a pair of contiguous variables, split apart by an angle of $2\pi/Q$. In general, for each $w^k$ the corresponding radar area $R_k$ is
$$R_k = \frac{1}{P_Q} \sum_{q=1}^{Q} \frac{1}{2}\, w_q^k\, w_{q+1}^k \sin\frac{2\pi}{Q}, \qquad w_{Q+1}^k \equiv w_1^k,$$
where $P_Q$ is the area of the regular polygon with Q sides. The division by $P_Q$ rescales the value of $R_k$ within the [0,1] interval.⁸
⁸ The previous formula is based on the summation of the areas of the triangles that constitute the radar polygon; in fact, the product of the lengths of two adjacent sides times the sine of the angle in between (i.e. their vector product) is twice the area of the triangle.
There is a subtle point, though. The $R_k$ value thus defined depends on the order of the variables in the plot: a different arrangement of the variables gives a different estimate of $R_k$. To correct this biased estimate one must evaluate $R_k$ considering all the possible arrangements of the variables, retaining the maximum,⁹ although the radar plots are presented with the same variable order for clarity. The unbiased $R_k$ is:
$$R_k = f_Q \max_{\sigma} \sum_{q=1}^{Q} w_{\sigma(q)}^k\, w_{\sigma(q+1)}^k , \qquad \sigma(Q+1) \equiv \sigma(1),$$
where $\sigma$ runs over the permutations of the Q variables and we have condensed the common factors into $f_Q = \frac{1}{2 P_Q} \sin\frac{2\pi}{Q}$, which is common to all the $R_k$.
In fact, this $f_Q$ is a function of the number of variables Q only.
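The radar area and its maximization over the arrangements of the variables can be sketched in Python as follows. This is not the authors' radar.py, which is not reproduced here, and the function names are hypothetical. The sketch assumes that the normalizing polygon is the regular Q-gon with unit arms, whose area is (Q/2) sin(2π/Q); under that assumption the common factor reduces to 1/Q, which is indeed a function of Q only.

```python
from itertools import permutations


def radar_area(w):
    """Normalized radar area for the values w (each in 0..1), in the given
    order: sum of the triangles spanned by adjacent arms, divided by the
    area of the regular polygon with all arms equal to 1.

    Each triangle has area 0.5 * w[i] * w[i+1] * sin(2*pi/Q); dividing by
    (Q/2) * sin(2*pi/Q) leaves sum(w[i] * w[i+1]) / Q. Requires Q >= 3.
    """
    q = len(w)
    adjacent = sum(w[i] * w[(i + 1) % q] for i in range(q))
    return adjacent / q


def max_radar_area(w):
    """Maximum normalized radar area over all arrangements of the variables.

    The first variable is kept fixed, since rotating the diagram does not
    change the area; this leaves (Q-1)! arrangements (5040 for Q = 8).
    """
    w = tuple(w)
    first, rest = w[:1], w[1:]
    return max(radar_area(first + p) for p in permutations(rest))


example = (1.0, 0.0, 1.0, 0.0)
print(radar_area(example))      # alternating order: every triangle vanishes
print(max_radar_area(example))  # best order places the two large values side by side
```

The example shows why the correction matters: the same four values give a null area when large and small values alternate, and a non-zero area when the two large values are adjacent.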
Finally, the score for the $A_k$ area is defined as follows: […] The logarithm transforms the interval [0,1] into itself, with a slight amplification of the smaller values at the expense of those closer to 1. This small correction has been introduced to compensate for the accumulation of observations, promoting the areas with a smaller, but not zero, value of R. The factor 100 simply scales the score to the 0…100 range for ease of reading.
⁹ It is also possible to take the average value of $R_k$ over all the permutations, or any other correction that takes into account all the possible arrangements of the variables, as long as it is applied coherently to all the areas.
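The final step can be illustrated with a short sketch. The exact logarithmic expression is not legible in this text, so the code below uses log2(1 + R) purely as an assumed stand-in: it maps [0,1] onto itself, leaves 0 and 1 unchanged and raises intermediate values by at most about 0.09, which is compatible with the description given here but is not confirmed as the authors' formula. The function name `terroir_score` is hypothetical.

```python
import math


def terroir_score(r):
    """Integer score from a normalized radar area r in [0, 1].

    ASSUMPTION: the logarithmic correction is taken as log2(1 + r);
    the source formula may differ.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("the radar area must lie in the [0, 1] interval")
    corrected = math.log2(1.0 + r)  # assumed small logarithmic correction
    return int(100 * corrected)     # multiply by 100, keep the integer part


for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, terroir_score(r))
```

With this assumed transformation the extreme scores stay at 0 and 100, while lower-middle areas gain a few points.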

Score Sensitivity Analysis
To assess the score reliability, we conducted a sensitivity analysis with an ensemble of Gaussian perturbations to a set of simulated measurements. The ensemble is composed as follows:
• we consider up to 8 zones, Nz = 2 … 8
• we consider up to 8 variables, Nv = 3 … 8
• for each Nz, Nv pair we consider 10 randomly generated measurements, with the variables uniformly distributed in the interval [0, 1]
• for each measurement we consider 1000 Gaussian perturbations, 100 each for σ in the interval [0.1, 1.0] in 0.1 increments; µ is taken as the measurement value
The ensemble is thus composed of (8-2+1) × (8-3+1) × 10 × 10 × 100 = 420 000 elements.
For each perturbation, we accumulated the total squared score residuals; from these we computed the average score fluctuation per zone, as a measure of the score reliability.
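A reduced version of such an ensemble test might look like the sketch below. It is not the code distributed with the paper, and it rests on several assumptions that are flagged in the comments: the score uses a fixed variable order and an assumed log2(1 + R) correction, perturbed values are clipped to [0, 1], and the fluctuation is taken as the root mean square of the score residuals per zone. All function names are hypothetical.

```python
import math
import random


def score(w):
    """Simplified score: fixed variable order, ASSUMED log2(1 + R) correction."""
    q = len(w)
    r = sum(w[i] * w[(i + 1) % q] for i in range(q)) / q
    return int(100 * math.log2(1.0 + r))


def score_fluctuation(n_zones, n_vars, sigma, n_pert=100, rng=None):
    """RMS score residual per zone under Gaussian perturbations (assumed metric)."""
    rng = rng or random.Random(0)
    # One simulated measurement: variables uniformly distributed in [0, 1].
    zones = [[rng.random() for _ in range(n_vars)] for _ in range(n_zones)]
    base = [score(z) for z in zones]
    squared = 0.0
    for _ in range(n_pert):
        for z, s0 in zip(zones, base):
            # mu is the measured value; clipping to [0, 1] is an assumption.
            noisy = [min(1.0, max(0.0, rng.gauss(v, sigma))) for v in z]
            squared += (score(noisy) - s0) ** 2
    return math.sqrt(squared / (n_pert * n_zones))


# Size of the full ensemble described in the text.
ensemble_size = (8 - 2 + 1) * (8 - 3 + 1) * 10 * 10 * 100
print(ensemble_size)

# Single point of the ensemble for the case study dimensions (Nz = 4, Nv = 8).
print(score_fluctuation(4, 8, sigma=0.1, rng=random.Random(1)))
```

Looping `score_fluctuation` over Nz = 2…8, Nv = 3…8, ten measurements and the ten σ values would reproduce the structure of the ensemble, although not necessarily the published figures.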
Our results show a general decreasing trend of the score fluctuation with increasing number of zones and variables (figures S6 and S7). In particular, for our case study, with Nz = 4 and Nv = 8, the distribution of the score fluctuations is:
• first quartile: score fluctuation = 2
• median: score fluctuation = 2
• third quartile: score fluctuation = 4
The distribution is right-skewed, with most values concentrated at low fluctuations and a tail towards higher ones, as is apparent in figure S6.
These values suggest, as a conservative approach, that a difference in scores of about 5 is significant within the sensitivity of the proposed score.
We include the Python code of the ensemble sensitivity test.