A global dataset of experimental intercropping and agroforestry studies in horticulture

Intercropping and agroforestry systems have been increasingly well studied and documented. Yet, so far, no dataset has provided a systematic synthesis of existing data on intercropping experiments in the specific field of horticulture. A systematic literature search was carried out using search terms applied to Web of Science. The resulting dataset includes data from field experiments published in 191 articles covering experiments worldwide, between 1982 and 2022. The selected experiments cover five continents and involve 118 different crop species. Through manual extraction of information from publications, the dataset includes (i) general information on the articles; (ii) experimental site soil and climate conditions; (iii) descriptions of intercropping designs; (iv) crop management practices; (v) measurements of sole crop and intercrop yields; and (vi) Land Equivalent Ratios. The dataset is arranged in an easily reusable spreadsheet with columns as variables (n = 45) and rows as treatments (n = 1544). The dataset is freely reusable and updateable. We expect that it will provide valuable information for statistical analysis, modeling and innovative farming system design based on intercropping.


Background & Summary
In the context of global changes, current agriculture is facing major challenges. With stagnating yields 1, declining agricultural land area, and a growing food demand 2, world food systems will have to find new ways to increase production 3,4. In addition, environmental issues are becoming increasingly important, and expanding biofuel use will dramatically increase the pressure on global agriculture 5,6. In short, we need to produce more, in a more ecological way, and with less available land, which implies major changes in the way we produce and consume food.
In this context, intercropping (i.e. the simultaneous growth of two or more crops in the same field area) and agroforestry systems (i.e. the simultaneous growth of trees and crops) have gained attention worldwide and appear to be a promising model of ecological intensification to produce more with lower environmental impact 7,8. Research has often highlighted the benefits of intercropping and agroforestry systems (IAS): positive effects on productivity 9,10, better use of biotic and abiotic resources 11-13, enhanced soil fertility and nutrient cycling 14, and control of pests and diseases 15,16. In the specific case of horticulture, defined here as including fruit and vegetable crops, data on the effects of crop association are, however, lacking. Indeed, given the high variety of crop species compared to other production systems, the number of possible species combinations is very large, while research studies remain patchy and heterogeneous.
Numerous field trials conducted over the past 40 years have evaluated the agronomic performance of intercropping systems including horticultural species. Depending on the choice of species, climatic factors, soil types or management practices, these performances differ between sites and growing seasons. Yet, so far, no existing dataset has provided a systematic synthesis of existing data on IAS experiments in the specific field of horticulture. Providing agronomic scientists with such a dataset on IAS over a wide range of environments would make it possible to assess intercrops regarding their capacities to maintain and improve the productivity of agricultural land.
In this paper, we present a global dataset based on results from field intercropping experiments including 118 crop species worldwide, established between 1982 and 2022. Experimental data were extracted from 191 published articles. In total, 1544 experiments were collected across 19 Köppen-Geiger 17 climatic zones in 45 countries over five continents (Fig. 1). Through manual extraction of information from publications, the dataset includes (i) general information on the articles; (ii) experimental site soil and climate conditions; (iii) descriptions of intercropping designs; (iv) crop management practices; (v) measurements of sole crop and intercrop yields; and (vi) Land Equivalent Ratios (Table 1).

Methods
Literature search. A systematic literature search was carried out for articles reporting intercropping experiments that include horticultural crops. Although the term "horticultural" may include ornamental, aromatic or medicinal plants, we limited our scope of investigation to fruit and vegetable crops. The literature search was carried out on October 16, 2023, on papers published up to and including the year 2022, on the Web of Science search engine. The search equation was as follows: TS = ((intercrop* OR inter crop* OR agroforest* OR agro-forest* OR "agr*s*lv*cult*" OR agrihortisilvicult* OR "woody polycultur*" OR "mixed crop*" OR "alley crop*" OR "home garden*" OR "forest garden*" OR "multilayer tree garden*" OR "fruit-vegetable crop*") AND (fruit* OR orchard* OR vegetable* OR legume* OR "market garden*" OR horticultur*) AND (LER OR "land equivalent ratio" OR yield* OR "agronomic performanc*" OR productivity OR profitability)). The literature search identified 3043 articles of potential interest. In addition, other articles were identified through other sources (e.g. references cited in the selected articles) and were included if they met the same criteria. Each article was examined to determine whether it met our inclusion criteria. The criteria used to include papers in our corpus were: (1) article title or abstract reporting at least one horticultural crop (fruit or vegetable) and not more than two crops, grown in intercropping and as sole crops (N.B. for crops that can be considered both horticultural and field crops (e.g. maize), we verified that the crop was associated with a horticultural crop, to avoid including field crop studies; a list of unique intercropping systems included in the database is provided in the data repository); (2) article title or abstract reporting at least one experiment conducted with yield and/or LER data collected; (3) article published in a peer-reviewed journal; (4) article written in English; and (5) full-text article available in open access or through the authors' institutional access. From the 534 full-text articles that met these first criteria, the eligible articles were then screened according to additional criteria: (6) full-text article reporting raw data not duplicated in other articles; (7) full-text article reporting total and/or commercial yield for crops as intercrops and as sole crops, or land equivalent ratios. In addition, the CRAAP 18 framework (Currency, Relevance, Authority, Accuracy and Purpose) was applied to check the quality of papers to be included in the database. The paper selection process is reported in the PRISMA 19 diagram (Fig. 2) and the PRISMA checklist was used to report the systematic review. We finally ended up with 191 full-text articles that met all seven criteria, published between 1982 and 2022.
Data extraction and collection. Data were extracted from tables, graphs or text. Values reported in graphs were digitized manually with the WebPlotDigitizer application (https://automeris.io/WebPlotDigitizer/). When data were not reported for some variables (e.g., Land Equivalent Ratio), we systematically recalculated them from related variables in order to retrieve the missing data. Data are recorded as a CSV-formatted file. This format is widely supported by spreadsheets and enhances data interoperability for scientific applications. Each row presents one field site × year × treatment × control combination; the first row is a header giving the names of the variables. When the same control-treatment pair was reported over several years, all years of data were extracted. Columns represent all variables collected for each treatment. Data collected are grouped into 5 variable groups. Table 1 presents an extract of the variables collected from the articles; the full table is presented in the data repository.
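As an illustration of this row-and-column layout, the following minimal Python sketch counts available and missing values per variable, in the spirit of the availability figures reported in Table 1. The column names are taken from the variable descriptions in this paper, but the three rows and their values are hypothetical and are not drawn from "Database.csv"; the helper name `availability` is likewise an assumption made for the example.

```python
import csv
import io

# Hypothetical three-row extract: the values are invented for illustration
# and do not come from the published dataset.
SAMPLE = """Country,Latitude,Longitude,LER_tot
Egypt,30.05,31.25,1.32
India,28.61,77.21,
Brazil,-15.79,-47.88,1.10
"""


def availability(csv_text):
    """Return {variable: (n_available, n_missing)} for CSV content given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = list(reader.fieldnames)
    counts = {name: [0, 0] for name in names}
    for row in reader:
        for name in names:
            value = row.get(name)
            # An empty or absent cell is counted as a missing observation.
            if value is None or value.strip() == "":
                counts[name][1] += 1
            else:
                counts[name][0] += 1
    return {name: tuple(pair) for name, pair in counts.items()}


summary = availability(SAMPLE)
```

To apply the same function to the real file, its content could be read as text and passed in place of `SAMPLE`, assuming missing values are stored as empty cells.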

Data Records
The data are accessible on the Data INRAE repository 211, available at https://doi.org/10.57745/HV33V1. It includes the following files: 1. "Database.csv" includes the data in ".csv" format.
Experimental site soil and climate conditions. 'Country', 'Latitude', 'Longitude' and 'Geocoordinates' variables. These variables present site-related information: country where experiments were carried out, latitude and longitude (in decimal degrees) of experimental sites. The lat-long coordinates were either extracted directly from the paper, or estimated based on the site name identified in Google Maps (https://maps.google.com/). The 'Geocoordinates' variable reports whether the lat-long coordinates were exact data directly extracted from the paper or estimated through Google Maps.
Intercropping design descriptors. 'Intercropping_design'. This variable presents the way in which the two species were intercropped (Fig. 3): I. Replacement (or substitutive) design: In the standard replacement design, intercropping systems are formed by replacing a given number of plants of one component by the same number of the other component. As a result, the density of each component is less in the mixture than in its pure stand, but the total stand density is the same in the mixture as in each pure stand 17. II. Additive design: In the standard additive design, intercropping systems are formed by adding plant densities of respective pure stands. As a result, the total density is greater in the intercropping than in the pure stand, but the density of each component is the same in the mixture as in the pure stand.
'Intercropping_pattern'. This variable presents the way in which the two species were spatially sown 212 (Fig. 4): (i) Row intercropping, where two plant species are cultivated in separate alternate rows; (ii) Strip intercropping, where several rows of a plant species are alternated with one or several rows of another plant species (one strip includes more than one row); (iii) Mixed intercropping, where the component crops are planted simultaneously within the same row or without a distinct row or strip pattern; (iv) Agroforestry. All the agroforestry systems were alley cropping systems 213,214.
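The density rules that distinguish the replacement and additive designs can be made concrete with a short Python sketch. It is only an illustration: the function name `intercrop_densities`, the `share_a` parameter (the fraction of the stand allocated to crop A under replacement) and the densities used in the example are assumptions, not variables of the dataset.

```python
def intercrop_densities(pure_a, pure_b, design, share_a=0.5):
    """Return the (crop A, crop B) plant densities in the mixture.

    pure_a, pure_b: densities of each crop in its pure stand (plants per unit area).
    design: "replacement" or "additive".
    share_a: assumed fraction of the stand given to crop A (replacement only).
    """
    if design == "additive":
        # Each component keeps its pure-stand density, so the total is higher.
        return (pure_a, pure_b)
    if design == "replacement":
        if not 0 < share_a < 1:
            raise ValueError("share_a must lie strictly between 0 and 1")
        # Plants of one component are swapped for plants of the other.
        return (pure_a * share_a, pure_b * (1 - share_a))
    raise ValueError("design must be 'replacement' or 'additive'")


# Hypothetical pure stands of 10 plants per square metre for both crops.
replacement = intercrop_densities(10.0, 10.0, "replacement")
additive = intercrop_densities(10.0, 10.0, "additive")
```

With equal pure-stand densities, as the standard replacement design assumes, the replacement mixture keeps the pure-stand total (5 + 5 = 10) while the additive mixture doubles it (10 + 10 = 20).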
Crop and yield variables. Since the intercropping experiments include two species, the description is given for one species (called Crop_1) and applies in the same way to the second species (Crop_2).
'Crop_1_Common_Name' and 'Crop_1_Scientific_Name'. These variables give the common and scientific names of the species. The scientific name of each species was matched to the common name listed in the United States Department of Agriculture Plants Database (http://plants.usda.gov/java/), to avoid confusion due to the use of different common names for the same species (Table 2).
'Yield_unit' and 'Yield_measure'. The variable 'Yield_unit' gives the yield unit as provided by the authors. Although it is mostly kg per hectare or ton per hectare, it can sometimes be more specific (e.g. ton per feddan) or not expressed per unit area (e.g., kg per plant). The variable 'Yield_measure' indicates what type of yield was recorded (e.g., grain yield, dry weight, marketable yield, etc.).
'LER_crop_1', 'LER_tot', 'LER_crop_1_calc' and 'LER_tot_calc'. The variable 'LER_tot' is the total Land Equivalent Ratio, which is the sum of the partial LERs 'LER_crop_1' and 'LER_crop_2'. The partial and total LERs are reported as the raw values provided by the paper, but were also recalculated ('LER_crop_1_calc' and 'LER_tot_calc') according to Eq. 1. The Land Equivalent Ratio is a widely used indicator to assess intercropping performances 215, calculated as follows:

$$\mathrm{LER}_{tot\_calc} = \sum_{n=1}^{2} \mathrm{LER}_{crop\,n} = \sum_{n=1}^{2} \frac{Y_n}{S_n} \qquad (1)$$

where LER_tot_calc is the calculated total Land Equivalent Ratio, LER_crop n is the partial LER of crop n, Y_n is the yield of crop n in intercropping and S_n is the yield of crop n as sole crop.
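The recalculation of partial and total LERs from yields follows directly from these definitions (partial LER as intercrop yield over sole-crop yield, summed over the two crops). The Python sketch below is a minimal illustration; the function names and the yield values are hypothetical, and it assumes that the intercrop and sole-crop yields of a given crop are expressed in the same unit.

```python
def partial_ler(intercrop_yield, sole_yield):
    """Partial LER of one crop: yield in intercropping over yield as sole crop."""
    if sole_yield <= 0:
        raise ValueError("sole-crop yield must be positive")
    return intercrop_yield / sole_yield


def total_ler(y1, s1, y2, s2):
    """Total LER of a two-species intercrop: sum of the two partial LERs."""
    return partial_ler(y1, s1) + partial_ler(y2, s2)


# Hypothetical yields, in the same unit within each crop.
ler_crop_1 = partial_ler(5.0, 10.0)       # crop 1: 5 intercropped vs 10 as sole crop
ler_crop_2 = partial_ler(3.0, 4.0)        # crop 2: 3 intercropped vs 4 as sole crop
ler_tot = total_ler(5.0, 10.0, 3.0, 4.0)  # 0.5 + 0.75 = 1.25
```

A total LER above 1, as in this example, is conventionally read as a land-use advantage of the intercrop over the corresponding sole crops.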

Technical Validation
The database only contains works that have been published in peer-reviewed journals. The same individual thoroughly reviewed each publication to assess its eligibility and the reliability of data. In total, 122 papers (64% of the total number) were checked by at least two different readers to avoid any errors. All co-authors participated in the selection and reading of the papers. A kappa test was performed between the two main contributors to test interrater reliability 216. From a sample of 101 randomly selected papers, we obtained a kappa score of 0.839, which is considered as strong agreement. For numerical data in the text or in tables, the values provided are taken directly from the primary data; for figures, the WebPlotDigitizer program (automeris.io/WebPlotDigitizer/) was used, allowing for a semi-automatic and more precise extraction of the data presented. During data extraction, outliers were routinely and manually examined for possible errors by recalculating the data (Eq. 1). We checked the data's validity as many times as necessary by going back to the source publications. We deleted studies when the meaning of the data reported in the articles was unclear (e.g., no unit on yield data, inconsistent names for the different treatments, or a description of the protocol too approximate to identify the different treatments with certainty). Once the dataset was constructed, we checked the qualitative and quantitative content of all the continuous, categorical and binary variables (Table 1). We checked variable format, range, factor levels, uniqueness and the number of valid and missing observations. The check was carried out by two of the co-authors. Finally, variable attributes were checked by a visual assessment of the summary statistics and data distribution for each variable in turn with the summarytools R package (v.0.6.5). A summary table is provided in the data repository.
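For readers unfamiliar with the interrater statistic, the sketch below shows how a kappa score for two readers can be computed. It assumes Cohen's kappa, which the text does not name explicitly, and uses four hypothetical include/exclude decisions rather than the 101 papers actually sampled.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions from two readers on four papers.
reader_1 = ["include", "include", "exclude", "exclude"]
reader_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohen_kappa(reader_1, reader_2)
```

In this toy case the readers agree on three of four papers (observed agreement 0.75) against a chance agreement of 0.5, giving a kappa of 0.5.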

Usage Notes
The dataset is based on a collection of experimental data published in 191 articles between 1982 and 2022. It is, to our knowledge, the first and most exhaustive dataset on intercropping and agroforestry studies exclusively in horticulture. We identify four potential uses of the dataset that could benefit researchers in agronomy and ecology, agricultural advisors or farmers. First, the dataset can be analyzed to evaluate the agronomic performance of a wide variety of fruit and vegetable species in intercropping. We anticipate that this dataset can be used to identify the best performing species or to better understand the effect of variables of interest: environmental variables (climate, soil), intercropping design (row, strip, replacement, etc.) and management practices (fertilization, irrigation, etc.). The data provided could be particularly suitable for a meta-analysis of variance 217,218, or as input data for intercrop models 219,220.
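As a hedged illustration of this first use, the Python sketch below ranks crops by their median partial LER and drops crops with too few observations, mirroring the threshold of 20 occurrences applied in Fig. 5. The function name, the `(crop, partial LER)` record layout, the crop names and the values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def median_pler_by_crop(records, min_n=20):
    """Median partial LER per crop, in descending order of the median.

    records: iterable of (crop name, partial LER) pairs.
    min_n: crops with fewer observations than this are left out.
    """
    groups = defaultdict(list)
    for crop, pler in records:
        groups[crop].append(pler)
    medians = {
        crop: median(values)
        for crop, values in groups.items()
        if len(values) >= min_n
    }
    return dict(sorted(medians.items(), key=lambda item: item[1], reverse=True))


# Hypothetical observations; min_n is lowered to 2 to suit the tiny sample.
observations = [
    ("tomato", 0.5), ("tomato", 0.25), ("tomato", 0.75),
    ("bean", 0.75), ("bean", 1.0),
    ("okra", 0.5),
]
ranking = median_pler_by_crop(observations, min_n=2)
```

Here "okra" is excluded because it has a single observation, and "bean" ranks ahead of "tomato". The same grouping could be applied to any categorical variable of the dataset, such as the intercropping design or pattern.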
Second, a more exploratory approach may be used with our dataset. Figure 6 illustrates the geographical differences in intercropping studies by displaying regional networks (subsets of the dataset) of species pairs. These data may be useful to solve local problems, target the right species under specific climatic and soil conditions, and determine which species allow for accurate comparisons and which have insufficient information. Also, the database can serve as a tool for comparing assessments of species production under biotic and abiotic yield-limiting stresses in the context of climate change 221.
Finally, the location of the trials was systematically reported through latitude and longitude coordinates. This makes it simple to connect our database to other large-scale databases, such as soil (e.g. data.isric.org/), climate (e.g. worldclim.org) or agroecological zone classifications (e.g. gaez.fao.org/). The dataset is freely reusable and easy to update. The data file is structured in such a way that it is straightforward to add new studies from alternative scientific databases, grey literature, or new variables to be investigated to extend the range of possible applications of our dataset.

Fig. 6 Regional networks of horticultural species included in the database. The regions considered are regions defined in the World Bank Development Indicators. The edges (links) between two species represent a pair of species grown in intercropping experiments. Edge width is proportional to the number of field experiments. Nodes with similar colors are crops from the same botanical families.
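Linking trial coordinates to gridded soil or climate products usually starts by finding the grid cell that contains each site. The Python sketch below does this for a regular 0.5-degree grid, the resolution used for the climate classification in Fig. 1. The function name, the assumption that cells are aligned on -90 degrees latitude and -180 degrees longitude, and the example coordinates are illustrative only; an actual product may use a different origin or resolution.

```python
import math


def grid_cell(lat, lon, resolution=0.5):
    """Return the (lat, lon) centre of the regular grid cell containing a point.

    Assumes cells aligned on -90 degrees latitude and -180 degrees longitude.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    n_rows = int(round(180 / resolution))
    n_cols = int(round(360 / resolution))
    # Points on the upper or right edge are assigned to the last cell.
    row = min(math.floor((lat + 90) / resolution), n_rows - 1)
    col = min(math.floor((lon + 180) / resolution), n_cols - 1)
    centre_lat = -90 + (row + 0.5) * resolution
    centre_lon = -180 + (col + 0.5) * resolution
    return (centre_lat, centre_lon)


# Hypothetical site coordinates in decimal degrees.
cell = grid_cell(43.92, 4.88)
```

The returned cell centre can then serve as a join key against any table indexed on the same grid.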

Fig. 1 Geographical distribution of sites and number of experiments included in the database. The Köppen-Geiger climatic classification was used to link each field site to a grid cell with a resolution of 0.50 degrees of latitude by 0.50 degrees of longitude. The five main Köppen-Geiger climatic zones are represented by acronyms beginning with the letter A (tropical), B (arid), C (temperate), D (continental), and E (polar). Within each climatic zone, each Köppen-Geiger climatic subzone is indicated by a color gradient.

3. "Summary of the database.csv" includes a summary of the dataset (metadata), presenting the name, the definition, the unit and the availability of all extracted variables.
4. "List of references.pdf" presents all the references of the publications included in the dataset.
5. "Paut_data_paper_figures_code.Rmd" includes an R script to generate figures.
6. "Classification of F&V.xlsx" includes a fruit and vegetable classification table used in the R script.
7. "List of unique intercrops.xlsx" presents a list of the unique intercropping systems included in the database.

General information on the articles. 'Title', 'Authors', 'DOI' and 'year_of_publication' variables. These variables include basic information that makes it easy to retrieve the articles integrated in our database: full title, authors, DOI (Digital Object Identifier) and year of publication.

Fig. 2 PRISMA diagram of paper selection through the different phases of the systematic literature review. The number of records identified, included and excluded, and the reasons for exclusion are indicated between parentheses.

Fig. 3 Planting arrangements for pure stand of crop A (•) and crop B (○) for replacement and additive designs (adapted from Snaydon 1991 222 ).

Fig. 4 Planting arrangements for pure stand of crop A (•) and crop B (○) for row, strip and mixed design, and for the specific case of agroforestry with trees (▲).

Figure 5 represents the partial Land Equivalent Ratios of the most represented crops grouped by (a) crop and (b) crop botanical family.

Fig. 5 Distribution of partial Land Equivalent Ratios (pLER) for the most represented species in our database (a) and for the species grouped by botanical family (b). The median is represented by vertical bars inside boxes. Box edges indicate the first and third quartiles; whiskers indicate minimum and maximum values. The dashed line represents a pLER of 0.5. A pLER value greater than 0.5 indicates a yield advantage for the intercrop compared to the sole crop. The number indicated for each modality is the total number of observations and the number between parentheses is the number of articles. Species and botanical families are ranked in descending order of median values. Modalities with fewer than 20 occurrences are not plotted.

Table 1. Extract from the description and definition of variables included in the dataset. Total number (percentage) of available and missing data for these attributes across all treatments. The full table is available in the data repository.

Table 2. Summary table of the most common species included in the database. Species with fewer than 20 entries are not shown.