Background & Summary

In the context of global change, agriculture is facing major challenges. With stagnating yields1, declining agricultural land area and growing food demand2, world food systems will have to find new ways to increase production3,4. In addition, environmental issues are becoming increasingly important, and expanding biofuel use will dramatically increase the pressure on global agriculture5,6. In short, we need to produce more, in a more ecological way, and with less available land, which implies major changes in the way we produce and consume food.

In this context, intercropping (i.e. the simultaneous growth of two or more crops in the same field area) and agroforestry systems (i.e. the simultaneous growth of trees and crops) have gained attention worldwide as a promising model of ecological intensification, producing more with a lower environmental impact7,8. Research has often highlighted the benefits of intercropping and agroforestry systems (IAS), such as increased productivity9,10, better use of biotic and abiotic resources11,12,13, enhanced soil fertility and nutrient cycling14, and pest and disease control15,16. In the specific case of horticulture, defined here as fruit and vegetable crops, data on the effects of crop associations are, however, lacking. Because horticulture involves many more crop species than other production systems, the number of possible intercropping combinations grows combinatorially, whereas research studies remain patchy and heterogeneous.

Numerous field trials conducted over the past 40 years have evaluated the agronomic performance of intercropping systems that include horticultural species. This performance differs between sites and growing seasons depending on the choice of species, climatic factors, soil types and management practices. Yet, so far, no dataset has provided a systematic synthesis of IAS experiments specific to horticulture. Providing agronomic scientists with such a dataset, covering a wide range of environments, would make it possible to assess the capacity of intercrops to maintain and improve the productivity of agricultural land.

In this paper, we present a global dataset of results from field intercropping experiments involving 118 crop species, established between 1982 and 2022. Experimental data were extracted from 191 published articles. In total, 1544 experiments were collected across 19 Köppen-Geiger17 climatic zones in 45 countries on five continents (Fig. 1). Information was extracted manually from the publications, and the dataset includes (i) general information on the articles; (ii) soil and climate conditions at the experimental sites; (iii) descriptions of intercropping designs; (iv) crop management practices; (v) sole crop and intercrop yields; and (vi) Land Equivalent Ratios (Table 1).

Fig. 1

Geographical distribution of sites and number of experiments included in the database. The Köppen-Geiger climatic classification was used to link each field site to a grid cell with a resolution of 0.50° of latitude by 0.50° of longitude. The five main Köppen-Geiger climatic zones are represented by acronyms beginning with the letter A (tropical), B (arid), C (temperate), D (continental) and E (polar). Within each climatic zone, each Köppen-Geiger climatic subzone is indicated by a color gradient.
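Linking each site to a 0.50° × 0.50° cell amounts to snapping its coordinates onto that grid. The Python sketch below shows one way to do this; it assumes a grid whose cell edges fall on multiples of 0.5°, starting at -90° latitude and -180° longitude (typical of global 0.5° products, but an assumption here), and the helper name `grid_cell` is hypothetical. Retrieving the climate class itself would additionally require the gridded Köppen-Geiger map, which is not reproduced here.

```python
import math


def grid_cell(lat: float, lon: float, res: float = 0.5) -> tuple:
    """Return the centre (lat, lon) of the res x res degree cell containing a site.

    Assumes a global grid whose cell edges fall on multiples of `res`,
    starting at -90 degrees latitude and -180 degrees longitude.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    # floor() picks the cell whose lower-left corner is the nearest multiple of res below the point
    lat_idx = math.floor(lat / res)
    lon_idx = math.floor(lon / res)
    # Points lying exactly on the north pole or the 180th meridian belong to the last cell
    lat_idx = min(lat_idx, round(90 / res) - 1)
    lon_idx = min(lon_idx, round(180 / res) - 1)
    return (lat_idx * res + res / 2, lon_idx * res + res / 2)


# Example: a hypothetical site near Montpellier, France
print(grid_cell(43.61, 3.87))  # (43.75, 3.75)
```

Using cell centres as keys makes it straightforward to join sites with any other table indexed on the same 0.5° grid.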

Table 1 Extract from the description and definition of variables included in the dataset.

Methods

Literature search

A systematic literature search was carried out for articles reporting intercropping experiments that included horticultural crops. Although the term “horticultural” may include ornamental, aromatic or medicinal plants, we limited the scope of our investigation to fruit and vegetable crops. The search was run on the Web of Science search engine on October 16, 2023, and covered papers published up to and including the year 2022. The search equation was as follows: TS = ((intercrop* OR inter crop* OR agroforest* OR agro-forest* OR “agr*s*lv*cult*” OR agrihortisilvicult* OR “woody polycultur*” OR “mixed crop*” OR “alley crop*” OR “home garden*” OR “forest garden*” OR “multilayer tree garden*” OR “fruit-vegetable crop*”) AND (fruit* OR orchard* OR vegetable* OR legume* OR “market garden*” OR horticultur*) AND (LER OR “land equivalent ratio” OR yield* OR “agronomic performanc*” OR productivity OR profitability)). The search identified 3043 potentially relevant articles. Other articles were identified through other sources (e.g. references cited in the selected articles) and were included if they met the same criteria.

Each article was examined against our inclusion criteria: (1) article title or abstract reporting at least one horticultural crop (fruit or vegetable), and no more than two crops, grown both in intercropping and as sole crops (N.B. for crops that can be considered both horticultural and field crops (e.g. maize), we verified that the crop was associated with a horticultural crop to avoid including field crop studies; a list of the unique intercropping systems included in the database is provided in the data repository); (2) article title or abstract reporting at least one experiment with yield and/or LER data collected; (3) article published in a peer-reviewed journal; (4) article written in English; and (5) full text available in open access or through the authors’ institutional access. The 534 full-text articles that met these first criteria were then screened against two additional criteria: (6) full text reporting raw data not duplicated in other articles; and (7) full text reporting total and/or commercial yield for crops grown as intercrops and as sole crops, or land equivalent ratios. In addition, the CRAAP18 framework (Currency, Relevance, Authority, Accuracy and Purpose) was applied to check the quality of the papers to be included in the database. The paper selection process is reported in the PRISMA19 diagram (Fig. 2), and the PRISMA checklist was used to report the systematic review. We finally retained 191 full-text articles20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210 that met all seven criteria, published between 1982 and 2022.

Fig. 2

PRISMA diagram of paper selection through the different phases of the systematic literature review. The numbers of records identified, included and excluded, and the reasons for exclusion, are indicated in parentheses.

Data extraction and collection

Data were extracted from tables, graphs or text. Values reported in graphs were digitized manually with the WebPlotDigitizer application (https://automeris.io/WebPlotDigitizer/). When data were not reported for some variables (e.g., Land Equivalent Ratio), we systematically recalculated them from related variables. Data are recorded in a CSV-formatted file, a format that is widely supported by spreadsheets and enhances data interoperability for scientific applications. The first row is a header containing the variable names, and each subsequent row corresponds to one field site × year × treatment × control combination; when the same control-treatment pair was measured over several years, data for all years were extracted. Columns represent all variables collected for each treatment, grouped into five variable groups. Table 1 presents an extract of the variables collected from the articles; the full table is presented in the data repository.
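The recalculation of missing values from related variables can be illustrated for the LER columns. The helper below is a hypothetical Python sketch, not the authors' procedure: it assumes each row is a dict keyed by the variable names described in this paper, with `None` for missing values, and that the Crop_2 columns mirror the Crop_1 names (e.g. `Crop_2_yield_sole`). It first derives partial LERs from yields (Eq. 1) and then uses the identity LER_tot = LER_crop_1 + LER_crop_2 to fill the remaining gap.

```python
def fill_missing_ler(row: dict) -> dict:
    """Return a copy of `row` with missing LER values recalculated (hypothetical helper).

    Missing values are None. Partial LERs are derived from yields first
    (Eq. 1); the remaining gap is filled with LER_tot = LER_crop_1 + LER_crop_2.
    """
    out = dict(row)
    for n in (1, 2):
        key = f"LER_crop_{n}"
        sole = out.get(f"Crop_{n}_yield_sole")
        inter = out.get(f"Crop_{n}_yield_intercropped")
        # A partial LER needs both yields and a non-zero sole-crop yield
        if out.get(key) is None and sole and inter is not None:
            out[key] = inter / sole
    p1, p2, tot = out.get("LER_crop_1"), out.get("LER_crop_2"), out.get("LER_tot")
    if tot is None and p1 is not None and p2 is not None:
        out["LER_tot"] = p1 + p2
    elif tot is not None and p1 is None and p2 is not None:
        out["LER_crop_1"] = tot - p2
    elif tot is not None and p2 is None and p1 is not None:
        out["LER_crop_2"] = tot - p1
    return out


row = {"LER_crop_1": None, "LER_crop_2": 0.6, "LER_tot": None,
       "Crop_1_yield_sole": 20.0, "Crop_1_yield_intercropped": 15.0}
filled = fill_missing_ler(row)
print(filled["LER_crop_1"], round(filled["LER_tot"], 2))  # 0.75 1.35
```

For simplicity the sketch writes into the reported columns; in the dataset itself, recalculated values are kept in the separate ‘_calc’ variables.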

Data Records

The data are accessible on the Data INRAE repository211, available at https://doi.org/10.57745/HV33V1. It includes the following files:

  1. “Database.csv” includes the data in “.csv” format.

  2. “Database.xlsx” includes the data in Microsoft Excel® format.

  3. “Summary of the database.csv” includes a summary of the dataset (metadata), presenting the name, definition, unit and availability of all extracted variables.

  4. “List of references.pdf” presents all the references of the publications included in the dataset.

  5. “Paut_data_paper_figures_code.Rmd” includes an R script to generate the figures.

  6. “Classification of F&V.xlsx” includes a fruit and vegetable classification table used in the R script.

  7. “List of unique intercrops.xlsx” presents a list of the unique intercropping systems included in the database.
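For readers working in Python rather than R, the sketch below loads “Database.csv” with pandas. It assumes the file is comma-separated (adjust `sep` if the downloaded file uses another delimiter, such as a semicolon) and that yield and LER columns follow the names described in this paper; the function name `load_database` is hypothetical, and an in-memory sample stands in for the real file.

```python
import io

import pandas as pd


def load_database(source, sep: str = ",") -> pd.DataFrame:
    """Read the dataset (a path or file-like object) into a DataFrame.

    `sep` is an assumption; check the delimiter of the downloaded file.
    """
    df = pd.read_csv(source, sep=sep)
    # Coerce yield and LER columns to numbers; unparsable cells become NaN
    numeric = [c for c in df.columns
               if c.startswith(("LER_", "Crop_1_yield", "Crop_2_yield"))]
    if numeric:
        df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")
    return df


# In-memory stand-in for "Database.csv" (values are illustrative only)
sample = io.StringIO(
    "Title,Country,Crop_1_yield_sole,Crop_1_yield_intercropped,LER_tot\n"
    "A,France,20,15,1.2\n"
    "B,Kenya,n.a.,3,\n"
)
df = load_database(sample)
print(df.dtypes["Crop_1_yield_sole"])  # float64
```

Coercion keeps text placeholders in numeric columns from silently turning a whole column into strings.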

General information on the articles

‘Title’, ‘Authors’, ‘DOI’ and ‘year_of_publication’ variables

These variables provide the basic information needed to retrieve the articles included in our database: full title, authors, DOI (Digital Object Identifier) and year of publication.

Experimental site soil and climate conditions

‘Country’, ‘Latitude’, ‘Longitude’ and ‘Geocoordinates’ variables

These variables give site-related information: the country where the experiments were carried out and the latitude and longitude (in decimal degrees) of the experimental sites. The coordinates were either extracted directly from the paper or estimated from the site name using Google Maps (https://maps.google.com/). The ‘Geocoordinates’ variable reports whether the coordinates were taken directly from the paper or estimated through Google Maps.
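Papers often report coordinates in degrees, minutes and seconds, whereas ‘Latitude’ and ‘Longitude’ are stored in decimal degrees. A minimal Python conversion sketch follows; the accepted input formats (e.g. `43°36'30"N`) and the helper name `dms_to_decimal` are assumptions, and southern and western hemispheres are encoded as negative values, the usual decimal-degree convention.

```python
import re

# Degrees, optional minutes and seconds, then a hemisphere letter
_DMS = re.compile(
    r"\s*(?P<deg>\d+(?:\.\d+)?)\s*°"
    r"(?:\s*(?P<min>\d+(?:\.\d+)?)\s*['′])?"
    r"(?:\s*(?P<sec>\d+(?:\.\d+)?)\s*[\"″])?"
    r"\s*(?P<hemi>[NSEW])\s*"
)


def dms_to_decimal(text: str) -> float:
    """Convert e.g. 43°36'30"N to decimal degrees (S and W become negative)."""
    m = _DMS.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    minutes = float(m["min"] or 0)
    seconds = float(m["sec"] or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = float(m["deg"]) + minutes / 60 + seconds / 3600
    # Latitudes are bounded by 90 degrees, longitudes by 180 degrees
    limit = 90 if m["hemi"] in "NS" else 180
    if value > limit:
        raise ValueError(f"coordinate out of range: {text!r}")
    return -value if m["hemi"] in "SW" else value


print(dms_to_decimal("1°30'S"))  # -1.5
```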

‘Climate_zone_Koppen_Geiger’, ‘Soil_type’, ‘Soil_texture’ and ‘Soil_pH’ variables

These variables describe the main pedo-climatic conditions. The climate zone (‘Climate_zone_Koppen_Geiger’) is coded according to the Köppen-Geiger climate classification17, and the soil texture (‘Soil_texture’) is recoded according to the USDA soil texture calculator (https://www.nrcs.usda.gov/resources/education-and-teaching-materials/soil-texture-calculator). The soil type (‘Soil_type’) is recoded according to either the World Reference Base for Soil Resources (https://www.fao.org/3/i3794en/I3794en.pdf) or the USDA Soil Taxonomy classification (https://lod.nal.usda.gov/nalt/216302), depending on the system the authors refer to, and is given the prefix “WRB” or “USDA”, respectively. Soil pH (‘Soil_pH’) is reported when available in the papers.
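When a paper reports sand, silt and clay percentages rather than a texture class, the class can be derived from the USDA texture triangle. The Python sketch below re-implements the commonly published boundary rules of that triangle; it is not the calculator's own code, it assumes the three fractions (in %) sum to about 100, and the function name `usda_texture_class` is hypothetical. Results should be cross-checked with the USDA calculator, especially for points lying exactly on class boundaries.

```python
def usda_texture_class(sand: float, silt: float, clay: float) -> str:
    """Classify particle-size percentages into a USDA texture class (sketch).

    Re-implements the commonly published boundary rules of the USDA
    texture triangle; assumes sand + silt + clay is close to 100 %.
    """
    if min(sand, silt, clay) < 0 or abs(sand + silt + clay - 100) > 1:
        raise ValueError("fractions must be non-negative and sum to ~100 %")
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    # From here on, silt + 2 * clay >= 30 holds
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    if clay >= 40 and sand <= 45 and silt < 40:
        return "clay"
    return "unclassified"


print(usda_texture_class(40, 40, 20))  # loam
```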

Crop management practices

‘Greenhouse’, ‘Organic_ferti’, ‘Mineral_ferti’, ‘Pesticide_use’ and ‘Irrigation’ variables

These variables describe the most commonly reported management practices that may influence intercropping system performance: greenhouse conditions (‘Greenhouse’), use of organic (‘Organic_ferti’) and mineral fertilizers (‘Mineral_ferti’), use of pesticides (‘Pesticide_use’) and use of irrigation (‘Irrigation’). All are reported as binary variables (Yes/No).
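Because these five management variables are stored as Yes/No text, they usually need converting to booleans before analysis. The Python sketch below assumes that missing entries appear as empty cells, "NA" or NaN; the extra accepted spellings and the helper name `parse_binary` are hypothetical.

```python
import math
from typing import Optional

# Accepted spellings beyond "Yes"/"No" are assumptions
_TRUE = {"yes", "y", "true", "1"}
_FALSE = {"no", "n", "false", "0"}
_MISSING = {"", "na", "n/a", "nan", "none"}

MANAGEMENT_VARS = ("Greenhouse", "Organic_ferti", "Mineral_ferti",
                   "Pesticide_use", "Irrigation")


def parse_binary(value) -> Optional[bool]:
    """Map a Yes/No cell to True/False, and missing cells to None."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    if text in _MISSING:
        return None
    raise ValueError(f"unexpected binary value: {value!r}")


row = {"Greenhouse": "No", "Organic_ferti": "Yes", "Mineral_ferti": "",
       "Pesticide_use": "NA", "Irrigation": "yes"}
print({k: parse_binary(row[k]) for k in MANAGEMENT_VARS})
```

Returning `None` rather than `False` for missing cells keeps "not reported" distinct from "not used".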

Intercropping design descriptors

‘Intercropping_design’

This variable describes how the two species were combined (Fig. 3):

  I. Replacement (or substitutive) design: In the standard replacement design, intercropping systems are formed by replacing a given number of plants of one component with the same number of plants of the other component. As a result, the density of each component is lower in the mixture than in its pure stand, but the total stand density is the same in the mixture as in each pure stand17.

  II. Additive design: In the standard additive design, intercropping systems are formed by combining the plant densities of the respective pure stands. As a result, the total density is greater in the intercrop than in the pure stands, but the density of each component is the same in the mixture as in its pure stand.
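In numerical terms, the two designs differ in the relative density of the mixture, i.e. the sum over both components of (density in the mixture / density in the pure stand): it equals 1 in a standard replacement design and 2 in a standard additive design. The Python sketch below illustrates this; `prop_a` (the share of crop A in a replacement design) and the function name are hypothetical. Note that the total stand density is identical in absolute terms only when both pure stands are sown at the same density, whereas a relative total of 1 holds in general.

```python
def design_densities(sole_a: float, sole_b: float, design: str,
                     prop_a: float = 0.5) -> tuple:
    """Return (density_a, density_b, relative_total) for a two-crop mixture.

    sole_a, sole_b: pure-stand densities (plants per unit area).
    prop_a: share of crop A in a replacement design (hypothetical parameter).
    relative_total = density_a / sole_a + density_b / sole_b.
    """
    if sole_a <= 0 or sole_b <= 0:
        raise ValueError("pure-stand densities must be positive")
    if design == "replacement":
        if not 0 < prop_a < 1:
            raise ValueError("prop_a must lie strictly between 0 and 1")
        # Each component keeps only its share of its own pure-stand density
        density_a, density_b = prop_a * sole_a, (1 - prop_a) * sole_b
    elif design == "additive":
        # Each component is sown at its full pure-stand density
        density_a, density_b = sole_a, sole_b
    else:
        raise ValueError(f"unknown design: {design!r}")
    relative_total = density_a / sole_a + density_b / sole_b
    return density_a, density_b, relative_total


print(design_densities(10, 10, "replacement"))  # (5.0, 5.0, 1.0)
print(design_densities(10, 10, "additive"))     # (10, 10, 2.0)
```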

Fig. 3

Planting arrangements for pure stands of crop A and crop B for replacement and additive designs (adapted from Snaydon 1991222).

‘Intercropping_pattern’

This variable describes the spatial arrangement in which the two species were sown212 (Fig. 4): (i) row intercropping, where the two species are cultivated in separate alternate rows; (ii) strip intercropping, where several rows of one species alternate with one or several rows of another species (one strip includes more than one row); (iii) mixed intercropping, where the component crops are planted simultaneously within the same row or without a distinct row or strip pattern; and (iv) agroforestry. All the agroforestry systems in the dataset were alley cropping systems213,214, in which crops are grown in the alleys between tree rows.

Fig. 4

Planting arrangements for pure stands of crop A and crop B for row, strip and mixed designs, and for the specific case of agroforestry with trees (▲).

Crop and yield variables

Since each intercropping experiment includes two species, the variables are described below for one species (Crop_1); the same variables are provided for the second species (Crop_2).

‘Crop_1_Common_Name’ and ‘Crop_1_Scientific_Name’

These variables give the common and scientific names of each species. Each scientific name was matched to the common name listed in the United States Department of Agriculture Plants Database (http://plants.usda.gov/java/) to avoid confusion arising from different common names being used for the same species (Table 2).

Table 2 Summary table of most common species included in the database.

‘Crop_1_yield_sole’

This variable gives the yield of Crop 1 when grown as a sole crop. It is the harvestable yield or, when provided, the commercial yield, which differs from the harvestable yield in that non-marketable produce is excluded. The yield unit is kept as provided in the original paper.

‘Crop_1_yield_intercropped’

This variable gives the yield of Crop 1 when grown in intercropping. The unit is kept as provided in the original paper.

‘Yield_unit’ and ‘Yield_measure’

The variable ‘Yield_unit’ gives the yield unit as provided by the authors. Although it is mostly kg per hectare or tons per hectare, it can sometimes be more specific (e.g., tons per feddan) or not expressed per unit area (e.g., kg per plant). The variable ‘Yield_measure’ indicates the type of yield recorded (e.g., grain yield, dry weight, marketable yield).
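Comparing absolute yields across studies requires harmonizing these units. The Python sketch below converts per-area yields to kg per hectare, using 1 t = 1000 kg and 1 feddan ≈ 4200 m² (0.42 ha); the unit labels in `TO_KG_PER_HA` are hypothetical and would first need to be mapped from the spellings actually used in ‘Yield_unit’. Per-plant yields cannot be converted without the planting density, so the sketch raises an error for them.

```python
# Factors converting per-area yields to kg per hectare.
# 1 t = 1000 kg; 1 feddan is about 4200 m2, i.e. 0.42 ha.
# The unit labels are hypothetical; map the 'Yield_unit' strings onto them first.
TO_KG_PER_HA = {
    "kg/ha": 1.0,
    "t/ha": 1000.0,
    "kg/feddan": 1.0 / 0.42,
    "t/feddan": 1000.0 / 0.42,
}


def to_kg_per_ha(value: float, unit: str) -> float:
    """Convert a per-area yield to kg/ha; per-plant units need extra data."""
    try:
        factor = TO_KG_PER_HA[unit]
    except KeyError:
        raise ValueError(f"cannot convert {unit!r} to kg/ha without more information") from None
    return value * factor


print(round(to_kg_per_ha(1.0, "t/feddan")))  # 2381
```

Because a partial LER divides an intercrop yield by a sole-crop yield expressed in the same unit, it is unit-free; conversion only matters when absolute yields are compared across studies.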

‘LER_crop_1’, ‘LER_tot’, ‘LER_crop_1_calc’ and ‘LER_tot_calc’

The variable ‘LER_tot’ is the total Land Equivalent Ratio, i.e. the sum of the partial LERs ‘LER_crop_1’ and ‘LER_crop_2’. The partial and total LERs are reported as the raw values provided in each paper, and were also recalculated (‘LER_crop_1_calc’ and ‘LER_tot_calc’) according to Eq. 1. The Land Equivalent Ratio is a widely used indicator of intercropping performance215 and is calculated as follows:

$$\mathrm{LER}_{\mathrm{tot\_calc}} = \mathrm{LER}_{\mathrm{crop\_1\_calc}} + \mathrm{LER}_{\mathrm{crop\_2\_calc}} = \frac{Y_{1}}{S_{1}} + \frac{Y_{2}}{S_{2}} \tag{1}$$

where $\mathrm{LER}_{\mathrm{tot\_calc}}$ is the calculated total Land Equivalent Ratio, $\mathrm{LER}_{\mathrm{crop\_}n\mathrm{\_calc}}$ is the calculated partial LER of crop $n$, $Y_n$ is the yield of crop $n$ in intercropping and $S_n$ is the yield of crop $n$ as a sole crop. Figure 5 represents the partial Land Equivalent Ratios of the most represented crops grouped by (a) crop and (b) crop botanical family.
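Eq. 1 translates directly into code. The Python sketch below computes partial and total LERs and adds a hypothetical `ler_discrepancy` check, in the spirit of the recalculation used to screen outliers; the tolerance of 0.05 is an assumption, as no threshold is stated. With the illustrative yields below, LER_tot = 1.2, meaning that sole crops would need 20% more land to produce the same yields as the intercrop.

```python
def partial_ler(yield_intercrop: float, yield_sole: float) -> float:
    """Partial LER of one crop: Y_n / S_n (same unit for both yields)."""
    if yield_sole <= 0:
        raise ValueError("sole-crop yield must be positive")
    return yield_intercrop / yield_sole


def total_ler(y1: float, s1: float, y2: float, s2: float) -> float:
    """Total LER following Eq. 1: Y_1/S_1 + Y_2/S_2."""
    return partial_ler(y1, s1) + partial_ler(y2, s2)


def ler_discrepancy(reported: float, y1: float, s1: float, y2: float, s2: float,
                    tol: float = 0.05) -> bool:
    """Flag a reported total LER that differs from the recalculated one by more than tol."""
    return abs(reported - total_ler(y1, s1, y2, s2)) > tol


# Hypothetical yields in t/ha: crop 1 gives 30 intercropped vs 40 sole,
# crop 2 gives 0.9 intercropped vs 2.0 sole.
print(partial_ler(30, 40), partial_ler(0.9, 2.0))  # 0.75 0.45
print(round(total_ler(30, 40, 0.9, 2.0), 2))        # 1.2
```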

Fig. 5

Distribution of partial Land Equivalent Ratios (pLER) for the most represented species in our database (a) and for species grouped by botanical family (b). The median is represented by vertical bars inside boxes. Box edges indicate the first and third quartiles, and whiskers indicate minimum and maximum values. The dashed line represents a pLER of 0.5; a pLER value greater than 0.5 indicates a yield advantage for the intercrop compared with the sole crop. The number indicated for each modality is the total number of observations, and the number in parentheses is the number of articles. Species and botanical families are ranked in descending order of median values. Modalities with fewer than 20 occurrences are not plotted.
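The statistics behind such a figure can be reproduced with pandas. In the sketch below, crop 1 and crop 2 observations are stacked into one long table, summarized per species (observations, articles, quartiles, extremes), filtered at 20 observations and ranked by decreasing median. It assumes that `Crop_2_Common_Name` and `LER_crop_2_calc` mirror the Crop_1 variables and that ‘DOI’ identifies articles; the function names are hypothetical, and quartiles use the pandas default interpolation, which may differ slightly from the plotting software. Grouping by botanical family would additionally require merging the fruit and vegetable classification table provided in the repository.

```python
import pandas as pd


def stack_partial_lers(df: pd.DataFrame) -> pd.DataFrame:
    """One row per crop observation: DOI, species and pLER (crop 1 and crop 2 stacked)."""
    parts = []
    for n in (1, 2):
        part = df[["DOI", f"Crop_{n}_Common_Name", f"LER_crop_{n}_calc"]].rename(
            columns={f"Crop_{n}_Common_Name": "species", f"LER_crop_{n}_calc": "pLER"})
        parts.append(part)
    return pd.concat(parts, ignore_index=True).dropna(subset=["pLER"])


def _q1(s: pd.Series) -> float:
    return s.quantile(0.25)


def _q3(s: pd.Series) -> float:
    return s.quantile(0.75)


def pler_summary(long: pd.DataFrame, group_col: str = "species",
                 min_n: int = 20) -> pd.DataFrame:
    """Box-plot statistics per group, dropping groups with fewer than min_n observations."""
    stats = long.groupby(group_col).agg(
        n=("pLER", "count"),
        articles=("DOI", "nunique"),
        minimum=("pLER", "min"),
        q1=("pLER", _q1),
        median=("pLER", "median"),
        q3=("pLER", _q3),
        maximum=("pLER", "max"),
    )
    stats = stats[stats["n"] >= min_n]
    # Rank groups by decreasing median, as in the figure
    return stats.sort_values("median", ascending=False)
```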

Technical Validation

The database only contains studies published in peer-reviewed journals. Every publication was thoroughly reviewed by the same reader to assess its eligibility and the reliability of its data. In total, 122 papers (64% of the total) were checked by at least two different readers to minimize errors. All co-authors participated in the selection and reading of the papers. A kappa test was performed between the two main contributors to assess inter-rater reliability216. On a sample of 101 randomly selected papers, we obtained a kappa score of 0.839, which is considered strong agreement.

Numerical values reported in the text or in tables were taken directly from the primary data; values presented in figures were extracted with the WebPlotDigitizer program (automeris.io/WebPlotDigitizer/), which allows a semi-automatic and more precise extraction. During data extraction, outliers were routinely and manually examined for possible errors by recalculating the data (Eq. 1), and data validity was checked as many times as necessary by going back to the source publications. Studies were removed when the meaning of the reported data was unclear (e.g., no unit on yield data, inconsistent treatment names, or a protocol described too vaguely to identify the different treatments with certainty).

Once the dataset was constructed, we checked the qualitative and quantitative content of all continuous, categorical and binary variables (Table 1): variable format, range, factor levels, uniqueness, and valid and missing observations. This check was carried out by two of the co-authors. Finally, variable attributes were checked by visually assessing the summary statistics and data distribution of each variable in turn with the summarytools R package (v. 0.6.5). A summary table is provided in the data repository.
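The kappa variant and the categories rated are not specified; for two raters, Cohen's kappa is the usual choice and is assumed in the Python sketch below, which compares observed agreement with the agreement expected by chance from each rater's label frequencies. The include/exclude labels are hypothetical, and the reported score of 0.839 cannot be reproduced without the original ratings. scikit-learn's `sklearn.metrics.cohen_kappa_score` offers an equivalent, ready-made implementation.

```python
import math
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n ** 2
    if p_e == 1:
        return math.nan  # undefined when both raters used one identical label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include/exclude decisions on four papers
a = ["include", "include", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "exclude"]
print(cohen_kappa(a, b))  # 0.5
```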

Usage Notes

The dataset is based on experimental data published in 191 articles between 1982 and 2022. To our knowledge, it is the first and most exhaustive dataset on intercropping and agroforestry studies focused exclusively on horticulture. We identify four potential uses of the dataset that could benefit researchers in agronomy and ecology, agricultural advisors or farmers.

First, the dataset can be analyzed to evaluate the agronomic performance of a wide variety of fruit and vegetable species in intercropping. For example, it can be used to identify the best-performing species or to better understand the effect of variables of interest: environmental variables (climate, soil), intercropping design (row, strip, replacement, etc.) and management practices (fertilization, irrigation, etc.). The data provided could be particularly suitable for a meta-analysis of variance217,218, or as input data for intercrop models219,220.

Second, the dataset lends itself to a more exploratory approach. Figure 6 illustrates geographical differences in intercropping studies by displaying regional networks (subsets of the dataset) of species pairs. These data may help address local problems, target the right species under specific climatic and soil conditions, and determine which species pairs have enough data for robust comparisons and which are insufficiently documented. The database can also serve as a tool for comparing species production in contexts with biotic and abiotic yield-limiting stresses under climate change221.

Fig. 6

Regional networks of horticultural species included in the database. Regions are those defined in the World Bank World Development Indicators. The edges (links) between two species represent a pair of species grown in intercropping experiments, and edge width is proportional to the number of field experiments. Nodes of similar colors are crops from the same botanical family.
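A regional network of this kind can be derived by counting unordered species pairs per region. The Python sketch below assumes a hypothetical `country_to_region` mapping from ‘Country’ to region names and that the Crop_2 name column mirrors the Crop_1 one; it counts rows, so where several rows belong to one field experiment (different treatments or years), they should first be deduplicated on an experiment identifier, which is not named here. Edge widths are then scaled linearly, in line with the caption.

```python
from collections import Counter


def species_pair_edges(rows, country_to_region: dict, region: str) -> Counter:
    """Count rows per unordered species pair for one region (network edges)."""
    edges = Counter()
    for row in rows:
        if country_to_region.get(row["Country"]) != region:
            continue
        # Sort the pair so that (bean, maize) and (maize, bean) are the same edge
        pair = tuple(sorted((row["Crop_1_Common_Name"], row["Crop_2_Common_Name"])))
        edges[pair] += 1
    return edges


def edge_widths(edges: Counter, max_width: float = 5.0) -> dict:
    """Scale edge widths linearly so that the most frequent pair gets max_width."""
    top = max(edges.values(), default=0)
    return {pair: max_width * count / top for pair, count in edges.items()} if top else {}


rows = [
    {"Country": "Kenya", "Crop_1_Common_Name": "maize", "Crop_2_Common_Name": "bean"},
    {"Country": "Kenya", "Crop_1_Common_Name": "bean", "Crop_2_Common_Name": "maize"},
    {"Country": "France", "Crop_1_Common_Name": "apple", "Crop_2_Common_Name": "lettuce"},
]
regions = {"Kenya": "Sub-Saharan Africa", "France": "Europe & Central Asia"}
print(species_pair_edges(rows, regions, "Sub-Saharan Africa"))
```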

Finally, the location of the trials was systematically reported through latitude and longitude coordinates. This makes it simple to connect our database to other large-scale databases, such as soil (e.g. data.isric.org/), climate (e.g. worldclim.org) or agroecological zone classifications (e.g. gaez.fao.org/).

The dataset is freely reusable and easy to update. The data file is structured so that new studies, whether from other scientific databases or from grey literature, and new variables can be added easily, extending the range of possible applications of the dataset.