Introduction

Wild rocket (Diplotaxis tenuifolia [L.] D.C.) is a leafy vegetable crop belonging to the Brassicaceae family, cultivated in many temperate areas worldwide for fresh consumption in salad preparations, and appreciated for the characteristic pungency, intense peppery flavour, antioxidant activity and other health-promoting phytonutrients1,2,3. This baby-leaf crop has a significant economic interest particularly with reference to the specific high-convenience food chain4. Nowadays, Italy is the major European producing country of packaged wild rocket, cultivated on around 6000 ha under unheated plastic greenhouses5,6. Because of intensive farming, multiple harvesting cycles per sowing season at 20–40 days intervals7, specific protocols, based on low-impacting agrotechniques, are necessary with the crucial aim to avoid pesticide residues in the product and obtain premium quality fresh-cuttings associated with improved aesthetic look, taste, nutraceutical properties and shelf-life potential8,9,10. Nevertheless, the growing conditions, general cultivar susceptibility to pathogens and the requested lower use of fungicides make the crop prone to recrudescence and/or re-emergence of soil-borne pathologies11,12,13, such as Rhizoctonia solani Kühn and Sclerotinia sclerotiorum (Lib.) de Bary, which can cause devastating lawn rotting and yield losses impacting the farmer economy in the short and mid-term11,14,15. These soil-borne pathologies rise to a real emergency in the intensive biological management systems and anywhere synthetic fungicides are banned or not permitted16,17,18,19,20. Valuable alternatives appliable in the management programs, need to be further improved in terms of use efficiency21. Digital technologies that analyse the light reflected from crop foliage in a wide wavelength range of the electromagnetic spectrum can aid in the making decision process aiming to crop protection targeted strategies based on a reduced use of pesticides22,23. For example, Manganiello et al.24 recently selected high-performing hyperspectral vegetative indices (VIs), such as Soil Adjusted Vegetation Index (SAVI) group and Triangular Vegetation Index (TVI), able to track Trichoderma spp. biocontrol strains in counteracting multiple soil borne diseases of baby leaf vegetables, including R. solani on wild rocket, opening the way to precision biological control applications.

The non-imaging technology is based on the application of high-resolved reflectance spectroscopy to acquire the punctual spectral fingerprints of a plant at foliar and/or canopy level, interrelated to its chemical-physical attributes, in order to provide useful remote sensed information on the plant status. Previous studies demonstrated the accuracy of hyperspectral optical sensing to well classified plant diseases and configure reference outlines for the proximal surveying of crop healthiness25. Karadağ et al.26 improved a method based on 350–2500 nm spectral reflectance for the Fusarium disease severity estimation on pepper. On peanut, reflectance data have been used to identify the disease stage of the bacterial wilting caused by Ralstonia solanacearum27. Junges et al.28 applied hyperspectral scouting to discriminate the described symptomatology associated with the grapevine decline and death syndrome. In tomato, following a laboratory-to-field approach, Abdulridha et al.29 pointed up hyperspectral surveying cultivations affected by non-specific leaf spotting at different stages, caused by the fungus Corynespora cassicola and the bacterium Xanthomonas perforans.

The above-mentioned technologies offer new opportunities to devise innovative tools for supporting the sustainable soil-borne disease management of leafy vegetables, increasing the farmers' monitoring capability to early identify outbreaks and apply a timely and spatially effective phytosanitary intervention, especially under continuous sequences in Mediterranean-type climate regions30. However, a basic computational study to filter the acquiring big data into more manageable synthetic hyperspectral indices is a decisive step to make the proposed digital tool exploitable in practical applications on crops with acceptable costs31. Therefore, this study aimed at the evaluation of the disease-specific hyperspectral signatures of wild rocket plants artificially inoculated, under laboratory conditions, with three soil-borne pathogens (R. solani, S. sclerotiorum and Sclerotium rolfsii Sacc.). To this purpose, wild rocket plants were subjected to hyperspectral data acquisition using a non-imaging hyperspectral sensor. The first objective was to obtain a high-resolution assessment of the differences among spectral signatures of healthy and infected (at different disease levels) plants at leaf scale, and thus the selection of the most discriminant wavelengths. This information was used to set specific de novo synthetic VIs under the most stable environmental conditions, without risk of changing environmental factors (i.e., using the sensor as an active sensor), through a Machine Learning pipeline using the Random Forest (RF) algorithm. The second objective was to understand factors affecting the use of newly developed VIs at plant/canopy scale (i.e., at pot scale and using the sensor as passive sensor), assessing their transferability to high-throughput experiments for practical applications. The methodology devised in the present study can be of larger applicability in other crop—(soil-borne) pathogen systems.

Materials and methods

The present study was conceptualized and conducted based on two independent experiments, named “guide experiment” (Exp_Gui) and “applicative experiment” (Exp_App), following the workflow shown in Fig. S1.

Leaf and canopy reflectance data (see the “Reflectance measurements” section) were collected from soil-inoculated wild rocket plants grown in controlled environments. From Exp_Gui we obtained leaf-scale reflectance data. The resulting dataset was used for calibration and validation purposes, and then divided into a “training dataset” and a “testing dataset” (see the “Data preparation and analysis” section), to model spectral responses at increasing levels of disease severity, and to develop new VIs capable of monitoring aerial symptoms of soil-borne diseases in wild rocket. Reflectance data at canopy scale were acquired from Exp_App. The resulting dataset, called the “canopy dataset”, was used to evaluate the applicability of the newly developed VIs using plant growth and data acquisition conditions comparable to those normally applicable, or expected to be applicable, in real growth environments for early detection of fungal diseases.

Fungal pathogens and inoculum preparation

Virulent isolates of soil-borne fungal pathogens R. solani (RhS) AG-4, S. rolfsii (ScR), and S. sclerotiorum (ScS) used in this study were obtained from the collection of fungi maintained on potato dextrose agar (PDA, Difco Laboratories, Detroit, Mich.) slants at 20 °C at the Research Centre for Vegetable and Ornamental Crops, Council for Agricultural Research and Economics (CREA), located in Pontecagnano Faiano, Italy. They were previously tested for pathogenicity on wild rocket; stock cultures were stored on PDA slants at 4 °C until use. For each pathogen, the artificial inoculum was prepared by infecting with 20 mycelia plugs (0.5 cm diameter) 500 g millet, previously autoclaved (22 min., 121 °C) and saturated (v/v) with 0.1 × potato dextrose broth (PDB, Difco Laboratories, Detroit, Mich.). After 21 days of incubation in the dark at 25 °C, the infected millet was ready for use after air-drying.

Experiment description

In both experiments, wild rocket (D. tenuifolia) cv. Tricia (Enza Zaden, Italy) was sown in black plastic pots (140 mm diameter) filled with autoclaved peat substrate (Klasmann-Deilmann GmbH, Geeste, Germany), at a density of 5 plants pot−1, and grown in a nursery glasshouse to obtain 1-month-old plants. Then, plants were transferred to benches in a climatic chamber at 25 ± 2 °C with a 12-h photoperiod (Exp_Gui) or in an experimental greenhouse (Exp_App). Daily irrigation with dechlorinated sterile water was performed to maintain water holding capacity around 80% of nominal, while a basal NPK mix liquid fertilization was applied twice a week.

For both Exp_Gui and Exp_App, three disease conditions (RhS-D, ScR-D, and ScS-D, disease) were imposed, each at three different levels of disease severity plus a healthy not-inoculated treatment. Each treatment consisted of a total of 10 pots (replicates) for each combination. The different levels of disease severity were obtained by inoculating different groups of plants at 10, 7, and 3 days before sampling, i.e., spectral acquisitions. Pots were inoculated by incorporating 1 g (dry weight) pot−1 of infected millet in the first layer (5 cm depth) of the substrate. Eventually, each pathosystem consisted of 40 pots (200 plants), characterized by different levels of disease severity for a total of 120 pots (600 plants) in the whole trial.

Leaves from Exp_Gui were visually assessed for their disease severity with a score on a 0–3 scale. For pots from Exp_App a visual inspection was also assessed by using a 0–3 rating scale according to the mean symptoms of foliage disease, for each pathosystem. Leaves/pots were classified as: healthy (Disease class 0), and diseased at early (Disease class 1), moderate (Disease class 2), and severe (Disease class 3) degree, according to Manganiello et al.24. Disease class 0 was observed only from the not inoculated healthy controls (Fig. 1).

Figure 1
figure 1

Representative photographs of the four disease classes (0–3) at foliar and pot level.

Reflectance measurements

In Exp_Gui, leaf reflectance (350–2500 nm) was measured by contact under laboratory conditions with a portable non-imaging spectroradiometer (FieldSpec® 4 Hi-Res, ASD Inc., Boulder, CO, USA) using an optical fiber contact probe (ASD Plant Probe; ASD Inc., USA) with a 10 mm field of view and an integrated halogen reflector lamp, which minimized atmospheric interference on the recorded signal (active sensor). To increase the quality and homogeneity of the acquired data, the instrument was warmed up for 90 min before measurement. The pre-calibrated Spectral 99% white reference panel was used for calibration, run every 5 min during data acquisition. Each sample scan represented an average of 20 reflectance spectra. Leaves were randomly selected from pots/plants characterized by different disease levels, resulting in a homogenous sampling across disease severity classes. A total of 216, 212, and 222 reflectance acquisitions were obtained for RhS-D, ScR-D and ScS-D, respectively. Leaves were removed from the plants and spectral reflectance was measured immediately on-site on their adaxial side.

For Exp_App, the canopy reflectance was assessed outdoors under full sunlight conditions using the ASD FieldSpec® 4 Hi-Res (ASD Inc., Boulder, CO, USA). All spectra were obtained using the spectroradiometer with a 25° field of view (FOV). Potted plants were placed in a fixed plate on a black background, maintaining a vertical distance between the sensor and the top of the plants of 10 cm (passive sensor). Three measurements per pot were conducted to obtain an average spectrum in the “canopy dataset” (n = 40 for each disease).

Dry weight, leaf area index

Plants in Exp_App were also evaluated for dry weight (DW, g pot−1), leaf area (LA, cm2 pot−1), and dry matter content (DM, %) of aerial biomass. Plants were sampled and transferred to the laboratory for aerial fresh and DW determinations, after drying them in an oven at 70 °C until constant weight. Prior to drying, LA was recorded using LI-3100 area meter (Li-Cor Corporation, Lincoln, NE, USA).

Data preparation and analysis

After removing the noisy bands at the extreme wavelengths, the spectral range from 400 to 2450 nm (2051 wavelengths) was analysed. For both leaf and canopy reflectance data, pre-processing involved splice correction to remove the abrupt changes in the recorded values of the neighbouring wavelengths, which appear at the spectral intervals between the three integrated sensors of ASD FieldSpec spectroradiometer. Splice correction was achieved using the commercial software package ASD-View Spec Pro (ASD Inc., Boulder, CO, USA).

Leaf reflectance was assessed under constant light and temperature conditions, so further pre-processing to smooth the spectrum and reduce signal noise was not performed32. Prior to spectral statistical analyses, leaf reflectance was pre-processed using first derivative and mean-centering, in order to reduce the baseline offset by amplifying small spectral features33. Data pre-treatments was performed with the ParLeS software34. For each disease, data were then randomly divided into two datasets by splitting the observations (approximately 70:30) into training (calibration) and test (validation) datasets (training set, n = 150, 149, and 160 for RhS-D, ScR-D and ScS-D, respectively; testing set, n = 66, 63, and 62 for RhS-D, ScR-D and ScS-D, respectively).

The RF supervised machine learning algorithm35, a classification algorithm consisting of an ensemble of decision trees, was used by applying R Caret package36 to select the most predictive bands to be used in the construction of the VIs as well as to evaluate the VIs performance in discriminating among the four classes for each disease. To this purpose, a two-step strategy was applied using two types of continuous variables, i.e., (i) reflectance features and (ii) VIs derived from Exp_Gui, to classify the categorical variables (Disease Classes 0, 1, 2, 3) through a set of decision tree process to develop the RF predictive models. The complexity of the input variables was then reduced gradually by selecting at each step the features (bands or VIs) with the highest predictive power through the VSURF R package37 and testing the performance of the restricted variable sets using the Confusion Matrix (R Caret package) and Kappa coefficient. In total, 4 RF models were obtained using: for the first step, (1) all the wavelengths (All λ) and (2) only the selected wavelengths (Selected λ); for the second step, (3) all the derived VIs (All VIs) and (4) only the selected VIs (Selected VIs).

The VIs related to disease severity classes were calculated from the selected wavelengths in the most important regions of the full spectrum. For RhS-D, ScR-D and ScS-D, wavelength selection was performed during the first RF step.

Two band combinations of the spectral indices, Normalized Difference Spectral Indices (NVIs) and Simple Ratio Indices (SRs) were generated for all the possible combinations of the selected wavelengths following the equations:

$${NVI}_{i-j}=\frac{(reflectance({wavelength}_{i})-reflectance({wavelength}_{j}))}{(reflectance({wavelength}_{i})+reflectance({wavelength}_{j}))}$$
$${SR}_{i-j}=\frac{reflectance({wavelength}_{i})}{reflectance({wavelength}_{j})}$$

where reflectance(wavelengthi) and reflectance(wavelengthj) are the spectral reflectance values of two random wavelengths (λi and λj, respectively).

The newly obtained NVIs and SRs were correlated (Spearman rank correlation)38 with the disease severity levels for each pathogen, to identify the most suitable indicators associated with the identification and development of the pathogenesis.

Finally, newly developed VIs strongly correlated with treatments were tested using the canopy dataset, obtained by canopy reflectance measurements from Exp_App. For each disease, Spearman rank correlations were applied with R software39 to assess the correlation between the VIs and disease severity (at the pot scale) as well as between the VIs and LA, DW and DM, all performed at the pot scale. NVIs and SRs were calculated from the raw spectra after splice correction, as previously indicated. One-way ANOVA followed by the post hoc Least Significant Difference test (LSD) was applied to assess statistical relationships among VIs values over the disease classes using the R package agricolae40.

Machine learning modelling, correlations and ANOVA were performed using the statistical software R version 4.0.239.

Ethics approval

The experimental research and field studies on plants, including the collection of plant material, complied with relevant institutional, national, and international guidelines and legislation.

Results

Disease index and plant growth

All plants in the infected pots developed disease symptoms, as expected, with varying severity depending on the specific pathogen and timing of inoculation. ScR proved to be the most aggressive pathogen producing the basal hydropic lesions immediately after the first day of interaction. Overall, all diseases progressed from hydropic halos/lesions on the plant collar and basal parts of the petioles to rot, wilt, and death of the plant. ScR formed the typical mycelial felt on the targeted basal organs and at a late stage produced the typical light brown spherical sclerotia. On the other hand, ScS signalled its presence with a slight production of white mould, whereas RhS did not burst.

Based on Exp_App, the effects of soil-borne pathogens on plant growth were monitored. Fungal diseases significantly affected plant DW, LA and DM, as they progressed through the four severity levels (Fig. 2).

Figure 2
figure 2

Dry weight (DW, g plot-1), leaf area (LA, cm2 plot-1), and dry matter (DM, %) as recorded for wild rocket inoculated with Rhizoctonia solani (RhS-D, black circle), Sclerotium rolfsii (ScR-D, black square), and Sclerotinia sclerotiorum (ScS-D, black triangle), at four disease severity levels (Disease classes 0, 1, 2 and 3 – x-axis) (Exp_App). Data are averages ± standard errors.

Hyperspectral fingerprints

Typical reflectance signatures of diseased leaves of wild rocket inoculated with RhS, ScR, and ScS are shown in Fig. 3A–C, respectively. Regardless of the fungal pathogen, the reflectance of leaves of infected plants tended to increase abruptly in the VIS region with advancing disease severity. At the highest disease severity level, the profile of the spectral signature was also affected, especially between 550 and 700 nm, resulting in the altered redshift characteristics and red-edge positions. Despite differences among the pathogens, the reflectance of the most infected leaves decreased in the NIR regions and increased again from around 1400 nm, without affecting the signature patterns (Fig. 3).

Figure 3
figure 3

Spectral reflectance as recorded, in the range 350-2500 nm, on wild rocket leaves inoculated with (A) Rhizoctonia solani (RhS-D), (B) Sclerotium rolfsii (ScR-D), and (C) Sclerotinia sclerotiorum (ScS-D) at four disease severity levels (Disease classes 0, 1, 2 and 3) (Exp_Gui).

Selection of the features by Random Forest modelling

In order to select the most discriminant wavelengths for each pathogen, the RF supervised machine learning algorithm was trained and validated using, respectively, 70 and 30% of the input observations from Exp_Gui. The predictive performance of the models (All λ) is highlighted in each related Caret's Confusion Matrix and related statistics shown in Table 1. RF ran efficiently on the first derivative reflectance data showing a stable prediction of disease levels compared to the healthy Control, already at the early stages of infection. The higher accuracy level was recorded for ScR-D (0.726) followed by RhS-D and ScS-D (0.625, and 0.500, respectively); these accuracy levels were significantly greater than the no information rate (p-value [A > N] < 0.05).

Table 1 Confusion Matrices and overall statistics of Random Forest models trained and validated on (1) all wavebands (All λ) and on (2) the selected wavebands (Selected λ)—reflectance data recorded from wild rocket leaves inoculated with Rhizoctonia solani (RhS-D), Sclerotium rolfsii (ScR-D), and Sclerotinia sclerotiorum (ScS-D) at four disease severities (Disease classes 0, 1, 2 and 3) (Exp_Gui).

The R package VSURF was then implemented on the obtained RF models in order to remove irrelevant information by the recursive elimination of the smallest important wavelengths and so leading to a subset of explanatory variables highly related to the model response, by minimizing the prediction error. This variable selection method returns a smaller subset of variables focusing more closely on the RF prediction objective avoiding redundancy. The second RF model (Selected λ) was then constructed for each pathogen. In this case, we observed high levels of the proportion of correctly classified cases as well as good metrics, indicating substantial invariance in the predictive value of the models compared to the previous benchmark and so demonstrating the efficiency of band selection (Table 1). In particular, an increase of accuracies of RF models for RhS-D and ScS-D (0.762 and 0.546, respectively), and a decrease for the ScR-D ones, were observed. Moreover, the Balanced Accuracy predicted better the two extreme disease classes (Class 0 and 3)—with values > 0.75—than the intermediate ones (Class 1 and 2), both when all wavelengths were used as entry variables of models and when the RF models were constructed using only the selected wavelengths (Tables S1, S2, S3). This also emerges from the confusion matrices where the misclassified observations tend to zero as we move away from the diagonal (Table 1).

For each telluric pathogenic disease, the selected wavelengths from RF models, most related to the classification response, are reported in Table 2. These bands fell in three regions of the full spectrum, mainly in the VIS–NIR and very few in the SWIR. A total of 9, 16, and 12 wavelengths for RhS-D, ScR-D, and ScS-D, respectively, were selected and combined to build 135, 72, 99 both SRs and NVIs, respectively, for each disease (data not shown). It is worth noting that, regardless of the pathogen, the models selected neighbouring wavelengths in the spectrum (e.g., 559/560 nm; 601/602 nm; around 665 nm; around 724 nm). Some differences were observed for ScR-D in the VIS (i.e., 447 and 854 nm) and, in general among diseases, in the SWIR region.

Table 2 Wavelengths (nm) selected from reflectance data recorded on wild rocket leaves inoculated with Rhizoctonia solani (RhS-D), Sclerotium rolfsii (ScR-D), and Sclerotinia sclerotiorum (ScS-D) (Exp_Gui).

The so obtained VIs were used for the construction of the third RF classification model (All VIs) for each disease. Indeed, thanks to the flexibility of the RF algorithm, all VIs can be used to get a more accurate prediction of the RhS-D, ScR-D and ScS-D levels. The related confusion matrix showed general accuracy predictions of all VIs based RF models that stood at 0.656, 0.645, and 0.530 for RhS-D, ScR-D, and ScS-D, respectively (Table 3).

Table 3 Confusion Matrices and overall statistics of Random Forest models trained and validated on (3) all the VIs generated by the combinations of the selected wavelengths (All VIs) and on (4) the selected VIs (selected VIs)—reflectance data recorded from wild rocket leaves inoculated with Rhizoctonia solani (RhS-D), Sclerotium rolfsii (ScR-D), and Sclerotinia sclerotiorum (ScS-D) at four disease severities (Disease class 0, 1, 2 and 3) (Exp_Gui).

These models were propaedeutic for the most effective classifier VIs selection; indeed, the subsequent VSURF step allowed the selection of the 12, 10, and 10 most suitable indices, listed in Table 4 and described in the next section, to model disease severity for their ability to classify via RF algorithm the RhS-D, ScR-D and ScS-D. To confirm their consistency for the mentioned objective, the newly obtained VIs fed the fourth RF model (Selected VIs), giving metrics shown in Table 3 as well as in Tables S1, S2, and S3.

Table 4 Hyperspectral vegetation normalized difference (NVI) simple ratio (SR) indices (dimensionless) used in this study for wild rocket disease detection.

Performances of VIs at both leaf and canopy scales

For each pathogen, the selected NVIs and SRs derived from Exp-Gui data are reported in Fig. 4. The highest number of new VIs was found for RhS-D, with 9 wavelengths falling within the ranges of green (559 nm), orange (602 nm), red (643, 664, 671, 703, 719, 724 nm) and SWIR (1377 nm) regions. Ten VIs (7 NVIs and 3 SRs) were obtained to discriminate ScR-D, based on 7 wavelengths of the VIS–NIR ranges, i.e., blue (447 nm), green (560 and 567 nm), yellow (579 nm), orange (602 nm), and red (636, and 733 nm). Lastly, 10 VIs were selected for ScS-D foliar symptoms discrimination: such VIs were developed starting from 8 wavelengths falling within green (560 nm), yellow (575 nm), orange (601 nm), red (656, 667, and 726 nm), and SWIR (1135 and 1908 nm) regions (Table 4). Some VIs were significantly different for all stages of pathogenesis (i.e., for RhS-D: NVI559-602, NVI703-724, NVI724-602, NVI724-719, SR559-602, SR703-724, and SR1377-664; for ScR-D: NVI560-567, NVI560-579, NVI579-567, NVI636-733, SR560-579, and SR636-733), while others discriminated only the highest level of disease severity (i.e., for RhS-D: SR643-559; for ScS-D: SR575-560, and SR601-560) (Fig. 4).

Figure 4
figure 4

Average of the selected vegetation indices (VIs, dimensionless) calculated from the leaf-reflectance data of wild rocket inoculated with (A) Rhizoctonia solani (RhS-D), (B) Sclerotium rolfsii (ScR-D), and (C) Sclerotinia sclerotiorum (ScS-D) at four disease severity levels (Disease classes 0, 1, 2 and 3) (Exp_Gui). NVI = normalized differences index; SR = simple ratio. Bars with different lowercase letters are significantly different (p-value ≤ 0.05).

Table 4 also displays the correlations among the selected VIs and the disease severity levels (Spearman correlation coefficient > 0.6 with p-value < 0.05). In general, high correlation coefficients were observed for RhS-D and ScR-D (except for NVI559-671 for RhS-D and NVI560-447 and NVI579-447 for ScR-D), while only 4 VIs showed high correlations (> 0.75) with the classes of severity in the case of ScS-D (i.e., NVI601-1908, SR575-560, SR601-560, and SR656-726). However, to evaluate the transferability of the selected VIs to high-throughput environments (i.e., canopy experiments), only the VIs showing Spearman correlation coefficients > 0.60 were considered.

The applicability of those VIs was tested using the canopy dataset from Exp_App. For each pathogen, VIs were calculated starting from canopy reflectance data at the pot scale (Fig. 5). Water absorption bands and noisy spectral regions over 1351–1409 nm and 1801–1949 nm were excluded from the canopy spectral signature and so the NVI601-1908 (ScS-D) was not calculated.

Figure 5
figure 5

Average of the selected vegetation indices (VIs, dimensionless) calculated from the canopy reflectance data of wild rocket inoculated with (A) Rhizoctonia solani (RhS-D), (B) Sclerotium rolfsii (ScR-D), and (C) Sclerotinia sclerotiorum (ScS-D) at four disease severity levels (Disease classes 0, 1, 2 and 3) (Exp_App). NVI = normalized differences index; SR = simple ratio. Bars with different lowercase letters are significantly different (p-value ≤ 0.05).

The identified VIs showed a significant relationship (based on Spearman correlation coefficients) with the disease classes of severity (Table 5). RhS-D maintained 11 VIs showing high correlation coefficients with disease levels (ranging from 0.60 to 0.65) but characterized by a good correlation (Pearson coefficient) with some important parameters of crop growth (i.e., LA and DM) (Table 5). The best results were achieved by the NVIs and SRs reporting the reflectance value at 559 and 602 nm. ScR-D presented the better performing indices among all the investigated pathogens. Those VIs were able to discriminate infection and well related to all measured growth parameters, including the plant DW at harvest. Finally, for ScS-D it was possible to maintain only three indices, characterized by a low correlation with disease severity, LA, and DM (Table 5).

Table 5 Spearman correlation coefficients for disease severities (Disease classes 0, 1, 2, 3) and Pearson correlation coefficients for dry weight (DW, g plot−1), leaf area (LA, cm2 plot−1), and dry matter (DM, %) and the newly developed vegetation indices (normalized difference indices: NVI; simple ratios: SR—listed in Table 4) calculated from reflectance data recorded on wild rocket inoculated with Rhizoctonia solani (RhS-D), Sclerotium rolfsii (ScR-D), and Sclerotinia sclerotiorum (ScS-D) (Exp_App).

Discussion

Greenhouse cultivation of leafy vegetables such as wild rocket on large areas requires continuous disease monitoring for the ready identification of the spatial/temporal distribution of disease symptoms within the crop, which allows increasing effectiveness (less quantities) and targeting (more precision) of preventive and/or curative control means. For this purpose, proximal detection methods using optoelectronic probes have proved suitable to feed farmers’ decision-making processes by ensuring automatic, rapid, non-destructive, and large-scale disease assessment, and reducing the reaction time to the early outbreak22.

In this study, a high-resolution hyperspectral array [i.e., VIS/infrared (IR) spectroscopy] was used to attempt to follow the progression of R. solani, S. rolfsii, and S. sclerotiorum disease symptoms on D. tenuifolia through four different severity levels. To the best of our knowledge, this is the first study that investigates the responses of wild rocket to soil-borne fungal pathogens in terms of spectral reflectance. Starting from the full spectral signature we introduced and calculated new vegetation indices aimed at reducing the dimensionality of the dataset, suitable to discriminate between healthy and infected classes of wild rocket leaves, which could be effectively transferable to real (or real like) growing conditions, in order to accelerate disease detection. The methodological information obtained in the present study can be transferred to other (leaf) crop—(soil-borne) pathogen systems.

Modelling was performed on an experiment conducted under controlled conditions (i.e., growth chamber) and reflectance data were collected in the laboratory (at leaf scale) using a specific light source41, thus resulting in high-resolution and high-precision measurements. As expected, fungal diseases influenced the spectral signature: from a qualitative point of view, healthy plants showed low reflectance values in the VIS and SWIR regions as well as high reflectance values in the NIR, highlighting changes in physiological and biological parameters in responses to pathogenesis, i.e., reduction in chlorophyll content, changes in leaf water content, and degeneration of internal leaf structure, respectively42,43,44. In agreement with previous results41, a slight blue-shift in the position of the red-edge due to the symptoms of developing necrosis was also observed. However, slight differences among pathogens were detected (e.g., ScR-D near 1700 nm). This deserves further investigation as specific pathosystems are characterized by specific spatiotemporal dynamics, as reported in sugar beet32, tomato45, and potato46. It is worth specifying that the soil-borne diseases studied generally allow comparable symptomatology, despite differences in terms of disease progression, and their impact on foliage is almost indirect.

The latter aspect might also have been related to the ability of the RF models to discriminate healthy and highly infected leaves with higher accuracy, in comparison with intermediate disease severity, as previously observed in similar studies performed with Propagation Neural Network spectral calibration models on blackleg potato that reduced accuracy on whole plants in field trials25. The RF algorithm represents one of the most common statistical machine learning methods used in modelling and classifying qualitative classes of disease severity from hyperspectral data and has been successfully employed in many host–pathogen combinations41,47. Recently, the RF algorithm has been applied by our group to model the detection of powdery mildew and tracheofusariosis in wild rocket48,49. However, since the continuous narrowband datasets contain redundant spectral information, the selection of significant wavebands was performed, based on the results of RF decision trees50.

The main objective of this study was to obtain new spectral VIs, starting from the selected wavelengths in the three full-spectrum regions (i.e., VIS, NIR and SWIR), useful in the scaling up and speeding up of disease detection51 in wild rocket plants under greenhouse or open-field conditions. In conceptual agreement with the described pipeline, similar hyperspectral data mining was previously conducted to identify the best indices with high healthy/diseased separation capacity of late blight in tomato52, Cercospora leaf spot in beet53, powdery mildew in squash29, bacterial wilt in peanut27, Yellow Rust in wheat54, and Alternaria leaf blight in potato55.

Regardless of fungal diseases, the new calculated VIs (at leaf scale) were generally effective in separating healthy from infected leaves; however, some VIs could also discriminate intermediate disease intensities. Interestingly, in the case of ScR-D, which seemed to be characterized by the highest hyperspectral spectroscopy performances, probably due to its greater aggressiveness, the selected NVIs and SRs indices are based on green and light red bands highlighting the effects of disease on leaf pigments concentrations as well as on photosynthetic efficiency56. The SR1377-664 was characterized by a good ability to separate disease intensities associated with RhS-D, confirming the role of IR bands to estimate cell damages. In particular, the spectral region centered around 1377 nm represented a key region in the identification of physically damaged mushrooms during the monitoring of their quality57. Moreover, for RhS-D, the selected wavelengths in green and orange regions, at 559 and 602 nm, determined high-performance VIs. Those wavelengths have been found ascribed to the leaf chlorophyll content. In particular, λ559, which is normally ignored in most VIs, has been found to be associated with yield and biomass accumulation58.

To verify the transferability of de novo VIs, they were tested against the canopy dataset. Close to the spectral regions of the VIs in the present study, MCARI has been previously found to be responsive to the early stage of disease development of Cercospora leaf spot and rust on sugar beet32, to tomato chlorophyll content and LAI reduction as affected by bacterial wilting59, and to asymptomatic shoot status in presence of abiotic root stress in pepper60. On the other hand, the indices estimating pigments, Plant Pigment Ratio and Carotenoid Indices, based respectively on range 550–450 and 500–570 nm, have been, respectively, used to predict plant N status and chlorophyll61 and carotenoid levels as a proxy of plant stress62.

As a matter of fact, the ongoing symptoms associated with soil-borne pathogens, i.e., losses of chlorophyll and other pigments, discoloration, reduced water content, wilting and, finally, desiccation, are closely linked to reduced leaf absorbance in the canopy63. In sugar beet, Reynolds et al.64 calculated two narrowband hyperspectral vegetation indices, such as Pigment Specific Simple Ratio (working on chlorophyll a) and modified Spectral Ratio (close to canopy reflectance at 445, 705 and 750 nm), for early detection of Rhizoctonia crown and root rot, while chlorophyll/carotenoids dependent SRPI (simple ratio pigment index) was proposed to discriminate the soil borne disease from Heterodera schachtii nematode symptoms65. Huang and Apan66 found VIS–NIR narrow range wavelengths valuable to predict the incidence of Sclerotinia rot disease in celery. For most of the literature about VIs, the discrimination between healthy and severely diseased plants leaves is attributable to the strong changes that occurred in the later soil-borne infection stages24. The relevance of these indices seems to be limited to the drastic exclusion of infected plants that can be successfully practised, whereas, in the late stage, disease progression is too advanced to apply preventive strategies.

In the present study, the newly formulated VIs at high-resolution and low-throughput levels (leaf scale) were tested directly on an independent dataset (high-throughput experiment), built on reflectance measurements performed on the canopy, confirming their transferability to ordinary growing and measuring conditions. Interestingly, for all diseases a good correlation between computable VIs and LA was observed.

Conclusions

Our study underlines the importance of scaling up the results of reflectance spectroscopy, performed at leaf scale with a contact probe, under controlled conditions and thus without by environmental factor effects (i.e., light intensity and direction, temperature, and canopy architecture), to larger-scale experiments (canopy scale), characterized by increasingly high-throughput and important for practical applications.

New vegetation indices able to discriminate disease stages were obtained. The most interesting wavelengths fell in the visible and infrared spectral regions, including interesting regions/wavelengths already investigated and previously identified under stress conditions. Extraction of the most significant spectral information from the big dataset is crucial to simplify the production of optoelectronic sensors, suitable for plant disease detection41 specifically aiming to cheap and innovative surveying devices, to support sustainable disease management in greenhouse or open-field conditions. On perspective, this technology, possibly integrated with different sensors (i.e., thermal sensors and/or 3-D shape sensors)67, can improve pathogen detection, since the availability of significant spectral information linked to machine vision-based remote sensing makes feasible the real-time mapping of disease management.