Introduction

Plants possess an intrinsic electrical signaling network, studied as plant electrophysiology, through which perceived environmental stimuli are rapidly transmitted from the sensing tissues to all other parts of the plant. This mechanism helps plants react and adapt to environmental changes, which manifest as variations of the plant’s electrical potential1.

In recent years, using machine learning techniques, several studies have revealed signal patterns in monitored plant electrophysiology that can identify the plant’s health status in the presence of either biotic or abiotic stimuli2,3,4,5,6,7,8,9. Such studies demonstrate the potential of the monitored plant electrical response to enable an automated plant-health alert system that could optimize today’s agricultural practice in terms of yield and sustainability.

However, few of these studies7,8,9 explore plant electrophysiological signals acquired outside a laboratory, i.e., under typical greenhouse growing conditions, a setting with a more direct impact on the everyday agricultural routine. These studies rely on traditional machine learning models using local signal features extracted from relatively small windows of several seconds to 30 min. One of their main aims was to study the signal characteristics and the temporal extent to which these characteristics could discriminate the state of a plant growing in normal conditions from the stressed state caused by an applied stimulus, such as drought, nutrient deficit, or pest attack. They report accuracies of more than 80% for distinguishing these two classes.

Nevertheless, calculating features can be time-consuming and introduces a significant computational cost. Moreover, as a single feature portrays only a specific aspect of the signal, it is a substantial simplification of the raw data.

Deep Neural Networks (DNNs) have taken a dominant place in the classification field with the emergence of novel deeper architectures and access to high computing power. In addition to their high performance, another advantage of the Deep Learning (DL) methods is the ability to automatically learn classification targets from the raw input data without needing a preceding preprocessing step to generate features5.

Among numerous fields of application, DL techniques have an emerging but substantial impact on improving and optimizing current agricultural practices, from production monitoring and management to robust decision support10,11,12,13,14. For instance, Convolutional Neural Networks (CNNs) are the prevalent method for automated and accurate detection and classification of diverse plant diseases15,16,17. However, this diagnostic-assisting methodology is mainly based on digital images rather than a measure obtained directly from the plant and can therefore detect a disease only once its symptoms are visible.

DL techniques have been scarcely explored for classifying electrophysiological signals recorded from plants. One reported study in this field6 used two-dimensional (2D) CNNs applied to images representing the Visual Rhythm of the acquired signal18; due to the limited data, the results were not satisfactory. More recently, another approach applied 1D CNNs to augmented data obtained from the original recordings using a Conditional Generative Adversarial Network5. With this approach, the accuracy for identifying salt tolerance in wheat seedlings reached approximately 93%. Nevertheless, as previously stated, both of these studies explore signals acquired in controlled laboratory conditions using a Faraday cage.

The present study aims to apply DL techniques to electrophysiological signals acquired from tomato plants growing in typical production conditions, i.e., a greenhouse, to explore the ability of these advanced end-to-end classification methodologies to identify the presence of stress caused by a nitrogen deficit in the provided nutrient solution. An additional objective is to compare the DL techniques, in terms of accuracy and inference time, against the current state-of-the-art (SOA) approaches based on more classical machine learning algorithms that require precalculated features.

Methods

Experimental design

The experiment for plant data collection was conducted by Agroscope at the research station in Conthey (Switzerland), starting on the 12th of July 2019. It included 16 tomato plants (Solanum lycopersicum) of the commercial variety Admiro (De Ruiter) grown in a soilless manner on coconut-fiber substrate. In the present study, all methods complied with the relevant guidelines and regulations. In particular, the variety used in our research is not genetically modified and does not represent a risk to the environment.

The experiment was designed to determine the effect of a deficit of the macroelement nitrogen. Normal irrigation with a complete nutrient solution was applied to each plant from the beginning of the experiment. On July 18th, two-thirds of the regular nitrogen quantity was removed from the nutrient solution. Visual symptoms, such as thinner stems and light-green leaves, were observed five days after the nitrogen deprivation.

The electrophysiological signal was recorded with the 8-channel device PhytlSigns (Vivent SA, Gland, Switzerland) and eight pairs of electrodes. As described in previous studies7,8,9, the recorded signal represents the difference in electrical potential measured between the stem, to which a ground electrode was attached, and a higher branch connected to the active electrode. The signal was stored at a sampling rate of 500 Hz.

Dataset

The collected data consist of univariate time series expressing the electrophysiology of each plant throughout 15 days of recordings. The first four days, corresponding to the period before the nitrogen deprivation, represent each plant’s normal, pre-stimulus state, whereas the ten days after the application of the stimulus represent the stressed state.

For a more straightforward presentation, the plants from the described experiment will be further referred to with their identifiers, namely B0–B7 and C0–C7, where each letter denotes the recording device, whereas the number represents the respective channel.

Initial visual data inspection showed that different segment lengths of the recorded signal portray the normal and the stressed state differently, as presented in Fig. 1. Namely, the differences between these states are more evident for windows of a few seconds than for segments of longer duration, such as several minutes or hours.

Figure 1

Visual representation of the electrophysiological signal recorded from a single plant for different window lengths. Each row corresponds to a distinct window length. These windows are taken either from the period when the plant was growing in normal conditions (left column) or during the stress caused by nitrogen deficiency (right column).

DL architectures

A recent study19 presents an extensive evaluation of the performance of nine different, commonly used DL architectures for time-series classification, applied to 81 benchmark datasets. This study therefore indicates which DL architectures are most likely to achieve high accuracy for the data considered in the present work. Among the nine, the four architectures providing the highest accuracy on the largest number of the studied datasets were chosen for testing. The following subsections briefly describe these four architectures and their implementation.

Multi-layer Perceptron

The Multi-layer Perceptron (MLP) is a feed-forward architecture consisting of fully-connected layers with non-linear activation functions20. The used implementation integrates dropout21 at each layer for regularization and the rectified linear unit (ReLU)22 as the activation function, preventing gradient saturation when the network gets deep. The network ends with a SoftMax layer that outputs the classification confidence for each class.
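As an illustration, a minimal Keras sketch of such an MLP follows; the layer widths and dropout rates are assumptions for illustration, not the exact values of the evaluated implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_mlp(input_length: int, n_classes: int = 2) -> tf.keras.Model:
    """Minimal MLP for univariate signal windows; widths and dropout
    rates are illustrative, not the exact evaluated configuration."""
    inp = layers.Input(shape=(input_length,))
    x = inp
    for units in (500, 500, 500):               # fully-connected hidden layers
        x = layers.Dropout(0.2)(x)              # dropout for regularization
        x = layers.Dense(units, activation="relu")(x)  # ReLU activation
    out = layers.Dense(n_classes, activation="softmax")(x)  # class confidences
    return models.Model(inp, out)
```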

Fully-convolutional network

The fully-convolutional network (FCN) relies only on locally connected layers, decreasing the number of parameters to tune; therefore, such an architecture requires less time to train23. FCNs are typically used for semantic segmentation, where they aggregate two paths: the first, called the downsampling path, extracts and interprets the data information, whereas the second, the upsampling path, serves for localization23.

As the task carried out here is classification, the used implementation encloses only the downsampling path, which extracts data features at different levels of abstraction; the final classification is done with a dense layer. The network is a union of several blocks, each integrating a convolutional layer followed by a batch normalization layer24 and a ReLU activation function.
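A minimal sketch of this classification-only FCN follows; the filter counts and kernel sizes are illustrative, and the global average pooling placed before the dense classifier is an assumption borrowed from common FCN implementations for time series.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fcn(input_length: int, n_classes: int = 2) -> tf.keras.Model:
    """Downsampling-path-only FCN; filter counts and kernel sizes are illustrative."""
    inp = layers.Input(shape=(input_length, 1))
    x = inp
    for n_filters, kernel in ((128, 8), (256, 5), (128, 3)):     # stacked conv blocks
        x = layers.Conv1D(n_filters, kernel, padding="same")(x)  # convolutional layer
        x = layers.BatchNormalization()(x)                       # batch normalization
        x = layers.Activation("relu")(x)                         # ReLU activation
    x = layers.GlobalAveragePooling1D()(x)   # collapse the temporal axis (assumed)
    out = layers.Dense(n_classes, activation="softmax")(x)  # dense classification layer
    return models.Model(inp, out)
```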

Residual Network architecture

The Residual Network architecture (ResNet)25 is a specific type of network formed from residual blocks, each incorporating skip connections that bypass several layers, which enables extending the maximum depth of DNNs without degrading the accuracy of the model. Such an architecture allows the layers to learn the identity function considerably more easily and provides an alternative path for the gradient flow, which addresses the problem of vanishing gradients in very deep structures.

The ResNet used in this work stacks three residual blocks followed by a global average pooling layer and a SoftMax layer.
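The sketch below illustrates this structure; the filter counts and kernel sizes are illustrative assumptions, not the exact configuration of the evaluated implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def residual_block(x, n_filters: int):
    """One residual block: three convolutional stages plus a skip connection."""
    shortcut = layers.Conv1D(n_filters, 1, padding="same")(x)  # match channel count
    shortcut = layers.BatchNormalization()(shortcut)
    for i, kernel in enumerate((8, 5, 3)):
        x = layers.Conv1D(n_filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        if i < 2:
            x = layers.Activation("relu")(x)
    x = layers.add([x, shortcut])               # the skip connection
    return layers.Activation("relu")(x)

def build_resnet(input_length: int, n_classes: int = 2) -> tf.keras.Model:
    inp = layers.Input(shape=(input_length, 1))
    x = inp
    for n_filters in (64, 128, 128):            # three stacked residual blocks
        x = residual_block(x, n_filters)
    x = layers.GlobalAveragePooling1D()(x)      # global average pooling layer
    out = layers.Dense(n_classes, activation="softmax")(x)  # SoftMax layer
    return models.Model(inp, out)
```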

Encoder architecture

The Encoder for time series is a CNN-based architecture established with the aim of building an encoder network able to learn representations that generalize to types of data not used for training26.

In the used implementation, the CNN consists of three convolutional blocks with max-pooling layers between them. Each block is formed by a 1D convolution, followed by an instance normalization layer27, a parametric rectified linear unit (PReLU) activation28, and a dropout layer, as represented in Fig. 2. After the last convolutional block, half of the filters are fed to a time-wise SoftMax activation that acts as attention weights for the other half of the filters (Fig. 2). Finally, the result of the attention mechanism for all filters is passed through an instance normalization and a dense layer with a SoftMax activation. Instance normalization was found to ease the training and to provide more consistent value ranges in the encoder’s output.
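The following sketch illustrates the described blocks and attention mechanism; the filter counts, kernel sizes, and dropout rate are assumptions, and GroupNormalization with groups=-1 (available in recent Keras versions) stands in for a dedicated instance normalization layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder(input_length: int, n_classes: int = 2) -> tf.keras.Model:
    """Sketch of the time-series Encoder; filter counts, kernel sizes, and the
    dropout rate are illustrative. GroupNormalization(groups=-1) acts as
    instance normalization here."""
    inp = layers.Input(shape=(input_length, 1))
    x = inp
    for i, (n_filters, kernel) in enumerate(((128, 5), (256, 11), (512, 21))):
        x = layers.Conv1D(n_filters, kernel, padding="same")(x)  # 1D convolution
        x = layers.GroupNormalization(groups=-1)(x)              # instance normalization
        x = layers.PReLU(shared_axes=[1])(x)                     # PReLU activation
        x = layers.Dropout(0.2)(x)
        if i < 2:
            x = layers.MaxPooling1D(2)(x)       # max-pooling between the blocks
    # Attention: half of the filters, passed through a time-wise SoftMax,
    # act as weights for the other half.
    data, gates = x[:, :, :256], x[:, :, 256:]
    weights = layers.Softmax(axis=1)(gates)     # SoftMax over the time axis
    attended = layers.Multiply()([data, weights])
    x = layers.GroupNormalization(groups=-1)(attended)
    x = layers.Flatten()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)
```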

Figure 2

Diagram representing the Encoder architecture for time series. Adapted from Serrà et al.26.

Building of DL classification models

Before training the described DL architectures, the stored raw signal was notch-filtered at 50 Hz and 100 Hz to eliminate potential power-line perturbance. The architectures were applied using the official implementation provided in the respective GitHub repository29 on a machine with an NVIDIA Quadro RTX 5000 GPU, CUDA 11.4, and Windows 10 Enterprise, using tensorflow-gpu 2.5.0 and Python 3.7.7.
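As an illustration, this notch filtering could be reproduced as follows; the quality factor of the notch is an assumption, as it is not reported here.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

FS = 500  # sampling rate (Hz)

def remove_powerline(signal: np.ndarray, freqs=(50.0, 100.0), q: float = 30.0) -> np.ndarray:
    """Notch-filter the raw recording at the power-line frequency and its
    first harmonic; the quality factor q is an assumed, not reported, value."""
    out = np.asarray(signal, dtype=float)
    for f0 in freqs:
        b, a = iirnotch(f0, q, fs=FS)   # design the notch at f0 Hz
        out = filtfilt(b, a, out)       # zero-phase filtering
    return out
```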

Following the usual practice, the dataset was initially divided into a train set used for building the model and a test set for evaluating it. To avoid potential bias in the evaluation8,9, the data separation was done with respect to the plants: the samples from plants B0, B5, and C5 formed the test set, whereas the samples of the remaining 13 plants constituted the train set. These plants were chosen at random.

Four days of recordings were taken from each plant to build these two sets. The first two represent the period of normal growing conditions, i.e., the normal state, whereas the remaining 48 h portray the stage when visual symptoms appeared due to the lack of nitrogen—the stressed state.

Choice of architecture

Within the first step of this study, preliminary modeling with the four architectures presented previously was performed to identify the most suitable algorithm, in terms of performance and accuracy, for the classification of the given plant electrophysiological data.

As previously described, these architectures all differ from one another and involve different numbers of parameters and layers. Hence, the time each one takes to train is expected to differ. To make a fair comparison, the training time for all four models was limited to 1 h and 30 min.

Signal windows with a length of 4 s were the input of each architecture, as initial exploration showed that several seconds provide enough information to discriminate the plant states. Additionally, in preparatory analyses, similar metrics were obtained for windows of length 1 s, 8 s, and 16 s. Given the sampling frequency of 500 Hz, each window consists of 2000 samples of the raw filtered signal.

Selection of hyperparameters

Several workflow designs and model parameters were explored to enable an optimized learning process during the training step.

  • Normalization type. Through a visual exploration of the data, it was observed that the time series of each plant covers a different range of values, independently of the stressed state. Therefore, using the data without normalization could engender learning of spurious relations between the scale of a plant and the target. Another reason to normalize the data is better numerical stability during training. As in the previous step, the window length used for this analysis was 4 s. The following approaches were compared (a code sketch follows after this list):

    • No scaling. To have a baseline of how the model would learn without normalization, the first approach involved training with input windows without any preprocessing after filtering.

    • Scaling per plant array. A typical way of normalizing numeric data is scaling the input values to the range between 0 and 1 using the data’s global min and max values, i.e., min-max normalization. Before performing this normalization on an individual plant array, the extrema for that plant were first calculated.

    • Scaling per window. Another way to normalize the data is to apply the min-max normalization on each input window individually. Such scaling modifies the structure of the time series but could also help the network avoid overfitting to the temporal evolution of the time series. Additionally, this approach does not require knowledge of the time-series data outside the given window. This normalization was applied to all windows representing a plant’s data.

    • Subtract mean per window. The scaling performed individually on each window modifies its range and variance, which could be important discriminative features of the time series. To study this, another normalization approach involved only subtracting the mean value from each respective window, without scaling. As in the previous case, the normalization was performed on the entire data of a plant.

  • Window length. The length of the signal sample is closely related to the extent of the discriminative information provided by the data. Consequently, the choice of window length affects the model’s performance. The selection of window lengths to explore was based on the visual data observations stated previously (Fig. 1), suggesting that the discriminative information is portrayed by windows of a few seconds. Namely, the selected candidates for tuning this parameter were 1 s, 4 s, and 16 s. Additionally, to examine the performance of longer windows with less granularity, a window length of 30 s downsampled by a factor of 3 was also considered.
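As referenced above, the following minimal sketch illustrates the windowing and the compared normalization approaches; non-overlapping windows and the helper names are assumptions for illustration.

```python
import numpy as np

FS = 500  # sampling rate (Hz)

def make_windows(plant_signal: np.ndarray, length_s: float) -> np.ndarray:
    """Cut a plant's recording into consecutive windows of length_s seconds
    (non-overlapping windows are assumed here)."""
    n = int(length_s * FS)
    n_windows = len(plant_signal) // n
    return plant_signal[: n_windows * n].reshape(n_windows, n)

def minmax_per_plant(plant_signal: np.ndarray) -> np.ndarray:
    """Scaling per plant array: min-max using the plant's global extrema."""
    lo, hi = plant_signal.min(), plant_signal.max()
    return (plant_signal - lo) / (hi - lo)

def minmax_per_window(window: np.ndarray) -> np.ndarray:
    """Scaling per window: min-max using only the window's own extrema."""
    lo, hi = window.min(), window.max()
    return (window - lo) / (hi - lo)

def demean_per_window(window: np.ndarray) -> np.ndarray:
    """Subtract mean per window: preserves the window's range and variance."""
    return window - window.mean()
```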

Simplification of the signal

Although the initial idea of the study was to explore the original raw signal, a visual exploration of the data resampled at a lower rate showed more evident differences between the plant’s normal and stressed states. To highlight these differences, we integrated another preprocessing step involving smoothing and resampling the signal. The employed methodology was empirically established and involves the following steps (sketched in code after the list):

  • Take windows of S seconds, following a trade-off: the length should be sufficiently large to include the short signal oscillations and sufficiently small to reduce the tendency to overfit. For the given data, such a trade-off is reached when S is set to 16 s.

  • Perform a rolling median on the signal with a window of N samples. This step simplifies the signal by eliminating high-frequency variations, potentially either noise or information without discriminating value, which could perturb the model-learning process. For the performed analyses, N was set to 50, as this leads to a smoothed signal that preserves the general structure, or envelope, of the original signal.

  • Downsample by a factor of F. As the output of the previous step is a smoother signal, downsampling can be applied at this stage without losing the signal’s shape and structure. Setting F to 10 enables such a result.
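A minimal sketch of these steps, under the stated parameter values, could read:

```python
import numpy as np
import pandas as pd

S, N, F = 16, 50, 10  # window length (s), rolling-median width, downsampling factor
FS = 500              # original sampling rate (Hz)

def simplify(window: np.ndarray) -> np.ndarray:
    """Rolling median of N samples followed by keeping every F-th sample;
    a 16-s window of 8000 raw samples becomes 800 simplified samples."""
    smoothed = pd.Series(window).rolling(N, min_periods=1).median().to_numpy()
    return smoothed[::F]
```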

Figure 3 visually represents the performed signal transformation. The steps preceding the downsampling help smooth the signal sufficiently, which is imperative for obtaining an accurate representation of its lower-frequency dynamic.

Figure 3

Illustration of the performed signal simplification, which includes a rolling median of 50 samples applied in 16 s signal windows, followed by downsampling of factor 10. The plots in the left column present the raw filtered data, whereas those on the right side portray the respective transformed signal. The windows from the top row correspond to the recording period without stress, whereas those from the bottom row, taken from the same plant, represent the stressed state.

Combination of the predictions

A particularity of the analyzed time series representing the electrophysiological signal of tomato plants is the absence of sudden changes from the normal state to the stressed one and vice versa. Consequently, the plant state for each sample is very likely to be the same as for the neighboring samples. Hence, by combining consecutive predictions, one should expect their confidence to increase without interfering with the model providing these predictions.

The process of implementing this idea relies on two choices (sketched in code after the list):

  • Method to combine the predictions. Different functions could be used for combining the consecutive predictions. However, in this study, only the mean and median of the network’s output confidences will be considered for simplicity.

  • Length of the prediction sequence. To make the system causal, the previous L predictions are combined to obtain the prediction for the current window. To assess the importance of this length, the chosen L values were 10 and 1000 predictions, representing a signal of several minutes and several hours, respectively.
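A minimal sketch of this causal combination could read as follows; whether the current window itself is included among the L combined predictions is an implementation detail assumed here.

```python
import numpy as np
import pandas as pd

def combine_predictions(confidences: np.ndarray, L: int = 1000,
                        how: str = "mean") -> np.ndarray:
    """Causal combination: the updated prediction at each position aggregates
    the confidences of the last L windows (the current one included here)."""
    roll = pd.Series(confidences).rolling(window=L, min_periods=1)
    combined = roll.mean() if how == "mean" else roll.median()
    return combined.to_numpy()
```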

To summarize, the entire dataset underwent several preprocessing steps: notch filtering, simplification, windowing, and normalization. The model was built on the training data, and its application to the test set provided predictions on unseen data. The predictions of L consecutive samples were further combined, generating an updated prediction. Figure 4 shows a diagram of the proposed workflow.

Figure 4

Diagram representing the entire proposed workflow for establishing the DL-based classification model. Prior to training the model, the raw data undergo several preprocessing steps: notch filtering at 50 Hz and 100 Hz, simplification enclosing smoothing and downsampling, windowing to define the samples, and normalization to standardize the value ranges. Once the classification model was built, it was evaluated by applying it to the data from the test set, which underwent the same preprocessing procedure. An updated prediction for the entire test set was further obtained by combining the predictions of L consecutive samples.

Leave-one-out cross-validation

To get an unbiased evaluation of the model’s performance on unseen data, a leave-one-out cross-validation (LOOCV) was performed. As the dataset encloses 16 plants, a total of 16 distinct models were built: the training set for each model included 15 plants, and the model was evaluated on the remaining one. Prior to this analysis, the signal of each plant underwent the simplification transform described previously.
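A hedged sketch of this plant-wise LOOCV loop follows; the per-plant preprocessed arrays X[plant] and y[plant], the build_encoder helper, and the training hyperparameters are assumptions for illustration, not the study’s exact settings.

```python
import numpy as np

# X[p] and y[p] are assumed to hold the preprocessed (simplified, windowed,
# normalized) samples and labels of plant p; build_encoder and the training
# hyperparameters below are likewise illustrative assumptions.
plants = [f"B{i}" for i in range(8)] + [f"C{i}" for i in range(8)]
test_accuracy = {}
for held_out in plants:
    train_plants = [p for p in plants if p != held_out]   # 15 plants for training
    X_train = np.concatenate([X[p] for p in train_plants])
    y_train = np.concatenate([y[p] for p in train_plants])
    model = build_encoder(input_length=X_train.shape[1])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train[..., None], y_train, epochs=10, batch_size=64, verbose=0)
    _, acc = model.evaluate(X[held_out][..., None], y[held_out], verbose=0)
    test_accuracy[held_out] = acc          # evaluated on the left-out plant
```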

Comparison with the state-of-the-art

The DL-based classifier was compared with the approach recently proposed by Najdenovska et al.8 for classifying plant electrophysiological signals based on features extracted locally from the raw data. This approach uses the XGBoost30 algorithm to build the classification model.

The feature-extraction step requires choosing the length of the windows in which the features are calculated. The number of windows, which regulates the extent of the feature space, is also a parameter to be selected for optimizing the classification outcome. In total, 34 features representing temporal, frequency, or time-frequency signal characteristics were calculated in each window8. Following preliminary studies, the highest accuracy was obtained for 15 windows with a length of 60 s, which aligns with previously reported findings8.

To make a reliable performance comparison, the chosen SOA approach was built and evaluated on the same train and test sets as the end-to-end DL-based one. The XGBoost parameters, such as the number of trees, the maximum tree depth, the regularization terms lambda and alpha, and the subsampling fraction, were tuned using a custom cross-validation process where each fold was represented by the samples calculated from one plant of the training set, as described by Najdenovska et al.8.
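For illustration, such plant-wise folds can be obtained with grouped cross-validation; the variable names (features, labels, plant_ids) and the grid values below are assumptions, not the values reported by Najdenovska et al.8.

```python
from sklearn.model_selection import GridSearchCV, GroupKFold
from xgboost import XGBClassifier

# `features`, `labels`, and `plant_ids` (one plant identifier per sample) are
# assumed precomputed; the grid values are illustrative, not the tuned ones.
param_grid = {
    "n_estimators": [100, 300],      # number of trees
    "max_depth": [3, 6],             # max tree depth
    "reg_lambda": [1.0, 10.0],       # L2 regularization term
    "reg_alpha": [0.0, 1.0],         # L1 regularization term
    "subsample": [0.8, 1.0],         # subsampling fraction
}
cv = GroupKFold(n_splits=13)         # one fold per plant of the training set
search = GridSearchCV(XGBClassifier(), param_grid, cv=cv)
search.fit(features, labels, groups=plant_ids)
```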

Besides accuracy, another metric for comparing these approaches is the inference time, which allows assessing whether real-time prediction with these models is feasible. For Neural Networks, inference resembles training in that data are processed in batches, here batches of windows; each batch is processed at once in both cases, although, for training, a batch is used to perform one optimization step. Current practice often takes advantage of this to optimize the inference time when predicting more than one window, as passing one batch of N windows through the network is faster than passing N batches of one window. Regarding the SOA approach, along with applying the XGBoost model, the feature calculation is also part of the inference process.
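A simple way to measure the per-window inference time for different batch sizes could be sketched as follows; the timing protocol (warm-up run, number of repetitions) is an assumption.

```python
import time
import numpy as np

def mean_inference_time(model, window_length: int, batch_size: int,
                        n_repeats: int = 100) -> float:
    """Average prediction time per window for a given batch size."""
    batch = np.random.rand(batch_size, window_length, 1).astype("float32")
    model.predict(batch, verbose=0)                  # warm-up run
    start = time.perf_counter()
    for _ in range(n_repeats):
        model.predict(batch, verbose=0)
    return (time.perf_counter() - start) / (n_repeats * batch_size)
```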

Results

Building of classification models

The framework for building the classifier was set following the results of the different tested scenarios.

Choice of DL architecture

Although 1 h and 30 min might appear to be a short interval for training a DL model, the performed preliminary training was not a computationally demanding task, which allowed all four models, each based on a different architecture, to converge within that interval. The encoder-based model completed one epoch in the given time, while the other three completed up to two; in fact, the encoder-based model involved a considerably larger number of parameters to learn. Table 1 lists the number of parameters used by each model. However, in terms of accuracy, the model built with the encoder architecture outperformed the other three (Table 1); therefore, this architecture is used in the further analyses.

Table 1 Training and testing accuracies for the different proposed architectures.

The pronounced differences between the training and testing accuracy of the built models (Table 1) indicate their tendency to overfit, which was an expected outcome since the training did not include any optimization for improving and generalizing the learning process.

Selection of normalization approach

The data standardization analyses demonstrated that different types of normalization affect the model performance differently; Table 2 summarizes the obtained results. More precisely, compared to the case without normalization, any normalization approach decreases the difference between the training and testing accuracies and improves the model performance. Despite overall similar results, the highest average testing accuracy is obtained with the approaches that only consider the values in the given window. Additionally, the difference between the training and testing accuracy is smaller for the min-max normalization applied to the window’s values than for the approach subtracting the mean value, which sets the choice of normalization for the subsequent analyses. Furthermore, such an approach, necessitating only the values of a given window, is more easily employable in actual growing conditions.

Table 2 Accuracies of the models built using different normalization approaches.

Window length

Models built with windows four times shorter and four times longer than the 4-s length used in the previous analyses gave similar results, but without achieving better classification accuracy than the initially tested length. Such results, detailed in Table 3, suggest that a window length of 4 s could be considered a local optimum for these analyses.

Table 3 Accuracies of the models built using different window lengths.

However, an improvement of around 2% was observed with windows of length 30 s downsampled by a factor of 3 (Table 3). This finding suggests that data with less granularity could portray enhanced discriminative information that could further improve the model performance, in line with the previous visual observations that led to the analysis using the simplified signal.

Simplification of the signal

The use of the simplified signal considerably reduces the models’ tendency to overfit. Namely, as shown in Table 4, the training accuracy decreases and is closer to the testing accuracy of the model. Moreover, the testing accuracy is higher than in the previously analyzed cases, and the difference between the individual testing accuracies is lower.

Table 4 Accuracies of the model built using the simplified signal.

Combining the predictions

The proposed approach for combining the prediction confidences improved testing accuracy by around 10% when using the predictions of the 10 previous samples. The combination with 1000 samples provided even higher accuracies, around 99%. The results are detailed in Table 5.

Combining with the mean function resulted, for both sequence lengths, in accuracies similar to combining with the median. Hence, as the prediction distribution is not expected to be skewed, the combination with the mean is used in the further analyses.

Table 5 Accuracies obtained for different tested approaches for combining the sample-prediction confidence.

Leave-one-out cross-validation

The LOOCV analysis demonstrated stable training accuracy across all 16 models, averaging 83.21% with a standard deviation lower than 1%. Nevertheless, high variability, with a standard deviation of around 12%, was observed across the testing accuracies, engendering a difference of around 6% between the mean training and testing accuracy. Table 6 details these results. Although the individual testing accuracies are relatively high in most cases, the prediction for plants B3, B4, and C6 is below 60%.

The combination of 1000 predictions was also performed for each individual case, which improved the average testing accuracy by around 10%. However, the outcome for B3, B4, and C6 did not present a notable improvement.

Table 6 The results obtained from the LOOCV analysis.

Application of the model to the full-length recordings

To analyze the behavior of the built end-to-end DL-based classifier for different stress stages, it was applied to each individual full-length electrophysiological signal recorded from the test plants. Figure 5 illustrates the obtained results for one plant, showing confidence in predicting each sample’s stress state.

One can observe that the model performs well for the time intervals used for training. In contrast, the model often fails to identify the stressed state during early stress and the beginning of the appearance of visual symptoms. These observations are more evident in the plot showing the thresholded confidence in Fig. 5. For certain samples representing the stress at its early stage, the model is able to detect the presence of stress with a confidence higher than 0.5 but, in general, the prediction confidence is as low as for the samples from the normal state.

Figure 5

Prediction confidences of the testing plant B5. The top plot shows the raw confidences for the prediction of each sample, whereas the bottom plot gives the same confidences thresholded by 0.5. The vertical dotted lines indicate the beginning of the respective stress stages, whereas the green and red areas indicate the time intervals from the normal and stressed state, respectively, taken from training plants’ recordings to train the model.

These findings suggest that plants react differently at different stages of stress. Hence, the model should be trained on data representing more extended time intervals to learn a broader range of variabilities characterizing stress.

An additional model was therefore built by applying the same workflow established in this study for the previous model. The new model was trained using 216 h of recordings from each plant, half representing the normal state and the other half the stressed state.

As expected, the model built on extended signal recordings could more accurately predict the presence of stress throughout its different stages. Namely, the confidence for predicting the stressed state is higher than 0.5 for the main part of the recordings acquired after applying the stressor. The confidence for predicting stress during the normal state is also increased but generally remains below 0.5. Figure 6 visually represents these observations for one of the testing plants. Moreover, Fig. 7 gives the obtained prediction confidence for all three plants from the test set.

Figure 6

Comparison between the prediction confidences from the models built for different signal lengths for the same testing plant. The top row shows the plot of the prediction confidences obtained when applying the model trained with 48-h recordings for each state on the testing plant B5. The bottom row gives the prediction confidence for the same plant but obtained from the model trained with 108-h recordings from each state. The vertical dotted lines indicate the beginning of the respective stress stages, whereas the green and red areas indicate the time intervals from the normal and stressed state, respectively, taken from training plants’ recordings to train the model.

Figure 7

Prediction confidences of the three testing plants (B0, B5, and C5) obtained with the model trained on extended recording lengths. The vertical dotted lines indicate the beginning of the respective stress stages, whereas the green and red areas indicate the time intervals from the normal and stressed state, respectively, taken from training plants’ recordings to train the model. The vertical black marker labeled “Detection” indicates the temporal point at which the model predicts the presence of stress with high enough confidence.

Generally, with the model built on an extended length of recordings, it is possible to detect the presence of stress before it is very pronounced or visually apparent. With the resulting prediction confidence, the beginning of the stressed state can be considered the first peak of its temporal evolution that is higher than a given threshold above 0.5. For a threshold value set to 0.65, the presence of stress in plants B0 and B5 is detected 1 to 2 days before these plants showed visual symptoms related to the nitrogen deficiency (Fig. 7). In the case of plant C5, the model detected the stressed state on the day when the deficiency started being visually noticeable (Fig. 7).
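For illustration, this detection rule could be sketched as follows, assuming the combined confidences are given as a time-ordered array; the helper name is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_stress_onset(confidences: np.ndarray, threshold: float = 0.65):
    """Index of the first peak of the combined confidence exceeding the
    threshold, or None when no such peak exists."""
    peaks, _ = find_peaks(confidences, height=threshold)
    return int(peaks[0]) if peaks.size else None
```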

Comparison with the state-of-the-art

The model built with the XGBoost algorithm applied to signal features extracted from the electrophysiological recordings, as proposed by Najdenovska et al.8, predicts the stress on the testing plants with an accuracy of 79.27%, similar to the findings previously reported by the same authors for the classification of plant electrophysiology8. However, compared to the results of this study summarized in Table 5, the proposed DL-based approaches considerably increase the prediction accuracy: the DL model built on the simplified signal alone performs with around 8% higher testing accuracy, whereas, after combining its predictions, the difference reaches over 17%.

The inference time for both approaches was computed for a single window and within a batch of 64 windows, as reported in Table 7. One can observe that, compared to a batch of a single window, processing in a batch of 64 windows is three times faster on CPU and ten times faster on GPU.

On GPU, using a batch of 64 windows, the prediction with the DL-based model takes 0.00057 s. The prediction with XGBoost for 15 windows of 60 s, the parameters used in this experiment, takes 0.0156 s if the sample is predicted alone; if it is instead predicted within a batch of 64 samples, it takes only 0.00028 s per sample. Hence, the time for performing the prediction alone is similar to that of the DL-based approach. However, as the calculation of the features for 15 windows takes 3.73 s, the total inference time of the SOA method is considerably higher, dominated by this preprocessing step.

Table 7 Comparison between the inference time of the SOA approach requiring features-extraction step and the proposed end-to-end DL-based approach.

Discussion

This work presents an end-to-end DL-based approach, using the encoder architecture for time series, that classifies with high accuracy the plant electrophysiological signals acquired within the scope of studying the effect of nitrogen deficiency on tomato plants growing in typical greenhouse conditions. The developed workflow directly explores the raw acquired signal, eliminating the need for a preprocessing step calculating signal features, which is required by the current SOA techniques based on classical machine learning algorithms.

Applied to a simplified signal with 10 times lower granularity than the original signal stored at 500 Hz, the proposed DL methodology predicts the presence of stress due to the lack of nitrogen with an accuracy of around 88%. Without prior knowledge of the frequency band potentially carrying the more discriminative information, we initially aimed to explore the raw signal at the original sampling rate. However, the model built on the simplified signal leads to an improved prediction compared with the initial models built on the original raw data. Additionally, using the downsampled smoothed signal considerably reduces the models’ tendency to overfit, represented by the large difference between the training and testing accuracy in the previous cases. These findings suggest that the signal variations at high frequencies, potentially representing noise, portray high variability between plants, which perturbs the learning of the model. The granularity of the simplified signal, along with the proposed simplification methodology, was empirically established, but future studies could further investigate which frequencies portray the patterns discerning the different states.

The inter-plant variabilities are predominantly evident from the LOOCV analysis, which shows large differences across the individual plant testing accuracies. In less than 20% of the cases, the respective model, although having high training accuracy, performed poorly in prediction. These differences could be due to the individual plant’s reaction to the applied stressor, closely related to its latent biological and health predisposition. Namely, while most plants would endure critical stress caused by the nitrogen deficiency after a particular time, some could be more resilient or would need a longer time interval to present the same reaction. Nevertheless, as the training accuracies remain stable across all 16 models, we can assume that the observed variabilities are absorbed during the learning phase. Hence, a larger number of plants should be considered in further work, allowing the model to learn and enclose a broader spectrum of these variabilities.

The normalization of the data compensates for the range-based inter-plant variabilities. Moreover, the chosen standardization methodology depends only on a given window’s values, without needing to know the value range and evolution of the entire recorded time series. Hence, it enables a more straightforward application of the developed workflow for predicting the plant status in actual growing conditions. The assessed inference time, on the order of milliseconds, fortifies the possibility of such an application.

Combining the prediction confidences of the last consecutive samples leads to a substantial increase in the model’s accuracy. The prediction is improved by around 10% for a sequence of only 10 samples, whereas for 1000 samples it reaches 99%. Nevertheless, for a long sequence, the system needs more time to register the change from one state to another. For instance, given the selected window length of 16 s, the prediction delay for a combination of 1000 samples is 4.44 h, whereas for 10 samples it is only 2.66 min. However, as the plant’s reaction to the nitrogen deficiency is a relatively slow process, detecting the presence of stress after four and a half hours could still be acceptable for growers if the accuracy of that prediction is crucial.

Fusing decisions based on previous outputs to improve prediction accuracy is a previously introduced concept. The literature proposes numerous techniques to this end, mainly used for combining the outputs of different single learners in an ensemble model that incorporates the advantages of the individual learners and therefore achieves better performance31. Moreover, confidence-based fusion has been employed within classification frameworks for different physiological signals32,33. However, unlike these studies, our method uses the output of the same algorithm for subsequent samples, relying on the assumption that the likelihood of quick variation between the states is very low. Consequently, this approach is more straightforward than the usual combination of outputs from different learners.

The presented findings also show strong potential for detecting the presence of stress at an early stage, before the plant evidently manifests the lack of nitrogen. Such detection was notably more achievable for models trained on extended recording intervals, which allow the model to learn different plant reactions to different stress intensities. Early stress detection would potentially mitigate the damage and require fewer resources to increase and preserve the crop’s health. The threshold of 0.65 for detecting the first confidence peak was selected empirically for the plants forming the test set of this study. It could be further studied on an extended set of plants to establish a value that better generalizes over the inter-plant variabilities in the reaction to the applied stressor.

The proposed methodology outperforms the current SOA approach in the field based on more classical machine learning techniques8, achieving around 8% higher accuracy without combining the predictions and 177 to 6544 times faster inference, depending on the accelerator configuration and the batch size. This important difference is due to the absence of a feature-extraction process in the DL-based approach. However, it is worth noting that the time required to calculate the features depends on how computationally demanding they are. Furthermore, reducing the feature set by selecting the most discriminative features could also decrease the inference time for this preprocessing step. Nevertheless, the ranking provided by the XGBoost gain measure indicates that, among the three most discriminative features, one is related to wavelet decomposition, which takes considerably longer to compute than all other features. Hence, for the studied data, reducing the feature space to the set enclosing the most discriminative ones would not substantially change the inference time for their extraction. The discriminatory information is, however, closely related to the applied stimulus, i.e., the reaction it provokes in plants; therefore, the inference time of the SOA approach could be much lower when classifying electrophysiological data acquired from plants undergoing a different stress. On the other hand, as DL algorithms can learn complex patterns and relationships that may not be immediately apparent to traditional techniques, they could be more suitable for classifying plant electrophysiological data enclosing different stresses caused by multiple distinct stimuli and, therefore, portrayed by different signal patterns, which traditional machine learning techniques would require numerous sets of features to represent.

The calculation of the features dominates the inference time of the SOA approach. However, as the features represent the information on which the model is built, their extraction is essential for performing the prediction. Moreover, the SOA approach employs a normalization methodology requiring knowledge of the full-length signal, and not only the given window, to deduce the needed extreme values, which makes its deployment for plant-state prediction in everyday agricultural practice more challenging. An approximation of a plant’s range could potentially be established, but such an approach would introduce a bias.

The proposed approach should be further validated on an extended set of plants, which would also enlarge the generalizability of the built model. Additionally, given that the studied data portray highly discriminative information, the developed framework could be extended toward a multi-class classification predicting different stages of stress caused by a lack of nitrogen. Moreover, to assess the universality of the presented findings, the framework could be further applied to different types of stressors or different crops.

Overall, the present work provides a new tool for fast and accurate identification of nitrogen deficit in commercial tomato plants based on the raw recorded electrophysiological signal, outperforming the current literature and allowing stress detection at early stages. Moreover, the proposed methodology possesses strong potential for a direct application in production conditions, enabling new paths toward an automated agricultural practice.