Genetic algorithm optimized node deployment in IEEE 802.15.4 potato and wheat crop monitoring infrastructure

This proposal investigates the effect of vegetation height and density on the received signal strength between two sensor nodes communicating under the IEEE 802.15.4 wireless standard. To determine the path loss coefficient of the 2.4 GHz radio signal in an IEEE 802.15.4 precision agriculture monitoring infrastructure, measurement campaigns were carried out in different growing stages of potato and wheat crops. Experimental observations indicate that the initial node deployment in the wheat crop experiences network disconnectivity due to increased signal attenuation, which is caused by the growth of wheat vegetation height and density in the grain-filling and physiological-maturity periods. An empirical measurement-based path loss model is formulated to identify the received signal strength in different crop growth stages. Further, an NSGA-II multi-objective evolutionary computation is performed to generate the initial node deployment, which is optimized for increased coverage, reduced over-coverage, and received signal strength. The results show the development of a reliable wireless sensor network infrastructure for wheat crop monitoring.


Problem definition
This section presents the real-time WSN deployment setup in wheat and potato crops, i.e., Experiment-1 and Experiment-2. Further, the problem is analyzed through the experimental observations in order to develop an effective NDS.
Experimentation setup. The period of observation started on 28 June 2019, with the sowing of Kufri Jawahar potato, and ended on 30 October 2019. The soil composition and the adopted potato and wheat plantation strategies are presented in reports by Panigrahi et al. 31 and Devi et al. 32, respectively.
Sensor architecture. The devised sensor node $n_i$ prototype is shown in Fig. 1. The $n_i$ control unit is designed using the CC2538 wireless microcontroller System-on-Chip for 2.4 GHz IEEE 802.15.4 and is powered by two parallel-connected Panasonic CR1632 3 V lithium coin cells 33. The CC2538 transceiver, which has a receiving sensitivity of −97 dBm in low-gain mode, is programmed to an output power of 7 dBm and is connected to a 3 dB gain planar inverted-F antenna 34. The XY-plane of the antenna in a deployed $n_i^x$ is aligned orthogonal to the XY-plane of $T_x \mid x \in \{\text{wheat}, \text{potato}\}$. Each node is equipped with a resistive soil moisture sensor, a photoresistor, and a TMP36 temperature sensor. To protect the nodes from environmental hazards, they are housed in a PVC enclosure and installed on top of a hollow aluminum tube, which, upon deployment in $T_x$, gives the $n_i^x$ antenna an elevation of 32 cm from the ground.
Path-loss model and RSSI evaluation. The CC2538 has built-in RSSI functionality, which calculates an 8-bit signed digital value that can be read automatically from the received frame or incoming packet. The RSS value captured by RSSI is a 2's-complement signed number on a logarithmic scale with 1 dB steps. An offset is subtracted from the RSSI value to obtain the actual signal power $P$, i.e., $P = \text{RSSI} - \text{offset}$ (dB). The RSSI offset value in Contiki for the CC2538 is set to 73 dB.
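The RSSI conversion described above can be illustrated with a minimal Python sketch. It assumes the raw register content is available as an unsigned byte; the function names `raw_to_signed` and `rssi_to_power_dbm` are hypothetical and are not part of Contiki or the CC2538 driver.

```python
RSSI_OFFSET_DB = 73  # offset stated in the text for the CC2538 under Contiki


def raw_to_signed(raw_byte: int) -> int:
    """Interpret an unsigned byte (0..255) as an 8-bit 2's-complement value."""
    if not 0 <= raw_byte <= 0xFF:
        raise ValueError("raw_byte must fit in 8 bits")
    return raw_byte - 256 if raw_byte >= 128 else raw_byte


def rssi_to_power_dbm(raw_byte: int, offset_db: int = RSSI_OFFSET_DB) -> int:
    """Apply P = RSSI - offset to obtain the received power in dBm."""
    return raw_to_signed(raw_byte) - offset_db
```

For instance, a raw byte of `0xFE` is the signed value −2, which maps to a received power of −75 dBm under the 73 dB offset.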
The measured RSSI readings for the wheat crop in the sowing and flowering stages are shown in Fig. 2. The Discrete Cosine Transform (DCT) interpolation technique has been used to develop the RF map from the RSS-sample dataset 35. The employed path-loss model is the Log-Normal Shadowing model and is represented by Eq. 2.
$$PL_{d(n_i,n_j)} = PL_{d(n_i,n_o)} + 10\,\eta\,\log_{10}\!\left(\frac{d(n_i,n_j)}{d(n_i,n_o)}\right) + \chi_\sigma \qquad (2)$$

where $PL_{d(n_i,n_o)}$ is the reference path loss between nodes $n_i$ and $n_o$ at distance $d(n_i,n_o)$, $\eta$ is the path loss exponent, and $\chi_\sigma$ is a Gaussian distributed random variable with zero mean and standard deviation $\sigma$. For the calculation of $\eta$, the collected empirical RSS measurements are analyzed using Eq. 4, where $P^{rec}_{d_l}(n_{ref},n_i)$ is the received power and $P^{est}_{d_l}(n_{ref},n_i)$ is the estimated power at distance $d_l$. The value of $\eta$ is identified by minimizing the mean square error between $P^{rec}_{d_l}(n_{ref},n_i)$ and $P^{est}_{d_l}(n_{ref},n_i)$ presented in Eq. 4. The RSS samples are taken from a database acquired through continuous and periodic measurements. Continuous measurements were collected using the sensor nodes deployed in Experiments 1 and 2 with a 30-minute sampling interval. The periodic measurements were collected at intervals of 4 days using the HSA2030 spectrum analyzer operating in zero-span mode. The data sets are classified based on the distance $d(n_i,n_{ref})$ from a reference node $n_{ref}$ in different growing stages of the target crop. The selected $n_{ref}$ is placed at the nucleus of $T_x \mid x \in \{\text{wheat}, \text{potato}\}$, and the remaining $n_i^x$ are positioned around $n_{ref}$ using the approach by Wu et al. 11. To obtain $\eta$, the RSS data are analyzed by evaluating a total of 1500 and 1600 samples in Eq. 4. The $\eta$ calculated for the two crops in different growing stages is presented in Table 1. The range $P^{lim}_{r\,d(n_i,n_j)}$ attained by the CC2538 sensor node with −97 dBm sensitivity in the free-space environment, after incorporating the losses incurred by the enclosure, is measured to be ≈ 63 m. The outage probability $\mathrm{Prob}\left(P_{r\,d(n_i,n_j)} < P^{lim}_{r\,d(n_i,n_j)}\right)$ of 93% at −75 dBm with a margin $P_{r\,d(n_i,n_j)} - P^{lim}_{r\,d(n_i,n_j)}$ of 8 dB is employed and is identified using the formulation presented in Eq. 5.
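The path loss model of Eq. 2 and the mean-square-error estimation of η can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes the estimated power takes the form P_est(d) = P_ref − 10·η·log10(d/d_ref), for which the error-minimizing η has a closed form, and the names `path_loss_db`, `fit_eta`, `p_ref_dbm`, and `d_ref_m` are hypothetical.

```python
import math


def path_loss_db(d_m, pl_ref_db, eta, d_ref_m=1.0, shadowing_db=0.0):
    """Eq. 2: PL(d) = PL(d_ref) + 10*eta*log10(d/d_ref) + chi_sigma."""
    if d_m <= 0 or d_ref_m <= 0:
        raise ValueError("distances must be positive")
    return pl_ref_db + 10.0 * eta * math.log10(d_m / d_ref_m) + shadowing_db


def fit_eta(distances_m, received_dbm, p_ref_dbm, d_ref_m=1.0):
    """Return the eta that minimises sum((P_rec - P_est)^2).

    Assumes P_est(d) = p_ref_dbm - 10*eta*log10(d/d_ref_m), where p_ref_dbm
    is the received power measured at the reference distance d_ref_m.
    """
    num = 0.0
    den = 0.0
    for d, p_rec in zip(distances_m, received_dbm):
        x = 10.0 * math.log10(d / d_ref_m)
        num += x * (p_ref_dbm - p_rec)  # from setting d/d(eta) of the error to 0
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one sample away from the reference distance")
    return num / den
```

Applied to noise-free synthetic samples generated with a known exponent, `fit_eta` recovers that exponent; with real RSS samples, the residual spread around the fit gives an estimate of σ.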
With a sensitivity of −75 dBm in the free-space environment, the measured transmission range of node $n_i$ is identified to be ≈ 48 m.
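For reference, the outage probability under log-normal shadowing can be evaluated with the standard Gaussian tail expression. The sketch below uses that textbook form, which may differ in detail from Eq. 5; the function name `outage_probability` and the σ value in the usage note are assumptions.

```python
import math


def outage_probability(mean_rx_dbm, threshold_dbm, sigma_db):
    """Prob(P_r < P_lim) when P_r (in dB) is Gaussian with the given mean and sigma.

    Equals Phi((P_lim - mean) / sigma), with Phi the standard normal CDF,
    written here through the complementary error function.
    """
    if sigma_db <= 0:
        raise ValueError("sigma_db must be positive")
    z = (threshold_dbm - mean_rx_dbm) / sigma_db
    return 0.5 * math.erfc(-z / math.sqrt(2.0))
```

With a hypothetical σ of 4 dB, a mean received power 8 dB above the threshold yields an outage probability of roughly 2.3%, whereas a mean equal to the threshold yields 50%.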
Software tool: The preprocessing of the collected RSSI measurements is done in a Hadoop and Spark environment. The environment is set up on an Intel Xeon Processor E5 Family workstation running the Ubuntu 18.04 operating system. The path loss model is implemented in the Python programming language, and the results are generated using the MATLAB numeric computing environment.
Experiment observations. In both experiments, nodes were deployed before the crop sowing stage using the strategy presented by Wang et al. 36. The key observation in Experiment-2 over the wheat crop was that the RSSI measurements decreased over time, eventually leading to link drops and network disconnectivity. This phenomenon was observed in three stages of wheat growth, namely terminal-spikelet initiation, heading, and physiological maturity. The network disconnectivity caused by link drops was due to an increase in signal attenuation from growth in vegetation density. Since the initial deployment assumed $\eta = 1.85$, the distance at which the received power between $n_i$ and $n_j$ reaches the −75 dBm receiving sensitivity was estimated to be ≈ 48 m. However, factors such as vegetation height, density, and planting strategy affected $\eta$. When measured using Eq. 4 in the maturity stage, it is found to be $\eta = 5.93$, which corresponds to a range of ≈ 8 m at −75 dBm receiving sensitivity for uninterrupted link communication. In contrast, the RSSI measurements for the potato crop were consistent throughout the season. The measured $\eta$ in the sowing and maturity stages was found to be 1.83 and 2.76, respectively. The Kufri Jawahar potato plant, which can attain a height of up to 34 cm at maturity, has little effect on the transmitted signal of $n_i^{potato}$, as the antenna is located at a height of 32 cm from the ground and is out of the vegetation canopy.
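The dependence of the usable link distance on η follows from inverting Eq. 2 with the shadowing term set to zero. The sketch below illustrates this under stated assumptions: the reference loss of 40 dB at 1 m (roughly the free-space loss at 2.4 GHz) and the function name `max_link_distance_m` are hypothetical, antenna gain and enclosure losses are ignored, and the numbers are not intended to reproduce the measured 48 m and 8 m ranges.

```python
def max_link_distance_m(tx_dbm, pl_ref_db, eta, threshold_dbm, d_ref_m=1.0):
    """Distance at which the mean received power falls to threshold_dbm.

    Solves tx_dbm - [pl_ref_db + 10*eta*log10(d/d_ref_m)] = threshold_dbm for d.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    budget_db = tx_dbm - pl_ref_db - threshold_dbm  # loss allowed beyond d_ref_m
    return d_ref_m * 10.0 ** (budget_db / (10.0 * eta))


# Hypothetical link budget: 7 dBm output power, 40 dB loss at 1 m, -75 dBm threshold.
range_sowing = max_link_distance_m(7.0, 40.0, 1.85, -75.0)
range_maturity = max_link_distance_m(7.0, 40.0, 5.93, -75.0)
```

Because the allowed loss is divided by 10·η in the exponent, the increase of η from 1.85 to 5.93 shrinks the usable distance by well over an order of magnitude in this example, which is the mechanism behind the link drops observed in the wheat crop.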
Summary: Two conclusions were drawn from the observations made in Experiments 1 and 2:
• An NDS developed for crop X with $\eta_x$, if used in crop Y with $\eta_y$, may cause network disconnectivity if $\eta_x < \eta_y$ and network over-coverage if $\eta_x > \eta_y$.
• The path loss coefficient $\eta_z$ in the target crop $T_z$ should be identified before the $NDS_z$ formulation.
In addition to the work done by Ndzi et al. 24, Ding et al. 18, and Olasupo et al. 19, we have collected RSS samples over a crop cycle in a multi-hop communication scenario for better path loss modeling in potato and wheat crops. Furthermore, we have developed an optimal node placement strategy to deploy a practical real-time crop monitoring infrastructure by integrating the derived PLC into NSGAII-NDS.

NSGA-II based NDS optimization (NSGAII-NDS)
In this section, a multi-objective, crop-dependent node deployment strategy $NDS_z$ is proposed. The flowchart of the proposed model is presented in Fig. 3. The outcome is a set of node coordinates $\left\{\left(n^x_{i_{lat}}, n^x_{i_{lon}}\right) \mid n^x_i \in N_x,\ x \in \{\text{wheat}, \text{potato}\}\right\}$, optimized using the elitist Non-dominated Sorting Genetic Algorithm (NSGA-II) on coverage, over-coverage, and RSS. NSGA-II is widely used in many application scenarios due to the diversity of its solutions and its better convergence near the true Pareto-optimal set 37.

Chromosome representation. Genetic Algorithm (GA) based optimization comprises a chromosome $Chr_x$, representing a possible solution $NDS_z$, and a population $Pou_{T_x}$, which is a collection of these chromosomes. The $Chr_x$ in GA is derived from the phenotype, a single design factor composed of a domain of values. In the given scenario, a node's latitude and longitude values $\left(n^x_{i_{lat}}, n^x_{i_{lon}}\right)$ and transmission range $T_{N_x}$ are the phenotype design factors, which are mapped to the genotype using Real value Encoding with Binary Codes (REBC). The phenotype-genotype relationship for $NDS_{wheat}$ is shown in Table 2. Genomes in binary string form are converted to real values using the REBC reverse mapping rule. The REBC mapping schema transforms a continuous real-valued design variable into a binary code. The range of the $T_x$ and $\left(n^x_{i_{lat}}, n^x_{i_{lon}}\right)$ design variables, $\left[X^{PHE}_{min}, X^{PHE}_{max}\right] \mid X^{PHE}_{min} < X^{PHE}_{max}$, is derived from Table 2. The initial values of the design variables are randomly generated within the predefined range and then optimized over coverage, over-coverage, and RSS in the sorting, mutation, and crossover phases of NSGA-II.
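A binary-coded real-value mapping of this kind can be sketched as follows. The sketch uses the common linear quantization of the interval between the minimum and maximum phenotype values onto an n-bit code, which is an assumption and may differ from the exact REBC rule; the function names `rebc_encode` and `rebc_decode` are hypothetical.

```python
def rebc_encode(x, x_min, x_max, n_bits):
    """Map a real value in [x_min, x_max] to an n-bit binary string (phenotype to genotype)."""
    if not x_min < x_max:
        raise ValueError("x_min must be smaller than x_max")
    if not x_min <= x <= x_max:
        raise ValueError("x lies outside the design-variable range")
    levels = 2 ** n_bits - 1  # highest integer code representable with n_bits
    code = round((x - x_min) / (x_max - x_min) * levels)
    return format(code, f"0{n_bits}b")


def rebc_decode(bits, x_min, x_max):
    """Reverse mapping: binary string back to a real value in [x_min, x_max]."""
    levels = 2 ** len(bits) - 1
    return x_min + (x_max - x_min) * int(bits, 2) / levels
```

Under this mapping the round-trip error is at most half a quantization step, so the gene length sets the spatial resolution with which node coordinates can be represented.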
Multi-objective functions. The third objective $O_{RSS}$, Eq. 10, aims to increase the received signal strength between two sensor nodes. $O_{RSS}$ ensures connectivity by identifying the distance at which the RSSI remains consistent throughout the lifetime of the network. $PL_{d(n_i,n_j)}$ is the path loss between nodes $n^x_i$ and $n^x_j$ and is formulated in Eq. 2.
Initial population generation. The Initial Population Generation (IPG) initiates the NSGA-II operation by seeding a set of possible solutions from the universe of solutions between the lower and upper bounds of the design variables. The minimum and maximum ranges of the design variables required for the IPG operation were identified and are presented in Table 2. To avoid premature convergence, diversity in the IPG needs to be maintained, and this has been achieved by heuristic initialization of the population followed by a probabilistic distribution. The approach's fundamental design components that influence solution diversity, i.e., search space, number of individuals, problem difficulty, and fitness functions, have been taken into account during the IPG process. This seeding approach avoids premature convergence of the $T_{N_x}$ and $\left(n^x_{i_{lat}}, n^x_{i_{lon}}\right)$ optimization. The population's diversity has been evaluated at three levels, i.e., gene level, chromosome level, and population level. The gene-level diversity formulation in Eq. 11 is a bias measure $P^{bias}_t$ presented by Diaz-Gomez et al. 38, [...] features a more diverse population generation. We have employed the Continuous Uniform probabilistic Distribution (CUD) for population initialization, given that the generated chromosomes satisfy the constraint. The probability density function of a uniform distribution over the interval $\left[X^{PHE}_{min}, X^{PHE}_{max}\right]$ is given as:

$$f(X) = \begin{cases} \dfrac{1}{X^{PHE}_{max} - X^{PHE}_{min}}, & X^{PHE}_{min} \le X \le X^{PHE}_{max} \\ 0, & \text{otherwise} \end{cases}$$

where $X^{PHE}_{min}$ and $X^{PHE}_{max}$ are the minimum and maximum of the range of the generated random variable. The mean and variance of the CUD function are given as

$$\mu = \frac{X^{PHE}_{min} + X^{PHE}_{max}}{2}, \qquad \sigma^2 = \frac{\left(X^{PHE}_{max} - X^{PHE}_{min}\right)^2}{12}$$

The $chr_x[i]$ sorting operation in Eq. 15 starts with identifying all individuals on the first non-dominated front and assigning them rank 1. Then the $chr_x[i]$ belonging to the second non-dominated front are identified and assigned rank 2. This process continues until all fronts have been identified.
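The front-by-front ranking described above can be sketched in a few lines. This is a simple illustrative variant, not the bookkeeping-optimized fast non-dominated sort of NSGA-II, and it assumes every objective is expressed as a minimization (an objective to be maximized, such as coverage, would be negated first). The names `dominates` and `non_dominated_ranks` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_ranks(objectives):
    """Assign rank 1 to the first non-dominated front, rank 2 to the next, and so on."""
    ranks = [0] * len(objectives)
    remaining = set(range(len(objectives)))
    rank = 1
    while remaining:
        # Individuals not dominated by any other individual still unranked.
        front = [
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        ]
        for i in front:
            ranks[i] = rank
        remaining.difference_update(front)
        rank += 1
    return ranks


# Five hypothetical chromosomes scored on two minimization objectives.
example_ranks = non_dominated_ranks([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])
```

In this example the first three chromosomes are mutually non-dominated and form the first front, while the remaining two fall on the second and third fronts.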
To filter $I$, binary tournament selection has been used. This approach randomly selects two chromosomes $chr_x[i]$ and $chr_x[j]$ and compares them based on rank and crowding distance. If the ranks are different, the one with the lower rank is chosen. If they are of the same rank, the one with the higher crowding distance $chr_x^{CD}(i, r)$ is selected. The process continues until $N$ out of $I$ chromosomes are selected. Initially, the crowding distance of the first and last $chr_x[i]$ of the front is set to infinity.
Crossover and mutation operation. A crossover operation combines two chromosomes to produce new offspring. In the proposed work, chromosome selection for crossover is based on roulette wheel probabilities 39. This selection is based on fitness quality and greatly increases the likelihood of selecting optimal solutions. To generate the combined solutions, a random crossover approach is applied, which uses a single-point strategy to produce two offspring. [...] to implement $NDS_z$, and hence the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) has been used 40.
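The rank and crowding-distance comparison used in the binary tournament can be sketched as follows. This is a minimal illustration under stated assumptions: objectives are minimized, ties in crowding distance are resolved in favor of the first candidate, and the names `crowding_distance`, `crowded_better`, and `binary_tournament` are hypothetical.

```python
import math
import random


def crowding_distance(front):
    """Crowding distance of each objective vector within one non-dominated front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        # Boundary solutions of the front are always preserved.
        dist[order[0]] = dist[order[-1]] = math.inf
        span = front[order[-1]][m] - front[order[0]][m]
        if span == 0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / span
    return dist


def crowded_better(i, j, ranks, dist):
    """Prefer the lower rank; for equal ranks prefer the larger crowding distance."""
    if ranks[i] != ranks[j]:
        return i if ranks[i] < ranks[j] else j
    return i if dist[i] >= dist[j] else j


def binary_tournament(ranks, dist, n_select, rng=random):
    """Select n_select indices by repeated two-way tournaments."""
    selected = []
    while len(selected) < n_select:
        i, j = rng.sample(range(len(ranks)), 2)
        selected.append(crowded_better(i, j, ranks, dist))
    return selected
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when comparing deployment runs.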

Results and discussion
The proposed NSGA-II based node location optimization is performed, and the chromosomes are compared over the objective functions: coverage, over-coverage, and RSS. A measure of the percentage of area covered and over-covered is presented to illustrate the effectiveness of individual chromosomes. A comparison with the DT-NDS approach developed by Wu et al. 11 is performed over the RSS and coverage metrics in the target area $T_x \mid x \in \{\text{wheat}, \text{potato}\}$. The NSGAII-NDS outcome $\left\{\left(n^x_{i_{lat}}, n^x_{i_{lon}}\right) \mid \forall n^x_i \in N_x\right\}$ obtained after the TOPSIS operation was employed in the target areas. The measurements were obtained for coverage and over-coverage in [...]. The measurements for $T_{wheat}$ were collected at the sowing and maturity stages and are presented in Fig. 4a,b. Similarly, for DT-NDS, the measurements in $T_{wheat}$ are presented in Fig. 5a,b. The RSS measurement in DT-NDS suffers more degradation than in NSGAII-NDS. Since the initial deployment in DT-NDS was done on bare land, the distance between two nodes was larger due to $\eta$ being equal to 13, and this increased the likelihood of additional RSS degradation. In the floral-initiation, terminal-spikelet-initiation, and heading stages, the counter value of NSGAII-NDS is higher than that of DT-NDS. However, in the grain-filling period, DT-NDS experienced network disconnectivity due to the onset of node isolation. On the other hand, the $\eta$-based NSGAII-NDS strategy accounted for the possible signal degradation and the outage probability threshold, resulting in a reliable IEEE 802.15.4 2.4 GHz infrastructure for wheat crop monitoring.
The wheat crop under observation is of the MP-3173 variety, which falls in the category of medium-height vegetation and was planted with an optimal row spacing of 22 cm. Furthermore, according to Köppen's climate classification, the plantation location has a humid subtropical climate; the highest and lowest temperatures recorded from June 2019 to April 2020 were 49 °C and 1 °C, respectively. The problem formulation and results of NSGAII-NDS may vary if the crop is grown in a different geographical area with a different variety or plantation strategy. For example, if wheat sowing is delayed, a closer spacing of 15-18 cm is practiced, resulting in increased density per square meter. The change in density may affect the calculated path loss coefficient. To develop a comprehensive node placement strategy, the path loss coefficient needs to be identified for all possible combinations of factors that can affect the receiving capability of two transceivers. Following this goal, future work will be directed toward collecting path loss coefficient measurements for different wheat crop varieties in different geographical regions. Furthermore, the sensor nodes $n^x_i$ in the WSN had a homogeneous transmission range $T_x$, which could be extended to a heterogeneous $T_x$ implementation. Integrating a self-adjusting $T_x$ strategy into NSGAII-NDS can reduce over-coverage in the early stages of the plantation. This can be achieved by gradually increasing the $T_x$ of the nodes as the PLC increases.

Conclusion
This article proposes a reliable NSGA-II optimized Node Deployment Strategy (NDS) in the IEEE 802.15.4 wireless infrastructure for potato and wheat crop monitoring. The relationship between vegetation cover and signal attenuation for 2.4 GHz radio frequencies has been analyzed in detail through real-time experimentation. The results of the experiment led to two significant findings. First, when the monitoring infrastructure for the wheat crop uses the NDS that was originally developed to monitor the potato crop, it faces network disconnectivity due to increased signal attenuation caused by growth in vegetation cover. The second finding is inferred from the first and states that it is necessary to identify the Path Loss Coefficient (PLC) in the target crop before developing the NDS. The PLC has been identified at various growing stages of the potato and wheat crops through empirical measurement campaigns. The derived PLC was implemented in the log-normal shadowing path loss model, which was subsequently integrated into the proposed NSGAII-NDS to optimize the NDS over coverage, over-coverage, and Received Signal Strength (RSS). The significant difference between NSGAII-NDS and the existing NDS strategy is that the PLC of the crop to be monitored is accounted for before deployment, eliminating the possibility of a link break between two sensor nodes due to increased vegetation cover.