Design and development of an open-source framework for citizen-centric environmental monitoring and data analysis

Cities around the world are struggling with environmental pollution. Conventional monitoring approaches are not effective for large-scale environmental monitoring because of logistical and cost-related issues. Low-cost and low-power Internet of Things (IoT) devices have proved to be an effective alternative for monitoring the environment. Such systems have opened up environmental monitoring opportunities to citizens while simultaneously confronting them with challenges related to sensor accuracy and the accumulation of large data sets. Analyzing and interpreting sensor data is itself a formidable task that requires extensive computational resources and expertise. To address this challenge, a social, open-source, and citizen-centric IoT (Soc-IoT) framework is presented, which combines a real-time environmental sensing device with an intuitive data analysis and visualization application. Soc-IoT has two main components: (1) the CoSense Unit, a resource-efficient, portable and modular device designed and evaluated for indoor and outdoor environmental monitoring, and (2) exploreR, an intuitive cross-platform data analysis and visualization application that offers a comprehensive set of tools for systematic analysis of sensor data without the need for coding. Developed as a proof-of-concept framework to monitor the environment at scale, Soc-IoT aims to promote environmental resilience and open innovation by lowering technological barriers.


Scientific Reports | (2022) 12:14416 | https://doi.org/10.1038/s41598-022-18700-z | www.nature.com/scientificreports/

… long-term sustainability of low-cost sensor solutions for environmental monitoring. The predominant focus of most studies has been on data collection and analysis. This could be partly because most of these sensor studies are conducted in regions that have significant resources and infrastructure [45]. This paper addresses these challenges by describing the design, implementation, and potential impact of a social, open-source, and citizen-centric IoT (Soc-IoT, pronounced 'Society') framework (Fig. 2). Soc-IoT is proposed as an open-source [46] environmental monitoring and data analysis framework that encourages collective and participatory action as well as social impact. It comprises two key components that are specifically designed and developed to address the issues raised previously in the paper. The first component is the CoSense Unit, a modular and open-source environment sensing device that can provide consistent and reliable air quality data. It has been thoroughly tested and validated in a real-world setting and evaluated by co-locating it with a Swiss government environmental monitoring station. The carbon footprint and energy usage of these low-cost devices are also examined to assess the CoSense Unit's environmental sustainability. The framework's second core component is exploreR, an open-source RShiny-based data analysis and visualization application. The app is intended to lower technological obstacles, particularly those connected to programming, by allowing citizens and specialists alike to examine and interpret sensor data in a useful way. To address the critical issue of collaborative environmental sensing, the entire framework is designed to establish an innovative ecosystem that encourages cooperation, sustainable practices, and inclusivity.

Methods
This section describes the methodology behind the design of the proposed Soc-IoT framework. The following paragraphs provide a detailed overview of the system architecture, sensor prototype, and data analysis application.
System architecture. The Soc-IoT framework is based on the principle of open-source hardware and software. Figure 3 shows the system architecture of the proposed framework. It comprises four major components:
• Data Acquisition Layer: This layer consists of the sensors that are responsible for sensing the environmental variables monitored by the CoSense Unit. The current version of the CoSense Unit includes a Sensirion SPS30 PM sensor that can sense PM1, PM2.5, and PM10. The Enviro+ board for Raspberry Pi is used to monitor temperature, pressure, humidity, light intensity, noise, and gas concentration (NO2, NH3, and CO). As the code for these sensors is open-source, users can easily reprogram the sensors to suit their requirements as well as examine and verify them without complications. More details about the hardware components are available in the next section.
• Data Processing and Communication Layer: This layer is responsible for processing and integrating data from the different sensors and communicating it to the data storage layer. A Raspberry Pi Zero handles all the functions related to data processing and communication. The Wi-Fi module of the Raspberry Pi Zero is used to create an access point that allows a continuous flow of data from the Raspberry Pi Zero to the data storage layer. Several data transmission protocols were considered. The current version of the CoSense Unit uses the Hypertext Transfer Protocol (HTTP) owing to its high transmission reliability and infrastructure [40,47].
• Data Storage Layer: This layer is responsible for securely storing the data. The current version of the framework offers two storage options: the data can be transmitted directly to the ThingSpeak database, or the user can save the data locally on the SD card that comes with the Raspberry Pi. The local option is useful when no internet connection is available for sending data to the ThingSpeak cloud; users can simply upload the data from the SD card to their data stream at a later stage. It also gives users more control over their data. Users who prefer not to share their data can opt out of making their data stream public and use the data from the stream and the SD card for their own information.
• Application Layer: The data from the storage layer is used to build applications that make sense of the raw data. These include data streams, visualizations, and data analysis applications. The Soc-IoT framework includes two core applications: (1) a ThingSpeak dashboard that allows a user to create data streams, visualize data, and use MATLAB functions to perform data analysis, and (2) an R-based application that allows a user to process, analyze, and visualize the data and to apply Machine Learning (ML) to it. Section 3 includes more details about the applications.
Hardware implementation. Although the quality of one's environment has a significant impact on one's health, most people are unaware of it [48]. The majority of harmful pollutants, for example, are colorless and odorless, making it difficult to determine their actual levels. As a result, an efficient system that quantifies pollution levels and provides feedback is critical. Objective measurements and easily understood visualizations could assist people in consciously processing, and if necessary adjusting, air quality, lighting, and noise levels. In other words, objective measurements are required to induce behavioral change.

The CoSense Unit is the hardware component of the Soc-IoT framework that is responsible for indoor and outdoor environmental monitoring. It has been designed using state-of-the-art sensors and a single board computer. … [49,50]. The SPS30 is capable of monitoring PM1, PM2.5, PM4, and PM10 using the light-scattering principle. The current version of the CoSense Unit is programmed to monitor PM1, PM2.5, and PM10. In addition to the SPS30 sensor, the CoSense Unit includes a sensor array called Enviro Plus, which carries a BME280 (temperature, humidity, pressure), a MICS6814 analog gas sensor (NO2, NH3, and CO), an LTR-559 light and proximity sensor, and a MEMS microphone (noise). It also includes an ADS1015 analog-to-digital converter for converting data from the analog gas sensor, and a color LCD. The data produced by the analog gas sensor is in kΩ, which is not the standard unit for gas concentration monitoring. The sensor program converts it into parts per million (ppm) to obtain an indicative value. Because several conversion steps are involved, it is difficult to validate the result precisely against a regulatory or industry-grade monitor. Nevertheless, the values from the gas sensor can be used as indicative values for understanding how the concentration is changing in a given environment, as highlighted by many studies [51,52]. Enviro Plus is particularly efficient due to its small size, seamless sensor integration, and compatibility with single board computers like the Raspberry Pi. The CoSense Unit uses a Raspberry Pi Zero to communicate with the sensors through the General-Purpose Input Output (GPIO) ports. As the Raspberry Pi has multiple GPIO ports, more sensors can be added to suit a user's requirements. Figure 4 shows a detailed view of the CoSense Unit with components and annotations. All the components are housed within a 3D-printed enclosure. The CoSense Unit is powered through a USB cable that provides a 5 V supply. Users can choose between an adapter and a power bank for powering the Raspberry Pi, which allows the device to be used flexibly for mobile or stationary environmental monitoring.

Software implementation. The CoSense Unit uses a Raspberry Pi Zero to communicate with the sensors and to handle tasks related to network creation, data transmission, and storage on an SD card. Figure 5 shows the flowchart of the CoSense Unit source code. The source code is written in the Python programming language [53] and uses standard sensor libraries to communicate with the sensors. As shown in Fig. 5, once the Raspberry Pi is powered on, it goes into set-up mode. The Wi-Fi module of the Raspberry Pi enters Access Point (AP) mode and allows the user to connect to the device's Wi-Fi network. Once this connection is successful, the user is redirected to a web interface for connecting the device to a secure Wi-Fi network. The device automatically saves the Wi-Fi credentials so that it can reconnect to the saved network after a reboot. If no Wi-Fi network is available, the device goes into offline mode. In either case, the sensors are put in active mode following the connection test. The sensors stay awake for 30 seconds and take the measurement. The measured data is stored on the Raspberry Pi's SD card in CSV format. When the device is in online mode, an HTTP connection is created and the measured data is sent to the ThingSpeak server using a GET request. Once the acknowledgment is received from the server, the connection is closed. To secure the data transmission, private keys are generated by ThingSpeak before a data stream can be created. The LCD screen shows the data values from the sensors. The availability of online and offline modes allows continuous sensing of data, which is also useful when environmental monitoring needs to be done in a remote location without internet connectivity. The current version of the prototype takes a measurement every 5 minutes and goes into sleep mode after each measurement. Users can change the sampling frequency based on their needs.
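The measure, store, and transmit cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CoSense Unit source code: `read_sensors` is a hypothetical placeholder for the real sensor drivers, the CSV file name and the mapping of readings to `field1`, `field2`, ... are assumptions, and the endpoint is ThingSpeak's public channel-update URL.

```python
import csv
import time
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"
SAMPLING_INTERVAL_S = 5 * 60  # one measurement every 5 minutes, as in the text


def read_sensors():
    """Hypothetical placeholder; the values below are made up."""
    return {"pm1": 3.1, "pm25": 5.4, "pm10": 6.0}


def append_to_csv(path, timestamp, reading):
    """Append one row to the local CSV log, writing a header for a new file."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", *reading.keys()])
        writer.writerow([timestamp, *reading.values()])


def build_update_url(api_key, reading):
    """Map the readings, in order, to field1, field2, ... of a channel."""
    params = {"api_key": api_key}
    for index, value in enumerate(reading.values(), start=1):
        params[f"field{index}"] = value
    return f"{THINGSPEAK_UPDATE_URL}?{urllib.parse.urlencode(params)}"


def upload(api_key, reading, timeout=10):
    """Send one GET request; return False if the request fails (offline mode)."""
    url = build_update_url(api_key, reading)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def run(api_key, csv_path="cosense_log.csv", cycles=None):
    """Measure, log locally, upload when possible, then sleep until the next cycle."""
    done = 0
    while cycles is None or done < cycles:
        reading = read_sensors()
        stamp = datetime.now(timezone.utc).isoformat()
        append_to_csv(csv_path, stamp, reading)  # the local copy is always written
        upload(api_key, reading)                 # best effort when online
        done += 1
        if cycles is None or done < cycles:
            time.sleep(SAMPLING_INTERVAL_S)
```

Writing the CSV row before attempting the upload mirrors the online/offline behaviour described in the text: a failed request costs nothing, because the local log can be uploaded later.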

Results and discussion
This section describes the criteria that were used to validate and evaluate the performance of the CoSense Unit, focusing specifically on PM2.5 concentration. The results are followed by a discussion of how the prototype works in real-life conditions. This section also looks into the design and development of the data analysis and visualization application and how the proposed setup compares with existing environmental monitoring infrastructures.

Sensor validation. Sensor validation is a key step in the development of environmental monitoring infrastructure. There are different ways to perform quality assurance and control of a sensing unit. This study followed a standard approach for validating the sensor by looking at the inter-sensor variability and comparing the sensor output with an official air quality monitoring station [54-56].

Field co-location. During the summer of 2021, two CoSense Units were tested in the field in Zurich, Switzerland. To analyze the accuracy of the sensors and evaluate their performance, the two units were co-located at one of the sites of the National Air Pollution Monitoring Network (NABEL). NABEL monitors air quality at 16 sites in Switzerland. For this study, the sensor units were co-located at the NABEL station in Dubendorf. Figure 6a shows the location of the test site. Figure 6b shows the actual setup of the CoSense Units for co-location at the NABEL reference monitoring station. The station is in a suburban location. The area is densely populated, with a network of heavily used roads and railway lines. The field test was conducted between 4 June 2021 and 8 June 2021. PM2.5 was sampled every five minutes and averaged over 1 h to maintain consistency with the PM2.5 data obtained from the reference monitor. Overall, the data was compared for 100 h. Figure 6c presents a line plot that compares the data obtained from the two CoSense Units (denoted Sensor 1 and Sensor 2) with the reference monitor. The CoSense Units match the variations recorded by the reference monitor, which indicates that the CoSense Unit can capture sudden variations in PM2.5 concentration in a real-world environment. The average error between the PM2.5 recorded by the reference monitor and Sensor 1 was 1 µg/m³. For Sensor 2, it was 1.2 µg/m³. These error values are very low and indicate high accuracy and reliability of the data sensed by the CoSense Units. Figure 6d shows the empirical cumulative distribution function (CDF) of the PM2.5 measurement offset between the reference monitor and the two sensors. More than 85% of the observations have an offset below 5 µg/m³. A statistical summary of the co-located data is presented in Table 1. The statistical parameters show strong similarity between the data obtained from the reference monitor and the two CoSense Units.
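The co-location comparison above boils down to three steps: averaging the 5-minute samples to hourly values, computing the offset from the reference for each shared hour, and reading one point off the empirical CDF of those offsets. The sketch below illustrates these steps with the Python standard library. It assumes that the "average error" is a mean absolute difference, which the text does not state explicitly, and all function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def hourly_means(samples):
    """Average (datetime, value) samples into one value per clock hour."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def absolute_offsets(sensor_hourly, reference_hourly):
    """Absolute sensor-minus-reference offsets for hours present in both series."""
    shared_hours = sorted(set(sensor_hourly) & set(reference_hourly))
    return [abs(sensor_hourly[h] - reference_hourly[h]) for h in shared_hours]


def fraction_below(values, threshold):
    """Share of values strictly below `threshold` (one point on the empirical CDF)."""
    return sum(value < threshold for value in values) / len(values)
```

With real data, `mean(absolute_offsets(...))` would correspond to the average error under the stated assumption, and `fraction_below(offsets, 5.0)` to the share of observations with an offset below 5 µg/m³.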
Inter-unit variability. Inter-unit variability is an important measure of the similarity of data produced by identical sensor units. It is a useful metric that has been widely used to assess the data reproducibility of sensor units [40,54]. For this study, two CoSense Units were co-located and the PM2.5 data were analyzed to understand the similarity in the data reported by the two units. The study was conducted between 3 August 2021 and 31 August 2021. Figure 7 shows the line plot based on the data obtained from the two units. The data from both units show a similar trend, except for some outliers. The data were sampled every 5 minutes and aggregated to hourly values for analysis. The two units were compared for a total of 681 h. As observed in Table 2, the data from the two units showed high similarity, with similar observed means and standard errors. Strong linearity was observed over the entire range of hourly averaged PM2.5 data.
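One way to quantify the similarity and linearity described above is to compute, for the paired hourly values of the two units, each unit's mean and standard error, the Pearson correlation coefficient, and a least-squares line. The sketch below is an illustration only: the text does not say which statistics or tools were used for Table 2, so these choices and all names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def linear_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x
```

For two units that agree closely, the correlation and the slope would both be close to 1 and the intercept close to 0.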
Sensor sustainability analysis. As discussed earlier in the Introduction, the environmental sustainability of IoT devices is also a critical component when discussing resource efficiency. Most sensor-related studies look only at the power consumed by the sensors to address the environmental sustainability of low-cost sensor technology. This work looks at environmental sustainability through a different lens by examining the energy consumption of the IoT device as well as the carbon footprint of the sensor code. To the best of our knowledge, no work within the air quality monitoring literature looks into this aspect of sensors. This can potentially help in promoting sensor code optimization as well as resource-aware IoT deployment. For this study, the focus was on two parameters: Emissions (as CO2-equivalents, kg of CO2 emitted per kilowatt-hour of electricity) and Energy Consumed (energy consumed in kilowatt-hours). A CoSense Unit with a …

To put these values in context, watching Netflix for half an hour produces 0.4 kg of CO2 [57], and running an air purifier for 12 h would use 0.60 kilowatt-hours [58]. These values give an idea of how properly designed and optimized sensors can potentially be used in a sustainable way for monitoring the environment in the long run.
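The two parameters are linked by simple arithmetic: energy is average power multiplied by run time, and emissions are energy multiplied by the carbon intensity of the electricity supply. The sketch below shows this relation. The 50 W figure is merely what the quoted air-purifier example (0.60 kWh over 12 h) implies, the carbon-intensity value is a made-up placeholder, and none of the numbers are measurements from the CoSense Unit.

```python
def energy_kwh(average_power_w, hours):
    """Energy in kilowatt-hours for a device drawing `average_power_w` watts."""
    return average_power_w * hours / 1000


def emissions_kg(energy_in_kwh, intensity_kg_per_kwh):
    """CO2-equivalent emissions for a given energy use and grid carbon intensity."""
    return energy_in_kwh * intensity_kg_per_kwh


# Air-purifier example from the text: 0.60 kWh over 12 h implies about 50 W.
purifier_kwh = energy_kwh(50, 12)

# Placeholder intensity (kg CO2 per kWh); real values depend on the local grid.
HYPOTHETICAL_INTENSITY = 0.1
purifier_kg = emissions_kg(purifier_kwh, HYPOTHETICAL_INTENSITY)
```

The same two functions apply to any device once its average power draw and the local grid intensity are known.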

Data analysis and visualization. A key part of any IoT infrastructure is an intuitive and efficient data analysis and visualization platform. IoT devices produce a massive amount of data, and to make sense of such data it is important to have user-friendly platforms that can be used easily by experts and non-experts alike. The Soc-IoT framework provides two options to visualize and analyze sensor data. The first option uses the built-in data analysis and visualization features of the ThingSpeak platform. It allows users to visualize data in real time, create interactive graphs, set alerts, and statistically analyze the data using MATLAB functions. In addition, a non-sensor-specific data analysis and visualization application called exploreR is proposed.

exploreR is an open-source online application that has been developed using the Shiny package in the R programming language. The RShiny package has been widely used in recent years to create interactive applications for data analysis and visualization [59-61]. Such applications motivated the creation of exploreR, which is designed to reduce technical barriers, especially those related to coding, when it comes to analyzing and visualizing citizen-generated data. The next few paragraphs explain the design and architecture of the exploreR application.

Figure 6. (a) Red dot on the map shows the field test location, (b) Co-location setup at NABEL monitoring station, (c) Line plot of PM2.5 data obtained from two CoSense Units co-located with the reference monitor at NABEL station, and (d) CDF of the difference between the PM2.5 values recorded by the reference monitor and two sensors (S1 and S2).

Design and architecture. exploreR is designed as an intuitive and easy-to-use sensor data analysis and visualization application. The application's Graphical User Interface (GUI) is designed to guide the user through the analysis process. Figure 8 shows a snapshot of the GUI of the exploreR application. The left column of the GUI (Fig. 8a) holds the main functions, which expand once the user decides to use them for data analysis. Figure 8b and c show different functions supported by the exploreR application. The application follows a series of steps that cover the complete cycle of data input, pre-processing, visualization, and analysis. Figure 9 shows a schematic representation of the exploreR pipeline.
While designing exploreR, one of the objectives was to create an application that would be usable by people from diverse backgrounds. Different integrated workflows within the application allow the user to interpret the data meaningfully without any need for coding. Here is a summary of the functions supported by the current version of the application:
• Data Processing: The application accepts data in CSV format and allows users to filter rows and columns as well as view a data summary and plot the raw data. The plots are generated using Plotly, an interactive graphing library. The generated plots can be explored using built-in functions such as zooming in and out and rescaling, among others. Users can save the generated plots in PNG format.
• Outlier Detection: Users can apply statistical and machine learning methods such as k-Nearest Neighbour, ARIMA, and Artificial Neural Networks (ANN) to perform anomaly and outlier detection. Data reliability is an important topic that is widely discussed in the low-cost sensor literature [55,62,63]. The outlier detection function allows the user to look for anomalies, plot them, and later clean them using state-of-the-art methods.
• Gap Filling: This function allows users to fill gaps caused by missing data or created by removing outliers in the previous stage. The current version of the application supports two methods: linear interpolation and the Kalman filter. These methods were chosen for their widespread use in the sensor literature as well as their overall accuracy [64,65].
• … [66]. Users can also create box plots and histograms to perform a visual analysis of the data. The plots can be downloaded as files in PNG format.
• Data Forecasting: exploreR also has features that can be used for more advanced analysis and understanding of air quality data. The application allows users to apply advanced machine learning algorithms to perform data forecasts. PM2.5 forecasting is a major challenge that has been widely studied by researchers in atmospheric science, environmental monitoring, and computer science. The data forecast functions allow users to try methods ranging from simple to more complex and to determine which method performs well. The current version supports methods such as Linear Regression (LR), Random Forest (RF), XGBoost, and ANN. These models were selected for their widespread use in time-series forecasting research [66,67]. Having multiple models allows users to compare model performance and potentially use those findings for creating real-time forecasting applications. The forecasting results can be viewed in the application as well as downloaded in CSV format.

Sometimes the data may be too granular or not granular enough. This can lead to an imbalanced time series and adversely affect the overall analysis. To address this challenge, exploreR allows users to downsample the data to daily, weekly, monthly, and yearly values. The user can aggregate the data using either the sum or the mean. The aggregated data can be downloaded in CSV format.
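The downsampling step described above can be illustrated in a few lines. exploreR itself is written in R, so the Python sketch below is only an analogy for the idea, not the application's code; the function names and the period labels are assumptions.

```python
from collections import defaultdict
from statistics import mean


def period_key(stamp, period):
    """Label identifying the day, ISO week, month or year that `stamp` falls in."""
    if period == "daily":
        return stamp.date().isoformat()
    if period == "weekly":
        iso = stamp.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "monthly":
        return stamp.strftime("%Y-%m")
    if period == "yearly":
        return str(stamp.year)
    raise ValueError(f"unknown period: {period}")


def downsample(samples, period="daily", how="mean"):
    """Aggregate (datetime, value) samples per period using the mean or the sum."""
    if how not in ("mean", "sum"):
        raise ValueError(f"unknown aggregation: {how}")
    buckets = defaultdict(list)
    for stamp, value in samples:
        buckets[period_key(stamp, period)].append(value)
    aggregate = mean if how == "mean" else sum
    return {key: aggregate(values) for key, values in sorted(buckets.items())}
```

The mean suits concentrations such as PM2.5, whereas the sum is more natural for quantities that accumulate over time, such as event counts.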
exploreR is a major component of the Soc-IoT framework. It is aimed at making the analysis of sensor data easy and at assisting citizen scientists, policymakers, and researchers from non-programming backgrounds in performing data analysis. Furthermore, exploreR facilitates the easy export of figures and files that can be used for reporting data, publications, and data dissemination.
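Of the cleaning steps summarized above, gap filling by linear interpolation is simple enough to sketch. The Python snippet below is an illustration of the technique, not the R code used in exploreR. It assumes evenly spaced samples with missing readings marked as `None`, and it leaves gaps at the very start or end untouched because they have no neighbour on one side.

```python
def fill_gaps_linear(values):
    """Fill None entries by interpolating linearly between the nearest known values."""
    filled = list(values)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled
```

For example, a series with two missing readings between 1.0 and 4.0 would be filled with values close to 2.0 and 3.0.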
Comparison with existing applications. To understand how this application contributes to the field of open-source sensor data analysis, exploreR is compared with similar air quality sensor data analysis applications and software [61,68-71]. Different applications and software packages have been proposed over the years, each with its own strengths and weaknesses. Most applications are designed for data from a specific sensor. They work well for data from that sensor but may not work well with data from other IoT devices, mainly because of differences in data formats and in the organization of the data. Similarly, with programming-intensive tools, technically experienced users can analyze the data easily, but users with no background in programming languages find this difficult. Keeping these points in mind, exploreR is designed as a non-sensor-specific application that does not require any prior knowledge of programming. This allows users to analyze data from different sensors with ease and without worrying about technical … Table 3 compares exploreR with other existing open-source tools and software that have been widely used for analyzing air quality data obtained using low-cost sensors. Most of the existing solutions are designed with specific sensors and user groups in mind. The comparison highlights that exploreR combines features that allow the analysis of data from different sensors without any need for programming.
Discussion. Soc-IoT improves the accessibility of environmental data and promotes community engagement by capitalising on recent advancements in low-cost environmental monitoring sensors as well as open-source data analysis packages. It represents a novel opportunity for citizens as well as researchers to monitor the environment using the CoSense Unit, which is built using "off-the-shelf" hardware. The exploreR application, on the other hand, allows detailed and reproducible analysis of sensor data. Such an open-source tool can potentially bridge the gap between experts and non-experts as well as allow citizen scientists to add context while analyzing their data, which is often missing when data is evaluated by a third party. Soc-IoT has been designed as a citizen-centric platform where users can benefit from hands-on experience with environmental monitoring sensors. The open-source nature of the framework allows for its continuous development while also encouraging wider community participation in environmental monitoring tasks. The methodology used for CoSense Unit validation is representative of widely used quality assurance and quality control methods for low-cost sensors. Despite the challenges of using low-cost sensors, the CoSense Unit performed well in terms of data quality when compared with data from an official air quality monitoring station. In terms of sensor sustainability, the CoSense Unit can be used for resource-aware IoT deployment, which not only considers the IoT device's energy usage but also ensures that the sensor code consumes as little energy as possible. This also opens up the possibility of supplementing the official environmental monitoring system with a low-cost environmental sensing framework. As highlighted in a recent study [72], technological complexity and limited interaction between key stakeholders are some of the key barriers to participatory citizen science. The modular and transparent nature of the Soc-IoT framework allows it to be used for participatory citizen science activities that could promote citizen engagement and allow communities and decision-makers to collaborate on major environmental issues.

Conclusion and future work
Leveraging the growth in the IoT and its interplay with sustainable practices and open-source principles, this paper proposes Soc-IoT, a proof-of-concept framework for citizen-centric environmental monitoring. The framework promotes accurate and efficient environmental monitoring by integrating open-source hardware and software. The CoSense Unit is built with readily available low-cost hardware components that can be used by researchers, citizens, and the maker community to create their own sensing devices. Because of the ease of access and low cost of these hardware components, the CoSense Unit can also be used in locations that have limited resources and budget for environmental monitoring. The performance and accuracy of the CoSense Units were evaluated by co-locating them at an official air quality monitoring station equipped with reference-equivalent instrumentation in Dubendorf, Switzerland. Additionally, quality assurance was performed by studying the inter-unit variability. With a modular design, easy assembly, and an intuitive data analysis interface, the Soc-IoT framework can assist in air pollution exposure assessment as well as comprehensive analysis of air quality data. The exploreR application is designed to reduce technical barriers, particularly those related to programming. It offers both experts and non-experts a wide range of data analysis and visualization functionalities that support visual inspection of data, data cleaning, and detailed data analysis.
The core part of the framework focuses on enhancing embedded spatial intelligence where citizen empowerment meets smart environments and sustainable design. The proposed framework has the potential to foster collaboration among a wide range of stakeholders, including scientists, policymakers, citizens, and the maker community. The CoSense Unit's reliability and accuracy enable it to potentially complement official environmental monitoring networks. The extensible and open-source nature of the Soc-IoT framework would encourage others to use it as a development platform rather than reinventing everything from scratch. To strengthen the science-policy-society interface, the Soc-IoT framework can also be used to facilitate co-creation and Citizen Science activities. Besides supporting data democratization, it can be used to create an environment where citizens' opinions, observations, and expertise are valued and used to facilitate a dialogue with decision-makers. The framework presented in this paper demonstrates the feasibility of using open-source low-cost IoT technology in environmental monitoring applications and can therefore serve as a basis for future research. Although the environmental sustainability of IoT devices has been considered as part of this work, other aspects, such as the social and economic sustainability of IoT, could not be considered due to time constraints. Future work will look into the social and economic viability of the technological solutions discussed in this paper. Another future research direction would be to investigate data-related issues such as user data security and privacy, as well as to evaluate various privacy preservation techniques to protect user data. To improve the system's scalability, future research will also look into dynamic calibration and edge analytics. Additional enhancements will be made to the data analysis tool, including improvements to the user interface and the addition of more functionalities. A key part of future work will be conducting field experiments in collaboration with the research and citizen science communities to analyze the usability of the device for promoting environmental awareness.

Table 3. Comparison of exploreR with other air quality data analysis tools and software.

Code availability
The sensor code, the STL files for 3D printing, and the code for the exploreR application are freely available on GitHub at https://github.com/sachit27/Soc-IoT. The exploreR application can be accessed at https://sachitmahajan.shinyapps.io/exploreR/.