Development and application of a method to classify airborne pollen taxa concentration using light scattering data

Although automated pollen monitoring networks using laser optics are well established in Japan, these instruments are thought to be unable to separate the counts of different pollen taxa. However, a method for distinguishing the pollen counts of two pollen taxa was recently developed. In this study, we applied this method to field data for the two main allergenic pollen taxa in Japan, Chamaecyparis obtusa and Cryptomeria japonica. We showed that the method can distinguish between the pollen counts of these two species even when they are simultaneously present in the atmosphere. This result indicates that simple, automated monitoring of two pollen taxa with high spatial density can be developed using the existing pollen monitoring network.


Methods
Light scattering data of C. obtusa and C. japonica were obtained using the AME system, which was first introduced in Miki et al. 36. In the system, each pollen count was derived by solving Eq. (1), where a, b, c, and d are the light scattering intensities used as the limits of the integration intervals, P is the representative probability density as a function of the light scattering intensity of each pollen taxon (α, β), N is the number of sampled pollen grains, p is the number of the signals in the range of the integration intervals, and n is the total number of sampled pollen grains in the integration interval.
Here, P is the Gaussian function with mean µ and standard deviation σ:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \qquad (2)$$
The representative probability densities of C. obtusa and C. japonica were determined from 1500 light scattering data points for these species. The pollen was restored by Yamatronics, and the forward light scattering intensities of the two taxa were used for analysis. For the integration intervals (a-b) and (c-d), a and b were every set of two voltage levels taken at 10 mV steps between 500 and 600 mV, and c and d were every set of two voltage levels taken at 10 mV steps between 600 and 700 mV, which gives 55 × 55 = 3,025 sets of integration intervals. The number of pollen grains of each taxon was calculated for every set of integration intervals, and the calculation results were output if they were not negative. The number of pollen grains most frequently output was adopted as the number of pollen grains calculated by the AME system. Additionally, if the forward light scattering data of C. obtusa and C. japonica had the same variances and averages, meaningful outputs could not be obtained; the forward light scattering of the two species must differ significantly. Thus, the F test and t test were used to test whether the light scattering of the two species was significantly different.
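The interval sweep described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that Eq. (1) is a pair of linear equations, N_α ∫_a^b P_α dx + N_β ∫_a^b P_β dx = n_ab and the same over [c, d], that P is a normalized Gaussian, and that the adopted estimate is the most frequent (N_α, N_β) pair after rounding to integers. The function names and the `count_in` callback are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution of a normalized Gaussian (assumed form of P)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """Integral of the Gaussian density over [lo, hi]."""
    return gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)


def solve_pair(n_ab, n_cd, ab, cd, alpha, beta):
    """Solve the assumed 2x2 linear system for (N_alpha, N_beta) by Cramer's rule."""
    a11, a12 = interval_prob(*ab, *alpha), interval_prob(*ab, *beta)
    a21, a22 = interval_prob(*cd, *alpha), interval_prob(*cd, *beta)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-15:  # degenerate: the two densities cannot be separated
        return None
    return (n_ab * a22 - a12 * n_cd) / det, (a11 * n_cd - a21 * n_ab) / det


def ame_estimate(count_in, alpha, beta):
    """Sweep all interval sets; return (most frequent output, all outputs).

    count_in(lo, hi) must return the number of signals between lo and hi (mV).
    alpha and beta are the (mu, sigma) tuples of the two taxa.
    """
    low_levels = range(500, 601, 10)   # candidate a, b: 11 levels, 55 pairs
    high_levels = range(600, 701, 10)  # candidate c, d: 11 levels, 55 pairs
    outputs = Counter()
    for ab in combinations(low_levels, 2):
        for cd in combinations(high_levels, 2):
            solution = solve_pair(count_in(*ab), count_in(*cd), ab, cd, alpha, beta)
            if solution is None:
                continue
            n_alpha, n_beta = round(solution[0]), round(solution[1])
            if n_alpha >= 0 and n_beta >= 0:  # negative results are not output
                outputs[(n_alpha, n_beta)] += 1
    best = outputs.most_common(1)[0][0] if outputs else None
    return best, outputs
```

With noise-free expected counts every interval set returns the same pair. With measured counts the outputs scatter across interval sets, which is why the most frequent output is adopted.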
Evaluation test. In addition to the light scattering data used to determine the representative light scattering probability, C. obtusa pollen and C. japonica pollen were injected into the KH-3000-01 following the guidelines of the Japanese Ministry of the Environment to obtain light scattering signals without contamination (Fig. 1). Five test data sets were obtained by randomly choosing light scattering data from each taxon.
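As an illustration of how such test data could be assembled, the sketch below draws a known number of signals from each single-taxon record and pools them, so that the true counts remain available for comparison with the estimate. The function names, the fixed seed, the chosen counts, and the synthetic stand-in values (drawn from Gaussians rather than measured) are assumptions for illustration only.

```python
import random


def make_test_set(obtusa_mv, japonica_mv, n_obtusa, n_japonica, rng):
    """Pool randomly chosen signals from each taxon into one mixed record."""
    mixed = rng.sample(obtusa_mv, n_obtusa) + rng.sample(japonica_mv, n_japonica)
    rng.shuffle(mixed)  # the mixed record carries no taxon labels or ordering
    return mixed


def count_in(signals_mv, lo, hi):
    """Number of signals with forward scattering intensity in [lo, hi) mV."""
    return sum(1 for v in signals_mv if lo <= v < hi)


rng = random.Random(0)  # fixed seed so the draw is repeatable
# Synthetic stand-ins for the single-taxon recordings (not measured data).
obtusa_mv = [rng.gauss(408, 145) for _ in range(1500)]
japonica_mv = [rng.gauss(662, 257) for _ in range(1500)]

mixed = make_test_set(obtusa_mv, japonica_mv, 300, 200, rng)
signals_500_600 = count_in(mixed, 500, 600)
```

Counts such as `signals_500_600` are the quantities that enter the right-hand side of the system solved for each set of integration intervals.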
Application for actual field data. Pollen data were sampled with the Durham sampler and the KH-3000-01, each operated according to its instructions, at Funabashi, where one of the pollen monitoring sites in Japan is located (Fig. 2), over 2 weeks from March 24 to April 6, 2012 (Fig. 3). Sampling by the Durham sampler was performed following the standardised protocol (PAAA and IAA protocol) of the NPO Pollen Information Association. The volumetric pollen concentrations evaluated by the KH-3000-01 and the pollen deposition data evaluated by the Durham sampler were compared. Assuming that the pollen deposition speeds of C. obtusa and C. japonica are constant, the relationship between pollen deposition and the airborne pollen concentration is constant. Thus, the correlation coefficients between the Durham sampler and AME system directly indicate the applicability of the AME system for pollen concentration estimation. Data from this sampling period were chosen because it is the main season during which both C. obtusa and C. japonica are present. To avoid any impact of dust on the light scattering data, only data with side scattering intensities within 400-1400 mV were selected.
Because the KH-3000-01 samples air at a rate of 4.1 L min⁻¹, the number of signals from the KH-3000-01 was converted to the airborne pollen concentration (m⁻³) using Eq. (3). The confidence coefficient calculated using Eq. (4) was introduced as the criterion for determining the validity of the calculation results derived by the AME system.
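The unit conversion implied by the flow rate can be sketched as follows. This is not a reproduction of Eq. (3); it assumes that the concentration is simply the signal count divided by the sampled air volume, with the sampling duration passed in as a hypothetical parameter. The side scattering window of 400-1400 mV is applied first, and treating both bounds as inclusive is an assumption, as are the function names and example signals.

```python
FLOW_L_PER_MIN = 4.1  # sampling flow rate of the KH-3000-01 stated in the text


def select_pollen_like(signals, side_lo=400.0, side_hi=1400.0):
    """Keep (forward_mv, side_mv) signals whose side scattering is in the window."""
    return [(fwd, side) for fwd, side in signals if side_lo <= side <= side_hi]


def concentration_per_m3(n_signals, minutes, flow_l_per_min=FLOW_L_PER_MIN):
    """Convert a signal count to grains per cubic metre of sampled air."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    sampled_volume_m3 = flow_l_per_min * minutes / 1000.0  # 1 m^3 = 1000 L
    return n_signals / sampled_volume_m3


# Hypothetical (forward, side) signals in mV; the second one is rejected as dust.
signals = [(520.0, 800.0), (610.0, 200.0), (655.0, 1400.0)]
kept = select_pollen_like(signals)
hourly = concentration_per_m3(len(kept), minutes=60)
```

For example, one day of sampling (1,440 min) corresponds to 4.1 × 1440 / 1000 ≈ 5.9 m³ of air.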
where L is the number of outputs, N is the total number of calculation results (3,025 in this experiment), P is the number of outputs that show the estimated number of pollen grains, and C is the total airborne pollen concentration calculated by the system, which is the sum of the estimated concentrations of C. obtusa and C. japonica each day. The confidence coefficient indicates the uniqueness of the outputs from the system. If each confidence coefficient of C. obtusa and C. japonica was below 2.0, the AME system's calculation result was considered invalid because the system was "not confident".
www.nature.com/scientificreports/

Results
We obtained 1500 points of raw light scattering data of C. obtusa and C. japonica. The representative light intensity data of C. obtusa and C. japonica were obtained by fitting the forward light scattering intensity distribution to the Gaussian function (Fig. 4). The coefficients of the Gaussian functions for the light scattering of each taxon produced by curve fitting were derived as follows:
$$P_{C.\ obtusa}: (\mu, \sigma) = (408, 145), \qquad P_{C.\ japonica}: (\mu, \sigma) = (662, 257)$$
Using the F test, the F boundary value, F value, and p value were 1.09, 3.10, and below 0.05, respectively. In the t test, the t boundary value, t value, and p value were 1.65, −24.72, and below 0.05, respectively. Hence, the variances and average light scattering intensities of C. obtusa and C. japonica differed significantly even though the distributions overlapped.
Evaluation test. The evaluation test results showed that the AME system accurately calculated the number of pollen grains for each taxon (Fig. 5). Table 1 shows the actual and estimated numbers of C. obtusa and C. japonica. These results revealed that the AME system could separately estimate the numbers of sampled C. japonica and C. obtusa pollen grains with high accuracy when there was no dust contamination.
Application to field data. According to the confidence coefficients derived from Eq. (4), the system was not confident about the results obtained on March 24 and April 2 (Table 2). When the confidence coefficient of one taxon was low, that of the other taxon also tended to be low (Fig. 6). When the outputs were evaluated on a scatter plot with alpha = 0.01 (Fig. 7), a higher confidence coefficient was found to correspond to a more unique number of pollen grains. Additionally, when the confidence coefficient was below 2.0, the uniqueness of the output was low (March 24 and April 2). When the calculation results were compared with the number of pollen grains sampled by the Durham sampler, the determination coefficients between the Durham sampler and the number of signals from the KH-3000-01 were 0.20 (C. obtusa) and 0.45 (C. japonica). The determination coefficients between the Durham sampler and AME system were 0.35 (C. obtusa) and 0.77 (C. japonica) (Fig. 8). Thus, the determination coefficients improved for both taxa through the analysis, from 0.20 to 0.35 for C. obtusa and from 0.45 to 0.77 for C. japonica.
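The comparison with the Durham sampler can be sketched as below, assuming that the determination coefficient is the squared Pearson correlation between the two daily series (which equals R² of a simple linear regression). The function name and the example values are hypothetical and are not the measured data.

```python
def determination_coefficient(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy ** 2 / (sxx * syy)


# Hypothetical daily values: Durham deposition vs. estimated concentration.
durham = [1.0, 2.0, 3.0]
estimated = [1.0, 3.0, 2.0]
r2 = determination_coefficient(durham, estimated)
```

Because the deposition speed is assumed constant, a higher value here is read as better agreement between the estimated concentration and the deposition count, not as a calibration between the two units.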

Discussion
We found that the numbers of pollen grains of two pollen taxa can be distinguished using simple light scattering intensity data, even if the light scattering data overlap. Although the integration intervals were fixed at one set of values (600, 800, 300, and 500 mV) by Miki et al. 36, the output of the calculation fluctuated even when the integration intervals were fixed. Hence, adopting the most frequently output calculation result as the estimated number of pollen grains appears to be effective. The evaluation test results showed that improving the device to better prevent dust from being sampled would improve the accuracy of the AME system. However, the AME system calculates the number of pollen grains based on the number of signals within the signal levels of the integration intervals. Hence, the number of airborne dust particles itself does not affect the accuracy of the AME system, but dust with optical characteristics similar to those of pollen grains should be excluded from the data to improve accuracy. The system is thought to have worked better for C. japonica than for C. obtusa because the KH-3000-01 was originally designed to detect only C. japonica. Thus, updating the device may improve the analysis of C. obtusa. However, the sampling efficiencies of the KH-3000-01 and the Durham sampler for C. obtusa and C. japonica are expected to differ from each other, as is the correlation between the actual airborne pollen concentration and the pollen sampled on the Durham sampler. Hence, the AME system itself may not function better for C. japonica than for C. obtusa.
Although C. japonica and C. obtusa have approximately the same size (30 μm), the light scattering of the two species was significantly different. Previous studies indicated that the light scattering characteristic is strongly related to the morphology and surface spectrum of the pollen grains 35, 42. Thus, the AME system seems to be applicable to various pairs of pollen species with different sizes, shapes, and surface spectra. This is also suggested by Miki et al. 36, who showed that the light scattering of different pollen taxa is often significantly different. Some studies have focused on automated identification and counting using machine learning, which is expected to lead to highly accurate results [2][3][4]. However, as inexpensive and robust pollen samplers are already widely used, an automated system that can distinguish two pollen taxa based on simple light scattering intensity can be established as a nationwide automated multi-taxon pollen counting network by applying the system improved in this research to the existing network infrastructure. In addition, although the two main types of pollen in Japan were analysed, the AME system can be applied to the main taxa in other countries and regions by modifying the conditions of the system, such as the integration intervals and light scattering probability density. Moreover, the system is applicable regardless of the pollen season.

Conclusion
Field data on the airborne concentrations of two pollen taxa can be separately evaluated from simple light scattering data. This result indicates that a nationwide automated, inexpensive, and robust system with the potential to classify pollen concentrations from multiple taxa with high spatial density can be established.

Code availability
The code used in this study is available from the corresponding author on reasonable request.