Correlation of publication frequency of newspaper articles with environment and public health issues in fire-prone peatland regions of Riau in Sumatra, Indonesia

Forest fires in peatlands emit pollutants to the atmosphere, affecting public health. Although air quality data and epidemiological information are helpful in managing the environment and public health, they are not always available. We aimed to evaluate the utility of newspaper articles for estimating the public health concerns posed by air pollution. Using the database of a local newspaper, Riau Pos, in Riau Province of Sumatra, Indonesia, we studied spatiotemporal correlations between the publication frequency of newspaper articles mentioning search terms relevant to health issues and the Fire Radiative Power (FRP) of peatland fires. FRP from one of the NASA satellite databases was used as an index of air pollution caused by peatland fires. Visibility data for airport operation were also used as an index of particulate matter concentrations. The study regions are primarily the fire-prone Riau Province and nearby areas in central Sumatra, Indonesia, and the study period is 2009 to 2018. Newspaper articles related to public health are found to be associated with FRP and visibility, that is, with regional air pollution.


Introduction
A tropical peatland is an ecosystem that has stored a large amount of terrestrial carbon over millennia. Indonesian peatlands store ~7% of the global peat carbon pool (Page et al., 2011). In recent decades, the frequency of forest fires across the tropics has increased (Global fire emissions database, 2020). Forest fires in Indonesia tend to occur annually in the dry season, even in years without El Niño Southern Oscillation events, primarily because of anthropogenic activities for developing agriculture and plantations (Murdiyarso and Adiningsih, 2007). Peatlands are concentrated in the lowland regions of Sumatra and Kalimantan in Indonesia. Smoke generated there is a major cause of transboundary haze pollution. Pollutant levels during burning periods exceed the maximum level on the Pollutant Standard Index. This degraded air quality affects human health, especially respiratory health, because smoke from the underground burning of peat contains hazardous air pollutants such as carbon monoxide, nitrogen oxides, sulfur oxides and organic compounds (Naeher et al., 2007). Frankenberg et al. (2004) reported statistical evidence linking the 1997 forest fire to increased difficulty in daily living activities and negative impacts on respiratory and general health in Sumatra and Kalimantan. Kim et al. (2017) used the satellite-based aerosol index over a wide region of Indonesia and found that the episodic shock of air pollution was significantly linked to clinical depression in women in that region. Agustian et al. (2020) coupled global air quality proxies with the Indonesia Family Life Survey data for 2014, showing that the prevalence of health outcomes among adults was higher in the southern area of Sumatra (3.3%) and Central Kalimantan (4.3%) than the national average (2.9%). All of these studies provided statistical evidence of the negative impacts of air pollution on chronic diseases and human welfare.
The impact of fires on air quality is not limited to the haze source regions, as pollutants can be transported through the atmosphere over more than a hundred kilometers. Population exposure to haze pollution depends on the geospatial location of fire hotspots and the extent of burning of peatlands (Heil et al., 2007). The GEOS-Chem chemical transport model has shown that the southeastern region of Sumatra is heavily contaminated when regions are weighted by population (Kim et al., 2015). With the recent progress in informatics, content acquired from social and mass media has become one of the new tools for disseminating information about air pollution. Sociological information is complementary for describing the extent and range of health impacts of haze dispersion in regions where environmental monitoring and health surveillance data are unavailable. The air quality index has been estimated from posts on Sina Weibo about smoke and haze (Jiang et al., 2015; Tao et al., 2016). In California wildfire cases, social media data were used to map burned areas (Sachdeva and McCaffrey, 2018). Crowdsourced data are useful for estimating the haze pollution from wildfire smoke affecting particulate matter (PM) and for predicting air quality impacts on a statewide level (Kent and Capello, 2013). Social media are data sources for surveillance of diseases such as infectious diseases and asthma (Broniatowski et al., 2013; Ram et al., 2015). Social media also provided supplementary and complementary information for haze disaster management in Riau, Indonesia, according to a correlation calculation with tweets and hotspots (Kibanov et al., 2017). Google Trends data showed a linear time-series pattern with the official dengue fever disease report for Indonesia during 2012-2016 (Husnayain et al., 2019). For monitoring outbreaks of dengue fever in India, newspaper articles were useful in providing possible proxy data (Zhang et al., 2019). Newspaper information on social issues also helps in the development of public concern. Newspaper information is considered reliable because the accuracy, quality and newsworthiness of articles are ensured by editorial review processes.
To quantify the disease burden of air pollution and develop mitigation strategies, there is an urgent need for exposure and health outcome data on a local scale. The exposure estimates from scientific measurements can then be used to study the health effects of air pollution in urban populations. To investigate this relationship and develop air quality management strategies, small-scale spatial variations in air pollution levels require small-scale health outcome evaluation. The planning and implementation of the air quality management policy need to be executed accordingly. According to Agustian et al. (2020), this is even more important in Indonesia where, with the existence of local autonomy, the authority to develop such policies is placed with local/city governments. Extracting health outcome data fit for this purpose by analyzing local newspapers is therefore a challenging task.
This study reveals that the health impact of peatland fires can be estimated by choosing appropriate search terms in the analysis of spatiotemporal distributions. The easy availability of mass media data makes them appealing for peatland-fire responders and public health professionals. Since newspaper articles on the agricultural economy of peatland can be an indicator of public concern about commercial logging and plantations, such search terms have also been examined in this study.

Methods
Newspaper database. This study investigated the spatiotemporal characteristics of newspaper articles relevant to public health. Respiratory illness is the most important health issue in fire-prone regions of Sumatra. Figure 1 shows a map of the fire radiative power (FRP) accumulated in the central region of Sumatra over the decade from January 2009 to December 2018. FRP, in units of power (W), is the rate at which an actively burning fire emits radiative energy at the time of observation. It is strongly related to the rate of fuel consumption and of trace gas and aerosol emission (Wooster et al., 2005; Freeborn et al., 2008). FRP is, therefore, an estimator of the amount of material emitted to the atmosphere from that fire. The integrated FRP is high in the eastern and southern regions of Sumatra in Fig. 1. FRP within a 100 km radius around Pekanbaru peaked yearly in 2012-2014, and monthly around February and June, as shown in Fig. S1. As Riau Province has been heavily affected by the haze caused by peatland fires, an archive dataset in PDF form for 2009-2018 was purchased from a local newspaper company, Riau Pos (https://epaper.riaupos.co/hubungi-kami.html). The documents are missing for three time-frames: January 2009, October-December 2015, and December 2018. Riau Pos typically publishes 40 pages daily, covering international, domestic and local news in the categories of society, business, politics, sports, food and editorials. It covers Riau Province and nearby regencies, such as Siak, Kampar, Bengkalis, Rokan Hulu, Indragiri Hulu, Kepulauan Meranti, Pelalawan, Indragiri Hilir, Rokan Hilir and Kuantan Singingi. The big cities it covers are Pekanbaru (provincial capital of Riau), Padang, Bengkalis, Siak, Pelalawan, and Dumai. The total number of PDF pages examined was ~140,000, corresponding to 700,000 articles over the decade.
Selection of search terms for analysis. Using the software described below, newspaper articles on public health, air pollution, peatland fires and the agricultural economy were selected. The search terms are penyakit akut (acute illness), penyakit pernapasan akut (acute respiratory illness), kerusakan asap (smoke damage), polusi udara (air pollution), kebakaran gambut (peatland fire), penggunaan lahan (land use) and perkebunan kelapa sawit (oil palm plantation). Although some articles containing these search terms are not directly associated with fire events, they were not filtered based on their framing. For a popular search term, this procedure reduced the temporal correlation between FRP and the article number because the term appears even during a fireless season, e.g., in editorial opinions and review articles unrelated to fire events. This situation is specific to newspaper articles and differs from social media analysis, because tweets usually synchronize with fire events. For the other search terms listed in the Supplementary Information (SI) Section 1, the number of articles was too low or too high for analysis. In the latter case, the articles appear throughout the year and are not associated with fire events.
Fire Radiative Power and airport visibility data. FRP is related to the rates of fuel consumption and emission of trace gases and aerosols, and is an appropriate index for surface fire and peat fire (ground fire) (Wooster et al., 2005; Freeborn et al., 2008). Since the correlation between monthly FRP and PM2.5 is strong, with a coefficient of determination R² = 0.98 for fires in 2015 around Sumatra and Borneo (Yin et al., 2020), FRP is an appropriate index to estimate the haze exposure of residents. The FRP data were obtained from the Fire Information for Resource Management System of the NASA Database (https://firms.modaps.eosdis.nasa.gov/download/). Since the newspaper articles were unavailable for the aforementioned time-frames, FRP data during the same time-frames were excluded from the analysis. Three-day FRP data of June 18-20, 2013, were also excluded for being highly irregular. The visibility for airport operation is inversely proportional to the aerosol density, which increases during fires (Griffin, 1980; Field et al., 2009). Ground-based monitoring data of airport visibility were obtained from Badan Meteorologi, Klimatologi, dan Geofisika (BMKG; Meteorological, Climatological and Geophysical Agency of Indonesia, http://aviation.bmkg.go.id/web/station.php).
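The exclusion of FRP records described above can be sketched in a few lines of Python. This is only an illustration, not the processing actually used in the study: the names EXCLUDED_PERIODS, is_excluded and filter_frp are hypothetical, and the records are assumed to be simple (date, FRP) pairs.

```python
from datetime import date

# Inclusive (start, end) periods taken from the text: the three gaps in
# the newspaper archive plus the irregular three-day FRP window.
# The names and the record layout below are assumptions for illustration.
EXCLUDED_PERIODS = [
    (date(2009, 1, 1), date(2009, 1, 31)),    # January 2009
    (date(2015, 10, 1), date(2015, 12, 31)),  # October-December 2015
    (date(2018, 12, 1), date(2018, 12, 31)),  # December 2018
    (date(2013, 6, 18), date(2013, 6, 20)),   # irregular FRP, June 18-20, 2013
]


def is_excluded(day):
    """Return True if the given date falls inside any excluded period."""
    return any(start <= day <= end for start, end in EXCLUDED_PERIODS)


def filter_frp(records):
    """Keep only the (date, frp) pairs that lie outside the excluded periods."""
    return [(day, frp) for day, frp in records if not is_excluded(day)]
```

Applying the same mask to the FRP series and to the page counts keeps the two series aligned month by month before any correlation is computed.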
Selection of study regions. The primary study region is Riau Province in the central part of the east coast of Sumatra Island. This province has the highest hotspot occurrence in Sumatra, as shown in Fig. 1. According to Kibanov et al. (2017), approximately 5500 peat fire hotspots were detected in Riau Province during 2014, compared with 7892 across Sumatra. Zaini et al. (2020) reported that among residents exposed to peatland fires in Riau Province in 2015, respiratory symptoms were reported in 71.4% of subjects and lung function was impaired in 72.6%, mostly with mild obstruction and restriction. Non-respiratory symptoms were reported by 84.7% of subjects. In 2015-2017, acute respiratory infection was the top cause of outpatient visits to clinics and hospitals in Riau Province. In 2015, the high-FRP year, over 40% of hospital outpatients suffered from this illness, compared with about 20% in 2017 (Dinas Kesehatan Provinsi Riau, 2020). To determine which newspaper search terms appropriately express the spatiotemporal variation of public health conditions affected by fires and air pollution, we relate in time the number of article pages containing the search terms, N_page, to the FRP data of the region within a 100 km radius centered at Pekanbaru, the capital city of Riau Province. Kibanov et al. (2017) investigated the spatial characteristics of tweets using a mapping process based on the distance between the position of a tweet and a hotspot. People living closer to a hotspot are more affected by haze than those living farther away. Tweets on haze and related health were distributed at distances of 0-200 km from hotspots, with notable spikes around 100 km. Based on this report, a 100 km radius around the study city is chosen for the FRP data. As the typical wind speed is 2-3 m/s (90-130 km per half-day) during the daytime, a 100 km radius is a reasonable distance. Since advection of air masses from the southern regions of Sumatra occurs, air mass mixing might affect the correlation between N_page and FRP, which is limited to the 100 km radius.
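As an illustration of the 100 km selection, the following Python sketch sums the FRP of fire detections that lie within a given great-circle distance of a city, and converts a wind speed into a half-day transport distance. It is a sketch under stated assumptions rather than the procedure used in the study: the coordinates of Pekanbaru are approximate, the detections are assumed to be (latitude, longitude, FRP) tuples, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
PEKANBARU = (0.51, 101.45)        # approximate (lat, lon) in degrees; an assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def frp_within_radius(detections, center=PEKANBARU, radius_km=100.0):
    """Sum FRP over (lat, lon, frp) detections within radius_km of center."""
    return sum(
        frp
        for lat, lon, frp in detections
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )


def advection_km(wind_speed_m_s, hours=12.0):
    """Distance in km travelled by an air parcel at constant wind speed."""
    return wind_speed_m_s * hours * 3600.0 / 1000.0
```

With these definitions, advection_km(2.0) and advection_km(3.0) give roughly 86 km and 130 km, which is the order of the half-day range quoted in the text.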
Search engine for newspaper articles in PDF files. A search engine for terms in PDF files was used. Details of the software are described in SI Section 2. Briefly, the search engine functions per page, i.e., the search tool counts "one" when one article with a certain search term appears on a page. However, when two or more articles with the same term appear on the same page, the engine does not count more than "one". The article page number, N_page, for a high-frequency search term could therefore level off as the article number, N_art, increases. The restricted number of articles due to space limitations in a copy of the daily newspaper was also considered. This level-off effect will be discussed below.
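The per-page counting rule can be made concrete with a minimal Python sketch. It assumes the text of each PDF page has already been extracted into a string; the actual software is described in SI Section 2 and is not reproduced here, and the function names are hypothetical. Counting raw occurrences is shown only as a rough stand-in for the article number, on the assumption that each article mentions the term once.

```python
# Search terms as listed in the Methods (Indonesian phrase: English gloss).
SEARCH_TERMS = {
    "penyakit akut": "acute illness",
    "penyakit pernapasan akut": "acute respiratory illness",
    "kerusakan asap": "smoke damage",
    "polusi udara": "air pollution",
    "kebakaran gambut": "peatland fire",
    "penggunaan lahan": "land use",
    "perkebunan kelapa sawit": "oil palm plantation",
}


def count_pages(pages, term):
    """Count pages containing the term at least once (one count per page)."""
    needle = term.lower()
    return sum(1 for text in pages if needle in text.lower())


def count_occurrences(pages, term):
    """Count every occurrence of the term, across and within pages."""
    needle = term.lower()
    return sum(text.lower().count(needle) for text in pages)
```

The gap between the two counts grows as a term becomes more frequent, which is the origin of the level-off effect in the page count.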

Results
Correlation between FRP and the number of article pages containing search terms. Using the search engine described above, the total numbers of article pages containing the search terms "acute illness", "smoke damage", "air pollution" and "peatland fire" are 575, 984, 857, and 1618 pages during the decade, respectively. The annual N_page is shown in Fig. 2, along with the annual FRP distribution. Among these search terms, Fig. 3a-c shows the correlations between the annual N_page of "acute illness"/"smoke damage"/"air pollution" and FRP on a linear scale without time-lag, resulting in correlation coefficients R = 0.82/0.64/0.79, respectively. With a 1-year lag in the N_page distributions, the correlation coefficients are R = 0.70/0.78/0.73, respectively. The changes with and without a lag are small. These search terms can be used as potential indices of the public health concern posed by air pollutants and fire haze.
The total number of article pages containing "peatland fire" is 1618 pages. The R value for this search term is −0.09 without time-lag (Fig. 3d). Since this result appears unreasonable, this search term is reanalyzed as follows. The total page number of 1618 is much greater than the numbers for the other search terms, suggesting saturation in the analysis. The analysis is therefore changed from an annual to a monthly basis because fire-prone periods are mainly in February and June-August, as will be described below and in SI Section 3. Figure 4a shows the level-off effect in the distribution of N_page, yielding a linear correlation with R = 0.43 for zero time-lag. To explain the saturation effect, we hypothesize a logarithmic function for the page number distribution. This hypothesis, specific to this analysis, is verified by the model calculation described in SI Section 3. The calculation suggests that when N_art is large, the level-off effect in N_page results in a logarithmic count (Eq. (1)), where m is the saturation parameter that depends on the total page number per daily copy and the article number limit per page. The level-off effect is stronger with a smaller m, as discussed in the model calculation. The correlation between the logarithmically scaled FRP and the linearly scaled N_page is shown in Fig. 4b, in which the solid line is the best fit, giving R = 0.43 with zero-month lag, essentially the same as the R value in Fig. 4a. R decreases to 0.15 with a one-month lag in N_page. These results indicate that the saturation due to the present methodology reduces the R value, but this search term still correlates with fire events.
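A simple way to see why counting at most one hit per page must level off is the classical occupancy calculation sketched below in Python. It is not the model of SI Section 3 and does not reproduce Eq. (1) or the saturation parameter m; it only assumes that articles fall independently and uniformly on a fixed number of eligible pages, and the names expected_pages and n_slots are hypothetical.

```python
def expected_pages(n_art, n_slots):
    """Expected number of distinct pages hit when n_art articles are placed
    independently and uniformly at random on n_slots eligible pages.

    Assumption for illustration only: uniform, independent placement.
    """
    if n_art < 0 or n_slots <= 0:
        raise ValueError("n_art must be >= 0 and n_slots must be > 0")
    # Each page stays empty with probability (1 - 1/n_slots) ** n_art.
    return n_slots * (1.0 - (1.0 - 1.0 / n_slots) ** n_art)
```

Under this assumption the page count grows almost linearly while the articles are few, then flattens toward n_slots; a smaller number of eligible pages makes the flattening set in earlier, in line with the qualitative statement that a smaller capacity strengthens the level-off effect.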
Next, we analyzed the monthly correlation between the decade-integrated FRP and N_page (187 pages in total) of "acute respiratory illness", as shown in Fig. 5. In Riau Province, the equatorial tropical climate is characterized by two fire-prone periods, February and June-August. Two seasonal peaks appear in both the N_page and FRP panels. The R values with FRP lags of zero, one and two months are 0.62, 0.74 and 0.28, respectively. A one-month lag yields the best temporal correlation. This one-month lag suggests that the newsworthiness of public health increases with a lag of a few weeks after fire events (Xu et al., 2016).
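The lagged correlations reported above are of the kind computed by the following Python sketch, which evaluates the Pearson coefficient between an FRP series and a page-count series shifted by a whole number of time steps. The handling of the series ends is an assumption: the overlapping part is simply truncated, whereas the study may have treated the monthly cycle differently. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def lagged_r(frp, n_page, lag):
    """Correlate frp[t] with n_page[t + lag] for a non-negative integer lag.

    Only the overlapping part of the two series is used (no wrap-around).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson_r(frp, n_page)
    return pearson_r(frp[:-lag], n_page[lag:])
```

Scanning lag over 0, 1 and 2 and taking the largest coefficient is one way to identify the delay between fire activity and press coverage.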
Correlation between FRP and the number of article pages containing "acute respiratory illness" geo-tagged with a city name: spatial patterns inferred from newspaper articles. Figure 6 shows the annual FRP and N_page of "acute respiratory illness" + "Pekanbaru", where an appreciable correlation is observed. The correlation coefficients between FRP and N_page are R = 0.63 and 0.78 with a time-lag of 0 and 1 year in FRP, respectively. The slightly higher R value with the 1-year lag suggests that the newsworthiness of risk perception is partly developed from public experience in the previous year. To examine district tendencies, Pekanbaru, Bengkalis, Siak, Pelalawan, Dumai and Duri in Riau Province, and Padang and Bukittinggi in West Sumatra Province, are searched. These cities have over 100,000 inhabitants. Since BMKG stations are located in only three cities, newspaper information is useful for public health management in the other cities. Table 1 shows the decade-integrated N_page for "acute respiratory illness" + "city name" and their ratios to city population. The six cities located in Riau Province have higher ratios than the two cities in West Sumatra. The statistics on pneumonia cases for children under five in 2018 and 2019 show that the percentages follow the order Siak > Dumai, Pekanbaru, Pelalawan > Bengkalis (Dinas Kesehatan Provinsi Riau, 2020). This order is in fair agreement with the ratios in Table 1.
To confirm the correlation between those ratios and air quality data, airport visibility data for 2012-2019 at three BMKG stations (Pekanbaru, Padang, and Rengat) are examined to provide spatiotemporal information on the haze contamination of air. Figure 7 shows the reciprocal of visibility at the three stations, which corresponds to the optical density of polluted air, or haze level (Griffin, 1980). The temporal profile at the Pekanbaru BMKG station peaks in 2014 and 2015, synchronizing with the peaks in the annual N_page for "acute respiratory illness" seen in Fig. 6a. In terms of visibility, the eastern side of the central region of Sumatra (Pekanbaru and Rengat) suffered heavily from haze compared with the western side (Padang). These observations support the results of the newspaper analysis in Table 1, i.e., the N_page of Pekanbaru is seven times higher than that of Padang.
Public concern about fire causes. Since media attention and public concern cycles are linked, risk perception in relation to peatland fire is investigated with search terms relevant to fire destruction and to the agricultural economy, i.e., "land use" and "oil palm plantation". The total page numbers for the decade are 3446 and 4881 pages, respectively. The R values between the annual N_page and FRP are 0.59 and 0.71, respectively, indicating that the newspaper identifies the cause of fires well, namely, that the conversion of forests and peatland into palm oil plantations through slash-and-burn techniques induces severe fire effects. Because these articles cover various newspaper categories such as the economy, politics, living and environment, the level-off effect is weak even for a large N_page, as mentioned in the model calculation in SI Section 3, i.e., the maximum page number for these search terms per daily copy is large. In this case, the saturation parameter m in Eq. (1) becomes large.

Discussion
Here, we have found appreciable correlations between the Fire Radiative Power index and the annual article page numbers of "acute illness"/"smoke damage"/"air pollution"/"acute respiratory illness". An appreciable correlation is also observed between the fire index and the article page number of "acute respiratory illness" + "Pekanbaru" (the name of the provincial capital). The six cities located in Riau Province in the central region of Sumatra are characterized by higher article ratios per population for "acute respiratory illness" than the two cities in West Sumatra Province. Higher FRP and, from visibility measurements, higher concentrations of particulate matter are observed in Riau Province. Concerning risk perception in relation to peatland fire, the search terms for fire destruction are strongly correlated with fire events.
Public health and Fire Radiative Power via mutual correlations with particulate matter. Peatland fires emit organic aerosols, which are the primary components of smoke-related surface PM concentrations. Peatland fires are classified as surface fires and peat fires. Surface fires, which typically burn during the daytime, are detected by satellite observations as hotspots. Peat fires detected by thermal analysis of satellite images continue to burn for ~2 weeks, emitting a large amount of PM to the air (Atwood et al., 2016). Studies conducted in air-polluted regions have found that acute increases in ambient air pollution led to increases in hospital and doctor visits for respiratory infections (North et al., 2019). Review papers have reported consistent evidence of associations of wildfire smoke exposure with general respiratory morbidity and with exacerbations of asthma and chronic obstructive pulmonary disease (Reid et al., 2016; Uda et al., 2019). Epidemiological evidence and plausible toxicological mechanisms suggest an association between wildfire smoke exposure and respiratory illness. A multivariate analysis was performed on the severity of respiratory problems caused by PM from the 1997 haze disaster in Sumatra (Kunii et al., 2002). As the correlation between FRP and the monthly average of PM2.5 is strong for fires in Sumatra (Yin et al., 2020), FRP is considered an appropriate index for estimating the haze exposure of residents and respiratory illness. In our study, the newspaper article volume on public health has a strong relationship with FRP and PM concentration. MODIS hotspot measurements by NASA have been adopted as an index of peatland fire intensity or air pollution in tropical regions. In this study, we use FRP as an index because the hotspot number tends to be lower than the real number. One of the causes is the thick smoke that blocks fire signals from reaching the MODIS sensor, in addition to cloud constraints, fire area, and the timing of the satellite overpass relative to the timing of the fire. Even so, a strong correlation was reported between the weekly numbers of hotspots and tweets on "haze-health" in Riau Province in 2014, i.e., between the number of detected hotspots on a linear scale and the weekly number of conversations on a logarithmic scale, confirming that conversations on "haze-health" increase exponentially after a fire (Kibanov et al., 2017). Regarding the publication frequency of newspaper articles, Figs. 3, 5, and 6 suggest an appreciable correlation between FRP and the search terms "acute illness", "smoke damage", "air pollution" and "acute respiratory illness".
Table 1 Regional dependence of page numbers of articles mentioning both "acute respiratory illness" and "city name".
Spatial variances in newspaper article numbers on public health. This section discusses the spatial relationship between FRP and N_page in the context of health issues. Identifying unhealthy conditions at geospatial locations from mass media would be useful for the prioritization of disaster response and humanitarian action during haze disasters. The results described above suggest that "acute illness" and "acute respiratory illness" are useful search terms, reflecting the public health impact of air pollution and smoke damage from peatland fires. In an analysis of Twitter content, tweets about wildfire and smoke predicted public health impacts (Sachdeva and McCaffrey, 2018). The assessment of population-level exposure in the United States was explored using daily posts from Facebook (Ford et al., 2017). In comparison with PM2.5 levels, the best correlations reported were for "air quality", "wildfire" and "smoke", since wildfire smoke was the source of the highest variability in surface PM2.5 concentrations. The search terms "haze" and "health" were relevant to residents across Sumatra, becoming popular terms in social media (Kibanov et al., 2017). A study in India showed a significant correlation between the number of dengue fever cases and the number of associated articles in major local newspapers (Zhang et al., 2019).
As for the regional characteristics of public concern regarding "acute respiratory illness", the ratios of N_page divided by population are shown in Table 1, and they are much higher in Riau Province than in West Sumatra. The FRP is also high in the eastern region (Fig. 1). Figure 7 indicates the association of the search terms with PM. The large ratios, 11-18, show little dependence on the city among the five most populated eastern cities, Pekanbaru, Siak, Pelalawan, Dumai and Rengat, because haze spreads all over the eastern side of the central region of Sumatra (Kim et al., 2015). Close to the eastern coast, one-third of the land is underlain by peatlands (Sze et al., 2019). The eastern region is characterized by high FRP, as shown in Fig. 1, and suffers from the haze from underground peat burning during the fire periods (Marlier et al., 2015a). The air quality in the fire season was the worst in the eastern cities, Pekanbaru, Siak, Bengkalis and Dumai (Kibanov et al., 2017). Among these cities, Siak, Pelalawan and Dumai have no BMKG stations. Local information on acute illnesses can be obtained from newspaper articles, even if haze data from weather stations are unavailable.
Newspaper articles on the causes of peatland fires. Palm oil is an important vegetable oil in terms of production quantity, and Indonesia is the world's largest producer (Koh and Ghazoul, 2010). The demand for palm oil is driving deforestation to clear land for new plantations. The clearance and maintenance of land are managed mainly through prescribed fire (Marlier et al., 2015b). In the evaluation of the social factors behind the 2015 extreme fire event in Sumatra, an agricultural economy variable, such as the proportion of plantation land holdings, is one of the factors explaining fire count at the regency level (Sze et al., 2019). In combating peatland fires, high levels of acknowledgment and engagement are needed for changes toward more sustainable behaviors. In this respect, newspaper media play an important role in shaping public perceptions of fire causes. The newspaper search terms "land use" and "oil palm plantation" appear under various article categories. The appreciable R values suggest that the newsworthiness of the fire causes, namely, the conversion of forests and peatlands into palm oil plantations through slash-and-burn techniques, promotes changes in people's behavior toward more sustainable actions. Newspaper media are influenced by public perception and support public involvement. Thus, government or planning authorities are expected to increase the dissemination of information on the prevention of fires through newspaper articles.

Conclusions
Newspaper information is not generated with the primary purpose of doing epidemiology, but it is a useful source in the analysis of public health and environmental contexts (Salathé, 2018). This study has revealed the utility of articles from a local newspaper, Riau Pos, for filling in estimates of "acute illness (penyakit akut)" and "acute respiratory illness (penyakit pernapasan akut)" related to air pollution. Newspaper analysis can be a potential tool for public health sectors for estimating health impacts from air quality deteriorated by smoke haze. To solve the problems caused by fire and particulate matter, the local government needs to find a way of managing peatlands. That will require strengthening governance and accountability. Newspapers play an important role in informing people about the initiatives taken to manage such social issues. The interdisciplinary scientific framework to evaluate public health implications is publicly available through mass media used as a decision support tool. With the development of digital databases in Indonesia, interdisciplinary research like this study will become popular in media research. The present methodology will contribute to addressing forest fires globally, because forest fires are a problem not only in Indonesia but also in many other countries.

Data availability
The datasets supporting the conclusions of this article are included in the article and Supplementary Information. If readers would like further information about data, please contact the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.