An annual time series of weekly size-resolved aerosol properties in the megacity of Metro Manila, Philippines

Size-resolved aerosol samples were collected in Metro Manila between July 2018 and October 2019. Two Micro-Orifice Uniform Deposit Impactors (MOUDI) were deployed at Manila Observatory in Quezon City, Metro Manila, with samples collected on a weekly basis for water-soluble speciation and mass quantification. Additional sets were collected for gravimetric and black carbon analysis, including during special events such as holidays. The unique aspect of the presented data is a year-long record with weekly frequency of size-resolved aerosol composition in a highly populated megacity where there is a lack of measurements. The data are suitable for research to understand the sources, evolution, and fate of atmospheric aerosols, as well as studies focusing on phenomena such as aerosol-cloud-precipitation-meteorology interactions, regional climate, boundary layer processes, and health effects. The dataset can be used to initialize, validate, and/or improve models and remote sensing algorithms.

Measurement(s): ion concentration • concentration of water-soluble element • particulate matter • size distribution • mass concentration of black carbon
Technology Type(s): ion chromatography • inductively coupled plasma mass spectrometry • Micro-Orifice Uniform Deposit Impactors (MOUDI) • Multi-wavelength Absorption Black Carbon Instrument (MABI) • gravimetric analysis
Factor Type(s): geographic location
Sample Characteristic - Location: National Capital Region

Machine-accessible metadata file describing the reported data:
https://doi.org/10.6084/m9.figshare.12037197

located at the Ateneo de Manila campus in Quezon City, Philippines. The site was separated from surrounding urban areas, including a major roadway, by a grove of trees circling the campus. However, results from the first six months of data show that it was clearly impacted by local urban emissions and long-range transport [16][17][18]. Sampling took place on the third floor of the MO office building, approximately 85 m above sea level. Figure 1 shows a timeline of sampling, which spanned four identified seasons: the 2018 southwest monsoon/wet season 19,20, a transitional period, the northeast monsoon/dry season (26 October 2018-10 June 2019) 21, and the 2019 southwest monsoon/wet season (11 June-7 October) 22,23. The southwest monsoon is characterized by relatively high temperatures, high humidity, frequent and heavy rainfall, and winds coming predominantly from the southwest. The northeast monsoon is characterized by moderate rainfall, low humidity, lower temperatures, and winds affecting the eastern side of the country. These characteristics are general traits of the monsoons; the major determining factor is rainfall. Temperature, relative humidity, and rainfall measured at MO during the sampling period ranged from 25.4-30.2 °C, 59-94%, and 0-78.4 mm for the southwest monsoon and from 24.2-30.9 °C, 54-85%, and 0-32.6 mm for the northeast monsoon. The corresponding average values were 27.6 °C, 72%, and 18.8 mm for the southwest monsoon and 27.7 °C, 64%, and 2.1 mm for the northeast monsoon. Although the focus of this data descriptor is the size-resolved PM composition dataset, additional instrumentation co-located at MO during CHECSM is summarized in Table 1.

Fig. 1
Timeline of size-resolved aerosol measurements at the Manila Observatory. Light blue boxes represent the southwest monsoon/wet seasons, the light green box represents the transitional period, and the orange box represents the northeast monsoon/dry season. Dark colored boxes represent MOUDI sampling periods and black boxes represent parallel MOUDI sampling periods.

gravimetric analysis could be performed. A total of 66 sets were collected; 11 of the sets were collected using the simultaneous sampling approach, 54 of the sets were analyzed using ion chromatography (IC; Thermo Scientific Dionex ICS-2100 system), 47 of the sets were also analyzed using triple quadrupole inductively coupled plasma mass spectrometry (ICP-QQQ; Agilent 8800 Series), and 1 set (MO25) was collected as a special microscopy set. Additional MOUDI sets were collected on aluminum substrates for microscopy analysis using a Scanning Electron Microscope (SEM); however, these sets are not included in the dataset presented here. For more information on these sets, please refer to Cruz et al. 16.
The MOUDIs were set up to reduce both particle losses and blockage of the inlet. The inlet tubing connecting the MOUDI to ambient air was constructed of stainless steel and was bent with a large radius so that there were no kinks. The inlet of the tubing was oriented downwards to prevent water from entering the MOUDI. To further keep debris out of the inlet, a funnel with a mesh covering was securely attached to the downward-facing tube opening exposed to ambient air. The temperature differential between the outside air and the tubing was either negligible or the tubing was slightly warmer than the outside air, thus reducing the possibility of thermal deposition. As the average relative humidity measured onsite over the sampling period was approximately 68% (range: 54-94% throughout the year), the diameters of sampled particles correspond to wet rather than dry diameters and particle bounce was not significant 25. This is additionally supported by particle morphology characterization showing halo areas surrounding particles in both the fine and coarse size ranges 16,17, which indicate that the particles were saturated when impacting onto the substrates [26][27][28].

Pre-Sampling processing. The Teflon substrates were prepared prior to use by soaking each substrate face for a minimum of 12 hours in ~7.6 cm of Milli-Q (18.2 MΩ-cm) water in a laminar flow hood and/or covered container. Once each substrate face was soaked, the substrates were removed and placed in methanol-cleaned Petrislides (Millipore), which were left slightly open in a laminar flow hood to dry any water residue. Once the substrates were dry, the Petrislides were closed and sealed using Parafilm to keep particles and gases from depositing on the substrates.
Post-Sampling processing. Figure 2 summarizes the post-sampling process to reach the final dataset. After sampling was completed, substrates were first cut in half so that one half could be used for extraction in Milli-Q water and the other half could be stored in a freezer at −20 °C for future analyses. Ceramic scissors cleaned with methanol were used for cutting to prevent heavy metal contamination from other cutting instruments (e.g. metal scissors), and the scissors were cleaned with methanol again after each cut. Substrate extractions were performed using 8 mL of Milli-Q water (18.2 MΩ-cm) in cleaned 15 mL polypropylene centrifuge tubes that were sonicated for 30 minutes at 25-30 °C. Samples were extracted in this temperature range to ensure all targeted organics would solubilize. Additionally, temperatures during sampling ranged from 28.7-45.7 °C; therefore, any volatile species were expected to have been lost during sampling, well before extractions took place. Other studies have performed similar extractions at temperatures up to 60 °C [29][30][31][32][33]. Sonicated solutions were then decanted into two different containers for analysis: (i) a 0.5 mL polypropylene vial with a filter cap for analysis via IC, and (ii) a polypropylene centrifuge tube for analysis via ICP-QQQ. The remainder of the solutions were then stored in a refrigerator at 0 °C. Blank substrates were processed in a similar manner to serve as background control samples. Water was chosen as the extraction solvent because of the importance of the water-soluble results for health effects and toxicological studies, radiative effects, atmospheric residence time, nucleation efficiency, and bioavailability 34-39.

Table 3. MOUDI sample set operating data. The table includes average flowrates, total sample run time, average operating temperature of the MOUDI cabinet, relative humidity (RH), and the days of the week sampling occurred. The start/end times varied between 13:00 and 15:00 local time for standard sets and 5:00 local time for dual gravimetric/IC sets. Sets with a label of (G) are gravimetric sets and the set labeled (AL) was collected for SEM analysis. All other sets were only measured with IC and/or ICP-QQQ.

Ions. Cationic and anionic water-soluble PM speciation and quantification was conducted using a 2 mm IC system at a flowrate of 0.4 mL min⁻¹. The cationic species measured were Na⁺, NH₄⁺, Mg²⁺, Ca²⁺, dimethylamine (DMA), trimethylamine (TMA), and diethylamine (DEA) using an eluent of methanesulfonic acid. The anionic species measured were methanesulfonate (MSA), pyruvate, adipate, succinate, maleate, oxalate, phthalate, Cl⁻, NO₃⁻, and SO₄²⁻ using an eluent of potassium hydroxide (KOH). A 30-minute instrument method was used for both anion and cation columns with a 5-minute equilibration period, giving a total of 35 minutes per sample. The columns used were the Dionex IonPac AS11-HC 250 mm and CS12A 250 mm models for anion and cation analysis, respectively. The suppressors used were a Dionex AERS 500e and a CERS 500e for anions and cations, respectively. For anions, the eluent started at 2 mM, ramped up to 8 mM from 0 to 20 minutes, and then ramped up from 8 to 28 mM from 20 to 30 minutes using a suppressor current of 28 mA. For cations, the eluent started at 5 mM, was isocratic from 0 to 13 minutes, ramped up from 5 to 18 mM from 13 to 16 minutes, and finally was isocratic at 18 mM from 16 to 30 minutes using a suppressor current of 22 mA. The recoveries, limits of detection (LOD), and limits of quantification (LOQ) for these species can be found in Table 4.
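The two eluent programs described for the IC method can be written as piecewise functions of time, which may help when reproducing or plotting the gradients. The sketch below is only illustrative: the function names are hypothetical, and the ramps are assumed to be linear, which the text does not state explicitly.

```python
def anion_koh_mM(t_min: float) -> float:
    """KOH eluent concentration (mM) t_min minutes into the anion method.

    Assumes linear ramps between the concentrations given in the text.
    """
    if not 0.0 <= t_min <= 30.0:
        raise ValueError("time must lie within the 30-minute method")
    if t_min <= 20.0:
        # 2 mM at 0 min rising to 8 mM at 20 min
        return 2.0 + (8.0 - 2.0) * t_min / 20.0
    # 8 mM at 20 min rising to 28 mM at 30 min
    return 8.0 + (28.0 - 8.0) * (t_min - 20.0) / 10.0


def cation_msa_mM(t_min: float) -> float:
    """Methanesulfonic acid eluent concentration (mM) for the cation method.

    Assumes a linear ramp between the two isocratic segments.
    """
    if not 0.0 <= t_min <= 30.0:
        raise ValueError("time must lie within the 30-minute method")
    if t_min <= 13.0:
        return 5.0  # isocratic at 5 mM
    if t_min <= 16.0:
        # 5 mM at 13 min rising to 18 mM at 16 min
        return 5.0 + (18.0 - 5.0) * (t_min - 13.0) / 3.0
    return 18.0  # isocratic at 18 mM
```

The 5-minute equilibration period is not modeled here, since the text does not give the eluent concentration used during it.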

Elements.
Water-soluble elements were speciated and quantified using ICP-QQQ after being acidified in 2% nitric acid. The elements quantified were: Ag, Al, As, Ba, Cd, Co, Cr, Cs, Cu, Fe, Hf, K, Mn, Mo, Nb, Ni, Pb, Rb, Se, Sn, Sr, Ti, Tl, V, Y, Zn, and Zr. The recoveries, LOD, and LOQ for these species can be found in Table 5. For species that were measured by both IC and ICP-QQQ (Na, Mg, K, and Ca), duplications were not included in the dataset. IC measurements are provided for Na, Mg, and Ca, while ICP-QQQ measurements for K are provided due to potential contamination from the eluent (i.e., KOH) used in the IC. The exception to this is for sets MO57-MO65 where K from the IC was used due to lack of ICP-QQQ data.
Gravimetric. Gravimetric analysis was performed using a Sartorius ME5-F microbalance with a sensitivity of ±1 μg. The microbalance was located in a temperature- and humidity-controlled room at 20-23 °C and 30-40% relative humidity with an airlock buffer. Clean substrates were weighed prior to sample collection and again after sampling ended. Before weighing, the substrates were equilibrated in the room for at least 24 hours. After equilibration, each substrate was passed near a ²¹⁰Po antistatic tip for 30 seconds to minimize measurement bias due to electrostatic charge at the substrate surface. Each substrate was weighed twice, once initially and again 24 hours later. If the difference between these two weighings exceeded 10 μg, the substrate was weighed again 24 hours later, and this process was repeated until the difference between weighings was less than 10 μg. The percent standard deviations of the weighings before and after sampling were negligible, with the highest being 0.005%. The PM mass was derived as the average substrate weight after sampling minus the average substrate weight before sampling. The standard deviation of the change in weight was then calculated for each PM substrate using the following error propagation equation:

$$\mathrm{SD}_d = \sqrt{\mathrm{SD}_b^{2} + \mathrm{SD}_a^{2}}$$

where SD_d is the standard deviation of the difference, SD_b is the standard deviation of the substrate weight before sampling, and SD_a is the standard deviation of the substrate weight after sampling. The percent standard deviation across all stages and sets averaged approximately 7%.
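The gravimetric calculation can be sketched as follows. This is a minimal illustration that assumes the standard quadrature form of error propagation for a difference of two independent quantities, and that uses the sample standard deviation of the repeated weighings; the function name and the example weights are hypothetical.

```python
import math
from statistics import mean, stdev


def pm_mass_with_sd(before_ug, after_ug):
    """Return (PM mass, SD of the mass) in micrograms for one substrate.

    before_ug / after_ug: repeated weighings (at least two each) of the
    substrate before and after sampling.
    """
    mass = mean(after_ug) - mean(before_ug)
    sd_b = stdev(before_ug)  # spread of the pre-sampling weighings
    sd_a = stdev(after_ug)   # spread of the post-sampling weighings
    # Assumed form: independent errors add in quadrature for a difference.
    sd_d = math.sqrt(sd_b ** 2 + sd_a ** 2)
    return mass, sd_d


# Hypothetical weighings in micrograms
mass, sd = pm_mass_with_sd([100000.0, 100004.0], [100050.0, 100056.0])
percent_sd = 100.0 * sd / mass
```

Dividing the propagated standard deviation by the PM mass gives a percent standard deviation of the kind quoted for the dataset.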
Black carbon. After weighing, the substrates were analyzed using a Multi-wavelength Absorption Black Carbon Instrument (MABI; Australian Nuclear Science and Technology Organisation). The MABI is an optical instrument used to quantify the mass concentration of black carbon (BC) by detecting the absorption for seven wavelengths. In consistent units, the BC mass concentration follows from the measured light attenuation as

$$\mathrm{BC} = \frac{A}{\epsilon V}\,\ln\!\left(\frac{I_0}{I}\right)$$

where ϵ is the mass absorption coefficient, A is the substrate collection area, V is the volume of air sampled, I_0 is the measured light transmission through the blank substrate, and I is the measured light transmission through the sample substrate. The mass absorption coefficient was provided in the MABI manual, the collection area was retrieved based on impaction rings on the substrates, the volume was calculated from flowrate and sample time, and light transmission was produced directly by the MABI.
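As a rough illustration of how the quantities defined above combine, the sketch below assumes a Beer-Lambert form, BC = A / (ϵ V) · ln(I_0 / I), with all inputs in consistent SI-style units. The function names and the numerical inputs are hypothetical, and any unit-conversion prefactor used in the MABI manual is not reproduced here.

```python
import math


def sampled_volume_m3(flow_l_min: float, run_time_min: float) -> float:
    """Volume of air sampled, from average flowrate and sample time."""
    return flow_l_min * run_time_min / 1000.0  # litres to cubic metres


def black_carbon_g_m3(epsilon_m2_g: float, area_m2: float,
                      volume_m3: float, i0: float, i: float) -> float:
    """BC mass concentration (g m^-3) under the assumed Beer-Lambert form."""
    if i <= 0 or i0 <= 0:
        raise ValueError("light transmission values must be positive")
    # ln(I0 / I) is the attenuation of the sample relative to the blank.
    return area_m2 / (epsilon_m2_g * volume_m3) * math.log(i0 / i)


# Hypothetical inputs: 30 L/min for one week, 1 cm^2 deposit, 10 m^2/g
volume = sampled_volume_m3(30.0, 7 * 24 * 60)
bc = black_carbon_g_m3(10.0, 1.0e-4, volume, i0=1.0, i=0.8)
```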
Data processing. IC and ICP-QQQ peak areas were converted to concentrations using Excel sheets set up with calibration curves, unit conversions, and sampling information. The concentration files were then organized using a set of MATLAB codes and combined with the gravimetric and black carbon data to produce the published dataset. Excel and MATLAB processing files are available upon request.
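The unit conversion from an aqueous extract concentration to an air-equivalent concentration can be sketched as below. This is not the authors' Excel workflow: the function name is hypothetical, and the scaling for the half substrate, the blank subtraction, and the default values are assumptions based on the extraction description (8 mL of water, one half of each substrate extracted).

```python
def air_equivalent_ug_m3(aq_ug_per_l: float, flow_l_min: float,
                         run_time_min: float, extract_ml: float = 8.0,
                         substrate_fraction: float = 0.5,
                         blank_ug_per_l: float = 0.0) -> float:
    """Convert an aqueous concentration (ug/L) to ug per m^3 of sampled air."""
    extract_l = extract_ml / 1000.0
    # Assumed: subtract the blank-substrate background, floored at zero.
    net_ug_per_l = max(aq_ug_per_l - blank_ug_per_l, 0.0)
    mass_extracted_ug = net_ug_per_l * extract_l
    # Assumed: scale up because only part of the substrate was extracted.
    mass_on_substrate_ug = mass_extracted_ug / substrate_fraction
    air_volume_m3 = flow_l_min * run_time_min / 1000.0
    return mass_on_substrate_ug / air_volume_m3


# Hypothetical example: 100 ug/L extract, 30 L/min for one week
conc = air_equivalent_ug_m3(100.0, 30.0, 7 * 24 * 60)
```

The same conversion applies to aqueous LODs and LOQs when they are expressed as air-equivalent values.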

Data Records
The dataset, located on figshare 40, is in a specialized format used by the National Aeronautics and Space Administration (NASA) for field data, referred to as the ICARTT file format. The file name consists of the associated campaign, instrument used, sampling method, start date, revision number, and end date. The format includes data notes in a README tab. These notes include the data principal investigator (PI), affiliated institution, mission name, the start date of data collection, the last data revision date, the number of variables, data flags, sampling platform and location, instrument information, a brief description of the data, and a revision log. The revision log states the current revision of the data and lists the previous revisions and their relative status. Additional tabs include the MOUDI stage cutpoints and size ranges, uncertainties and LODs, and the variable list and units. Data include ions, elements, gravimetric weights, and MABI measurements separated by stage in air-equivalent mass concentrations (µg m⁻³). Note that the reported data are air-equivalent concentrations and are typically converted to dC/dlogDp to examine size distributions properly.

Table 5. Same as Table 4 but species were quantified using ICP-QQQ (elements). Species marked with '-' in their respective recovery and standard deviation columns were not measured for recovery purposes. LODs and LOQs in ppt are aqueous concentrations while LODs and LOQs in μg m⁻³ are air-equivalent concentrations.
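The conversion of per-stage concentrations to dC/dlogDp mentioned in the Data Records can be sketched as follows. The cutpoint list is an assumption (nominal MOUDI values; only the 0.056 and 0.1 μm cutpoints are confirmed in the text), so the cutpoints tab of the dataset should be used in practice. The function name is hypothetical.

```python
import math

# Nominal cutpoint diameters in micrometres, largest first (assumed values;
# take the actual cutpoints from the dataset's cutpoint tab).
ASSUMED_CUTPOINTS_UM = [18.0, 10.0, 5.6, 3.2, 1.8, 1.0,
                        0.56, 0.32, 0.18, 0.10, 0.056]


def size_distribution(stage_conc, cutpoints_um):
    """Return (geometric mean diameter, dC/dlogDp) for each bounded stage.

    stage_conc[i] is the concentration collected between cutpoints_um[i]
    (upper bound) and cutpoints_um[i + 1] (lower bound).
    """
    if len(stage_conc) != len(cutpoints_um) - 1:
        raise ValueError("need one concentration per pair of cutpoints")
    result = []
    for i, conc in enumerate(stage_conc):
        upper, lower = cutpoints_um[i], cutpoints_um[i + 1]
        width = math.log10(upper / lower)     # stage width in log10 space
        midpoint = math.sqrt(upper * lower)   # geometric mean diameter
        result.append((midpoint, conc / width))
    return result
```

With eleven cutpoints this yields ten bounded stages, which matches the stages 2-11 examined during validation. The stage above the largest cutpoint has no upper bound and is left out, and flag values should be removed before the conversion.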
Technical Validation
A number of experimental and data processing techniques were implemented to validate and better characterize the final data. The flowrate for each set was measured using a flowmeter (Mesa Labs Definer 220 series) three times both prior to and after each sampling period. The overall average of these values was used as the flowrate for each set. Additionally, pressures for each stage were measured at the beginning and end of sampling to ensure there was no significant change in the pressure drop. To keep the flowrate as close to 30 L min⁻¹ as possible, the MOUDI nozzle plates were removed and cleaned regularly, and especially if the flowrate dropped below 27 L min⁻¹. The nozzle plates were cleaned by soaking them in either a methanol-water solution or pure methanol for 24 hours or more. They were then removed, rinsed with methanol, and placed in a clean area to let the methanol evaporate. However, toward the end of the sampling campaign the flowrate dropped to about 24 L min⁻¹ and subsequent cleanings did not alleviate the problem. The issue was likely due to one or both of the lower nozzle plates (0.056 and 0.1 μm cutpoint diameters) being heavily clogged by the black carbon-rich air and impossible to clear without a more aggressive cleaning method.
Chromatogram peaks were automatically drawn by the IC and ICP-QQQ system software. For the IC only, the operator additionally viewed each chromatogram to adjust peak areas and add in missing species. LOD and LOQ were calculated as 3S_a/b and 10S_a/b, respectively, where S_a is the standard deviation of the response and b is the slope of the calibration curve 41. Recoveries were calculated by taking the ratio of the measured mass of a specific species to the known amount of that species in the sample 42. Recoveries for IC and ICP-QQQ were all above 93% with repeatability ranging from 2% to 18% (Tables 4 and 5). During data analysis, dC/dlogDp plots (stages 2-11) were examined to ensure a normal distribution was obtained. If the first stage (stage 2) was higher than the next stage (stage 3), or the last stage (stage 11) was higher than the previous stage (stage 10), then that stage was viewed as unreliable and not considered for that particular set. If a species was not measured for a stage, a value of −9999 was entered. Similarly, if a species was below the LOD for a stage, a value of −8888 was entered. A summary of the number of data points either missing (i.e., no ICP-QQQ data for last seven sets) or below the LOD for a specific species and stage can be seen in Table 6.
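The detection-limit formulas, the flag values, and the edge-stage screening rule described above can be sketched together. The constants −9999 and −8888 come from the text; the function names and the use of `None` for an unmeasured or rejected value are illustrative assumptions.

```python
MISSING = -9999     # species not measured for the stage
BELOW_LOD = -8888   # species measured but below the limit of detection


def lod_loq(sd_response: float, slope: float):
    """Return (LOD, LOQ) as 3*S_a/b and 10*S_a/b."""
    if slope <= 0:
        raise ValueError("calibration slope must be positive")
    return 3.0 * sd_response / slope, 10.0 * sd_response / slope


def encode(value, lod: float):
    """Replace an unmeasured (None) or sub-LOD value with its flag."""
    if value is None:
        return MISSING
    if value < lod:
        return BELOW_LOD
    return value


def screen_edge_stages(dist):
    """Apply the edge rule to dC/dlogDp values ordered from stage 2 to 11.

    The first or last stage is rejected (set to None) when it is higher
    than its neighbouring stage.
    """
    out = list(dist)
    if len(out) >= 2:
        if dist[0] > dist[1]:
            out[0] = None
        if dist[-1] > dist[-2]:
            out[-1] = None
    return out
```

When reading the dataset, both flag values should be masked before computing sums, averages, or size distributions, since they are negative placeholders rather than concentrations.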
A charge balance was also performed by converting each species to moles, multiplying by its respective charge, and then summing all cations and all anions in a stage. Only IC species, with the exception of K from ICP-QQQ, were used to calculate the overall charge balances. The reasons for this are that (i) the majority of the ICP-QQQ species are transition metals, which have varying oxidation states, so the proper charge cannot be assigned without pH measurements, and (ii) the majority of these species are very low in concentration and do not significantly affect the overall charge balances. All the stages were then plotted per set and a trend line was applied to test for a linear correlation. The charge balance R² values in Table 7 reveal strong linear correlations (> 0.90), supporting the validity of the data. Additionally, Fig. 3 shows the overall charge balance for every set. All of the sets agree with the trend, with the exception of set 24, which deviates from the rest of the data. This set coincided with New Year's fireworks, which produce a large amount of anionic species such as sulfate and nitrate as well as cationic metals such as Fe and Cu. The combination of large anionic concentrations and the presence of cationic metals not included in the calculation led to a charge balance slope below unity (i.e. more anions than cations).
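A charge balance of this kind can be sketched as below. The species table covers only the major inorganic ions, so the amines and organic acids measured by IC would need to be added. The choice of cation equivalents as the dependent variable is inferred from the statement that a slope below unity means more anions than cations. The function names and dictionary keys are hypothetical.

```python
# Molar mass (g/mol) and charge for a subset of the IC species.
SPECIES = {
    "Na": (22.99, 1), "NH4": (18.04, 1), "K": (39.10, 1),
    "Mg": (24.31, 2), "Ca": (40.08, 2),
    "Cl": (35.45, -1), "NO3": (62.00, -1), "SO4": (96.06, -2),
}


def equivalents(stage_conc):
    """Return (cation, anion) sums in ueq m^-3 for one stage.

    stage_conc maps species name to concentration in ug m^-3. Negative
    entries (the -9999 and -8888 flags) are skipped.
    """
    cations = anions = 0.0
    for name, conc in stage_conc.items():
        if conc < 0:
            continue
        molar_mass, charge = SPECIES[name]
        eq = conc / molar_mass * abs(charge)  # umol m^-3 times charge
        if charge > 0:
            cations += eq
        else:
            anions += eq
    return cations, anions


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)
```

For one set, `equivalents` would be applied to every stage and `fit_line` called with the anion sums as `x` and the cation sums as `y`, giving a slope and R² comparable to those listed in Table 7.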

Usage Notes
The data provided can be used to conduct various studies to improve understanding of regional PM effects and implications. The dataset can be synchronized with the other CHECSM instruments set up by the Air Quality Dynamics-Instrumentation and Technology Development (AQD-ITD) laboratory, the AErosol RObotic NETwork (AERONET) station 43, and meteorological and precipitation chemistry data collected by MO (Table 2).
There are a host of previous (7 SouthEast Asian Studies (7-SEAS) 2010-2018; Biomass-burning Aerosols & Stratocumulus Environment: Lifecycles and Interactions Experiment (BASELInE) 2013-2015) and ongoing (CAMP2Ex) research activities in southeast Asia for which this dataset can provide additional context. The dataset is also relevant to other global regions, because process-level understanding can be improved using a dataset that covers such a wide range of pollution scenarios in one of the most polluted cities of the world with diverse meteorological characteristics.
A few papers have already been produced using portions of this dataset. Cruz et al. 16 examined size-resolved PM composition during the 2018 southwest monsoon season and conducted positive matrix factorization (PMF) to identify PM sources, which were attributed to aged PM, sea salt, combustion emissions, vehicular/resuspended dust, and waste processing emissions. Braun et al. 18 presented case examples of long-range transport of PM from east and southeast Asia, such as biomass burning from the Maritime Continent and transport from continental East Asia. They also presented examples of different transport pathways of pollution to the study site, which yielded concentration differences for species such as K, Rb, Ba, V, Pb, Mo, and Sn. AzadiAghdam et al. 17 analyzed sea salt PM in Metro Manila and found that sea salt concentrations varied during the wet season and appeared to be contaminated by crustal and anthropogenic sources. Building on these limited examples, which used just a subset of the overall dataset, there are many topics that this dataset can be used to address, such as the following:
• Impacts of PM on regional climate, clouds, and monsoon activity by (i) comparing PM composition to other cities around the world with and without monsoon seasons, (ii) combining the dataset with meteorological data from satellites and models to understand influences on aerosol composition via mechanisms such as photochemical processing, and (iii) relating surface PM concentrations to AOD from AERONET and satellite sensors to examine the vertical nature of aerosol in the region, as has been done in other regions (e.g. ref. 44).
• Removal of PM via wet deposition by looking at which species are most effectively scavenged using precipitation data (e.g. refs. 45,46).
• Aqueous processing of PM by looking at the changes of PM concentrations in the dry vs the wet season and additionally as a function of cloud coverage and aerosol liquid water amounts (e.g. refs. 47,48).
• Source apportionment of PM by (i) observing seasonal changes in emissions (e.g. ref. 49) …
• Impacts of extreme events on regional PM by examining (i) sets where holidays occurred (e.g. New Year's) and (ii) sets influenced by typhoons, which have been shown to impact aerosol in the general region, as in previous studies in Taiwan 57.
• Public health implications related to PM by examining the characteristic size distributions of species posing negative effects, such as heavy metals, and their general prevalence in Metro Manila.

Table 6. Summary of the number of data points either missing (outside parentheses) or below the LOD (inside parentheses) for a given species and MOUDI stage. Note that there were a total of 54 possible data points for each species and stage. These counts exclude gravimetric and microscopy sets where chemical analysis was not performed. Refer to Table 3 for cutpoint diameters and diameter ranges.
Ammonium: 0(28) 0(37) 0(11) 0(8) 0(6) 0(2) 0(1) 0(0) 0(0) 0(0) 0(10) 0(8)

Table 7. Slope and coefficient of determination (R²) of the water-soluble charge balance for each MOUDI set. Values above 1 indicate there is an anion deficit. Only IC species and K from ICP-QQQ are taken into consideration for the charge balance calculations.