Zero-shot learning of aerosol optical properties with graph neural networks

Lamb, K. D.; Gentine, P.

doi:10.1038/s41598-023-45235-8


Article
Open access
Published: 31 October 2023

Scientific Reports volume 13, Article number: 18777 (2023)

Abstract

Black carbon (BC), a strongly absorbing aerosol sourced from combustion, is an important short-lived climate forcer. BC’s complex morphology contributes to uncertainty in its direct climate radiative effects, as current methods to accurately calculate the optical properties of these aerosols are too computationally expensive to be used online in models or for observational retrievals. Here we demonstrate that a Graph Neural Network (GNN) trained to predict the optical properties of numerically-generated BC fractal aggregates can accurately generalize to arbitrarily shaped particles, including much larger ($10\times$) aggregates than in the training dataset. This zero-shot learning approach could be used to estimate single particle optical properties of realistically-shaped aerosol and cloud particles for inclusion in radiative transfer codes for atmospheric models and remote sensing inversions. In addition, GNNs can be used to gain physical intuition on the relationship between small-scale interactions (here, the spheres’ positions and interactions) and large-scale properties (here, the radiative properties of aerosols).

Introduction

Carbonaceous aerosols such as black carbon (BC) are important short-lived climate forcers^1,2. To understand their impact on climate, accurate predictions of the optical properties of absorbing aerosols such as BC are needed in atmospheric models and observational retrievals: for estimating the top-of-the-atmosphere radiative effects of black carbon³ and the impact of aged soot on cloud formation⁴, for the calculation of the mass absorption coefficient of BC deposited on snow⁵, for estimating the relative shortwave heating rates for different types of combustion aerosols⁶, for calculating particle-to-gas heat transfer to interpret laser-induced incandescence signals⁷, for accurate inversions of imaging nephelometers⁸, for constraining the index of refraction of biomass burning aerosols⁹, and for interpreting the optical properties of aerosols deposited on filters^10,11. Accurate calculations of carbonaceous aerosol optical properties are also important for observational retrievals in other planetary atmospheres, as these aerosols may play a role in the radiative balance of, e.g., the middle atmosphere of Jupiter¹².

BC particles in the atmosphere have a variety of sizes, shapes, and chemical compositions, all of which impact their optical properties (Fig. 1). BC’s optical properties depend on both the morphology of the primary (bare) BC particle and its internal mixing with other materials (coatings) through the condensation of gas phase species during atmospheric aging. Both combustion conditions¹³ and atmospheric aging¹⁴ impact the morphology of these aerosols, which are fractal-like aggregates, typically embedded within (internally mixed) or attached to other aerosol components. The complex morphology of bare BC is generally not parameterized in models, although modeling bare BC as a sphere biases radiative forcing estimates, with too little warming by absorption and too much cooling by scattering¹⁵. Internal mixing is modeled using a Mie Theory core-shell model, which approximates the bare BC portion as an absorbing “core”, with a concentric sphere of “coating” material with an index of refraction characteristic of the internally mixed material. Several recent papers have demonstrated that this Mie Theory core-shell approximation leads to an over-prediction of BC absorption in models by as much as a factor of 2 (refs. 13, 16). In addition, not only are more accurate calculations of BC optical properties needed to better constrain models to observations, but models also need to be capable of representing the heterogeneity of optical properties in diverse aerosol populations^13,16.

While models and observational retrievals have generally relied on Mie Theory, more accurate methods to predict the optical properties for arbitrarily shaped particles such as the Multiple Sphere T-Matrix Method (MSTM)^17,18, the discrete dipole approximation (DDA)^19,20, and the Generalized Multiple-Particle Mie (GMM) Theory^21,22 have been developed. These methods approximate BC fractal aggregates as clusters of spheres (Fig. 1) and provide analytical solutions to the time-harmonic Maxwell’s equations for the multiple sphere system. However, these approaches are computationally expensive, often requiring hours or even days to compute the optical properties of single aerosol particles with complex morphologies²³. To mitigate this computational bottle-neck, pre-calculated databases of fractal aggregate optical properties using these exact analytical methods have recently been created^23,24,25,26, but such approaches are limited to linear interpolation within the data-bases’ optical and morphological properties. There is still significant uncertainty about the fundamental properties of BC from different emission sources and under different combustion conditions, and the additional complexity of internal mixing with non-absorbing and absorbing materials during atmospheric aging² would require these databases to cover a very large parameter space to accurately represent the range of conditions for BC aerosols observed in the atmosphere. Moreover, observational inversions of BC have greater uncertainty when performed with only a subset of possible parameters.

Machine learning offers a promising approach for reducing computational bottlenecks by speeding up numerically-intensive aspects of atmospheric models^27,28. As such, it could offer an efficient alternative to compiling pre-computed databases for BC’s optical properties. However, machine learning methods are traditionally strongly dependent on the data they are trained with, and struggle to generalize beyond the training distribution. One previous study investigated a machine learning approach to predicting BC’s optical properties from its morphological parameters and index of refraction using a support vector machine (SVM) trained on accurate MSTM calculations, but it could not accurately predict the optical properties of aggregates with morphological parameters beyond those used in the initial training data set²⁹. Other brute force approaches such as neural networks (NN) or random forests (RF) will similarly struggle to generate realistic BC properties outside of the training datasets.

Here we show that the optical properties of bare BC with complex morphology can be accurately predicted with a graph neural network (GNN) by representing BC fractal aggregates as networks of interacting spheres. GNNs are recently developed machine learning algorithms that learn on graph-structured data sets, allowing models to directly include arbitrary relational information^30,31. These models have shown great promise in predicting the large-scale properties of structured physical science data sets such as molecules^32,33, protein-protein interaction networks³⁴, and glasses³⁵. GNNs have demonstrated skill in predicting complex global features of physical systems through learning simpler local physics³⁶; here we demonstrate that through including local information about BC’s structure, BC’s global properties can be inferred. Importantly, because GNNs learn models for specific substructures (i.e. the nodes and their relationships with their neighbors in the graph), they are able to immediately generalize to graphs with arbitrary numbers of nodes; we exploit this feature of GNNs to predict the optical properties of BC aggregates that are significantly larger than those used in the training data set. This zero-shot learning (where models can immediately generalize to samples not represented in their original training data) paves the way towards new, flexible parameterizations of aerosol microphysical properties and serves as a template for the use of GNNs in the Earth sciences.

BC fractal aggregates as networks

Physical properties of bare BC

Primary (bare) BC particles are fractal-like aggregates with geometries that can be described according to a statistical scaling rule as

$$\begin{aligned} N_{s} = k_{f}\left( \frac{R_{g}}{a}\right) ^{D_{f}} \end{aligned}$$

(1)

where a is the primary particle mean radius, $k_{f}$ is the fractal pre-factor, $D_{f}$ is the fractal (Hausdorff) dimension, $N_{s}$ is the number of primary spheres, or monomers, in the aggregate, and $R_{g}$ is the radius of gyration, defined as

$$\begin{aligned} R_{g}^{2} = \frac{1}{N_{s}}\sum _{i=1}^{N_{s}}(\textbf{r}_{i}-\textbf{r}_{0})^{2} \end{aligned}$$

(2)

where $\textbf{r}_{i}$ and $\textbf{r}_{0}$ denote the ith monomer center and the center of mass of the cluster, respectively (assuming all monomers have the same mass³⁷). In addition to the aggregate geometry, the basic physical properties of these particles follow this scaling law³⁸. As a consequence of their fractal nature, aggregates are self-similar on different length scales. The fractal dimension $D_{f}$ can be thought of intuitively as the shape-filling capacity of the aggregate; aggregates with smaller fractal dimensions are “fluffier”, while aggregates with larger fractal dimensions are denser. The fractal prefactor $k_{f}$ of the aggregate is related to the packing of spheres into space and the anisotropy of the aggregate, with more “stringy” aggregates having smaller values of $k_{f}$, and more isotropic and collapsed aggregates having larger values of $k_{f}$^39,40.
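Equations (1) and (2) can be evaluated directly from a list of monomer centers. The following is a minimal sketch, not the authors' code: the function names, the toy chain geometry, and the chosen value of $D_{f}$ are illustrative assumptions. It computes $R_{g}$ for equal-mass monomers and inverts Eq. (1) to recover $k_{f}$ for an assumed fractal dimension.

```python
import numpy as np


def radius_of_gyration(positions):
    """RMS distance of monomer centers from the cluster center of mass (Eq. 2)."""
    positions = np.asarray(positions, dtype=float)
    center_of_mass = positions.mean(axis=0)  # valid for equal-mass monomers
    squared_dist = np.sum((positions - center_of_mass) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))


def fractal_prefactor(positions, monomer_radius, fractal_dimension):
    """Solve Eq. 1 for k_f, given coordinates, a, and an assumed D_f."""
    n_s = len(positions)
    r_g = radius_of_gyration(positions)
    return n_s / (r_g / monomer_radius) ** fractal_dimension


def monomer_count(r_g, monomer_radius, k_f, fractal_dimension):
    """Evaluate Eq. 1: N_s = k_f * (R_g / a) ** D_f."""
    return k_f * (r_g / monomer_radius) ** fractal_dimension


# Hypothetical toy aggregate: a straight chain of 5 touching monomers, a = 1.
chain = np.array([[2.0 * i, 0.0, 0.0] for i in range(5)])
r_g = radius_of_gyration(chain)
k_f = fractal_prefactor(chain, monomer_radius=1.0, fractal_dimension=1.8)
print(r_g, k_f, monomer_count(r_g, 1.0, k_f, 1.8))
```

For real aggregates, $D_{f}$ and $k_{f}$ are usually fitted jointly over many clusters rather than inverted from a single one; the single-cluster inversion here only illustrates how the symbols in Eq. (1) relate to one another.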

The fractal-like nature of these aerosols is a result of their formation from gas-phase precursors through the aggregation and growth of hydrocarbon clusters during incomplete combustion, although this process is not yet completely understood⁴¹. The initial morphology depends on both the combustion conditions and the emission source, with different observational methods also impacting the retrieved parameters¹⁵. After their initial formation during combustion, atmospheric aging (due to cloud processing or the condensation of gas phase species) leads to these aerosols becoming more compact, causing $D_{f}$ to increase over time. This aging is expected to lead to a decrease in their top of the atmosphere radiative effects¹³. Previous work has shown that $k_{f}$ determines the compactness of aggregate branches, although little is understood about $k_{f}$’s evolution over time¹⁵.

Numerically-generated fractal aggregates

To investigate how fractal aggregate particles can be modeled as networks of interacting spheres, we numerically generated fractal aggregates with $N_{s}$ spheres using a cluster-cluster algorithm⁴² based on the one described in³⁸, which uses a Monte Carlo approach to randomly generate aggregates with a specified fractal dimension $D_{f}$ and fractal pre-factor $k_{f}$. We generate Cartesian coordinates for the monomers in the aggregate in dimensionless coordinates by scaling by a factor of $k=\frac{2\pi }{\lambda }$, where $\lambda$ is the wavelength of the incident light.

Characteristic length scale

The characteristic length scale of a network with N nodes is $C = \log (N)$⁴³. Here we want to develop a method for rendering fractal aggregates as graphs, with the assumption that the monomers in the aggregate should be represented by nodes in the graph. To represent fractal aggregates as graphs, monomers with center positions closer together than the characteristic length scale C of a network with $N_{s}$ nodes,

$$\begin{aligned} C=X_{v}\log (N_{s}) \end{aligned}$$

(3)

are connected, where $X_{v}=ka$ is the monomer size parameter (Fig. 2). We multiply the length scale by $X_{v}$ to give a consistent number of edges independent of the size parameter of the aggregate, such that aggregates with the same fractal parameters but different size parameters would be encoded within the same graph structure. An example of the resulting undirected graph structure and adjacency matrix for two different aggregates with different fractal dimensions but the same number of monomers is shown in Fig. 3a–d. This scaling encodes the density of edges in local neighborhoods relative to the fractal dimension of the aggregate, regardless of the actual size of the aggregate. The total number of edges in the graph is then proportional to both $N_{s}$ and $D_{f}$ (Fig. 3e), with the average degree of nodes increasing relative to $D_{f}$ (SI Fig. S4a). The degree distribution of nodes also depends on the fractal prefactor $k_{f}$ (SI Fig. S4b).
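The graph construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `aggregate_to_graph` is hypothetical, the logarithm is assumed to be the natural logarithm (the text does not state the base), and the toy chain stands in for a real fractal aggregate. Coordinates are taken to be dimensionless, i.e. already scaled by $k=2\pi /\lambda$.

```python
import numpy as np


def aggregate_to_graph(coords, x_v):
    """Connect monomers whose centers are closer than C = X_v * log(N_s) (Eq. 3).

    coords : (N_s, 3) array of dimensionless monomer centers.
    x_v    : monomer size parameter, X_v = k * a.
    Returns the boolean adjacency matrix, directed edge index arrays
    (senders, receivers), and the center-to-center distance of each edge.
    """
    coords = np.asarray(coords, dtype=float)
    n_s = coords.shape[0]
    cutoff = x_v * np.log(n_s)  # natural log is an assumption
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adjacency = (dist < cutoff) & ~np.eye(n_s, dtype=bool)  # no self-loops
    senders, receivers = np.nonzero(adjacency)
    return adjacency, senders, receivers, dist[senders, receivers]


# Hypothetical toy aggregate: 8 touching monomers in a line.
# Touching monomers have dimensionless center spacing 2 * X_v.
x_v = 0.5
chain = np.array([[2.0 * x_v * i, 0.0, 0.0] for i in range(8)])
adjacency, senders, receivers, edge_lengths = aggregate_to_graph(chain, x_v)
print(adjacency.sum() // 2, adjacency.sum(axis=1))
```

Because the cutoff is proportional to $X_{v}$, rescaling both the coordinates and $X_{v}$ by the same factor leaves the adjacency matrix unchanged, which is the size-parameter independence described in the text. Under the natural-log assumption, touching monomers (center distance $2X_{v}$) fall inside the cutoff only when $\log (N_{s})>2$, i.e. for $N_{s}\ge 8$.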

GNN model for BC optical properties

Computing accurate solutions for the electromagnetic scattering and absorption properties of multiple sphere clusters (as BC aggregates are typically modeled) is computationally expensive because a full-wave optics treatment is needed. In the general case, spheres interact with one another, and the total scattering field component is a superposition of the components radiated from each sphere in the system⁴⁴. While the continuity equation at the surface of each sphere in the system can be solved analytically by expanding the incident and scattered fields from each sphere in terms of vector spherical wave functions, this approach generates a very large system of coupled linear equations that must be solved iteratively⁴⁵. Additional details about the formal solution are given in Supplementary Information S1.

While this approach provides a fully analytical solution for light scattering from the multiple sphere cluster, the computational time for these brute-force approaches scales significantly with $N_{s}$ and $X_{v}$, as they do not take into account specific details of BC’s topological structure, which could lend itself to model order reduction. Filippov et al.³⁸ previously explored the relationship between the morphology of BC and their aggregate physical properties using the Rayleigh-Debye-Gans (RDG) approximation and found that aggregates with similar fractal parameters also have similar physical properties. Recent work²³ found empirical relationships between the optical properties of aggregates and their morphological parameters using extensive MSTM calculations. Machine learning offers an alternative approach for learning relevant predictors without the need for human-defined features; GNNs in particular can learn features that correspond to the relationship between the nodes (the individual spheres) and the large-scale physical properties of the aggregates.

GNNs are particularly attractive as emulators of MSTM for three reasons. First, they provide strong relational inductive bias, which typically means that algorithms require less training data than fully connected neural networks or convolutional neural networks to make skillful predictions. Since MSTM is relatively slow (and methods such as DDA are approximately $10\times$ slower than MSTM), it is non-trivial to develop large training data sets for machine learning algorithms. Second, the non-trivial topological structure of these aerosols is directly related to the complexity of their optical properties’ calculation, as the radiation incident on each individual monomer is a function of the position and orientation of all of the other monomers in the aggregate, with the neighboring monomers likely to have the most significant influence. The GNN approach of framing this problem from the perspective of message-passing between neighboring nodes is directly analogous to the electromagnetic scattering and absorption problem for the multiple sphere cluster. Finally, as discussed in the introduction, GNN emulation of physical simulators has been shown to generalize to new, previously unseen realizations of physical systems (so-called zero-shot performance)³¹. To bridge the gap between the very accurate physical information gained in process level studies of individual aerosol properties, and the understanding of how populations of these aerosols evolve in atmospheric models, we need either much better approximation methods or much faster methods to accurately calculate aerosol properties. GNN emulators that quickly and accurately generalize to new configurations could provide an online approach to estimate the optical properties of populations of aerosols in atmospheric models.

To investigate the connection between BC’s fractal structure and its optical properties, we trained a GNN to predict the optical properties of BC aggregates, using the values from an analytical solution for the electromagnetic scattering and absorption properties (from MSTM) as ground-truth (Fig. 2). GNNs propagate information between nodes, capturing the topological information about the graph structure while aggregating the node features. We tested several different approaches for the propagation rule, including a simple graph convolution network⁶², a graph convolutional network³⁰, and an Interaction Network (IN)³¹. We found the IN gave the best performance for predicting both the integral and angle-resolved optical properties. The IN (Fig. 2) is based on message passing, where nodes send and receive messages along edges from their neighbors. The messages are aggregated for each node and the nodes are updated based on the central node features and the messages received from neighboring nodes. Graph level predictions are made by aggregating the updated node embeddings from all the nodes in the graph using a graph pooling operation (the pooling in Fig. 2). We tested summation, averaging, and maximum pooling as graph pooling operations, and found that summation works best. After the graph pooling operation, a final linear transformation is performed to transform aggregated node embeddings to the target predictions (the graph readout in Fig. 2). Here we predict the total extinction, scattering, and absorption efficiencies $\langle Q_{ext}\rangle$, $\langle Q_{scat}\rangle$, $\langle Q_{abs}\rangle$, the asymmetry parameter, g, and the angle-resolved elements of the scattering phase matrix $S_{ij}(\theta )$ for the orientation-averaged case (see Methods for discussion of aerosol optical properties and data sets). Further details of the GNN approaches tested in this work are provided in the SI.
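The message-passing, sum-pooling, and linear-readout steps described above can be illustrated with a small NumPy forward pass. This is a sketch under assumptions, not the architecture used in the paper: the layer widths, the two-layer perceptrons, the random untrained weights, and all names (`init_params`, `predict`, and so on) are hypothetical. It only shows why a model defined on nodes and edges applies unchanged to graphs of any size.

```python
import numpy as np

# Assumed sizes: 5 node features (X_v, Re(n), x, y, z), 1 edge feature (distance).
NODE_DIM, EDGE_DIM, HIDDEN, EMBED, N_TARGETS = 5, 1, 16, 8, 4


def init_mlp(rng, n_in, n_out):
    """Random two-layer perceptron weights (untrained, for illustration only)."""
    return {
        "w1": rng.normal(0.0, 0.1, (n_in, HIDDEN)), "b1": np.zeros(HIDDEN),
        "w2": rng.normal(0.0, 0.1, (HIDDEN, n_out)), "b2": np.zeros(n_out),
    }


def mlp(p, x):
    """Apply a ReLU hidden layer followed by a linear output layer."""
    return np.maximum(x @ p["w1"] + p["b1"], 0.0) @ p["w2"] + p["b2"]


def init_params(seed=0):
    rng = np.random.default_rng(seed)
    return {
        "edge": init_mlp(rng, 2 * NODE_DIM + EDGE_DIM, EMBED),
        "node": init_mlp(rng, NODE_DIM + EMBED, EMBED),
        "readout_w": rng.normal(0.0, 0.1, (EMBED, N_TARGETS)),
        "readout_b": np.zeros(N_TARGETS),
    }


def predict(params, nodes, edges, senders, receivers):
    """One message-passing layer, sum pooling, and a linear graph readout."""
    # Message on each directed edge from sender, receiver, and edge features.
    msg_in = np.concatenate([nodes[senders], nodes[receivers], edges], axis=1)
    messages = mlp(params["edge"], msg_in)
    # Sum the incoming messages at each receiving node.
    aggregated = np.zeros((nodes.shape[0], EMBED))
    np.add.at(aggregated, receivers, messages)
    # Update each node from its own features and its aggregated messages.
    updated = mlp(params["node"], np.concatenate([nodes, aggregated], axis=1))
    # Sum pooling over nodes, then the linear readout to the targets.
    return updated.sum(axis=0) @ params["readout_w"] + params["readout_b"]


# Hypothetical 4-node path graph with random node features.
params = init_params()
nodes = np.random.default_rng(1).normal(size=(4, NODE_DIM))
senders = np.array([0, 1, 1, 2, 2, 3])
receivers = np.array([1, 0, 2, 1, 3, 2])
edges = np.linalg.norm(nodes[senders, 2:] - nodes[receivers, 2:],
                       axis=1, keepdims=True)
out = predict(params, nodes, edges, senders, receivers)
print(out.shape)
```

The parameter shapes depend only on the feature dimensions, never on the number of nodes, so the same `params` can be applied to an aggregate with 10 or 1000 monomers. Sum pooling also makes the output independent of the order in which monomers are listed.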

For each training example, we input $X_{v}$, the real part of the index of refraction $Re(n_{k})$ (since we consider only cases where the imaginary part is $Im(n_{k})=Re(n_{k})-1$), and the dimensionless coordinates of each sphere as node features. As edge features, we use the distance between neighboring spheres. We trained the model using 15,314 aggregates from the training data set, as the training loss did not significantly decrease with additional samples (SI Fig. S13); training data sets as small as 3000 aggregates showed reasonable generalization performance. The training data set consisted of aggregates with a small number of monomers ($N_{s} < 100$). We tested the model on an independent test set of 7656 aggregates with the same distribution of parameters as the training data set ($N_{s} < 100$). We further investigated the generalizability of the model on an independent zero-shot test set of 440 aggregates that were significantly larger $(100< N_{s} < 1000)$ than the ones the model was trained on. An additional 440 large aggregates were used as a zero-shot validation data set to determine which model architecture provided the best zero-shot performance (SI Figs. S9–S12). While the model weights were not directly trained on this zero-shot validation data set, the hyperparameters for the best model architecture were determined from performance on this data set; thus, we use an independent zero-shot test data set to evaluate generalization performance. $N_{s}=100$ was chosen as the maximum size for aggregates in the training data set as smaller maximum sizes increased the bias in the zero-shot performance (SI Fig. S14). The zero-shot test data set was evenly distributed among the aggregate parameters (Fig. S2) to provide an estimate of generalization performance across the full parameter space.

Figure 4 shows the IN predictions compared to the actual values for $\langle Q_{ext}\rangle$, $\langle Q_{scat}\rangle$, $\langle Q_{abs}\rangle$, and g with the training data shown in blue, the test data set of smaller aggregates in yellow (top row, Fig. 4a–d) and the zero-shot test data sets shown in orange (bottom row, Fig. 4e–h). Figure 5 shows the predictions for the $S_{11}(\theta )$ element of the scattering phase matrix for several different aggregates in the test data sets of smaller aggregates (Fig. 5a) and larger aggregates (Fig. 5c). For the test data with the same distribution of parameters as the training data set ($N_{s} < 100$), the model predictions were very close to the true values. For the zero-shot test data set, predictions for both integral and angle resolved optical properties were reasonable across the entire range of size parameters ($X_{v}$=0.1 to 1.0), indices of refraction $n_{k} = 1.4+0.4i$ to $n_{k}=2.0+1.0i$, and fractal parameters. For the prediction of $S_{11}$, both the magnitude and functional form were well-approximated across the range of parameters in the test set, although the model did deviate slightly more from the true values for larger $N_{s}$ and $X_{v}$ (e.g. the green line in Fig. 5c). Predictions for the entire angle-resolved scattering phase matrix elements $S_{ij}(\theta )$, for $j \ge i$, were also reasonable (See SI Fig. S16).

Table 1 gives the mean absolute percentage error (MAPE) for the predictions from the IN model for the integral optical properties and the asymmetry parameter for the training, test, and zero-shot test data sets. We use MAPE as a metric to assess the performance of the model predictions because this metric is independent of the size of the data sets. MAPE can be interpreted in terms of relative error, which means that the performance on the test data set and zero-shot test data set are directly comparable; metrics such as MSE depend on the absolute magnitude of the integral optical properties, which differ between the test and zero-shot test data sets. We find that the predictions for $Q_{ext}$ and $Q_{abs}$ are within 2% of the true value for the training and test data sets, and within 4% of the true values for the zero-shot test data set. $Q_{scat}$ and g have more significant deviations between the true and predicted values for the IN model (Table 1, 2nd column), but this is mainly due to a bias in the predictions for the IN model for the smallest $X_{v}$ values, because the magnitude of $Q_{scat}$ is so small. At larger $X_{v}$, the model performance is within 2–9% of the true value for the training and test data sets, and within 4–8% for the zero-shot test data set. The IN model generally performs best at larger size parameters for the predictions of the integral optical properties. The bias for smaller values of $X_{v}$ may be improved by training on each $X_{v}$ separately or, alternatively, by using methods such as meta-learning⁴⁶.
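The sensitivity of MAPE to small target magnitudes, noted above for $Q_{scat}$ at small $X_{v}$, is easy to see numerically. The sketch below uses the standard MAPE definition; the numerical values are hypothetical and chosen only for illustration, not taken from the paper's data.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_pred - y_true) / y_true)))


# The same absolute error of 0.001 on a large and on a very small target value.
mape_large = mape([2.0], [2.001])    # efficiency of order 1
mape_small = mape([0.002], [0.003])  # efficiency of order 1e-3
print(mape_large, mape_small)
```

An identical absolute error gives a relative error three orders of magnitude larger for the small target, which is why a small bias at the smallest size parameters can dominate the MAPE for $Q_{scat}$.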

Table 1 MAPE values for GNN model prediction of integral optical properties and asymmetry parameter.

In addition to generalizability, the IN model demonstrated physical consistency in its predictions for the aggregate optical properties. The three efficiencies are not independent, as $\langle Q_{scat} \rangle +\langle Q_{abs} \rangle =\langle Q_{ext} \rangle$. The model directly inferred this dependency for both the training and test sets without it being imposed as a constraint. Additionally, g is given by the $\cos (\theta )$-weighted integral of $S_{11}$ over the solid angle⁴⁴,

$$\begin{aligned} g = \frac{1}{4\pi }\int S_{11}(\theta )\cos (\theta ) d\Omega = \frac{1}{2}\int _{0}^{\pi } S_{11}(\theta )\cos (\theta ) \sin (\theta ) d\theta \end{aligned}$$

(4)

Although this integral constraint was not explicitly imposed, the model predictions were consistent with it (Fig. 5b for $N_{s}<100$, and Fig. 5d for $N_{s}>100$).
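A consistency check of this kind can be reproduced numerically by integrating a phase function over scattering angle and comparing the result with a known asymmetry parameter. The sketch below is an illustration under assumptions: it uses the Henyey-Greenstein phase function as a stand-in for $S_{11}(\theta )$ (it is not the paper's data), assumes the normalization $\frac{1}{2}\int _{0}^{\pi }S_{11}\sin (\theta )d\theta =1$, and all function names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def henyey_greenstein(theta, g):
    """Analytic test phase function whose asymmetry parameter is exactly g."""
    return (1.0 - g**2) / (1.0 + g**2 - 2.0 * g * np.cos(theta)) ** 1.5


def asymmetry_parameter(theta, s11):
    """g = 1/2 * integral of S11 * cos(theta) * sin(theta) over [0, pi]."""
    return 0.5 * trapezoid(s11 * np.cos(theta) * np.sin(theta), theta)


theta = np.linspace(0.0, np.pi, 1801)  # 0.1 degree angular resolution
s11 = henyey_greenstein(theta, g=0.6)
normalization = 0.5 * trapezoid(s11 * np.sin(theta), theta)
g_from_s11 = asymmetry_parameter(theta, s11)
print(normalization, g_from_s11)
```

Applied to model output, the same function would be called with the predicted $S_{11}(\theta )$ and the result compared with the separately predicted g; a coarse angular grid would need a correspondingly looser tolerance.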

Analysis of the GNN predictions

To understand how the IN model predicts the optical properties of BC fractal aggregates, including those much larger than the model was trained on, we emphasize that the graph input for the model does not directly include $D_{f}$ or $k_{f}$ as features but rather the fractal structure is implicitly encoded as the interactions between the neighboring spheres. The previously used SVM approach to predict BC’s optical properties included $N_{s}$, $D_{f}$, and $k_{f}$ as features to predict $\langle Q_{ext}\rangle$, $\langle Q_{scat}\rangle$, $\langle Q_{abs}\rangle$, and g²⁹. Since the network structure in the IN approach is directly learned from the sphere positions, and the model is learned at the node level, the IN approach can generalize beyond its initial training set for these morphological parameters to unseen configurations.

The generalization of the IN model to a range of $D_{f}$ is an important feature, as it is challenging to find approximations that are valid across fractal dimensions⁴⁷. Because the IN model learns about the local neighborhood of each sphere, it is able to estimate the impacts of screening on absorption and scattering more accurately than the RDG approximation^44,47, an approach often used to approximate the optical properties of BC aggregates in a computationally efficient manner as an improvement on the equivalent sphere model. RDG assumes that individual monomers only interact with the incident electromagnetic field (neglecting multiple interactions), which can lead to absorption being under-predicted by 10–20% and g being under-predicted by more than a factor of 10 (ref. 15). The IN model effectively learns, in an unsupervised manner, a simplified sphere level model that more fully captures the complexity of the optical properties of the full analytical solution¹⁵. As noted earlier, the best performance for the IN model used summation for graph pooling, which is physically consistent with the node level model learning the Mie theory solution for the individual spheres in the aggregate, given their interaction with neighboring spheres.

The optical properties of aggregates in this regime can be modeled with the assumption of a fairly shallow graph model (for the IN model a single layer performed best; for the GCN little improvement was seen beyond 3 or 4 layers, Fig. S5), suggesting that the majority of the structure influencing the optical properties of aerosols in this regime can be approximated from local interactions. We also investigated using a length scale of $C=X_{v}\log (N_{s})/\log (\log (N_{s}))$ (characteristic of scale-free networks) to form graphs from aggregates⁴³, rather than Eq. (3). This length scale has the advantage that the degree of each node scales less quickly with $N_{s}$, but the IN model performed worse in this case. This indicates that including a larger local neighborhood at each layer (Eq. (3)) is more informative for the model.

Discussion and outlook

The network approach presented here provides a new framework for understanding the microphysical relationship between the morphological properties of BC and its larger scale physical properties. Here we have chosen to focus on the prediction of optical properties for numerically generated fractal aggregates, as the generation of these aggregates from combustion processes and their transformation during atmospheric aging is not yet completely understood. However, applying network theory to atmospheric aerosols suggests new directions for thinking about the generation of these fractal aggregates through combustion processes due to the connection between complex networks and percolation theory⁴⁸. Here we have used a cluster-cluster algorithm, although previous work has noted that the morphology of numerically generated fractal aggregates depends not only on the parameters ($N_{s}$, a, $D_{f}$, and $k_{f}$) defining the shape of the aggregate, but also on which algorithm is used to generate the sphere positions (e.g. diffusion-limited aggregation or diffusion-limited cluster aggregation)^38,39. The network approach provides a new framework from which to understand how realistically numerical algorithms reproduce the properties of aerosols formed during incomplete combustion through comparison of their network characteristics⁴³. This approach may also be useful for inferring the three-dimensional structure of aggregates from two-dimensional transmission electron microscope (TEM) images of these aerosols^10,11, since it relates the relative positions of spheres to their overall morphological features; 2D methods have previously been shown to systematically underestimate the fractal dimension of BC⁴⁹. Recent methods such as graph cumulants could provide sophisticated approaches to describe substructures of graphs (such as motifs or cliques used to describe clustering)⁵⁰. Since any particular network observation is a single realization of an underlying generative process (in this case, the generation of primary aerosol particles from combustion sources), this framework could allow for an unbiased estimator of the variance of the propensity for specific graph substructures as a result of this generative process. These estimators could be used to compare specific aggregate-generating algorithms to observations of real fractal aerosol particles to assess the realism of the algorithms.

As a proof of concept, we have trained a GNN to predict the optical properties of bare BC fractal aggregates with a range of different fractal parameters. This study demonstrates that modeling aerosol fractal aggregates as networks of interacting spheres provides morphological information that allows the machine learning model to extrapolate far beyond its initial training data set. This approach may also be useful for other fractal systems found in nature, such as turbulence, vegetation, or river networks.

BC in the atmosphere is typically internally mixed. The GNN approach provides an obvious extension to internally mixed aerosols (Fig. 1), as the thickness of coatings and their indices of refraction or organic fraction could be included as additional node-level features (in the thinly coated case) or graph-level features (for the thickly coated case). Other factors influencing the optical properties of aggregates such as “necking” between overlapping monomers could be included as edge features. Because atmospheric aerosol retrievals rely on orientation averaged parameters, models for predicting the scattering phase function should be equivariant under rotations. Recently developed equivariant machine learning methods^51,52,53,54 may provide improved prediction of the orientation averaged optical properties.

Uncertainty in BC direct radiative climate effects is attributable to multiple factors, including BC’s emissions, lifetime, atmospheric processing, and optical properties^1,2,55; the GNN approach could help resolve this uncertainty both by improving the interpretation of BC observations and by allowing BC’s morphology to be accurately represented in atmospheric models in a computationally efficient manner. As a greater understanding of BC’s physical properties from different source contributions and atmospheric aging pathways becomes available through laboratory and observational studies^13,14,16, the major remaining hurdle to accurately representing BC in models will be computational.

While previous exact analytical methods have computational wall-times ranging from hours to days for larger aggregates, inference with the trained GNN model takes less than 0.3 s per aggregate (on a CPU; see SI Fig. S15). The computational time for these exact analytical methods has precluded exact calculations of aerosol optical properties being used in models or observational retrievals. CELES, a CUDA-accelerated version of MSTM capable of running on a GPU, demonstrated a factor of 1.5–6 times speed up over MSTM, but was still too slow to be implemented online in models⁵⁶. The significantly faster time-scale for the GNN model, as well as its generalizability to arbitrarily shaped aggregates compared to more standard ML methods, has the potential to transform existing model parameterizations for BC. For MSTM, computational wall times scale with $N_{s}$, $X_{v}$, and $D_{f}$; while the total inference time and memory scale with $N_{s}$ and $D_{f}$ in the GNN approach, they are no longer a function of $X_{v}$.

We have focused here on the forward problem of predicting the optical properties of BC given an assumed single particle morphology; however, such an approach may also be useful for the inverse problem, i.e. inferring the morphology given the scattering phase function and integral optical properties. This approach could also provide insight into other physical properties that require detailed information about particle morphology³⁸, such as the energy and heat transfer between aggregates and the surrounding gas needed to develop physical models of laser-induced incandescence^7,57. Radiative transfer calculations for mineral dust and ice crystals also rely on detailed information about particle morphology, suggesting that the GNN approach would be useful for modeling their optical properties as well. This approach could mitigate several long-standing issues with model parameterizations and observational retrievals for these species by providing flexible parameterizations of arbitrarily shaped aerosol and cloud particles that are fast enough to be deployed online in atmospheric models.

Finally, these methods have potential for new applications of machine-learning assisted materials discovery^58,59. Proposed geo-engineering approaches to mitigate global or regional impacts of climate change, such as stratospheric aerosol injection, marine cloud brightening, or precipitation enhancement, rely on the development of novel aerosol materials. Generative graph models could be used to determine optimal aerosol morphologies resulting in physical properties specific to these applications at a fraction of the cost of traditional numerical methods⁶⁰.

Methods

Numerical aggregate properties

Cartesian coordinates for the positions of spheres in aggregates were determined using a cluster-cluster algorithm^38,42. This algorithm starts with primary clusters of size $N_{c}$ and then randomly agglomerates these clusters pairwise into larger clusters. The algorithm repeats this process over multiple levels until it has combined all clusters into one larger cluster, which satisfies the scaling laws given by Eq. (1). Primary clusters of size $N_{c}=3,4,5,7,9,11,13,15,17,19$ were used to generate aggregates of $N_{s}$ = 8–960 spheres with fractal dimensions $D_{f}$ = 1.8–2.3. Following²³, we assume a fractal pre-factor of $k_{f}=1.2$ (for the aggregates used in the MSTM calculations). We also investigated the network parameters of aggregates with $k_{f}=$ 1.0–1.5 for a given $D_{f}=1.8$ (Fig. S4b). Aerosols are assumed to consist of isotropic, homogeneous spheres with size parameters $X_{v} =$ 0.1, 0.3, 0.5, 0.7, 0.9, and 1.0, corresponding to monomer radii of 7–72 nm for incident light at 450 nm and 10–104 nm at 650 nm. For each primary cluster size and fractal dimension, 10 aggregate realizations were randomly generated.
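The relation between the size parameter and the monomer radius quoted above can be checked with a short calculation. The sketch below assumes the standard definition $X = 2\pi a/\lambda$ and assumes that Eq. (1) takes the usual fractal scaling form $N_{s} = k_{f}(R_{g}/a)^{D_{f}}$; the function names are illustrative and are not taken from the authors' code.

```python
import math


def monomer_radius_nm(size_parameter, wavelength_nm):
    """Sphere radius a from the size parameter X = 2*pi*a/lambda (assumed definition)."""
    return size_parameter * wavelength_nm / (2.0 * math.pi)


def radius_of_gyration_nm(n_spheres, monomer_radius, d_f, k_f=1.2):
    """Invert the assumed scaling law N_s = k_f * (R_g / a)**D_f for R_g."""
    return monomer_radius * (n_spheres / k_f) ** (1.0 / d_f)


SIZE_PARAMETERS = (0.1, 0.3, 0.5, 0.7, 0.9, 1.0)

for wavelength in (450.0, 650.0):
    radii = [monomer_radius_nm(x, wavelength) for x in SIZE_PARAMETERS]
    print(f"{wavelength:.0f} nm:", [round(r, 1) for r in radii])

# Example: expected radius of gyration for a 960-sphere aggregate with D_f = 1.8
a = monomer_radius_nm(0.3, 450.0)
print(round(radius_of_gyration_nm(960, a, d_f=1.8), 1), "nm")
```

Under these assumptions the smallest and largest size parameters map to radii of roughly 7 nm and 72 nm at 450 nm, consistent with the range stated in the text.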

Aerosol optical properties

For radiative transfer applications, the parameters typically needed are the orientation-averaged total scattering efficiency $\langle Q_{sca} \rangle$, extinction efficiency $\langle Q_{ext}\rangle$, and absorption efficiency $\langle Q_{abs}\rangle$, as well as the asymmetry parameter $g = \langle C_{sca} \cos(\theta ')\rangle$. Here $C_{sca}$ is the scattering cross-section, which is related to the efficiency by $Q_{sca}=C_{sca}/(\pi a_{agg}^{2})$, where $a_{agg}$ is the effective radius of the aggregate. The asymmetry parameter relates the amount of forward- to back-scattered light. Other parameters relevant for radiative transfer, such as the single scattering albedo (SSA), can be derived from these parameters (SSA = $Q_{sca}/Q_{ext}$). The mass absorption coefficient (MAC) or mass extinction coefficient (MEC) is typically used to relate emissions of these aerosols to their direct radiative effects, and these coefficients are sometimes estimated theoretically from $\langle C_{abs}\rangle$ or $\langle C_{ext}\rangle$ with assumptions about particle density.
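The relations among these quantities can be written out as a few helper functions. This is a minimal sketch: it uses $Q_{abs} = Q_{ext} - Q_{sca}$ (energy conservation), and for the MAC it assumes the aggregate mass is that of $N_{s}$ identical solid spheres with a density of 1.8 g cm⁻³, which is an illustrative value and not one stated in the text.

```python
import math


def efficiency(cross_section, a_agg):
    """Q = C / (pi * a_agg**2); cross_section and a_agg**2 must share units."""
    return cross_section / (math.pi * a_agg ** 2)


def absorption_and_ssa(q_ext, q_sca):
    """Return (Q_abs, SSA) using Q_abs = Q_ext - Q_sca and SSA = Q_sca / Q_ext."""
    return q_ext - q_sca, q_sca / q_ext


def mass_absorption_coefficient(c_abs_nm2, n_spheres, monomer_radius_nm,
                                density_g_cm3=1.8):
    """MAC in m^2/g from an absorption cross-section in nm^2.

    The density is an assumed, illustrative value.
    """
    radius_cm = monomer_radius_nm * 1e-7          # 1 nm = 1e-7 cm
    volume_cm3 = n_spheres * (4.0 / 3.0) * math.pi * radius_cm ** 3
    mass_g = density_g_cm3 * volume_cm3
    c_abs_m2 = c_abs_nm2 * 1e-18                  # 1 nm^2 = 1e-18 m^2
    return c_abs_m2 / mass_g


q_abs, ssa = absorption_and_ssa(q_ext=2.0, q_sca=0.5)
print(q_abs, ssa)
print(mass_absorption_coefficient(7000.0, n_spheres=1, monomer_radius_nm=50.0))
```

The cross-section and efficiency values in the usage lines are placeholders chosen for easy arithmetic, not results from the paper.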

The scattering phase function relates the incident and scattered Stokes parameters, i.e., it indicates how light scattered from the particle is transformed relative to the incident light in terms of its intensity and polarization state⁴⁴. Here we assume initially unpolarized incident light, in which case the $S_{11}$ element specifies the angular distribution of the intensity of scattered light relative to incident light. The scattered light is partially polarized, with degree of polarization given by $\sqrt{(S_{21}^{2}+S_{31}^{2}+S_{41}^{2})/S_{11}^{2}}$.
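The degree-of-polarization expression above translates directly into code. The sketch below evaluates it at a single scattering angle from the first-column scattering matrix elements; the function name and the sample values are illustrative only.

```python
import math


def degree_of_polarization(s11, s21, s31, s41):
    """sqrt((S21^2 + S31^2 + S41^2) / S11^2) for unpolarized incident light."""
    if s11 == 0:
        raise ValueError("S11 must be non-zero")
    return math.sqrt(s21 ** 2 + s31 ** 2 + s41 ** 2) / abs(s11)


# Placeholder matrix elements at one scattering angle
print(degree_of_polarization(1.0, -0.6, 0.0, 0.0))
```

A value of 0 corresponds to fully unpolarized scattered light and a value of 1 to fully polarized scattered light.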

MSTM calculations of bare BC optical properties

To determine the ground-truth optical properties for the BC fractal aggregates generated by the cluster-cluster algorithm, we use the Fortran-90 implementation of the multiple-sphere T-matrix code described in⁴⁵, which can run on high-performance parallel computing platforms. This code numerically solves for electromagnetic wave scattering from systems of multiple (non-overlapping) spheres for either a fixed or random (orientation-averaged) orientation with respect to an incident plane wave. Here we have focused on the calculation of random-orientation optical properties, which uses the T-matrix procedure developed in¹⁸. We assume indices of refraction consistent with a range of values from the literature for BC at 550 nm: (1.4+0.4i, 1.6+0.6i, 1.8+0.8i, 2.0+1.0i). MSTM calculations were performed for this range of indices of refraction for 57,556 numerically generated aggregates with $N_{s} < 100$; we used randomly chosen aggregates from this data set for the training, validation, and test sets for the model. To test the zero-shot performance, MSTM calculations were performed for 880 aggregates with these parameters in the size range $100< N_{s} < 1000$; we randomly split these data into a zero-shot validation data set to evaluate the model’s performance and an independent zero-shot test data set. A summary of the range of parameters for each data set is given in Table S1. The distributions of parameters among the small ($N_{s}<100$) and large ($N_{s}>100$) aggregates are shown in Figs. S1 and S2, and the integral optical properties calculated with MSTM are shown in Fig. S3.
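The split described above, with small aggregates used for training, validation, and testing and large aggregates held out for zero-shot evaluation, can be sketched as follows. The split fractions, the even division of the large aggregates, and the dictionary layout are hypothetical; the text states only that the splits were random.

```python
import random


def split_by_size(aggregates, n_threshold=100, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly split aggregates into in-distribution and zero-shot sets.

    `fractions` (train, val, test) is a hypothetical choice, not from the paper.
    """
    rng = random.Random(seed)
    small = [a for a in aggregates if a["n_spheres"] < n_threshold]
    large = [a for a in aggregates if a["n_spheres"] > n_threshold]
    rng.shuffle(small)
    rng.shuffle(large)
    n_train = int(fractions[0] * len(small))
    n_val = int(fractions[1] * len(small))
    half = len(large) // 2
    return {
        "train": small[:n_train],
        "val": small[n_train:n_train + n_val],
        "test": small[n_train + n_val:],
        "zero_shot_val": large[:half],
        "zero_shot_test": large[half:],
    }


# Toy records standing in for aggregates with N_s between 8 and 960
aggregates = [{"id": i, "n_spheres": n} for i, n in enumerate(range(8, 961, 8))]
splits = split_by_size(aggregates)
print({name: len(items) for name, items in splits.items()})
```

Keeping the size threshold as the only criterion for the zero-shot sets ensures that no aggregate larger than those seen in training leaks into the training data.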

Graph neural networks

We used PyTorch Geometric⁶¹ to implement the GNN models. Several GNN approaches were tested, including a simple graph convolutional network (SGC)⁶², a graph convolutional network (GCN)³⁰, and an interaction network (IN)³¹ (see Supplementary Information S1 for additional details of the graph models and a comparison of performance metrics among different model parameters and targets). The best performance for the integral optical properties was obtained with an IN model with a hidden layer size of 300 for both the node and edge models, and a message size of 100. Both the node and edge models are MLPs with ReLU as the non-linear activation function between layers. Aggregation for the edge model is addition, with global mean pooling followed by dropout (p = 0.5) and a linear layer of size 100 as the global aggregation function. For the prediction of $S_{11}$, we found that adding a fully connected node to each graph slightly improved the zero-shot performance. The model architecture was otherwise the same as that used to predict the integral optical properties. A batch size of 20 was used (training with a batch size of 2 led to slower training but did not lead to significantly worse performance). For the graph regression task, a mean squared error (MSE) loss was used. We trained the GNN models on an NVIDIA RTX 8000 GPU.
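To make the data flow of the interaction network concrete, the following NumPy sketch runs one untrained forward pass with the layer sizes quoted above (hidden size 300, message size 100, sum aggregation of messages, global mean pooling, and a linear layer of size 100). It is not the authors' PyTorch Geometric implementation. The node features (radius and refractive index), the contact-based edges, the single message-passing step, the ReLU in the readout, and the four-value output head are all assumptions made for illustration, and dropout is omitted because it is inactive at inference.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Random (untrained) weights and zero biases for the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Linear layers with ReLU between them (no activation after the last)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def contact_edges(positions, radius, tol=1e-6):
    """Directed edges between touching spheres; edge feature = center distance."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    src, dst = np.nonzero((d <= 2.0 * radius + tol) & (d > 0.0))
    return np.stack([src, dst]), d[src, dst][:, None]


def forward(x, edge_index, edge_attr, params):
    src, dst = edge_index
    # Edge model: one message per edge from sender, receiver, and edge features.
    msg = mlp(params["edge"], np.concatenate([x[src], x[dst], edge_attr], axis=1))
    # Sum incoming messages at each receiving node.
    agg = np.zeros((x.shape[0], msg.shape[1]))
    np.add.at(agg, dst, msg)
    # Node model: update nodes from their features and aggregated messages.
    h = mlp(params["node"], np.concatenate([x, agg], axis=1))
    # Global mean pooling, then the readout to graph-level targets.
    return mlp(params["readout"], h.mean(axis=0))


N_NODE_FEAT, N_EDGE_FEAT, HIDDEN, MESSAGE, N_TARGETS = 3, 1, 300, 100, 4
params = {
    "edge": init_mlp([2 * N_NODE_FEAT + N_EDGE_FEAT, HIDDEN, MESSAGE]),
    "node": init_mlp([N_NODE_FEAT + MESSAGE, HIDDEN, MESSAGE]),
    "readout": init_mlp([MESSAGE, 100, N_TARGETS]),
}

# Toy aggregate: a straight chain of five touching unit spheres.
positions = np.array([[2.0 * i, 0.0, 0.0] for i in range(5)])
x = np.tile([1.0, 1.8, 0.8], (5, 1))  # hypothetical features: radius, Re(m), Im(m)
edge_index, edge_attr = contact_edges(positions, radius=1.0)
out = forward(x, edge_index, edge_attr, params)
print(out.shape)
```

Because messages are summed and nodes are mean-pooled, the output does not depend on the order in which spheres are listed, and the same weights apply to graphs of any size, which is the property that allows evaluation on aggregates larger than those used for training.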

Data availability

BC graph data sets are available in an open source repository (doi:10.5281/zenodo.5108834).

Code availability

Code is available at https://github.com/kdlamb/BC-GNN.git.

References

Bond, T. C. et al. Bounding the role of black carbon in the climate system: A scientific assessment. J. Geophys. Res. Atmos. 118(11), 5380–5552 (2013).
Liu, D., He, C., Schwarz, J. P. & Wang, X. Lifecycle of light-absorbing carbonaceous aerosols in the atmosphere. NPJ Clim. Atmos. Sci. 3(1), 1–18 (2020).
Wu, Y., Cheng, T., Zheng, L. & Chen, H. Black carbon radiative forcing at TOA decreased during aging. Sci. Rep. 6, 38592 (2016).
Lohmann, U. et al. Future warming exacerbated by aged-soot effect on cloud formation. Nat. Geosci. 13(10), 674–680 (2020).
Schwarz, J., Gao, R., Perring, A., Spackman, J. & Fahey, D. Black carbon aerosol size in snow. Sci. Rep. 3(1), 1–5 (2013).
Moteki, N. et al. Anthropogenic iron oxide aerosols enhance atmospheric heating. Nat. Commun. 8(1), 1–11 (2017).
Michelsen, H., Schulz, C., Smallwood, G. & Will, S. Laser-induced incandescence: Particulate diagnostics for combustion, atmospheric, and industrial applications. Prog. Energy Combust. Sci. 51, 2–48 (2015).
Manfred, K. M. et al. Investigating biomass burning aerosol morphology using a laser imaging nephelometer. Atmos. Chem. Phys. 18(3), 1879–1894 (2018).
Womack, C. C. et al. Complex refractive indices in the ultraviolet and visible spectral region for highly absorbing non-spherical biomass burning aerosol. Atmos. Chem. Phys. Disc. 2020, 1–29 (2020).
Chakrabarty, R. K. et al. Simulation of aggregates with point-contacting monomers in the cluster-dilute regime. part 1: Determining the most reliable technique for obtaining three-dimensional fractal dimension from two-dimensional images. Aerosol Sci. Technol. 45(1), 75–80 (2011).
Chakrabarty, R. K. et al. Simulation of aggregates with point-contacting monomers in the cluster-dilute regime. part 2: Comparison of two-and three-dimensional structural properties as a function of fractal dimension. Aerosol Sci. Technol. 45(8), 903–908 (2011).
Zhang, X., West, R. A., Irwin, P. G., Nixon, C. A. & Yung, Y. L. Aerosol influence on energy balance of the middle atmosphere of Jupiter. Nat. Commun. 6(1), 1–9 (2015).
Wu, Y. et al. The role of biomass burning states in light absorption enhancement of carbonaceous aerosols. Sci. Rep. 10(1), 1–10 (2020).
Wang, Y. et al. Fractal dimensions and mixing structures of soot particles during atmospheric processing. Environ. Sci. Technol. Lett. 4(11), 487–493 (2017).
Kahnert, M. & Kanngießer, F. Modelling optical properties of atmospheric black carbon aerosols. J. Quant. Spectrosc. Radiat. Transfer 244, 106849 (2020).
Fierce, L. et al. Radiative absorption enhancements by black carbon controlled by particle-to-particle heterogeneity in composition. Proc. Natl. Acad. Sci. 117(10), 5196–5203 (2020).
Mackowski, D. W. Calculation of total cross sections of multiple-sphere clusters. JOSA A 11(11), 2851–2861 (1994).
Mackowski, D. W. & Mishchenko, M. I. Calculation of the T matrix and the scattering matrix for ensembles of spheres. JOSA A 13(11), 2266–2278 (1996).
Purcell, E. M. & Pennypacker, C. R. Scattering and absorption of light by nonspherical dielectric grains. Astrophys. J. 186, 705–714 (1973).
Yurkin, M. A. & Hoekstra, A. G. The discrete dipole approximation: An overview and recent developments. J. Quant. Spectrosc. Radiat. Transfer 106(1–3), 558–589 (2007).
Xu, Y.-L. Electromagnetic scattering by an aggregate of spheres. Appl. Opt. 34(21), 4573–4588 (1995).
Xu, Y.-L. & Gustafson, B. Å. A generalized multiparticle Mie-solution: Further experimental verification. J. Quant. Spectrosc. Radiat. Transfer 70(4–6), 395–419 (2001).
Liu, C., Xu, X., Yin, Y., Schnaiter, M. & Yung, Y. L. Black carbon aggregates: A database for optical properties. J. Quant. Spectrosc. Radiat. Transfer 222, 170–179 (2019).
Kahnert, M. Numerically exact computation of the optical properties of light absorbing carbon aggregates for wavelength of 200 nm-12.2 mu m. Atmos. Chem. Phys. 10(17), 8319–8329 (2010).
Smith, A. & Grainger, R. Simplifying the calculation of light scattering properties for black carbon fractal aggregates. Atmos. Chem. Phys. 14, 15 (2014).
Romshoo, B. et al. Radiative properties of coated black carbon aggregates: Numerical simulations and radiative forcing estimates. Atmos. Chem. Phys. Disc. 2021, 1–24 (2021).
Gentine, P., Pritchard, M., Rasp, S., Reinaudi, G. & Yacalis, G. Could machine learning break the convection parameterization deadlock?. Geophys. Res. Lett. 45(11), 5742–5751 (2018).
Rasp, S., Pritchard, M. S. & Gentine, P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. 115(39), 9684–9689 (2018).
Luo, J., Zhang, Y., Wang, F., Wang, J. & Zhang, Q. Applying machine learning to estimate the optical properties of black carbon fractal aggregates. J. Quant. Spectrosc. Radiat. Transfer 215, 1–8 (2018).
Kipf, T. N., & Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., & Faulkner, R. et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., & Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232 (2015).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272, PMLR (2017).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706–710 (2020).
Bapst, V. et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys. 16(4), 448–454 (2020).
Xie, T., France-Lanord, A., Wang, Y., Shao-Horn, Y. & Grossman, J. C. Graph dynamical networks for unsupervised learning of atomic scale dynamics in materials. Nat. Commun. 10(1), 1–9 (2019).
Forrest, S. & Witten, T. Jr. Long-range correlations in smoke-particle aggregates. J. Phys. A: Math. Gen. 12(5), L109 (1979).
Filippov, A., Zurita, M. & Rosner, D. Fractal-like aggregates: Relation between morphology and physical properties. J. Colloid Interface Sci. 229(1), 261–273 (2000).
Sorensen, C. M. & Roberts, G. C. The prefactor of fractal aggregates. J. Colloid Interface Sci. 186(2), 447–452 (1997).
Heinson, W., Sorensen, C. & Chakrabarti, A. Does shape anisotropy control the fractal dimension in diffusion-limited cluster-cluster aggregation?. Aerosol Sci. Technol. 44(12), i–iv (2010).
Johansson, K., Head-Gordon, M., Schrader, P., Wilson, K. & Michelsen, H. Resonance-stabilized hydrocarbon-radical chain reactions may explain soot inception and growth. Science 361(6406), 997–1000 (2018).
Moteki, N. An efficient C++ code for generating fractal cluster of spheres (v1.1) (2019).
Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002).
Bohren, C. F., & Huffman, D. R. Absorption and scattering of light by small particles. John Wiley & Sons (2008).
Mackowski, D. W. & Mishchenko, M. I. A multiple sphere T-matrix Fortran code for use on parallel computer clusters. J. Quant. Spectrosc. Radiat. Transfer 112, 2182–2192 (2011).
Finn, C., Abbeel, P., & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pp. 1126–1135, PMLR (2017).
Sorensen, C. Light scattering by fractal aggregates: A review. Aerosol. Sci. Technol. 35(2), 648–687 (2001).
Deprez, P., Wüthrich, M. V. Networks, random graphs and percolation. In Theoretical aspects of spatial-temporal modeling, pp. 95–124 (Springer, 2015).
Adachi, K., Chung, S. H., Friedrich, H., & Buseck, P. R. Fractal parameters of individual soot particles determined using electron tomography: Implications for optical properties. J. Geophys. Res. Atmos. 112, D14 (2007).
Gunderson, L. M., & Bravo-Hermsdorff, G. Introducing graph cumulants: What is the variance of your social network? arXiv preprint arXiv:2002.03959 (2020).
Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, L., & Riley, P. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219 (2018).
Kondor, R., Lin, Z., & Trivedi, S. Clebsch–Gordan nets: A fully Fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems, pp. 10117–10126 (2018).
Miller, B. K., Geiger, M., Smidt, T. E., & Noé, F. Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461 (2020).
Satorras, V. G., Hoogeboom, E., & Welling, M. E(n) equivariant graph neural networks. arXiv preprint arXiv:2102.09844 (2021).
Wang, R. et al. Estimation of global black carbon direct radiative forcing and its uncertainty constrained by observations. J. Geophys. Res. Atmos. 121(10), 5948–5971 (2016).
Egel, A., Pattelli, L., Mazzamuto, G., Wiersma, D. S. & Lemmer, U. CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres. J. Quant. Spectrosc. Radiat. Transfer 199, 103–110 (2017).
Bambha, R. P. & Michelsen, H. A. Effects of aggregate morphology and size on laser-induced incandescence and scattering from black carbon (mature soot). J. Aerosol Sci. 88, 159–181 (2015).
Moosavi, S. M., Jablonka, K. M. & Smit, B. The role of machine learning in the understanding and design of materials. J. Am. Chem. Soc. 142(48), 20273–20287 (2020).
Mirhoseini, A. et al. A graph placement methodology for fast chip design. Nature 594(7862), 207–212 (2021).
De Cao, N., & Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973 (2018).
Fey, M., & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., & Weinberger, K. Simplifying graph convolutional networks. In International conference on machine learning, pp. 6861–6871, PMLR (2019).


Acknowledgements

We thank Daniel Mackowski and Victor Garcia Satorras for useful discussion. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. This work was supported by an NSF Collaborative Research grant: HDR Elements: Software for a new machine learning based parameterization of moist convection for improved climate and weather prediction using deep learning 01-OAC 1835769.

Author information

Authors and Affiliations

Department of Earth and Environmental Engineering, Columbia University, New York, USA
K. D. Lamb & P. Gentine

Authors

K. D. Lamb
P. Gentine

Contributions

K.D.L. designed the study, ran the MSTM code, and implemented the GNN models. K.D.L. wrote the paper, with input from P.G.

Corresponding author

Correspondence to K. D. Lamb.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Lamb, K.D., Gentine, P. Zero-shot learning of aerosol optical properties with graph neural networks. Sci Rep 13, 18777 (2023). https://doi.org/10.1038/s41598-023-45235-8


Received: 31 August 2022
Accepted: 17 October 2023
Published: 31 October 2023
DOI: https://doi.org/10.1038/s41598-023-45235-8

