Abstract
Black carbon (BC), a strongly absorbing aerosol sourced from combustion, is an important short-lived climate forcer. BC’s complex morphology contributes to uncertainty in its direct climate radiative effects, as current methods to accurately calculate the optical properties of these aerosols are too computationally expensive to be used online in models or for observational retrievals. Here we demonstrate that a Graph Neural Network (GNN) trained to predict the optical properties of numerically generated BC fractal aggregates can accurately generalize to arbitrarily shaped particles, including much larger (\(10\times\)) aggregates than in the training dataset. This zero-shot learning approach could be used to estimate single-particle optical properties of realistically shaped aerosol and cloud particles for inclusion in radiative transfer codes for atmospheric models and remote sensing inversions. In addition, GNNs can be used to gain physical intuition on the relationship between small-scale interactions (here of the spheres’ positions and interactions) and large-scale properties (here of the radiative properties of aerosols).
Introduction
Carbonaceous aerosols such as black carbon (BC) are important short-lived climate forcers^{1,2}. To understand their impact on climate, accurate predictions of the optical properties of absorbing aerosols such as BC are needed in atmospheric models and observational retrievals: for estimating the top-of-the-atmosphere radiative effects of black carbon^{3} and the impact of aged soot on cloud formation^{4}, for the calculation of the mass absorption coefficient of BC deposited on snow^{5}, for estimating the relative shortwave heating rates for different types of combustion aerosols^{6}, for calculating particle-to-gas heat transfer to interpret laser-induced incandescence signals^{7}, for accurate inversions of imaging nephelometers^{8}, for constraining the index of refraction of biomass burning aerosols^{9}, and for interpreting the optical properties of aerosols deposited on filters^{10,11}. Accurate calculations of carbonaceous aerosol optical properties are also important for observational retrievals in other planetary atmospheres, as these aerosols may play a role in the radiative balance of e.g. the middle atmosphere of Jupiter^{12}.
BC particles in the atmosphere have a variety of sizes, shapes, and chemical compositions, all of which impact their optical properties (Fig. 1). BC’s optical properties depend on both the morphology of the primary (bare) BC particle and its internal mixing with other materials (coatings) through the condensation of gas-phase species during atmospheric aging. Both combustion conditions^{13} and atmospheric aging^{14} impact the morphology of these aerosols, which are fractal-like aggregates, typically embedded within (internally mixed) or attached to other aerosol components. The complex morphology of bare BC is generally not parameterized in models, although modeling bare BC as a sphere biases radiative forcing estimates, with too little warming by absorption and too much cooling by scattering^{15}. Internal mixing is modeled using a Mie Theory core-shell model, which approximates the bare BC portion as an absorbing “core”, with a concentric sphere of “coating” material with an index of refraction characteristic of the internally mixed material. Several recent papers have demonstrated that this Mie Theory core-shell approximation leads to an overprediction of BC absorption in models by as much as a factor of 2^{13,16}. In addition, not only are more accurate calculations of BC optical properties needed to better constrain models to observations, but models need to be capable of representing the heterogeneity of optical properties in diverse aerosol populations^{13,16}.
While models and observational retrievals have generally relied on Mie Theory, more accurate methods to predict the optical properties for arbitrarily shaped particles such as the Multiple Sphere T-Matrix Method (MSTM)^{17,18}, the discrete dipole approximation (DDA)^{19,20}, and the Generalized Multiple-Particle Mie (GMM) Theory^{21,22} have been developed. These methods approximate BC fractal aggregates as clusters of spheres (Fig. 1) and provide analytical solutions to the time-harmonic Maxwell’s equations for the multiple sphere system. However, these approaches are computationally expensive, often requiring hours or even days to compute the optical properties of single aerosol particles with complex morphologies^{23}. To mitigate this computational bottleneck, precalculated databases of fractal aggregate optical properties using these exact analytical methods have recently been created^{23,24,25,26}, but such approaches are limited to linear interpolation within the databases’ optical and morphological properties. There is still significant uncertainty about the fundamental properties of BC from different emission sources and under different combustion conditions, and the additional complexity of internal mixing with non-absorbing and absorbing materials during atmospheric aging^{2} would require these databases to cover a very large parameter space to accurately represent the range of conditions for BC aerosols observed in the atmosphere. Moreover, observational inversions of BC have greater uncertainty when performed with only a subset of possible parameters.
Machine learning offers a promising approach for reducing computational bottlenecks by speeding up numerically intensive aspects of atmospheric models^{27,28}. As such, it could offer an efficient alternative to compiling precomputed databases for BC’s optical properties. However, machine learning methods are traditionally strongly dependent on the data they are trained with, and struggle to generalize beyond the training distribution. One previous study investigated a machine learning approach to predicting BC’s optical properties from its morphological parameters and index of refraction using a support vector machine (SVM) trained on accurate MSTM calculations, but it could not accurately predict the optical properties of aggregates with morphological parameters beyond those used in the initial training data set^{29}. Other brute force approaches such as neural networks (NN) or random forests (RF) will similarly struggle to generate realistic BC properties outside of the training datasets.
Here we show that the optical properties of bare BC with complex morphology can be accurately predicted with a graph neural network (GNN) by representing BC fractal aggregates as networks of interacting spheres. GNNs are recently developed machine learning algorithms that learn on graph-structured data sets, allowing models to directly include arbitrary relational information^{30,31}. These models have shown great promise in predicting the large-scale properties of structured physical science datasets such as molecules^{32,33}, protein-protein interaction networks^{34}, and glasses^{35}. GNNs have demonstrated skill in predicting complex global features of physical systems through learning simpler local physics^{36}; here we demonstrate that through including local information about BC’s structure, BC’s global properties can be inferred. Importantly, because GNNs learn models for specific substructures (i.e. the nodes and their relationships with their neighbors in the graph), they are able to immediately generalize to graphs with arbitrary numbers of nodes; we exploit this feature of GNNs to predict the optical properties of BC aggregates that are significantly larger than those used in the training data set. This zero-shot learning (where models can immediately generalize to samples not represented in their original training data) paves the way towards new, flexible parameterizations of aerosol microphysical properties and serves as a template for the use of GNNs in the Earth sciences.
BC fractal aggregates as networks
Physical properties of bare BC
Primary (bare) BC particles are fractal-like aggregates with geometries that can be described according to a statistical scaling rule as
\[N_{s}=k_{f}\left( \frac{R_{g}}{a}\right) ^{D_{f}}, \tag{1}\]
where a is the primary particle mean radius, \(k_{f}\) is the fractal prefactor, \(D_{f}\) is the fractal (Hausdorff) dimension, \(N_{s}\) is the number of primary spheres, or monomers, in the aggregate, and \(R_{g}\) is the radius of gyration, defined as
\[R_{g}^{2}=\frac{1}{N_{s}}\sum _{i=1}^{N_{s}}\left| \textbf{r}_{i}-\textbf{r}_{0}\right| ^{2}, \tag{2}\]
where \(\textbf{r}_{i}\) and \(\textbf{r}_{0}\) denote the i-th monomer center and the center of mass of the cluster, respectively (assuming all monomers have the same mass^{37}). In addition to the aggregate geometry, the basic physical properties of these particles follow this scaling law^{38}. As a consequence of their fractal nature, aggregates are self-similar on different length scales. The fractal dimension \(D_{f}\) can be thought of intuitively as the shape-filling capacity of the aggregate; aggregates with smaller fractal dimensions are “fluffier”, while aggregates with larger fractal dimensions are denser. The fractal prefactor \(k_{f}\) of the aggregate is related to the packing of spheres into space and the anisotropy of the aggregate, with more “stringy” aggregates having smaller values of \(k_{f}\), and more isotropic and collapsed aggregates having larger values of \(k_{f}\)^{39,40}.
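As an illustration of the quantities defined above, the following minimal sketch computes the radius of gyration from monomer centers and inverts the scaling rule for the prefactor. It assumes the standard forms \(N_{s}=k_{f}(R_{g}/a)^{D_{f}}\) and \(R_{g}^{2}\) equal to the mean squared distance of the monomer centers from the center of mass; the function names and the five-monomer chain are hypothetical and are not taken from the aggregate generator used in this work.

```python
import math

def radius_of_gyration(centers):
    """Root-mean-square distance of equal-mass monomer centers from their center of mass."""
    n = len(centers)
    com = [sum(c[i] for c in centers) / n for i in range(3)]
    mean_sq = sum(sum((c[i] - com[i]) ** 2 for i in range(3)) for c in centers) / n
    return math.sqrt(mean_sq)

def fractal_prefactor(centers, a, d_f):
    """Solve N_s = k_f * (R_g / a) ** D_f for k_f, given an assumed D_f."""
    return len(centers) / (radius_of_gyration(centers) / a) ** d_f

# Hypothetical example: a straight chain of 5 touching monomers of radius a = 1.
chain = [(2.0 * i, 0.0, 0.0) for i in range(5)]
r_g = radius_of_gyration(chain)
k_f = fractal_prefactor(chain, a=1.0, d_f=1.0)
print(r_g, k_f)
```

For a single aggregate, \(k_{f}\) and \(D_{f}\) cannot both be recovered from one \((N_{s}, R_{g})\) pair, which is why the sketch takes \(D_{f}\) as an input.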
The fractal-like nature of these aerosols is a result of their formation from gas-phase precursors through the aggregation and growth of hydrocarbon clusters during incomplete combustion, although this process is not yet completely understood^{41}. The initial morphology depends on both the combustion conditions and the emission source, with different observational methods also impacting the retrieved parameters^{15}. After their initial formation during combustion, atmospheric aging (due to cloud processing or the condensation of gas-phase species) leads to these aerosols becoming more compact, causing \(D_{f}\) to increase over time. This aging is expected to lead to a decrease in their top-of-the-atmosphere radiative effects^{13}. Previous work has shown that \(k_{f}\) determines the compactness of aggregate branches, although little is understood about \(k_{f}\)’s evolution over time^{15}.
Numerically generated fractal aggregates
To investigate how fractal aggregate particles can be modeled as networks of interacting spheres, we numerically generated fractal aggregates with \(N_{s}\) spheres using a cluster-cluster algorithm^{42} based on the one described in ref.^{38}, which uses a Monte Carlo approach to randomly generate aggregates with a specified fractal dimension \(D_{f}\) and fractal prefactor \(k_{f}\). We generate Cartesian coordinates for the monomers in the aggregate in dimensionless form by scaling by a factor of \(k=\frac{2\pi }{\lambda }\), where \(\lambda\) is the wavelength of the incident light.
Characteristic length scale
The characteristic length scale of a network with N nodes is \(C = Log(N)\)^{43}. Here we want to develop a method for rendering fractal aggregates as graphs, with the assumption that the monomers in the aggregate should be represented by nodes in the graph. To represent fractal aggregates as graphs, monomers with center positions closer together than the characteristic length scale C of a network with \(N_{s}\) nodes,
\[C=X_{v}Log(N_{s}), \tag{3}\]
are connected, where \(X_{v}=ka\) is the monomer size parameter (Fig. 2). We multiply the length scale by \(X_{v}\) to give a consistent number of edges independent of the size parameter of the aggregate, such that aggregates with the same fractal parameters but different size parameters are encoded within the same graph structure. An example of the resulting undirected graph structure and adjacency matrix for two aggregates with different fractal dimensions but the same number of monomers is shown in Fig. 3a–d. This scaling encodes the density of edges in local neighborhoods relative to the fractal dimension of the aggregate, regardless of the actual size of the aggregate. The total number of edges in the graph is then proportional to both \(N_{s}\) and \(D_{f}\) (Fig. 3e), with the average degree of nodes increasing with \(D_{f}\) (SI Fig. S4a). The degree distribution of nodes also depends on the fractal prefactor \(k_{f}\) (SI Fig. S4b).
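The graph construction described above can be sketched in a few lines. This is a minimal illustration, not the code used in this work: it assumes the cutoff is \(C = X_{v}\,Log(N_{s})\) with a natural logarithm (the base is not stated in the text), that coordinates are already dimensionless (multiplied by \(k\)), and that a brute-force pairwise search is acceptable. The function name and the ten-monomer chain are hypothetical.

```python
import math
from itertools import combinations

def aggregate_to_graph(coords, x_v):
    """Connect monomers whose centers are closer than C = x_v * ln(N_s).

    coords: dimensionless monomer centers (already multiplied by k = 2*pi/lambda).
    Returns (cutoff, edges), with edges as (i, j) index pairs and i < j.
    """
    n_s = len(coords)
    cutoff = x_v * math.log(n_s)  # natural log is an assumption
    edges = [
        (i, j)
        for i, j in combinations(range(n_s), 2)
        if math.dist(coords[i], coords[j]) < cutoff
    ]
    return cutoff, edges

def mean_degree(n_s, edges):
    """Average number of neighbors per node in an undirected graph."""
    return 2 * len(edges) / n_s

def touching_chain(n_s, x_v):
    """Hypothetical test shape: neighboring centers are 2 * x_v apart."""
    return [(2 * x_v * i, 0.0, 0.0) for i in range(n_s)]

# The same shape at two size parameters yields the same edge list.
_, edges_large = aggregate_to_graph(touching_chain(10, 0.5), 0.5)
_, edges_small = aggregate_to_graph(touching_chain(10, 0.1), 0.1)
print(edges_large == edges_small, mean_degree(10, edges_large))
```

Touching monomers have centers \(2X_{v}\) apart, so with a natural logarithm they are connected only when \(Log(N_{s}) > 2\), i.e. for \(N_{s} \ge 8\). For large aggregates a spatial index (e.g. a k-d tree) would avoid the quadratic pair loop.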
GNN model for BC optical properties
Accurate solutions for the electromagnetic scattering and absorption properties of multiple sphere clusters (as BC aggregates are typically modeled) are computationally expensive because a full-wave optics treatment is needed. In the general case, spheres interact with one another, and the total scattered field is a superposition of the components radiated from each sphere in the system^{44}. While the continuity equation at the surface of each sphere in the system can be solved analytically by expanding the incident and scattered fields from each sphere in terms of vector spherical wave functions, this approach generates a very large system of coupled linear equations that must be solved iteratively^{45}. Additional details about the formal solution are given in Supplementary Information S1.
While this approach provides a fully analytical solution for light scattering from the multiple sphere cluster, the computational time for these brute-force approaches scales significantly with \(N_{s}\) and \(X_{v}\), as they do not take into account specific details of BC’s topological structure, which could lend itself to model order reduction. Filippov et al.^{38} previously explored the relationship between the morphology of BC and its aggregate physical properties using the Rayleigh-Debye-Gans (RDG) approximation and found that aggregates with similar fractal parameters also have similar physical properties. Recent work in ref.^{23} found empirical relationships between the optical properties of aggregates and their morphological parameters using extensive MSTM calculations. Machine learning offers an alternative approach for learning relevant predictors without the need for human-defined features; GNNs in particular can learn features that correspond to the relationship between the nodes (the individual spheres) and the large-scale physical properties of the aggregates.
GNNs are particularly attractive as emulators of MSTM for three reasons. First, they provide strong relational inductive bias, which typically means that they require less training data than fully connected neural networks or convolutional neural networks to make skillful predictions. Since MSTM is relatively slow (and methods such as DDA are approximately \(10\times\) slower than MSTM), it is nontrivial to develop large training data sets for machine learning algorithms. Second, the nontrivial topological structure of these aerosols is directly related to the complexity of calculating their optical properties, as the radiation incident on each individual monomer is a function of the position and orientation of all of the other monomers in the aggregate, with the neighboring monomers likely to have the most significant influence. The GNN approach of framing this problem from the perspective of message passing between neighboring nodes is directly analogous to the electromagnetic scattering and absorption problem for the multiple sphere cluster. Finally, as discussed in the introduction, GNN emulation of physical simulators has been shown to generalize to new, previously unseen realizations of physical systems (so-called zero-shot performance)^{31}. To bridge the gap between the very accurate physical information gained in process-level studies of individual aerosol properties and the understanding of how populations of these aerosols evolve in atmospheric models, we need either much better approximation methods or much faster methods to accurately calculate aerosol properties. GNN emulators that quickly and accurately generalize to new configurations could provide an online approach to estimate the optical properties of populations of aerosols in atmospheric models.
To investigate the connection between BC’s fractal structure and its optical properties, we trained a GNN to predict the optical properties of BC aggregates, using the values from an analytical solution for the electromagnetic scattering and absorption properties (from MSTM) as ground truth (Fig. 2). GNNs propagate information between nodes, capturing the topological information about the graph structure and aggregating the node features. We tested several different approaches for the propagation rule, including a simple graph convolution network^{62}, a graph convolutional network^{30}, and an Interaction Network (IN)^{31}. We found the IN gave the best performance for predicting both the integral and angle-resolved optical properties. The IN (Fig. 2) is based on message passing, where nodes send and receive messages along edges from their neighbors. The messages are aggregated for each node and the nodes are updated based on the central node features and the messages received from neighboring nodes. Graph-level predictions are made by aggregating the updated node embeddings from all the nodes in the graph using a graph pooling operation (the pooling in Fig. 2). We tested summation, averaging, and maximum pooling as graph pooling operations, and found that summation works best. After the graph pooling operation, a final linear transformation is performed to transform the aggregated node embeddings to the target predictions (the graph readout in Fig. 2). Here we predict the total extinction, scattering, and absorption efficiencies \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), the asymmetry parameter, g, and the angle-resolved elements of the scattering phase matrix \(S_{ij}(\theta )\) for the orientation-averaged case (see Methods for discussion of aerosol optical properties and data sets). Further details of the GNN approaches tested in this work are provided in the SI.
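The message-passing, pooling, and readout steps described above can be illustrated with a deliberately simplified sketch. This is not the Interaction Network architecture or the trained model from this work: the single-layer tanh message and update functions, the hidden width, and the random (untrained) weights are all assumptions chosen only to show the data flow, so the numerical outputs are meaningless.

```python
import math
import random

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def interaction_layer(nodes, edges, edge_feats, w_msg, w_upd):
    """One round of message passing on an undirected graph."""
    hidden = len(w_msg)
    inbox = [[0.0] * hidden for _ in nodes]
    for (i, j), e in zip(edges, edge_feats):
        for src, dst in ((i, j), (j, i)):  # send along both directions
            msg = [math.tanh(v) for v in matvec(w_msg, nodes[src] + nodes[dst] + [e])]
            inbox[dst] = [a + b for a, b in zip(inbox[dst], msg)]  # sum incoming messages
    # Update each node from its own features and its aggregated messages.
    return [[math.tanh(v) for v in matvec(w_upd, h + m)] for h, m in zip(nodes, inbox)]

def predict(nodes, edges, edge_feats, params):
    w_msg, w_upd, w_out = params
    updated = interaction_layer(nodes, edges, edge_feats, w_msg, w_upd)
    pooled = [sum(col) for col in zip(*updated)]  # summation pooling over nodes
    return matvec(w_out, pooled)                  # linear graph readout

rng = random.Random(0)
n_feat, hidden, n_out = 5, 8, 4  # node features: x_v, Re(n_k), x, y, z
params = (
    random_matrix(hidden, 2 * n_feat + 1, rng),
    random_matrix(hidden, n_feat + hidden, rng),
    random_matrix(n_out, hidden, rng),
)

# Hypothetical 3-monomer aggregate; edge features are center-to-center distances.
nodes = [[0.5, 1.8, 0.0, 0.0, 0.0], [0.5, 1.8, 1.0, 0.0, 0.0], [0.5, 1.8, 1.0, 1.0, 0.0]]
edges = [(0, 1), (1, 2)]
edge_feats = [1.0, 1.0]
outputs = predict(nodes, edges, edge_feats, params)  # 4 untrained graph-level values
```

Because the same node-level functions are applied to every node and the pooling is a sum, the same parameters accept graphs with any number of nodes and give the same result under any relabeling of the nodes, which is the property the zero-shot experiments rely on.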
For each training example, we input \(X_{v}\), the real part of the index of refraction \(Re(n_{k})\) (since we consider only cases where the imaginary part is \(Im(n_{k})=Re(n_{k})-1\)), and the dimensionless coordinates of each sphere as node features. As edge features, we use the distance between neighboring spheres. We trained the model using 15,314 aggregates from the training data set, as the training loss did not significantly decrease with additional samples (SI Fig. S13); training data sets as small as 3000 aggregates showed reasonable generalization performance. The training data set consisted of aggregates with a small number of monomers (\(N_{s} < 100\)). We tested the model on an independent test set of 7656 aggregates with the same distribution of parameters as the training data set (\(N_{s} < 100\)). We further investigated the generalizability of the model on an independent zero-shot test set of 440 aggregates that were significantly larger \((100< N_{s} < 1000)\) than the ones the model was trained on. An additional 440 large aggregates were used as a zero-shot validation data set to determine which model architecture provided the best zero-shot performance (SI Figs. S9–S12). While the model weights were not directly trained on this zero-shot validation data set, the hyperparameters for the best model architecture were determined from performance on this data set; thus, we use an independent zero-shot test data set to evaluate generalization performance. \(N_{s}=100\) was chosen as the maximum size for aggregates in the training data set, as smaller maximum sizes increased the bias in the zero-shot performance (SI Fig. S14). The zero-shot test data set was evenly distributed among the aggregate parameters (Fig. S2) to provide an estimate of generalization performance across the full parameter space.
Figure 4 shows the IN predictions compared to the actual values for \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), and g, with the training data shown in blue, the test data set of smaller aggregates in yellow (top row, Fig. 4a–d) and the zero-shot test data set in orange (bottom row, Fig. 4e–h). Figure 5 shows the predictions for the \(S_{11}(\theta )\) element of the scattering phase matrix for several different aggregates in the test data sets of smaller aggregates (Fig. 5a) and larger aggregates (Fig. 5c). For the test data with the same distribution of parameters as the training data set (\(N_{s} < 100\)), the model predictions were very close to the true values. For the zero-shot test data set, predictions for both integral and angle-resolved optical properties were reasonable across the entire range of size parameters (\(X_{v}=0.1\) to 1.0), indices of refraction (\(n_{k} = 1.4+0.4i\) to \(n_{k}=2.0+1.0i\)), and fractal parameters. For the prediction of \(S_{11}\), both the magnitude and functional form were well approximated across the range of parameters in the test set, although the model did deviate slightly more from the true values for larger \(N_{s}\) and \(X_{v}\) (e.g. the green line in Fig. 5c). Predictions for the entire angle-resolved scattering phase matrix elements \(S_{ij}(\theta )\), for \(j \ge i\), were also reasonable (see SI Fig. S16).
Table 1 gives the mean absolute percentage error (MAPE) for the predictions from the IN model for the integral optical properties and the asymmetry parameter for the training, test, and zero-shot test data sets. We use MAPE as a metric to assess the performance of the model predictions because this metric is independent of the size of the data sets. MAPE can be interpreted in terms of relative error, which means that the performance on the test data set and zero-shot test data set are directly comparable; metrics such as MSE depend on the absolute magnitude of the integral optical properties, which differ between the test and zero-shot test data sets. We find that the predictions for \(Q_{ext}\) and \(Q_{abs}\) are within 2% of the true value for the training and test data sets, and within 4% of the true values for the zero-shot test data set. \(Q_{scat}\) and g have more significant deviations between the true and predicted values for the IN model (Table 1, 2nd column), but this is mainly due to a bias in the predictions for the smallest \(X_{v}\) values, where the magnitude of \(Q_{scat}\) is very small. At larger \(X_{v}\), the model performance is within 2–9% of the true value for the training and test data sets, and within 4–8% for the zero-shot test data set. The IN model generally performs best at larger size parameters for the predictions of the integral optical properties. The bias for smaller values of \(X_{v}\) may be reduced by training on each \(X_{v}\) separately or, alternatively, by using methods such as meta-learning^{46}.
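The distinction drawn above between relative and absolute error metrics can be seen in a short sketch. The efficiencies below are made-up numbers, not values from Table 1; the second set is simply the first scaled by a factor of 10 to mimic a change in the magnitude of the targets.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error, in the squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions with identical relative errors (10% and 5%).
small_true, small_pred = [0.1, 0.2], [0.11, 0.19]
large_true, large_pred = [1.0, 2.0], [1.1, 1.9]

print(mape(small_true, small_pred), mape(large_true, large_pred))  # equal up to rounding
print(mse(small_true, small_pred), mse(large_true, large_pred))    # differ by about 100x
```

The same property explains the large MAPE for \(Q_{scat}\) at the smallest \(X_{v}\): when the true value is close to zero, a small absolute error becomes a large relative one.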
In addition to generalizability, the IN model demonstrated physical consistency in its predictions for the aggregate optical properties. The three integral efficiencies are not independent, as \(\langle Q_{sca} \rangle +\langle Q_{abs} \rangle =\langle Q_{ext} \rangle\). The model directly inferred this dependency for both the training and test sets without it being imposed as a constraint. Additionally, the asymmetry parameter g is the cosine-weighted integral of \(S_{11}\) over the solid angle^{44}, which for an azimuthally averaged \(S_{11}(\theta )\) can be written as
\[g=\frac{\int _{0}^{\pi }S_{11}(\theta )\cos \theta \sin \theta \,d\theta }{\int _{0}^{\pi }S_{11}(\theta )\sin \theta \,d\theta }. \tag{4}\]
Although this integral constraint was not explicitly imposed, the model predictions were consistent with it (Fig. 5b for \(N_{s}<100\), and Fig. 5d for \(N_{s}>100\)).
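Both consistency checks can be run on any set of predictions. The sketch below is one way to do so under stated assumptions: g is taken as the \(\cos \theta\)-weighted average of an azimuthally symmetric \(S_{11}(\theta )\), normalized by its own integral so that the result does not depend on how \(S_{11}\) is scaled, and a simple midpoint rule is used for the quadrature. The Henyey-Greenstein function is used only as an analytic stand-in with a known asymmetry parameter; it is not part of this work.

```python
import math

def closure_residual(q_ext, q_sca, q_abs):
    """Relative violation of Q_sca + Q_abs = Q_ext."""
    return abs(q_sca + q_abs - q_ext) / q_ext

def asymmetry_from_s11(s11, n=2000):
    """g = <cos(theta)> for an azimuthally symmetric phase function s11(theta)."""
    d = math.pi / n
    num = den = 0.0
    for k in range(n):
        theta = (k + 0.5) * d  # midpoint rule on [0, pi]
        w = s11(theta) * math.sin(theta) * d
        num += w * math.cos(theta)
        den += w
    return num / den

def henyey_greenstein(g):
    """Analytic phase function whose asymmetry parameter equals g (stand-in only)."""
    return lambda theta: (1 - g * g) / (1 + g * g - 2 * g * math.cos(theta)) ** 1.5

g_recovered = asymmetry_from_s11(henyey_greenstein(0.6))
g_rayleigh = asymmetry_from_s11(lambda theta: 1 + math.cos(theta) ** 2)
print(g_recovered, g_rayleigh, closure_residual(1.0, 0.25, 0.75))
```

In practice `s11` would be replaced by an interpolant of the predicted \(S_{11}\) on the model's angular grid, and the result compared with the separately predicted g.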
Analysis of the GNN predictions
To understand how the IN model predicts the optical properties of BC fractal aggregates, including those much larger than the model was trained on, we emphasize that the graph input for the model does not directly include \(D_{f}\) or \(k_{f}\) as features but rather the fractal structure is implicitly encoded as the interactions between the neighboring spheres. The previously used SVM approach to predict BC’s optical properties included \(N_{s}\), \(D_{f}\), and \(k_{f}\) as features to predict \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), and g^{29}. Since the network structure in the IN approach is directly learned from the sphere positions, and the model is learned at the node level, the IN approach can generalize beyond its initial training set for these morphological parameters to unseen configurations.
The generalization of the IN model to a range of \(D_{f}\) is an important feature, as it is challenging to find approximations that are valid across fractal dimensions^{47}. Because the IN model learns about the local neighborhood of each sphere, it is able to estimate the impacts of screening on absorption and scattering more accurately than the RDG approximation^{44,47}, an approach often used to approximate the optical properties of BC aggregates in a computationally efficient manner as an improvement on the equivalent sphere model. RDG assumes that individual monomers only interact with the incident electromagnetic field (neglecting multiple interactions), which can lead to absorption being underpredicted by 10–20% and to g being underpredicted by more than a factor of 10^{15}. The IN model effectively learns, in an unsupervised manner, a simplified sphere-level model that more fully captures the complexity of the optical properties of the full analytical solution^{15}. As noted earlier, the best performance for the IN model used summation for graph pooling, which is physically consistent with the node-level model learning the Mie theory solution for the individual spheres in the aggregate, given their interaction with neighboring spheres.
The optical properties of aggregates in this regime can be modeled with a fairly shallow graph model (for the IN model a single layer performed best; for the GCN little improvement was seen beyond 3 or 4 layers, Fig. S5), suggesting that the majority of the structure influencing the optical properties of aerosols in this regime can be approximated from local interactions. We also investigated using a length scale of \(C=X_{v}Log(N_{s})/Log(Log(N_{s}))\) (characteristic of scale-free networks) to form graphs from aggregates^{43}, rather than Eq. 3. This length scale has the advantage that the degree of each node scales less quickly with \(N_{s}\), but the IN model performed worse in this case. This indicates that including a larger local neighborhood at each layer (Eq. 3) is more informative for the model.
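To make the difference between the two candidate length scales concrete, the following sketch evaluates both cutoffs in units of \(X_{v}\). It assumes natural logarithms, which the text does not specify, and the function names are illustrative.

```python
import math

def cutoff_log(n_s, x_v=1.0):
    """Length scale C = x_v * ln(N_s)."""
    return x_v * math.log(n_s)

def cutoff_scale_free(n_s, x_v=1.0):
    """Alternative C = x_v * ln(N_s) / ln(ln(N_s)), characteristic of scale-free networks."""
    return x_v * math.log(n_s) / math.log(math.log(n_s))

for n_s in (20, 100, 1000):
    print(n_s, round(cutoff_log(n_s), 2), round(cutoff_scale_free(n_s), 2))
```

Under this assumption the scale-free cutoff grows much more slowly between \(N_{s}=100\) and \(N_{s}=1000\) (from roughly \(3.0X_{v}\) to \(3.6X_{v}\), compared with \(4.6X_{v}\) to \(6.9X_{v}\)), so each node sees a smaller neighborhood in large aggregates.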
Discussion and outlook
The network approach presented here provides a new framework for understanding the microphysical relationship between the morphological properties of BC and its larger scale physical properties. Here we have chosen to focus on the prediction of optical properties for numerically generated fractal aggregates, as the generation of these aggregates from combustion processes and their transformation during atmospheric aging is not yet completely understood. However, applying network theory to atmospheric aerosols suggests new directions for thinking about the generation of these fractal aggregates through combustion processes due to the connection between complex networks and percolation theory^{48}. Here we have used a cluster-cluster algorithm, although previous work has noted that the morphology of numerically generated fractal aggregates depends not only on the parameters (\(N_{s}\), a, \(D_{f}\), and \(k_{f}\)) defining the shape of the aggregate, but also on which algorithm is used to generate the sphere positions (e.g. diffusion-limited aggregation or diffusion-limited cluster aggregation)^{38,39}. The network approach provides a new framework from which to understand how realistically numerical algorithms reproduce the properties of aerosols formed during incomplete combustion through comparison of their network characteristics^{43}. This approach may also be useful for inferring the 3-dimensional structure of aggregates from 2-dimensional transmission electron microscope (TEM) images of these aerosols^{10,11}, since it relates the relative positions of spheres to their overall morphological features; 2D methods have previously been shown to systematically underestimate the fractal dimension of BC^{49}. Recent methods such as graph cumulants could provide sophisticated approaches to describe substructures of graphs (such as motifs or cliques used to describe clustering)^{50}.
Since any particular network observation is a single realization of an underlying generative process (in this case, the generation of primary aerosol particles from combustion sources), this framework could allow for an unbiased estimator of the variance of the propensity for specific graph substructures as a result of this generative process. These estimators could be used to compare specific aggregate-generating algorithms to observations of real fractal aerosol particles to assess the realism of the algorithms.
As a proof of concept, we have trained a GNN to predict the optical properties of bare BC fractal aggregates with a range of different fractal parameters. This study demonstrates that modeling aerosol fractal aggregates as networks of interacting spheres provides morphological information that allows the machine learning model to extrapolate far beyond its initial training data set. This approach may also be useful for other fractal systems found in nature, such as turbulence, vegetation, or river networks.
BC in the atmosphere is typically internally mixed. The GNN approach provides an obvious extension to internally mixed aerosols (Fig. 1), as the thickness of coatings and their indices of refraction or organic fraction could be included as additional node-level features (in the thinly coated case) or graph-level features (for the thickly coated case). Other factors influencing the optical properties of aggregates such as “necking” between overlapping monomers could be included as edge features. Because atmospheric aerosol retrievals rely on orientation-averaged parameters, models for predicting the scattering phase function should be equivariant under rotations. Recently developed equivariant machine learning methods^{51,52,53,54} may provide improved prediction of the orientation-averaged optical properties.
Uncertainty in BC direct radiative climate effects is attributable to multiple factors, including BC’s emissions, lifetime, atmospheric processing, and optical properties^{1,2,55}; the GNN approach could help resolve this uncertainty both by improving the interpretation of BC observations and by allowing BC’s morphology to be accurately represented in atmospheric models in a computationally efficient manner. As a greater understanding of BC’s physical properties from different source contributions and atmospheric aging pathways becomes available through laboratory and observational studies^{13,14,16}, the major remaining hurdle to accurately representing BC in models will be computational.
While previous exact analytical methods have computational wall times ranging from hours to days for larger aggregates, inference takes less than 0.3 s per aggregate for the trained GNN model (on a CPU; see SI Fig. S15). The computational time for these exact analytical methods has precluded exact calculations of aerosol optical properties being used in models or observational retrievals. CELES, a CUDA-accelerated version of MSTM capable of running on a GPU, demonstrated a speed-up of a factor of 1.5–6 over MSTM, but was still too slow to be implemented online in models^{56}. The significantly faster timescale for the GNN model, as well as its generalizability to arbitrarily shaped aggregates compared to more standard ML methods, has the potential to transform existing model parameterizations for BC. For MSTM, computational wall times scale with \(N_{s}\), \(X_{v}\), and \(D_{f}\); in the GNN approach, the total inference time and memory scale with \(N_{s}\) and \(D_{f}\) but are no longer a function of \(X_{v}\).
We have focused here on the forward problem of predicting the optical properties of BC given an assumed single-particle morphology; however, such an approach may also be useful for the inverse problem, i.e. inferring the morphology given the scattering phase function and integral optical properties. This approach could also provide insight into other physical properties which require detailed information about particle morphology^{38}, such as the energy and heat transfer between aggregates and the surrounding gas needed to develop physical models of laser-induced incandescence^{7,57}. Radiative transfer calculations for mineral dust and ice crystals also rely on detailed information about particle morphology, suggesting that the GNN approach would be useful for modeling their optical properties as well. This approach could mitigate several longstanding issues with model parameterizations and observational retrievals for these species by providing flexible parameterizations of arbitrarily shaped aerosol and cloud particles that are fast enough to be deployed online in atmospheric models.
Finally, these methods have potential for new applications of machine-learning-assisted materials discovery^{58,59}. Proposed geoengineering approaches to mitigate global or regional impacts of climate change, such as stratospheric aerosol injection, marine cloud brightening, or precipitation enhancement, rely on the development of novel aerosol materials. Generative graph models could be used to determine optimal aerosol morphologies with physical properties specific to these applications at a fraction of the cost of traditional numerical methods^{60}.
Methods
Numerical aggregate properties
Cartesian coordinates for the positions of spheres in aggregates were determined using a cluster-cluster algorithm^{38,42}. This algorithm starts with primary clusters of size \(N_{c}\), and then randomly agglomerates these clusters pairwise into larger clusters. The algorithm repeats this process over multiple levels until it has combined all clusters into one larger cluster, which satisfies the scaling law given by Eq. (1). Primary clusters of size \(N_{c}=3,4,5,7,9,11,13,15,17,19\) were used to generate aggregates with \(N_{s}\) = 8–960 spheres and fractal dimensions \(D_{f}\) = 1.8–2.3. Following^{23}, we assume a fractal prefactor of \(k_{f}=1.2\) (for the aggregates used in the MSTM calculations). We also investigated the network parameters of aggregates with \(k_{f}=\) 1.0–1.5 for a given \(D_{f}=1.8\) (Fig. S4b). Aerosols are assumed to consist of isotropic, homogeneous spheres with size parameters \(X_{v} =\) 0.1, 0.3, 0.5, 0.7, 0.9, and 1.0, corresponding to monomer radii of 7–72 nm for incident light at 450 nm and 10–104 nm at 650 nm. For each primary cluster size and fractal dimension, 10 aggregate realizations were randomly generated.
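As a rough check on the numbers above, the following sketch converts size parameters to monomer radii and evaluates a radius of gyration from the fractal parameters. It assumes the standard definitions \(X = 2\pi a/\lambda\) and \(N_{s} = k_{f}(R_{g}/a)^{D_{f}}\) (taken here to be the form of Eq. (1)); the function names are illustrative and are not part of the authors' code.

```python
import math

def monomer_radius_nm(size_parameter: float, wavelength_nm: float) -> float:
    """Monomer radius a from the size parameter X = 2*pi*a / wavelength (assumed definition)."""
    return size_parameter * wavelength_nm / (2.0 * math.pi)

def radius_of_gyration(monomer_radius: float, n_spheres: int,
                       fractal_dimension: float, prefactor: float = 1.2) -> float:
    """Invert the fractal scaling law N_s = k_f * (R_g / a)**D_f for R_g."""
    return monomer_radius * (n_spheres / prefactor) ** (1.0 / fractal_dimension)

# Size parameters listed in the text.
SIZE_PARAMETERS = (0.1, 0.3, 0.5, 0.7, 0.9, 1.0)

for wavelength_nm in (450.0, 650.0):
    radii = [monomer_radius_nm(x, wavelength_nm) for x in SIZE_PARAMETERS]
    print(f"{wavelength_nm:.0f} nm:", ", ".join(f"{r:.1f}" for r in radii), "nm")

# Radius of gyration (same units as the monomer radius) for one example aggregate.
print(radius_of_gyration(monomer_radius=20.0, n_spheres=120, fractal_dimension=2.0))
```

The smallest and largest radii at each wavelength come out close to the quoted 7–72 nm and 10–104 nm ranges, which supports the assumed definition of the size parameter.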
Aerosol optical properties
For radiative transfer applications, the parameters typically needed are the orientation-averaged total scattering \(\langle Q_{sca} \rangle\), extinction \(\langle Q_{ext}\rangle\), and absorption \(\langle Q_{abs}\rangle\) efficiencies, as well as the asymmetry parameter \(g = \langle C_{sca} \cos(\theta')\rangle\). Here \(C_{sca}\) is the scattering cross-section, which is related to the efficiency by \(Q_{sca}=C_{sca}/(\pi a_{agg}^{2})\), where \(a_{agg}\) is the effective radius of the aggregate. The asymmetry parameter describes the relative amounts of forward- and backscattered light. Other parameters relevant for radiative transfer, such as the single scattering albedo (SSA), can be derived from these parameters (SSA = \(Q_{sca}/Q_{ext}\)). The mass absorption coefficient (MAC) or mass extinction coefficient (MEC) is typically used to relate emissions of these aerosols to their direct radiative effects, and these coefficients are sometimes estimated theoretically from \(\langle C_{abs}\rangle\) or \(\langle C_{ext}\rangle\) with assumptions about particle density.
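The relations in this paragraph can be written out as a few small helper functions. This is a minimal sketch: the function names are hypothetical, the BC density of 1800 kg m\(^{-3}\) is an assumed illustrative value rather than one given here, and the aggregate mass is assumed to be the sum of \(N_{s}\) identical solid spheres.

```python
import math

def efficiency(cross_section: float, a_agg: float) -> float:
    """Q = C / (pi * a_agg**2); cross_section and a_agg**2 must share units."""
    return cross_section / (math.pi * a_agg ** 2)

def single_scattering_albedo(q_sca: float, q_ext: float) -> float:
    """SSA = Q_sca / Q_ext."""
    return q_sca / q_ext

def mass_absorption_coefficient(c_abs_m2: float, n_spheres: int,
                                monomer_radius_m: float,
                                density_kg_m3: float = 1800.0) -> float:
    """MAC in m^2/kg, assuming the aggregate is n_spheres identical solid spheres.

    The default density is an assumed value for illustration only.
    """
    sphere_volume = (4.0 / 3.0) * math.pi * monomer_radius_m ** 3
    mass_kg = n_spheres * sphere_volume * density_kg_m3
    return c_abs_m2 / mass_kg

# Illustrative (made-up) efficiencies, not values from the data set.
q_ext, q_sca = 1.0, 0.2
q_abs = q_ext - q_sca  # extinction = scattering + absorption
print(single_scattering_albedo(q_sca, q_ext), q_abs)
```

Because the MAC depends directly on the assumed density and monomer size, any comparison with measured values should state both.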
The scattering phase function relates the incident and scattered Stokes parameters, i.e., it indicates how light scattered from the particle is transformed relative to the incident light in terms of its intensity and polarization state^{44}. Here we assume initially unpolarized incident light, in which case the \(S_{11}\) element specifies the angular distribution of the intensity of scattered relative to incident light. The scattered light is partially polarized, with the degree of polarization given by \(\sqrt{(S_{21}^{2}+S_{31}^{2}+S_{41}^{2})/S_{11}^{2}}\).
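The degree-of-polarization expression above translates directly into code. The sketch below uses a hypothetical function name and takes the first-column scattering matrix elements at a single scattering angle as plain floats.

```python
import math

def degree_of_polarization(s11: float, s21: float,
                           s31: float = 0.0, s41: float = 0.0) -> float:
    """sqrt((S21^2 + S31^2 + S41^2) / S11^2) for unpolarized incident light."""
    if s11 <= 0.0:
        raise ValueError("S11 is a scattered intensity and must be positive")
    return math.sqrt(s21 ** 2 + s31 ** 2 + s41 ** 2) / s11

# Illustrative (made-up) matrix elements at one scattering angle.
print(degree_of_polarization(s11=2.0, s21=-1.0))
```

For randomly oriented particles, \(S_{31}\) and \(S_{41}\) are often negligible, which is why they default to zero here; that default is a convenience of this sketch, not a statement about the MSTM output.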
MSTM calculations of bare BC optical properties
To determine the ground-truth optical properties for the BC fractal aggregates generated by the cluster-cluster algorithm, we use the Fortran-90 implementation of the multiple-sphere T-matrix (MSTM) code described in^{45}, which can run on high-performance parallel computing platforms. This code numerically solves for electromagnetic wave scattering from systems of multiple (non-overlapping) spheres for either a fixed or random (orientation-averaged) orientation with respect to an incident plane wave. Here we have focused on the calculation of random-orientation optical properties, which uses the T-matrix procedure developed in^{18}. We assume indices of refraction consistent with a range of values from the literature for BC at 550 nm: (1.4+0.4i, 1.6+0.6i, 1.8+0.8i, 2.0+1.0i). MSTM calculations were performed for this range of indices of refraction for 57,556 numerically generated aggregates with \(N_{s} < 100\); we used randomly chosen aggregates from this data set for the training, validation, and test sets for the model. To test the zero-shot performance, MSTM calculations were performed for 880 aggregates with these parameters in the size range \(100< N_{s} < 1000\); we randomly split these data into a zero-shot validation data set to evaluate the model’s performance and an independent zero-shot test data set. A summary of the range of parameters for each data set is given in Table S1. The distributions of parameters among the small (\(N_{s}<100\)) and large (\(N_{s}>100\)) aggregates are shown in Figs. S1 and S2, and the integral optical properties calculated with MSTM are shown in Fig. S3.
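The random splits described above could be produced along the following lines. This is only a sketch: the split fractions (80/10/10 for the small aggregates and 50/50 for the zero-shot set) and the seed are hypothetical, since the actual proportions are not stated here, and the helper name is illustrative.

```python
import random

def split_indices(n_items, fractions, seed=0):
    """Shuffle range(n_items) and cut it into consecutive parts by the given fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last part takes whatever remains so that no index is dropped.
        stop = n_items if i == len(fractions) - 1 else start + round(fraction * n_items)
        parts.append(indices[start:stop])
        start = stop
    return parts

# Small aggregates (N_s < 100): training / validation / test (hypothetical fractions).
train_idx, val_idx, test_idx = split_indices(57_556, (0.8, 0.1, 0.1))

# Large aggregates (100 < N_s < 1000): zero-shot validation / zero-shot test.
zs_val_idx, zs_test_idx = split_indices(880, (0.5, 0.5))

print(len(train_idx), len(val_idx), len(test_idx), len(zs_val_idx), len(zs_test_idx))
```

Keeping the large aggregates entirely out of the training split is what makes the evaluation on them zero-shot.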
Graph neural networks
We used PyTorch Geometric^{61} to implement the GNN models. Several GNN approaches were tested, including a simple graph convolutional network (SGC)^{62}, a graph convolutional network (GCN)^{30}, and an interaction network (IN)^{31} (see Supplementary Information S1 for additional details of the graph models and a comparison of performance metrics among different model parameters and targets). The best performance for the integral optical properties was obtained with an IN model with a hidden layer size of 300 for both the node and edge models, and a message size of 100. Both the node and edge models are MLPs with ReLU as the nonlinear activation function between layers. Aggregation for the edge model is addition, with global mean pooling followed by dropout (p = 0.5) and a linear layer of size 100 as the global aggregation function. For the prediction of \(S_{11}\), we found that adding a fully connected node to each graph slightly improved the zero-shot performance; the model architecture was otherwise the same as that used to predict the integral optical properties. A batch size of 20 was used (training with a batch size of 2 was slower but did not lead to significantly worse performance). For the graph regression task, we used a mean squared error (MSE) loss. We trained the GNN models on an NVIDIA RTX 8000 GPU.
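To make the data flow of the interaction network concrete, the sketch below implements a single inference-time forward pass in plain NumPy with random, untrained weights. It is not the authors' PyTorch Geometric implementation: the node features (raw sphere coordinates), the node-model output size, the output head, and the number of targets are all assumptions for illustration, and dropout is omitted because it is inactive at inference.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (untrained) weights and zero biases for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the layers in order, with ReLU between layers and none after the last."""
    for i, (weights, bias) in enumerate(params):
        x = x @ weights + bias
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def interaction_forward(x, edge_index, edge_mlp, node_mlp, readout):
    """One message-passing step followed by global mean pooling and a readout MLP."""
    senders, receivers = edge_index
    # Edge model: one message per directed edge from the two endpoint features.
    messages = mlp(edge_mlp, np.concatenate([x[receivers], x[senders]], axis=1))
    # Aggregation by addition: sum incoming messages at each receiving node.
    aggregated = np.zeros((x.shape[0], messages.shape[1]))
    np.add.at(aggregated, receivers, messages)
    # Node model: update each node from its features and aggregated messages.
    hidden_nodes = mlp(node_mlp, np.concatenate([x, aggregated], axis=1))
    # Global mean pooling gives one vector per graph, independent of N_s.
    pooled = hidden_nodes.mean(axis=0)
    return mlp(readout, pooled)

rng = np.random.default_rng(0)
n_feat, hidden, message_size, n_targets = 3, 300, 100, 3  # n_targets is hypothetical
edge_mlp = init_mlp(rng, [2 * n_feat, hidden, message_size])
node_mlp = init_mlp(rng, [n_feat + message_size, hidden, hidden])
readout = init_mlp(rng, [hidden, 100, n_targets])

# Toy graph: a straight chain of four touching unit spheres, edges in both directions.
x = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0], [6.0, 0.0, 0.0]])
edge_index = np.array([[0, 1, 1, 2, 2, 3],
                       [1, 0, 2, 1, 3, 2]])
prediction = interaction_forward(x, edge_index, edge_mlp, node_mlp, readout)
print(prediction.shape)
```

Because the weights are shared across nodes and edges and the pooling is a mean over nodes, the same parameters apply to a graph of any size, and the output does not depend on how the spheres are numbered. This is the structural property that allows evaluation on aggregates larger than those seen in training.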
Data availability
BC graph data sets are available in an open-source repository (10.5281/zenodo.5108834).
Code availability
Code is available at https://github.com/kdlamb/BCGNN.git.
References
Bond, T. C. et al. Bounding the role of black carbon in the climate system: A scientific assessment. J. Geophys. Res. Atmos. 118(11), 5380–5552 (2013).
Liu, D., He, C., Schwarz, J. P. & Wang, X. Lifecycle of light-absorbing carbonaceous aerosols in the atmosphere. NPJ Clim. Atmos. Sci. 3(1), 1–18 (2020).
Wu, Y., Cheng, T., Zheng, L. & Chen, H. Black carbon radiative forcing at TOA decreased during aging. Sci. Rep. 6, 38592 (2016).
Lohmann, U. et al. Future warming exacerbated by aged-soot effect on cloud formation. Nat. Geosci. 13(10), 674–680 (2020).
Schwarz, J., Gao, R., Perring, A., Spackman, J. & Fahey, D. Black carbon aerosol size in snow. Sci. Rep. 3(1), 1–5 (2013).
Moteki, N. et al. Anthropogenic iron oxide aerosols enhance atmospheric heating. Nat. Commun. 8(1), 1–11 (2017).
Michelsen, H., Schulz, C., Smallwood, G. & Will, S. Laser-induced incandescence: Particulate diagnostics for combustion, atmospheric, and industrial applications. Prog. Energy Combust. Sci. 51, 2–48 (2015).
Manfred, K. M. et al. Investigating biomass burning aerosol morphology using a laser imaging nephelometer. Atmos. Chem. Phys. 18(3), 1879–1894 (2018).
Womack, C. C. et al. Complex refractive indices in the ultraviolet and visible spectral region for highly absorbing nonspherical biomass burning aerosol. Atmos. Chem. Phys. Disc. 2020, 1–29 (2020).
Chakrabarty, R. K. et al. Simulation of aggregates with point-contacting monomers in the cluster-dilute regime. Part 1: Determining the most reliable technique for obtaining three-dimensional fractal dimension from two-dimensional images. Aerosol Sci. Technol. 45(1), 75–80 (2011).
Chakrabarty, R. K. et al. Simulation of aggregates with point-contacting monomers in the cluster-dilute regime. Part 2: Comparison of two- and three-dimensional structural properties as a function of fractal dimension. Aerosol Sci. Technol. 45(8), 903–908 (2011).
Zhang, X., West, R. A., Irwin, P. G., Nixon, C. A. & Yung, Y. L. Aerosol influence on energy balance of the middle atmosphere of Jupiter. Nat. Commun. 6(1), 1–9 (2015).
Wu, Y. et al. The role of biomass burning states in light absorption enhancement of carbonaceous aerosols. Sci. Rep. 10(1), 1–10 (2020).
Wang, Y. et al. Fractal dimensions and mixing structures of soot particles during atmospheric processing. Environ. Sci. Technol. Lett. 4(11), 487–493 (2017).
Kahnert, M. & Kanngießer, F. Modelling optical properties of atmospheric black carbon aerosols. J. Quant. Spectrosc. Radiat. Transfer 244, 106849 (2020).
Fierce, L. et al. Radiative absorption enhancements by black carbon controlled by particle-to-particle heterogeneity in composition. Proc. Natl. Acad. Sci. 117(10), 5196–5203 (2020).
Mackowski, D. W. Calculation of total cross sections of multiple-sphere clusters. JOSA A 11(11), 2851–2861 (1994).
Mackowski, D. W. & Mishchenko, M. I. Calculation of the T matrix and the scattering matrix for ensembles of spheres. JOSA A 13(11), 2266–2278 (1996).
Purcell, E. M. & Pennypacker, C. R. Scattering and absorption of light by nonspherical dielectric grains. Astrophys. J. 186, 705–714 (1973).
Yurkin, M. A. & Hoekstra, A. G. The discrete dipole approximation: An overview and recent developments. J. Quant. Spectrosc. Radiat. Transfer 106(1–3), 558–589 (2007).
Xu, Y.-L. Electromagnetic scattering by an aggregate of spheres. Appl. Opt. 34(21), 4573–4588 (1995).
Xu, Y.-L. & Gustafson, B. Å. A generalized multiparticle Mie-solution: Further experimental verification. J. Quant. Spectrosc. Radiat. Transfer 70(4–6), 395–419 (2001).
Liu, C., Xu, X., Yin, Y., Schnaiter, M. & Yung, Y. L. Black carbon aggregates: A database for optical properties. J. Quant. Spectrosc. Radiat. Transfer 222, 170–179 (2019).
Kahnert, M. Numerically exact computation of the optical properties of light absorbing carbon aggregates for wavelength of 200 nm–12.2 μm. Atmos. Chem. Phys. 10(17), 8319–8329 (2010).
Smith, A. & Grainger, R. Simplifying the calculation of light scattering properties for black carbon fractal aggregates. Atmos. Chem. Phys. 14, 15 (2014).
Romshoo, B. et al. Radiative properties of coated black carbon aggregates: Numerical simulations and radiative forcing estimates. Atmos. Chem. Phys. Disc. 2021, 1–24 (2021).
Gentine, P., Pritchard, M., Rasp, S., Reinaudi, G. & Yacalis, G. Could machine learning break the convection parameterization deadlock?. Geophys. Res. Lett. 45(11), 5742–5751 (2018).
Rasp, S., Pritchard, M. S. & Gentine, P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. 115(39), 9684–9689 (2018).
Luo, J., Zhang, Y., Wang, F., Wang, J. & Zhang, Q. Applying machine learning to estimate the optical properties of black carbon fractal aggregates. J. Quant. Spectrosc. Radiat. Transfer 215, 1–8 (2018).
Kipf, T. N., & Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., & Faulkner, R. et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., & Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232 (2015).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272, PMLR (2017).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706–710 (2020).
Bapst, V. et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys. 16(4), 448–454 (2020).
Xie, T., France-Lanord, A., Wang, Y., Shao-Horn, Y. & Grossman, J. C. Graph dynamical networks for unsupervised learning of atomic scale dynamics in materials. Nat. Commun. 10(1), 1–9 (2019).
Forrest, S. & Witten, T. Jr. Long-range correlations in smoke-particle aggregates. J. Phys. A: Math. Gen. 12(5), L109 (1979).
Filippov, A., Zurita, M. & Rosner, D. Fractal-like aggregates: Relation between morphology and physical properties. J. Colloid Interface Sci. 229(1), 261–273 (2000).
Sorensen, C. M. & Roberts, G. C. The prefactor of fractal aggregates. J. Colloid Interface Sci. 186(2), 447–452 (1997).
Heinson, W., Sorensen, C. & Chakrabarti, A. Does shape anisotropy control the fractal dimension in diffusion-limited cluster-cluster aggregation? Aerosol Sci. Technol. 44(12), i–iv (2010).
Johansson, K., Head-Gordon, M., Schrader, P., Wilson, K. & Michelsen, H. Resonance-stabilized hydrocarbon-radical chain reactions may explain soot inception and growth. Science 361(6406), 997–1000 (2018).
Moteki, N. An efficient C++ code for generating fractal cluster of spheres (v1.1) (2019).
Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002).
Bohren, C. F., & Huffman, D. R. Absorption and scattering of light by small particles. John Wiley & Sons (2008).
Mackowski, D. W. & Mishchenko, M. I. A multiple sphere T-matrix Fortran code for use on parallel computer clusters. J. Quant. Spectrosc. Radiat. Transfer 112, 2182–2192 (2011).
Finn, C., Abbeel, P., & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pp. 1126–1135, PMLR (2017).
Sorensen, C. Light scattering by fractal aggregates: A review. Aerosol. Sci. Technol. 35(2), 648–687 (2001).
Deprez, P., & Wüthrich, M. V. Networks, random graphs and percolation. In Theoretical aspects of spatial-temporal modeling, pp. 95–124 (Springer, 2015).
Adachi, K., Chung, S. H., Friedrich, H., & Buseck, P. R. Fractal parameters of individual soot particles determined using electron tomography: Implications for optical properties. J. Geophys. Res. Atmos. 112, D14 (2007).
Gunderson, L. M., & Bravo-Hermsdorff, G. Introducing graph cumulants: What is the variance of your social network? arXiv preprint arXiv:2002.03959 (2020).
Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, L., & Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219 (2018).
Kondor, R., Lin, Z., & Trivedi, S. Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network. In Advances in Neural Information Processing Systems, pp. 10117–10126 (2018).
Miller, B. K., Geiger, M., Smidt, T. E., & Noé, F. Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461 (2020).
Satorras, V. G., Hoogeboom, E., & Welling, M. E(n) equivariant graph neural networks. arXiv preprint arXiv:2102.09844 (2021).
Wang, R. et al. Estimation of global black carbon direct radiative forcing and its uncertainty constrained by observations. J. Geophys. Res. Atmos. 121(10), 5948–5971 (2016).
Egel, A., Pattelli, L., Mazzamuto, G., Wiersma, D. S. & Lemmer, U. CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres. J. Quant. Spectrosc. Radiat. Transfer 199, 103–110 (2017).
Bambha, R. P. & Michelsen, H. A. Effects of aggregate morphology and size on laser-induced incandescence and scattering from black carbon (mature soot). J. Aerosol Sci. 88, 159–181 (2015).
Moosavi, S. M., Jablonka, K. M. & Smit, B. The role of machine learning in the understanding and design of materials. J. Am. Chem. Soc. 142(48), 20273–20287 (2020).
Mirhoseini, A. et al. A graph placement methodology for fast chip design. Nature 594(7862), 207–212 (2021).
De Cao, N., & Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973 (2018).
Fey, M., & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (2019).
Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., & Weinberger, K. Simplifying graph convolutional networks. In International conference on machine learning, pp. 6861–6871, PMLR (2019).
Acknowledgements
We thank Daniel Mackowski and Victor Garcia Satorras for useful discussion. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR03089301, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. This work was supported by an NSF Collaborative Research grant: HDR Elements: Software for a new machine learning based parameterization of moist convection for improved climate and weather prediction using deep learning 01OAC 1835769.
Contributions
K.D.L. designed the study, ran the MSTM code, and implemented the GNN models. K.D.L. wrote the paper, with input from P.G.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lamb, K.D., Gentine, P. Zero-shot learning of aerosol optical properties with graph neural networks. Sci Rep 13, 18777 (2023). https://doi.org/10.1038/s41598-023-45235-8