Introduction

Carbonaceous aerosols such as black carbon (BC) are important short-lived climate forcers1,2. To understand their impact on climate, atmospheric models and observational retrievals need accurate predictions of the optical properties of absorbing aerosols such as BC: for estimating the top-of-the-atmosphere radiative effects of black carbon3 and the impact of aged soot on cloud formation4, for calculating the mass absorption coefficient of BC deposited on snow5, for estimating the relative shortwave heating rates for different types of combustion aerosols6, for calculating particle-to-gas heat transfer to interpret laser-induced incandescence signals7, for accurate inversions of imaging nephelometers8, for constraining the index of refraction of biomass burning aerosols9, and for interpreting the optical properties of aerosols deposited on filters10,11. Accurate calculations of carbonaceous aerosol optical properties are also important for observational retrievals in other planetary atmospheres, as these aerosols may play a role in the radiative balance of, e.g., the middle atmosphere of Jupiter12.

BC particles in the atmosphere have a variety of sizes, shapes, and chemical compositions, all of which impact their optical properties (Fig. 1). BC’s optical properties depend both on the morphology of the primary (bare) BC particle and on its internal mixing with other materials (coatings) through the condensation of gas-phase species during atmospheric aging. Both combustion conditions13 and atmospheric aging14 impact the morphology of these aerosols, which are fractal-like aggregates, typically embedded within (internally mixed) or attached to other aerosol components. The complex morphology of bare BC is generally not parameterized in models, although modeling bare BC as a sphere biases radiative forcing estimates, with too little warming by absorption and too much cooling by scattering15. Internal mixing is typically modeled using a Mie Theory core-shell model, which approximates the bare BC portion as an absorbing “core” surrounded by a concentric spherical shell of “coating” material with an index of refraction characteristic of the internally mixed material. Several recent papers have demonstrated that this Mie Theory core-shell approximation leads models to over-predict BC absorption by as much as a factor of two13,16. In addition, models need not only more accurate calculations of BC optical properties, so that they can be better constrained by observations, but also the capability to represent the heterogeneity of optical properties in diverse aerosol populations13,16.

While models and observational retrievals have generally relied on Mie Theory, more accurate methods for predicting the optical properties of arbitrarily shaped particles have been developed, such as the Multiple Sphere T-Matrix Method (MSTM)17,18, the discrete dipole approximation (DDA)19,20, and the Generalized Multiple-Particle Mie (GMM) Theory21,22. These methods approximate BC fractal aggregates as clusters of spheres (Fig. 1) and solve the time-harmonic Maxwell equations for the multiple-sphere system. However, these approaches are computationally expensive, often requiring hours or even days to compute the optical properties of a single aerosol particle with complex morphology23. To mitigate this computational bottleneck, pre-calculated databases of fractal aggregate optical properties computed with these exact methods have recently been created23,24,25,26, but such approaches are limited to linear interpolation within the databases’ optical and morphological parameter ranges. There is still significant uncertainty about the fundamental properties of BC from different emission sources and under different combustion conditions, and the additional complexity of internal mixing with non-absorbing and absorbing materials during atmospheric aging2 would require these databases to cover a very large parameter space to represent the range of BC aerosols observed in the atmosphere. Moreover, observational inversions of BC have greater uncertainty when performed with only a subset of possible parameters.

Figure 1

BC optical properties. Top, from left to right: equivalent volume sphere for bare BC, thinly coated BC, and thickly coated BC particles. Bottom, from left to right: geometry of bare BC, thinly coated BC, and thickly coated BC as used in typical MSTM calculations.

Machine learning offers a promising approach for reducing computational bottlenecks by speeding up numerically intensive aspects of atmospheric models27,28. As such, it could offer an efficient alternative to compiling pre-computed databases of BC’s optical properties. However, machine learning methods are traditionally strongly dependent on the data they are trained with, and struggle to generalize beyond the training distribution. One previous study used a support vector machine (SVM) trained on accurate MSTM calculations to predict BC’s optical properties from its morphological parameters and index of refraction, but could not accurately predict the optical properties of aggregates with morphological parameters beyond those in the initial training data set29. Other brute-force approaches such as neural networks (NN) or random forests (RF) would similarly struggle to generate realistic BC properties outside of their training data sets.

Figure 2

A schematic of the GNN modeling approach for predicting aerosol optical properties. Accurate aerosol optical properties are calculated from the sphere positions using MSTM. For the GNN model, graphs are generated from aggregates by connecting spheres that are closer together than the characteristic length scale, C, of the aggregate (Eq. 3). Embeddings are learned for each node in the graph based on the central node features, the neighboring node features, and the edge features. These node-level embeddings are then aggregated, and a graph-level prediction of the optical properties of the aggregate is made.

Here we show that the optical properties of bare BC with complex morphology can be accurately predicted with a graph neural network (GNN) by representing BC fractal aggregates as networks of interacting spheres. GNNs are recently developed machine learning algorithms that learn on graph-structured data sets, allowing models to directly include arbitrary relational information30,31. These models have shown great promise in predicting the large-scale properties of structured physical science data sets such as molecules32,33, protein-protein interaction networks34, and glasses35. GNNs have demonstrated skill in predicting complex global features of physical systems by learning simpler local physics36; here we demonstrate that, by including local information about BC’s structure, BC’s global properties can be inferred. Importantly, because GNNs learn models for specific substructures (i.e. the nodes and their relationships with their neighbors in the graph), they can be applied immediately to graphs with arbitrary numbers of nodes; we exploit this feature of GNNs to predict the optical properties of BC aggregates that are significantly larger than those in the training data set. This zero-shot learning (where models immediately generalize to samples not represented in their original training data) paves the way towards new, flexible parameterizations of aerosol microphysical properties and serves as a template for the use of GNNs in the Earth sciences.

BC fractal aggregates as networks

Physical properties of bare BC

Primary (bare) BC particles are fractal-like aggregates whose geometries can be described by the statistical scaling rule

$$\begin{aligned} N_{s} = k_{f}\left( \frac{R_{g}}{a}\right) ^{D_{f}} \end{aligned}$$
(1)

where \(a\) is the primary particle mean radius, \(k_{f}\) is the fractal prefactor, \(D_{f}\) is the fractal (Hausdorff) dimension, \(N_{s}\) is the number of primary spheres, or monomers, in the aggregate, and \(R_{g}\) is the radius of gyration, defined as

$$\begin{aligned} R_{g}^{2} = \frac{1}{N_{s}}\sum _{i=1}^{N_{s}}\left| \textbf{r}_{i}-\textbf{r}_{0}\right| ^{2} \end{aligned}$$
(2)

where \(\textbf{r}_{i}\) and \(\textbf{r}_{0}\) denote the center of the \(i\)th monomer and the center of mass of the cluster, respectively (assuming all monomers have the same mass37). In addition to the aggregate geometry, the basic physical properties of these particles follow this scaling law38. As a consequence of their fractal nature, aggregates are self-similar across length scales. The fractal dimension \(D_{f}\) can be thought of intuitively as the space-filling capacity of the aggregate: aggregates with smaller fractal dimensions are “fluffier”, while aggregates with larger fractal dimensions are denser. The fractal prefactor \(k_{f}\) is related to the packing of spheres in space and the anisotropy of the aggregate, with more “stringy” aggregates having smaller values of \(k_{f}\) and more isotropic, collapsed aggregates having larger values of \(k_{f}\)39,40.

The fractal-like nature of these aerosols is a result of their formation from gas-phase precursors through the aggregation and growth of hydrocarbon clusters during incomplete combustion, although this process is not yet completely understood41. The initial morphology depends on both the combustion conditions and the emission source, with different observational methods also impacting the retrieved parameters15. After their initial formation during combustion, atmospheric aging (due to cloud processing or the condensation of gas phase species) leads to these aerosols becoming more compact, causing \(D_{f}\) to increase over time. This aging is expected to lead to a decrease in their top of the atmosphere radiative effects13. Previous work has shown that \(k_{f}\) determines the compactness of aggregate branches, although little is understood about \(k_{f}\)’s evolution over time15.

Numerically-generated fractal aggregates

To investigate how fractal aggregate particles can be modeled as networks of interacting spheres, we numerically generated fractal aggregates with \(N_{s}\) spheres using a cluster-cluster algorithm42 based on the one described by Filippov et al.38, which uses a Monte Carlo approach to randomly generate aggregates with a specified fractal dimension \(D_{f}\) and fractal prefactor \(k_{f}\). The Cartesian coordinates of the monomers are made dimensionless by multiplying them by the wavenumber \(k=\frac{2\pi }{\lambda }\), where \(\lambda\) is the wavelength of the incident light.
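The definitions in Eqs. 1 and 2, together with the dimensionless scaling by \(k\), can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy and equal-mass monomers; the function names (`radius_of_gyration`, `fractal_prefactor`, `dimensionless`) are hypothetical and are not part of the aggregate-generation code used in this study.

```python
import numpy as np

def radius_of_gyration(centers):
    """Eq. 2: RMS distance of equal-mass monomer centers from their center of mass."""
    centers = np.asarray(centers, dtype=float)
    r0 = centers.mean(axis=0)  # center of mass (all monomers have the same mass)
    return float(np.sqrt(np.mean(np.sum((centers - r0) ** 2, axis=1))))

def fractal_prefactor(n_s, r_g, a, d_f):
    """Invert Eq. 1, N_s = k_f (R_g / a)^D_f, for the prefactor k_f."""
    return n_s / (r_g / a) ** d_f

def dimensionless(centers, wavelength):
    """Scale coordinates by the wavenumber k = 2*pi/lambda."""
    k = 2.0 * np.pi / wavelength
    return k * np.asarray(centers, dtype=float)

# Three touching unit spheres in a straight line.
chain = np.array([[-2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
rg = radius_of_gyration(chain)  # sqrt(8/3)
```

For this three-sphere chain, Eq. 2 gives \(R_{g}=\sqrt{8/3}\approx 1.63\) in units of the monomer radius.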

Characteristic length scale

The characteristic length scale of a network with \(N\) nodes is \(C = \log(N)\)43. To render fractal aggregates as graphs, we represent each monomer in the aggregate as a node in the graph. Monomers whose center positions are closer together than the characteristic length scale \(C\) of a network with \(N_{s}\) nodes,

$$\begin{aligned} C=X_{v}\log(N_{s}) \end{aligned}$$
(3)

are connected, where \(X_{v}=ka\) is the monomer size parameter (Fig. 2). Because the monomer coordinates are dimensionless (scaled by \(k\)), the monomer radius in these units is \(X_{v}\); multiplying the length scale by \(X_{v}\) therefore makes the cutoff a fixed multiple of the monomer radius. This gives a consistent number of edges independent of the size parameter, so that aggregates with the same fractal parameters but different size parameters are encoded with the same graph structure. An example of the resulting undirected graph structure and adjacency matrix for two aggregates with different fractal dimensions but the same number of monomers is shown in Fig. 3a–d. This scaling encodes the density of edges in local neighborhoods relative to the fractal dimension of the aggregate, regardless of the actual size of the aggregate. The total number of edges in the graph then scales with both \(N_{s}\) and \(D_{f}\) (Fig. 3e), with the average node degree increasing with \(D_{f}\) (SI Fig. S4a). The degree distribution of nodes also depends on the fractal prefactor \(k_{f}\) (SI Fig. S4b).
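A minimal sketch of the graph construction in Eq. 3, assuming NumPy, monomer centers already scaled by \(k\), and the natural logarithm for \(\log\) (the base is an assumption). `aggregate_to_graph` is a hypothetical helper, not the authors' code; it returns both directions of every undirected edge, a common convention in message-passing implementations, together with the center-to-center distance used as the edge feature.

```python
import numpy as np

def aggregate_to_graph(coords, x_v):
    """Connect monomers closer than C = X_v * log(N_s) (Eq. 3, natural log assumed).

    coords: (N_s, 3) dimensionless monomer centers.
    Returns (edge_index, edge_dist): a (2, E) array of directed pairs i -> j,
    containing both directions of each undirected edge, and their distances.
    """
    coords = np.asarray(coords, dtype=float)
    n_s = coords.shape[0]
    cutoff = x_v * np.log(n_s)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist < cutoff) & ~np.eye(n_s, dtype=bool)  # no self-loops
    src, dst = np.nonzero(adj)
    return np.stack([src, dst]), dist[src, dst]

# Ten touching monomers (X_v = 1, center spacing 2) in a straight chain.
coords = np.zeros((10, 3))
coords[:, 0] = 2.0 * np.arange(10)
edge_index, edge_dist = aggregate_to_graph(coords, 1.0)
```

Here \(C=\ln 10\approx 2.30\), so only nearest neighbors are linked, giving 9 undirected (18 directed) edges. Under the natural-log assumption, touching neighbors (spacing \(2X_{v}\)) are connected only when \(\ln N_{s}>2\), i.e. \(N_{s}\geq 8\).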

Figure 3

Examples of fractal aggregates represented as graphs. Visualization of the graphs (a,c) and adjacency matrices (b,d) for fractal aggregates with the same number of spheres (\(N_{s}\)=288) but different fractal dimensions. \(D_{f}\)=1.8 for (a,b) and \(D_{f}\)=2.3 for (c,d). (e) The number of edges scales with the total number of spheres in the aggregates (\(N_{s}\)) and the fractal dimension of the aggregates (\(D_{f}\)).

GNN model for BC optical properties

Accurate solutions for the electromagnetic scattering and absorption properties of multiple-sphere clusters (as BC aggregates are typically modeled) are computationally expensive because a full-wave optics treatment is needed. In the general case, spheres interact with one another, and the total scattered field is a superposition of the fields radiated by each sphere in the system44. While the field continuity conditions at the surface of each sphere can be solved analytically by expanding the incident and scattered fields of each sphere in terms of vector spherical wave functions, this approach generates a very large system of coupled linear equations that must be solved iteratively45. Additional details about the formal solution are given in Supplementary Information S1.
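The structure of this coupled problem (each scatterer responds to the incident field plus the fields radiated by all the others) can be illustrated with a deliberately simplified scalar analogue. This sketch is not MSTM: it swaps in point-like scalar scatterers with a hypothetical polarizability `alpha` and the scalar Green's function \(e^{ikr}/r\), and it solves the resulting linear system by fixed-point (Jacobi-type) iteration, which converges only when the coupling is weak (spectral radius of \(\alpha G\) below 1). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def green_matrix(pos, k):
    """Scalar free-space Green's function exp(ikr)/r between distinct scatterers."""
    pos = np.asarray(pos, dtype=float)
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    off = ~np.eye(len(pos), dtype=bool)  # exclude self-interaction (r = 0)
    g = np.zeros(r.shape, dtype=complex)
    g[off] = np.exp(1j * k * r[off]) / r[off]
    return g

def solve_coupled(g, alpha, e_inc, tol=1e-12, max_iter=1000):
    """Iterate p = alpha * (e_inc + G p) until the update falls below tol."""
    p = alpha * e_inc  # first guess: single scattering only
    for _ in range(max_iter):
        p_new = alpha * (e_inc + g @ p)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    raise RuntimeError("iteration did not converge; coupling too strong")

# Four scatterers along z, illuminated by a plane wave exp(ikz).
k = 1.0
pos = np.array([[0.0, 0.0, 2.0 * i] for i in range(4)])
e_inc = np.exp(1j * k * pos[:, 2])
p = solve_coupled(green_matrix(pos, k), 0.1 + 0.05j, e_inc)
```

The iterated result matches a direct solve of \((I-\alpha G)\,p=\alpha E_{inc}\). In MSTM the unknowns are instead the vector spherical wave expansion coefficients of every sphere, which is why the system becomes very large.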

While this approach provides a fully analytical solution for light scattering by the multiple-sphere cluster, the computational time of these brute-force approaches grows steeply with \(N_{s}\) and \(X_{v}\), as they do not take into account specific details of BC’s topological structure that could lend themselves to model order reduction. Filippov et al.38 previously explored the relationship between the morphology of BC aggregates and their physical properties using the Rayleigh-Debye-Gans (RDG) approximation and found that aggregates with similar fractal parameters also have similar physical properties. Recent work23 found empirical relationships between the optical properties of aggregates and their morphological parameters using extensive MSTM calculations. Machine learning offers an alternative approach for learning relevant predictors without the need for human-defined features; GNNs in particular can learn features that correspond to the relationship between the nodes (the individual spheres) and the large-scale physical properties of the aggregates.

GNNs are particularly attractive as emulators of MSTM for three reasons. First, they provide a strong relational inductive bias, which typically allows them to make skillful predictions with less training data than fully connected or convolutional neural networks. Since MSTM is relatively slow (and methods such as DDA are approximately \(10\times\) slower than MSTM), it is non-trivial to develop large training data sets for machine learning algorithms. Second, the non-trivial topological structure of these aerosols is directly related to the complexity of calculating their optical properties, as the radiation incident on each individual monomer is a function of the positions and orientations of all of the other monomers in the aggregate, with neighboring monomers likely to have the most significant influence. The GNN approach of framing this problem as message passing between neighboring nodes is directly analogous to the electromagnetic scattering and absorption problem for the multiple-sphere cluster. Finally, as discussed in the introduction, GNN emulators of physical simulators have been shown to generalize to new, previously unseen realizations of physical systems (so-called zero-shot performance)31. To bridge the gap between the very accurate physical information gained in process-level studies of individual aerosol properties and the understanding of how populations of these aerosols evolve in atmospheric models, we need either much better approximation methods or much faster methods to accurately calculate aerosol properties. GNN emulators that quickly and accurately generalize to new configurations could provide an online approach to estimating the optical properties of aerosol populations in atmospheric models.

To investigate the connection between BC’s fractal structure and its optical properties, we trained a GNN to predict the optical properties of BC aggregates, using values from an analytical solution for the electromagnetic scattering and absorption properties (from MSTM) as ground truth (Fig. 2). GNNs propagate information between nodes, capturing both topological information about the graph structure and the node features. We tested several different propagation rules, including a simple graph convolution network62, a graph convolutional network30, and an Interaction Network (IN)31, and found that the IN gave the best performance for predicting both the integral and angle-resolved optical properties. The IN (Fig. 2) is based on message passing, in which nodes send and receive messages along edges from their neighbors. The messages are aggregated for each node, and each node is updated based on its own features and the messages received from its neighbors. Graph-level predictions are made by aggregating the updated node embeddings from all the nodes in the graph with a graph pooling operation (the pooling in Fig. 2). We tested summation, averaging, and maximum pooling, and found that summation works best. After the graph pooling operation, a final linear transformation maps the aggregated node embeddings to the target predictions (the graph readout in Fig. 2). Here we predict the total extinction, scattering, and absorption efficiencies \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), the asymmetry parameter, g, and the angle-resolved elements of the scattering phase matrix \(S_{ij}(\theta )\) for the orientation-averaged case (see Methods for a discussion of aerosol optical properties and data sets). Further details of the GNN approaches tested in this work are provided in the SI.
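The message-passing steps described above (edge messages, per-node summation, node update, sum pooling, linear readout) can be sketched in NumPy as a single interaction-network layer. This is a minimal illustration under stated assumptions: each learned function is a single ReLU layer with random, untrained weights, and the names and sizes (`interaction_network`, `init_params`, hidden width 16) are hypothetical; it is not the trained model used here or any specific library's API.

```python
import numpy as np

def dense_relu(x, w, b):
    # Single ReLU layer standing in for each learned function (assumption).
    return np.maximum(0.0, x @ w + b)

def init_params(d_node, d_edge, d_hidden, d_out, rng):
    return {
        "w_msg": rng.normal(size=(2 * d_node + d_edge, d_hidden)), "b_msg": np.zeros(d_hidden),
        "w_upd": rng.normal(size=(d_node + d_hidden, d_hidden)), "b_upd": np.zeros(d_hidden),
        "w_out": rng.normal(size=(d_hidden, d_out)), "b_out": np.zeros(d_out),
    }

def interaction_network(x, edge_index, edge_attr, p):
    src, dst = edge_index
    # 1. Message for each edge from sender, receiver and edge features.
    msg = dense_relu(np.concatenate([x[src], x[dst], edge_attr], axis=1), p["w_msg"], p["b_msg"])
    # 2. Sum the incoming messages at each receiving node.
    agg = np.zeros((x.shape[0], msg.shape[1]))
    np.add.at(agg, dst, msg)
    # 3. Update each node from its own features and its aggregated messages.
    h = dense_relu(np.concatenate([x, agg], axis=1), p["w_upd"], p["b_upd"])
    # 4. Sum pooling over nodes, then a linear readout to the targets.
    return h.sum(axis=0) @ p["w_out"] + p["b_out"]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))                   # 5 nodes: [X_v, Re(n_k), x, y, z]
edge_index = np.array([[0, 1, 1, 2, 2, 3, 3, 4],
                       [1, 0, 2, 1, 3, 2, 4, 3]])
edge_attr = np.ones((8, 1))                   # e.g. neighbor distances
params = init_params(5, 1, 16, 4, rng)        # 4 outputs: Q_ext, Q_scat, Q_abs, g
y = interaction_network(x, edge_index, edge_attr, params)
```

Because both the neighbor aggregation and the graph pooling are sums, relabeling the nodes leaves the output unchanged, and the same weights apply to graphs of any size, which is the property that allows prediction on aggregates larger than those seen in training.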

Figure 4

The true vs. predicted values for the efficiencies and the asymmetry parameter. (a–d) show the results for the training (\(N_{s}<100\)) and test (\(N_{s}<100\)) data sets, while (e–h) show the results for the training (\(N_{s}<100\)) and zero-shot test (\(N_{s}\) = 100–1000) data sets.

For each training example, we input \(X_{v}\), the real part of the index of refraction \(Re(n_{k})\) (sufficient because we consider only cases where the imaginary part is \(Im(n_{k})=Re(n_{k})-1\)), and the dimensionless coordinates of each sphere as node features. As edge features, we use the distance between neighboring spheres. We trained the model using 15,314 aggregates from the training data set, as the training loss did not significantly decrease with additional samples (SI Fig. S13); training data sets as small as 3000 aggregates showed reasonable generalization performance. The training data set consisted of aggregates with a small number of monomers (\(N_{s} < 100\)). We tested the model on an independent test set of 7656 aggregates with the same distribution of parameters as the training data set (\(N_{s} < 100\)). We further investigated the generalizability of the model on an independent zero-shot test set of 440 aggregates that were significantly larger \((100< N_{s} < 1000)\) than those the model was trained on. An additional 440 large aggregates were used as a zero-shot validation data set to determine which model architecture provided the best zero-shot performance (SI Figs. S9–S12). While the model weights were not directly trained on this zero-shot validation data set, the hyperparameters for the best model architecture were selected based on performance on this data set; we therefore use a separate, independent zero-shot test data set to evaluate generalization performance. \(N_{s}=100\) was chosen as the maximum aggregate size in the training data set because smaller maximum sizes increased the bias in the zero-shot performance (SI Fig. S14). The zero-shot test data set was evenly distributed across the aggregate parameters (SI Fig. S2) to provide an estimate of generalization performance across the full parameter space.
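The input encoding and size-based split described above can be sketched as follows, assuming NumPy; the helper names are hypothetical. Because every case in these data sets ties the imaginary part of the refractive index to the real part, a single number per aggregate fixes \(n_{k}\), and the split mirrors the \(N_{s}<100\) training cut.

```python
import numpy as np

def refractive_index(re_n):
    # In these data sets Im(n_k) = Re(n_k) - 1, e.g. 1.6 -> 1.6 + 0.6i.
    return complex(re_n, re_n - 1.0)

def node_features(coords, x_v, re_n):
    """Per-monomer inputs [X_v, Re(n_k), x, y, z]; coords are dimensionless."""
    coords = np.asarray(coords, dtype=float)
    shared = np.tile([x_v, re_n], (coords.shape[0], 1))  # identical for every node
    return np.hstack([shared, coords])

def split_by_size(n_s, n_max_train=100):
    """Indices of aggregates for training (N_s < n_max_train) vs. zero-shot testing."""
    n_s = np.asarray(n_s)
    return np.flatnonzero(n_s < n_max_train), np.flatnonzero(n_s >= n_max_train)

train_idx, zero_shot_idx = split_by_size([14, 76, 128, 640])
```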

Figure 5

GNN model predictions of the angle-resolved optical properties. (a) Comparison of \(S_{11}(\theta )\) predictions for 4 randomly chosen aggregates in the test set with the same distribution of parameters as the training data set: Blue (\(N_{s}=76\), \(X_{v}=0.7\), \(n_{k}=1.4+i0.4\)), Red (\(N_{s}=14\), \(X_{v}=0.5\), \(n_{k}=1.6+i0.6\)), Green (\(N_{s}=17\), \(X_{v}=0.5\), \(n_{k}=1.4+i0.4\)), Yellow (\(N_{s}=36\), \(X_{v}=0.9\), \(n_{k}=2.0+i1.0\)). (b) The asymmetry parameter obtained by integrating the predicted \(S_{11}\) (Eq. 4) vs. the predicted value of the asymmetry parameter (g) for all of the aggregates in the training and test sets (\(N_{s}<100\)). (c) As in (a), for 4 randomly chosen aggregates in the zero-shot test set: Blue (\(N_{s}=128\), \(X_{v}=0.9\), \(n_{k}=1.6+i0.6\)), Red (\(N_{s}=640\), \(X_{v}=0.3\), \(n_{k}=2.0+i1.0\)), Green (\(N_{s}=960\), \(X_{v}=0.7\), \(n_{k}=1.8+i0.8\)), Yellow (\(N_{s}=416\), \(X_{v}=0.9\), \(n_{k}=2.0+i1.0\)). (d) As in (b), for all of the aggregates in the training and zero-shot test sets.

Figure 4 shows the IN predictions compared with the true values for \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), and g, with the training data shown in blue, the test data set of smaller aggregates in yellow (top row, Fig. 4a–d), and the zero-shot test data set in orange (bottom row, Fig. 4e–h). Figure 5 shows the predictions for the \(S_{11}(\theta )\) element of the scattering phase matrix for several different aggregates in the test data sets of smaller aggregates (Fig. 5a) and larger aggregates (Fig. 5c). For the test data with the same distribution of parameters as the training data set (\(N_{s} < 100\)), the model predictions were very close to the true values. For the zero-shot test data set, predictions for both integral and angle-resolved optical properties were reasonable across the entire range of size parameters (\(X_{v}\)=0.1 to 1.0), indices of refraction (\(n_{k} = 1.4+0.4i\) to \(n_{k}=2.0+1.0i\)), and fractal parameters. For the prediction of \(S_{11}\), both the magnitude and functional form were well approximated across the range of parameters in the test set, although the model deviated slightly more from the true values for larger \(N_{s}\) and \(X_{v}\) (e.g. the green line in Fig. 5c). Predictions for all angle-resolved scattering phase matrix elements \(S_{ij}(\theta )\) with \(j \ge i\) were also reasonable (see SI Fig. S16).

Table 1 gives the mean absolute percentage error (MAPE) of the IN model predictions for the integral optical properties and the asymmetry parameter for the training, test, and zero-shot test data sets. We use MAPE because it measures relative error and is therefore independent of the magnitude of the target values, which makes performance on the test and zero-shot test data sets directly comparable; metrics such as the mean squared error (MSE) depend on the absolute magnitude of the integral optical properties, which differs between the test and zero-shot test data sets. We find that the predictions for \(Q_{ext}\) and \(Q_{abs}\) are within 2% of the true values for the training and test data sets, and within 4% of the true values for the zero-shot test data set. \(Q_{scat}\) and g show larger deviations between the true and predicted values (Table 1, 2nd column), but this is mainly due to a bias in the IN predictions at the smallest \(X_{v}\) values, where the magnitude of \(Q_{scat}\) is so small that even small absolute errors produce large relative errors. At larger \(X_{v}\), the errors are within 2–9% of the true values for the training and test data sets, and within 4–8% for the zero-shot test data set. In general, the IN model performs best at larger size parameters for the integral optical properties. The bias at smaller values of \(X_{v}\) may be reduced by training on each \(X_{v}\) separately or, alternatively, by using methods such as meta-learning46.
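A short NumPy sketch of MAPE makes the small-denominator effect concrete: the same absolute error of 0.01 gives a 1% error when the true value is 1.0 but a 50% error when the true value is 0.02. The numbers are illustrative only and are not taken from Table 1; the function name is hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

large = mape([1.0], [1.01])   # about 1%
small = mape([0.02], [0.03])  # about 50%
```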

Table 1 MAPE values for GNN model prediction of integral optical properties and asymmetry parameter.

In addition to generalizing well, the IN model produced physically consistent predictions of the aggregate optical properties. The three efficiencies are not independent, as \(\langle Q_{scat} \rangle +\langle Q_{abs} \rangle =\langle Q_{ext} \rangle\). The model inferred this dependency for both the training and test sets without it being imposed as a constraint. Additionally, integrating \(S_{11}\), weighted by \(\cos\theta\), over the solid angle yields g44,

$$\begin{aligned} g = \frac{1}{4\pi }\int S_{11}(\theta )\cos (\theta )\, d\Omega = \frac{1}{2}\int _{0}^{\pi } S_{11}(\theta )\cos (\theta ) \sin (\theta )\, d\theta \end{aligned}$$
(4)

Without explicitly imposing this integral constraint, the model predictions were consistent with it (Fig. 5b for \(N_{s}<100\), and Fig. 5d for \(N_{s}>100\)).
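Eq. 4 can be checked numerically with a trapezoidal rule. This sketch assumes NumPy and that \(S_{11}\) is normalized so that \(\frac{1}{2}\int_{0}^{\pi} S_{11}\sin\theta\, d\theta = 1\). It uses the Henyey-Greenstein phase function, which is not part of this study, only because its asymmetry parameter is known in closed form (it equals its parameter g). A residual for the efficiency closure \(\langle Q_{scat}\rangle+\langle Q_{abs}\rangle=\langle Q_{ext}\rangle\) is included for the companion constraint. All function names are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def asymmetry_parameter(theta, s11):
    """Eq. 4: g = 1/2 * integral_0^pi S11(theta) cos(theta) sin(theta) dtheta."""
    return 0.5 * trapezoid(s11 * np.cos(theta) * np.sin(theta), theta)

def henyey_greenstein(theta, g):
    """Test phase function, normalized so 1/2 * integral S11 sin(theta) dtheta = 1."""
    return (1.0 - g**2) / (1.0 + g**2 - 2.0 * g * np.cos(theta)) ** 1.5

def efficiency_closure(q_ext, q_scat, q_abs):
    """Residual of <Q_scat> + <Q_abs> = <Q_ext>; zero for consistent predictions."""
    return q_ext - (q_scat + q_abs)

theta = np.linspace(0.0, np.pi, 20001)
g_check = asymmetry_parameter(theta, henyey_greenstein(theta, 0.5))  # close to 0.5
```

Applied to a predicted \(S_{11}(\theta)\) on the model's angular grid, the same function gives the quantity plotted against the predicted g.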

Analysis of the GNN predictions

To understand how the IN model predicts the optical properties of BC fractal aggregates, including those much larger than the aggregates it was trained on, it is important to note that the graph input for the model does not directly include \(D_{f}\) or \(k_{f}\) as features; rather, the fractal structure is implicitly encoded in the interactions between neighboring spheres. The SVM approach previously used to predict BC’s optical properties included \(N_{s}\), \(D_{f}\), and \(k_{f}\) as features to predict \(\langle Q_{ext}\rangle\), \(\langle Q_{scat}\rangle\), \(\langle Q_{abs}\rangle\), and g29. Because the network structure in the IN approach is learned directly from the sphere positions, and the model is learned at the node level, the IN approach can generalize to unseen configurations whose morphological parameters lie beyond those of its training set.

The generalization of the IN model across a range of \(D_{f}\) is an important feature, as it is challenging to find approximations that are valid across fractal dimensions47. Because the IN model learns about the local neighborhood of each sphere, it is able to estimate the impacts of screening on absorption and scattering more accurately than the RDG approximation44,47, which is often used to approximate the optical properties of BC aggregates in a computationally efficient manner as an improvement on the equivalent-sphere model. RDG assumes that individual monomers interact only with the incident electromagnetic field (neglecting multiple scattering between monomers), which can lead to absorption being under-predicted by 10–20% and g being under-predicted by more than a factor of ten15. The IN model effectively learns, without node-level supervision, a simplified sphere-level model that more fully captures the complexity of the optical properties of the full analytical solution15. As noted earlier, the best-performing IN model used summation for graph pooling, which is physically consistent with the node-level model learning the Mie theory solution for the individual spheres in the aggregate, given their interaction with neighboring spheres.
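To make the RDG baseline concrete: under RDG, absorption by an aggregate is simply the sum of the monomers' small-particle (Rayleigh) absorption, so it scales linearly with \(N_{s}\), much like the sum pooling discussed above; the IN must additionally learn the multiple-scattering (screening) corrections that RDG neglects. The sketch below assumes NumPy and uses the standard Rayleigh absorption efficiency \(Q_{abs}=4X_{v}\,\mathrm{Im}[(m^{2}-1)/(m^{2}+2)]\) for a monomer with size parameter \(X_{v}\) and refractive index \(m\), which is valid only for small monomers; it is an illustration, not the RDG implementation of the cited studies, and the function names are hypothetical.

```python
import numpy as np

def rayleigh_q_abs(x_v, m):
    """Small-sphere absorption efficiency, valid for x_v << 1 (and |m| x_v << 1)."""
    return 4.0 * x_v * ((m**2 - 1) / (m**2 + 2)).imag

def rdg_c_abs(n_s, x_v, m):
    """RDG aggregate absorption cross section in dimensionless units (k = 1):
    N_s independent monomers, each with geometric cross section pi * x_v**2."""
    return n_s * rayleigh_q_abs(x_v, m) * np.pi * x_v**2

q_mono = rayleigh_q_abs(0.1, 2.0 + 1.0j)   # 4.8 / 41 for m = 2 + i
c_agg = rdg_c_abs(100, 0.3, 2.0 + 1.0j)    # exactly 100 x the single-monomer value
```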

The optical properties of aggregates in this regime can be modeled with a fairly shallow graph model (for the IN model a single layer performed best; for the GCN, little improvement was seen beyond 3 or 4 layers, SI Fig. S5), suggesting that most of the structure influencing the optical properties of aerosols in this regime can be approximated from local interactions. We also investigated forming graphs from aggregates with a length scale of \(C=X_{v}\log(N_{s})/\log(\log(N_{s}))\) (characteristic of scale-free networks)43, rather than Eq. 3. This length scale has the advantage that the degree of each node grows less quickly with \(N_{s}\), but the IN model performed worse in this case. This suggests that including a larger local neighborhood at each layer (Eq. 3) is more informative for the model.
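The two cutoffs can be compared directly. A minimal sketch, again assuming NumPy, natural logarithms, and hypothetical function names:

```python
import numpy as np

def cutoff_eq3(n_s, x_v):
    # Eq. 3: C = X_v log(N_s)
    return x_v * np.log(n_s)

def cutoff_scale_free(n_s, x_v):
    # Alternative: C = X_v log(N_s) / log(log(N_s)); only meaningful for N_s > e
    return x_v * np.log(n_s) / np.log(np.log(n_s))

for n_s in (20, 100, 1000):
    print(n_s, cutoff_eq3(n_s, 1.0), cutoff_scale_free(n_s, 1.0))
```

Since \(\log(\log N_{s})>1\) once \(N_{s}>e^{e}\approx 15.2\), the alternative cutoff is the smaller of the two for all but the smallest aggregates, which is why it yields fewer neighbors per node.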

Discussion and outlook

The network approach presented here provides a new framework for understanding the microphysical relationship between the morphological properties of BC and its larger-scale physical properties. Here we have chosen to focus on the prediction of optical properties for numerically generated fractal aggregates, as the generation of these aggregates by combustion processes and their transformation during atmospheric aging are not yet completely understood. However, applying network theory to atmospheric aerosols suggests new directions for thinking about the generation of these fractal aggregates through combustion processes, owing to the connection between complex networks and percolation theory48. Here we have used a cluster-cluster algorithm, although previous work has noted that the morphology of numerically generated fractal aggregates depends not only on the parameters (\(N_{s}\), \(a\), \(D_{f}\), and \(k_{f}\)) defining the shape of the aggregate, but also on which algorithm is used to generate the sphere positions (e.g. diffusion-limited aggregation or diffusion-limited cluster aggregation)38,39. By comparing network characteristics43, the network approach also offers a way to assess how realistically numerical algorithms reproduce the properties of aerosols formed during incomplete combustion. This approach may also be useful for inferring the three-dimensional structure of aggregates from two-dimensional transmission electron microscope (TEM) images of these aerosols10,11, since it relates the relative positions of spheres to their overall morphological features; 2D methods have previously been shown to systematically underestimate the fractal dimension of BC49. Recent methods such as graph cumulants could provide sophisticated approaches to describe substructures of graphs (such as the motifs or cliques used to describe clustering)50.

Since any particular network observation is a single realization of an underlying generative process (in this case, the generation of primary aerosol particles from combustion sources), this framework could allow for an unbiased estimator of the variance of the propensity for specific graph substructures resulting from this generative process. These estimators could be used to compare specific aggregate-generating algorithms with observations of real fractal aerosol particles to assess the realism of the algorithms.
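As a concrete starting point for such substructure statistics, the raw counts of edges, wedges (paths of length two), and triangles, and the resulting global clustering coefficient, follow directly from the adjacency matrix. This sketch assumes NumPy and a symmetric 0/1 adjacency matrix without self-loops; it computes plain motif counts, not graph cumulants themselves, and the function name is hypothetical.

```python
import numpy as np

def motif_counts(adj):
    """Edge, wedge and triangle counts plus global clustering (transitivity)."""
    a = np.asarray(adj, dtype=np.int64)
    deg = a.sum(axis=1)
    edges = int(a.sum()) // 2                      # each edge appears twice in A
    wedges = int(np.sum(deg * (deg - 1) // 2))     # pairs of edges sharing a node
    triangles = int(np.trace(a @ a @ a)) // 6      # closed 3-walks, 6 per triangle
    transitivity = 3.0 * triangles / wedges if wedges else 0.0
    return {"edges": edges, "wedges": wedges, "triangles": triangles,
            "transitivity": transitivity}

k3 = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)   # a single triangle
counts = motif_counts(k3)
```

Applied to the adjacency matrices of aggregates from different generation algorithms (or to graphs inferred from observed particles), such counts give simple, comparable summaries of local structure.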

As a proof of concept, we have trained a GNN to predict the optical properties of bare BC fractal aggregates with a range of different fractal parameters. This study demonstrates that modeling aerosol fractal aggregates as networks of interacting spheres provides morphological information that allows the machine learning model to extrapolate far beyond its initial training data set. This approach may also be useful for other fractal systems found in nature, such as turbulence, vegetation, or river networks.

BC in the atmosphere is typically internally mixed. The GNN approach extends naturally to internally mixed aerosols (Fig. 1), as the thickness of coatings and their indices of refraction or organic fraction could be included as additional node-level features (in the thinly coated case) or graph-level features (in the thickly coated case). Other factors influencing the optical properties of aggregates, such as “necking” between overlapping monomers, could be included as edge features. Because atmospheric aerosol retrievals rely on orientation-averaged parameters, models predicting these quantities should be invariant under rotations of the aggregate. Recently developed equivariant machine learning methods51,52,53,54 may provide improved prediction of the orientation-averaged optical properties.
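The distinction between invariant and non-invariant inputs can be checked numerically: rotating an aggregate leaves the pairwise distances (the edge features used here) unchanged, but changes the raw coordinates used as node features, so orientation invariance must be learned from data unless it is built into the architecture. A minimal NumPy sketch; the QR-based construction of a random rotation is standard, and the function names are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation (det = +1) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the sign convention of the QR factors
    if np.linalg.det(q) < 0:      # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def pairwise_distances(coords):
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

rng = np.random.default_rng(1)
coords = rng.normal(size=(10, 3))   # stand-in monomer centers
rot = random_rotation(rng)
rotated = coords @ rot.T            # rotate every monomer center
```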

Uncertainty in BC’s direct radiative effect on climate is attributable to multiple factors, including BC’s emissions, lifetime, atmospheric processing, and optical properties1,2,55; the GNN approach could help reduce this uncertainty both by improving the interpretation of BC observations and by allowing BC’s morphology to be represented accurately and in a computationally efficient manner in atmospheric models. As laboratory and observational studies13,14,16 improve our understanding of BC’s physical properties across source types and atmospheric aging pathways, the major remaining hurdle to accurately representing BC in models will be computational cost.

While previous exact analytical methods have computational wall times ranging from hours to days for larger aggregates, inference with the trained GNN model takes < 0.3 s per aggregate on a CPU (see SI Fig. S15). The computational cost of these exact methods has precluded their use for calculating aerosol optical properties in models or observational retrievals. CELES, a CUDA-accelerated multiple-sphere T-matrix code capable of running on a GPU, demonstrated a 1.5–6-fold speed-up over MSTM but was still too slow to be implemented online in models56. The much faster inference of the GNN model, together with its ability to generalize to arbitrarily shaped aggregates compared with more standard ML methods, has the potential to transform existing model parameterizations for BC. For MSTM, computational wall times scale with \(N_{s}\), \(X_{v}\), and \(D_{f}\); in the GNN approach, total inference time and memory scale with \(N_{s}\) and \(D_{f}\) but no longer depend on \(X_{v}\).

We have focused here on the forward problem of predicting the optical properties of BC given an assumed single-particle morphology; however, this approach may also be useful for the inverse problem, i.e., inferring the morphology from the scattering phase function and integral optical properties. It could also provide insight into other physical properties that require detailed information about particle morphology38, such as the energy and heat transfer between aggregates and the surrounding gas needed to develop physical models of laser-induced incandescence7,57. Radiative transfer calculations for mineral dust and ice crystals also rely on detailed information about particle morphology, suggesting that the GNN approach could be useful for modeling their optical properties as well. This approach could mitigate several long-standing issues with model parameterizations and observational retrievals for these species by providing flexible parameterizations of arbitrarily shaped aerosol and cloud particles that are fast enough to be deployed online in atmospheric models.

Finally, these methods have potential for new applications in machine-learning-assisted materials discovery58,59. Proposed geoengineering approaches to mitigate global or regional impacts of climate change, such as stratospheric aerosol injection, marine cloud brightening, or precipitation enhancement, rely on the development of novel aerosol materials. Generative graph models could be used to identify aerosol morphologies with physical properties tailored to these applications at a fraction of the cost of traditional numerical methods60.

Methods

Numerical aggregate properties

Cartesian coordinates for the positions of spheres in aggregates were determined using a cluster-cluster algorithm38,42. This algorithm starts with primary clusters of size \(N_{c}\) and then randomly agglomerates these clusters pairwise into larger clusters. The process repeats over multiple levels until all clusters have been combined into a single cluster that satisfies the scaling law given by Eq. (1). Primary clusters of size \(N_{c}=3,4,5,7,9,11,13,15,17,19\) were used to generate aggregates with \(N_{s}\) = 8–960 spheres and fractal dimensions \(D_{f}\) = 1.8–2.3. Following23, we assume a fractal prefactor of \(k_{f}=1.2\) for the aggregates used in the MSTM calculations. We also investigated the network parameters of aggregates with \(k_{f}\) = 1.0–1.5 for a fixed \(D_{f}=1.8\) (Fig. S4b). Aerosols are assumed to consist of isotropic, homogeneous spheres with size parameters \(X_{v}\) = 0.1, 0.3, 0.5, 0.7, 0.9, and 1.0, corresponding to monomer radii of 7–72 nm for incident light at 450 nm and 10–104 nm at 650 nm. For each primary cluster size and fractal dimension, 10 aggregate realizations were randomly generated.
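The size-parameter conversion and the fractal scaling law can be checked numerically. The sketch below assumes that Eq. (1) takes the standard form \(N_{s} = k_{f}(R_{g}/a)^{D_{f}}\) and that \(X_{v} = 2\pi a/\lambda\); it computes \(R_{g}\) from the sphere centres only, which is one common convention (some definitions add each monomer's own contribution). Function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def monomer_radius_nm(size_parameter: float, wavelength_nm: float) -> float:
    """Invert the size parameter X_v = 2*pi*a/lambda for the monomer radius a."""
    return size_parameter * wavelength_nm / (2.0 * math.pi)


def radius_of_gyration(centers: Sequence[Point]) -> float:
    """R_g of the sphere centres (point-mass convention, an assumption)."""
    n = len(centers)
    cx, cy, cz = (sum(c[k] for c in centers) / n for k in range(3))
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in centers) / n
    )


def fractal_prefactor(n_s: int, r_g: float, a: float, d_f: float) -> float:
    """Solve the assumed scaling law N_s = k_f (R_g / a)^D_f for k_f."""
    return n_s / (r_g / a) ** d_f
```

For example, `monomer_radius_nm(0.1, 450.0)` gives about 7.2 nm and `monomer_radius_nm(1.0, 450.0)` about 71.6 nm.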

Aerosol optical properties

For radiative transfer applications, the parameters typically needed are the orientation-averaged scattering \(\langle Q_{sca}\rangle\), extinction \(\langle Q_{ext}\rangle\), and absorption \(\langle Q_{abs}\rangle\) efficiencies, as well as the asymmetry parameter \(g = \langle \cos\theta \rangle\), the mean cosine of the scattering angle \(\theta\) weighted by the angular distribution of scattered intensity. Each efficiency is related to the corresponding cross section; e.g., \(Q_{sca}=C_{sca}/(\pi a_{agg}^{2})\), where \(C_{sca}\) is the scattering cross section and \(a_{agg}\) is the effective radius of the aggregate. The asymmetry parameter characterizes the balance between forward- and back-scattered light, with \(g>0\) when forward scattering dominates. Other parameters relevant for radiative transfer, such as the single scattering albedo (SSA), can be derived from these parameters (SSA = \(\langle Q_{sca}\rangle/\langle Q_{ext}\rangle\)). The mass absorption coefficient (MAC) and mass extinction coefficient (MEC) are typically used to relate emissions of these aerosols to their direct radiative effects, and they are sometimes estimated theoretically from \(\langle C_{abs}\rangle\) or \(\langle C_{ext}\rangle\) with assumptions about particle density.
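As a worked example of these derived quantities, the sketch below computes \(\langle Q_{abs}\rangle = \langle Q_{ext}\rangle - \langle Q_{sca}\rangle\), the SSA, and a theoretical MAC. It assumes that \(a_{agg}\) is a volume-equivalent radius, so that the particle mass is \(\rho\,\tfrac{4}{3}\pi a_{agg}^{3}\), and it uses a hypothetical default bulk density of 1.8 g cm\(^{-3}\); both are illustrative assumptions rather than values from this study. Under these assumptions the MAC reduces to \(3\langle Q_{abs}\rangle/(4\rho a_{agg})\).

```python
import math
from typing import Dict


def derived_optics(q_ext: float, q_sca: float, a_agg_nm: float,
                   density_g_cm3: float = 1.8) -> Dict[str, float]:
    """SSA, Q_abs and a theoretical MAC (m^2/g) from orientation-averaged efficiencies.

    Assumes a_agg is a volume-equivalent radius and a hypothetical bulk density.
    """
    q_abs = q_ext - q_sca                      # absorption = extinction - scattering
    ssa = q_sca / q_ext                        # single scattering albedo
    a_m = a_agg_nm * 1e-9                      # nm -> m
    c_abs = q_abs * math.pi * a_m ** 2         # from Q = C / (pi a_agg^2), in m^2
    # g/cm^3 -> g/m^3 (factor 1e6), times the volume-equivalent sphere volume
    mass_g = density_g_cm3 * 1e6 * (4.0 / 3.0) * math.pi * a_m ** 3
    return {"q_abs": q_abs, "ssa": ssa, "mac_m2_per_g": c_abs / mass_g}
```

For instance, \(\langle Q_{ext}\rangle = 2\), \(\langle Q_{sca}\rangle = 0.5\), and \(a_{agg}\) = 100 nm give an SSA of 0.25 and a MAC of 6.25 m\(^{2}\) g\(^{-1}\) under the assumed density.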

The scattering phase function relates the incident and scattered Stokes parameters, i.e., it describes how the intensity and polarization state of light scattered by the particle are transformed relative to the incident light44. Here we assume unpolarized incident light, in which case the \(S_{11}\) element specifies the angular distribution of the scattered intensity relative to the incident intensity. The scattered light is partially polarized, with degree of polarization given by \(\sqrt{(S_{21}^{2}+S_{31}^{2}+S_{41}^{2})/S_{11}^{2}}\).
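Given \(S_{11}\) on a grid of scattering angles, the asymmetry parameter follows as \(g = \int_0^\pi S_{11}\cos\theta\sin\theta\,d\theta \big/ \int_0^\pi S_{11}\sin\theta\,d\theta\), assuming (as for orientation-averaged particles) that \(S_{11}\) depends only on \(\theta\), and the degree of polarization follows from the matrix elements at each angle. The sketch below uses the trapezoidal rule on a grid in radians; the function names and the choice of quadrature are assumptions.

```python
import math
from typing import Sequence


def asymmetry_parameter(theta: Sequence[float], s11: Sequence[float]) -> float:
    """g = <cos theta> weighted by S11(theta) sin(theta), via the trapezoidal rule."""
    num = den = 0.0
    for k in range(len(theta) - 1):
        h = theta[k + 1] - theta[k]
        f0 = s11[k] * math.sin(theta[k])          # solid-angle weight at left node
        f1 = s11[k + 1] * math.sin(theta[k + 1])  # solid-angle weight at right node
        den += 0.5 * h * (f0 + f1)
        num += 0.5 * h * (f0 * math.cos(theta[k]) + f1 * math.cos(theta[k + 1]))
    return num / den


def degree_of_polarization(s11: float, s21: float, s31: float, s41: float) -> float:
    """Degree of polarization of scattered light for unpolarized incident light."""
    return math.sqrt(s21 ** 2 + s31 ** 2 + s41 ** 2) / abs(s11)
```

A Henyey-Greenstein phase function with parameter 0.5 is a convenient check, since its mean cosine is exactly 0.5; an isotropic \(S_{11}\) should give \(g \approx 0\).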

MSTM calculations of bare BC optical properties

To determine ground-truth optical properties for the BC fractal aggregates generated by the cluster-cluster algorithm, we use the Fortran-90 implementation of the multiple-sphere T-matrix (MSTM) code described in45, which can run on high-performance parallel computing platforms. This code numerically solves for electromagnetic wave scattering by systems of multiple non-overlapping spheres in either a fixed or a random (orientation-averaged) orientation with respect to an incident plane wave. Here we focus on random-orientation optical properties, which are calculated using the T-matrix procedure developed in18. We assume indices of refraction consistent with the range of literature values for BC at 550 nm: 1.4+0.4i, 1.6+0.6i, 1.8+0.8i, and 2.0+1.0i. MSTM calculations were performed for this range of refractive indices for 57,556 numerically generated aggregates with \(N_{s} < 100\); randomly chosen aggregates from this data set were used for the training, validation, and test sets. To test zero-shot performance on aggregates larger than any in the training data, MSTM calculations were performed for 880 aggregates with these parameters in the size range \(100< N_{s} < 1000\); we randomly split these data into a zero-shot validation set, used to evaluate the model’s performance, and an independent zero-shot test set. A summary of the parameter ranges for each data set is given in Table S1. The distributions of parameters among the small (\(N_{s}<100\)) and large (\(N_{s}>100\)) aggregates are shown in Figs. S1 and S2, and the integral optical properties calculated with MSTM are shown in Fig. S3.
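The data partitioning described above can be sketched as follows. Only the random assignment and the size threshold come from the text; the split fractions (80/10/10 for the small aggregates and 50/50 for the two zero-shot sets), the seed, and the function name are hypothetical. Aggregates with exactly \(N_{s} = 100\) fall into neither group, matching the strict inequalities above.

```python
import random
from typing import Dict, List, Sequence


def split_aggregates(n_spheres: Sequence[int], seed: int = 0, frac_train: float = 0.8,
                     frac_val: float = 0.1, threshold: int = 100) -> Dict[str, List[int]]:
    """Return index lists for in-distribution and zero-shot splits (fractions hypothetical)."""
    rng = random.Random(seed)
    small = [i for i, n in enumerate(n_spheres) if n < threshold]   # training domain
    large = [i for i, n in enumerate(n_spheres) if n > threshold]   # zero-shot domain
    rng.shuffle(small)
    rng.shuffle(large)
    n_tr = int(frac_train * len(small))
    n_va = int(frac_val * len(small))
    half = len(large) // 2
    return {
        "train": small[:n_tr],
        "val": small[n_tr:n_tr + n_va],
        "test": small[n_tr + n_va:],
        "zero_shot_val": large[:half],
        "zero_shot_test": large[half:],
    }
```

Keeping the large aggregates entirely out of the training, validation, and test sets is what makes the zero-shot evaluation a test of extrapolation in \(N_{s}\).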

Graph neural networks

We used PyTorch Geometric61 to implement the GNN models. Several GNN approaches were tested, including a simple graph convolutional network (SGC)62, a graph convolutional network (GCN)30, and an interaction network (IN)31 (see Supplementary Information S1 for additional details of the graph models and a comparison of performance metrics across model parameters and targets). The best performance for the integral optical properties was obtained with an IN model with a hidden layer size of 300 for both the node and edge models and a message size of 100. Both the node and edge models are MLPs with ReLU activations between layers. Messages from the edge model are aggregated by summation, and the global aggregation function consists of global mean pooling followed by dropout (p = 0.5) and a linear layer of size 100. For the prediction of \(S_{11}\), we found that adding a fully connected node to each graph slightly improved the zero-shot performance; the model architecture was otherwise the same as that used to predict the integral optical properties. A batch size of 20 was used (training with a batch size of 2 was slower but did not give significantly worse performance). The graph regression task used an MSE loss, and the GNN models were trained on an NVIDIA RTX 8000 GPU.
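To make the interaction-network update concrete, the pure-Python sketch below performs one message-passing step with summed messages, global mean pooling, and the optional fully connected (“virtual”) node used for \(S_{11}\). It is a simplified illustration, not the PyTorch Geometric model used here: edge features, multi-layer MLPs, dropout, and the final linear layer are omitted, each “MLP” is a single hand-supplied linear layer with ReLU, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int]  # (sender j, receiver i)


def add_virtual_node(num_nodes: int, edges: Sequence[Edge]) -> Tuple[int, List[Edge]]:
    """Append one extra node connected in both directions to every original node."""
    v = num_nodes  # index of the new, fully connected node
    new_edges = list(edges)
    for i in range(num_nodes):
        new_edges.append((i, v))
        new_edges.append((v, i))
    return num_nodes + 1, new_edges


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(wk * xk for wk, xk in zip(row, x)) + bk for row, bk in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, t) for t in x]


def interaction_step(x: List[Vector], edges: Sequence[Edge], edge_w: Matrix, edge_b: Vector,
                     node_w: Matrix, node_b: Vector) -> List[Vector]:
    """One simplified interaction-network update (no edge features)."""
    agg = [[0.0] * len(edge_b) for _ in x]
    for j, i in edges:
        # edge model: message computed from receiver and sender node features
        m = relu(linear(edge_w, edge_b, x[i] + x[j]))
        # sum aggregation of incoming messages at the receiver
        agg[i] = [a + mk for a, mk in zip(agg[i], m)]
    # node model: update each node from its own features and aggregated messages
    return [relu(linear(node_w, node_b, x[i] + agg[i])) for i in range(len(x))]


def global_mean_pool(x: Sequence[Vector]) -> Vector:
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(x) for col in zip(*x)]
```

With node features `[[1.0], [2.0], [3.0]]` on a bidirectional path 0-1-2 and all weights set to 1 (biases 0), the updated node features are `[[4.0], [10.0], [8.0]]`, and their mean pool is 22/3.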