Exploring robust architectures for deep artificial neural networks

The architectures of deep artificial neural networks (DANNs) are routinely studied to improve their predictive performance. However, the relationship between the architecture of a DANN and its robustness to noise and adversarial attacks is less explored, especially in computer vision applications. Here we investigate the relationship between the robustness of DANNs in a vision task and their underlying graph architectures or structures. First, we explored the design space of DANN architectures using graph-theoretic robustness measures and transformed the graphs into DANN architectures for various image classification tasks. Then we explored the relationship between the robustness of trained DANNs against noise and adversarial attacks and their underlying architectures. We show that the robustness performance of DANNs can be quantified before training using graph structural properties such as topological entropy and Ollivier-Ricci curvature, with the greatest reliability for complex tasks and large DANNs. Our results can also be applied to tasks other than computer vision, such as natural language processing and recommender systems.


Introduction
The architecture or structure of a deep artificial neural network (DANN) is defined by the connectivity patterns among its constituent artificial neurons. The mere presence or absence of a connection between two neurons or a set of neurons may provide a useful prior and improve the predictive performance of a DANN. A range of architectures has been developed over the years to tackle various machine learning tasks in computer vision, natural language processing, and reinforcement learning [1][2][3][4][5]. In general, the process of developing DANN architectures is manual, iterative, and time consuming. AutoML and neural architecture search (NAS) attempt to use machine learning to search the design space of DANNs for architectures that may yield maximum test accuracy. After the selection of a suitable DANN architecture for the given task, the optimal values of the connections (parameters or weights) are found using the training dataset and the well-known gradient descent algorithm or one of its variants. Recently, considerable research efforts have been focused on automating the laborious task of DANN architecture design and development using techniques of autoML and NAS. However, all such efforts are primarily focused on improving the test accuracy of the DANN on the given task.
In the real world, DANNs face the challenging problem of maintaining their predictive performance in the face of uncertainties and noise in the input data 6. The challenge is further exacerbated for mission-critical application areas, such as clinical diagnosis, autonomous driving, financial decision-making, and defense. Ideally, a real-world, deployment-ready DANN should be robust to, or equivalently maintain its predictive performance against, two different types of noise: natural and malicious. Natural noise is related to out-of-distribution generalization. Such noise is caused by day-to-day changes in input data, e.g., changes in the hardware or software configurations used for processing input data. Malicious or adversarial noise is imperceptible to the human eye and is generated by an adversary to fool the DANN into producing an erroneous decision 7.
It has been shown with the help of percolation theory that the architecture or structure underlying a network of any real-world system may play a key role in defining its robustness to various insults and attacks 8. Graph-theoretic measures, such as network topological entropy and Ollivier-Ricci curvature, successfully quantify the functional robustness of various networks 9. Examples include studying the behavior of cancer cells, analyzing the fragility of financial networks, studying the robustness of brain networks, tracking changes attributable to age and Autism Spectrum Disorder (ASD), and explaining cognitive impairment in Multiple Sclerosis (MS) patients [10][11][12][13]. Recently, the relationship between the architectures of DANNs (quantified by various graph-theoretic measures before training) and their predictive accuracy (available after training) has been established 14,15. Various graph-theoretic measures (e.g., path length and clustering coefficient) calculated from the architectures of DANNs are quantitatively linked to their accuracy on various image classification tasks. However, the relationship between the graph-theoretic measures related to the robustness of the architecture of DANNs (entropy and Ollivier-Ricci curvature) and their performance against natural and adversarial noise has never been explored. Establishing such a relationship will allow the autoML and NAS research community to design and develop robust DANNs without training and testing these architectures.
In this work, we study graph-theoretic properties of the architectures of DANNs to quantify their test-time robustness. Specifically, we use the graph measures of topological entropy and curvature of the architecture of DANNs as robustness metrics. We make two distinct research contributions to the robustness analysis of DANNs: (1) We establish a quantitative relationship between the graph-theoretic robustness measures of entropy and curvature of DANNs (available before training) and the robustness of these DANNs to natural and adversarial noise (evaluated after training). Previous studies explored graph measures that relate to the performance of DANNs, but the robustness of DANNs through graph-robustness measures has never been studied. We show that graph entropy and curvature are related to DANNs' robustness and that these structural measures can identify robust architectures of DANNs even before training for the given task. (2) We show that the relationship between the graph robustness, measured using entropy and Ollivier-Ricci curvature, and the robustness performance of DANNs against noise and adversarial attacks becomes significantly stronger for complex tasks, larger datasets, and bigger DANNs. Given that the sizes of DANNs and the complexity of tasks/datasets are growing significantly for many real-world applications, the strong entropy-robustness relationship assumes greater importance. For autoML/NAS design problems where the robustness of DANNs is vital, our analysis can help identify robust architectures without the need to train and test these DANNs under various noisy conditions.
In Fig. 1, we provide an overview of the proposed approach. Fig. 1(a) illustrates how graph-theoretic measures are often applied in Network Science (NetSci) to study various real-world networks. The illustrated examples include biological systems such as brain networks, economic systems such as financial networks, and social systems such as social networks. Path length, graph connectivity, efficiency, degree measures, clustering coefficient, centrality, and spectral measures (curvature, entropy) are the graph-theoretic measures that researchers have employed for studying real-world networks [10][11][12][13]16. Fig. 1(b) illustrates our proposed methodology. We start with building random, scale-free, or small-world networks (or graphs) that are later transformed into architectures of DANNs. We study various graph-theoretic properties of these networks in the graph domain and later quantitatively relate these measures to the robustness of the trained DANNs built from these graphs. We hypothesize that the graph-theoretic measures that quantify the robustness of networks/graphs in the NetSci domain will also provide insight into the robustness of DANNs in the deep learning domain. We provide empirical evidence to support our hypothesis. We use the term DANN for deep artificial neural networks, graph for unweighted directed acyclic graphs, and network for various networks as used in the network science (NetSci) domain.

Graph design space
We use two graph measures, average path length (L) and clustering coefficient (C), for exploring the graph design space. Extensively used in prior works [17][18][19], these measures smoothly span the whole design space of the random graphs, as illustrated in Fig. 2. We generate 2.313 million (M) candidate random graphs using the Watts-Strogatz flex (WS-flex) graph generator for a range of C and L values, as illustrated in Fig. 2(a). We chose WS-flex because its graphs are a superset of the graphs generated by three classical methods: Watts-Strogatz (WS), Erdős-Rényi (ER), and Barabási-Albert (BA) 17,20,21. We downsample the 2.313 M candidate WS-flex graphs into coarser bins of 3854 and 54 graphs (Fig. 2(b)&2(c)), where each bin has at least one representative graph. We visualize our candidate graphs using their average path length (L), clustering coefficient (C), and entropy (H), which is a graph-theoretic measure of robustness, as shown in Fig. 2(d)&2(e). Fig. 2(a)&2(e) also depict the extreme cases of complete and sparse graphs. For a complete graph, we have (C, L, H) = (1.0, 1.0, 4.1).
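The (C, L) sweep above can be reproduced in miniature with a short script. The sketch below is illustrative, not the authors' code: it builds a classical Watts-Strogatz ring-lattice graph (the special case of WS-flex with equal degrees) and computes its clustering coefficient and average path length with plain BFS; the node count and degree are our own choices.

```python
import random
from collections import deque

def ws_graph(n, k, p, seed=0):
    """Watts-Strogatz graph as adjacency sets: a ring lattice with k
    nearest neighbours per node, each edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n); adj[(v + j) % n].add(v)
    for v in range(n):                      # rewiring pass
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if rng.random() < p and u in adj[v]:
                choices = [w for w in range(n) if w != v and w not in adj[v]]
                if choices:
                    w = rng.choice(choices)
                    adj[v].discard(u); adj[u].discard(v)
                    adj[v].add(w); adj[w].add(v)
    return adj

def avg_path_length(adj):
    """Mean shortest-path distance (L) over connected node pairs via BFS."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}; q = deque([s])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1; q.append(u)
        total += sum(d for t, d in dist.items() if t != s)
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average local clustering coefficient (C)."""
    cs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cs.append(0.0); continue
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        cs.append(2 * links / (k * (k - 1)))
    return sum(cs) / len(cs)

g = ws_graph(64, 8, 0.1, seed=1)
C, L = clustering(g), avg_path_length(g)
```

Sweeping the rewiring probability p between 0 and 1 (and the degree k) traces out the (C, L) plane shown in Fig. 2.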

From graphs to DANN architectures
We transform the downsampled 54 graphs into DANNs using the technique of relational graphs proposed by You et al. 15. We transform the same 54 graphs into multiple types of DANNs, including multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and residual neural networks (ResNets). We use four image classification datasets of varying complexity to train and evaluate the DANNs built using the 54 different graph structures. These datasets include CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet [22][23][24].
The robustness of trained DANNs is quantified by subjecting these models to various levels and types of natural and malicious noise. We use three types of additive noise: Gaussian, speckle, and salt&pepper. For malicious noise, we employ three different adversarial attacks with varying severity levels: Fast Gradient Sign Method (FGSM) 25, Projected Gradient Descent (PGD) 26, and Carlini-Wagner (CW) 27. We observe a consistent decline in the predictive performance of all DANNs as the severity levels of the adversarial attacks or natural noise increase.
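The three natural-noise corruptions can be sketched as follows. This is our own illustrative parameterization, not the paper's evaluation code; in particular, the meaning of `severity` (noise standard deviation for Gaussian/speckle, flipped-pixel fraction for salt&pepper) is an assumption.

```python
import numpy as np

def corrupt(img, kind, severity, rng):
    """Apply one of the three natural-noise corruptions to an image
    with float pixel values in [0, 1]."""
    if kind == "gaussian":                 # additive zero-mean noise
        out = img + rng.normal(0.0, severity, img.shape)
    elif kind == "speckle":                # multiplicative noise
        out = img + img * rng.normal(0.0, severity, img.shape)
    elif kind == "salt_pepper":            # set a fraction of pixels to 0 or 1
        out = img.copy()
        mask = rng.random(img.shape)
        out[mask < severity / 2] = 0.0
        out[mask > 1 - severity / 2] = 1.0
    else:
        raise ValueError(kind)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))            # a CIFAR-sized dummy image
noisy = corrupt(clean, "salt_pepper", 0.1, rng)
```

Evaluating a trained model on `corrupt(...)` outputs at increasing `severity` yields accuracy-vs-severity curves of the kind summarized in Fig. 3.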

Performance trends of DANNs
Fig. 3 presents the predictive performance of different MLPs, CNNs, and ResNets built using the 54 selected graphs and trained on four different image classification datasets. Performance evaluation of the trained DANNs is done using 30 randomly selected sets of clean, adversarial, and noisy images. The test accuracy numbers presented in Fig. 3 are average values across all tests.

CNNs on CIFAR-10
Panel 2 of Fig. 3 shows the average test accuracies of 8-layer CNNs built from the same 54 candidate graphs. We observe that the average clean test accuracy for CNNs is 84.19 ± 1.26%, dropping to 8.07 ± 1.43% under PGD attack and to 34.35 ± 3.12% under CW attack. We noticed similar trends for various levels of FGSM attacks, as well as for Gaussian, speckle, and salt&pepper noise.

Comparison of MLPs vs. CNNs on CIFAR-10
We observed that CNNs achieve higher accuracy on the clean test data than MLPs on the CIFAR-10 dataset. However, under adversarial conditions (FGSM, PGD, and CW attacks), the drop in the performance of CNNs is significantly higher than that of MLPs, as shown in panels 1 and 2 of Fig. 3. The test accuracy drop is ∼76% for CNNs compared to ∼33% for MLPs under PGD attack. For the CW attack, the accuracy drop for CNNs is ∼50% compared to ∼34% for MLPs. The same trend was observed for all severity levels of the FGSM attack. Generally, as expected, CNNs outperform MLPs under clean test conditions; however, MLPs are more robust to adversarial perturbations than CNNs. We argue that the observed fragility of CNNs is linked to their weight-sharing and shift-invariant characteristics, as previously noted by Zhang et al. 28.

Robustness analysis
Our work is a cross-pollination between graph theory and deep learning. We attempt to link the robustness of the graphs underlying the architectures of DANNs to their performance against noise and adversarial attacks. On the graph theory side, we use entropy and Ollivier-Ricci curvature to quantify the robustness of graphs. These graphs, in turn, are used to build architectures of DANNs. On the deep learning side, we train these DANNs and quantify their robustness using test accuracy against various types of noise and adversarial attacks. Entropy and Ollivier-Ricci curvature have been extensively studied in the NetSci literature. These measures have been shown to capture the robustness of cancer networks 9,10, track changes in brain networks caused by age and Autism Spectrum Disorder 12, explain cognitive impairment in patients with Multiple Sclerosis 13, identify the fragility of financial markets 11, and detect communities in complex social networks 16. We study the robustness of DANNs and establish the statistical correlation of the observed robustness with entropy and curvature. The correlation results for the entropy of graphs and the test robustness of DANNs on different datasets are given in Figs. 4, 5, and 6. The correlation results between the robustness of DANNs and graph curvature are provided in Supplementary appendix C.
Fig. 4 presents the test accuracy of the 54 ResNet-18 models trained on ImageNet and tested under various conditions. The Pearson product-moment correlation coefficient values between entropy and accuracy, along with p values, are shown on each sub-plot. There was a positive correlation between the two variables for the clean test dataset, r = 0.73, n = 54, p < 0.05. We note similar behavior under PGD and CW attacks, that is, a strong correlation between entropy and accuracy exists, r = 0.69 for PGD and r = 0.85 for CW, p < 0.05 for both. Similar trends exist for various severity levels of the FGSM attack and for Gaussian, speckle, and salt&pepper noise. In general, across all types of adversarial attacks and noise, the DANNs corresponding to graphs with higher entropy showed stronger robustness, and vice versa. Additional results are provided in Supplementary Figs. B3 and B4.

ResNet-18 on ImageNet and Tiny ImageNet
Fig. 5 presents test accuracy vs. entropy plots for 54 ResNet-18 models trained using Tiny ImageNet and tested under various noisy conditions. We observe a strong positive correlation between entropy and predictive performance under all noise conditions. However, there is a notable decrease in the Pearson product-moment correlation coefficient values in all noise categories compared to the same DANNs when trained and tested on ImageNet. As Tiny ImageNet is a subset of ImageNet with only 200 distinct classes instead of 1,000, the observed decrease in the correlation may be linked to the reduction in the complexity of the task.

CNNs on CIFAR-100 and CIFAR-10
In Fig. 6(a)&(b), we present accuracy vs. entropy plots for the 54 8-layer CNNs trained on the CIFAR-100 and CIFAR-10 datasets and tested under various noisy conditions. For the CIFAR-100 experiments, we observe a relatively strong correlation between entropy and predictive performance, except under the CW (r = 0.40, p < .05) and PGD (r = 0.14, p < .05) adversarial attacks. For the CIFAR-10 dataset, there is a significant correlation between entropy and predictive performance, except for the PGD and CW attacks and salt&pepper noise, which were not statistically significant.
We opine that the weak correlation between graph entropy and the DANNs' performance under PGD and CW attacks is due to the strength of these attacks on the relatively simple classification tasks of CIFAR compared to Tiny ImageNet and ImageNet. This view was strengthened by the evaluation results of the CNNs on the more straightforward classification task of CIFAR-10. We observe that the correlation of entropy with the predictive performance of CNNs reduces for all categories. Moreover, the entropy's correlation with accuracy under CW attack becomes negative. Under PGD attack and salt&pepper noise, it becomes insignificant with p > 0.05, as highlighted by the red text in the respective subplots of Fig. 6.

Effect of task and model complexity
We observed that DANNs' robustness, evaluated under noisy conditions, and the robustness of the underlying graph structures, quantified using entropy, are strongly correlated. Moreover, this correlation has a strong dependence on the complexity of the model and/or the dataset. In our settings, model complexity refers to the number of parameters in the model, and task complexity refers to the number of classes in the dataset. As the complexity of the task and/or model increases, the correlation between robust performance and entropy of DANNs increases, as shown in Fig. 7.
In Fig. 7(a), we note that for the same 8-layer CNNs, increasing the complexity of the task (from 10 classes of CIFAR-10 to 100 classes of CIFAR-100) results in an increase in the correlation values, as noted by Student's t-test (t = −2.31, n = 34, p < .05). The same holds true for increasing the task complexity from 200 classes of Tiny ImageNet to 1000 classes of ImageNet while using the same ResNet-18 models (t = −4.66, n = 23, p < .05), as shown in Fig. 7(b). In Fig. 7(c), we present the effect of increasing the model complexity, measured by the number of parameters, on the entropy-robustness correlation. We observe that for the same CIFAR-100 dataset, as the model complexity increases from ∼0.3 M parameters in ResNet-29 to ∼1.3 M in the CNN, the entropy-robustness correlation increases significantly (t = −6.8, n = 23, p < .05).

Discussion
In this work, we have shown that graph structural properties such as entropy and curvature can quantify the robustness of DANNs before training. We calculated the entropy and curvature of a set of random graphs, which were later transformed into architectures of different types of DANNs. The DANNs were trained, and their robustness was evaluated using different types of natural and adversarial noise. We noted that the robustness of the trained DANNs was highly correlated with the graph measures of entropy and curvature. We also noted that the said correlations were even stronger for relatively large models and complex tasks.
Currently, various autoML and NAS techniques are being developed to search for accurate model architectures for given datasets and/or tasks. We argue that for many mission-critical applications, the robustness of these models is equally or, in some cases, more important than accuracy. However, there are currently no assured ways of estimating the robustness of DANNs in the graph design space other than training and testing the candidate DANNs in the deep learning domain. We suggest that the users of autoML/NAS techniques incorporate entropy and Ollivier-Ricci curvature information into their search frameworks. Such a practice would allow users or autoML/NAS algorithms to choose accurate as well as robust DANNs, keeping in view the application area of the machine learning model. The users and autoML/NAS algorithms can identify and choose the most robust model out of all the models that meet the accuracy criteria set by the user.
A possible future direction is to extend the presented analysis to more complex tasks (e.g., natural language processing) and larger models (e.g., Transformers). Given our current analysis, we anticipate that for larger datasets, complex tasks, and huge models, the graph robustness measures will be even more relevant and will help users and autoML/NAS algorithms find robust DANN architectures.

Methods
We start by presenting the techniques we employed for generating random graphs in the graph theory domain. Next, we describe the graph-theoretic properties used in our experiments to study random graphs. These graph measures are needed to study the structural information of the random graphs.
Next, we provide details on the transformations for building DANN architectures from random graphs and on training these DANNs for various computer vision classification tasks. Finally, we present the multiple conditions, including natural noise and adversarial attacks, that we used to evaluate the trained DANNs and quantify their robustness.

Generating Random Graphs
Random graphs are extensively used in percolation studies, social sciences, brain studies, and deep learning to understand the behavior of natural systems and DANNs 14,19,20,29,30. We used random graphs, called relational graphs, employed recently in deep learning 15.

Relational graphs
A recent study used relational graphs and showed that the performance of a DANN can be quantified using graph properties such as clustering coefficient and path length 15. The relational graphs are generated through the WS-flex graph generator. WS-flex is a generalized version of the WS model with the same-degree constraint relaxed for all nodes. Parameterized by N nodes, average degree K, and rewiring probability P, we represent these graphs by WS-flex(N, K, P). For the graph generator, we use the notation g(θ, s), where g is the generator (for example, WS-flex), θ represents the parameters (N, K, P), and s is the random seed. It is important to note that the WS-flex(N, K, P) graph generator encompasses the design space of all the graphs generated by the three classical families of random graph generators: Watts-Strogatz (WS), Erdős-Rényi (ER), and Barabási-Albert (BA) 15,17,20,21.
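A minimal sketch of a WS-flex-style generator g((N, K, P), s) is given below. The published WS-flex generator's exact edge-distribution rule may differ; relaxing the same-degree constraint by topping up a ring lattice with random extra edges until the average degree reaches K is our own simplification.

```python
import random

def ws_flex(n, k, p, seed=0):
    """Sketch of a WS-flex-style generator: build a graph whose total
    edge count matches average degree k while individual node degrees
    may differ (unlike classical WS), then rewire each edge with
    probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    def add(u, v):
        if u != v:
            adj[u].add(v); adj[v].add(u)
    # ring lattice core: floor(k/2) neighbours on each side
    for v in range(n):
        for j in range(1, int(k // 2) + 1):
            add(v, (v + j) % n)
    # top up with random edges until the average degree reaches k;
    # this is the step that relaxes the same-degree constraint
    target = int(n * k / 2)
    while sum(len(s) for s in adj.values()) // 2 < target:
        add(rng.randrange(n), rng.randrange(n))
    # WS-style rewiring, preserving the edge count
    for v in range(n):
        for u in list(adj[v]):
            if u > v and rng.random() < p:
                w = rng.randrange(n)
                if w != v and w not in adj[v]:
                    adj[v].discard(u); adj[u].discard(v)
                    add(v, w)
    return adj

g = ws_flex(64, 5, 0.2, seed=3)
degrees = [len(g[v]) for v in g]
```

With an odd k such as 5, classical WS cannot hit the target average degree exactly, while this relaxed construction can, which is what lets WS-flex fill the (C, L) design space more densely.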

Graph-Theoretic Measures
Average Path Length (L). It is a global graph measure defined as the average shortest-path distance between all pairs of graph nodes. It depicts the efficiency with which information is transferred through the nodes of the graph 31. Small values of L indicate that the graph is globally efficient and that information is effectively exchanged across the whole network, and vice versa. Let G be an unweighted directed graph having a set V of n vertices; then L = (1/(n(n − 1))) Σ_{i≠j} d(v_i, v_j), where d(v_i, v_j) is the shortest-path distance from vertex v_i to vertex v_j. Clustering Coefficient (C). The clustering coefficient is a measure of the local connectivity of a graph. For a given node i in a graph, the probability that all its neighbors are also neighbors of each other is called the clustering coefficient.
The more densely interconnected the neighborhood of a node, the higher its value of C. A large value of C is linked with the resilience of the network against random network damage 32. The small-worldness of networks is also assessed by C 33. For a node i with degree k_i, the clustering coefficient C_i is defined as C_i = 2d_i/(k_i(k_i − 1)), where d_i is the number of edges between the k_i neighbors of node i. Graph Spectral Measures. The spectral measures focus on the eigenvalues and eigenvectors of the associated graph adjacency and Laplacian matrices. We use topological entropy and Ollivier-Ricci curvature.

1. Topological Entropy (H). The entropy of a graph G with adjacency matrix A_G is the logarithm of the spectral radius of A_G, i.e., H(G) = log ρ(A_G) = log max_i |λ_i(A_G)|, the logarithm of the maximum of the absolute values of the eigenvalues of A_G 34.
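This definition is a one-liner in practice. The sketch below computes H as the log of the spectral radius of the adjacency matrix; the 64-node complete graph used as a check is our own choice, picked because its spectral radius is n − 1 = 63, giving H = log 63 ≈ 4.14, close to the H ≈ 4.1 quoted for the complete graph in the design-space section.

```python
import numpy as np

def topological_entropy(A):
    """H(G) = log of the spectral radius of adjacency matrix A,
    i.e. the log of the largest absolute eigenvalue."""
    eig = np.linalg.eigvals(np.asarray(A, dtype=float))
    return float(np.log(np.max(np.abs(eig))))

# complete graph on 64 nodes: all-ones matrix minus the diagonal
n = 64
A = np.ones((n, n)) - np.eye(n)
H = topological_entropy(A)
```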
2. Ollivier-Ricci Curvature (ORC). It is the discrete analog of the Ricci curvature 35,36. From the many alternatives of Ricci curvature 37, we use the definition presented by Farooq et al. 12 (see Fig. 6 of ref. 12). Let (X, d) be a geodesic metric space (a geodesic is a curve representing the shortest path between two points on a surface or in a Riemannian manifold) with a family of probability measures {p_x : x ∈ X}. Then the ORC κ_ORC(x, y) along the geodesic connecting x and y is κ_ORC(x, y) = 1 − W_1(p_x, p_y)/d(x, y), where W_1 is the earth mover's distance (Wasserstein-1 metric) and d is the geodesic distance on the space. Curvature is directly proportional to the robustness of the network. The larger the curvature, the faster the return to the original state after a perturbation. Smaller curvature means a slower return, which is also called fragility 12.
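On a graph, a common choice (an assumption here; variants differ in how p_x is defined) is to take p_x uniform over the neighbors of x and d as the shortest-path metric. When both neighborhoods have equal size, W_1 between two uniform measures reduces to a minimum-cost matching, which the sketch below brute-forces for tiny graphs.

```python
from itertools import permutations
from collections import deque

def sp_dist(adj, s):
    """BFS shortest-path distances from node s."""
    dist = {s: 0}; q = deque([s])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1; q.append(u)
    return dist

def orc(adj, x, y):
    """kappa(x, y) = 1 - W1(p_x, p_y) / d(x, y) with p_x uniform on the
    neighbours of x.  W1 is computed as a brute-force minimum-cost
    matching, so this sketch only handles equal, small degrees."""
    nx_, ny_ = sorted(adj[x]), sorted(adj[y])
    assert len(nx_) == len(ny_), "sketch handles equal degrees only"
    d = {v: sp_dist(adj, v) for v in nx_}
    w1 = min(sum(d[u][p[i]] for i, u in enumerate(nx_))
             for p in permutations(ny_)) / len(nx_)
    return 1.0 - w1 / sp_dist(adj, x)[y]

# complete graph K4: every edge has curvature (n - 2)/(n - 1) = 2/3
K4 = {v: {u for u in range(4) if u != v} for v in range(4)}
kappa = orc(K4, 0, 1)
```

Well-connected graphs like K4 give positive curvature (here 2/3), while a 4-cycle gives zero, matching the intuition that denser local connectivity means larger curvature and faster return from perturbations.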
Robustness (R). It is the rate at which a dynamic system returns to its original state after a perturbation. The fluctuation theorem 38 states that, given random perturbations to the network, the change in robustness ΔR is positively correlated with the change in system entropy ΔH,

ΔR × ΔH ≥ 0. (5)

Entropy ΔH and curvature Δκ_ORC are also positively correlated (see Equation (7) of Tannenbaum et al. 9), that is,

ΔH × Δκ_ORC ≥ 0. (6)

From Equations (5) and (6), we see that graph curvature and robustness are also positively correlated,

ΔR × Δκ_ORC ≥ 0. (7)

Equations (5) and (7) are the primary motivation in this work to study the curvature and entropy of deep neural networks.

From graphs to DANNs
Let G = (V, ε) be a graph with node set V = {v_1, v_2, ..., v_n}, where node v has feature vector x_v, and edge set ε ⊆ {(v, u) | v, u ∈ V}. To transform the graphs into DANNs, we adopt the concept of neural networks as relational graphs 15. In a relational graph, a single node represents one input channel and one output channel. An edge in the relational graph represents a message exchange between the two nodes it connects. The message exchange is defined by a message function, which takes a node feature x_v as input, and an aggregation function, which takes a set of messages as input and gives an updated node feature as output. One iteration of this process is one round of message exchange. At each round, each node sends messages to its neighbors, receives messages from all of its neighbors, and aggregates them. At each edge, message transformation occurs through a message function f(.), followed by aggregation at each node through an aggregation function F(.). The i-th message-exchange round at node v can be expressed as

x_v^{(i+1)} = F^{(i)}({f^{(i)}(x_u^{(i)}), ∀ u ∈ N(v)}), (8)

where N(v) is the neighborhood of node v. You et al. have shown that Equation (8) is a general definition of message exchange that can be used to instantiate any neural architecture 15. We generate an MLP, CNN, ResNet-18, and ResNet-29 for each of the 54 random graphs generated from the WS-flex generator.
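One round of Equation (8) can be sketched numerically as below. The concrete choices of f (a shared linear map W), F (a mean), the inclusion of a self-message, and the tiny path graph are all our own illustrative assumptions; in the relational-graph construction of You et al., these choices are what instantiate a particular architecture.

```python
import numpy as np

def message_round(adj, X, W):
    """One message-exchange round: each edge applies a message function
    f (here a shared linear map W), and each node aggregates incoming
    messages with F (here a mean over the neighbourhood plus itself)."""
    X_new = np.zeros_like(X)
    for v in range(X.shape[0]):
        nbrs = sorted(adj[v] | {v})          # neighbourhood incl. self
        msgs = [X[u] @ W for u in nbrs]      # f(x_u) on every edge
        X_new[v] = np.mean(msgs, axis=0)     # aggregation F(.)
    return X_new

rng = np.random.default_rng(0)
adj = {0: {1}, 1: {0, 2}, 2: {1}}            # 3-node path graph
X = rng.standard_normal((3, 4))              # node features x_v
W = rng.standard_normal((4, 4))              # shared message map
X1 = message_round(adj, X, W)
```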
The same 54 WS-flex random graphs were transformed into a total of 216 DANNs, with 54 neural networks in each of the four categories (MLP, CNN, ResNet-18, and ResNet-29). The MLPs were trained on the CIFAR-10 dataset, whereas the CNNs were trained on the CIFAR-10 and CIFAR-100 datasets. The same ResNet-18 models were trained on the ImageNet and Tiny ImageNet datasets. The baseline architectures have a complete graph structure for each architecture category. To ensure the consistency of our results, we trained each MLP and CNN five times and each ResNet once on the respective datasets. The results reported in this paper are average values calculated over thirty different inference runs on random test inputs for each MLP and CNN, and over five random test inference runs for each ResNet. The compute resources and wall-clock times are given in Supplementary appendix E. The list of frameworks and hyperparameters used in our experiments is provided in Supplementary appendix D.

Datasets
We used four different image classification datasets for our experiments, which allowed us to train DANNs of different sizes on tasks that varied in their complexity. We used the 10-class CIFAR-10 22 dataset to train MLPs and CNNs. The CIFAR-100 22 dataset, having 100 classes, was used to train CNNs and ResNet-29. Both datasets have 50,000 training images and 10,000 validation images. To further scale our experiments, we trained ResNet-18 on the Tiny ImageNet 23 dataset, having 200 classes. Each class in Tiny ImageNet has 500 training images and 50 validation images. We also trained ResNet-18 on the ImageNet 24 dataset, having 1,000 classes, 1.2 M training images, and 50,000 validation images.

Robustness analysis
We assessed the robustness of DANNs against natural additive noise and malicious noise (adversarial attacks). First, we evaluated the models using clean test images from the respective datasets. Then we fed the DANNs test images corrupted with additive noise and adversarial attacks. It is important to note that we chose the severity levels of the adversarial attacks and additive noise so that the predictive performance of the DANNs remains above 3%. We observed that at higher levels of noise the performance would naturally drop to 0%, which was not helpful in our analysis. Moreover, different severity levels suit different datasets owing to the inherent features and attributes of the data. Performance evaluation under adversarial attacks. We evaluated DANNs using adversarial examples generated from three different types of attacks: (1) Fast Gradient Sign Method (FGSM) 25, (2) Projected Gradient Descent (PGD) 26, and (3) Carlini-Wagner (CW) 27.
Consider a valid input x_0 and a target class y_0. It is possible to find x through an imperceptible, non-random perturbation of x_0 that changes a DANN's prediction to some other y; such x is called an adversarial example. Given a loss function J(x; w), where x is the input to the model with parameters w, the adversarial example is created by each attack as

FGSM: x = x_0 + ε · sign(∇_x J(x_0; w)), (9)

PGD: x^{t+1} = Π_{x+B}(x^t + α · sign(∇_x J(x^t; w))), (10)

CW: min_x ||x − x_0||_2 + c · max(Z(x)_i − Z(x)_j, 0), (11)

where Z(x) denotes the model's logits. In Equation (9), ε is the severity level of the attack and should be small enough to make the perturbation undetectable. In Equation (10), x^t is the adversarial example after t steps, α is the step size, and Π_{x+B} is the projection operator onto the set of allowed perturbations B around x, chosen to capture the perceptual similarity between images. In Equation (11), c > 0 is the attack magnitude, i is the input class, and j is the target class. FGSM and PGD use the l_∞ distance metric, whereas CW, a regularization-based attack, uses the l_2 distance metric in our analysis.
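Equation (9) can be made concrete on a model small enough to differentiate by hand. The sketch below applies FGSM to a logistic model (our own toy stand-in for a DANN, with hand-picked weights); for this model the gradient of the cross-entropy loss with respect to the input is (p − y)·w, so no autodiff framework is needed.

```python
import numpy as np

def fgsm(x0, w, y, eps):
    """FGSM on a logistic model p = sigmoid(w . x) with cross-entropy
    loss J: grad_x J = (p - y) * w, and the attack moves x0 by eps in
    the sign of that gradient (an l_inf-bounded step)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x0)))
    grad = (p - y) * w                       # dJ/dx for this model
    return x0 + eps * np.sign(grad)

w = np.array([1.0, -2.0, 0.5])               # toy model parameters
x0 = np.array([0.2, 0.1, -0.3])              # valid input
x_adv = fgsm(x0, w, y=1.0, eps=0.05)
```

PGD simply iterates this signed-gradient step with step size α and projects back into the allowed perturbation set B after each step, as in Equation (10).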

Statistical analysis
We conducted various statistical tests to ascertain the significance of our analysis. We computed the Pearson product-moment correlation coefficient to assess the relationship between adversarial accuracy and the robustness-related structural properties of the graphs. We also computed the Pearson product-moment correlation coefficient between the different structural graph-theoretic measures, as shown in Supplementary Fig. B7. We used Student's t-test to establish that the averages of the correlations between entropy and robustness for two types of datasets, as well as for two model types, are statistically different. This analysis established how entropy is related to the increase in model size and task complexity. The significance level in all these analyses is set to 0.05 (95% confidence).
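The two statistics can be sketched from their definitions. The toy entropy/accuracy arrays below are illustrative placeholders, not the paper's data, and the t statistic uses the pooled-variance Student form (whether the paper's t-test pooled variances is not stated, so that is an assumption).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def two_sample_t(a, b):
    """Student's t statistic (pooled variance) for the difference of
    two sample means, as used to compare entropy-robustness correlation
    values across dataset or model pairs."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb)))

# toy usage: entropy vs. accuracy across hypothetical graphs
H_vals = np.array([2.1, 2.6, 3.0, 3.4, 3.9])
acc = np.array([0.51, 0.55, 0.60, 0.62, 0.66])
r = pearson_r(H_vals, acc)
t = two_sample_t([0.70, 0.75, 0.72], [0.50, 0.55, 0.52])
```

In practice `scipy.stats.pearsonr` and `scipy.stats.ttest_ind` provide the same quantities together with p values.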

Appendix D Frameworks and Hyperparameters
Frameworks and corresponding packages used in our experiments are given in Table D1. The hyperparameters used in the training and evaluation of DANNs are given in Table D2. For the sake of procedural consistency and comparison of results, the parameters other than those mentioned in Table D2 are kept the same as in the original relational-graph experiments by their respective authors 15.

Fig. 1
Fig. 1 Exploring robustness of deep artificial neural networks (DANNs) with graph-theoretic measures. (a) In network science (NetSci), real-world systems such as brain networks, financial networks, and social networks are studied using graph-theoretic measures to quantify their robustness and fragility. (b) We use graph-theoretic measures established in NetSci to study graphs of architectures of DANNs. Our approach consists of five steps: (1) build random graphs using classical families of graphs, including ER, BA, WS, and WS-flex, (2) calculate graph-theoretic measures of these random graphs and select a small subset from the entire design space for further analysis, (3) convert selected random graphs into architectures of DANNs, (4) train, validate, and test these DANNs under different natural noise and adversarial conditions, and (5) analyze and link the robustness of architectures (measured with graph-theoretic properties) to the performance of trained DANNs against natural noise and adversarial attacks. In summary, using graph-theoretic robustness measures, we can find robust architectures for DANNs without exhaustively training and evaluating many DANNs.

Fig. 2
Fig. 2 The graph design space for generating random graphs. (a), (b), (c): 2.313 M candidate graphs from the WS-flex generator, downsampled to 3854 and 54 graphs. (d) and (e): the 3854 and 54 graphs in a 3-D space spanned by clustering coefficient (C), path length (L), and entropy (H). Samples of the complete and sparse graphs are identified in (a) and (e).

Fig. 4
Fig. 4 presents 54 ResNet-18 DANNs trained on ImageNet and tested on clean images, on adversarial examples generated with the FGSM, PGD, and CW attacks, and on images with additive Gaussian, speckle, and salt&pepper noise. Each subplot shows the entropy (H) of the underlying graph structure against the test accuracy.
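Of the attacks listed, FGSM is the simplest to state: it perturbs the input by a step of size ε in the sign of the input gradient of the loss. A minimal sketch on a toy logistic-regression model (an assumption for illustration; the paper attacks deep networks, not linear models) shows the mechanics:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on a logistic-regression toy model.
    Loss L = -log sigmoid(y * (w.x + b)) with label y in {-1, +1};
    the attack perturbs x by eps in the sign of the input gradient dL/dx."""
    z = float(w @ x + b)
    grad_x = -y * sigmoid(-y * z) * w  # closed-form dL/dx for this model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.1
x = rng.normal(size=8)
y = 1.0
x_adv = fgsm(x, y, w, b, eps=0.1)

# The attack strictly shrinks the classification margin y * (w.x + b).
margin_clean = y * float(w @ x + b)
margin_adv = y * float(w @ x_adv + b)
print(f"margin: clean={margin_clean:.3f}, adversarial={margin_adv:.3f}")
```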

Fig. 4
Fig. 4 Test accuracy vs. entropy for ResNet-18 on ImageNet. Test accuracy is shown on the vertical axis and the entropy (H) of the underlying graph on the horizontal axis. Each circle represents an average value calculated over five runs. The type and severity level of noise is shown at the top of each sub-plot. Sub-plots also show trendlines and Pearson correlation coefficients (r) with p-values. We note a significant positive correlation between graph entropy and the performance of the DANNs in all cases.

Fig. 5
Fig. 5 Test accuracy vs. entropy for ResNet-18 on Tiny ImageNet. Test accuracy is shown on the vertical axis and the entropy (H) of the underlying graph on the horizontal axis. Each circle represents an average value calculated over five runs. We note a strong positive correlation between entropy and the accuracy of the DANNs for all noise types.

Fig. 6
Fig. 6 Accuracy vs. entropy for CNNs trained on (a) CIFAR-100 and (b) CIFAR-10 datasets and tested under various noisy conditions. The circles represent average test accuracy over 30 runs. The Pearson correlation r and corresponding p-values between entropy and accuracy are also presented for each noise condition. Red text indicates correlation values that are not significant. For the same 8-layer CNNs, the entropy-robustness correlation values increase with task complexity; that is, relatively higher correlation values are observed for CIFAR-100 than for CIFAR-10 under all noise conditions.

Fig. 7
Fig. 7 Effect of task and model complexity on the relationship between entropy (calculated in the graph domain) and robustness (evaluated in the deep learning domain). (a) The entropy-robustness correlation coefficient (vertical axis) is plotted against the number of classes (horizontal axis) for the CIFAR-10 and CIFAR-100 datasets for 8-layer CNNs. As the task becomes more complex, entropy becomes significantly more correlated with the robustness of DANNs. The inset box shows the Student's t-test statistical analysis. There is a significant increase in the entropy-robustness correlation values as the task complexity increases from 10 to 100 classes. (b) Entropy-robustness correlation plot for the Tiny ImageNet (200 classes) and ImageNet (1000 classes) datasets. The entropy-robustness correlation for the same DANNs (ResNet-18 in this case) increases significantly as the task becomes more complex. (c) The entropy-robustness correlation coefficient is plotted against the number of model parameters for different DANNs (ResNet-29 and CNNs) trained and tested on the CIFAR-100 dataset. As the number of parameters increases for the same classification task, there is a significant increase in the entropy-robustness correlation (p < 0.05).

Fig. B2
Fig. B2 Test accuracy vs. entropy for MLPs trained on the CIFAR-10 dataset. The correlations between entropy and test accuracy are insignificant for most of the evaluation categories.

Fig. B3
Fig. B3 Additional results for ResNet-18 on the ImageNet dataset. The experiments show a strong positive correlation between entropy and test accuracy for all evaluation categories.

Fig. B4
Fig. B4 Additional results for ResNet-18 trained and evaluated on Tiny ImageNet dataset.

Fig. B5
Fig. B5 Additional results for robustness evaluation of CNN on CIFAR-100 dataset.

Fig. B6
Fig. B6 Additional results for robustness evaluation of CNN on CIFAR-10 dataset.

Fig. B7
Fig. B7 Correlation of entropy and curvature with the other graph measures considered in our study. The Pearson correlation coefficient is shown in the inset box of each plot. The top row (in green) shows the correlation of topological entropy with curvature, clustering coefficient, average path length, average degree, global efficiency, betweenness centrality, eigenvector centrality, and local efficiency. The bottom row (in blue) depicts the correlation of curvature with the same graph properties. We considered various graph-theoretic measures in our experiments for quantifying the robustness of DANNs. Consistent with the findings of previous studies in NetSci, the graph structural properties of entropy and curvature are the better indicators of DANNs' robustness.
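The curvature measure used throughout is Ollivier-Ricci curvature. For an edge (x, y) it is κ(x, y) = 1 - W₁(m_x, m_y)/d(x, y), where m_v is (in the common unweighted, non-lazy convention assumed here; the paper's exact variant may differ) the uniform distribution on the neighbors of v and W₁ is the 1-Wasserstein distance under the shortest-path metric. A minimal sketch computing W₁ exactly as a linear program:

```python
import numpy as np
import networkx as nx
from scipy.optimize import linprog

def ollivier_ricci_edge(G, x, y):
    """Ollivier-Ricci curvature of edge (x, y):
    kappa = 1 - W1(m_x, m_y) / d(x, y), with m_v uniform on the
    neighbors of v and W1 computed exactly via linear programming."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    Nx, Ny = list(G[x]), list(G[y])
    mx = np.full(len(Nx), 1.0 / len(Nx))
    my = np.full(len(Ny), 1.0 / len(Ny))
    # Cost matrix: shortest-path distances between the two supports.
    C = np.array([[dist[u][v] for v in Ny] for u in Nx], dtype=float)
    n, m = C.shape
    # Transport plan P >= 0 with row sums mx and column sums my.
    A_eq, b_eq = [], []
    for i in range(n):
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(mx[i])
    for j in range(m):
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col); b_eq.append(my[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return 1.0 - res.fun / dist[x][y]

# Known reference behavior: edges of a long cycle are flat (kappa = 0),
# while edges of a complete graph are positively curved.
k_cycle = ollivier_ricci_edge(nx.cycle_graph(6), 0, 1)
k_complete = ollivier_ricci_edge(nx.complete_graph(4), 0, 1)
print(f"C6 edge: {k_cycle:.3f}, K4 edge: {k_complete:.3f}")
```

The explicit LP is chosen here for transparency; in practice a dedicated package would be faster for whole-graph curvature.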

Fig. F9
Fig. F9 Comparison of clean images from the ImageNet dataset with images having natural noise. Three noise types are used in our experiments: Gaussian, Speckle, and Salt&Pepper. The severity level of each image is shown as a variance (σ²) for the Gaussian and Speckle noise types, and as a salt-to-pepper ratio of 0.5 for the Salt&Pepper noise. As the severity level increases, the images become visibly distorted.
Appendix C Curvature vs. test accuracy
Pearson correlation coefficient between graph curvature and the test accuracy of DANNs. All values except those marked († ) are significant, r(52), p < 0.05. Bold font indicates the better accuracy of the same DANN on one dataset compared to the other. † denotes insignificant correlation values, r(52), p > 0.05. These results indicate that curvature can quantify the robustness of DANNs, especially for complex tasks and larger models.

Table D1
Frameworks and packages used in our codebase.

Table D2
Training and evaluation hyperparameters for our experiments on DANNs.