Universality of political corruption networks

Corruption crimes demand highly coordinated actions among criminal agents to succeed. Yet research on corruption networks is still in its infancy, and little is known about the properties of these networks. Here we present a comprehensive investigation of corruption networks related to political scandals in Spain and Brazil over nearly three decades. We show that corruption networks of both countries share universal structural and dynamical properties, including similar degree distributions, clustering and assortativity coefficients, modular structure, and a growth process that is marked by the coalescence of network components due to a few recidivist criminals. We propose a simple model that not only reproduces these empirical properties but also reveals that corruption networks operate near a critical recidivism rate below which the network is entirely fragmented and above which it is overly connected. Our research thus indicates that actions focused on decreasing corruption recidivism may substantially mitigate this type of organized crime.


Results
We start by presenting the two datasets of corruption scandals used in our study. The Brazilian data are the same as those reported in Ref. 19 and comprise 65 well-documented political corruption scandals that occurred in Brazil between 1987 and 2014. This information was manually compiled from the web pages of widely circulated magazines and newspapers and includes the names of the 404 people involved in the 65 scandals. The Spanish data are original to our work and were extracted in May 2020 from a non-profit website 24 that aims to list all known corruption scandals in Spain. The information on this website is also compiled from publicly accessible web pages of popular Spanish news magazines and daily newspapers. The Spanish data comprise 437 corruption scandals that occurred between 1989 and 2018 and involved 2753 people.
Having described our datasets, we first examine the distribution of the size (number of people involved) of corruption scandals. As reported in Ref. 19 for Brazil, we find that the size distribution of scandals is roughly approximated by an exponential distribution with a characteristic size of about seven people in both countries (Fig. 1A,B). Despite the deviations between the exponential model and the empirical distributions for large scandals, this result shows that political corruption runs in small groups that rarely exceed ten people (only 20% and 17% of corruption cases in Spain and Brazil, respectively, involve more than ten people). Thus, corrupt agents seem to rely on a small number of cronies to run their criminal activities, probably because large-scale schemes are harder to manage and to keep undetected for long periods 25 . Moreover, the surprising similarity between the size distributions of scandals in both countries already hints at a possible universal pattern in political corruption processes.
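A minimal sketch of how a characteristic scandal size can be estimated, assuming sizes are treated as a continuous exponential variable with density proportional to exp(-s/s_c); under that assumption the maximum-likelihood estimate of s_c is simply the sample mean. The function names and the size list below are hypothetical illustrations, not the datasets or the exact estimator used in the study (a discrete or shifted exponential would give a slightly different estimate).

```python
import math
from typing import Sequence


def fit_characteristic_size(sizes: Sequence[int]) -> float:
    """Maximum-likelihood s_c for P(s) ~ exp(-s / s_c), treating s as continuous.

    For a continuous exponential density on s >= 0, the MLE of the scale is the sample mean.
    """
    if not sizes:
        raise ValueError("need at least one scandal size")
    return sum(sizes) / len(sizes)


def fraction_larger_than(sizes: Sequence[int], threshold: int = 10) -> float:
    """Empirical fraction of scandals involving more than `threshold` people."""
    return sum(1 for s in sizes if s > threshold) / len(sizes)


def model_tail(threshold: float, s_c: float) -> float:
    """P(S > threshold) = exp(-threshold / s_c) under the continuous exponential model."""
    return math.exp(-threshold / s_c)


# Hypothetical scandal sizes, for illustration only (not the Spanish or Brazilian data).
sizes = [3, 5, 7, 2, 12, 9, 4, 6, 15, 8]
s_c = fit_characteristic_size(sizes)
print(f"s_c = {s_c:.1f}")  # prints: s_c = 7.1
print(f"empirical P(S > 10) = {fraction_larger_than(sizes):.2f}")  # prints: empirical P(S > 10) = 0.20
print(f"model P(S > 10) = {model_tail(10, s_c):.2f}")  # prints: model P(S > 10) = 0.24
```

Comparing the empirical and model tails in this way is one simple check of how far the exponential approximation deviates for large scandals.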
To investigate the emerging patterns of people involved in corruption cases, we have created a static network representation of these scandals in which people are nodes and edges connect individuals involved in the same corruption case. Figure 2A,B show the resulting networks for Spain and Brazil, which display a modular structure and usually merge more than one scandal into a single network module, as estimated by the infomap clustering algorithm 26,27 . Indeed, the ratio between modules and scandals is 0.76 for the Spanish network and 0.62 for the Brazilian one. Moreover, we find that the degree distributions of both networks are well approximated by exponential distributions with characteristic degrees of 20.0 for Spain and 17.6 for Brazil (see insets of Fig. 3A,B).

Beyond this static representation, our data allow us to investigate dynamical patterns associated with the growth of these corruption networks over time. To do so, we create time-dependent networks of the people involved in corruption scandals up to a given year. By increasing this threshold year, we observe a process of network growth in which new nodes, and new edges among new and old nodes, emerge year after year as new scandals are discovered. Using this time-varying representation, we first ask whether the approximately exponential degree distribution holds for all years. We have fitted the exponential model to each stage of our networks via the maximum-likelihood method, and the results indicate that the degree distributions agree well with the exponential distribution in all years for both the Spanish and Brazilian networks. Figure 3A,B show the complementary cumulative degree distributions with degrees divided by the characteristic degree (insets depict the degree distributions for the latest network stage); the linear behavior on the log-linear scale and the good collapse of the distributions support the exponential hypothesis. Moreover, Fig. 3C,D depict the evolution of the characteristic degree of our corruption networks. We observe significant variations in earlier network stages followed by an approximately steady characteristic degree in later stages that is surprisingly similar for the two countries.
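The network construction and the rescaled degree distribution described above can be sketched as follows. This is a minimal illustration under stated assumptions: each scandal is turned into a complete graph among its members, the cumulative network for a threshold year keeps every scandal discovered up to that year, and the characteristic degree k_c is estimated as the mean degree (the maximum-likelihood scale of a continuous exponential, an approximation for integer degrees). The helper names and the toy scandal list are hypothetical, and module detection with infomap is not reproduced here.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple


def build_network(scandals: Iterable[Sequence[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """People are nodes; everyone in the same scandal is linked (each scandal is a clique)."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for people in scandals:
        members = set(people)
        for person in members:
            adj.setdefault(person, set())
        for a, b in combinations(members, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_up_to(dated_scandals: Iterable[Tuple[int, Sequence[Hashable]]],
                  year: int) -> Dict[Hashable, Set[Hashable]]:
    """Cumulative network of all scandals discovered up to (and including) `year`."""
    return build_network(people for y, people in dated_scandals if y <= year)


def rescaled_ccdf(adj: Dict[Hashable, Set[Hashable]]) -> Tuple[float, List[Tuple[float, float]]]:
    """Return k_c (mean degree) and points (k / k_c, P(K >= k)) of the degree CCDF."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    if not degrees or sum(degrees) == 0:
        raise ValueError("network has no edges")
    k_c = sum(degrees) / len(degrees)
    n = len(degrees)
    points = [(k / k_c, sum(d >= k for d in degrees) / n) for k in sorted(set(degrees))]
    return k_c, points


# Hypothetical toy data: (year of discovery, people involved).
dated = [(1990, ("A", "B", "C")), (1995, ("C", "D")), (2000, ("E", "F"))]
k_c, points = rescaled_ccdf(network_up_to(dated, 2000))
```

Plotting the points of several threshold years on a log-linear scale should give roughly overlapping straight lines if the degree distributions are exponential, which is the collapse test described in the text.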
We have also investigated how the sizes of the main components of our corruption networks change over time. Figure 4A,B show the evolution of the sizes of the giant and second-largest components, where we find abrupt changes between particular years. For the Spanish network, the giant component steeply increases between 2011 and 2012, while the second-largest component abruptly shrinks during the same interval. The Brazilian network exhibits similar patterns between 2004 and 2005, as also reported in Ref. 19 . This behavior is qualitatively similar to what happens in percolation transitions 28 and indicates a coalescence-like process of network components. Indeed, by visualizing snapshots of our corruption networks (Fig. 4C,D), we find that the emergence of new political scandals involving a few recidivist agents causes the abrupt changes observed in the largest components.

As we have already shown, the latest stage of these networks displays a modular structure in which two or more scandals are usually merged into a single network module (Fig. 2). We now ask whether this behavior is particular to later network stages or a more general property of corruption networks at different stages. To answer this question, we examine the modular structure of these networks year after year using the infomap algorithm 26,27 and determine the association between the number of network modules and the total number of corruption scandals. While there is no fail-safe method for detecting community or modular structure in networks 29,30 , we use infomap because of its computational efficiency and good performance in benchmark tests with planted partition models 29,30 ; however, we find similar results with modularity maximization or stochastic block models.
Figure 5A,B show that the number of network modules grows linearly with the total number of political scandals, at similar rates for both countries (0.744 modules per scandal for Spain and 0.626 modules per scandal for Brazil). Thus, despite the underlying complexity of corruption processes, corruption networks approximately preserve the ratio between the number of modules and scandals over their entire growth process. It is also worth remarking that this balance between modules and scandals is driven by the emergence of recidivist agents responsible for connecting different political scandals.
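The abrupt growth of the giant component and the simultaneous shrinking of the second-largest one can be tracked year by year with a union-find (disjoint-set) structure. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes scandals are given as (year, people) pairs and, because each scandal is a complete graph, it links every member to the first member of the scandal, which yields the same connected components as the full clique.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


class DisjointSet:
    """Union-find with union by size and path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def add(self, x: Hashable) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: Hashable) -> Hashable:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def top_two_components(
    dated_scandals: Iterable[Tuple[int, Sequence[Hashable]]]
) -> List[Tuple[int, int, int]]:
    """(year, giant component size, second-largest size) after each year's scandals."""
    by_year = defaultdict(list)
    for year, people in dated_scandals:
        by_year[year].append(list(people))
    ds = DisjointSet()
    history = []
    for year in sorted(by_year):
        for people in by_year[year]:
            for person in people:
                ds.add(person)
            for person in people[1:]:
                ds.union(people[0], person)  # a clique and a star have the same components
        roots = {ds.find(x) for x in list(ds.parent)}
        sizes = sorted((ds.size[r] for r in roots), reverse=True) + [0, 0]
        history.append((year, sizes[0], sizes[1]))
    return history


# Hypothetical example: in 2012 one scandal involving "C" and "F" merges two components.
dated = [(2010, ("A", "B", "C")), (2010, ("D", "E")),
         (2011, ("F", "G", "H", "I")), (2012, ("C", "F", "X"))]
print(top_two_components(dated))
# prints: [(2010, 3, 2), (2011, 4, 3), (2012, 8, 2)]
```

In the toy example, a single scandal containing two already-known agents makes the giant component jump from 4 to 8 while the second-largest drops from 3 to 2, the same qualitative signature as the coalescence events discussed above.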
The dynamics of the largest network components and the linear association between modules and scandals expose the critical role recidivist agents play in the structure of these corruption networks. To further understand the emergence of these special agents, we have investigated how the number of recidivist agents increases as new scandals are discovered and added to our corruption networks. Figure 5C,D show the relation between the number of recidivist agents and the total number of people for each year of the corruption networks of both countries. We observe that these two quantities are linearly associated, which implies that agents become recidivists at an approximately constant rate over the years. By fitting a linear model to the association between the number of recidivist agents and the total number of people, we find a recidivism rate of 0.090 ± 0.001 recidivists per agent for Spain and 0.142 ± 0.003 recidivists per agent for Brazil. These rates indicate that we expect to find about nine recidivists per hundred corrupt agents in the Spanish network, compared with about fourteen per hundred in the Brazilian network. Moreover, the higher recidivism rate observed for Brazil partially explains why the Brazilian network is denser and has a shorter average shortest path length than its Spanish counterpart.

Motivated by our empirical findings and the commonalities between the Spanish and Brazilian networks, we propose a simple model describing these corruption networks. The model starts with an empty network that grows by adding, at each iteration, a complete graph representing a political scandal. The size $s$ (number of people) of each complete graph is randomly drawn from an exponential distribution $P(s)$ to mimic the empirical behavior (Fig. 1), that is, $P(s) \sim e^{-s/s_c}$, where $s_c$ is the characteristic size of corruption scandals (empirically, $s_c \approx 7$ people).
We consider that some of the agents added to the network at each iteration are recidivists. Following the empirical behavior (Fig. 5C,D), we assume that the number of recidivists $r$ increases linearly with the total number of agents $n$ as $r = \alpha n - \beta$, where $\alpha$ is the recidivism rate and $\beta > 0$ sets the minimal number of people necessary for the first recidivist agents to emerge. We keep track of the number of recidivists during the network growth process; when new recidivists emerge, we randomly select nodes already present in the network to become recidivists and make them belong to the next scandal (complete graph) added to the network. Moreover, when selecting nodes to represent recidivist agents, we select nodes that were already recidivists with a small probability $p$, or nodes that become recidivists for the first time with probability $1 - p$. This last procedure allows us to control the number of agents involved in more than two corruption scandals and thus to reproduce the empirical observation that about 2.5% of all agents in both the Spanish and Brazilian networks are involved in more than two scandals.
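A short worked example makes the role of $\beta$ concrete. Reading $r = \alpha n - \beta$ as the number of recidivists expected among $n$ agents, the first recidivist can only appear once $r$ reaches one (this threshold reading is our assumption, not a statement of the model's exact bookkeeping):

$$
r = \alpha n - \beta \geq 1 \quad\Longleftrightarrow\quad n \geq n^{*} = \frac{1 + \beta}{\alpha}.
$$

With $\beta = 12$, the value used in the simulations, this gives $n^{*} = 13/0.065 = 200$ agents at $\alpha = 0.065$, $n^{*} = 13/0.09 \approx 144$ at $\alpha = 0.09$, and $n^{*} = 13/0.14 \approx 93$ at $\alpha = 0.14$. Beyond $n^{*}$, each additional $1/\alpha$ agents adds one recidivist on average, that is, about 11 agents per recidivist for $\alpha = 0.09$ and about 7 for $\alpha = 0.14$.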
We have generated networks using this model for different parameters and observed that the recidivism rate $\alpha$ is the most relevant parameter for the network structure. Because of this, we have fixed $s_c = 7$, $\beta = 12$, … (Fig. 6), we find a distinct peak that defines a critical recidivism rate $\alpha_c = 0.065$ capable of generating networks visually similar to the empirical corruption networks (see the network example in Fig. 6). Interestingly, this critical recidivism rate is close to the empirical rates estimated for the Spanish ($\alpha = 0.09$) and Brazilian ($\alpha = 0.14$) networks. We have also investigated the model behavior near this critical point, and the results suggest that this transition from highly fragmented to chain-like networks is continuous (see Fig. S5), similar to classical percolation transitions. Thus, corruption processes seem to operate close to a critical recidivism rate below which the network becomes entirely fragmented and above which it is overly connected.

To compare our model with the empirical results, we have simulated an ensemble of one hundred networks using the recidivism rate of each country while fixing all other parameters ($s_c = 7$, $\beta = 12$, and $p = 0.025$). In these simulations, the number of complete graphs added to the networks is set equal to the total number of …

In addition to static properties, we have also verified that our model replicates the growth process of corruption networks. In particular, we find that the degree distributions of simulated networks are well described by exponential distributions with characteristic degrees approaching a constant value in later network stages (Fig. S2). The evolution of simulated networks is also marked by the coalescence of components, which in turn reproduces the abrupt changes observed in the sizes of the largest and second-largest components (Fig. S3).
Simulated networks also display modular structures that often merge two or more scandals into single network modules. Moreover, the association between the number of network modules and scandals is linear over the entire growth of simulated networks (Fig. S4), although the average ratios between the number of modules and scandals estimated from simulations with the recidivism rates of Spain and Brazil (0.813 ± 0.001 and 0.783 ± 0.001, respectively) are slightly larger than the empirical ones (0.744 ± 0.004 and 0.63 ± 0.02, respectively).
While the agreement between the properties of empirical and simulated networks is far from perfect, it remains surprising that such a simple model qualitatively replicates all features of our corruption networks, including their dynamical properties. Part of the discrepancies between data and model (such as the smaller characteristic degree and the larger average shortest paths obtained in the simulations) can be attributed to the deviations between the exponential distribution and the empirical size distribution of scandals (Fig. 1). However, these deviations also indicate that other processes are likely to affect the structure of corruption networks. One promising direction for future investigations stems from the fact that our model does not distinguish among corrupt agents. This distinction is crucial in the present context of rising political polarization [31][32][33][34] , where one may expect partisan and ideological divisions to be reflected also in political corruption and thus in the structure of corruption networks. Notwithstanding the likely importance of this and other mechanisms related to corruption processes, our findings indicate that the recidivism of a small fraction of corrupt agents is crucial for the structure and dynamics of corruption networks.
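The growth model can be sketched in a few dozen lines. This is a minimal, hypothetical implementation written from the verbal description above, not the authors' code, and several details are our assumptions: scandal sizes are exponential draws with mean $s_c$ rounded to an integer of at least one; the target number of recidivists is the integer part of $\alpha n - \beta$, evaluated before each new scandal; a recidivist is any agent in two or more scandals; and the remaining seats of each scandal are filled with brand-new agents. The quantity whose peak defines $\alpha_c$ in Fig. 6 is not reproduced here.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def simulate_corruption_network(
    num_scandals: int,
    alpha: float,
    beta: float = 12.0,
    s_c: float = 7.0,
    p: float = 0.025,
    seed: Optional[int] = None,
) -> Tuple[Dict[int, int], List[List[int]]]:
    """Grow a network of complete graphs (scandals) with recidivism.

    Returns (number of scandals per agent, list of scandal member lists).
    """
    rng = random.Random(seed)
    scandals_per_agent: Dict[int, int] = {}
    scandals: List[List[int]] = []
    next_id = 0
    for _ in range(num_scandals):
        # Exponential size with mean s_c, rounded to an integer >= 1 (assumed discretization).
        size = max(1, round(rng.expovariate(1.0 / s_c)))
        n = len(scandals_per_agent)
        repeat = [v for v, c in scandals_per_agent.items() if c >= 2]
        single = [v for v, c in scandals_per_agent.items() if c == 1]
        # Target number of recidivists r = alpha * n - beta, floored and never negative.
        needed = max(0, math.floor(alpha * n - beta) - len(repeat))
        chosen: Set[int] = set()
        for _ in range(needed):
            old_pool = [v for v in repeat if v not in chosen]
            new_pool = [v for v in single if v not in chosen]
            if old_pool and (not new_pool or rng.random() < p):
                chosen.add(rng.choice(old_pool))  # already a recidivist (probability p)
            elif new_pool:
                chosen.add(rng.choice(new_pool))  # first-time recidivist (probability 1 - p)
        members = list(chosen)
        # Fill the remaining seats of the scandal with brand-new agents.
        while len(members) < size:
            members.append(next_id)
            next_id += 1
        for v in members:
            scandals_per_agent[v] = scandals_per_agent.get(v, 0) + 1
        scandals.append(members)
    return scandals_per_agent, scandals


def edges_of(scandals: List[List[int]]) -> Set[Tuple[int, int]]:
    """Undirected edges: every pair of co-members of a scandal (each scandal is a clique)."""
    return {(min(a, b), max(a, b)) for members in scandals for a, b in combinations(members, 2)}


# Example run with the Spanish number of scandals and recidivism rate.
counts, scandals = simulate_corruption_network(437, alpha=0.09, seed=1)
n_agents = len(counts)
print(n_agents, sum(c >= 2 for c in counts.values()) / n_agents, len(edges_of(scandals)))
```

Sweeping `alpha` over a grid while keeping the other parameters fixed, and computing component sizes of the resulting graphs, is one way to explore the fragmented and overly connected regimes described above; the printed numbers depend on the seed and are not meant to match the empirical networks.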

Discussion
We have presented an extensive characterization of static and dynamical properties of corruption networks related to political scandals in Spain and Brazil. Despite important differences between the political systems of the two countries, our results show that the Spanish and Brazilian corruption networks share surprisingly similar structural and dynamical properties. This universality suggests that corruption processes have common features that are independent of social and cultural differences among countries, as well as of individual psychological attributes of corrupt agents. To strengthen this hypothesis, we have proposed a simple model whose main ingredient is the recidivism rate. Simulations of our model not only qualitatively replicate all properties of the empirical networks but also indicate that corruption processes appear to operate near a critical recidivism rate. Corruption networks simulated below this critical recidivism rate are completely fragmented, while networks generated above this critical value become overly connected.
Taken together, empirical results and simulations indicate that a few recidivist agents typically play a prominent role in corruption activities. These agents act as bridges among small corrupt groups and possibly engage and coordinate them in larger and often far more harmful corruption processes. Considering the many adverse impacts of corruption on democracy 35 , the economy 36,37 , and trust in the rule of law 38 , our findings indicate that public policies and law enforcement operations focused on decreasing corruption recidivism, such as harsher sentences, swifter legal processes, and strict enforcement of sentences, are likely to significantly curb this type of organized crime by reducing the overall connectivity of corruption networks.
However, since our results are based on corruption scandals in two Western countries, and despite the difficulty of finding information about corruption processes, future work should, where possible, examine other countries to further support or delimit the universality we report. Moreover, the lack of quantitative agreement between our model and some empirical properties of corruption networks suggests that factors other than recidivism may affect the structure of political corruption networks. These factors may include political polarization, demography, agent adaptation, and memory effects. There is thus certainly room for the development of other, likely more complex, network models of organized crime.

Another limitation of our work concerns the quality of the information used to create the corruption networks. Despite our best efforts to make these data reliable, as with all data related to illegal activities, ours may suffer from two types of bias. First, being named in a corruption scandal does not guarantee that a person has been convicted of a crime or has done anything illegal. Second, some people involved in corruption scandals have likely not been identified during investigations. The compilation of data on corruption will always suffer, at least to some degree, from such limitations. Still, we have found that our empirical findings are very robust against randomly removing a fraction of scandals from our data set (see Figs. S6-S9), indicating that the general patterns of corruption processes uncovered by our work are unlikely to be strongly affected by such biases. Thus, despite these limitations, we believe that our work contributes significantly to a better understanding of organized crime as a complex networked system and to identifying the essential features of corruption networks that may lead to better criminal policies and more efficient law enforcement interventions.
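A robustness check of the kind mentioned above can be sketched by re-estimating the recidivism rate on random subsets of scandals. The code below is a hypothetical illustration, not the procedure behind Figs. S6-S9: it assumes a recidivist is anyone named in two or more scandals, builds the yearly series of (total people, recidivists) for the cumulative network, and takes the ordinary least-squares slope of that series as the recidivism rate (no standard errors are computed).

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple

Scandal = Tuple[int, Tuple[Hashable, ...]]  # (year of discovery, people involved)


def recidivism_series(scandals: Sequence[Scandal]) -> List[Tuple[int, int]]:
    """(total people, recidivists) in the cumulative network at the end of each year."""
    by_year = defaultdict(list)
    for year, people in scandals:
        by_year[year].append(set(people))
    counts: Counter = Counter()
    series = []
    for year in sorted(by_year):
        for members in by_year[year]:
            counts.update(members)
        recidivists = sum(1 for c in counts.values() if c >= 2)
        series.append((len(counts), recidivists))
    return series


def ols_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of y on x, fitted with an intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct network sizes")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def recidivism_rate(scandals: Sequence[Scandal]) -> float:
    """Slope of recidivists versus total people, i.e. an estimate of alpha."""
    return ols_slope(recidivism_series(scandals))


def subsampled_rates(scandals: Sequence[Scandal], remove_fraction: float,
                     replicates: int = 50, seed: int = 0) -> List[float]:
    """Recidivism rates re-estimated after randomly removing a fraction of scandals."""
    rng = random.Random(seed)
    keep = round(len(scandals) * (1.0 - remove_fraction))
    return [recidivism_rate(rng.sample(list(scandals), keep)) for _ in range(replicates)]


# Hypothetical toy data, for illustration only.
toy = [(2000, ("A", "B")), (2001, ("B", "C")), (2002, ("C", "D", "A")), (2003, ("E", "F", "A"))]
print(recidivism_series(toy))  # prints: [(2, 0), (3, 1), (4, 3), (6, 3)]
```

On real data one would call `subsampled_rates` with, say, 10% or 20% of scandals removed and check whether the spread of the re-estimated rates stays small relative to the full-data estimate.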

Data availability
The datasets used in the current study are freely available: the Brazilian corruption network as a supplementary file in Ref. 19 , and the Spanish corruption network can be downloaded from the web page casos-aislados.com.