Maximal modularity and the optimal size of parliaments

An important question in representative democracies is how to determine the optimal parliament size of a given country. According to an old conjecture, known as the cubic root law, there is a fairly universal power-law relation, with an exponent equal to 1/3, between the size of an elected parliament and the country’s population. Empirical data in modern European countries support such universality but are consistent with a larger exponent. In this work, we analyse this intriguing regularity using tools from complex networks theory. We model the population of a democratic country as a random network, drawn from a growth model, where each node is assigned a constituency membership sampled from an available set of size D. We calculate analytically the modularity of the population and find that its functional relation with the number of constituencies is strongly non-monotonic, exhibiting a maximum that depends on the population size. The criterion of maximal modularity allows us to predict that the number of representatives should scale as a power-law in the size of the population, a finding that is qualitatively confirmed by the empirical analysis of real-world data.

In recent times, political scientists and technocrats have heavily relied on the so-called cubic root law (CRL) formulated by Taagepera and collaborators in [12][13][14][15] . This law follows the realisation that the size of most elected parliaments exhibits a strong statistical regularity with respect to their population. The proposed empirical model optimises the assemblies' representation based on the efficiency of communication between MPs and their constituents. According to Taagepera's arguments, the size of parliaments should follow S ∝ N γ 0 , where γ = 1/3 and N 0 an "effective" population size, rescaled by considering only the portion of active voters and, among them, the fraction of literate adults 12 . As literacy is believed to be strongly correlated with mobility, the latter rescaling was introduced to account for social mobility in the absence of a reliable direct measure for this parameter 16 . Figure 1 highlights the aforementioned regularity, but when considering the sizes of European lower chambers only, the best fitting curve deviates from the theoretical CRL, resulting in γ ≈ 0.44 . A more comprehensive analysis of parliaments' size data can be found in 11 . It is also worth mentioning that the CRL formulated in terms of an effective population was perceived to be in contrast with the spirit of any "good" representation model that should include the entire pool of constituents, regardless of age, political engagement, or education 6 . Moreover, Taagepera's derivation has been criticised in a recent paper 17 , whose formulation leads instead to a square-root law (SRL). The same SRL arises from a constitutional game-theoretical model put forward in 18 .
In this work, we tackle the democratic representation problem using network theory. Networks have been successfully employed in social science for over 30 years, for their versatility in describing different aspects of political, behavioural, and social interaction between individuals 19 . In social networks, nodes represent social agents (for example, individuals in a population) and links represent their interactions. The network structure contains important information about relational ties in a society and is likely influenced by agents' attributes (e.g. age, occupation, wealth ...) 20 . An important topological feature of real-world social networks is their scalefree degree distribution, i.e. the distribution of the number of ties following a power-law 21 . This topology can be reproduced in growing networks using a preferential attachment wiring protocol that was proposed in 22 .
The objective of this work is to shed light on the observed statistical regularities in the size of parliaments. We will be focusing on electoral systems in which representatives are elected according to an FPTP ("First-Past-The-Post") principle, i.e. whoever collects the majority of votes within a constituency gets elected. We expect that simple modifications of the model presented here could be devised to account for other scenarios. For example, the case where more than one representative is expressed by the same electoral pool, as in the case of the US Senate, could be accounted for by suitably increasing the number of representatives per constituency.
Taking the United Kingdom as an example, each Member of Parliament is elected to the House of Commons from one of the 650 constituencies. The nature and physical boundaries of the constituencies are regulated by the House of Commons (Redistribution of Seats) Act (1944), which prescribes that the MP's role is to "represent the common interest of the residents in a spatially bounded territory". Thus, when constituencies are designed, the legislator should aim to enclose within geographical boundaries areas that share common interests and values 23 .
The network model we propose is inspired by this design principle. We build a synthetic scale-free network in which N agents (nodes) represent the entire population of a country that has to be partitioned into constituencies, each electing their MP to the national Parliament. Two citizens are connected if there is a stable social interaction between them. Individuals are therefore arranged into social communities, as proposed, for instance, in 24 . The key result of our paper is a principle for determining the optimal number of constituencies, i.e. for grouping the population into electoral clusters that "best" represent the underlying community structure of the network. We remark that we need to strike a balance between representativity and homogeneity in constituency size. Hence, our approach needs to improve upon the standard "community detection" framework, which would allow the constituency size to fluctuate wildly. We achieve this result by constraining the size of the constituencies at the outset, and determining the number of constituencies that optimises the partitioning of the underlying network.
More specifically, we generate synthetic networks from a growth model with preferential attachment to nodes of higher degrees and within the same constituency. We introduce a mobility (or affinity) parameter into our model, of a similar nature to that proposed in 12 , which allows us to tune the probability that a node interacts with foreign constituencies. Note that in this approach, each node is assigned a constituency label a priori, and the network topology follows as a result of this assignment. This is in contrast to what usually happens in community detection, where node memberships are determined a posteriori, based on the network topology. In this regard, our approach is based on a generative model for block-structured networks rather than on a cluster detection model.
Working with synthetic networks relieves us from making assumptions over geographical constraints, as constituencies are purely virtual, i.e. designed around groups of people with stronger interpersonal ties. Although this modelling choice may need to be supplemented with more realistic assumptions, non-geographical electoral systems have been proposed in the past with strong supporting arguments in terms of representation of minorities and dispersed communities 25 . Furthermore, geographical constraints would strongly depend on the country at hand, whereas our model aims to be as general as possible.
We adopt the modularity as a metric to measure the goodness of these partitions, and we derive an exact expression for the average network modularity, in terms of the number D of constituencies for fixed network size N. By maximising the modularity, we are able to determine analytically the optimal number of equally sized constituencies into which networks generated according to our prescription should be partitioned. We show that our findings are robust against the tuning of resolution parameters, by also considering the generalised definition of modularity introduced in 27 . In our investigation, we find that the empirical regularities discussed above arise quite naturally from the topology of the clustered networks that we study here.
The manuscript is organised as follows: in "The model" section, we introduce the network growth model. In "Recursive equations" section, we derive and solve the recursive equation for the expected modularity at generic network size and number of constituencies. In "Approximate solution" section, we present a numerical solution for the maximum modularity as a function of the network size and we construct an approximate scheme to solve the problem analytically. In addition, we show that the size dependence of the maximum modularity is robust when using a generalised definition of modularity which allows to tune the resolution of communities. Finally, we present our main findings in the "Conclusion" and we compare them with empirical evidence. The technical details of our derivation are presented in the Appendix.
Our findings reveal that the optimal partitioning in constituencies for a given population is well approximated by a power-law S ≃ N γ , which is in qualitative agreement with the empirical data. Interestingly, we observe that the mobility does not play a significant role in determining the exponent γ , at least for the homogeneous mobility case studied in this work.

The model
We model social interactions within a population by means of simple, undirected networks, which are constructed using a block preferential attachment prescription, a modified version of the Barabási-Albert (BA) algorithm in which the target node's block membership influences the wiring probability 22,[28][29][30][31] .
The network is formed dynamically in such a way that at each time step N, a node N is created, with m stubs, and a constituency membership label σ N ∈ {1, . . . , D} is assigned, according to a prescribed sequential order such that σ N = mod(1 + N, D) (see Fig. 2 for an illustration). In this way, any two nodes i and i + D have the same membership and all the constituencies have roughly equal size (their sizes are either identical or differ by one unit). The sequential assignment is a modelling choice that greatly simplifies the analytical treatment presented in this section, however the final outcome does not heavily depend on the way the constituencies σ are assigned, provided that they are on average all equally sized. The network at time step N is represented by an N × N adjacency matrix A(N) , with entries A ij (N) for i, j ≤ N . We prescribe that the initial configuration of the network be a clique of m + 1 nodes. Accordingly, we set the initial time at m + 1 , so that the growth process starts at m + 2.
When a node N is added, each of its m stubs is wired to a random node i of the existing network sampled with probability (right) of the growth algorithm, respectively. Membership attributes 1, 2 and 3 represented by the colour blue, yellow and green respectively are allocated sequentially in such a way that , being the total number of links present in the network, k i (N) = N j=1 A i,j (N) the degree of node i, calculated at N > m , and p(σ i |σ N ) being the probability that any node with given constituency label σ N attaches to any of the nodes with constituency σ i . The denominator in (1) As the addition of new nodes cannot modify the links between pre-existing nodes, we have that A ij (N) is the same for any N ≥ max(i, j) , so from now on, we will drop the time index from the entries of the adjacency matrix.
The probability p(σ i |σ N ) can be parametrised by a mobility parameters µ that controls the likelihood to pick the target constituency, as Hence, for µ = 0 , the new node N will attach necessarily to a member of its own community whereas, for µ = 1 , the node N can attach to any community with the same probability 1/D. Thus, the probability that a new node attaches to a given node of a foreign constituency is µ/D . The contribution N−1 (1) ensures that new nodes attach preferentially to nodes with higher degree. Our attachment prescription realizes a power-law degree distribution with exponent = 3 , that is typical of social networks 22 and, although the attachment mechanism is extremely simple, it is rich enough for our purposes 32,33 . Using Eq. (1), we can write the probability for the entry A i,N , with i = 1, . . . , N − 1 , of the adjacency matrix, given its previous configuration A(N − 1) and the community membership sequence denoted by σ , as Assuming that each of the m stubs is wired independently to a randomly drawn node, the joint distribution for the N-th row and column is and, by iteration, one can get the full distribution for the configuration A(N) of the adjacency matrix with p(A(m + 1)) = m+1 i<j δ A i,j ,1 δ A i,j ,A j,i , determined by the initial configuration of the growth algorithm. For the sake of simplicity, we limit our analytical considerations to the case m = 1 . A different choice of the number of stubs m, or a different initial configuration of the growth model, do not significantly affect the degree distribution, in the large-N limit 34 . Our numerical explorations suggest that this also holds true for our key observable and results. Therefore, we leave the m > 1 case for future investigations.
The key observable we will monitor in our model is the modularity, introduced in 35 , as a quality factor for a partition of a network in communities. The modularity of a graph is defined as This quantity compares the intra-cluster edge density of a given network (in our case, the clusters are defined by the constituency membership attribute) with the edge density of a null model, i.e. a set of unbiased random graphs that are wired regardless of the community structure but with the same degree sequence as the original network 36 . This comparison mechanism provides a reliable metric to establish the goodness of a network clustering procedure. Moreover, the modularity takes values Q D (N) ∈ [−1, 1] , with positive values denoting that a graph exhibits a community structure being captured by their assigned memberships 37 .
We will use the modularity to assess the cluster structure induced by the sequence σ and the underlying social structure originated from the web of connections. We aim to find the number of constituencies that maximises this observable, resulting in the optimal partitioning of the synthetic population created by the growth algorithm. For a network of size N, we have that the expected modularity is given by where the expectation is over the distribution (5). At the (N + 1)-th step, one row and one column are added to the adjacency matrix as follows www.nature.com/scientificreports/ We note that the number of links is deterministic as at each time step m(= 1) links are added to the network. Therefore we argue that an expression for �Q D (N + 1)� can be found recursively, in particular by solving recursions for the coefficients a N and b N that we present in the following subsection.
Recursive equations. We now construct a recursion for the term a N . We note that the term a N+1 can be split into the contribution from the new row/column and the rest of the matrix as follows where the sum runs over all pre-existing nodes (up to N) belonging to the community σ N+1 . We recognise that the expectation value represents the probability that node N + 1 attaches to any node within its own community at step N. Under the assumption that any target constituency is chosen independently of the degrees of its members, and using Eq. (2), one can derive the expression provided in (9), as shown in more detail in the Appendix Using Eq. (9), the solution of the recursion (8) is found as A comparison between Eq. (10) and a numerical simulation is shown in Fig. 3. The simulation data in Fig. 3 was obtained by computing the observable a N = N r,s=1 A r,s δ σ r ,σ s on synthetic networks of different sizes, generated using the growth algorithm described in this section. For each size, results were averaged over 200 realisations of the network generative process.
We then consider the recursion for term b N . Following our definition in Eq. (7), Distinguishing the two cases   (11) as shown in the Appendix, while the expectation (ii) in Eq. (11) reads since constituency σ N+1 is populated by one node at time step N + 1 , which is however excluded from the sum.
Gathering all the terms, the recursive equation for b N is found to be with initial condition Solving the recursion in Eq. (18), for general N, is not an easy task. In the next Section, we provide an analytical expression for C N+1 (N) in the limit N ≫ D , which turns out to be a good approximation for the exact solution, even at small N. In Fig. 4 we plot the approximate solution for b N against numerical simulations of the growing process.

Approximate solution.
To make analytical progress, we assume that the edges are uniformly distributed between the D communities so that C N+1 (N) ≃ L(N)/D . This holds true in the limit N ≫ D , however, uniformity is not expected in the regime N ∼ D . As shown in Fig. 5, though, the exact expression for C N+1 (N) (red dots), defined in Eq. (14), is well approximated by L(N)/D (blue line) in the whole range. The exact calculation for C N+1 (N) in the regime N < D is carried out in the Appendix. Combining the results in the two regimes, the analytical bottleneck arising in Eq. (11) -(i) simplifies as follows (see Appendix, eq. (38)) Under this approximation, the recursion in Eq. (11), now simplifies to N r:σ r =σ N+1 k r (N + 1) = 0,    www.nature.com/scientificreports/ The numerical simulation in Fig. 7 shows a perfect agreement with the average modularity given by Eq. (23). We observe that the expected modularity is strongly non-monotonic in the number of constituencies. This is in agreement with the following observations about the limiting cases: for D = 1 , the two terms of the sum in Eq. (6) cancel out, and for D = N the concept of community is lost and the Kronecker delta is always zero, thus both cases result in Q D (N) = 0 . Note that in the intermediate regime 1 < D < N and provided that µ ∈ [0, 1) we have that the probability for a node to link to any of its fellow constituents is higher than the "rest of the population". This ensures that, on average, the modularity is positive. This non-monotonic behaviour was also observed empirically in 38 .
Values for µ ∈ [0, 1) are consistent with the constituencies design process according to which boundaries should be drawn around local communities. The regime µ ≥ 1 would result in an equal or lower intra-constituency edge density compared to the density of outgoing edges, suggesting that the imposed partitions would not capture the real community structure of the network and thus would not be interesting for our purpose. Moreover, µ = 1 is the physiological upper bound to ensure that probability (2) is non-negative.
Furthermore, we observe that the mobility parameter µ dampens the modularity without producing a pronounced shift of its maximum, as shown in Fig. 8. This effect is due to a tightening of the community structures within each constituency as the effect of a decreasing µ is to increase their average intra-cluster density and thus to increase the overall modularity. Conversely, when µ → 1 , nodes attach randomly to any constituency resulting in an average modularity Q D (N) that tends to zero.  www.nature.com/scientificreports/ Finally, we find an expression for the maximum value of the modularity. Indeed, it is our key objective to find an optimal way to partition our synthetic population into constituencies. We argue that this optimal way of partitioning is realised when the modularity reaches its maximum and the imposed partitions best capture the underlying community structure of the network. We derive then the location D * (N) = arg max D �Q D (N)� , in the regime N ≫ D , from Eq. (23) with ψ (1) (D) being the first order polygamma function, defined as the first derivative of the digamma function. This expression constitutes our main result, as it gives a recipe to pick the optimal number of constituencies, for a given population size N. The implicit Eq. (24) can be solved numerically for D * . Interestingly, we find that D * (N) has a clear power-law behaviour similar to the one observed in demographic data. Figure 9 shows D * (N) for small networks, the numerical data being fitted by a power-law D * = αN γ , in the interval N ∈ [100, 1000] , with exponent γ ≈ 0.53 and α ≈ 0.77 . In this range of N, the real value of D * leads to the same integer constituency size for any values of µ . In this sense, µ does not significantly affect the behaviour of D * . The value of the exponent can be also determined by the following analytical considerations in the more interesting large network limit. Expression (24) can be rewritten as follows with the coefficients α N = (N − 1)(µ − 1) and β N = N(N + µ(1 − N) − 2) . By setting ψ (1) (D * ) = T and extracting D * from Eq. (25) Now, since we are evaluating this quantities in the large network limit, we may use the polygamma asymptotic behaviour ψ (1) x in Eq. (26), and solve for T obtaining Inserting Eq. (27) in the original Eq. (25), we obtain the asymptotic optimal number of constituencies which is consistent with the numerical solution of the implicit Eq. (24) shown in Fig. 9. We conclude this section by noting that generalised definitions of modularity have been introduced in the literature 26,27,39 , which allow to control the resolution of communities by tuning a control parameter. The purpose of such generalizations is to overcome the so-called resolution limit of the original definition of modularity, that is known to prefer merging two well-defined clusters into larger ones, when these are smaller (in terms of we can repeat the analysis carried out earlier to obtain the value of D that maximises the generalised modularity. In the limit of large network size, this is given by the η-dependent value which correctly reproduces the result obtained in (28), for η = 1 . Values of η > 1 shift the maximum of the average modularity to values larger than D * = √ N , whereas for η < 1 the maximum is shifted to smaller values than D * , consistently with the interpretation of η as a fine-tuning parameter. Interestingly, (30) shows that there is a physical lower bound on the values that η is allowed to take, given by η ≥ µ . Notably, the introduction of the control parameter η does not affect the overarching behaviour in N of D * , as long as η = µ + O(N 0 ) , showing robustness of our results at different resolutions. While there is no general recipe to choose η a priori, the choice η = 1 , for which the modularity (29) weighs equally the network A under consideration and the null model, allows for a kinetic interpretation of the modularity as the autocovariance function of the cluster occupancy of a random walk on the network between two successive time steps 39 , and it has the additional advantage of making the optimal number of constituencies, as given in (30), independent of the mobility µ . This intriguing independence may constitute an interesting subject for further investigations, as well as the interplay between mobility and resolution parameters in generalised definitions of network modularity.

Conclusion
The problem of democratic representation is of primary importance for modern societies. In this work, we proposed a network model representing a growing population of final size N that has to be partitioned into D equally sized constituencies. The underlying network community structure can be tuned by the mobility parameter µ that controls the interaction probability between nodes belonging to different constituencies.
We adopted the average modularity as a measure for the goodness of the resulting partitioning and showed that it displays a strong non-monotonic behaviour, as a function of D, in the regime µ ∈ [0, 1) . By solving the recurrence equations for the modularity in the regime N ≫ D , we found an analytical expression for the optimal number of constituencies D * that maximises the modularity w.r.t. the number of induced partitions.
The approximate regime in which the problem is solved corresponds to one MP accounting for a large fraction of the population. This is arguably a reasonable assumption when considering democratically elected parliaments for which the condition above is always satisfied. Nevertheless, a numerical solution was also attainable for any value of N and D. Our main finding concerns the functional form for the optimal size of a Parliament D * ∼ N 1/2 that is found to be in reasonable agreement with what is observed in real-world data, thus providing further support for a revision of Tageepera's arguments in the direction of a "square root" law 17,18 . While a larger mobility parameter induces a more efficient mixing of the population and therefore reduces its average modularity for a fixed number of available constituencies (see Fig. 8), quite interestingly it does not influence the position of the maximum D * as a function of N to leading order. www.nature.com/scientificreports/ It is worth noting that, when considering a generalized definition of the modularity, which explicitly depends on a resolution parameter η , the position of the maximum retains the same dependence on the population size, i.e. D * ∼ N 1/2 f (µ, η) , but acquires a prefactor f (η, µ) that depends on the mobility µ for all values of the resolution parameter η except η = 1 , for which the original definition of the modularity is retrieved. Hence, the intriguing independence of the position of the maximum on the mobility emerges as a peculiarity of the modularity (in its original definition). Given the lack of a general criterion to fix η a priori, explorations of the interplay between η and parameters that control preferential attachment in growth models, such as µ in our model, can provide an interesting pathway for future work.
As a further pathway for future work one could consider introducing geographical constraints in the model and including a mobility parameter that depends on the population density of each constituency. This is expected to generate a richer behaviour for D * . Other, more complex, network topologies may also achieve a similar result. In future research, adopting different topologies, or studying small network scenarios, may broaden the applicability of our model to other absolute representation problems, from management (optimal proportion of managers vs. employees in a company) to determining the optimal number of parishes in a community.

Data availability
According to UK research councils′ Common Principles on Data Policy, all codes used to produce and analyse numerical data supporting this study will be available upon request.

Appendix
In this section, we discuss the recursion for a N and we present a more detailed version of the exact calculation for b N . a N : We consider the term where the factor 2 is due to the symmetry of the adjacency matrix. We analyse each average in Eq. (11), labelled (i) and (ii), separately: (i) Starting with the case N ≥ D , we have observed that, when m = 1 , the degree of a given node can only increase by one at each time step and only if the newly created node connects to it, i.e. k r (N + 1) = k r (N) + A r,N+1 . This leads to ≃ p(σ N+1 |σ N+1 ), www.nature.com/scientificreports/ The first expectation may now be rewritten in the following way The D − 1 constituencies such that σ r = σ N+1 are equivalent in the sum above as the probability of wiring N + 1 ∼ r is µ/D for all, thus, considering the contribution to Eq. (35) from one constituency σs � = σ N+1 we may rewrite Now performing the expectation value of s:σ s =σ N+1 A N+1,s , using Eq. (9), we get We are left with the term N r:σ r =σ N+1 k r (N) = C N+1 (N) to calculate, being the expected number of links in community σ N+1 at time N and its complementary r:σ r � =σ N+1 k r (N) =C N+1 (N) . When the first round of communities has been assigned, i.e. N ≥ D , the expression becomes where we have used C N+1 (N) + C N+1 (N) = 2(N − 1) . The expression for C N+1 (N) is found to be where mod( · , D) is the modulus operator with divisor D and ⌊·⌋ denotes the floor operator. This builds up by summing the contribution to the expected links in community σ N+1 from every node. For the case N ≥ D , a node j such that its membership σ j = σ N+1 always contributes with one link (its own stub) plus another link with probability 1 − µ D−1 D , in case the target node at the other end of the stub has the same membership σ N+1 . Conversely, if σ j = σ N+1 , the new node j can only add one link to constituency σ N+1 with probability µ/D . In the first round of membership assignment, a contribution to the number of links can come from nodes generated following the first node in σ N+1 , with a probability 1/j. The expression can be intuitively checked following Fig. 10. The remaining expectation of Eq. (34), is non-zero only if s = r , so      www.nature.com/scientificreports/ We finally note that, in the case N < D , the expectations in Eq. (35) s:σ s =σ N+1 A N+1,s = 0 and s:σ s � =σ N+1 A N+1,s = 1 . This is due to constituency σ N+1 not being populated before time N + 1 . As a result the average number of links in any given community is given by (ii) Now, this expectation value is similar to C N+1 (N) = N r:σ r =σ N+1 k r (N) . The difference lies in the time step at which this is being calculated. Thus it can be given a similar interpretation as the expected number of links in community N + 1 , at time step N + 1 , excluding the last node. This implies that if at time N + 1 the newly created node connects to one of its own community, the quantity in question increases by one, in a similar way to what is shown in Fig. 10. We note that, since the last node is excluded, its degree shall not be counted. So, the relation with C N+1 (N) can be made explicit as follows Back to Eq. (9), we finally get the recursive relation we are after for the coefficient b N ,  www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.