Introduction

In modern times, representative democracies have played a leading role in the advancement of human rights, education, and technology on a global scale. At the heart of every representative democracy is a centralised parliament: an assembly of elected citizens who are delegated by their constituents to exercise the legislative power, and to keep the government in check1. This apparatus has an operating cost and, in the shadows of political scandals, economic crises, and social turmoil, people have questioned the effectiveness of their country’s costly political and administrative structure and have claimed that a reduction of the number of elected representatives would reduce deviant behaviours and enhance efficiency of parliamentary works2. However, there is so far no sound analytical framework to determine the optimal parliament size of a given country, so to ensure an adequate representation and cost-effectiveness, which are both in the public interest. In this paper, we argue that a principle of maximal modularity can provide some reliable guidance on how to determine the absolute number of representatives required for efficient public representation in a democratic country. This principle may therefore provide a transparent reference point to inform public policies.

Generally speaking, the ideal number of members of Parliament (MPs) has to strike a balance between efficiency, in terms of the share of power held by each MPs and their ability to realise their electoral agenda, and optimal representativity, i.e. the ability of the MPs to promote the instances of their voters, in proportion to their number. Both criteria are encoded in the assembly size, as a bigger chamber allows constituencies to be smaller and thus more homogeneous in terms of character, local economic activity, and social needs. On the other hand, it diminishes the influence and resources that each MP can count on to advance their agenda and thus promote their constituents’ interests3. The “efficiency” paradigm has been at the core of a flourishing line of research amongst political scientists and “electoral engineers”, since the late 1980s4. Researchers have revealed the effect of different electoral systems on the efficiency and stability of political architecture, in relation to the size of the corresponding assembly5. Representation of minority groups, gender quotas, ballot votes, and district sizes are believed to heavily influence the efficiency of a parliament and the relative voting power of political parties6,7. Another pressing issue that has been thoroughly studied concerns the distribution of relative weights of votes for delegates in international bodies or for parliaments in federal states such as the United States. The most famous approach is the one proposed by Webster and Sainte–Laguë independently, after which several other quotient rules were adopted8. Game theory approaches, such as the Penrose square root law9, were also proposed later on and are currently used for instance in the Council of the European Union, to implement a “one person, one vote” system10. An interesting empirical decision-making model linking participation in elections and electoral college size, at any level (local to national), was proposed in11.

Both the problems of efficiency and relative representativity have been investigated for a long time in the political science literature and share a common denominator: they depend—directly or indirectly—on the absolute chamber size in a way that is yet to be fully understood6.

Figure 1
figure 1

The figure shows a log-log scale plot of the size of lower chamber S vs. population N for some European countries and the EU Parliament. The best fit with a power-law (red line) shows that the size of the chamber grows as \(S\approx \alpha N^{\gamma }\), with \(\gamma \approx 0.44\) and \(\alpha \approx 0.17\). Demographic data from Eurostat (2017).

In recent times, political scientists and technocrats have heavily relied on the so-called cubic root law (CRL) formulated by Taagepera and collaborators in12,13,14,15. This law follows the realisation that the size of most elected parliaments exhibits a strong statistical regularity with respect to their population. The proposed empirical model optimises the assemblies’ representation based on the efficiency of communication between MPs and their constituents. According to Taagepera’s arguments, the size of parliaments should follow \(S\propto N_0^{\gamma }\), where \(\gamma = 1/3\) and \(N_0\) an “effective” population size, rescaled by considering only the portion of active voters and, among them, the fraction of literate adults12. As literacy is believed to be strongly correlated with mobility, the latter rescaling was introduced to account for social mobility in the absence of a reliable direct measure for this parameter16. Figure 1 highlights the aforementioned regularity, but when considering the sizes of European lower chambers only, the best fitting curve deviates from the theoretical CRL, resulting in \(\gamma \approx 0.44\). A more comprehensive analysis of parliaments’ size data can be found in11. It is also worth mentioning that the CRL formulated in terms of an effective population was perceived to be in contrast with the spirit of any “good” representation model that should include the entire pool of constituents, regardless of age, political engagement, or education6. Moreover, Taagepera's derivation has been criticised in a recent paper17, whose formulation leads instead to a square-root law (SRL). The same SRL arises from a constitutional game-theoretical model put forward in18.  

In this work, we tackle the democratic representation problem using network theory. Networks have been successfully employed in social science for over 30 years, for their versatility in describing different aspects of political, behavioural, and social interaction between individuals19. In social networks, nodes represent social agents (for example, individuals in a population) and links represent their interactions. The network structure contains important information about relational ties in a society and is likely influenced by agents’ attributes (e.g. age, occupation, wealth ...)20. An important topological feature of real-world social networks is their scale-free degree distribution, i.e. the distribution of the number of ties following a power-law21. This topology can be reproduced in growing networks using a preferential attachment wiring protocol that was proposed in22.

The objective of this work is to shed light on the observed statistical regularities in the size of parliaments. We will be focusing on electoral systems in which representatives are elected according to an FPTP (“First-Past-The-Post”) principle, i.e. whoever collects the majority of votes within a constituency gets elected. We expect that simple modifications of the model presented here could be devised to account for other scenarios. For example, the case where more than one representative is expressed by the same electoral pool, as in the case of the US Senate, could be accounted for by suitably increasing the number of representatives per constituency.

Taking the United Kingdom as an example, each Member of Parliament is elected to the House of Commons from one of the 650 constituencies. The nature and physical boundaries of the constituencies are regulated by the House of Commons (Redistribution of Seats) Act (1944), which prescribes that the MP’s role is to “represent the common interest of the residents in a spatially bounded territory”. Thus, when constituencies are designed, the legislator should aim to enclose within geographical boundaries areas that share common interests and values23.

The network model we propose is inspired by this design principle. We build a synthetic scale-free network in which N agents (nodes) represent the entire population of a country that has to be partitioned into constituencies, each electing their MP to the national Parliament. Two citizens are connected if there is a stable social interaction between them. Individuals are therefore arranged into social communities, as proposed, for instance, in24. The key result of our paper is a principle for determining the optimal number of constituencies, i.e. for grouping the population into electoral clusters that “best” represent the underlying community structure of the network. We remark that we need to strike a balance between representativity and homogeneity in constituency size. Hence, our approach needs to improve upon the standard “community detection” framework, which would allow the constituency size to fluctuate wildly. We achieve this result by constraining the size of the constituencies at the outset, and determining the number of constituencies that optimises the partitioning of the underlying network.

More specifically, we generate synthetic networks from a growth model with preferential attachment to nodes of higher degrees and within the same constituency. We introduce a mobility (or affinity) parameter into our model, of a similar nature to that proposed in12, which allows us to tune the probability that a node interacts with foreign constituencies. Note that in this approach, each node is assigned a constituency label a priori, and the network topology follows as a result of this assignment. This is in contrast to what usually happens in community detection, where node memberships are determined a posteriori, based on the network topology. In this regard, our approach is based on a generative model for block-structured networks rather than on a cluster detection model.

Working with synthetic networks relieves us from making assumptions over geographical constraints, as constituencies are purely virtual, i.e. designed around groups of people with stronger interpersonal ties. Although this modelling choice may need to be supplemented with more realistic assumptions, non-geographical electoral systems have been proposed in the past with strong supporting arguments in terms of representation of minorities and dispersed communities25. Furthermore, geographical constraints would strongly depend on the country at hand, whereas our model aims to be as general as possible.

We adopt the modularity as a metric to measure the goodness of these partitions, and we derive an exact expression for the average network modularity, in terms of the number D of constituencies for fixed network size N. By maximising the modularity, we are able to determine analytically the optimal number of equally sized constituencies into which networks generated according to our prescription should be partitioned. We show that our findings are robust against the tuning of resolution parameters, by also considering the generalised definition of modularity introduced in27. In our investigation, we find that the empirical regularities discussed above arise quite naturally from the topology of the clustered networks that we study here.

The manuscript is organised as follows: in “The model” section, we introduce the network growth model. In “Recursive equations” section, we derive and solve the recursive equation for the expected modularity at generic network size and number of constituencies. In “Approximate solution” section, we present a numerical solution for the maximum modularity as a function of the network size and we construct an approximate scheme to solve the problem analytically. In addition, we show that the size dependence of the maximum modularity is robust when using a generalised definition of modularity which allows to tune the resolution of communities. Finally, we present our main findings in the “Conclusion” and we compare them with empirical evidence. The technical details of our derivation are presented in the Appendix.

Our findings reveal that the optimal partitioning in constituencies for a given population is well approximated by a power-law \(S\simeq N^{\gamma }\), which is in qualitative agreement with the empirical data. Interestingly, we observe that the mobility does not play a significant role in determining the exponent \(\gamma\), at least for the homogeneous mobility case studied in this work.

The model

We model social interactions within a population by means of simple, undirected networks, which are constructed using a block preferential attachment prescription, a modified version of the Barabási–Albert (BA) algorithm in which the target node’s block membership influences the wiring probability22,28,29,30,31.

Figure 2
figure 2

Sketch of a \(D=3\) constituencies network, for the case \(m=1\), at the steps \(N=8\) (left) and \(N=9\) (right) of the growth algorithm, respectively. Membership attributes 1, 2 and 3 represented by the colour blue, yellow and green respectively are allocated sequentially in such a way that \(\sigma _i=\mod (1+i, 3)\), hence \(\sigma _1=1\), \(\sigma _2=2\), \(\sigma _3=3\), \(\sigma _4=1\) and so on. The dashed line represents the new connection made at step \(N=9\).

The network is formed dynamically in such a way that at each time step N, a node N is created, with m stubs, and a constituency membership label \(\sigma _N\in \{1,\ldots ,D\}\) is assigned, according to a prescribed sequential order such that \(\sigma _N={\text {mod}}(1+N, D)\) (see Fig. 2 for an illustration). In this way, any two nodes i and \(i+D\) have the same membership and all the constituencies have roughly equal size (their sizes are either identical or differ by one unit). The sequential assignment is a modelling choice that greatly simplifies the analytical treatment presented in this section, however the final outcome does not heavily depend on the way the constituencies \(\sigma\) are assigned, provided that they are on average all equally sized. The network at time step N is represented by an \(N\times N\) adjacency matrix \({{\varvec{A}}}(N)\), with entries \(A_{ij}(N)\) for \(i,j\le N\). We prescribe that the initial configuration of the network be a clique of \(m+1\) nodes. Accordingly, we set the initial time at \(m+1\), so that the growth process starts at \(m+2\).

When a node N is added, each of its m stubs is wired to a random node i of the existing network sampled with probability

$$\begin{aligned} p_{iN}=m\frac{\sum _{\ell =1}^{N-1} A_{i,\ell }(N-1)}{L(N-1)/D}p(\sigma _i|\sigma _N), \end{aligned}$$
(1)

with \(L(N) = \sum _{i=1}^N k_i(N) = m(2N -m- 1)\), being the total number of links present in the network, \(k_i(N) = \sum _{j=1}^N A_{i,j}(N)\) the degree of node i, calculated at \(N>m\), and \(p(\sigma _i|\sigma _N)\) being the probability that any node with given constituency label \(\sigma _{N}\) attaches to any of the nodes with constituency \(\sigma _i\). The denominator in (1) ensures normalization of \(p_{iN}\), such that \(\sum _{i=1}^N p_{iN}=m\), where we have used \(\sum _i k_i(N-1)\delta _{\sigma ,\sigma _i}=L(N-1)/D ~\forall ~\sigma\). As the addition of new nodes cannot modify the links between pre-existing nodes, we have that \(A_{ij}(N)\) is the same for any \(N\ge \max (i,j)\), so from now on, we will drop the time index from the entries of the adjacency matrix.

The probability \(p(\sigma _i|\sigma _N)\) can be parametrised by a mobility parameters \(\mu\) that controls the likelihood to pick the target constituency, as

$$\begin{aligned} p(\sigma _i|\sigma _N) = \frac{\mu }{D} + (1-\mu )\delta _{\sigma _i,\sigma _N}, \end{aligned}$$
(2)

which is normalised \(\sum _{\sigma _i=1}^D p(\sigma _i|\sigma _N) =1\), as it should.

Hence, for \(\mu = 0\), the new node N will attach necessarily to a member of its own community whereas, for \(\mu =1\), the node N can attach to any community with the same probability 1/D. Thus, the probability that a new node attaches to a given node of a foreign constituency is \(\mu /D\). The contribution \(\sum _{\ell =1}^{N-1} A_{i,\ell } /L (N-1)\) in the definition (1) ensures that new nodes attach preferentially to nodes with higher degree. Our attachment prescription realizes a power-law degree distribution with exponent \(=3\), that is typical of social networks22 and, although the attachment mechanism is extremely simple, it is rich enough for our purposes32,33. Using Eq. (1), we can write the probability for the entry \(A_{i,N}\), with \(i = 1, \ldots ,N-1\), of the adjacency matrix, given its previous configuration \({{\varvec{A}}}(N-1)\) and the community membership sequence denoted by \({{\varvec{\sigma }}}\), as

$$\begin{aligned} p(A_{i,N}|{\mathbf {A}}(N-1),{\varvec{\sigma }}(N)) = [p_{iN}\delta _{A_{i,N},1} + (1-p_{iN}) \delta _{A_{i,N},0} ]\delta _{A_{i,N},A_{N,i}}\delta _{k_{N}(N),m}. \end{aligned}$$
(3)

Assuming that each of the m stubs is wired independently to a randomly drawn node, the joint distribution for the N-th row and column is

$$\begin{aligned} p(A_{1,N}\ldots A_{N-1,N} |{\mathbf {A}}(N-1),{\varvec{\sigma }}(N)) = \prod _{i=1}^{N-1} p(A_{i,N}|{\mathbf {A}}(N-1),{\varvec{\sigma }}(N)), \end{aligned}$$
(4)

and, by iteration, one can get the full distribution for the configuration \({{\varvec{A}}}(N)\) of the adjacency matrix

$$\begin{aligned} p({{\varvec{A}}}(N) |{\varvec{\sigma }}(N)) = \prod _{i=1}^{N-1} p(A_{i,N}|{\mathbf {A}}(N-1),{\varvec{\sigma }}(N)) \prod _{i=1}^{N-2} p(A_{i,N-1}|{\mathbf {A}}(N-2),{\varvec{\sigma }}(N-1)) \ldots p({{\varvec{A}}}(m+1)), \end{aligned}$$
(5)

with \(p({{\varvec{A}}}(m+1))=\prod _{i<j}^{m+1} \delta _{A_{i,j},1}\delta _{A_{i,j},A_{j,i}}\), determined by the initial configuration of the growth algorithm.

For the sake of simplicity, we limit our analytical considerations to the case \(m=1\). A different choice of the number of stubs m, or a different initial configuration of the growth model, do not significantly affect the degree distribution, in the large-N limit34. Our numerical explorations suggest that this also holds true for our key observable and results. Therefore, we leave the \(m>1\) case for future investigations.

The key observable we will monitor in our model is the modularity, introduced in35, as a quality factor for a partition of a network in communities. The modularity of a graph is defined as

$$\begin{aligned} Q_D(N)&= \frac{1}{L(N)} \sum _{s,r}^N \left[ A_{r,s} - \frac{k_r(N)k_s(N)}{L(N)}\right] \delta _{\sigma _r,\sigma _s}. \end{aligned}$$
(6)

This quantity compares the intra-cluster edge density of a given network (in our case, the clusters are defined by the constituency membership attribute) with the edge density of a null model, i.e. a set of unbiased random graphs that are wired regardless of the community structure but with the same degree sequence as the original network36. This comparison mechanism provides a reliable metric to establish the goodness of a network clustering procedure. Moreover, the modularity takes values \(Q_D(N)\in [-1,1]\), with positive values denoting that a graph exhibits a community structure being captured by their assigned memberships37.

We will use the modularity to assess the cluster structure induced by the sequence \({{\varvec{\sigma }}}\) and the underlying social structure originated from the web of connections. We aim to find the number of constituencies that maximises this observable, resulting in the optimal partitioning of the synthetic population created by the growth algorithm. For a network of size N, we have that the expected modularity is given by

$$\begin{aligned} \left\langle Q_D(N)\right\rangle = \frac{1}{L(N)} \left\langle \sum _{s,r}^N \left[ A_{r,s} - \frac{k_r(N)k_s(N)}{L(N)}\right] \delta _{\sigma _r,\sigma _s}\right\rangle =: \frac{1}{L(N)} a_N - \frac{1}{L(N)^2} b_N, \end{aligned}$$
(7)

where the expectation is over the distribution (5). At the \((N+1)\)-th step, one row and one column are added to the adjacency matrix as follows

$$\begin{aligned} {{\varvec{A}}}(N+1) = \left[ \begin{array}{ccccc|c} &{}&{}&{}&{}&{}0\\ &{}&{}&{}&{}&{}0\\ &{}&{}{{\varvec{A}}}(N)&{}&{}&{}\vdots \\ &{}&{}&{}&{}&{}1\\ &{}&{}&{}&{}&{}0\\ \hline 0&{}0&{}\cdots &{}1&{}0&{}0\\ \end{array}\right] . \end{aligned}$$

We note that the number of links is deterministic as at each time step \(m(=1)\) links are added to the network. Therefore we argue that an expression for \(\left\langle Q_D (N+1) \right\rangle\) can be found recursively, in particular by solving recursions for the coefficients \(a_N\) and \(b_N\) that we present in the following subsection.

Recursive equations

We now construct a recursion for the term \(a_N\). We note that the term \(a_{N+1}\) can be split into the contribution from the new row/column and the rest of the matrix as follows

$$\begin{aligned} a_{N+1} = a_N + 2 \left\langle \sum _{r:\sigma _r=\sigma _{N+1}}^N A_{r,N+1}\right\rangle , \end{aligned}$$
(8)

where the sum runs over all pre-existing nodes (up to N) belonging to the community \(\sigma _{N+1}\). We recognise that the expectation value represents the probability that node \(N+1\) attaches to any node within its own community at step N. Under the assumption that any target constituency is chosen independently of the degrees of its members, and using Eq. (2), one can derive the expression provided in (9), as shown in more detail in the Appendix

$$\begin{aligned} \left\langle \sum _{r:\sigma _r=\sigma _{N+1}}^N A_{r,N+1}\right\rangle = {\left\{ \begin{array}{ll} 0 &{}\quad {\text {when }}N< D\\ 1 - \mu \frac{D-1}{D} &{}\quad {\text {when }}N\ge D. \end{array}\right. } \end{aligned}$$
(9)

Using Eq. (9), the solution of the recursion (8) is found as

$$\begin{aligned} a_{N} = {\left\{ \begin{array}{ll} 0 \quad &{}\quad {\text {when }}N\le D\\ 2(N-D) \left( 1 - \mu \frac{D-1}{D}\right) \quad &{}\quad {\text {when }}N>D \ . \end{array}\right. } \end{aligned}$$
(10)

A comparison between Eq. (10) and a numerical simulation is shown in Fig. 3. The simulation data in Fig. 3 was obtained by computing the observable \(a_N=\sum _{r,s=1}^N A_{r,s}\delta _{\sigma _r,\sigma _s}\) on synthetic networks of different sizes, generated using the growth algorithm described in this section. For each size, results were averaged over 200 realisations of the network generative process.

Figure 3
figure 3

Plots of \(a_N\) as a function of the network size N, with parameters \(\mu =0.5\) and \(D=3\). The simulation data were obtained averaging over 200 realisations of the network generative process. Error bars, red vertical segments, are calculated as standard deviation from the mean.

We then consider the recursion for term \(b_N\). Following our definition in Eq. (7),

$$\begin{aligned} b_{N+1}= & {} \underline{ \left\langle \sum _{s,r}^{N} k_r(N+1)k_s(N+1)\delta _{\sigma _r,\sigma _s}\right\rangle }_{(i)} +2 \underline{\left\langle \sum _{r}^{N} k_r(N+1)\delta _{\sigma _r,\sigma _{N+1}}\right\rangle }_{(ii)}+1, \end{aligned}$$
(11)

where we used \(k_{N+1}(N+1)=m=1\). Distinguishing the two cases

  • case \({\mathbf {N}}\ge {\mathbf {D}}\): the expectation (i) in Eq. (11) yields

    $$\begin{aligned} \left\langle \sum _{s,r}^{N} k_r(N+1)k_s(N+1)\delta _{\sigma _r,\sigma _s}\right\rangle = b_N+ 4\frac{\mu }{D}(N-1) + 2\left( 1-\mu \right) C_{N+1}(N)+1, \end{aligned}$$
    (12)

    as discussed in the Appendix, and for the expectation (ii) in Eq. (11) we get

    $$\begin{aligned} \left\langle \sum _{r:\sigma _r=\sigma _{N+1}}^{N} k_r(N+1) \right\rangle = C_{N+1}(N) + 1-\mu \frac{D-1}{D}, \end{aligned}$$
    (13)

    also in the Appendix, where

    $$\begin{aligned} C_{N+1}(N) = \left\langle \sum _{r:\sigma _r=\sigma _{N+1}} k_r(N)\right\rangle \end{aligned}$$
    (14)

    represents the average number of intra-cluster connections for constituency \(\sigma _{N+1}\) at time N. When evaluating the expectation in Eq. (14) one gets

    $$\begin{aligned} C_{N+1}(N)&= \left[ \sum _{x={\text {mod}}(N+1,D)+1}^D \frac{1}{x-1} + \frac{\mu }{D} ({\text {mod}}(N+1,D)-1) \right] (1-\delta _{{\text {mod}}(N+1,D),0}) + \nonumber \\&\quad +\mu \frac{D-1}{D}\delta _{{\text {mod}}(N+1,D),0} + 2 \left\lfloor \frac{N}{D}\right\rfloor -1, \end{aligned}$$
    (15)

    where \({\text {mod}}(\ \cdot \ ,D)\) is the modulus operator with divisor D and \(\left\lfloor \cdot \right\rfloor\) denotes the floor operation.

  • case \({\mathbf {N}}<{\mathbf {D}}\): this case is characterised by only N constituencies being yet populated and a uniform probability of wiring, leading to the following expectation for (i) in Eq. (11)

    $$\begin{aligned} \left\langle \sum _{s,r}^{N} k_r(N+1)k_s(N+1)\delta _{\sigma _r,\sigma _s}\right\rangle = b_N+ 4\frac{N-1}{N} +1, \end{aligned}$$
    (16)

    as shown in the Appendix, while the expectation (ii) in Eq. (11) reads

    $$\begin{aligned} \left\langle \sum _{r:\sigma _r=\sigma _{N+1}}^{N} k_r(N+1) \right\rangle = 0, \end{aligned}$$
    (17)

    since constituency \(\sigma _{N+1}\)is populated by one node at time step \(N+1\), which is however excluded from the sum.

Figure 4
figure 4

\(b_N\) as a function of the network size N for the parameters \(D=3\) and \(\mu =0.5\). The simulation data were obtained averaging over 200 realisations of the network generative process. Error bars, red vertical segments, are calculated as standard deviation from the mean. Note that error bars are smaller than symbols for most values of N.

Gathering all the terms, the recursive equation for \(b_N\) is found to be

$$\begin{aligned} b_{N+1} = {\left\{ \begin{array}{ll} b_{N} + 4 \frac{N-1}{N} +2 &{}\quad {\text {when }} 2< N<D\\ b_{N} + 4 \mu \frac{N-1}{D} +2(2-\mu ) C_{N+1}(N)+2\left( 2-\mu \frac{D-1}{D}\right) &{}\quad {\text {when }}N\ge D, \end{array}\right. } \end{aligned}$$
(18)

with initial condition

$$\begin{aligned} b_2 = 2 +2\delta _{D,1}. \end{aligned}$$
(19)

Solving the recursion in Eq. (18), for general N, is not an easy task. In the next Section, we provide an analytical expression for \(C_{N+1}(N)\) in the limit \(N\gg D\), which turns out to be a good approximation for the exact solution, even at small N. In Fig. 4 we plot the approximate solution for \(b_N\) against numerical simulations of the growing process.

Approximate solution

To make analytical progress, we assume that the edges are uniformly distributed between the D communities so that \(C_{N+1}(N)\simeq L(N)/D\). This holds true in the limit \(N\gg D\), however, uniformity is not expected in the regime \(N\sim D\). As shown in Fig. 5, though, the exact expression for \(C_{N+1}(N)\) (red dots), defined in Eq. (14), is  well approximated by L(N)/D (blue line) in the whole range. The exact calculation for \(C_{N+1}(N)\) in the regime \(N<D\) is carried out in the Appendix. Combining the results in the two regimes, the analytical bottleneck arising in Eq. (11) - (i) simplifies as follows (see Appendix, eq. (38))

$$\begin{aligned} \left\langle \sum _{s,r}^{N} k_r(N)A_{N+1 s} \delta _{\sigma _r,\sigma _s} \right\rangle = {\left\{ \begin{array}{ll} 2\frac{(N-1)}{N} &{}\quad {\text {when }}N< D\\ 2\frac{(N-1)}{D} &{}\quad {\text {when }}N\gg D. \end{array}\right. } \end{aligned}$$
(20)

Under this approximation, the recursion in Eq. (11), now simplifies to

$$\begin{aligned} b_{N+1} \simeq {\left\{ \begin{array}{ll} b_{N} + 4 \frac{N-1}{N} +2 &{}\quad {\text {when }} 2< N<D\\ b_{N} + 4 \frac{2N-1}{D} +2 &{}\quad {\text {when }}N\ge D, \end{array}\right. } \end{aligned}$$
(21)

with the initial condition (19). The solution is given by

$$\begin{aligned} b_N = {\left\{ \begin{array}{ll} -2(3+2\gamma -3N+2\psi ^{(0)}(N)) &{}\quad {\text {when }}N< D\\ \frac{2}{D}( D -4N +DN + 2N^2 - 2D(\gamma +\psi ^{(0)}(D)) &{}\quad {\text {when }}N\ge D, \end{array}\right. } \end{aligned}$$
(22)

with \(\gamma\) being the Euler–Mascheroni constant and \(\psi ^{(0)}(x)\) the digamma function, arising from summing the first inverse integers series \(\sum _{k=1}^{x-1}\frac{1}{k}\). This approximate solution is in very good agreement with numerical simulations, as shown in Fig. 6.

Figure 5
figure 5

Mean-field approximation of \(C_{N+1}(N)\) plotted against numerical simulations for \(D=3\) and \(\mu =0.5\). The simulation data were obtained averaging over 200 realisations of the same experiment.

Figure 6
figure 6

Approximate solution for \(b_N\) plotted against numerical simulations of the generative process for \(D=3\) and \(\mu =0.5\). The simulation data were obtained averaging over 200 realisations of the same experiment.

Inserting in Eq. (7) the expressions for \(a_N\) and \(b_N\), provided in Eq. (10) and in Eq. (22) respectively, we obtain the solution for \(\left\langle Q_D(N)\right\rangle\) in the regime \(N\ge D\)

$$\begin{aligned} \left\langle Q_D(N)\right\rangle = \frac{N-D}{N-1} \left( 1 - \mu \frac{D-1}{D}\right) -\frac{1}{2D(N-1)^2} (D -4N +DN+ 2N^2- 2D(\gamma +\psi ^{(0)}(D))). \end{aligned}$$
(23)

The numerical simulation in Fig. 7 shows a perfect agreement with the average modularity given by Eq. (23). We observe that the expected modularity is strongly non-monotonic in the number of constituencies. This is in agreement with the following observations about the limiting cases: for \(D=1\), the two terms of the sum in Eq. (6) cancel out, and for \(D=N\) the concept of community is lost and the Kronecker delta is always zero, thus both cases result in \(Q_D(N)=0\). Note that in the intermediate regime \(1<D<N\) and provided that \(\mu \in [0,1)\) we have that the probability for a node to link to any of its fellow constituents is higher than the “rest of the population”. This ensures that, on average, the modularity is positive. This non-monotonic behaviour was also observed empirically in38.

Values for \(\mu \in [0,1)\) are consistent with the constituencies design process according to which boundaries should be drawn around local communities. The regime \(\mu \ge 1\) would result in an equal or lower intra-constituency edge density compared to the density of outgoing edges, suggesting that the imposed partitions would not capture the real community structure of the network and thus would not be interesting for our purpose. Moreover, \(\mu =1\) is the physiological upper bound to ensure that probability (2) is non-negative.

Furthermore, we observe that the mobility parameter \(\mu\) dampens the modularity without producing a pronounced shift of its maximum, as shown in Fig. 8. This effect is due to a tightening of the community structures within each constituency as the effect of a decreasing \(\mu\) is to increase their average intra-cluster density and thus to increase the overall modularity. Conversely, when \(\mu \rightarrow 1\), nodes attach randomly to any constituency resulting in an average modularity \(Q_D(N)\) that tends to zero.

Figure 7
figure 7

Expression for the average modularity \(\left\langle Q_D(N)\right\rangle\) in Eq. (23) as a function of the number of constituencies D, with parameters \(\mu =0.2\) and \(N=50\). The simulation data was obtained averaging over 350 realisations of the same experiment. Error bars, red vertical segments, are calculated as standard deviation from the mean.

Figure 8
figure 8

The plot shows the effect of the mobility parameter \(\mu\) on the modularity \(\left\langle Q_D(N)\right\rangle\), with \(N=50\). The simulation data were obtained averaging over 200 realisations of the same experiment for \(\mu =\{0.4, 0.6, 0.8\}\) and 350 realisations for \(\mu =0.2\). Error bars, vertical segments, are calculated as standard deviation from the mean.

Finally, we find an expression for the maximum value of the modularity. Indeed, it is our key objective to find an optimal way to partition our synthetic population into constituencies. We argue that this optimal way of partitioning is realised when the modularity reaches its maximum and the imposed partitions best capture the underlying community structure of the network. We derive then the location \(D^*(N)=\arg \max _D \left\langle Q_D(N)\right\rangle\), in the regime \(N\gg D\), from Eq. (23)

$$\begin{aligned} \frac{\partial \left\langle Q_D(N)\right\rangle }{\partial D} = \frac{1}{(N-1)^2}[ (N-1)(\mu -1) +\frac{1}{D^{*2}}N(N+\mu (1-N) -2) + \psi ^{(1)}(D^*) ]=0, \end{aligned}$$
(24)

with \(\psi ^{(1)}(D)\) being the first order polygamma function, defined as the first derivative of the digamma function. This expression constitutes our main result, as it gives a recipe to pick the optimal number of constituencies, for a given population size N. The implicit Eq. (24) can be solved numerically for \(D^*\). Interestingly, we find that \(D^*(N)\) has a clear power-law behaviour similar to the one observed in demographic data. Figure 9  shows \(D^*(N)\) for small networks, the numerical data being fitted by a power-law \(D^*=\alpha N^\gamma\), in the interval \(N\in [100,1000]\), with exponent \(\gamma \approx 0.53\) and \(\alpha \approx 0.77\). In this range of N, the real value of \(D^*\) leads to the same integer constituency size for any values of \(\mu\). In this sense, \(\mu\) does not significantly affect the behaviour of \(D^*\).  The value of the exponent can be also determined by the following analytical considerations in the more interesting large network limit. Expression (24) can be rewritten as follows

$$\begin{aligned} \alpha _N + \frac{1}{D^{*2}}\beta _N + \psi ^{(1)}(D^*) = 0, \end{aligned}$$
(25)

with the coefficients \(\alpha _N = (N-1)(\mu -1)\) and \(\beta _N = N(N+\mu (1-N)-2)\). By setting \(\psi ^{(1)}(D^*) = T\) and extracting \(D^*\) from Eq. (25)

$$\begin{aligned} T = \psi ^{(1)}\left( \sqrt{\frac{-\beta _N}{\alpha _N + T}}\right) . \end{aligned}$$
(26)

Now, since we are evaluating this quantities in the large network limit, we may use the polygamma asymptotic behaviour \(\psi ^{(1)}(x)\sim \frac{1}{x}\) in Eq. (26), and solve for T obtaining

$$\begin{aligned} T \approx \frac{1}{2}\left( -\frac{1}{\beta _N} + \frac{\sqrt{1-4\alpha _N\beta _N}}{\beta _N}\right) . \end{aligned}$$
(27)

Inserting Eq. (27) in the original Eq. (25), we obtain the asymptotic optimal number of constituencies

$$\begin{aligned} D^* \approx \sqrt{\frac{-\beta _N}{\alpha _N + \frac{1}{2}\left( -\frac{1}{\beta _N} + \frac{\sqrt{1-4\alpha _N\beta _N}}{\beta _N}\right) }} \sim \sqrt{N}, \end{aligned}$$
(28)

which is consistent with the numerical solution of the implicit Eq. (24) shown in Fig. 9.

We conclude this section by noting that generalised definitions of modularity have been introduced in the literature26,27,39, which allow to control the resolution of communities by tuning a control parameter. The purpose of such generalizations is to overcome the so-called resolution limit of the original definition of modularity, that is known to prefer merging two well-defined clusters into larger ones, when these are smaller (in terms of internal links) than a certain threshold. Denoting with \(\eta\) the resolution parameter and considering the generalised modularity defined in27 (other definitions have been shown to be equivalent39)

$$\begin{aligned} Q_D(N)= \frac{1}{L(N)} \sum _{s,r}^N \left[ A_{r,s} -\eta \frac{k_r(N)k_s(N)}{L(N)}\right] \delta _{\sigma _r,\sigma _s}, \end{aligned}$$
(29)

we can repeat the analysis carried out earlier to obtain the value of D that maximises the generalised modularity. In the limit of large network size, this is given by the \(\eta\)-dependent value

$$\begin{aligned} D^*(\eta )\sim \sqrt{N}\sqrt{\frac{\eta - \mu }{1 - \mu }}, \end{aligned}$$
(30)

which correctly reproduces the result obtained in (28), for \(\eta =1\). Values of \(\eta >1\) shift the maximum of the average modularity to values larger than \(D^*=\sqrt{N}\), whereas for \(\eta <1\) the maximum is shifted to smaller values than \(D^*\), consistently with the interpretation of \(\eta\) as a fine-tuning parameter. Interestingly, (30) shows that there is a physical lower bound on the values that \(\eta\) is allowed to take, given by \(\eta \ge \mu\). Notably, the introduction of the control parameter \(\eta\) does not affect the overarching behaviour in N of \(D^*\), as long as \(\eta =\mu +{\mathcal {O}}(N^0)\), showing robustness of our results at different resolutions. While there is no general recipe to choose \(\eta\) a priori, the choice \(\eta =1\), for which the modularity (29) weighs equally the network \({\mathbf{A}}\) under consideration and the null model, allows for a kinetic interpretation of the modularity as the autocovariance function of the cluster occupancy of a random walk on the network between two successive time steps39, and it has the additional advantage of making the optimal number of constituencies, as given in (30), independent of the mobility \(\mu\). This intriguing independence may constitute an interesting subject for further investigations, as well as the interplay between mobility and resolution parameters in generalised definitions of network modularity.

Figure 9
figure 9

The log-log plot shows the maximum modularity \(D^* = \alpha N^\gamma\), in the approximate regime, for \(\mu =0.9\) and \(\mu =0.1\) compared with \(D^* = N^{1/2}\). A fit for the curve \(\mu =0.9\) results in \(\alpha \approx 0.77\) and \(\gamma \approx 0.53\). The fit was performed in the interval \(N\in [100,1000]\).

Conclusion

The problem of democratic representation is of primary importance for modern societies. In this work, we proposed a network model representing a growing population of final size N that has to be partitioned into D equally sized constituencies. The underlying network community structure can be tuned by the mobility parameter \(\mu\) that controls the interaction probability between nodes belonging to different constituencies.

We adopted the average modularity as a measure for the goodness of the resulting partitioning and showed that it displays a strong non-monotonic behaviour, as a function of D, in the regime \(\mu \in [0,1)\). By solving the recurrence equations for the modularity in the regime \(N\gg D\), we found an analytical expression for the optimal number of constituencies \(D^*\) that maximises the modularity w.r.t. the number of induced partitions.

The approximate regime in which the problem is solved corresponds to one MP accounting for a large fraction of the population. This is arguably a reasonable assumption when considering democratically elected parliaments for which the condition above is always satisfied. Nevertheless, a numerical solution was also attainable for any value of N and D. Our main finding concerns the functional form for the optimal size of a Parliament \(D^*\sim N^{1/2}\) that is found to be in reasonable agreement with what is observed in real-world data, thus providing further support for a revision of Tageepera's arguments in the direction of a "square root" law17,18. While a larger mobility parameter induces a more efficient mixing of the population and therefore reduces its average modularity for a fixed number of available constituencies (see Fig. 8), quite interestingly it does not influence the position of the maximum \(D^*\) as a function of N to leading order.

It is worth noting that, when considering a generalized definition of the modularity, which explicitly depends on a resolution parameter \(\eta\), the position of the maximum retains the same dependence on the population size, i.e. \(D^*\sim N^{1/2} f(\mu ,\eta )\), but acquires a prefactor \(f(\eta ,\mu )\) that depends on the mobility \(\mu\) for all values of the resolution parameter \(\eta\) except \(\eta =1\), for which the original definition of the modularity is retrieved. Hence, the intriguing independence of the position of the maximum on the mobility emerges as a peculiarity of the modularity (in its original definition). Given the lack of a general criterion to fix \(\eta\) a priori, explorations of the interplay between \(\eta\) and parameters that control preferential attachment in growth models, such as \(\mu\) in our model, can provide an interesting pathway for future work.

As a further pathway for future work one could consider introducing geographical constraints in the model and including a mobility parameter that depends on the population density of each constituency. This is expected to generate a richer behaviour for \(D^*\). Other, more complex, network topologies may also achieve a similar result. In future research, adopting different topologies, or studying small network scenarios, may broaden the applicability of our model to other absolute representation problems, from management (optimal proportion of managers vs. employees in a company) to determining the optimal number of parishes in a community.