Statistical Thermodynamics of Irreversible Aggregation: The Sol-Gel Transition

Binary aggregation is known to lead, under certain kinetic rules, to the coexistence of two populations, one consisting of finite-size clusters (sol), and one that contains a single cluster that carries a finite fraction of the total mass (giant component or gel). The sol-gel transition is commonly discussed as a phase transition by qualitative analogy to vapor condensation. Here we show that the connection to thermodynamic phase transition is rigorous. We develop the statistical thermodynamics of irreversible binary aggregation in discrete finite systems, obtain the partition function for arbitrary kernel, and show that the emergence of the gel cluster has all the hallmarks of a phase transition, including an unstable van der Waals loop. We demonstrate the theory by presenting the complete pre- and post-gel solution for aggregation with the product kernel.

Recently we developed a statistical thermodynamic formalism to describe the behavior of generic populations 13 . Here we apply this theory to irreversible aggregation, develop the thermodynamics the discrete finite domain, and obtain the solution to the product kernel.

The Cluster Ensemble
We cast the problem in the theory of the cluster ensemble 13 , which we briefly summarize here. We consider a population of M individuals (''monomers'') that form N clusters and construct the microcanonical ensemble of all possible distributions n~n 1 ,n 2 , Á Á Á ð Þ , where n i is the number of clusters with i monomers. All distributions of the (M, N) ensemble satisfy the two constraints P n i~N , P in i~M : When two clusters in distribution n merge, the outcome is a new distribution in the ensemble (M, N 2 1) of the next generation. We formally define generation g 5 M 2 N 1 1 such that g 5 1 refers to fully dispersed monomers and g 5 M to a fully gelled state. These parent-offspring relationships produce a directed graph that represents the phase space of discrete binary aggregation (Fig. 1). Following 13 , we express the probability of distribution n in the (M, N) ensemble as where n!~N!=n 1 !n 2 ! Á Á Á is the multinomial coefficient, V M,N is the partition function, and W(n) is the bias of distribution, a functional of n that is determined by the physics of the problem, here, by k ij . The most probable distribution (MPD) in the thermodynamic limit maximizes V M,N , and is given bỹ where b, log q and logw i are given by the partial derivatives, These general results, established in Ref. 13, describe the thermodynamic state of a generic population in terms of of the selection bias W(n). We now proceed to derive the selection bias and partition function for the specific problem of interest, the discrete binary aggregation process depicted in Fig. 1.

Aggregation in Discrete Finite Systems
The probability P(n) of distribution n propagates from parents to offsprings according to equation where n9 is the parent that produces n by merging cluster sizes i 2 j and j, and P n'?n is the transition probability for the aggregation event (i) 1 (j) R (i 1 j). The parent distribution n9 is transformed into the offspring distribution n via the aggregation of cluster masses i and j.
The probability for this process is proportional to the number of (unordered) i 2 j pairs and the aggregation kernel k ij : where d ij is Kronecker's delta. Normalizing overall all pairs in n9, the transition probability becomes where k n' is the mean aggregation kernel among all pairs in parent n9 (see Supplementary Information for details), Combining (2), (9) and (7), we obtain the recursion where k M,Nz1 h ĩ X P n' ð Þ k n' is the ensemble average kernel over all distributions of the parent ensemble. Equation (11) applies to all distributions n of the (M, N) ensemble, and since the left-hand side is strictly a function of M and N, the same must be true for the righthand side: the quantity in braces must be independent of n. We further require homogeneous behavior in the thermodynamic limit, such that logw i is an intensive function of M/N. This condition fixes the quantity in braces to be 1 and breaks equation (11) into two separate recursions, one for V and one for W(n). The first recursion is (see Supplementary Information for details) and is readily inverted to produce Thus we have the partition function in terms of M, N, and the product of all AEKae from generation 1 up to the parent of the current generation. We note that the binomial factor is the total number of ordered partitions of integer M into N 24 , also equal to the number of distributions in the (M, N) ensemble, each counted n! times. The second recursion gives the selection bias W(n) in terms of the bias of all parents n9 of distribution n (see Supplementary Information for details): Starting with W 5 1 in generation 1 we may obtain, in principle, the bias of any distribution in the phase space. Returning to equation (12), we recognize the right-hand side as q, which produces the path equation of the process: Equations (13) and (14), along with (3)-(6) constitute a closed set of equations for the MPD in the thermodynamic limit.

Product Kernel
We now apply the theory to obtain the solution to the product kernel k ij 5 ij. In the thermodynamic limit, k M,N h i? k n ? M=N ð Þ 2 . With this result and Eqs. (4)-(5) we obtain the parameters of the sol (see Supplementary Information for details): q~h 1{h ð Þ, ð17Þ with h 5 1 2 N/M. The MPD follows from equation (3): The MPD of the sol satisfies the extremum condition d log V M,N 5 0, but for the state to be stable we must also have d 2 log V M,N # 0, or (h log q/hN) M # 0. Applying this stability condition to equation (17) we conclude that the range of stability is 0 # h , 1/2 and that phase splitting must occur at N* 5 M/2. This is the same as the gel point in the Smoluchowski equation with monodisperse conditions. We now proceed to obtain solutions in the post-gel region. Consider a twophase system that contains mass M sol in the sol, and M gel 5 M 2 M sol in the gel (N gel 5 1, N sol 5 N 2 1 13 ). As an equilibrium phase, the sol maximizes V M sol ,N{1 <V M sol ,N . Its distribution, therefore, is given by equation (3) Thus we have the complete solution: in the pre-gel region (h # 1/2) the sol is given by equation (19); in the post-gel region (h $ 1/2) it given by the same equation with h replaced by h sol 5 1 2 h, and the gel fraction is obtained from equation (20). We illustrate the theory with a numerical calculation for M 5 40. This value is sufficiently small that we may enumerate all distributions on the aggregation graph and perform an exact calculation of the entire ensemble, yet large enough that the thermodynamic limit is approached to satisfactory degree (the phase space contains 37338 distributions). As a further test we conduct Monte Carlo (MC) simulations by the constant-volume algorithm described in Ref. 25. The simulations sample the vicinity of the MPD (not the MPD itself) from which the mean distribution is calculated. The exact calculation is done on the entire graph as follows. Starting with W 5 1 in generation g 5 1, we apply equation (14) to obtain the bias of all distributions in the next generation until the entire graph is computed. Next we calculate the partition function in each generation from the normalization condition V~X n!W n ð Þ, and the probability of distribution from equation (2). With all probabilities known, the mean distribution and the ensemble average kernel are readily calculated, and the MPD is identified by locating the maximum P(n). As a check, we calculate the partition function from equation (13) and confirm that for pre-gel states it agrees with the result from the normalization condition.
These calculations are compared in Figure 2, which shows selected distributions ranging from N 5 33 (early stage of mostly small clusters) to N 5 6 (nearly fully gelled). Since the MPD is an actual member of the ensemble, it contains integer numbers of clusters. The mean distribution is a composite of the entire ensemble and is not restricted to integer values. The giant cluster forms at N* 5 22 and its presence is seen very clearly in the MPD. The gel phase is less prominent in the mean distribution because its peak is smeared by lateral fluctuations. Not all distributions in the vicinity of the MPD contain a giant cluster; as a result, the gel fraction grows smoothy at the gel point. In the sol region 1 # (M 2 N 1 1)/2, the theoretical distribution from equation (19) and the mean distribution are in excellent agreement. The analytic result eventually breaks down when N R 1 (the thermodynamic limit is violated at this point), yet even with N as small as 6, agreement with theory remains acceptable. The mean distribution from MC is practically indistinguishable from that by the exact calculation. This confirms the validity of equation (14), which forms the basis of the exact calculation.

Discussion
Our results make contact with several studies in the literature. A recursion for the partition function that is similar to that in equation (12) (different by a factor that is inconsequential for the statistics of the ensemble but crucial for thermodynamics to work) was obtained by Spouge 18,26,27 by a combinatorial derivation for kernels of the form www.nature.com/scientificreports SCIENTIFIC REPORTS | 5 : 8855 | DOI: 10.1038/srep08855 k ij 5 a 1 b(i 1 j) 1 c(ij) in pre-gel states. We recognize equation (19) as the classical pre-gel solution to the Smoluchowski equation 1,3 . We further recognize the post-gel solution as the Flory model, which assumes that the sol fraction continues to interact with the giant component past the gel point 12 . No such a priori assumption is required here: as long as no cluster is excluded from merging, a condition already built into the kernel (k ij ? 0 for all i, j $ 1), the post-gel solution is the Flory solution.
We close with a final observation that points to an even closer analogy to molecular systems. Using equation (15) to calculate q we find whose limiting value for N?1 is the result given in equation (17). Plotted against h 5 1 2 N/M over the full range h 5 0 to 1, this equation shows behavior reminiscent of subcritical van der Waals isotherms (Fig. 3) and for large M it converges to a parabola in the region 0 # h , 1, plus a Dirac delta function at h 5 1. Stability requires (hq/hh) M $ 0, a condition that is met in 0 # h # 1/2, but also on the Dirac branch. When the system crosses into the unstable region (state A in Fig. 3) it must split into two phases. The sol phase is determined by equation (15), which produces a state on the stable branch (h , 1/2) at the same q. Extending this line to the right we  obtain an intersection with the Dirac delta branch, which we identify as the equilibrium gel phase at h gel 5 1. Thus we have the tie line of this two-phase system: it connects two equilibrium phases, with an unstable state at the middle.
The ensemble method was applied here to binary aggregation but can be adapted to any other growth mechanism. For example, by reversing the arrows in Fig. 1 we obtain the graph of binary fragmentation; by including both directions we obtain the graph for reversible aggregation/fragmentation (both processes share the same trajectories in phase space as binary aggregation). In general, the evolution of populations may be viewed as a swarm of trajectories in the phase space of Fig. (1) under parent-offspring relationships that must be derived for each case separately. We may draw, therefore, a rigorous connection between statistical thermodynamics and population balances that offers new insights into the dynamics of evolving populations.