Machine learning of pair-contact process with diffusion

The pair-contact process with diffusion (PCPD), a generalization of the ordinary pair-contact process (PCP) without diffusion, exhibits a continuous absorbing phase transition. Unlike the PCP, whose phase transition clearly belongs to the directed percolation (DP) universality class, the PCPD has been controversial since its inception. To the best of our knowledge, there is so far no consensus on whether the phase transition of the PCPD falls into one of the known universality classes or represents a new kind of non-equilibrium phase transition. In this paper, both unsupervised and supervised learning are employed to study the PCPD in detail. First, two unsupervised learning methods, principal component analysis (PCA) and the autoencoder, are applied. Our results show that both methods can cluster the original configurations of the model and provide reasonable estimates of thresholds. Therefore, no matter whether the non-equilibrium lattice model is a unary random process (for instance the DP) or a binary one (for instance the PCP), or whether it contains the diffusion of particles, unsupervised learning can capture the essential, hidden information. Beyond that, supervised learning is also applied to the PCPD at different diffusion rates. We propose a more accurate numerical method to determine the spatial correlation exponent ν⊥, which, to a large degree, avoids the uncertainty of judging data collapses by eye.


Introduction
Machine learning (ML) algorithms 1 have been widely used in the study of equilibrium phase transitions, to distinguish phases of matter and detect phase transitions [2][3][4][5][6][7] in various kinds of systems. Based on whether labels are involved or not, ML methods can be categorized into supervised and unsupervised learning. Often, they are also closely related to so-called deep learning, in which more elaborate frameworks are adopted 8,9. Supervised learning includes regression and classification, which are efficient in predicting certain quantities that appear in fields such as biophysics 10,11, astrophysics 12, quantum physics 13, and many more domains in physics 14,15. In fields such as statistical physics 16,17 and condensed matter physics 2,4,18,19, supervised learning is employed to identify phases or predict phase transitions, as well as to speed up simulations 20. For supervised learning of phase transitions, we need to have real experimental data ready or generate configuration data through Monte Carlo simulations 21, before labeling the data and training on them. In this way, the trained model can classify newly input configurations and return the corresponding regression or classification results, from which some critical exponents can also be obtained by rescaling.
In contrast, unsupervised learning does not require labels; in an autoencoder, the input itself plays the role of the label. Unsupervised learning is powerful for data clustering, compression, dimensionality reduction and visualization, due to its ability to extract essential information from raw data. It is believed that unsupervised learning can learn the hidden information in input data that follow a changing trend, which has attracted considerable interest.
In recent years there has been vast progress in supervised and unsupervised learning for equilibrium phase transitions [2][3][4][5][6][7]22 as well as non-equilibrium phase transitions [23][24][25][26]. As the preprocessing for data training and prediction, unsupervised learning is more appealing. Ref. 4 is the earliest work we could find in which an unsupervised learning method is used to study phase transitions. The author uses principal component analysis (PCA) to distinguish the phases and reveal the properties of the classical Ising model, such as order parameters and structure factors. In ref. 5, PCA, kernel PCA, the autoencoder, and the variational autoencoder are used to study the 2-D Ising model and the 3-D XY model, and it is found that some latent variables can be related to the order parameters. In ref. 6, PCA is applied to compare the critical behaviors of various models, including Ising models on square and triangular lattices, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-one Ising (BSI) model, and the 2-D XY model. It is shown that the quantized principal component of PCA can not only detect phases and symmetry breaking but also distinguish phase transition types and determine critical points. Similar studies of critical phenomena using unsupervised learning can be found in refs. 3,27,28. How unsupervised learning processes known or generated data to extract the essential embedded information is also a crucial question for statistical physics.
arXiv:2112.00489v3 [cond-mat.stat-mech] 23 Feb 2024
Inspired by this, the present study extends ML techniques to a binary stochastic reaction process [29][30][31] called the pair-contact process with diffusion (PCPD) [31][32][33][34]. Two unsupervised learning methods, PCA and the autoencoder, are used to study this model. PCA 35,36 is a linear dimensionality reduction method with a wide range of applications in extracting salient features of complex data, while the autoencoder [37][38][39] neural network has the advantage of handling nonlinear data in image recognition and data compression. By applying PCA and the autoencoder, we try to classify different lattice phases and extract the thresholds of the PCPD model. The input data are generated by Monte Carlo simulations of the model.
It should be noted that the directed percolation (DP) process, a classic non-equilibrium phase transition model 29,30,40, has been studied by both supervised and unsupervised learning 25. Combining the learning results of the DP and the PCPD models, we can appreciate the value of investigating such non-equilibrium systems. Of the two, the DP is a unary random reaction process, while the PCPD is a binary one. The results show that PCA and the autoencoder can successfully cluster the PCPD configurations and determine the critical points with high accuracy. Equilibrium phase transitions have some similarities with non-equilibrium ones, and therefore similar methods are suitable for both. The unique feature of non-equilibrium phase transitions is the extra dimension, namely time: the states and properties of the system may change with time. If one intends to extend ML algorithms to non-equilibrium phase transitions, the time dimension has to be dealt with.
This paper is organized into three sections, including this introduction. In the second section, we briefly introduce the PCPD model, the two unsupervised learning methods, the data sets, and the ML results for the PCPD. There, we successfully cluster the configurations and obtain reasonable thresholds. By using supervised learning, we also obtain some critical exponents. In addition, numerical calculations are carried out to determine correlation exponents. The third section summarizes the paper.

Machine Learning of the PCPD
The model of PCPD
Before turning to the PCPD, we briefly introduce its prototype, the pair-contact process (PCP), in which no diffusion is considered. Fig. 1 displays a configuration generated by the PCP. The PCP, proposed by Jensen 41, is a random reaction process without particle diffusion that exhibits a continuous phase transition. On a d-dimensional lattice, sites are either occupied or empty. Under the sequential updating mechanism, proliferation and annihilation compete with each other until the system reaches a steady state or an absorbing state. When L → ∞, the particle system has an infinite number of absorbing states. Different from the DP, the order parameter of the PCP is given by the pair density. Nevertheless, the PCP has been found to have the same critical exponents as the DP, which means that the PCP belongs to the DP universality class.
Different from the DP, the PCPD is a simple binary random reaction process 42. When the diffusion rate D is 0, the classical model, the PCP, is recovered. Hence the PCPD can be regarded as a generalization of the PCP. In the PCPD, a reaction is triggered by the formation of a particle pair. The reaction-diffusion mechanism of the PCPD is given by
$$2A \to 3A, \qquad 2A \to \emptyset, \qquad A\emptyset \leftrightarrow \emptyset A, \tag{1}$$
where A denotes a single particle and ∅ an empty site.
The PCPD exhibits a continuous phase transition from the fluctuating active phase to the absorbing phase, whose evolution is governed by the following rules,
$$AA\emptyset,\ \emptyset AA \to AAA \ \ \text{with rate } (1-p)(1-D)/2, \qquad AA \to \emptyset\emptyset \ \ \text{with rate } p(1-D), \qquad A\emptyset \leftrightarrow \emptyset A \ \ \text{with rate } D, \tag{2}$$
where p ∈ [0, 1] represents the pair annihilation probability, and D ∈ [0, 1] the diffusion rate. When the system is in the active phase, p < p_c, the proliferation process plays a dominant role. After a sufficiently long time, the system reaches a steady state in which the particle density ρ_s > 0. In contrast, for p > p_c the annihilation process dominates, and the particle density decreases rapidly until the system reaches the absorbing phase. In the thermodynamic limit, if the diffusion rate D is 0, the space-time trajectory of a single particle is always a straight line (see left panel of Fig. 1). Once reactions are no longer possible because adjacent pairs have been depleted, each single particle leaves behind a long stripe. In the inactive phase, since there are infinitely many combinations for the locations of these stripes, the process without diffusion generates an infinite number of absorbing states. If the diffusion rate D > 0, a single particle is allowed to diffuse, leaving a random-walk-like trajectory until it collides with another particle and triggers a reaction event. Even if there are just two particles on the lattice, after the two diffusing particles meet, their offspring may still meet again after a long period of time. This gives the trajectories of the diffusing particles a certain degree of randomness, so that they generally do not appear as straight lines. If the value of p is fixed, the particle density of the system decreases as the diffusion rate D increases.
In MC simulations with periodic boundary conditions, a sequential update mechanism is employed to generate configurations of the PCPD. We first select a particle and a direction randomly. The particle then moves to its nearest-neighbour site in the selected direction with probability D; if that neighbouring site is occupied, the pair annihilates with probability p(1 − D), or creates a new particle at the next-nearest-neighbour site with probability (1 − p)(1 − D). For a block of sites with indices h, i, j, k, l, we can characterize the dynamic rules of the PCPD model by Eq. (3), where s_i(t) represents the state of site i at time t, and z ∈ (0, 1) is a random number drawn from a uniform distribution.
$$
\begin{cases}
s_j(t+1)=0,\ s_k(t+1)=1 & \text{if } s_j(t)=1 \text{ and } s_k(t)=0 \text{ and } z<D/2,\\
s_j(t+1)=0,\ s_i(t+1)=1 & \text{if } s_j(t)=1 \text{ and } s_i(t)=0 \text{ and } z<D/2,\\
s_j(t+1)=0,\ s_k(t+1)=0 & \text{if } s_j(t)=1 \text{ and } s_k(t)=1 \text{ and } z<p(1-D),\\
s_l(t+1)=1 & \text{if } s_j(t)=1 \text{ and } s_k(t)=1 \text{ and } s_l(t)=0 \text{ and } z<(1-p)(1-D)/2,\\
s_h(t+1)=1 & \text{if } s_i(t)=1 \text{ and } s_j(t)=1 \text{ and } s_h(t)=0 \text{ and } z<(1-p)(1-D)/2,\\
\text{unchanged} & \text{otherwise.}
\end{cases} \tag{3}
$$
Like many non-equilibrium models, the PCPD is easy to simulate on computers but hard to realize in experiments. Moreover, an exact analytical solution of the PCPD is not yet available. It does not have rapidity-reversal symmetry. As in many non-equilibrium models, the PCPD is described by four independent critical exponents (β, β′, ν⊥, and ν∥). However, its universality class has become one of the unsolved and controversial problems of non-equilibrium critical phenomena 31,33, and has received a considerable amount of attention. Taking the (1+1)-dimensional lattice as an example, the PCP has an infinite number of absorbing states, whereas the PCPD has at most two absorbing states: the one with all sites empty and the one with a single diffusing particle. The particle density of the PCPD in the absorbing phase shows an algebraic decay in time, which is a piece of evidence that the PCPD may not belong to the DP universality class. In Monte Carlo simulations 32,34,43, the sum of the diffusion, annihilation and proliferation probabilities of a single particle in the PCPD is usually set to 1. In addition, numerical simulations indicate that the upper critical dimension of the PCPD is d_c = 2, while that of the DP is d_c = 4. It is worth noting that in numerical results some quantities (such as the particle density and the pair density) do not obey exact power laws. This could be one of the reasons why p_c and the critical exponents of the PCPD appear to depend on the diffusion rate. In general, the critical behaviors of the PCPD still need to be unveiled with higher measurement accuracy, which is exactly one of the motivations of this work.
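As a minimal illustration of the sequential update described above (not the simulation code used in this work), the following Python sketch generates one (1+1)-dimensional configuration on a ring. The function names, the choice of L update attempts per time step, and the use of a single random number z to select between diffusion, annihilation and creation are assumptions made for this sketch.

```python
import random


def pcpd_update(s, p, D, rng):
    """Attempt one elementary PCPD update on the ring s (list of 0/1), in place."""
    L = len(s)
    occupied = [i for i, v in enumerate(s) if v == 1]
    if not occupied:
        return                          # empty lattice: absorbing state
    j = rng.choice(occupied)            # randomly selected particle
    step = rng.choice((-1, 1))          # randomly selected direction
    k = (j + step) % L                  # nearest neighbour (periodic boundary)
    l = (j + 2 * step) % L              # next-nearest neighbour (periodic boundary)
    z = rng.random()                    # uniform random number in [0, 1)
    if z < D:
        if s[k] == 0:                   # diffusion: hop onto an empty neighbour
            s[j], s[k] = 0, 1
    elif s[k] == 1:                     # reactions need the pair (j, k)
        if z < D + p * (1 - D):
            s[j], s[k] = 0, 0           # annihilation, probability p(1 - D)
        elif s[l] == 0:
            s[l] = 1                    # creation, probability (1 - p)(1 - D)


def pcpd_configuration(L, T, p, D, seed=0):
    """Return T rows of length L, starting from a fully occupied lattice.

    Assumption of this sketch: one time step consists of L update attempts.
    """
    rng = random.Random(seed)
    s = [1] * L
    rows = [s[:]]
    for _ in range(T - 1):
        for _ in range(L):
            pcpd_update(s, p, D, rng)
        rows.append(s[:])
    return rows


config = pcpd_configuration(L=40, T=40, p=0.1, D=0.1)
print("final particle density:", sum(config[-1]) / 40)
```

Setting D = 0 in this sketch switches off diffusion and gives a PCP-like process, in line with the statement that the PCP is recovered at D = 0.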

PCA
As one of the most commonly used dimensionality reduction [44][45][46] and visualization techniques, PCA 44 transforms a set of potentially linearly correlated variables into a set of linearly uncorrelated variables. The first principal component is the direction with the highest variance after the transformation, the second principal component is the direction with the second-highest variance, and so on. There are two feasible ways to carry out PCA dimensionality reduction. The first is based on the eigenvalue decomposition of the covariance matrix, and the second is based on the singular value decomposition (SVD) of the original matrix. These two methods are intrinsically related; please refer to Appendix A for details.
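The relation between the two routes can be checked numerically. The sketch below, which assumes NumPy and uses random data purely for illustration, computes the principal axes once from the eigen-decomposition of the covariance matrix and once from the SVD of the centred data matrix, and compares them up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 200 samples, 6 features with clearly different variances.
X = rng.normal(size=(200, 6)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)                        # centre each column
n = Xc.shape[0]

# Route 1: eigenvalue decomposition of the covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # sort by decreasing variance
eigvals, W = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the centred data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S ** 2 / (n - 1)

# The variances agree, and the axes agree up to an overall sign per axis.
print(np.allclose(eigvals, svd_variances))
print(np.allclose(np.abs(W), np.abs(Vt.T), atol=1e-8))
```

The sign ambiguity is harmless for clustering, since flipping a principal axis only mirrors the projected data.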

Autoencoder
PCA is a linear dimensionality reduction algorithm, which eliminates redundancy by reducing the spatial dimension and uses fewer features to describe the information in the data as completely as possible. The autoencoder, by contrast, has advantages in feature extraction of linear data, denoising of nonlinear data, image recognition, image compression, dimensionality reduction for visualization, and feature learning. In this paper, we focus on the last two.
We refer to ref. 47 for recent advances in autoencoders, and to refs. 48,49 for systematic reviews of the autoencoder and its variants. For example, the sparse autoencoder is usually used for classification tasks, while the denoising autoencoder can extract the most important features of data and learn a robust representation of them. The variational autoencoder is a generative model with functions similar to those of generative adversarial networks (GAN). The fully connected autoencoder generally ignores the spatial structure of the image. In this paper, we use the convolutional autoencoder 50. Please refer to Appendix B for an introduction to the principle of the autoencoder, including its loss function.
To prevent over-fitting 9, we add an L2-norm penalty, λ/(2N) ∑_i w_i², to the loss function. The AdamOptimizer is used to speed up the training of our neural networks. Our ML is implemented with TensorFlow 1.15.
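The penalty term can be written out explicitly. The short NumPy sketch below is illustrative only: the function names are hypothetical, and it assumes that N in the prefactor denotes the number of training samples and that the sum runs over all weights of the network.

```python
import numpy as np


def l2_penalty(weights, lam, n_samples):
    """Return lam / (2 N) * sum_i w_i^2 over all weight arrays (N assumed = n_samples)."""
    squared_sum = sum(float(np.sum(w ** 2)) for w in weights)
    return lam / (2.0 * n_samples) * squared_sum


def regularised_loss(base_loss, weights, lam, n_samples):
    """Add the L2 penalty to an already computed data loss."""
    return base_loss + l2_penalty(weights, lam, n_samples)


# Two small weight arrays with sum of squares 1 + 4 + 9 = 14.
weights = [np.array([1.0, 2.0]), np.array([[3.0]])]
print(l2_penalty(weights, lam=0.01, n_samples=7))   # 0.01 / 14 * 14
```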

Data Sets and Results
Before building a suitable model, we first need to define the ML problem explicitly 51. This includes the choice of data points, their features and labels, the hypothesis space, and the loss function. For the (1+1)-dimensional PCPD, a data point is a configuration containing all time steps generated at a single annihilation probability, as shown in Fig. 1. It is characterized by the lattice sites contained in the configuration, each being either "1" (occupied site) or "0" (empty site). In unsupervised learning, including the PCA and autoencoder studies, we do not need to label the configurations, but instead extract information directly from the original data. In supervised learning, however, we have to label the configurations generated by MC simulations; only in this way can we train a reasonable model within the hypothesis space. In the neural networks of this paper, the loss function is the mean cross-entropy between the label vector y_l and the output layer vector y. In autoencoder neural networks, the label can be regarded as the original configuration itself. The main task of unsupervised learning applied to phase transitions is to reduce the dimensionality of the original high-dimensional data in order to extract critical points. In this paper, the main purpose of supervised learning is to perform a binary classification of (1+1)-dimensional PCPD configurations into percolating and non-percolating phases. From the binary classification output, we can also calculate some critical exponents through data collapse by rescaling. Obtaining critical points and critical exponents is the fundamental motivation for studying phase transitions.
As explained in ref. 25, for the (1+1)-dimensional bond DP a fully occupied lattice is chosen as the initial condition, and this paper treats the data in the same way. We employ Monte Carlo simulations to generate the raw input data (see Supplementary Section "PCPD's data sets for training and test") needed for ML. For non-equilibrium lattice models, the characteristic time t_c ∼ L^z is much larger than the lattice size because of the extra time dimension. Therefore, to keep the calculation tractable, we truncate the original data for ML. For example, in many cases we take T = L. This operation reduces the amount of computation and, in some cases, the impact of fluctuations. Such processing does not lower the accuracy of the ML results and is therefore considered feasible.
In autoencoder, our configuration data is divided into a training set, a validation set, and a test set.We use the validation set to adjust the hyper-parameters so that the model is optimal.
Unsupervised ML of the PCP and the PCPD via PCA
First, we perform PCA dimensionality reduction for the (1+1)-dimensional PCPD. For this binary random reaction process, we study the case D = 0.1. We use Monte Carlo simulations to generate configurations corresponding to different annihilation probabilities p. Here, we set the lattice size L = 40 and the number of time steps T = 40. We select 31 annihilation probabilities between 0 and 0.3 with an interval of 0.01, and generate 100 samples for each of them. That is, the raw data matrix X has dimensions 3100 × 1600. Fig. 2 (a) shows that there is only one dominant principal component, whose explained variance ratio is the largest. At the same time, the configurations within the range p = 0 ∼ 0.3 are approximately arranged along a straight line. We conclude that the first principal component and the particle density are proportional to one another. This correlation makes the particle density a physical quantity that can be related to the first principal component.
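The proportionality between the first principal component and the particle density can be illustrated on synthetic data. The sketch below assumes NumPy and does not use PCPD configurations: each row is a random binary vector of 1600 sites whose occupation probability follows a hypothetical monotone function toy_density(p), chosen only so that the density decreases with p. The matrix has the same 3100 × 1600 shape as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T, n_samples = 40, 40, 100
p_values = np.round(np.arange(31) * 0.01, 2)      # 0.00, 0.01, ..., 0.30


def toy_density(p):
    """Hypothetical density profile (NOT a PCPD result): decreases with p."""
    return np.clip(0.9 - 3.0 * p, 0.02, None)


# Stack 100 random binary "configurations" per value of p.
blocks = [(rng.random((n_samples, L * T)) < toy_density(p)).astype(float)
          for p in p_values]
X = np.vstack(blocks)                              # shape (3100, 1600)

# PCA via SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance_ratio = S ** 2 / np.sum(S ** 2)

pc1 = Xc @ Vt[0]                                   # first principal component per sample
density = X.mean(axis=1)                           # particle density per sample
corr = np.corrcoef(pc1, density)[0, 1]

print("leading explained variance ratios:", explained_variance_ratio[:3])
print("|correlation(PC1, density)|:", abs(corr))
```

The sign of the first principal component is arbitrary, which is why the absolute value of the correlation is reported.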
The relationship between the first principal component and the annihilation probability is shown in Fig. 2 (c). To achieve high accuracy, we generated 1000 samples for each annihilation probability to perform an ensemble average for the first principal component. By observing Fig. 2 (c), we can see that the jumping location is the transition point. Fig. 2 (d) is the result of mapping the first principal component and the second principal component onto a plane. This approach also has a good clustering effect for the (1+1)-dimensional PCPD.
Ref. 52 illustrates that using more principal components can capture and retain the most crucial information from the original data. Actually, in this paper, for the purpose of clustering or phase classification (extracting critical points), only one or two principal components are needed.
In addition, we also implemented PCA dimensionality reduction and visualization for the (1+1)-dimensional PCP, as shown in Fig. 3. In this case, the particles do not diffuse. We can conclude that, although the particle diffusion rate has changed, the unsupervised learning method PCA can promptly make an excellent clustering representation of the (1+1)-dimensional PCPD and capture the critical point.

Unsupervised ML of PCP and PCPD via autoencoder
Before applying autoencoder learning to the PCP, we need to pre-process the raw configurations. In phase transitions, the order parameter is the key quantity that separates the two different phases at the critical point. For the PCP model, the density of pair particles is one of two order parameters (the other one is the density of single particles). Hence, feature engineering that separates single and pair particles is needed to prepare the original configurations. After this preparation, we carry out autoencoder learning for these two types of configurations, and show the results in the bottom panels of Fig. 4, left and right. To determine the critical point, we perform a non-linear fit of the hyperbolic tangent form a × tanh[b(p − p_c)] + c (the blue lines in the bottom panels of Fig. 4 are the fitted curves), and the location p_c is the critical point we are looking for. The fit was performed with the NonlinearModelFit function of Mathematica, using the Levenberg-Marquardt algorithm with at least 10 iterations to minimize the sum of squares. We find that the predicted p_c ≃ 0.0809 of Fig. 4 (d) is very close to that given by Monte Carlo simulations, p_c ≃ 0.0771. Thus, we conclude that the pair-particle density characterizes the phase transition of the PCP well. The following learning results of the PCP are all based on the configurations of pair particles only. Fig. 5 shows the autoencoder learning results of the (1+1)-dimensional PCP. The lattice size is L = 40, and the number of time steps is T = 40. Figs. 5 (a) and (b) represent two types of clustering, with configurations generated at 11 and 31 different annihilation probabilities, respectively. The diffusion rate of the PCP is D = 0. Fig. 5 (a) displays the clustering for 11 different annihilation probabilities p, with 100 sample configurations generated for each p. With the number of latent neurons in the autoencoder network set to 2, the eleven categories collapse onto a straight line. In Fig. 5 (b), 31 annihilation probabilities between 0 and 0.3, with an interval of 0.01, are selected. The clustering results indicate that data points corresponding to smaller p appear on the lower left (blue), while those corresponding to larger p appear on the upper right (red). The middle segment is more scattered than the two ends, and we speculate that it belongs to the critical region.
In order to determine the critical probability p_c, the number of latent neurons of the convolutional autoencoder neural network is set to 1. After the neural network is trained, we obtain the latent variable shown in Figs. 5 (c) and (d), where the single latent variable is plotted as a function of p. The hyperbolic tangent fit yields p_c = 0.081(1) and p_c = 0.082(1), which are very close to the theoretical value p_c = 0.077092(1) 31.
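The idea of reading off p_c from a fit of a × tanh[b(p − p_c)] + c can be sketched as follows. This is not the Mathematica Levenberg-Marquardt fit used in this work: as a simple stand-in, the sketch scans a grid of (b, p_c) values and solves for a and c by linear least squares at each grid point. The function name, the grids, and the noise-free synthetic curve (with an illustrative p_c = 0.08) are assumptions.

```python
import numpy as np


def fit_tanh(p, h, pc_grid, b_grid):
    """Fit h ~ a * tanh(b * (p - p_c)) + c by grid search over (b, p_c).

    For fixed (b, p_c) the model is linear in (a, c), so these two
    parameters are obtained by ordinary least squares.
    """
    best = None
    for pc in pc_grid:
        for b in b_grid:
            t = np.tanh(b * (p - pc))
            A = np.column_stack([t, np.ones_like(t)])
            coef, *_ = np.linalg.lstsq(A, h, rcond=None)
            sse = float(np.sum((A @ coef - h) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], b, pc, coef[1])
    _, a, b, pc, c = best
    return a, b, pc, c


# Synthetic latent-variable curve, for illustration only.
p_data = np.linspace(0.0, 0.3, 31)
h_data = 0.8 * np.tanh(40.0 * (p_data - 0.08)) + 0.1

a_fit, b_fit, pc_fit, c_fit = fit_tanh(
    p_data, h_data,
    pc_grid=np.linspace(0.05, 0.12, 71),    # step 0.001
    b_grid=np.linspace(10.0, 80.0, 71),     # step 1
)
print("fitted p_c:", pc_fit)
```

The resolution of p_c in this sketch is limited by the grid spacing; a gradient-based fit such as Levenberg-Marquardt does not have this limitation.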
As for the PCP model, the configurations of the PCPD model used for autoencoder learning have to be chosen carefully. Three different types of configurations of the PCPD are shown in the top panel of Fig. 6, and the corresponding autoencoder learning results are given in the bottom panel of the same figure. It is found that only the learning based on the original configurations agrees well with the Monte Carlo simulations. Therefore, for the PCPD model, we choose the original configurations for autoencoder learning. Theoretically, the order parameter of the PCPD model is expressed by the sum of the densities of pair particles and single particles.
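One possible way to separate pair particles from single particles in one time slice of a configuration is sketched below. The rule used here (an occupied site counts as a pair particle if at least one of its two nearest neighbours in the same time slice is occupied, with periodic boundaries) and the function name are assumptions made for illustration; the exact preprocessing behind Figs. 4 and 6 is not specified in the text.

```python
def split_pairs_and_singles(row):
    """Split one time slice (list of 0/1) into pair-particle and single-particle slices.

    Assumed rule: an occupied site is a pair particle if its left or right
    neighbour (periodic boundary) is occupied; otherwise it is a single particle.
    """
    L = len(row)
    pairs, singles = [0] * L, [0] * L
    for i, v in enumerate(row):
        if v == 1:
            if row[(i - 1) % L] == 1 or row[(i + 1) % L] == 1:
                pairs[i] = 1
            else:
                singles[i] = 1
    return pairs, singles


row = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
pairs, singles = split_pairs_and_singles(row)
print(pairs)    # [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
print(singles)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

By construction the two slices add up to the original one, so the sum of the pair and single densities equals the total particle density.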
The PCPD model includes not only the pairwise reaction between particles but also the diffusion of particles. First, we investigate the autoencoder learning results for D = 0.05. Fig. 7 (a) shows the scatter plot of latent variables extracted from configurations at 10 different values of p, and Fig. 7 (b) shows the counterpart for 41 different values of p. For each p, 100 samples are generated. From these two panels, we can see that the configurations corresponding to the same p are clustered close to one another, and that all the clusters are well ordered according to the values of p. This allows for the conclusion that for the (1+1)-dimensional PCPD, the autoencoder can accurately cluster the configurations.
Furthermore, we aim to obtain the threshold p_c by limiting the number of latent neurons. With the number of latent neurons in the autoencoder network set to 1, the plot of the latent variable versus p is shown in Fig. 8. In order to achieve higher accuracy, a finite-size scaling analysis is undertaken for the (1+1)-dimensional PCPD system at D = 0.05, with L = 32, 48, 64, 80, 96. The critical point is estimated to be 0.105844, quite close to p_c = 0.10439(1) as given in ref. 32.
As seen, the effect of diffusion at D = 0.05 is not very significant. To study cases with faster diffusion, we need to increase the value of D. Interestingly, the particle annihilation rate increases with D, and consequently the number of occupied sites drops rapidly. As D increases up to a certain value, it seems that our autoencoder neural network can no longer effectively detect the hidden structure of the system. We conjecture that larger D leads to a rapid decay of the particle number in the system, so that the neural network may not learn anything useful. That is to say, the information fed into the neural network becomes very limited, and the training is not effective without sufficient input. Our extensive studies indicate that capturing the PCPD critical point with the autoencoder neural network is more feasible when D ≤ 0.5.

Supervised learning of the PCP and the PCPD
Unsupervised learning can detect the transition points of critical systems, whereas supervised learning can yield some critical exponents by rescaling or the so-called data collapse. There are two correlation exponents in non-equilibrium phase transitions, called spatial and temporal correlation exponents, respectively. Here we focus primarily on the former, the spatial correlation exponent. Through the dynamical exponent z = ν∥/ν⊥, the temporal correlation exponent can be indirectly measured.
In supervised learning, we employ the fully connected network (FCN) to identify phases of the PCPD. The basic architecture of the FCN used in this paper can be found in one of our previous articles 25. The hyper-parameters we use are as follows: the learning rate is 0.0001, the batch size is 1024 and the regularization parameter is 0.01. The mean cross-entropy between the label vector y_l and the output layer vector y used in our FCN also contains the L2-norm mentioned above to prevent over-fitting. Our FCN contains 100 neurons in the hidden layer, and two in the output layer. Input configurations are labelled "0" for annihilation probabilities less than the critical threshold, and "1" for the remaining probabilities. After the neural network is trained and fine-tuned, we obtain the learning results of the PCP and the PCPD, shown in Fig. 9. The first column of Fig. 9 shows the outputs of the two neurons for five different system sizes, and the intersections are the transition points predicted by the learning. These predictions are consistent with the theoretical thresholds; for details of the learning procedure we refer to ref. 25.
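The labelling rule and the shape of the network can be made concrete with a small NumPy sketch. It is not the TensorFlow implementation used in this work: the sigmoid activation of the hidden layer, the softmax output, the random (untrained) weights and all function names are assumptions, and only the layer sizes (100 hidden neurons, 2 output neurons) and the labelling rule are taken from the text.

```python
import numpy as np


def one_hot_labels(p_values, p_c):
    """Label "0" for p < p_c and "1" otherwise, returned as one-hot rows."""
    cls = (np.asarray(p_values) >= p_c).astype(int)
    return np.eye(2)[cls]


def fcn_forward(x, W1, b1, W2, b2):
    """Forward pass: input -> 100 hidden neurons (sigmoid, assumed) -> 2 softmax outputs."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def mean_cross_entropy(y_label, y_out, eps=1e-12):
    """Mean cross-entropy between one-hot labels and network outputs."""
    return float(-np.mean(np.sum(y_label * np.log(y_out + eps), axis=1)))


rng = np.random.default_rng(0)
n_inputs = 40 * 40                                         # flattened L x T configuration
x = rng.integers(0, 2, size=(8, n_inputs)).astype(float)   # 8 random binary inputs
W1 = rng.normal(scale=0.01, size=(n_inputs, 100))
b1 = np.zeros(100)
W2 = rng.normal(scale=0.01, size=(100, 2))
b2 = np.zeros(2)

y_out = fcn_forward(x, W1, b1, W2, b2)
y_lab = one_hot_labels([0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25], p_c=0.1)
print(y_out.shape, mean_cross_entropy(y_lab, y_out))
```

With two output neurons, the transition point is read off where the two averaged outputs cross, as described for the first column of Fig. 9.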
The second column of Fig. 9 displays the data collapses, in which the horizontal coordinates have been rescaled by a factor related to the spatial correlation exponent ν⊥. Both the PCP and the PCPD are binary reaction processes; the PCP belongs to the universality class of the DP, whereas the universality class of the PCPD remains unclear. The spatial correlation exponent given by Fig. 9 (b) is ν⊥ ≃ 1.13(1), which is close to the theoretical value ν⊥ ≃ 1.09. For the PCPD, the critical threshold p_c(D) may depend on the diffusion rate. ν⊥ is also found to depend on the diffusion rate D, as given in Table 1. In particular, ν⊥ decreases as D increases. Compared with the results obtained by Monte Carlo simulations in Refs. 32,43,54, the measurements of ν⊥ carry a certain amount of uncertainty, caused by various factors. Although our measurements show that ν⊥ may vary with the diffusion rate, we have to be very cautious in drawing such a conclusion, because ML so far only works with small systems, which are subject to finite-size effects as well as fluctuations driven by diffusion. Therefore, the dependence of ν⊥ on D might be an artifact caused by the combination of finite-size effects and diffusion-driven fluctuations.
Due to its strong corrections to scaling, the PCPD is notorious for its extremely slow crossover to the scaling region, which makes the estimation of both its critical point and its critical exponents hard (see e.g. the extensive review 33). Therefore, even though several studies, such as a bosonic variant of the PCPD 55 and the refined mean-field phase portrait analysis 56, suggested that the PCPD constitutes a novel universality class different from the DP or the PCP, most Monte Carlo simulation studies have so far defied this conclusion. Most notably, many elaborate simulations reveal that its critical properties seem to depend on the diffusion rate considered 43,54,57. Nevertheless, by carefully taking into account the effects of corrections to scaling, a more recent study 58 was able to obtain a diffusion-independent decay exponent that is markedly distinct from the DP value. Given the rather limited system size and simulation time we used in the ML scheme, such large-scale, long-time behaviors as in Ref. 58 of course cannot be observed, but as in Refs. 43,54,57, one should expect to observe a diffusion-dependent measurement of the spatial correlation exponent ν⊥, as opposed to other phase transition models.
Very little can be found in the literature on how to estimate critical exponents from data collapse. Mostly the procedure relies on visual inspection, without any solid foundation or reliable criterion. Here, we propose a more reliable means to determine ν⊥, based on the Euclidean distance. Take as an example the case where the diffusion rate is 0.1. We choose the two output-layer results corresponding to L = 32 and L = 80, respectively. For binary classification neural networks, the outputs usually follow sigmoid functions. After rescaling the abscissa, we fit a sigmoid function to each of the two curves. Assume Y = F_1(x) and Z = F_2(x); then the Euclidean distance is
$$d(Y, Z) = \sqrt{\sum_{i} \left[ F_1(x_i) - F_2(x_i) \right]^2}.$$
When calculating the Euclidean distance, we uniformly select all x_i in the range [−0.5, 0.5] with an interval of 0.02. The Euclidean distances are provided in Table 2, and the data collapse results for three different values of ν⊥ are plotted in Fig. 10. Starting from ν⊥ = 0.91, the Euclidean distance first decreases with increasing ν⊥ until ν⊥ = 1.11, and then increases. ν⊥ = 1.11 is therefore the value we are looking for, corresponding to the optimal collapse. This method avoids the instability of judging by eye, and may therefore serve as a methodology for data collapse.
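The distance criterion can be illustrated with synthetic curves. In the sketch below, the two network outputs are replaced by a hypothetical sigmoid toy_output that obeys perfect finite-size scaling with an exponent nu_true = 1.11; this value, the steepness k, the threshold p_c = 0.1 and the rescaled abscissa x = (p − p_c) L^{1/ν⊥} are assumptions for illustration, not results. The scan over candidate values of ν⊥ then recovers the exponent at which the Euclidean distance between the two rescaled curves is smallest.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def toy_output(p, L, p_c=0.1, nu_true=1.11, k=8.0):
    """Synthetic stand-in for the output of one neuron at lattice size L."""
    return sigmoid(k * (p - p_c) * L ** (1.0 / nu_true))


def collapse_distance(nu, L1=32, L2=80, p_c=0.1):
    """Euclidean distance between the two curves after rescaling with exponent nu.

    The rescaled abscissa is x = (p - p_c) * L**(1/nu), sampled on
    [-0.5, 0.5] with an interval of 0.02.
    """
    xs = [-0.5 + 0.02 * i for i in range(51)]
    total = 0.0
    for x in xs:
        y = toy_output(p_c + x * L1 ** (-1.0 / nu), L1, p_c)
        z = toy_output(p_c + x * L2 ** (-1.0 / nu), L2, p_c)
        total += (y - z) ** 2
    return math.sqrt(total)


candidates = [round(0.91 + 0.01 * i, 2) for i in range(41)]   # 0.91, ..., 1.31
best_nu = min(candidates, key=collapse_distance)
print("nu with the smallest distance:", best_nu)
```

With real network outputs, the curves F_1 and F_2 would first be obtained by sigmoid fits to the rescaled data, as described above, before being evaluated on the common grid of x values.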

Discussion
Since the parameters of the autoencoder are obtained by minimizing the cost function on certain training samples, it is necessary to check its predictive ability. Here, we use the cross-entropy as the cost function, and compare its values at the last training epoch on the training set with those on the validation set. The values for different system sizes are displayed in Table 3. For most sizes the two losses, namely the training loss and the validation loss, are quite close to each other, and their largest difference, around 5%, appears at L = 32. Therefore, the trained network is believed to be applicable to all configurations. Since the training loss reflects the reconstruction ability of the autoencoder, it is also interesting to discuss the relation between training loss and lattice size. First, the training loss rises from L = 16 to L = 48. This is because the correlations between sites of a configuration grow as the lattice size increases, but not enough for the autoencoder to build an appropriate encoder-decoder chain. Once L is larger than 64, the loss begins to fall, which implies that a larger lattice size is needed to reduce the training loss. This rise and fall should reflect the finite-size effect of the system. Note that the current ML is only feasible for systems of small sizes, which certainly brings in finite-size effects. Additionally, the PCPD itself is a very special non-equilibrium phase transition model, in which the diffusion of particles is also affected by the system size. From the perspective of ML, in order to reduce errors further, we can optimize the various parameters of the neural network, expand the training scale and increase the number of test samples. We hope this could help improve the computational accuracy.
We hope to find an available benchmark or a baseline level for the desired predictive accuracy.Here, we calculate the critical value of small-sized PCPD by MC simulations, and the MC results of critical values are shown in Tab. 4. It can be seen from Fig. 11 that in the (1+1)-dimensional PCPD with lattice size L = 80, it seems difficult to find a clear power-law density decay curve near the critical regime.Then we find that the predictions of small-size systems of ML are slightly less accurate than those of large-size systems of MC 32 , although the deviation is not that large.Therefore, we believe that ML can provide some reference for critical value prediction of the PCPD model.

Conclusion
In this paper, we apply unsupervised learning methods (PCA and autoencoder) and supervised learning to the PCPD, a binary-process model of non-equilibrium phase transitions. The main conclusions are as follows.
In the SVD method, $X_{m\times n}$ can be any matrix, which we decompose as

$$X = U \Sigma V^{T},$$

where $U$ is an $m \times m$ square matrix of left singular vectors, $\Sigma$ is an $m \times n$ matrix of singular values, and $V^{T}$ is the transpose of the $n \times n$ matrix of right singular vectors. We express the dimensional change of the matrix as

$$X_{m\times n} = U_{m\times m}\,\Sigma_{m\times n}\,V^{T}_{n\times n}.$$

The right singular vectors satisfy the eigenvalue equation

$$\left(X^{T}X\right)v_{\ell} = \lambda_{\ell}\,v_{\ell},$$

where $v_{\ell}$ is the right singular vector and $\lambda_{\ell}$ the corresponding eigenvalue. Combining this equation and Eq. 8, it is not difficult to prove that the singular values $\sigma_{\ell}$ obey

$$\sigma_{\ell} = \sqrt{\lambda_{\ell}}.$$

In most cases, the sum of the top 10% or even 1% of the singular values accounts for more than 99% of the sum of all singular values. The right singular matrix can be utilized for column compression,

$$X'_{m\times k} = X_{m\times n}\,V_{n\times k}.$$

Comparing this expression with the PCA decomposition $Y = XW$, one finds that the orthogonal matrix $V$ in SVD is exactly the orthogonal matrix $W$ in PCA. Therefore, the method based on the eigenvalue decomposition of the covariance matrix is a particular case of the SVD method, namely the case in which the matrix being decomposed is square. The code implementation of PCA is easy. The PCA routine of the Scikit-learn package in Python is built on SVD, and we can call it with little effort.

use it as a layer to build a deep learning network. With appropriate dimensions and sparsity constraints, the autoencoder can perform better than PCA and other techniques.
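The relation between SVD and PCA stated in the SVD discussion above can be checked numerically. The following is a minimal sketch on random toy data (not PCPD configurations); it assumes the data matrix is mean-centered before decomposition, which PCA requires but which is not spelled out in the text, and it uses only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 5, 2

# Toy data: m samples, n features with clearly different variances.
X = rng.normal(size=(m, n)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)                      # centering (assumption for PCA)

# Route 1: singular value decomposition Xc = U diag(s) Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigen-decomposition of the covariance matrix.
cov = Xc.T @ Xc / (m - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
order = np.argsort(eigvals)[::-1]            # sort to descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column compression with the first k right singular vectors (Y = X W).
Y = Xc @ Vt[:k].T

# Fraction of the summed singular values carried by the top k components.
top_k_fraction = s[:k].sum() / s.sum()
```

The right singular vectors agree with the covariance eigenvectors up to an overall sign per vector, and the squared singular values divided by `m - 1` reproduce the covariance eigenvalues.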
Autoencoders are data-specific, which means they can only compress data similar to what they have been trained on. For example, an autoencoder trained on images of elephants would do poorly at compressing images of flowers, because the features it learns are specific to elephants.
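To build an autoencoder, we need three things: an encoder function, a decoder function, and a loss function. The loss function measures the amount of information lost between the compressed and decompressed data representations. The encoder process from the input layer to the hidden layer is

$$h = f(x),$$

and the decoder process from the hidden layer to the output layer is

$$\hat{x} = g(h),$$

where $f$ and $g$ denote the encoder and decoder functions, $h$ is the hidden representation and $\hat{x}$ is the reconstructed output. Each site $x_{i,j}$ of the PCP and the PCPD configurations takes the value "1" (the site is occupied) or "0" (the site is not occupied). Thus, the mean cross-entropy over all sites and samples is employed as the loss function for the autoencoder,

$$C = -\frac{1}{m}\sum_{\text{samples}}\frac{1}{N}\sum_{i,j}\left[x_{i,j}\ln \hat{x}_{i,j} + \left(1 - x_{i,j}\right)\ln\left(1 - \hat{x}_{i,j}\right)\right],$$

where $m$ is the number of samples used in training, $N$ is the number of sites in a configuration, and $\hat{x}_{i,j}$ is the output of the decoder with $x_{i,j}$ as the input.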
Encoders and decoders are chosen as parametric functions (usually neural networks) such that the loss function is differentiable with respect to their parameters. Thus, by using a stochastic gradient descent algorithm, the autoencoder can optimize the parameters of the encoder and decoder functions.
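The three ingredients named above (encoder, decoder, and a mean cross-entropy loss minimized by gradient descent) can be illustrated with a deliberately small sketch. It is an assumption-laden toy, not the convolutional network of the paper: a single dense sigmoid layer is used for the encoder and for the decoder, plain full-batch gradient descent replaces stochastic gradient descent, the binary "configurations" are random toy data, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_cross_entropy(x, x_hat, eps=1e-12):
    """Cross-entropy averaged over all sites and all samples."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return float(-np.mean(x * np.log(x_hat)
                          + (1.0 - x) * np.log(1.0 - x_hat)))

def train_autoencoder(X, n_hidden=2, lr=2.0, steps=300, seed=0):
    """Dense sigmoid autoencoder trained by full-batch gradient descent.
    Returns the loss recorded at every step."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(0.0, 0.1, (n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n)); b2 = np.zeros(n)
    losses = []
    for _ in range(steps):
        H = sigmoid(X @ W1 + b1)            # encoder: input -> hidden
        X_hat = sigmoid(H @ W2 + b2)        # decoder: hidden -> output
        losses.append(mean_cross_entropy(X, X_hat))
        # Gradients of the mean cross-entropy (sigmoid output layer).
        dZ2 = (X_hat - X) / X.size
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1); b1 -= lr * dZ1.sum(axis=0)
    return losses

# Toy binary data: each sample has its own occupation density.
rng = np.random.default_rng(1)
density = rng.uniform(0.1, 0.9, size=(200, 1))
X = (rng.random((200, 64)) < density).astype(float)

losses = train_autoencoder(X)
```

With an untrained network the reconstruction is close to 0.5 everywhere, so the loss starts near ln 2 and decreases as the parameters are updated.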
The parameters of our convolutional autoencoder network are as follows. Three convolutional pooling layers and one fully connected layer are used in the encoder. In the first convolutional layer, 16 filters are used, the size of the convolution kernel is 2 × 3, and the corresponding stride is 1. The padding is 'same', which keeps the size of the feature map unchanged after the convolution operation. Similarly, the second and third convolutions use 8 filters each. All pooling layers use max-pooling with a filter size of 2 × 2 and a stride of 2. In the decoder, an up-sampling layer replaces each pooling layer of the encoder; it upsamples low-dimensional data to high-dimensional data through a deconvolution filter of size 2 × 2. The sigmoid is used as the activation function after each convolutional operation.

| Hyper-parameter | Value |
| --- | --- |
| learning rate | 0.0001 |
| regularization parameter | 0.001 |
| batch size | 24 |
| hidden layers of CNN | 2 convolutional and pooling layers, 1 fully connected layer |
| hidden units of FCN | 1 or 2 |
| activation function | sigmoid |

Table 5. The tuned hyper-parameters of the autoencoders.
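As a worked example of the layer description above, the feature-map sizes in the encoder can be traced with simple arithmetic. The sketch assumes a 40 × 40 single-channel input (the lattice size and time span quoted in the figure captions), 'same' padding for the convolutions, and unpadded 2 × 2 max-pooling with stride 2; the helper names are hypothetical and no deep learning library is involved.

```python
def conv_same(shape, filters):
    """A stride-1 convolution with 'same' padding keeps height and width
    and sets the number of channels to the number of filters."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, size=2, stride=2):
    """Unpadded max-pooling (assumed) shrinks height and width."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

def trace_encoder(input_shape, filters=(16, 8, 8)):
    """Return the feature-map shape after each convolution + pooling stage."""
    shapes = [input_shape]
    shape = input_shape
    for f in filters:
        shape = max_pool(conv_same(shape, f))
        shapes.append(shape)
    return shapes

shapes = trace_encoder((40, 40, 1))
# Number of features fed into the fully connected layer.
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Under these assumptions the three stages reduce a 40 × 40 configuration to 20 × 20, 10 × 10 and 5 × 5 feature maps, so the fully connected layer maps 5 × 5 × 8 = 200 features onto the one or two latent units.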
The hyper-parameters used in the autoencoder, obtained by minimizing the mean cross-entropy, are listed in Table 5. We chose this architecture because we have successfully used it in our previous paper 25 to calculate the critical points of another non-equilibrium phase transition model, the directed percolation (DP).
The enormous potential of unsupervised learning has recently made it popular in scientific research. In statistical physics, unsupervised ML algorithms can handle systems with very large amounts of data well. In the study of phase transitions in particular, Monte Carlo simulations can generate configuration data of equilibrium or non-equilibrium phase transition models, and we can use unsupervised ML methods to capture the underlying information of the original data. Therefore, the powerful data-processing ability of unsupervised learning will bring new vitality to research in statistical physics, where it is currently a frontier research topic.

Figure 1. (1+1)-dimensional PCP and PCPD starting with a fully occupied lattice at criticality. The system size is L = 500, and the time step is 500.

Fig. 2(b) presents the relationship between the first principal component and the particle density. As can be seen from Fig. 2(b), the configurations corresponding to the same annihilation probability are arranged in close proximity.

Figure 4. The top panels are configurations of the (1+1)-dimensional PCP, where a is the raw configuration with lattice size N = 40 and time step t = 40, and b is the pair-particle configuration of the lattice defined in a. c and d, autoencoder results of the (1+1)-dimensional PCP: c gives the learning of the configurations defined in a, and d gives the learning of the configurations defined in b.

Figure 5. Autoencoder results of the (1+1)-dimensional PCP. a and b, encoding of the raw PCP configurations onto the plane of the two hidden neuron activations ($h_1$, $h_2$). c and d, encoding of the raw PCP configurations, using a single hidden neuron activation (Latent) as a function of the annihilation probability. As seen, the critical threshold approximates the estimate by Monte Carlo simulations.

Figure 6. The top panels are configurations of the (1+1)-dimensional PCPD, where the diffusion rate is D = 0.05, the lattice size N = 40 and the time step t = 40. a-c represent raw, pair-particle and single-particle configurations, respectively. d-f, autoencoder learning results of the (1+1)-dimensional PCPD: d corresponds to the learning of raw configurations, e to the learning of pair-particle configurations, and f to the learning of single-particle configurations.

Figure 9. Supervised learning results of the (1+1)-dimensional PCP and PCPD by FCN. a, the output layer, averaged over a test set, as a function of the bond probability p. b, data collapse of the average output layer as a function of $(p - p_c(D))L^{1/\nu_\perp}$. System sizes of L = 16, 32, 48, 64 and 80 are represented by different colors. a and b are the results for the PCP; c-d, e-f and g-h correspond to the results for the PCPD with diffusion rates D = 0.1, 0.2 and 0.5, respectively.

Figure 10. a-c are the results of data collapse with different values of $\nu_\perp$ in the PCPD, where the diffusion rate is D = 0.1.

Table 3. The training and validation losses of the autoencoder for the (1+1)-dimensional PCPD with diffusion rate 0.1, where each annihilation probability corresponds to 2000 samples in the training set and 200 samples in the validation set.

Table 4. Comparison of the critical point $p_c$ of the (1+1)-dimensional PCPD at diffusion rate D = 0.1 between ML and MC simulations with different lattice sizes. The theoretical value is $p_c$ = 0.10439(1) 32.

Autoencoder results of the (1+1)-dimensional PCPD (D = 0.05). a-e, encoding of the raw PCPD configurations, using a single hidden neuron activation (Latent) as a function of the annihilation probability. f, finite-size scaling, where L = 32, 48, 64, 80, 96.

Table 1. ML and MC simulation results of the spatial correlation exponent $\nu_\perp$ for different diffusion rates D.

Table 2. The Euclidean distance between the two sigmoid curves as a function of the spatial correlation exponent $\nu_\perp$.