Superdiffusion-like behavior in zero-temperature coarsening of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d=3$$\end{document}d=3 Ising model

One key aspect of coarsening following a quench below the critical temperature is domain growth. For the non-conserved Ising model a power-law growth of domains of like spins with exponent \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha = 1/2$$\end{document}α=1/2 is predicted. Including recent work, it was not possible to clearly observe this growth law in the special case of a zero-temperature quench in the three-dimensional model. Instead a slower growth with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha <1/2$$\end{document}α<1/2 was reported. We attempt to clarify this discrepancy by running large-scale Monte Carlo simulations on simple-cubic lattices with linear lattice sizes up to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L=2048$$\end{document}L=2048 employing an efficient GPU implementation. Indeed, at late times we measure domain sizes compatible with the expected growth law—but surprisingly, at still later times domains even grow superdiffusively, i.e., with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha > 1/2$$\end{document}α>1/2. We argue that this new problem is possibly caused by sponge-like structures emerging at early times.


INTRODUCTION
To quantify the kinetics of coarsening processes, i.e., their time evolution from a disordered to the preferred equilibrium state at low temperature, is of major interest in many physical systems [1,2].The studied systems have become more and more complex over the last decades ranging from investigations of interface growth [3,4] over systems with long-range interactions [5,6] to the application of the methods to the study of the collapse dynamics of polymers [7,8].Of technical relevance is this process for example in the fabrication of glasses [9].
The theory based on deterministic continuum models, predicts for -dimensional systems with short-range interactions and non-conserved () models for all quench temperatures  below the critical temperature   a power-law growth of the characteristic length scale of the coarsening domain patterns, with  = 1/2 for all systems with  >  or  > 2 [1].This was confirmed in numerous simulation studies of quenches to  ̸ = 0 <   for such models [1,2,10].Also in experiments of the ordering kinetics in Cu 3 Au [11] and at the isotropic-to-cholesteric liquid crystal transition [12], a value close to  = 1/2 was reported.For the special case of a quench to  = 0 in the  = 2 Ising model a power law with growth exponent  = 1/2 is observed as well [13,14].Somewhat as a surprise, for a long time, numerical simulations of the coarsening in the  = 3 Ising model when quenched to zero temperature only reported anomalously small values of  < 1/2 [15], even though many numerical studies were conducted studying the properties of this system [13,14,[16][17][18][19][20].Often reported are values of  ≈ 1/3 [13,14,16] when using system sizes of up to  = 240.This lower exponent has been attempted to be explained in various ways.One such attempt targeted on finding arguments and physical explanations for this are shown in the bottom panels as three-dimensional visualizations highlighting the interfaces between domains.The color (red-white-blue) indicates the distance from the center of the subsection.For details on the visualizations see Supplementary Section III and for snapshots at more times see Supplementary Fig. 5.
phenomenon through the fact that the initial  = ∞ structure does percolate in three dimensions but not in two dimensions [14].Nonetheless, direct simulations of the continuous and deterministic time-dependent Ginzburg-Landau equation for big systems have provided the correct value of  = 1/2 [21].In recent work [22][23][24] the three-dimensional problem was tackled again by simulating this process using very big simple-cubic lattices with linear size up to  = 750, from which the authors conjectured a crossover to  = 1/2 at late times.In an attempt to solve this long-standing puzzle, we performed Monte Carlo (MC) simulations of much larger systems with up to  = 2048 (using periodic boundary conditions), corresponding to more than 8 billion spins by employing a memory and time efficient GPU implementation.Our implementation is adapted from a publicly available code [25] that uses a checkerboard decomposition of the system.

RESULTS AND DISCUSSION
In Fig. 1 we present visualizations of the lattice configuration of an exemplary simulation run for  = 2048 at times  = 10 4 , 6 × 10 4 , and 3 × 10 5 in units of MC sweeps (MCS).The top row shows plane cuts of the configuration allowing for an easy comparison with the well-known smooth behavior in  = 2 (see, e.g., Fig. 2 in Ref. [1]).Three-dimensional representations of domain interfaces are shown in the bottom panel.For early times (left panels), one observes a roughening of the domain boundaries as reported several times earlier for zero-temperature quenches in  = 3 spatial dimensions.This is clearly in violation of the arguments used to derive  = 1/2 where a diffusive domain-curvature minimization is assumed, so that here another effective growth exponent is to be expected.Contrasting, at intermediate and even more so at late times (middle and right panels) the domains appear much smoother and diffusion-like growth might be anticipated.However, as domains inside domains are a prominent feature in earlier snapshots but not as much at late times, during the coarsening process annihilation of these domains has to take place.We conjecture that this annihilation is an additional contribution to the domain growth.
To quantify these observations, we measure the twopoint equal-time correlation function where ⟨•⟩ denotes the average over initial conditions and independent trajectories.With increasing order of the system, one expects the correlation function to decay slower, i.e., for late times the correlation function should correspondingly indicate a stronger correlation.Demonstration of this is shown in Fig. 2(a) for the times mentioned in the key for  = 2048 and  = 0.All data was obtained by starting from random spin configurations (with magnetization  ≈ 0) and averaging over 40 independent realizations (we use the same number of independent realizations for each system size).Note, that previous work [26] indicates that similar results are to be expected from finite starting temperatures.(, ) is expected to follow dynamical scaling, i.e.

𝐶(𝑟
This is self-consistently tested by extracting ℓ() from the intersection of (, ) with a constant value of  = 0.5.
(For the effect of different choices of , see Supplementary Discussion III.) Subsequently we plot (, ) versus /ℓ() in Fig. 2(b).We note that especially at large distances  the data collapse is not optimal.This may be an indication of a number of things, e.g., the occurrence of finite-size effects.Another indicative explanation is that the growth exponent  is not yet a constant but effectively varies with .
Additionally, one looks at the structure factor (, ) which is the Fourier transform of the correlation function.This quantity, similarly to the correlation function, collapses when properly rescaled, i.e., by plotting  (, )ℓ() −3 versus ℓ() as shown in Fig. 2(c).The data for different sizes collapses quite well and shows a clear power-law decay with Porod's exponent  + 1 = 4.
Finally, we present the characteristic length ℓ() versus  on a log-log scale in Fig. 3(a) for system sizes  = 128, 256, 512, 1024, 1536, 2048.Initially, ℓ() grows independently of the system size  as individual domains can grow unrestrictedly.It is only when ℓ() reaches a value of the order of magnitude of  that domain growth is hindered by the finite nature of the lattice which shows itself in the form of finite-size effects that end in a stagnation of the growth for that system size.The solid line shows the asymptotic growth law ℓ() ∼  1/2 as predicted, where one clearly sees that this is not parallel to the data for late times.To get a more detailed impression of this, we show in Fig. 3(b) the instantaneous exponent that is, the local slopes in the top panel of Fig. 3(a).At very early times ℓ() grows like   with   compatible with 0.35 − 0.40.When only considering lattice sizes up to  = 256, then only this behavior can be seen as was the case in Refs.[13,14,16].Reference [27] observed  0.43 using a lattice size of  = 512, which is in good agreement with our measurement for this size.For  = 1024 we observe   ≈ 1/2 for a short time, and for  = 1536 and  = 2048 we observe an exponent   > 1/2, which is completely unexpected from existing simulations and theory.As the two largest system sizes show the   () > 1/2 signal before the onset of finite-size effects in either system size, we conclude that this signal should persist at these times ( i.e.,  ∈ [2 × 10 4 , 7 × 10 4 ] ) as  → ∞.We conjecture that the aforementioned contribution to the domain growth from the annihilation of domains inside domains may be the cause of this superdiffusive behavior with   > 1/2.
To assure that this behavior is not an artifact from the concurrent spin update caused by the checkerboard decomposition of the system, we repeated our measurements of the domain size ℓ() with a number of different update algorithms including an efficient -fold way simulation [28] for system sizes up to  = 1536 and found agreement within error bars; see Supplementary Discussion II for a comparison of the results from the algorithms and Supplementary Methods I for a discussion of their implementations.
By studying significantly larger system sizes than available in the literature, we thus discover yet another twist in the coarsening story of the three-dimensional Ising model at zero temperature.We find strong evidence for   () at least pre-asymptotically taking values significantly larger than 1/2 which is in conflict with previous numerical conjectures that  = 1/2 using smaller systems [23,24]; thus again challenging our understanding of the dynamics in this simple model.(The maximal value obtained for   exceeds 1/2 by four [three] times the standard error for  = 2048 [ = 1536].)The structure of the domains has been described as sponge-like [20] or fractal [27].Anomalous diffusion, including both suband superdiffusion, is a well known phenomenon on fractal structures [29].Hence, we believe that the peculiar structure of domains found in this coarsening problem is both the reason for the early time behavior with   < 1/2 and the late-time stage with   > 1/2.It is nonetheless possible, that we recover  = 1/2 in the thermodynamic limit, that is in the double limit of  → ∞ and  → ∞.
To test our intuition that the sponge-like behavior is responsible for this superdiffusive growth, we carry out one further test.We replace the initial high-temperature random configuration by an artificial sponge structure  to probe its effect on the dynamics.As a prototypical sponge structure we use Menger sponges [30], the three-dimensional generalization of Sierpinski carpets [see Fig. 4(a)].The starting configuration [see Fig. 4(b)] of dimensions  3 is created by repeating sponges of  th iteration of size (3  ) 3 .For each sponge we pick uniformly at random whether the Menger structure is represented by up or down spins.
We carry out a zero-temperature quench on these structures in the same manner as before but using the fold way update [28] (see Supplementary Methods I C for more detail) instead as this choice avoids potential interference of the Menger sponge structure with the structure of the checkerboard decomposition.From this we obtain the characteristic length scale ℓ() presented in Fig. 4(c).Clearly, the significant differences between  = 0 (corresponding to our case from before, i.e., a quench from  = ∞) and higher-order fractals with  ̸ = 0 become more pronounced the larger .We note two key effects: On the one hand, at early times the dynamics for  ̸ = 0 is much slower than in the  = 0 case and on the other hand, at later times it becomes much faster than the original dynamics and clearly exceeds a growth governed by ∝  1/2 .From this we learn that indeed sponge-like structures can cause anomalously slow early dynamics followed by superdiffusive growth at later times which is reminiscent of our observation when quenching the threedimensional Ising model from infinite to zero temperature.
To conclude, we have simulated zero-temperature coarsening of the three-dimensional Ising model with nearest-neighbour interactions.For this model, the growth exponent of the characteristic length scale is predicted to be 1/2, whereas most simulations previously suggested a smaller exponent ≈ 1/3.Using a highly efficient GPU implementation, we simulate this process and are able to go to linear system sizes of  = 2048, i.e., over 8 billion spins.This allows us to monitor late times which previously were not accessible and we discover a previously unknown superdiffusive growth behavior which we attribute to the annihilation of sponge-like structures emerging at early times.
Although we expect 1/2 for the growth exponent in the long-time limit, we cannot fully verify this expectation.This is due to the presence of pre-asymptotic effects at late times even for very large systems.Based on preliminary investigations (not presented here) we are confident that we may get access to the necessary hardware to study even larger systems, i.e.,  = 4096, in the near future.Additionally, very recent work [31] reported on the anomalously slow growth prevailing even for quenches to  > 0 as long as the quench temperature is well below the roughening transition temperature   .We will investigate in future work whether also the superdiffusion-like behavior is seen at these temperatures.
The idea of making use of GPUs for nonequilibrium investigations using MC simulations is expected to spark investigations of bigger systems also for related spin models.While GPU implementations of MC simulations of spin models have been used and studied in the equilibrium context for several years [32][33][34][35][36][37][38][39], the potential of application of this approach to nonequilibrium simulations has not been fully realized.One possible explanation for this neglect in nonequilibrium studies is the necessary reliance on checkerboard decomposition to speed up the simulations on the GPU, which some may suspect to introduce artifacts in the dynamics of the simulation.However, with our work we demonstrate that this fear appears to be unfounded and various GPU update methods are indeed suitable for nonequilibrium studies.

Spin updates
In the following we discuss the checkerboard update as an alternative to the random-site-flip update.For details on further alternative update methods, we refer to Supplementary Methods I.

Random-site-flip update
The random-site-flip (rsf) update is the most straightforward method to perform MC simulations and to study coarsening in the Ising model.In each MC step one chooses a site  at random and proposes to flip the spin   ∈ {−1, +1}.Based on the change in energy ∆ attributed to the proposed move, in general for non-zero temperature  it is accepted with the Glauber acceptance probability [40]  acc (∆, where the Boltzmann constant   usually is set to unity to fix units.In the limit  → 0 this simplifies to which is the acceptance probability we use throughout. =  3 such MC steps are referred to as one MC sweep (MCS), where  is the linear lattice size.
Clearly, this approach is rather inefficient as i) a significant amount of computing resources is wasted on proposing moves with ∆ > 0 [20], which always are rejected, and ii) because it is inherently sequential making it hard to parallelize the algorithm.

Checkerboard update
The key idea of many domain-decomposition spin update algorithms such as the checkerboard update is that with local, i.e., short-range, interactions only, the lattice can be decomposed into sub-lattices such that spins of the same group do not interact with one another.
In the case of the two-dimensional square lattice with only nearest-neighbor interactions one of the simplest such decompositions looks like a checkerboard (see Fig. 5), hence the name of the method.One MC sweep then consists of i) updating all red spins concurrently with /2 parallel threads, followed by ii) updating all blue spins concurrently with /2 parallel threads.Equivalently, one may choose to update all blue spins first (see Supplementary Methods I B for more detail).Each proposed spin flip is accepted with the same probability as before [see Eq. ( 6)] and the only difference as compared to the rsf update is the order in which updates are proposed.The generalization to  = 3 is conceptually straightforward, see the right panel of Fig. 5.
This update scheme is particularly suited for an implementation on graphics processing units (GPUs).GPUs have several thousand threads which can be used to update the independent spins in parallel.Our implementation in CUDA for this update scheme is based on the code from Ref. [25] although heavily adapted as the authors optimized their code for > 1000 simultaneous simulations of small systems on a single GPU.In contrast, we simulate a single large system per GPU.Additionally, Ref. [25] considers the two-dimensional Ising model.Hence, the respective parts of the code have been modified accordingly.

Calculating the correlation function
Naïve calculation of the correlation function defined in Eq. ( 2), i.e., involves a double summation over all spins requiring ( 2 ) time.Using a Fast Fourier Transform (FFT) allows the calculation of ⟨   + ⟩ in ( log  ), i.e., where ℱ is the three-dimensional discrete Fourier transformation operator and the overline stands for an average over , exploiting the translational invariance.(, ) shown in the main text is then obtained by radially averaging over the three-dimensional correlation matrix.
In standard FFT routines two double values are used per (spin) site both for input and output, requiring thus 4 × 8 bytes per spin.For  = 2048 this amounts to 4 × 8 × 2048 3 = 2 38 bytes = 256 GB RAM necessary to carry out the FFT which on modern CPU computing nodes is possible but still quite restrictive as it limits the number of simulations which can be run in parallel on the same node.We use the FFTW library [41] which allows for in-place calculation such that the memory footprint is cut in half.Further, spin variables only take the values −1 and +1, and |ℱ  | 2 in Eq. ( 7) only real values.Hence, the realdata discrete Fourier transform (DFT) routine can be used for both transforms which reduces the used memory by another factor of two by using that the resulting DFTs are Hermitian.This allows the input to be stored in  double values and the output to be stored in /2 complex (= two doubles) values.When using in-place real-data DFT, this brings the memory footprint down to eight bytes per site such that for the FFT of one 2048 3 lattice only 64 GB RAM are necessary which are readily available on our compute nodes.However, already for the next bigger system size, i.e.,  = 4096, we can no longer compute the FFT in the same manner as 64×8 = 512 GB RAM are not available to us.
Additionally, FFTW supports multi-threading such that we can speed up the calculation by a factor of about 10 compared to the sequential algorithm.
We study coarsening, a nonequilibrium process, in the three-dimensional Ising model with non-conserved order parameter by means of Monte Carlo (MC) simulations.As it is not uniquely specified how to perform spin updates, there are several ways to realize the underlying Markov chain.In the case of nonequilibrium investigations clearly one cannot take advantage of non-local updates such as the Wolff cluster algorithm.Nonetheless, for spin models and local updates, the way in which the single-spin updates are proposed is not stipulated.Instead, one has a choice to adapt this, e.g., to speed up simulations.In the following we present different approaches to implement spin updates alternative to the random-site-flip update and checkerboard update which are discussed in the main text.At the end of this section we will check numerically that all the considered approaches produce dynamics that are compatible (within error bars) up to a rescaling of time by a constant factor.

A. Double-checkerboard update
Weigel [1] introduced a slightly more involved domaindecomposition scheme which aims at being more efficient on GPUs than the standard checkerboard approach.On GPUs, threads are organized in so-called thread blocks that can synchronize locally whereas threads from different blocks essentially cannot communicate within one execution of the GPU kernel.For the performance of the algorithm it is beneficial to use the much faster shared memory of these blocks rather than the global memory available to all blocks.However, the shared memory is too small to even fit systems of moderate size.Thus, the idea of the double-checkerboard decomposition is to divide the system up into a coarse checkerboard (red and blue blocks in Fig. 1) of small sub-systems which still can fit in the shared memory.These can then be updated by a thread block using the standard checkerboard (lighter and darker sites in Fig. 1).

FIG. 1. Example double-checkerboard decomposition in
One MC sweep using the double-checkerboard scheme based on the decomposition depicted in Fig. 1 then consists of the following steps: 1. Start GPU kernel on red sublattice.Our implementation is mostly based on the publicly available code of Ref. [1].Similar to checkerboard discussed in the main text, the generalization to three spatial dimensions was performed.

B. Variants of the single-and double-checkerboard update
A priori it is not immediately clear that any of the described domain decomposition updates should exhibit arXiv:2305.12161v1[cond-mat.stat-mech]20 May 2023 the same kind of coarsening dynamics as the rsf update.In fact, there are at least three problems we are aware of regarding this type of update: i) In contrast to rsf, no spin is proposed to be flipped more than once within a sweep, ii) the violation of detailed balance, and iii) extreme non-ergodicity at infinite temperature.
Regarding problem i), using the rsf update the chance for every spin in a system of  spins to be proposed to be flipped exactly once during a sweep is  !/  which approaches √ 2  − for large  .For a 2 3 system this is approximately 0.2% and already for 3 3 around 2 × 10 −11 .Thus, for the system sizes we study, it is nearly impossible not to have a repetition.Through generalization of the birthday problem [2] we expect a repetition after ( √  ) proposals, that is ( √  ) repetitions per sweep.
Problem ii) can in principle be circumvented.Potter and Swendsen [3] previously reported a violation of detailed balance on a broad set of update algorithms such as the sequential (typewriter) update and the checkerboard update.While in the context of nonequilibrium detailed balance is not a really meaningful concept, it led us to wonder whether the checkerboard dynamics that violates detailed balance might be different from the rsf dynamics satisfying detailed balance.Reference [3] further introduces variants of the algorithms which they prove do satisfy detailed balance.In the standard checkerboard algorithm the two sublattices are always updated in the same order, which they refer to as the 01 cycle.They claim that through a simple change, namely choosing at random which sublattice to update first, detailed balance can be reestablished.Note that still every spin is proposed to flip exactly once per sweep.This is referred to as mixed cycle.For the double-checkerboard the order of the coarse checkerboards (red or blue first) is chosen at random once per sweep and the order of the fine checkerboard (lighter or darker sites first) is chosen at random and independently for each (red or blue) block and once per sweep.Reference [3] proves that detailed balance is reestablished in this way in a general setting that entails many domain decomposition schemes including the double-checkerboard although not explicitly mentioned.
Lastly, at infinite temperature and using the Metropolis acceptance criterion  acc (∆,  ) = min{1,  −Δ/   } all spin flip proposals are always accepted leading to problem iii).Consequentially, when every spin is proposed to flip once per sweep then one sweep consists of inverting the lattice.Hence, the system just switches between two states.Technically, this problem does only occur at infinite temperature but nonetheless causes large autocorrelation times at large finite temperatures.Reference [4] comments on this issue but using the typewriter update instead of the checkerboard update.There spins are also proposed once per sweep (but in lexicographical order), and hence suffering the same fate as the checkerboard update at infinite temperature.

C. 𝑛-fold way update
At low temperatures, the problem of high rejection rates is well known (see, e.g., ch.3.4.2 in Ref. [5]).Particularly in the case of  = 0, many proposed spin flips will be rejected with 100% certainty.
Already in the 1970s this problem was tackled by Bortz et al. [6] with the introduction of the rejectionfree -fold way update algorithm.They recognized that instead of selecting each site with the same probability (as is the case in rsf) the chance of a site to be selected can be weighted by the acceptance probability of a spin flip which is then always accepted.This can be implemented very efficiently for systems with discrete spin variables and local interactions only by organizing spins in  spin classes of equal acceptance probability.In our case  = 2 × (6 + 1), as each site can take two values and the sum of its six nearest neighbor's spin values can take seven different values.To have a unit of time in MCS and not the number of successful flips one increments time after every flip by a geometrically distributed random variable with mean / ∑︀    ([∆]  ) where   (∆) being the usual acceptance probability if rsf were used [7].
One way of improving the performance of -fold way is to use continuous time steps instead of geometrically distributed ones, that is to increment time by the expected time.The difference between summing over many geometrically distributed variables and summing over their means is negligible on the time scales considered here which motivates this simplification.
The -fold way update has been used to study coarsening before, see for example Refs.[6,8,9].In the case of the  = 0 coarsening in  = 3, Olejarz et al. [10] previously have used a very similar approach.Instead of complete rejection-free updates they pick at random from the set of allowed spin flips, i.e., they randomly propose to flip a spin from the set {  |[∆]  ≤ 0}.If ∆ < 0 the move is immediately accepted and otherwise if ∆ = 0, the move is accepted with 50% probability.In both plots the color of the line encodes the used method and the line type and symbol the system size.Note that for all methods (except rsf and -fold) time is by a constant factor for better comparison (cf.Table I).For sake of readability, only every fifth data point is shown and lines are a guide to the eye connecting both shown and not shown points by straight lines.Error bars correspond to the standard error.

II. SUPPLEMENTARY DISCUSSION: NUMERICAL COMPARISON OF ALTERNATIVE SPIN UPDATES
In the following we present data obtained through the various different spin update schemes outlined above, where for the double-checkerboard we have tested two different cycles one of which preserves detailed balance.Every dataset is averaged over at least 40 independent realizations.The autocorrelation time is sensitive to the type of spin update employed.Similarly, the time scale in coarsening is affected by the chosen update.We used the dynamics from the rsf update as a base and then rescaled all other observation times by a factor specific to the used method resulting in a rescaled time scale , see Table I for the factors.In particular all simulations show a growth exponent as large as ≃ 0.55 at late times.Using the -fold way update, only linear system sizes of up to 1536 were accessible to us, which also display the anomalous growth exponent larger than 0.5.Hence, we can rule out that the larger exponent is a mere effect from the parallelized GPU update algorithms.Furthermore, this shows that the observation is in fact relevant in the thermodynamic limit and not just a finite-size effect.To check our fold way update implementation for correctness we used the rsf with which systems of linear size up to about 256 are (easily) accessible.Note that due to the significant computing time required to simulate systems as large as  = 1536 3 on CPUs we refrained from simulating past the onset of finite-size effects which is why the  = 1536 line ends earlier than the other data sets displayed.
Finally, we run 1000 simulations with system size  = 256 once using the (01-cycle) checkerboard update and once using the -fold way update.The first method is the one we used to obtain the data presented in the main text and the latter is known to obey the same dynamics as rsf Glauber dynamics.In Fig. 3 we present a histogram of the measured domain sizes at a rescaled time 4000 MCS (well before the onset of finite-size effects).The histograms appear to be in good agreement.Note that for all intercepts, the length scale is rescaled by a constant factor for better comparison (see Table II).For sake of readability, only every fifth data point is shown and lines are a guide to the eye connecting both shown and not shown points by straight lines.Error bars correspond to the standard error.

III. SUPPLEMENTARY DISCUSSION: COMPARISON OF DIFFERENT INTERCEPT VALUES AND EFFECT ON THE CALCULATED DOMAIN SIZE
In the following we repeat the measurement of the characteristic length scale ℓ() and vary the chosen intercept  with (, ).ℓ() is then the distance  at which (, ) intercepts .We use the (01-cycle) checkerboard update (same as in the main text).
Figure 4 shows the different results for (a) ℓ() and (b)   ().The characteristic length scales ℓ() for the various intercepts differ only by a constant factor and exhibit approximately the same   ().To better show this, ℓ() is rescaled by this constant (independent of time and system size) for each intercept.Note, that this constant does not affect the obtained value for   .The length scales obtained in this way are compatible with each other, thus confirming that the exact choice of  is of little importance.Most importantly, note that all datasets for  > 1 024 and any intercept  show   > 0.5 at late times, thus confirming our observations in the main text.The times  in Fig. 4 correspond to the not rescaled times.Hence, the data can readily be compared to Fig. 3 in the main text but times differ by a factor of 1.75 to the ones in Fig. 2 in this supplement.The factors by which the estimates for ℓ from each intercept were multiplied such that the data sets collapse are compiled in Table II.Further, Table II shows the length scale ℓ * at which ℓ() would saturate without rescaling.

IV. THREE-DIMENSIONAL VISUALIZATION
In Fig. 5 we show snapshots for different times in the evolution of the spin configuration from a single -fold way simulation run with  = 1536.In this case we chose the -fold way update instead of the checkerboard in an attempt to check whether any (obvious) structural differences can be seen.We make use of the software Visual Molecular Dynamics (VMD) [11] to illustrate the threedimensional spin configurations.In the cubic representation (left column) a spin is represented either by a cube or the lack thereof.To make the plotting tractable, we choose to represent the minority direction as a cube (such that we always have to display less than /2 cubes).This representation directly shows the shape of typically encountered domains, but most of the inside of the lattice is hidden.For a better illustration of the inside of the lattice, in the interface representation (center column) we only draw lines connecting the spins at the domain boundaries.The color of the interfaces represents the distance from the center of the lattice (this is artificial, since periodic boundary conditions apply), where we employ a hot (red, center) to cold (blue, border) color palette.This allows for a deeper look inside the lattice and highlights the nature of the domain boundaries, in particular the sponge-like structure.Finally, we show two-dimensional cuts through the lattice (right column).

𝑡 = 10 4 FIG. 1 .
FIG. 1. Visualization of the three-dimensional Ising configurations at different times of the quench.Crosssections (top) and three-dimensional snapshots (bottom) of the spin configuration for  = 2048 at various times in units of Monte Carlo sweeps (MCS).The cross-sections in the top panel are cuts through the full lattice, where down-spins (the minority direction in this realization) are colored green.The marked square subsections of linear extension  ≃ 10 ℓ() are shown in the bottom panels as three-dimensional visualizations highlighting the interfaces between domains.The color (red-white-blue) indicates the distance from the center of the subsection.For details on the visualizations see Supplementary Section III and for snapshots at more times see Supplementary Fig.5.

FIG. 3 .
FIG. 3. Length scale as a function of time and instantaneous growth exponent.(a) Length scale ℓ() for quenches to  = 0 in  = 3 spatial dimensions with linear size  = 128, . . ., 2048 on a log-log scale.The black solid line shows the expected power-law behavior of ℓ() ∼  1/2 .(b) Instantaneous exponent  is shown against  for the same data.Here the -axis is logarithmic to highlight the large  behavior where  takes values consistently larger than 1/2.Error bars correspond to the standard error.

FIG. 4 .
FIG. 4. Results for zero-temperature coarsening using an artificial sponge structure as starting configuration.(a) Third iteration ( = 3) Menger sponge of size 27 3 .(b) 128 3 initial Ising configuration consisting of third iteration Menger sponges, red (blue) corresponding to up (down) spins.(c) Length scale ℓ() of a quench to  = 0 starting from such an artificial configuration using Menger sponges of  th iteration and using  = 512.Error bars correspond to the standard error.

FIG. 5 .
FIG. 5. Checkerboard decomposition.All red (blue) sites can be updated simultaneously as they only depend on blue (red) sites.(a) Checkerboard decomposition in two spatial dimensions.(b) Generalization to three spatial dimensions.

7 .
Writing changes to global memory.8. Start GPU kernel on blue sublattice and repeat steps 2 to 7.

FIG. 2 .
FIG. 2. (a)Length scale ℓ() and (b) instantaneous growth exponent () for various spin update methods obtained for quenches to  = 0.In both plots the color of the line encodes the used method and the line type and symbol the system size.Note that for all methods (except rsf and -fold) time is by a constant factor for better comparison (cf.TableI).For sake of readability, only every fifth data point is shown and lines are a guide to the eye connecting both shown and not shown points by straight lines.Error bars correspond to the standard error.

FIG. 3 .
FIG.3.Empirically determined domain-size histogram for  = 256 at rescaled time  = 4000 MCS using the -fold way update and the (01-cycle) checkerboard decomposition.Data was obtained through 1000 independent runs for each method.Dashed vertical lines show the mean value of the two histograms, that is 19.8 for -fold way and 19.9 for the checkerboard update.

Figure 2 (
Figure 2(a) shows the domain size and (b) the instantaneous growth exponent as a function of (rescaled) time .It can be seen that within error bars all the methods produce compatible results.

FIG. 4 .
FIG. 4. (a) Length scale ℓ() and (b) instantaneous growth exponent () for various choices of the correlation-function intercept .In both plots the color of the line encodes the used intercept and the line type and symbol the system size.Note that for all intercepts, the length scale is rescaled by a constant factor for better comparison (see TableII).For sake of readability, only every fifth data point is shown and lines are a guide to the eye connecting both shown and not shown points by straight lines.Error bars correspond to the standard error.

𝑡 = 6 . 4 × 10 4 𝑡 = 1 . 4 × 10 5 𝑡 = 1 . 8 × 10 5 𝑡 = 5 . 8 × 10 5 a
Instead of the full 1536 3 snapshot we show a 1024 3 subsection for the earliest time.This is on the one hand because otherwise the small features are hard to see and on the other hand because VMD appears to fail to handle the large number of "atoms" present in these early-time configurations.The cross-section shown in the third column has length  = 1536 and the 1024 square highlights the region visible in the 3D representations.

FIG. 5 .
FIG.5.Visualizations of snapshots obtained during a quench using  = 1536.The plane cut is a horizontal slice from the three-dimensional structure, which for  = 5.8 × 10 5 is highlighted in the three-dimensional representations as the blue planes.

TABLE I .
Factors by which simulation times in Fig.2were multiplied to compare them.A higher number implies quicker dynamics.

TABLE II .
Rescaling factors and saturation lengths for different intercepts.The middle colum shows the factors by which the length scales in Fig.4were multiplied to compare them.The right column shows the length ℓ * at which the not rescaled ℓ() estimate saturates.