Water clusters and density fluctuations in liquid water based on extended hierarchical clustering methods

The microscopic structures of liquid water at ambient temperatures remain a hot debate, which relates with structural and density fluctuations in the hydrogen bond network. Here, we use molecular dynamics simulations of liquid water to study the properties of three-dimensional cage-like water clusters, which we investigate using extended graph-based hierarchical clustering methods. The water clusters can cover over 95% of hydrogen bond network, among which some clusters maximally encompass thousands of molecules extending beyond 3.0 nm. The clusters imply fractal behaviors forming percolating networks and the morphologies of small and large clusters show different scaling rules. The local favored clusters and the preferred connections between adjacent clusters correspond to lower energy and conformational entropy depending on cluster topologies. Temperature can destroy large clusters into small ones. We show further that the interior of clusters favors high-density patches. The water molecules in the small clusters, inside which are the void regarded as hydrophobic objects, have a preference for being more tetrahedral. Our results highlight the properties and changes of water clusters as the fundamental building blocks of hydrogen bond networks. In addition, the water clusters can elucidate structural and density fluctuations on different length scales in liquid water.

www.nature.com/scientificreports/ Liquid water is a dynamical mixture of tetrahedral-like and "ring-and-chain"-like structures with a slight bias toward the former 19,28 , forming three-dimensional hydrogen bond network continually undergoing topological reformation 3 . Water clusters, an assembly of weakly-bound water molecules, may be formed in hydrogen-bonded networks by locally favored structures with local energy minima 29,30 . However, the microscopic structures of liquid water have remained a hot debate despite a lot of intensive research 15 . Graph theory is widely used to define topological structures as the building blocks of hydrogen-bonded networks [31][32][33][34][35][36][37] . Many investigations have introduced the definitions of rings 33 , fragments 34 and clusters 35 . The rings, the cyclic paths along hydrogen bonds, show a broad distribution, among which 6-membered rings are most favored at ambient temperatures [31][32][33]38 . Fragments, three-dimensional cage-like building blocks constituted by rings, can elucidate the heterogeneity of hydrogen bond structures and rearrangements 34 . Topologically, water clusters also comprise small cyclic water clusters such as tetramers, pentamers and hexamers 39,40 . The topology of water clusters strongly affects the structure and chemistry of clusters 41,42 . The composite patterns of how flickering water clusters form bulk water remain a mystery despite several explorations [43][44][45][46] . A hierarchical clustering method 47 is proposed to search out hierarchical water clusters in the hydrogen bond network. Hydrogen bonds, rings and fragments are defined as 1st-, 2nd-, and 3rd-level structures, respectively. Water clusters can be successively attained by using Louvain algorithm based on the network of fragments.
The local topological structures are closely related with density heterogeneity in liquid water at microscopic scale 48 . Many special structural patterns have been proposed to describe water structures and explain many anomalous properties [44][45][46][48][49][50][51][52] . Based on hydrogen bonds, fluctuating water clusters containing 280 water molecules with local icosahedral symmetry were proposed by Chaplin 46 . The networks contain a mixture of hexamer and pentamer substructures and convert between lower-and higher-density forms without breaking hydrogen bonds. A pair of liquid structures, molecular chains as high-density liquid structures and fused dodecahedra as tetrahedral fluctuations, are defined to reproduce physical properties 49 . In addition, a mixture of two types of helical clusters are also regarded as building blocks 44 . The empty spaces with hydrogen bond networks are strongly related with the nature of structural heterogeneities in liquid water [48][49][50][51][52] . Density fluctuations form a diverse morphology of voids, which can allow for the identification of low-and high-density patches of the liquid 48 .
In this paper, we aim to investigate the evolution of water clusters and the structural origin of density fluctuations using simulations with SPC/E. In our earlier work, we propose a graph-based hierarchical clustering methods 47 to identify three-dimensional cage-like water clusters jointed by small-sized rings. Here, we further improve the hierarchical clustering methods to define water clusters covering most of hydrogen bond networks. Using classical molecular dynamics simulations, we study on how the water clusters changes at ambient temperatures. The distributions, morphologies, energetic and topological properties of water clusters are analyzed in detail. The characteristic structures of water clusters are further regarded as fundamental building blocks of hydrogen-bonded networks. By examining local number density and structural descriptor, the density fluctuations in the vicinity of water clusters are detected, proving that the water clusters can examine density fluctuations in the hydrogen bond networks, especially for small clusters.

Results
Molecular dynamics simulations and hierarchical clustering method. Molecular dynamics simulations are performed using periodic boundaries with 181,914 water molecules in a cubic box interacting through SPC/E water potential 53 . All of the simulations were run at temperatures of 278 K, 298 K and 318 K and a pressure of 1.0 atm with a Nose-Hoover thermostat and barostat. The time step was 1.0 fs. The systems were equilibrated for 2.5 ns in an NPT ensemble before outputting.
The hierarchical clustering methods 47 respectively regard hydrogen bonds, rings, fragments as first-, secondand third-level sub-structures in the hydrogen bond network. The ring size is limited to 8 and the fragments analyzed contain 3, 4, and 5 rings. The graph community is a structure in which the nodes have a higher density of connections within groups than between them. Using Louvain algorithm 54,55 , the best graph communities are defined as water clusters, symbolizing the regions where the rings or specific structures are concentrated.
The alpha shape 56 can describe the shape for water cluster by exploring the polytope wrapped by outermost water molecules. The volume V and surface area A of water clusters are defined as the volume and surface area of the alpha shape of clusters.
Cluster statistics, morphologies and energy. The definition of ring-based clusters can intuitively describe the regions of cage-like local structures. In the hydrogen bond networks, over 95% of water molecules and over 60% of global box are involved in the formation of water clusters. Hence, the ring-based clusters can be regarded as specific microscopic structures in liquid water. It is essential to study the properties of water clusters to elucidate how the cage-like structures influence the hydrogen bond networks, and even physical properties of liquid water.
At ambient conditions, the hydrogen bond networks can form percolating networks which contain infinite hbond-based clusters spanning periodic cubic simulation box at least in one direction 57 . But in our definition, the ring-based networks do not have infinite clusters. The largest clusters only comprise more than 1% molecules in the box. Most of clusters are rather small, among which 6-and 9-molecule clusters are most favored in networks, as shown in Fig. 1a. Small clusters are mainly single-fragment clusters. Similar with hbond-based cluster distributions 57 , large ring-based clusters distribution also obeys a power law suggesting fractal behaviors. The distributions of S > 100 form a shoulder at low temperature.
Cluster morphologies, closely related with the voids inside clusters, allow for density fluctuations in the hydrogen bond network 48 . Some examples in Fig. 1b show various cluster morphologies. Small clusters tend to form spherical shapes and large clusters easily shape irregularly. As illustrated in Fig. 1c www.nature.com/scientificreports/ A for small clusters imply optimal cluster shapes implying subtle balance between hydrogen bonds enthalpies and structural distortions. But large clusters have broader distributions of V and A without distinct peaks suggesting complex morphologies. Meanwhile, as shown in Fig. 1e, maximum diameters of clusters L, the longest distance between two molecules in a cluster, can indicate that structural heterogeneity occurs on a length scale of more than 3.0 nm. Cluster energy plays an important role on the relationship between cluster structures and energy fluctuation in previous researches. The clusters have local energy minima based on quantum-chemical calculations and molecular dynamic simulations 59,60 . The energy of water clusters E S is defined as the sum of interaction energy between molecules in the clusters. Under our definition of ring-based clusters in bulk water, the cluster energy E S is almost directly proportional to cluster size S in form of E S = a × S + b. The equation E S /S = a + b/S indicates that larger clusters may averagely have lower molecular energy E S /S, as shown in Fig. 2a. At the same time, cluster energy is mainly determined by hydrogen bonds number and cluster topologies related with the quality of hydrogen bonds, as illustrated in Fig. 2b. The formation of low-energy structures suggests local favored topologies, explaining that energy fluctuations in the hydrogen bond networks may be greatly connected with structural arrangements and dynamics.
Temperature can greatly change the properties of ring-based clusters. Cluster distributions have an isosbestic point S c = 230-270 dividing cluster size with opposite temperature responses. With heating up, the cluster numbers decrease when S > S c but increase when S < S c , suggesting that large clusters are destroyed into small ones and as a consequence average number of water clusters obviously increases. The isosbestic points are determined by box sizes shown in part IV of supplemental material. Then, it also indicates that the agglomeration of cage-like structures also tends to break up in the ring-based networks. Low temperature can cause the formation of lowerenergy structures. In terms of E S = a × S + b, at 278 K, 298 K and 318 K, the parameter a is equal to − 12.92 eV, − 12.37 eV and − 11.81 eV, respectively, and b is equal to 65.49 eV, 60.60 eV and 54.12 eV.  www.nature.com/scientificreports/ The percolating threshold demonstrates the existence of percolating transition forming infinite hydrogenbonded clusters, which closely relates with phase transition, especially liquid-liquid transition in supercooled water 52,57 . At ambient conditions, the ring-based networks do not reach percolating threshold forming infinite cage-like clusters. The ice can form infinite ring-based clusters because the networks are tessellated by tetrahedral structures. It implies that percolating transition of ring-based networks may cause the formation of infinite cluster at lower temperature than hydrogen bond networks. The infinite clusters can represent specific chain-or sheetlike structure of spanning hydrogen-bonded network at mesoscopic scale. Formation of giant cluster structures may affect the properties of liquid water which are sensitive to mesoscopic structures 57 . The ring-based clusters also showing scaling behaviors of distributions and large cluster distributions also obeys a approximate power law n S ~ S −τ , where n S denotes cluster number and S denotes molecular number in a cluster. The fractal dimensions τ are roughly equal to 2.44, similar to exponents τ = 2.20 in the case of hydrogen bond networks 57 .

Scaling behaviors of cluster distributions and morphologies.
The complex morphologies of water clusters also show underlying fractal structures. Both cluster volume V and surface area A scale well with cluster radius r = L/2 illustrated in

Structural heterogeneity of water clusters.
In the hydrogen bonded networks, the water clusters represent ring-concentrated regions where different constitution of ring structures implies structural heterogeneity closely related with the arrangement of hydrogen bond networks.
The spatial distributions of clusters suggest microscopic local structures between clusters. The RDFs (radial distribution functions) of oxygen-oxygen atoms in the clusters have several peaks at 2.7 Å, 4.3 Å and 6.5 Å similar with that of all oxygen atoms and gradually approach zero, as illustrated in Fig. 5a. For small clusters, the RDF peaks are located at 0.5 Å, 3.6 Å and 7.2 Å in Fig. 4d, respectively representing adjacent water clusters. It indicates that structural inhomogeneity around ring-concentrated regions reveal weak structuring up to ~ 1 nm.
The surface rings among adjacent clusters have significant effects on the compatibility among adjacent fragments. The rings connecting two fragments in a cluster are defined as body rings, and the others are defined as surface rings. The 55% of 4-membered rings and 45% of 6-membered rings bridge adjacent fragments. The surface rings, where most hydrogen bond arrangements occur, dominate at high temperature. When the clusters are formed, the defects are transformed from their interior to surface area. The dynamics of surface area can cause the growth or decay of ring-concentrated regions 34 .

Water clusters and density fluctuations in the hydrogen bond network. The water clusters can
represent ring-concentrated cage-like structures. We explore how the water clusters are related with density fluctuations and then prove that water clusters can be regarded as characteristic structures to describe inhomogeneous patches of hydrogen bond networks on different length scales.
First, local density (ρ local ) near the center of clusters is a significant parameter to quantify density fluctuations around cluster structures. The local density (ρ local ) around the center of a cluster is defined as the number of oxygen atoms inside a probe sphere with a radius of 4.6 Å 48 . The bulk number density at 298 K fluctuates at approximately 0.0334 molecule/Å 3 . The small and large water clusters respectively have S ≤ 10 and S ≥ 100. The average maximum cluster radius is approximately 3.85 Å and 19.40 Å. www.nature.com/scientificreports/ As shown in Fig. 6a, high-density patches exist in the interior of clusters. For small clusters, the distribution of densities reaches the peak at 1.8 Å and the minima near clusters surface. High-density patches is ~ 3% larger than bulk density. The regions outside the interior of large clusters are still larger than bulk density. In Fig. 6b, for small and large clusters, 64.8% and 64.7% local densities are larger than the bulk density at density peak compared with 51.3% and 57.4% at minima, which explain the discrepency of density at different regions. Figure 6c shows an instantaneous snapshot of large cluster at 278 K, illustrating spatial inhomogeneities of different types of clusters.
The formations of structuring of water molecules have an implication on hydrophobic effect in liquid water 65 . The hydrophobic solvation may create an empty cavity in the hydrogen bond network and form stronger interhydrogen bonds 66 . To eludicate how water molecules behave inside the clusters, the local structure index (LSI) 25 has been introduced to characterize local structures in liquid water 7,12,22,48 , measuring the extent to which the gap opens between water molecules in the first shell. LSI is defined as, �(i) = r i+1 − r i and is the average over all molecules in the first shell. The molecules are numbered in order of distance from the center molecule, r 1 < r 2 < · · · < r i < 3.7 Å < r i + 1 . Large LSI values represent tetrahedral water molecules, whereas small LSI values correspond to disordered first shells 21 .
In Fig. 7a, the tetrahedrality of water molecules decreases with increasing temperature. The isosbestic point at LSI ≈ 0.03 Å 2 can divide two features of structures, representing the bimodality of LSI distributions having high-density-and low-density-like local structural environments 21 . The average tetrahedrality for water molecules in small clusters is slightly larger than that in large clusters. The voids inside small clusters, can be regarded as hydrophobic objects and hence the water molecules on small cluster surface have a preference for being more tetrahedral. The local structures of water molecules in the clusters show spatial nonuniformity and the tetrahedrality is negatively correlated with densities along the radius of clusters. The high-density patch in the clusters correspond to higher tetrahedrality, implying a more ordered first shell environment near non-polar hydrophobic solutes. It is also consistent with the results that long-range structural fluctuations of patches of four-coordinated molecules form from the liquid to supercooled water 67 .
There is compositional similarity between the water clusters and voids. The water molecules in the clusters can encircle the voids wrapped in the closed rings. The small clusters may form spherical voids; moreover, the large clusters may involve some void clusters. Both structures show similar properties of distribution and morphology. Similar with water molecules around the voids 48 , small clusters reveal similar correlations with density fluctuations. The deeper relationship between water clusters and voids needs to be explored further. www.nature.com/scientificreports/

Conclusions and discussions
The microscopic structures in liquid water remain a hot debate in recent researches. We search out water clusters in the hydrogen bond networks of liquid water using 100-thousand-particle molecular dynamic simulations by hierarchical clustering methods.
In the present paper, we show that there is underlying correlation between hydrogen bond networks and the properties of ring-based clusters within it. We demonstrate that the water clusters can intuitively identify density and structural fluctuations in the hydrogen bond networks, which leads to microscopic interpretations of waters anomalies. In particular, we show that the regions inside the clusters correspond to high-density patches with lower tetrahedrality, especially for small clusters. The small clusters have a preference for having more tetrahedral molecules. It implies a more ordered first shell environment near non-polar hydrophobic objects which is equivalent with the voids in small clusters. And we also show that some clusters are favored by hydrogen bond networks depending on hydrogen bond numbers and cluster topologies. The compatibility of adjacent clusters depends on conformation and size of surface rings bridging them. The structural preference for water clusters also explain the arrangement and dynamics of hydrogen bond networks.
The ring-based clusters in percolating hydrogen bond networks show scaling behaviors of cluster distributions and morphologies forming underlying fractal structures and self-similarity. We show that small clusters are non-fractal three-dimensional objects and large clusters slightly display surface fractals. And large clusters distribution obeys power law with fractal dimension equal to 2.44. It also implies that percolating transition of ring-based networks may cause the formation of infinite cluster at lower temperature than hydrogen bond networks. Also, temperatures largely influence cluster properties and dynamics in the hydrogen bond network. We demonstrate that heating up can destroy large, low-energy clusters into small, high-energy one because of increasing defects of topological structures.
One concern is a lack of mathematical rigor in defining water structures. The simplicial complex in complex network may provide a tool to propose unified mathematical definition of water clusters. Other concern is that the quantitative relationship between the dynamics of water clusters and water properties need to be explored further.
As a fundamental building unit of hydrogen bond networks, water cluster can detect structural and density fluctuations on different length scales in liquid water and also provide a new insight to explain the nucleation process of ice, liquid-liquid coexistence in supercooled water and hydrogen bond structures in the vicinity of a hydrophobic solute. Furthermore, it will be interesting in the future to discretize continuous liquid water as an assembly of large clusters, hierarchically forming a bridge between isolated molecules and bulk matter.

Molecular dynamics simulations.
The 100-thousand molecular dynamics simulations are conducted by LAMMPS software package to explore the properties of water clusters. The water model is modeled using SPC/E rigid water model 53 , which is widely utilized to simulate macroscopic properties and microscopic structures of liquid water. The bond length and the bond angle of water are fixed with SHAKE algorithm. The cutoff of nonbonded interactions is 1.0 nm, and the long-range Coulomb interactions are computed using a particle-particle particle-mesh solver (PPPM). The system contains 181,914 water molecules in a box with periodic boundary conditions. The initial box size is 28.0 nm along the x-direction, 10.0 nm along the y-direction and 20.0 nm along the z-direction. The equations of motion are integrated using velocity-Verlet algorithm 68 with a time step of 1.0 fs. A Nose-Hoover thermostat and barostat are used to control the temperature and pressure, respectively. The equilibration runs using an NPT ensemble are 2.5 ns. The temperatures of the system are kept constant at www.nature.com/scientificreports/ 278 K, 298 K and 318 K, and the pressure is 1.0 atm. The production runs output 500 configurations every 10.0 fs for structural analysis. The density, diffusion coefficient, viscosity and structural properties of simulated water agree well with the results of experiments and other molecular dynamic simulations, and the simulation data are shown in Table 1 and Fig. 1 of the supplementary material.
Extended hierarchical cluster methods. To detect specific topological structures in hydrogen-bonded networks, hydrogen bonds, rings and fragments, clusters are constructed as the first-, second-, third-and fourthlevel structures of the water network using hierarchical cluster methods 47 , among which the clusters are an assembly of three-dimensional cage-like structures. To ensure that water clusters can largely cover hydrogen bond networks, we modify the definition of hierarchical cluster structures and propose extended hierarchical cluster method as illustrated in Fig. 8.
Hydrogen bonds, the first-level structures, are defined using a popular geometric standard in which two water molecules are regarded as hydrogen bonded when the distance between oxygen atoms of one molecule and the hydrogen atoms of another is less than 0.25 nm 70 . The rings, the second-level structures, are cyclic paths along hydrogen bonds investigated by "shortest-path" (SP) criterion 71 . The maximum ring size is limited to 8. The fragments are three-dimensional cage-like building blocks using topological definition proposed by Matsumoto 34 . The fragments combined by 3, 4, and 5 rings are taken into consideration. If two fragments share common rings, they have an edge, forming the network of fragments. The water clusters are identified as the best graph communities using Louvain algorithm 54,55 . Alpha shape of water clusters. The alpha shape is a formalization of the intuitive notion of shape for spatial point sets. An alpha shape is a concrete geometric model which is mathematically well-defined and unique for a given point set. The parameter alpha dictates the level of refinement, with a larger alpha resulting in coarser fits and a smaller alpha in finer fits. The volume and surface area of water clusters are based on the alpha shapes 56 of water molecules in the clusters. The alpha shapes of clusters only take oxygen atoms into consideration. The alpha radius is chosen as 3.5 Å. The formation of alpha shape is implemented in MATLAB 72 .

Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.  www.nature.com/scientificreports/