High resolution multi-facies realizations of sedimentary reservoir and aquifer analogs

Geological structures are by nature inaccessible to direct observation. This can cause difficulties in applications where a spatially explicit representation of such structures is required, in particular when modelling fluid migration in geological formations. An increasing trend in recent years has been to use analogs to palliate this lack of knowledge, i.e., exploiting the spatial information from sites where the geology is accessible (outcrops, quarry sites) and transferring the observed properties to a study site deemed geologically similar. While this approach is appealing, it is difficult to put in place because of the lack of access to well-documented analog data. In this paper we present comprehensive analog data sets which characterize sedimentary structures from important groundwater hosting formations in Germany and Brazil. Multiple 2-D outcrop faces are described in terms of hydraulic, thermal and chemical properties and interpolated in 3-D using stochastic techniques. These unique data sets can be used by the wider community to implement analog approaches for characterizing reservoir and aquifer formations.


Background & Summary
Sedimentary reservoirs exhibit a rich diversity of composition and topology of internal structures, with physical and chemical properties varying significantly in space. As a result, the description of flow and transport processes can be challenging, affecting subsequent understanding of groundwater contaminant transport, geothermal circulations, hydrocarbon migration in reservoirs or carbon storage and sequestration processes. To date, highly parameterized numerical models are the most general tools to gain insights into processes taking place in the subsurface 1,2 . Often a limited number of boreholes are the only direct observations, and the space between these sparse measurements needs to be filled by interpolation. Additional information is sometimes available from hydrogeological, hydrochemical and geophysical field investigation, but the resolution of such field measurements rarely reaches that of numerical models. For instance, contaminant transport processes in aquifers generally require a resolution at the sub-meter scale, which is not accomplished by standard field testing [3][4][5] . Since transport models are most influenced by geological heterogeneity, they inevitably carry substantial uncertainty.
Natural analogs have proven invaluable to palliate the lack of data prevalent in many cases [6][7][8][9][10][11][12] . The underlying principle is simple: we learn from exposed geological formations to describe hidden ones, assuming that subsurface system and analog share similar properties. More specifically, the geological patterns learned from analogs can be used to constrain geostatistical models of spatial continuity [13][14][15][16] . At the same time, detailed analog models facilitate surrogate analysis of specific flow and transport processes [17][18][19][20] and enable cost-effective testing of new field techniques in virtual but realistic systems [21][22][23] . Despite the appeal of the analog approach, it is not widely used in geological or hydrogeological modelling. The main reason is the lack of free and publicly accessible databases 2,24 . Creating digitized high resolution characterizations by detailed mapping of archetypal outcrops is time consuming 25 . Moreover, such mapping only provides two-dimensional (2-D) analogs, while flow and transport modelling often requires three-dimensional (3-D) characterization. Therefore, it can be difficult to find a representative 3-D analog for a given site. Ideally a freely accessible repository of type locations would exist, spanning a broad range of characteristic subsurface environments. Here, we take a first step in this direction.
Solutions that have been proposed to develop 3-D analogs include the combination of point measurements, outcrop data and geophysical surveys 7,26 . Additionally, strong research efforts have been dedicated to developing advanced geostatistical techniques that are able to extract information from spatially distributed and incomplete data 15,27,28 . Such methods can be used to generate unconditional realizations of spatially distributed geological parameter fields, which serve as realistic but locationindependent representations of the subsurface. Ensembles of multiple such realizations can be used for quantifying uncertainty when modelling a given sedimentary system 29,30 .
In this work, we present multiple realizations of two different sedimentary analogs. We chose a moderately heterogeneous fluvial-aeolian deposit of the upper part of the Pirambóia Formation (Triassic) of south-eastern Brazil 31 , and a highly heterogeneous fluvio-glacial braided river sediment from the Pleistocene in the upper Rhine valley of southern Germany 32 . These analogs were mapped independently, but by following a similar procedure: In order to obtain 3-D images, multiple outcrop cross sections were sequentially mapped and digitized during ongoing excavation in gravel pits.
We focus on a range of different geological, physical and chemical properties, with the purpose of developing genuine portrayals of the selected sedimentary blocks at the sub-decimetre scale. These properties are associated with sets of facies types specific to each analog. Such facies represent the smallest homogenized units and therefore do not represent up-scaled values. Mapping their distribution on each outcrop results in a digitized facies mosaic. These mosaics allow for visualizing the sedimentary structures in the vertical outcrops, and the 3-D spatial changes are captured by combining all mapped cross-sections at a site. By sharing multiple realizations of both analogs, we open the door for any kind of stochastic flow and transport modelling in those types of geological environments. The analogs can serve as benchmarks for different levels of hydraulic, hydrochemical and thermal heterogeneity.

Definition of facies types
Four different facies types are distinguished in the analogs. These address sedimentological, hydrogeological, chemical and geothermal criteria: -Lithofacies: represents a subdivision of a stratigraphical unit, which stems from a distinct deposition event or environment 11,33 . -Hydrofacies: denotes a quasi-homogeneous unit that can be described by characteristic hydraulic properties 33,34 . -Chemofacies: classifies units with same chemical attributes [35][36][37] .
The facies types represent mappable clusters of the selected gravel bodies, at a minimum resolution of half a decimetre. They are determined based on a hierarchical approach: initially, lithofacies were predefined after sedimentological analysis at the site, including outcrops in the vicinity of the same www.nature.com/sdata/ SCIENTIFIC DATA | 2:150033 | DOI: 10.1038/sdata.2015.33 depositional environments 11 . The lithofacies represent the basic categories. These are further distinguished for defining hydrofacies units, when hydraulic properties reveal significant internal variability. This is especially the case when sediment grains show grading, or when a predefined lithofacies assembles small-scale interbedded strata. The hydrofacies-based classification is not further subdivided. This means chemical and thermal properties are estimated for the hydrofacies classes, which yields specific chemo-and thermofacies.
This hierarchical approach is chosen for several reasons. First, the different facies properties are often correlated. For instance, lithologically typical grain sizes result in similar hydraulic and thermal properties, and lithocomponents specific to a lithological strata often share similar chemical properties 40,41 . Second, consistent classification of different facies types simplifies their combined implementation in numerical models. Third, a crucial argument is the practicability of mapping, which is much easier when hierarchical classes are distinguished. This applies especially to field sampling and laboratory measurements 31 . Finally, qualitative classification guided by principal lithofacies types may lead to different hydro-, chemo-or thermofacies of sometimes similar quantitative properties. Therefore the presented classification offers the highest possible resolution, and depending on the model application facies types characterized by similar parameter values may be merged.

Field and laboratory work
Both analog data sets are reconstructed rectangular blocks of unconsolidated sediments, which are mapped by several (5-6) outcrop cross sections in the field and supported by laboratory measurements (Fig. 1). Each data gathering campaign took around six months, synchronized with the mining activities in the sand and gravel pits. The fluvial-glacial sediments of the Rhine valley were excavated close to the town of Herten (Herten-analog) in the summer of 1999 11,32 , and the fluvio-aolian deposits in Brazil close to the town of Descalvado (Descalvado-analog) in 2011 31 (Table 1). Outcrop photographs are shown in Fig. 2. Both case studies were mainly motivated from hydrogeology: the young Rhine gravels host among the most productive aquifers in central Europe, and the Brazilian Pirambóia formation belongs to those sequences that store the most important groundwater reservoir in South America, the Guarani Aquifer system (GAS). The original descriptions of highly resolved hydraulic and hydrochemical heterogeneity and earlier interpolated blocks hence served as aquifer analogs in several previous groundwater modelling and model inversion studies 17,19,20,22,28,42,43 . For instance, Maji 31 present a comprehensive report of the field campaigns, the laboratory measurements and the derived cross sections of the Descalvado-analog. The lateral width of the examined outcrop walls is 28 m and thus longer than that for the Herten-analog, given a similar vertical size (Table 1). Only three parallel profiles were mapped, however, at greater spacing of 3.5 m. These profiles are complemented by the two perpendicular lateral faces, which deliver a true 3-D picture of the structure. A fence diagram with the orientation of all five profiles is given by Höyng, et al. 31 , but no interpolated 3-D models have so far been constructed. Equivalent to the Herten case, five lithofacies are distinguished, four of these are sand dominated and one represents clay intraclasts ( Table 3). Three of these lithofacies are further subdivided due to internal variability of grain sizes, and thus in total nine different hydrofacies are   derived. In the entire mapped sediment block, three major architectural elements can be distinguished: the quasi homogeneous basis with well sorted medium to very fine aeolian sands (Sp), the more heterogeneous central part with cross-bedded coarse sand and gravel (SGt), and on top a laterally continuous layer of trough cross-bedded fine to medium sand facies (St). So in both analogs, highest variability is found in the centre. For the distinction and characterization of the different hydrofacies types, hydraulic conductivity and porosity values were determined using undisturbed samples collected in the field. The porosity was derived by direct measurement in the laboratory, using multiple samples of each hydrofacies. In order to obtain hydraulic conductivity values, different methods have been used at both sites. Directly measured laboratory values are considered most reliable. At the Herten site, the hydraulic conductivities were determined as mean values of repeated flowmeter measurements. The samples of the ten different Herten facies were collected at adjacent gravel pits, with outcrops from similar fluvial deposits of glacial origin from the alpine region (Triassic, Jurassic, Creataceous and Tertiary rock formations). For the Descalvado, only the Fm facies was examined by laboratory permeameter testing. When no directly measured laboratory value is available, empirical estimates are chosen. For each individual facies, repeated sieving of multiple samples combined with a laser diffraction method was applied to derive grain size distributions. These were utilized in the empirical Kozeny-Carman, Beyer, Panda and Lake and USBR formula 11,31,32,45 to estimate specific hydraulic conductivity values. The methods used for each individual hydrofacies are reported in detail in Bayer, et al. 32 and Höyng, et al. 31 Chemical heterogeneity is rarely reported for analogs, and it may be used to describe a broad range of variable chemical characteristics such as mineral composition, carbon content, etc. The selected  Table 3. Facies types and parameters of Descalvado analog (note: organic carbon content was below detection limit of 0.04 mg/g). chemofacies types of both analogs refer to different properties. At Herten, it is the organic carbon content, which is a main determinant of sorption capacity. For the Descalvado, which represents a more mature and quasi carbon free sediment, the iron content (Fe(III)) was examined. Sediment bound iron is abundant in most sediments, and Fe(III) is relevant, for instance, as a solid phase electron acceptor during degradation of organic contaminants in aquifers. For the Herten facies types, in great detail the specific mineral and organic carbon content for given grain size ranges were determined 40,45 . Carbon contents are available only for four facies (Gcm, Gcm,b, Gcg,o, S-x), whereas sieve curves were measured for all ten. In order to extrapolate to the other facies, the carbon content values from the most similar measured lithofacies were adopted, that is, data from Gcm was used for cGcm and sGcm, data from S-x for GS-x, data from Gcm,b for fGcm,b, and data from Gcg,o for cGcg,o, sGcg,o. Summing up the grain size-specific carbon allows obtaining the total organic carbon mass fraction (f oc , mg/g) as a parameter characterizing chemofacies. Since no carbon measurements were carried out at the Herten site, we list in Table 2 those from two related sites: f oc,S is based on an outcrop close to the town of Singen and f oc,H from one located in the vicinity of the village of Hüntwangen. The carbon content of the latter shows only little variability, whereas f oc,S spans a broader range. Since the same categories are used as for the hydrofacies, differences between some chemofacies are not significant, and thus these may be pooled in four or six major classes.
The Fe(III) content of the nine different Descalvado facies types was determined by laboratory testing of three samples per facies. For this, 0.5 g of sediment was filled in 58 ml serum bottles (triplicate of each sample). After adding 25 ml 0.5 M HCl, the samples were put on a shaker table for 1 h to dissolve amorphous and poorly crystalline Fe phases. Crystalline Fe was extracted by adding 6 M HCl to the sample and incubation for 24 h in a 70°C water bath. The dissolved Fe(II) and Fe(III) were determined in the liquid phase by ferrozine assay. The purple-coloured ferrozine complex was quantified spectrophometrically at 562 nm using a microtiter plate reader (FlashScan 550; Analytik Jena, Jena, Germany). The concentrations of Fe(III) were determined by the difference between Fe(II) and Fe(total). Amorphous and poorly crystalline Fe phases could not be quantified because concentrations did not reach measurable levels (>10 μM). The presented solid Fe(III) oxides refer to highly crystalline phases (i.e., goethite).
The thermofacies are characterized by thermal conductivity (K T , W/mK) and specific heat capacity (c P , MJ/m 3 K). These parameters are estimated indirectly based on the volumetric fractions of mineral components and porosity. The Herten facies mineral components (Quarz, Feldspar, Calcite) are determined with the same samples as used for the chemofacies characterization 40,45 . For these minerals, tabularized thermal properties are available 46 . Assuming water saturated conditions, the bulk thermal facies properties can be approximated by the geometric mean (K T ) and the arithmetic mean (c P ) of the individual components 46,47 . The same indirect method was used for the Descalvado case. The more mature sediment of this analog is assumed to be dominated by quartz. In the field, only minor local occurrence of calcite was found. As a rough approximation for the quartz arenitic facies 48 of the Descalvado analog, the feldspar content is assumed to be 1/10 of that of quartz, which is a ratio reported for similar sediments 46 .

Geostatistical modelling
Ensembles of multiple 3-D realizations of the Herten and the Descalvado analogs (Fig. 3) are obtained by multiple-point statistics (MPS) simulation 49,50 . MPS is a collection of geostatistical simulation tools that are specifically aimed at representing complex connected structures and curvilinear patterns. To apply MPS, a training image is normally required, i.e., a conceptual model of heterogeneity that has the same dimensionality as the domain to be modelled.

Data Records
For both analogs, we provide the results obtained with three MPS simulation settings. The first simulation setting consists of a model domain of the same size as the available mapped outcrops (320 × 200 × 140 voxels of 5 cm side for the Herten-analog; 280 × 70 × 58 voxels of 10 cm size for the Descalvado-analog), where the outcrops themselves are used as conditioning data to constrain the simulation. In the second setting, the same grid dimensions are used, but without considering the outcrops as conditioning data. For each of these two simulation settings, 100 equiprobable realizations are obtained by changing the simulation random seed. MPS uses the random numbers generated by a computer for the facies www.nature.com/sdata/ SCIENTIFIC DATA | 2:150033 | DOI: 10.1038/sdata.2015.33 simulation and for the choice of the sequential simulation path. Using another random seed for each realization allows obtaining different stochastic simulations that are representative of the geological uncertainty. For the third simulation setting only one realization per analog is provided, that is a grid of size 1000 × 1000 × 140 for the Herten analog, and a grid of size 420 × 420 × 58 for the Descalvado analog.

Technical Validation
The presented high resolution data sets exhibit several sources of uncertainty and inaccuracy, associated with each working step (Fig. 1). The sources may be grouped as those associated with mapping and facies assignment, those stemming from measurement inaccuracy, those related with parameter estimation and those originating from the geostatistical simulation procedure.

Mapping, measurement and parameter estimation techniques
During mapping, facies types have been allocated by visual inspection of the outcrop wall and of outcrop photos. Even though the structural and textural properties, as well as the colours, were characteristic for the different facies, there is always a risk of misallocation. If possible, the mapped sequences have been validated by sedimentological analysis and comparison to other outcrop analogs 44 . This was crucial for obtaining sedimentologically plausible reconstructions. When several perspectives were available, such as for the Descalvado, consistency of the mapped facies mosaics was scrutinized at the intersections of perpendicular profiles.
The facies types were distinguished based on the principle that they present quasi-homogeneous units for the given scale of mapping (>5 cm). However, within facies types, there always remains a certain natural variability. In order to account for this variability and to arrive at robust parameter values, measurements of porosity, hydraulic conductivity, organic carbon and iron were all repeated several times with multiple samples per facies 11,31,32,34,40,53 . The detected value ranges are reflected for the listed parameters in Tables 2 and 3.
When no direct measurements were conducted, parameter values were estimated based on empirical calculations and by utilizing values reported for neighbouring sites. Empirical calculations were applied for hydraulic conductivity estimation. Since the applied standardized formula are only approximate, they were either utilized for plausibility checking (Herten 32 ) or the derived values were successfully validated through in-situ flowmeter measurements (Descalvado 31 ). No thermal measurements were conducted, and the given ranges are propagated uncertainties of the porosity. The chemofacies for the Herten analog are based on carbon measurements in two different adjacent gravel pits. Mean site-specific values of both pits are listed in Table 2 in order to highlight the spatial variability of the chemofacies types. However, this also shows that the regional hydrofacies-based classification may be suitable to also categorize chemoand thermofacies at one site, but transferability to another site is in this case limited.

Geostatistical simulation
The ensembles of 3-D realizations are validated using visual inspection and in terms of lithofacies proportions, total connectivity and intrinsic connectivity indicators 54  3-D simulations is another important criterion to evaluate the simulations. A direct comparison of the connectivity indicators computed on the 2-D outcrops with the indicators computed on the 3-D simulations would be difficult to interpret. Therefore, while some preliminary tests are performed to compare the connectivity indicators of the 2-D datasets and 2-D MPS simulations, in the data repository we only attach the mean, the median and the standard deviation of the ensembles of 3-D s2Dcd simulations.

Usage Notes
The ensembles of 3D realization of the Herten and the Descalvado analogs are uploaded in compressed format (zip) in the Open Access library PANGAEA (www.pangaea.de, (Data Citation 1)). For the first two simulation settings (conditional and unconditional simulations with domain sizes delimited by the datasets) 100 separate files (one for each realization) are provided for each analog and for each simulation setting. Only one file per analog is provided for the third simulation setting, which is for simulations of domains extended beyond the volumes delimited by the mapped outcrops. The files are provided in the Visualisation ToolKit (VTK, www.vtk.org) structured grid format, which can easily be read with the open source visualisation platform Paraview, freely downloadable (www.paraview.org), and which is described in detail in the VTK user's guide 56 . The VTK files are provided in the ASCII format, with a header that contains a straightforward description of the size of the grid and its spacing (in meters). After the header, the facies codes are reported in a unique column starting from the values with lower x value, then lower y value, and then z. By simply changing the header, the files can be converted into the GSLIB file format (or simplified GeoEAS 57 ), or read by mathematical libraries and scripting languages such as Python, Matlab or Mathematica in a straightforward manner. The README file attached to the data set includes a description of the workflow required for the conversion from VTK to other formats using a text editor, Python, Matlab/Octave or the bash shell. For each outcrop section, in addition to the VTK files, a VTI (VTK ImageData) file is provided to allow for direct visualization of the spatial variability of the hydraulic, thermal and chemical properties.

Author Contributions
Peter Bayer provided the facies description of the Herten analog and prepared the manuscript. Alessandro Comunian conducted the 3-D interpolation, generated the realizations and prepared the manuscript. Dominik Höyng mapped the Descalvado analog and contributed to the chemofacies description of both analogs. Gregoire Mariethoz worked on the geostatistical analysis, figures and preparation of the manuscript.

Additional Information
Competing financial interests: The authors declare no competing financial interests.