A comprehensive database of active and potentially-active continental faults in Chile at 1:25,000 scale

In seismically-active regions, mapping active and potentially-active faults is the first step in assessing seismic hazards and in selecting sites for paleoseismic studies that will estimate recurrence rates. Here, we present a comprehensive database of active and potentially-active continental faults in Chile based on existing studies and new mapping at 1:25,000 scale using geologic and geomorphic criteria and digital elevation models derived from TanDEM-X and LiDAR data. The database includes 958 fault strands grouped into 17 fault systems and classified based on activity (81 proved, 589 probable, 288 possible). The database is a contribution to the world compilation of active faults, with applications in, among others, seismic hazard assessment, territorial planning, paleoseismology, geodynamics, landscape evolution processes, geothermal exploration, and the study of feedbacks between continental deformation and the plate-boundary seismic cycle along subduction zones.

Since the advent of modern instrumental seismology, earthquakes in Chile have accounted for >20% of the seismic moment release on Earth. The earthquakes accounted for in this estimate occurred along the megathrust fault that marks the boundary between the Nazca and South American plates. However, the South American continental plate, like most upper plates along subduction zones, includes numerous active faults, some associated with Mw > 6 earthquakes. Interestingly, the historical record of such continental earthquakes in Chile is relatively small, including only seven instrumentally-recorded events of Mw between 6 and 7 (Refs. 4,[27][28][29]). Of these, only two (2001 Aroma and 2007 Aysén) occurred on mapped faults; another two were directly associated with larger earthquakes on the underlying plate-boundary megathrust (2010 Pichilemu and 2014 Pisagua) but occurred on unmapped faults. Only a few paleoseismic trenching studies have been carried out, along four Chilean continental faults, finding robust evidence for Mw > 6 paleoearthquakes 30,31. It is, however, expected that the activity of further faults will be verified during forthcoming paleoseismic mapping and trenching endeavours. Recurrence periods of continental earthquakes estimated from paleoseismic and seismological studies are in the range of thousands of years (an order of magnitude larger than recurrence periods of megathrust earthquakes); nevertheless, considering the incipient knowledge of Chilean continental faults and their widespread spatial distribution, the seismic hazards posed by such structures should not be underestimated.
Research initiatives on active faults in Chile have so far focused only on specific faults or fault systems, and no unified and official database of active and potentially-active faults at national scale has yet been published. A regional assessment of neotectonic structures was first presented in the year 2000, including the first map of active faults and folds in Chile at 1:4,000,000 scale, as part of the World Map of Active Faults 32; maps of this database were included in review papers addressing Quaternary deformation processes in South America 33,34 and the neotectonics of Chile 35. Subsequently, the South American Risk Project 36, promoted by the Global Earthquake Model (GEM) project, incorporated those faults into a global database 37. Recently, Santibañez et al. 38 produced the first map of faults in Chile by compiling published studies, including the 1:1,000,000 scale map of the Chilean Geological Survey 39, and discussing the relation between regional tectonics, the recent instrumental crustal earthquakes, and major long-lived fault systems (active at >10^7 yr timescales).
Here, we present the CHilean Database of Active Faults (CHAF), a unified database of continental faults in Chile, within the South American continental plate (Figs. [1][2][3]), which includes all the previous studies as well as newly-identified faults, using a common mapping scale and unified geomorphic criteria. We present basic statistics of the geometrical characteristics of faults and fault systems, and a first-order estimate of maximum earthquake magnitudes using empirical relations. Our database is a contribution to the world compilation of active faults, with implications for various aspects of earth science research including geodynamics, volcanotectonics, paleoseismology, seismotectonics, studies of future earthquakes, exploration and exploitation of geothermal resources, structural control on landslides and volcanism, landscape evolution models, and seismic hazard assessments.

Methods
Fault mapping. Remote sensing data including aerial photographs, satellite images, and more recently Digital Elevation Models (DEM) have allowed the identification of geomorphic features as well as the application of quantitative morphological analyses to map topographic attributes, with applications in active tectonics and structural geology studies [40][41][42][43][44]. We applied classical techniques in tectonic geomorphology summarized in seminal textbooks [45][46][47] for mapping newly-identified faults and remapping structures from previous studies at a uniform 1:25,000 scale. We rely on our past experience in mapping active faults in different tectonic environments using field observations and remote sensing data 16,[48][49][50][51][52][53][54][55][56], in addition to the criteria used in previous active fault databases [57][58][59][60][61]. We paid special attention to interpreting fault trace continuity using a uniform mapping scale based on the surface expression of faults, not the inferred seismogenic expression at depth. The latter needs to be interpreted on the basis of particular assumptions, and is therefore beyond the scope of our database. Our database is based on direct surface evidence. For mapping, we used hillshade and slope maps created using QGIS v. 3.10 (www.qgis.org) from DEMs derived from TanDEM-X data (12 m resolution), available for almost the entire region, and from airborne LiDAR data (1 m, 2.5 m, and 5 m resolution), available along stretches of the Coastal Cordillera and along specific fault systems. TanDEM-X DEMs were provided by the German Aerospace Center (DLR) under Science Proposals GEOL0845, GEOL1209, GEOL1628, and GEOL0707 via the DLR science portal (https://tandemx-science.dlr.de/). LiDAR data were provided by Digimapas Chile and Forestal Arauco under collaboration agreements. Both datasets may be obtained from the authors upon reasonable request (see Usage Notes).
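The hillshade and slope maps mentioned above were produced in QGIS; the following is only a minimal sketch of the underlying calculation, not the QGIS implementation. It assumes a small in-memory grid with rows ordered north to south, square cells, and central differences on interior cells. The function name `slope_and_hillshade` and the default illumination (azimuth 315°, altitude 45°) are illustrative assumptions.

```python
import math

def slope_and_hillshade(dem, cell_size, azimuth_deg=315.0, altitude_deg=45.0):
    """Slope (degrees) and hillshade (0-255) for the interior cells of a DEM.

    dem: list of rows (assumed ordered north to south) of elevations in metres.
    cell_size: cell spacing in metres (e.g. 12.0 for a 12 m DEM).
    """
    az = math.radians(azimuth_deg)
    alt = math.radians(altitude_deg)
    # Unit vector pointing towards the light source: (east, north, up).
    light = (math.sin(az) * math.cos(alt),
             math.cos(az) * math.cos(alt),
             math.sin(alt))
    slopes, shades = [], []
    for r in range(1, len(dem) - 1):
        slope_row, shade_row = [], []
        for c in range(1, len(dem[r]) - 1):
            # Central differences; row index grows southward, so flip sign for north.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2.0 * cell_size)
            slope_row.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
            # Surface normal is (-dz_dx, -dz_dy, 1) before normalisation.
            norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            dot = (-dz_dx * light[0] - dz_dy * light[1] + light[2]) / norm
            shade_row.append(255.0 * max(0.0, dot))
        slopes.append(slope_row)
        shades.append(shade_row)
    return slopes, shades

# Hypothetical 3 x 3 grids: a flat surface and a ramp rising eastward at 45 degrees.
flat = [[100.0] * 3 for _ in range(3)]
ramp = [[12.0 * c for c in range(3)] for _ in range(3)]
flat_slope, flat_shade = slope_and_hillshade(flat, 12.0)
ramp_slope, ramp_shade = slope_and_hillshade(ramp, 12.0)
```

In this sketch the west-facing ramp is brighter than the flat surface under north-western illumination, which is the contrast that makes scarps stand out on a hillshade map.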
Data classification and analysis. The database described in this study contains a line vector and metadata associated with each fault. The fields included in the metadata are reported in detail in the Data Records section. Faults in the CHAF database are grouped into fault systems and classified in terms of their estimated activity.
Fault system classification. We define a fault system as the population of faults distributed in a particular region that bear similarities in strike, kinematics, length distribution, and age. Fault system names follow previous studies where these exist (Table 1). Fault systems may or may not have a specific fault linkage geometry. For example, faults grouped into the CCTF and EWTS systems are kinematically but not geometrically linked, whereas faults grouped into the LOFS and LOTF are geometrically and kinematically linked (Fig. 1). In general, fault strands grouped in a fault system have similar strikes (Fig. 1) and fault length distributions (Fig. 4c, d). The defined fault systems may or may not be related to a certain bedrock unit and should not be considered as tectonic provinces, which encompass larger temporal and spatial scales.
Fault traces have been assigned a type attribute based on four classes: (1) blind: faults that do not reach the surface as a break in the landscape, but may be associated with a fold or flexure; (2) covered: faults that are covered by undeformed young deposits; (3) inferred: faults whose surface expression is not clear and only estimated; and (4) observed: faults that have a clear surface expression at the 1:25,000 mapping scale.
Fault activity is classified as follows. Active faults and folds are grouped in two categories: (1) Faults with proved activity (Proved faults), those associated with a historical earthquake or with robust published evidence of slip (either seismic or aseismic) during the Holocene; or (2) Faults with probable activity (Probable faults), those that exhibit direct geologic or geomorphic evidence of surface ruptures or deformation that allows activity in the past 125,000 years to be posited. The age limit for Probable faults is defined by the last interglacial period (Marine Isotope Stage 5e), when a distinct marine terrace was formed along most of the Pacific coastline 52,62, which constitutes a suitable temporal geomorphic marker to observe fault offsets and classify fault activity. Faults with possible activity (Possible faults) constitute a third category, assigned when geologic or geomorphic evidence of surface ruptures or deformation affecting the landscape allows activity during the Quaternary period to be posited. For this latter case, deformed geomorphic markers include the fluvial network, pediment and alluvial surfaces, and glacial features.
Fig. 1 The CHAF database. Map of active and potentially-active faults colour-coded by fault system and shaded-relief topography from the SRTM30_plus dataset (http://topex.ucsd.edu/WWW_html/srtm30_plus.html). See Table 1
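The three activity classes can be read as a decision rule on the age of the youngest documented deformation. The sketch below is a deliberate simplification under stated assumptions: the actual classification rests on the quality of published and geomorphic evidence, not on a single age value. The 125,000-year limit is taken from the text; the Holocene (11,700 yr) and Quaternary (2.58 Myr) limits are standard chronostratigraphic values that the text does not state, and the function and argument names are hypothetical.

```python
HOLOCENE_YR = 11_700        # assumed start of the Holocene (not stated in the text)
MIS5E_YR = 125_000          # age limit for Probable faults, from the text
QUATERNARY_YR = 2_580_000   # assumed start of the Quaternary (not stated in the text)

def classify_activity(youngest_deformation_age_yr=None, historical_earthquake=False):
    """Return 'proved', 'probable', 'possible', or None if no class applies.

    youngest_deformation_age_yr: age in years of the youngest deformed marker,
    or None when no dated or datable evidence is available.
    """
    if historical_earthquake:
        return "proved"
    if youngest_deformation_age_yr is None:
        return None
    if youngest_deformation_age_yr <= HOLOCENE_YR:
        return "proved"
    if youngest_deformation_age_yr <= MIS5E_YR:
        return "probable"
    if youngest_deformation_age_yr <= QUATERNARY_YR:
        return "possible"
    return None  # older than Quaternary: outside the scope of the three classes

# Hypothetical example: a fault offsetting the last-interglacial marine terrace.
terrace_offset_class = classify_activity(youngest_deformation_age_yr=125_000)
```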
The comprehensive mapping scale of the database allows first-order statistics of fault traces and fault systems to be estimated, which may be of interest to various disciplines in Earth sciences. Empirical relations estimated from the surface rupture length and magnitude of historical earthquakes 163 provide a first-order assessment of seismic hazard implications from the CHAF database (Fig. 4a). The distribution of fault length of the entire database (Fig. 4b) suggests that faults are self-similar up to a length of ~60 km. Longer traces might require higher-resolution topography and/or mapping at a more detailed scale for subdivision, or might reflect mature faults that have accumulated larger magnitudes of deformation, resulting in higher geometrical connectivity.
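As an illustration of how such an empirical relation turns a mapped trace length into a first-order magnitude, the sketch below uses the log-linear form Mw = a + b log10(SRL), with SRL the surface rupture length in km. The coefficients are those of the widely used Wells and Coppersmith (1994) regressions and are an assumption here: the text cites the relation only by reference number, so the values used for Fig. 4a may differ. Treating the full mapped trace length as the rupture length is a further simplifying assumption.

```python
import math

# (a, b) coefficients of Mw = a + b * log10(SRL), SRL in km.
# Assumed from Wells and Coppersmith (1994); not confirmed by the text.
COEFFICIENTS = {
    "all": (5.08, 1.16),
    "strike-slip": (5.16, 1.12),
    "reverse": (5.00, 1.22),
    "normal": (4.86, 1.32),
}

def magnitude_from_rupture_length(length_km, slip_type="all"):
    """First-order moment magnitude for a given surface rupture length in km."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    a, b = COEFFICIENTS[slip_type]
    return a + b * math.log10(length_km)

# Hypothetical example: a trace at the ~60 km upper limit of self-similar lengths.
mw_60km = magnitude_from_rupture_length(60.0)
```

Under these assumed coefficients a 60 km rupture corresponds to a magnitude slightly above 7, which gives a sense of the upper range implied by the self-similar part of the length distribution.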

Technical Validation
The CHAF database is difficult to validate by any designed experiment. The metadata of our database follow criteria established and validated by governmental institutions of different countries (i.e., USGS 164, GNS 165,166, AIST 167, GEM 57,61, and CCAF 60). The most important validation will be the occurrence of a forthcoming earthquake on a mapped and properly-classified fault. However, another validation procedure is the comparison with independently-made maps published in previous studies. All the references used in the compilation of the active fault database are included as "Codes" in the digital files, so that users can check the original publication and compare the fault traces. The complete list of references in the database is also provided separately in Supplementary File 1. The original DEM data may be provided by the authors upon reasonable request, for any project that seeks an independent validation procedure. We validate the grouping of individual mapped fault traces into fault systems by analysing the variation in fault strike (Fig. 1) and fault length (Fig. 4b-d). All the individual fault systems have similar length distributions, both for probable and possible faults (Fig. 4b).
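The strike comparison used to check the grouping can be sketched as follows. This is one possible approach, not necessarily the procedure followed for Fig. 1: it assumes that each trace is a list of (easting, northing) vertices in a projected coordinate system, approximates strike by the end-to-end azimuth, and treats strikes as axial data (0° and 180° are equivalent) by doubling the angles before averaging. The function names are hypothetical.

```python
import math

def trace_strike(coords):
    """End-to-end strike in degrees (0-180, clockwise from north).

    coords: list of (easting, northing) vertices in a projected system (assumed).
    """
    (x0, y0), (x1, y1) = coords[0], coords[-1]
    return math.degrees(math.atan2(x1 - x0, y1 - y0)) % 180.0

def axial_mean_and_concentration(strikes_deg):
    """Mean strike (degrees) and mean resultant length R for axial data.

    R is 1 when all strikes are identical and approaches 0 when they are
    spread evenly over all orientations.
    """
    # Doubling the angles maps the 0-180 axial range onto a full circle.
    c = sum(math.cos(math.radians(2.0 * s)) for s in strikes_deg)
    s = sum(math.sin(math.radians(2.0 * s)) for s in strikes_deg)
    mean = (math.degrees(math.atan2(s, c)) / 2.0) % 180.0
    r = math.hypot(c, s) / len(strikes_deg)
    return mean, r

# Hypothetical traces: two roughly north-south strands digitised in opposite directions.
strand_a = [(0.0, 0.0), (100.0, 5000.0)]
strand_b = [(2000.0, 6000.0), (2100.0, 1000.0)]
strikes = [trace_strike(strand_a), trace_strike(strand_b)]
mean_strike, concentration = axial_mean_and_concentration(strikes)
```

A concentration close to 1 within a fault system, compared with lower values when systems are pooled, would be consistent with the grouping; digitising direction does not affect the result because strikes are folded into the 0-180° range.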
The present first version of the CHAF database is intended to be the start of a long-term community-based project. To achieve this goal, we created the website www.fallasactivas.cl, which includes a map server to visualize the fault traces, fault systems, and associated metadata. Satellite imagery and hillshade maps created from the DEMs used for mapping have also been included. The website contains a blog aimed at obtaining feedback from the community and at allowing the submission of relevant new data on mapped faults or newly-identified unmapped faults, in order to update the database.

Usage Notes
The LiDAR data used for mapping previously- and newly-identified faults were in part acquired from the company Digimapas Chile and in part donated by Forestal Arauco to the CYCLO project under a confidentiality agreement. The data may be obtained from the corresponding author upon reasonable request and subject to a Memorandum of Understanding (MoU); a draft MoU may be found in the Supplementary Materials. TanDEM-X DEMs may be obtained from the corresponding author upon reasonable request and from the German Aerospace Center (DLR).