Fault2SHA Central Apennines database and structuring active fault data for seismic hazard assessment

We present a database of field data for active faults in the central Apennines, Italy, including trace, fault and main fault locations with activity and location certainties, and slip-rate, slip-vector and surface geometry data. As advances occur in our capability to create more detailed fault-based hazard models, depending on the availability of primary data and observations, it is desirable that such data can be organized in a way that is easily understood and incorporated into present and future models. The database structure presented herein aims to assist this process. We recommend stating what observations have led to different location and activity certainty and presenting slip-rate data with point location coordinates of where the data were collected with the time periods over which they were calculated. Such data reporting allows more complete uncertainty analyses in hazard and risk modelling. The data and maps are available as kmz, kml, and geopackage files with the data presented in spreadsheet files and the map coordinates as txt files. The files are available at: 10.1594/PANGAEA.922582.


Background & Summary
Presenting a new conceptual framework to gather fault data useful for SHA (seismic hazard assessment) was identified as a key challenge by the Fault2SHA Working Group (http://fault2sha.net/) 1 , established within the European Seismological Commission (ESC) in 2016. To improve fault-based SHA, field geologists should provide the relevant observations, analysts should interpret field data appropriately 2 , and the full range of uncertainties associated with the characterization of faults should be correctly understood and propagated in computations 3 . To overcome the barriers that exist for the above, due to the different experience and expertise of participants, we, the Fault2SHA Central Apennines Laboratory, have brought together representatives from research groups across multiple institutions comprising field geologists, seismic hazard modellers and practitioners to create the Fault2SHA Central Apennines Database 4 .
Fault geometries and associated slip-rates are critical in determining earthquake locations and occurrence rates 2,3,5 in SHA: we acknowledge that including these data is key to improving earthquake hazard and risk estimates. Currently, most hazard models rely principally on historical earthquake rates, but these may not be representative of longer term rates (either overestimating or underestimating as typical recurrence intervals may be hundreds to many thousands of years yet historical records are rarely complete for more than a few hundred years), resulting in a bias in the probability and/or the potential magnitude of earthquakes (e.g. 2011 Great East Japan Earthquake, 2010 Haiti Earthquake 3 ). For example, in the central Italian Apennines, where average recurrence intervals on individual faults range from a few hundred to several thousand years [6][7][8][9] , there are discrepancies   Fig. 1 From primary data to modeller decision making. (a) An example showing the relationship between traces, faults and main faults. In the trace, fault and main fault maps, the thicker the line, the higher the location certainty and the colour shows the activity certainty (red, dark orange, light orange, yellow show levels 1, 2, 3 and 4 respectively) -see Tables 1 and 2 for details. (b) An example of how a modeller may turn main faults in the database into seismic sources using a multi-fault rupture model and a minimum activity certainty level criterium for inclusion; in this example, the maximum extent of ruptures across multiple faults is determined using a distance between faults criteria. Blue text signifies modelling decisions, black text represents information directly from the database. Locations affected by such modeller decisions are shown with blue ovals with the parts of the faults affected shown in blue. Note a modeller can alternatively choose to use the fault map to create alternate rupture scenarios. (c) An example of how a modeller may interpolate the point level slip-rate data for inclusion in a fault-based seismic hazard model. approximate path shown with evidence for a local fault but not locating the trace precisely (i.e. not continuously within a few or tens of metres), new evidence could allow for significant alterations. Level 4 represents the lowest level of certainty in location, the location is uncertain by tens of metres or more and there could be significant alterations in future versions of the fault map. Table 1 provides the full criteria for certainty levels at each location. All traces included in the map have some geologic or geomorphic evidence of activity, however not all have this constrained to the Late Pleistocene -Holocene with certainty (see Table 1 for details).
The "trace activity" criteria have been inspired by the classification schemes in previous works 39 , however we have identified specific requirements among criteria listed within the database so that users can see where the scale values have come from and what observations have been made along each trace (see Table 2). Level 1 represents traces that have displacement during the Late Pleistocene -Holocene that has been dated at points along the trace. There must be dated evidence of tectonic surface ruptures during historical or contemporary rupture. Therefore, at least one of the following must have been observed for a trace to be assigned activity level 1: ground-truthed real-time earthquake surface ruptures, earthquake displacements from InSAR or other remote sensing, a palaeoseismic trench with evidence of tectonic offsets that have date constraints (this can include palaeoseismic trench site dating techniques that rely on in situ samples that are geochemical matches, for example tephrachronology), or cosmogenic dating providing evidence of tectonic offsets within the Late Pleistocene -Holocene. Level 2 activity has evidence of Late Pleistocene displacement with dates inferred from regional level dating, but without in situ dated Late Pleistocene -Holocene displacement. For example, a scarp profile showing offsets features that have regional dating constraints. Level 3 activity for a trace is assigned where there is geologic Local displacement constrained to be within the Late Pleistocene -Holocene through regional dating constraints: a scarp profile showing offsets features that have regional dating constraints (Late Pleistocene -Holocene age constraints) but that have not been dated specifically along the trace (scarp profiles that offset Late Pleistocene-Holocene).

Light Orange
Geologic (displaced Middle Pleistocene deposits) or geomorphic evidence of potential fault activity, but this has not been confirmed as Late Pleistocene. The trace is on the same fault as a trace with activity (1) or (2).
Criteria needed for activity (1) or (2) are present for a different trace belonging to the same fault, but not present on this trace.

Yellow
Geologic (displaced Middle Pleistocene deposits) or geomorphic evidence of potential fault activity, but this has not been confirmed as Late Pleistocene There are not sufficient observations for levels 1, 2 or 3. www.nature.com/scientificdata www.nature.com/scientificdata/ (displaced Middle Pleistocene deposits) or geomorphic evidence of potential fault activity, but this has not been confirmed as Late Pleistocene -Holocene, this can represent traces where the trace is on the same fault as a trace with activity 1or 2, but itself does not meet the criteria for activity level 1 or 2. Level 4 activity traces have geologic (displaced Middle Pleistocene deposits) or geomorphic evidence of potential fault activity, but this has not been confirmed as Late Pleistocene -Holocene. www.nature.com/scientificdata www.nature.com/scientificdata/ Faults. Faults display how the traces are connected at the surface and/or at depth. How the traces connect to form faults is based on fault geometry continuation (two traces can start and finish within a few metres from each other or may be kilometres apart), continuation of Late Pleistocene-Holocene offset and total offset across the fault and known earthquake ruptures. In showing how traces are connected, the fault map is of use to those wanting to understand the structures of the region.
All traces belong to one of the faults in the database. However, each fault is represented by a single, continuous map line in the "Fault" map, therefore what are interpreted as secondary traces parallel, quasi parallel or branching to the main fault rather than forming a continuation of the fault, interpreted as a branch, splay or other synthetic or antithetic structures do not appear at the "fault" level (see Figs. 1a, 2c,d). Traces may overlap in an en echelon form, in these cases either one or more of the overlapping traces will be excluded from the fault map and/ or adapted so that a single line can be drawn. The single lines allow the faults to be more easily incorporated into models. Within the database, faults have a minimum length of 1 km. Note that faults can contain traces with different levels of location and activity certainty. Faults are assigned the highest level of activity among their constituent traces, while they have the lowest level of location certainty from among the constituent traces. This is because if activity can be confirmed along a fault, it is assumed the fault has that activity level and other traces either have not been studied sufficiently or the evidence is not apparent. All traces belonging to a fault contribute to the fault activity score. For the location certainty, only the traces contributing directly to the mapped line are considered (so as to avoid decreasing the certainty due to less well constrained splays, synthetic and antithetic structures).
Main faults. Main faults represent how faults have been interpreted to be linked at depth. Connection of one or more faults into a single main fault has been inferred using factors such as continuation of surface geometry (fewer constraints than for faults), total offsets across the faults with the total offsets decreasing to zero at main fault tips, surface slip vectors convergence, and laterally-continuous rupture during contemporary, historical and palaeo earthquakes. We recommend that this scale of mapping is useful for input into current seismic hazard models.
In some cases more than one main fault is known to have ruptured in a single earthquake, in such cases, multiple faults across more than one main fault may represent a rupture, such as during the 1915 M 7.0 earthquake which ruptured the Fucino Ovindoli Pezza, Magnola (or Fucino Magnola), Cerchio Pescina Parasano, and Trasacco main faults comprising (among others), the San Benedetto Dei Marsi, Marsicana Highway, Parasano, Magnola and Trasacco and Luco Dei Marsi faults 57,60,69,96 . In those cases, the identification and naming of faults and main faults is guided by both geological and historical literature reasons, in the sense that some faults are left separated as defined and named in the literature, even though they have ruptured contemporaneously in past earthquakes. Therefore, there is no one-to-one relation between faults or main faults and "earthquake segments" within which ruptures are expected to be confined. To allow for different interpretations of seismic sources, modellers may wish to utilise either the main faults ( Fig. 2d) or the faults (Fig. 2c). Each main fault is represented by a single, continuous map line in the "main faults" map, therefore the geographic representation of the main fault on the map may differ locally from its constituent faults as some detail is necessarily simplified if there are, for example, en echelon faults, and main faults may comprise synthetic or antithetic faults, but only the main fault line will be displayed at the main fault level map (see Figs. 1a, 2b,c). Within the database, main faults have a minimum length of 5 km. Note that main faults can contain faults with different levels of location and activity certainty. Main faults are assigned the highest level of activity among their constituent faults. Main faults have the lowest level of location certainty from among the constituent faults which form the mapped main fault (excluding synthetic and antithetic faults).
Further considerations for the maps. We recognise that there may be different interpretations of how faults join to form main faults. Therefore, we have provided three sets of options for particular main faults in order to allow different interpretations to be considered in hazard modelling. We suggest that any hazard calculations must consider the alternate hypotheses as there is not a consensus for these main faults.
Providing the three map levels allows users to interrogate how the fault and main fault maps have been built up from the trace level observations. The trace map shows where observations have been made, to what levels of certainty have been achieved, and where gaps in knowledge exist. This differs from the approach of the existing seismogenic fault mapping for the region. ITHACA (ITaly Hazards from CApable faulting) is an Italian database produced by ISPRA (Istituto Superiore per la Protezione e la Ricerca Ambientale) that provides detailed fault traces for capable faults in Italy and surrounding areas (http://sgi2.isprambiente.it/ithacaweb/Mappatura.asp). The ITHACA map aims to include all faults referred to in the literature and provides geological information at a high spatial resolution; however, it does not display how the mapped structures connect and the high detail www.nature.com/scientificdata www.nature.com/scientificdata/ -though a highly valuable resource for local study and geological investigations -may be beyond current seismic hazard model capability. Conversely, DISS (Database of Individual Seismogenic Sources) 98 provides a lower resolution map, which aims to give both single-rupture (individual) or multi-rupture (composite) seismic sources (within DISS a seismogenic source is an existing or hypothesized fault capable of producing magnitude Mw 5.5 and larger earthquakes, but a seismogenic source does not necessarily correspond/coincide with real fault field data). To accomplish the complexities and divergency in data interpretations (shown to be needed by the L' Aquila 2009 and Amatrice 2016 earthquake sequences) DISS now provides a table of "debated" seismogenic sources which comprise faults within the literature for which the authors of DISS do not consider have sufficient evidence, or are "reliable enough" for inclusion in the main database. Despite this, the "seismic sources" in the DISS are inferred simple structures and do not show how faults (and higher resolution traces) have been mapped in the field. The resolution of the data is not at a scale that we advocate as being needed for improving seismic hazard assessments, especially for the case of high-resolution calculations required for critical infrastructure, and it does not allow for modellers to build alternate rupture scenarios through consideration of higher resolution geological structures.
Point data: slip rate, local surface geometry, and kinematics. Slip rate, local surface geometry and kinematic data were extracted from existing publications 9,11,13,17,18,32,[40][41][42]68,80,99,100 . The database does not aim to include all data that has been published; for data to be included, it must: (1) have been published in a peer-reviewed journal prior to the publication of the database; (2) have been provided with sufficient location precision for point data location to be recorded (if the coordinates were not published (for example a publication may only show a map with points marked), but the original authors have locations recorded, this data can be included in the database; an attempt was made to contact authors for inclusion of such data, but a lack of recorded coordinates prevailed); (3) pass quality control (for example data with better constraints is available for the same location, the less well-constrained data are considered redundant); and (4) slip-rate or throw-rate measurements, either published directly or as slip or throw measurements, must have the time period for the calculation recorded. We hope the database will encourage future data to be published with location coordinates so it can be included. The authors of the database recognise that there may be data in the literature that was missed, but ought to have been included, in compiling the database. If readers have data to contribute to the database, we ask that they fill in the "Fault2SHA_CentralAPennines_Database_NewDataForm_2021_v1" and send it to the corresponding author via email with title "Fault2SHA Central Apennines Database New Data Form". In the form the columns are editable where not reliant on other columns for information, but cells shown in yellow should not be edited. The user will be contacted if further information is required.
We compared the data included from different sources to check whether there was consistency. We found consistency between, for example, slip-rates measured using different techniques which adds to the robustness of the data. For example, along the PianoDiCampoFelice trace of the CampoFelice fault, the Forme trace along the Magnola fault, the ParasanoEast trace along the Parasano fault, the SanSebastianoOrtonaDeiMarsi trace along the San Sebastiano fault, the TrasaccoCollelongo trace along the Trasacco fault, the TreMontiPaterno trace along the Tre Monti fault, and the Fiamignano trace along the ValleDelSalto fault, the slip-rate data in the database measured from topographic offsets are consistent with that reported using cosmogenic dating of the fault scarp 8 .

Methods used within the database for values within the point data tables.
The original values within the SlipRate and LocalGeometryKinematics tables come from published sources. Within the database, the method used to collect the original data is cited and we direct readers interested in particular values to the original sources. For each method listed in the "Technique" table we provide a short summary. We also state where data have been converted and how -for example slip may be derived from throw data using a nearby measurement of fault dip or plunge as stated.

Creation of gpkg, kmz and xlxs files. The data tables were compiled using Google Sheets and Microsoft
Excel and then extracted as individual csv files. The maps were drawn in GoogleEarth. The map kml files were extracted for use within QGIS, where they were joined to the corresponding data tables. The map lines were assigned colours and thicknesses based on the values in the attributes tables for the activity and location certainty. The geopackage and kml files were then extracted from QGIS and the final kmz was saved within GoogleEarth.  Tables 1 and 2 for the key to the colours and line www.nature.com/scientificdata www.nature.com/scientificdata/ thickness. The "Trace", "Fault" and "MainFault" Table fields associated with each trace, fault and main fault can be accessed by clicking on the trace, fault and main fault name respectively within the folders or clicking on the relevant line on the map. The "SlipRate" and "LocalGeometryKinematics" folders contain map points for each measurement in the database, named according to the trace that the measurement belongs to. The fields from the tables can be accessed through clicking the entry within the list in the folder or the relevant point on the map. We have provided alternate MainFault selection options in the shapefiles. Therefore, only one option from each option group should be chosen when displaying a map at the Main Fault level. This file has been tested in GoogleEarth Pro 7.3.3.7699 (2) Geopackage (Fault2SHA_CentralApennines_Database_2021_v1.gpkg) file compatible with open access GIS programmes containing both the maps and attribute tables for the data in the database. The "Main-Fault", "Fault" and "Trace" layers contain both the maps of the main faults, faults and traces respectively and their attributes tables. The "SlipRate" and "LocalGeometryKinematics" layers contain both the point locations for the relevant measurements and their attributes tables. The remaining tables (TraceLocation, TraceActivity, Earthquakes, CoordinateSystem, Technique, Reference) appear as attribute tables as no geometry layers. Within the gpkg file, the user will need to select which main faults to display is the Main-Fault layer to avoid duplication of faults (the MainFault layer contains all the possible main faults). For example, within QGIS 3.8 this can be achieved through selecting the MainFault layer, opening Layer -Layer Properties -Source "Query Builder" and using a query to select the desired MainFaults, for example to select option 1 within each group use the query: "mainFaultOption" = 0 OR "mainFaultOption" = "A1" OR "mainFaultOption" = "B1" OR "mainFaultOption" = "C1". Any fields in other tables which are dependent on the MainFaultOptionChoice have the different possibilities listed within the tables with the options they belong to stated in brackets. This file has been tested in QGIS 3.8. Additional files: (1) An excel spreadsheet ("Fault2SHA_CentralApennines_Database_NewDataForm_2021_v1.xlbs" and "Fault2SHA_CentralApennines_Database_NewDataForm_2021_v1.xlsx) in the format of the database but with most fields empty. This file is available to anyone who would like to propose data to be included in future versions of the database. (2) A MATLAB script (data2FiSH_2021_v1.m) for extraction of the data from the database in a ready-to-use format for the FiSH code and an accompanying readme text file (README_script_data2FISH_2021_ v1.txt) with instructions for its use. These are contained within the "Data2FiSH_2021_v1" folder along with 2 supporting MATLAB scripts: "utm2deg.m" and "deg2utm.m".
If the user has compatibility issues with software versions please contact the corresponding author.
Database tables. Here we provide an overview of the database structure, how the tables are connected and the different fields within each table (Fig. 3). The unique identifier for each table is underlined. Fields that connect to other tables are in italics. Derived quantities calculated from other fields are shown in blue and fields that are dynamic depending on the Main Fault option selection made are shown in red in the database. The type of data within each column is shown in brackets following the name of the column. options (String) -a unique identifier for each available option for main faults where there is more than one main fault configuration provided. Each option is split into a lettered group and within the group each option is given a number, such that each option has a string form of XX, where the first digit is a capital letter pertaining to the group of options and the second digit is an integer referring to the options within that group. optionDescription (String) -describes the main fault configuration within the option optionName (String) -provides a descriptive name for the option group, it has the form XXX -X, where "XXX" before the dash represents the name (can be of any length) and the "X" after the dash is the option group letter faultsAffected (String) -lists the different faults that will be affected by the selection made within the option group mainFaultOptionChoice (Dropdown, String) -this is a dropdown list in which one of the options within the group must be selected. The choice selected here will change the dynamic fields in other tables that are affected by the main fault selection.  MainFaultSelection table). mainFaultKey (Integer) -a unique identifier for each main fault mainFaultName (String) -a unique identifier for each main fault, the name assigned to each main fault in the database, provided in alphabetical order. Names were chosen including consideration of prevalence in the literature and geographic or political features to help identify them. mainFaultAverageSurfaceOrientation (Integer) -an approximate measure of the strike of the main fault, measured in degrees from north, taken as the angle between the fault tip locations at the surface. Although this value gives an overall gauge of the orientation of the fault, we note that the geometry of individual traces is preferable in modelling and that the fault average strike has uncertainties due to uncertainties regarding the locations of fault tips.  . Names were chosen including consideration of prevalence in the literature and geographic or political features to help identify them. Some main faults comprise only one fault and hence share the same properties, in these cases the main fault and fault will share a name. alternativeNames (String) -as some faults are referred to (either entirely or in part) by different names within the literature, here we provide a list of common alternative names used to describe the fault. Note that the alternative names may refer to different faults, main faults or traces by different authors too. This list is not exhaustive, but we hope it captures the majority of references to the faults and thus helps readers by identifying where fault names may have changed and to help them when searching for further information on the desired faults. mainFaultName (String) -this is a dynamic field, see explanation under Main Fault table. Each fault is assigned to a main fault. faultNumber (String) -this is a dynamic field, each fault is provided with a fault number showing its position along the main fault it has been assigned to. If there is only one fault for a main fault, then the fault number will be 1. Where a main fault has been interpreted as comprising multiple faults, the faults are ordered sequentially from one end to the other (for the central Apennines database this is generally ordered from northwest to southeast). Note there are some examples of main faults where not every fault is considered to be part of the main fault line shown at the surface, for example, the faults may be across strike from each other arranged as synthetic or antithetic structures to the main fault at the surface; in such cases the faults are assigned a number in the hundreds to differentiate them from the faults aligned along strike from each other. Note that for faults along strike from each other the main fault line may not simply be the individual sections amalgamated together so as to allow a single line representation of the fault, for example en echelon structures may be simplified. faultAverageSurfaceOrientation (Integer) -an approximate measure of the strike of the fault, measured in degrees from north, taken as the angle between the fault tip locations at the surface. Although this value gives an overall gauge of the orientation of the fault, we note that the geometry of individual traces is preferable in modelling and that the tip-to-tip strike of the fault has uncertainty due to uncertainties regarding the locations of the fault tips. faultLocationScale (Integer) -see TraceLocation table. This provides a gauge on how certain the location of the fault is. The score is derived from the observations in the Trace table, being the lowest level of certainty of the traces that form the fault (excluding branches, splays, synthetic or antithetic structures).
www.nature.com/scientificdata www.nature.com/scientificdata/ faultActivityScale (Integer) -see TraceActivity table. This provides a gauge on how certain the activity of fault is during the Late Pleistocene-Holocene. The score is derived from the observations in the Trace table, being the highest level of activity of the traces that form the fault (including branches or splays, synthetic or antithetic structures). mainFaultOptionGroup (String) -this field shows whether the main fault a fault belongs to is affected by a main fault option. If there is a one to one or many to one mapping from the fault to the main fault then the field appears as "NULL". However, if there is a many to one or many to many, i.e. that the main fault the fault belongs to can change depending on the main fault option selected, then the main fault option group (a capital letter) is listed.
Trace. The trace table provides a list of individual fault traces, the faults and main faults they have been assigned to, their location and activity certainty, and the attributes that have been considered to determine these. The traces are provided in the order of the faultName and traceNumber (see below). A trace is the lowest level of observation and is based on primary observations. traceKey (Integer) -a unique identifier for each trace, presented in increasing integer order. traceName (String) -a name assigned to each trace to help identify it. A combination of geographical and political locations as well as natural and man-made features has been used.  Table. traceActivityScale (Integer) -see TraceActivity table. This provides a gauge on how certain the activity of the trace is during the Late Pleistocene-Holocene. The score is derived from the observations in the Trace Table. slipSense (String) -the type of slip experienced in the Late Pleistocene -Holocene along the fault -i.e. normal, strike or reverse bedrockFaultScarpGroundTruthed (Integer) -a significant proportion of the trace has a bedrock fault scarp exposed, or bedrock fault scarps are exposed at regular intervals along the trace, these have been checked directly (on site). This field is displayed as "1" if the observation has been made and "0" or NULL if not. sharpQuaternaryTectonicScarpGroundTruthed (Integer) -a significant proportion of the trace has a sharp Quaternary tectonic scarp, or scarps are exposed at regular intervals along the trace, these have been checked directly (on site). This field is displayed as "1" if the observation has been made and "0" or NULL if not. bedrockFaultScarpIdentifiedThroughRemoteSensingOrSatelliteImagery (Integer) -a significant proportion of the trace has a bedrock fault scarp exposed, or bedrock fault scarps are exposed at regular intervals along the trace, these have been checked through remote techniques, but not necessarily in situ. This field is displayed as "1" if the observation has been made and "0" or NULL if not. sharpQuaternaryTectonicScarpIdentifiedThroughRemoteSensingOrSatelliteImagery (Integer) -a significant proportion of the trace has a sharp Quaternary tectonic scarp, or scarps are exposed at regular intervals along the trace, these have been checked through remote techniques, but not necessarily in situ. This field is displayed as "1" if the observation has been made and "0" or NULL if not. changeInSlope (Integer) -there is a change of slope identified along the trace. This can have been identified in situ or through remote techniques such as using a DEM. This field is displayed as "1" if the observation has been made and "0" or NULL if not. incisedDrainage (Integer) -incised drainage has been identified along the trace either in situ or via remote techniques such as satellite imagery or using a DEM. This field is displayed as "1" if the observation has been made and "0" or NULL if not. changeInSlopeAndIncisedDrainageAndAlongFaultGroundTruthedScarpOrPrimarySurfaceRuptures (Integer) -a change in slope and incised drainage have been identified along the trace, and on other traces belonging to the same fault (only including along-strike traces, i.e. only those with the same hundred number in the trace-Number), that a ground-truthed scarp (bedrock or other Quaternary) or primary earthquake surface ruptures have been identified. This field is displayed as "1" if the observation has been made and "0" or NULL if not. geophysicalData (Integer) -data from geophysical equipment such as ground penetrating radar or seismic reflection profiling has been used to locate the fault trace at particular site(s). This field is displayed as "1" if the observation has been made and "0" or NULL if not. realTimePrimaryEarthquakeSurfaceRupturesDirectlyObservedInField (Integer) -primary earthquake ruptures have been identified through direct observation (e.g. during a field campaign) within good time following an event. This field is displayed as "1" if the observation has been made and "0" or NULL if not. earthquakeDisplacementFromInSAROrSimilar (Integer) -primary earthquake ruptures have been identified through remote sensing such as InSAR within good time following an event. This field is displayed as "1" if the observation has been made and "0" or NULL if not. (2021) 8:87 | https://doi.org/10.1038/s41597-021-00868-0 www.nature.com/scientificdata www.nature.com/scientificdata/ palaeoseismicTrench (Integer) -tectonic offsets have been identified in at least one trench site along the trace, with some form of dating constraining the offset to within the Late Pleistocene -Holocene. This field is displayed as "1" if the observation has been made and "0" or NULL if not. cosmogenicDating (Integer) -cosmogenic isotope exposure dating has identified tectonic offset at least one site along the trace with the modelled data constraining offsets to within the Late Pleistocene -Holocene. This field is displayed as "1" if the observation has been made and "0" or NULL if not. offsetLatePleistoceneHoloceneDepositsMesauredThroughTopographicOffsets (Integer) -tectonic offsets have been identified, the ages have been constrained through regional dating such as dating of a palaeosurface. This field is displayed as "1" if the observation has been made and "0" or NULL if not. datedOffsetAlongFaultWithLocalOrRegionalDating (Integer) -along other traces belonging to the same fault (only including along-strike traces, i.e.only those with the same hundred number in the traceNumber), tectonic offsets have been identified; these will have either local dating or dating constrained through regional markers. This field is displayed as "1" if the observation has been made and "0" or NULL if not.
TraceLocation. The trace location table contains the scale used to describe how certain the location of the trace is. slipRatePreferredMmYr (Number) -the "preferred" slip-rate expressed in mm yr −1 . This is the slip-rate calculated using the author's most likely slip and most likely time window over which the slip was accumulated. Note the "preferred" value is interpreted to be the mean value, unless an explicit "preferred" value is reported. Calculated as slipPreferredM/timeWindowPreferred, unless a direct measurement for slip-rate itself is provided. Note the slip-rate has to be defined at a specific location. slipRateMinimumMmYr (Number) -the "minimum" slip-rate expressed in mm yr −1 . This is the slip-rate calculated using the author's minimum slip (ie "preferred" or "mean" slip minus the error) and maximum time window over which the slip was accumulated (i.e. the time window + error in the time window; this is calculated as the time between the oldest older age and youngest younger age deemed possible). Calculated as slipMini-mumM/timeWindowMaximum, unless a direct measurement for the minimum slip-rate itself is provided. Note the slip-rate has to be defined at a specific location. slipRateMaximumMmYr (Number) -the "maximum" slip-rate expressed in mm yr −1 . This is the slip-rate calculated using the author's maximum slip (ie "preferred" or "mean" slip plus the error) and minimum time window over which the slip was accumulated (i.e. the time window -error in the time window; this is calculated as the time between the youngest older age and oldest younger age deemed possible). Calculated as slipMaximumM/ timeWindowMinimum, unless a direct measurement for the maximum slip-rate itself is provided. Note the slip-rate has to be defined at a specific location. See coordi-nateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. yCoordinateUTM (Number) -the y UTM coordinate of the site where the measurement was taken. See coordi-nateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. gridUTM (String) -the UTM grid box of the UTM coordinates.
www.nature.com/scientificdata www.nature.com/scientificdata/ xCoordinateWGS84 (Number) -the longitude coordinate of the site where the measurement was taken in decimal degrees in the WGS 84 system. See coordinateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. yCoordinateWGS84 (Number) -the latitude coordinate of the site where the measurement was taken in decimal degrees in the WGS 84 system. See coordinateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. coordinateErrorM (Number) -the estimated error, measured in metres, of the location (ie in the x and y coordinate) coordinateSystemKey (Number) -specifies the original coordinate system used to measure the location -see Coordinate System Table. techniqueKeyThrowSlip (Integer) -the method used for measuring the throw or slip. See Technique Table. referenceKeyThrowSlip (String) -the publication from which the throw or slip measurement has been taken. See Reference Table. techniqueKeyDate (Integer) -the method used for measuring the time period over which the throw and/or slip occurred. See Technique Table. referenceKeyDate (String) -the publication from which the time period has been taken. See Reference Table. slipPreferredM (Number) -the "preferred" slip expressed in metres. Note the "preferred" value is interpreted to be the mean value, unless an explicit "preferred" value is reported. See originalMeasurement to determine whether the slip was measured directly, or calculated from the throw measurement. The younger and older age between which the slip occurred is shown in the Age columns. slipMinimumM (Number) -the minimum slip expressed in metres. See originalMeasurement to determine whether the slip was measured directly, or calculated from the throw measurement. slipMaximumM (Number) -the maximum slip expressed in metres. See originalMeasurement to determine whether the slip was measured directly, or calculated from the throw measurement. throwPreferredM (Number) -the "preferred" throw expressed in metres. Note the "preferred" value is interpreted to be the mean value, unless an explicit "preferred" value is reported. See originalMeasurement to determine whether the throw was measured directly, or calculated from the slip measurement. The younger and older age between which the throw occurred is shown in the Age columns. throwMinimumM (Number) -the minimum throw expressed in metres. See originalMeasurement to determine whether the throw was measured directly, or calculated from the slip measurement. throwMaximumM (Number) -the maximum throw expressed in metres. See originalMeasurement to determine whether the throw was measured directly, or calculated from the slip measurement. youngerAgePreferredKA (Number) -the "preferred" age for the younger age of the time window over which the slip or throw was measured, measured in thousands of years before present. Note the "preferred" value is interpreted to be the mean value, unless an explicit "preferred" value is reported. youngerAgeMinimumKA (Number) -the youngest age for the younger age of the time window over which the slip or throw was measured, measured in thousands of years before present. youngerAgeMaximumKA (Number) -the oldest age for the younger age of the time window over which the slip or throw was measured, measured in thousands of years before present. olderAgePreferredKA (Number) -the "preferred" age for the older age of the time window over which the slip or throw was measured, measured in thousands of years before present. Note the "preferred" value is interpreted to be the mean value, unless an explicit "preferred" value is reported. olderAgeMinimumKA (Number) -the youngest age for the older age of the time window over which the slip or throw was measured, measured in thousands of years before present. olderAgeMaximumKA (Number) -the oldest age for the older age of the time window over which the slip or throw was measured, measured in thousands of years before present. timeWindowPreferredKA (Number) -the "preferred" time window over which the slip or throw was measured, measured in thousands of years. Calculated as olderAgePreferredKA -youngerAgePreferredKA. timeWindowMinimumKA (Number) -the minimum time window over which the slip or throw was measured, measured in thousands of years. Calculated as olderAgeMinimumKA -youngerAgeMaximumKA. timeWindowMaximumKA (Number) -the maximum time window over which the slip or throw was measured, measured in thousands of years. Calculated as olderAgeMaximumKA -youngerAgeMinimumKA. originalMeasurement (String) -states whether the original measurement stated in the reference was a throw or slip. throwSlipConversionMethod (Number) -refers to the technique used to convert from throw to slip or slip to throw. See technique table.
LocalGeometryKinematics. The local geometry and kinematics table provides local surface slip vector, strike and dip measurements and the locations at which they were measured. Note only direct point measurements are included. Geometry measurements are associated with individual traces and grouped by fault. There are no data included regarding the depth of the fault or how surface measurements may project to depth as we do not consider that currently there are sufficient constraints for this to be included; a modeller will need to add their own assumptions. www.nature.com/scientificdata www.nature.com/scientificdata/ traceNumber (Integer) -obtained from the trace name, see Trace table. faultNumber (Integer) -this is a dynamic field obtained from the fault name, see Fault table. strike (Integer) -strike measurement at the location specified surfaceDip (Integer) -dip measurement at the location specified, measured at the surface slipVectorAzimuth (Integer) -slip vector azimuth at the location specified, measured at the surface slipVectorPlunge (Integer) -slip vector plunge at the location specified, measured at the surface xCoordinateUTM (Number) -the x UTM coordinate of the site where the measurement was taken. See coordi-nateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. yCoordinateUTM (Number) -the y UTM coordinate of the site where the measurement was taken. See coordi-nateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. gridUTM (String) -the UTM grid box of the UTM coordinates. xCoordinateWGS84 (Number) -the longitude coordinate of the site where the measurement was taken in decimal degrees in the WGS 84 system. See coordinateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. yCoordinateWGS84 (Number) -the latitude coordinate of the site where the measurement was taken in decimal degrees in the WGS 84 system. See coordinateSystemKey that informs whether this was the original coordinate system used to determine the location or whether it has been derived. coordinateErrorM (Number) -the estimated error, measured in metres, of the location (ie in the x and y coordinate) coordinateSystemKey (Number) -specifies the original coordinate system used to measure the location -see Coordinate System table. techniqueKeyGeometry (Integer) -the method used for measuring the slip vector or surface fault geometry. See Technique Table. referenceKeyGeometry (String) -the publication from which the measurements have been taken. See Reference Table. Earthquakes. The earthquakes table provides a summary of references to instrumental, historical and palaeoearthquakes along the traces. Note the list is not exhaustive. CoordinateSystem. States the coordinate system that relates to the coordinate system key, which is used in point data tables to identify the original coordinate system used for locating data collection sites and the technique for converting between the original coordinate system of the measurement and those displayed in the data tables. coordinateSystemKey (Integer)-unique identifier for each coordinate system of the original location reported in the reference originalCoordinateSystem (String) -the coordinate system corresponding to the coordinateSystemKey finalCoordinateSystem (String) -the coordinate system that the original coordinates were converted to coordinateConversionTechnique (String) -the method used for converting the coordinates  Table 3 provides the percentage of traces, faults and main faults with the different levels of location certainty and activity. The activity level for each main fault is assigned as the highest from among its constituent faults and fault activity is assigned as the highest from among its constituent traces, so that all evidence of activity is included in defining the activity of the fault. Of the 43/44 main faults included within any set of main fault selections, 38/39 have confirmed Holocene -Late Pleistocene activity (Activity Level 1 or 2) (44 of the 49 possible main faults listed). The remaining 5 main faults do not have confirmed activity in the Late Pleistocene-Holocene; they are included to encourage further study and so their potential activity can be included in hazard models. As the assigned location certainty for each main fault is the lowest level from among its constituent faults (and likewise for faults from among its traces), a main fault and fault will have low location certainty even if parts of them are well-constrained if there are traces along a fault which are less-well constrained, as is common towards the surface tips.
Main faults in the region vary in length from 6.0 km up to 44.4 km or 6.5 km up to 46.9 km depending whether length is measured as a tip-to-tip length, or if the detailed surface geometry is considered (Fig. 4a). Faults vary in length from 1.2 km up to 44.4 km or 1.3 km up to 46.9 km for tip-to-tip and detailed surface geometry respectively. Errors in the main fault and fault lengths are hard to define as surface fault tips and the lateral extent of total displacement are difficult to locate with precision, especially for faults with low slip-rates and small total offsets relative to the thickness of geological units. Table 4 provides a summary of the proportion of traces, faults and main faults that have at least one point measurement of (a) throw and/or slip-rate, (b) strike, (c) surface dip, (d) slip vector azimuth, and (e) slip vector plunge.
The database contains 149 point measurements of throw-rate and/or slip-rate. These are distributed among 61 traces along 40 faults and 31 main faults. Therefore, 20% of traces, 46% of faults, and 70-72% of main faults have the slip-rate and/or throw-rate constrained in at least one location. Measured preferred throw-rates and slip-rates at individual sites vary up to 1.7mmyr −1 and 2.0mmyr −1 respectively (Fig. 4b) and the mean measured maximum throw-rate is 0.7 mmyr −1 on both main faults and faults. There is a modest correlation (R 2~0 .4 depending on mainFault selections) between the maximum measured throw-rates and slip-rates and the lengths of the main faults (Fig. 4c,d). Note there are some further constraints on slip-rates from the literature, but the database only contains data that meet the criteria for inclusion.
The database contains 468 point measurements of strike. These are distributed among 57 traces along 39 faults and 31 main faults. Therefore, 18% of traces, 45% of faults and 70-72% of main faults have local strike constrained in at least one location. The modal group of individual strike measurements is 120° -135°, while for the surface tip-to-tip orientations it is 135°-150° for faults and main faults. 81%, 85% and 80% of the individual strike measurements, tip-to-tip surface main fault orientations and tip-to-tip surface fault orientations respectively lie between 105° and 165°, while 93%, 96% and 90% respectively lie between 90° and 180°, and hence can be described as having a SE strike (Fig. 4e).
The database contains 470 point measurements of surface dip. These are distributed among 56 traces along 39 faults and 31 main faults. Therefore, 18% of traces, 45% faults and 70-72% of main faults have the dip constrained in at least one location. The mean and median of the individual surface dip measurements are 56°, with a standard deviation of 10° and all measurements having a range between 31° and 85°. The weighted average mean dip for main faults in the region is 59°, with a standard deviation of 7°, the median is 60° and the range is 43°-74° (Fig. 4f).
The database contains 290 locations with the slip vector azimuth constrained. These are distributed among 77 traces along 40 faults and 32 main faults. Therefore, 25% of traces, 46% of faults and 73-74% of main faults have the local slip vector azimuth constrained in at least one location. The mean of the measured slip vector azimuths is 211°, with a standard deviation of 37° (Fig. 4e); the measured slip vector azimuths are consistent with dip slip faults with converging patterns of slip (Fig. 4g,h), although only a weak correlation (R 2~0 .3) is shown (note the correlation for Fig. 4g increases to 0.4 if only southwest dipping main faults are included).
The database contains 254 locations with the slip vector plunge constrained. These are distributed among 76 traces along 40 faults and 32 main faults. Therefore, 24% of traces, 46% of faults and 73-74% main faults have the local slip vector plunge constrained in at least one location. Individual measurements range between 23° and 82°, with a median value of 56°, mean value of 55° and standard deviation of 10° (Fig. 4f).  www.nature.com/scientificdata www.nature.com/scientificdata/

Usage Notes
The data can be extracted for use in fault-based hazard assessments, in particular the modeller has access to the primary data so has the potential to include detailed uncertainty analyses. The modeller can decide which level of mapping (main faults, faults, traces) to incorporate within their hazard model (Fig. 1a). At the main fault level, we have provided some alternate options where there are different interpretations of how to connect faults. The modeller will need to decide how they allow earthquakes to rupture, for example whether multi-fault ruptures are possible and if so, what are the rules governing such ruptures (Fig. 1b). Depending on model capability, the modeller will have to determine how to extrapolate slip-rate measurements along the main fault or and/or fault (Fig. 1c). How the data is used may be dependent on the scale of the model being created -for example whether local surface rupture and high precision are required or if it is a lower resolution hazard map, and whether the model is aiming to provide information on annual probabilities or investigating a worst credible event.
Examples of tools for fault-based SHA include SHERIFS 101 and FiSH codes 27 . To aid readers use the data for both these codes, we provide tools for extracting the data into the appropriate formats. (For an example of using the data for SHA input see 37 ).
The database structure can be used as a template for data reporting within the central Apennines and with possible adaptations for other regions. We suggest that data from field investigations should be reported such that the data are useful for and can be easily included into SHA calculations both within the capabilities of current models and in the future. Therefore, we advocate that all authors reporting fault data -in particular slip-rate and fault geometry -should do so in a manner that would be fit for incorporation in a database structure that we recommend herein. We suggest fault traces should be displayed with accompanying notes on the location and activity certainty and the evidence for these (in auxiliary files if required). We propose that all slip-rate and fault geometry measurements should be reported with their spot location details including precise coordinates, the primary data collected, the methods used, and the uncertainties in the data. Slip-rates should provide time periods over which they are estimated including the techniques and errors used to infer these. Such transparency will allow better incorporation of primary data uncertainties and the propagation of these uncertainties within hazard modelling. In turn, this can lead to better understanding of the uncertainty in risk calculations used in catastrophe modelling and risk assessments used by governments, civil protection and the insurance industry to manage and reduce risk to residents, industry and society as a whole.
The database should also be useful to those planning field investigations requiring fault data for purposes beyond SHA.
It is our intention to update the database with new information and expand the geographical area covered. Note new releases are unlikely to be more frequent than annually. The next release will likely include (1) additional areas to the northwest and southeast of the current area covered; (2) more details within the Earthquakes tables regarding individual events; and (3) more data input for current traces if received from contributors. Contributors will be acknowledged. There will be consideration as to whether to include slip-rate data that has not been provided with a specific grid reference, but can constrain the slip-rate along a fault.

Code availability
A MATLAB script (data2FiSH.m) for extraction of the data from the database in a ready-to-use format for the FiSH code and an accompanying readme text file (README_script_data2FISH.txt) with instructions for its use. These are contained within the "Data2FiSH" folder along with 2 supporting MATLAB scripts: "utm2deg.m" and "deg2utm.m". slip vector azimuths minus the slip projection direction (i.e. the difference between the slip vector and the dipslip slip vector azimuth of the fault assuming the dip-slip value is perpendicular to the tip-to-tip strike of the main fault) for the main fault versus normalised distances along main faults.  Table 4. Percentage of traces, faults and main faults that contain at least one point measurement of various parameters within the database. The range of values for the main faults arises from the different combinations of main fault selection options.