Gas fields and large shallow seismogenic reverse faults are anticorrelated

We investigated the spatial relationships among 18 known seismogenic faults and 1651 wells drilled for gas exploitation in the main hydrocarbon province of northern-central Italy, a unique dataset worldwide. We adopted a GIS approach and a robust statistical technique, and found a significant anticorrelation between the location of productive wells and of the considered seismogenic faults, which are often overlain or encircled by unproductive wells. Our observations suggest that (a) earthquake ruptures encompassing much of the upper crust may cause gas to be lost to the atmosphere over geological time, and that (b) reservoirs underlain by smaller or aseismic faults are more likely to be intact. These findings, which are of inherently global relevance, have crucial implications for future hydrocarbon exploitation, for assessing the seismic–aseismic behaviour of large reverse faults, and for the public acceptance of underground energy and CO2 storage facilities—a pillar of future low carbon energy systems—in tectonically active areas.

. Overview of the study area, showing the location of selected gas wells and seismogenic faults. (a) The study area, shown by a black polygon, was delineated following the structural scheme proposed by Mantovani et al. 17 . Gas reservoirs are selected from the ViDEPI database and lie mostly within the external foothill belt between the leading thrusts of the internal zone and the outermost thrusts in the Adriatic foreland basin. The empty boxes are the surface projections of Individual Seismogenic Sources from the DISS database: the 18 faults selected for this study are shown in orange, whereas all others are shown in brown. The elongated light brown polygons are Composite Seismogenic Sources from the DISS database. The blue box shows the location of panel (b). The white dashed rectangle indicates a 120 km-long coastal strip where seismic hazard is consistently higher than in structurally similar, adjacent areas of the Italian peninsula, but where there exist only fewer-than-average unproductive wells and virtually no productive wells (see text). Please refer to the "Methods" section for further details. More specifically, the northern and central Apennines hydrocarbon province is a result of the progressive thrusting and folding, hosting hydrocarbon reservoirs at the core of growing anticlines/antiforms. The whole Italian peninsula is one of the richest hydrocarbon-producing regions in southern Europe 13,14 . Major gas fields lie parallel to the structural trends, and structural traps usually occur all throughout the thrust belt, most commonly along its external boundary, in the adjacent foredeep basin, and in the Adriatic foreland ( Fig. 1). They concentrate in the Po Plain and in the northern and central Adriatic Basin, which are the targets of our investigation. Oil fields occur mostly in the western Po Plain, in the southern Apennines and in Sicily 15,16 .
Most of the Italian gas originates in Northern Adriatic reservoirs and is associated with two main source rock types. The older one is thermogenic gas-prone, the main gas pools being found in turbiditic sands occurring in the Apennines foothills 16 , whereas the younger one produces biogenic gas hosted in the outer Plio-Pleistocene foredeep domain and feeds the most important fields 14 . Biogenic gas is trapped in the highly efficient Plio-Pleistocene turbidite systems of the Apennines foredeep 18 and within synsedimentary traps on both the inner and outer flanks of the folds 19 .
The occurrence of the 20 and 29 May 2012 thrust faulting earthquakes in the southern Po Plain (M w 6.1 and 5.9, respectively) prompted a debate on the potential triggering role of hydrocarbon exploitation 20 . Mucciarelli and coworkers 21 investigated the 2D spatial relationships between known seismogenic sources and the distribution of 455 gas fields in a ∼ 10,000 km 2 portion of the central-southern Po Plain, and proposed the existence of an anticorrelation between productive reservoirs and the presence of relatively large seismogenic faults, i.e. faults capable of Mw 5.5 + earthquakes.
We aim at substantiating these hypotheses and results by investigating a four times larger-and unique, to our knowledge-sample of gas wells, occurring over a five times larger area, and a twofold number of seismogenic faults; we analysed them using a more detailed metric based on fully 3D distances between wells and faults and through a more robust statistical approach.

Results
We explored in a GIS-environment the spatial relationships among 1651 wells drilled for gas exploitation and 18 known seismogenic faults ( Fig. 1a and Table 1: see also the "Methods" section) occurring over an area of over 50,000 km 2 . We first calculated both minimum planar (2D) distances, from the well head to the closest point of a 2.0 km buffer drawn around the surface projection of each seismogenic fault, and minimum three-dimensional (3D) distances, from the well bottom to the closest point on the actual fault plane. The 2.0 km size of the buffer Table 1. Main parameters of Individual Seismogenic Sources falling within our study area. Summary of the seismogenic faults occurring in the study area, showing also the number of wells falling within the surface projection of each source (see the "Methods" section). All earthquake magnitudes (M w ) are from the CFTI5Med catalogue (http:// stori ng. ingv. it/ cfti/ cfti5/), except for the 20 and 29 May 2012 events, whose magnitude is from the CPTI15 catalogue (https:// emidi us. mi. ingv. it/ CPTI15-DBMI15/). www.nature.com/scientificreports/ zone corresponds to 20% of the average length of the selected ISSs: we assumed it as a reasonable measure of the variability in the location of the rupture plane, in the absence of firmer constraints on the actual uncertainty. We then used the Weight of Evidence method (WofE 22 ), a bivariate statistical analysis based on the Bayesian probability framework, to detect any statistically significant spatial correlation/anticorrelation between faults and productive/unproductive gas fields ( Fig. 1b: see the "Methods" section for details). A positive C implies a good spatial correlation (the higher the value, the larger the correlation) between the well location and the fault plane; a C around 0 indicates a non-significant correlation; a negative C indicates absence of correlation. The Studentised contrast C/S(C), where S(C) is the standard deviation of C, provides insight into the statistical significance of the results (please refer to the "Methods" section for the other definitions used in the following). Table 2a and Fig. 2a, b summarise the results obtained using the training dataset for the eight adopted distance bins in the 2D distance case.

2D distances.
For productive wells, C values are negative for distances < 5 km (down to − 2.075); they then increase, reaching a maximum in the 20-30 km bin, and become negative again for distances > 50 km. These results indicate that productive wells generally locate rather far from seismogenic faults but not beyond 50 km, a distance that may exceed the half-width of the hydrocarbon province investigated in this work. Table 2. Summary of the results obtained through the Weight of Evidence test. Summary of the results obtained for the 2D (a) and 3D (b) distance scenarios, respectively for productive and unproductive wells.

Distance bin (km)
Contrast weight (C) C/S (C) www.nature.com/scientificreports/ In marked contrast, for unproductive wells the largest C values (1.233) and the largest Studentised contrast (9.077) are found for the first distance bin (0-2 km). C drops quickly for increasing distance ranges, becoming negative (− 1.683: C/S(C) − 9.847) for a distance > 50 km. These results indicate that the wells falling very close to a seismogenic fault are likely to be unproductive. Table 2b and Fig. 2c, d summarise the results obtained using the training dataset for the eight adopted distance bins in the 3D case.

3D distances.
The results we obtained for productive wells indicate a non-significant correlation (C = 0.058) for the shortest distance bin (< 4 km); the ratio between the contrast weight and the standard deviation (0.180) for the closest bin of the 3D distances, however, indicates that an interpretation of the results should not be attempted. C is negative in the 4-6 km and 6-10 km bins (− 1.163 and − 0.493) and becomes positive C for distances > 10 km, implying that productive wells are generally drilled > 10 km from seismogenic faults. For distances > 50 km we found a negative C (− 1.930), as in the 2D case. The results suggest no statistical correlation, or a negative correlation, between the location of the productive wells and that of seismogenic faults.
As for unproductive wells, the largest C (1.177) is found for the shortest distance bin (< 4 km). C then decreases up to a 10 km distance, increases again in the distance range 10-20 km, and drops beyond 50 km, becoming fully negative. www.nature.com/scientificreports/ Success-rate curves. As detailed in the "Methods" section, we calculated the success-rate curve using our training dataset, then used the testing dataset for estimating the prediction-rate curve (Fig. 3).
In the 2D case, 80.3% and 78.3% of the data in the training dataset are found in the 20% and 35% largest C bins, respectively for productive and unproductive wells. The Area Under the Curve (AUC: see the "Methods" section for its exact definition) estimated for the success-rate curve is very good for productive wells (92.5%), and good for unproductive wells (80.0%).
In the 3D case, 85.5% and 68.8% of the data in the training dataset are found in the 20% and 30% of the largest C bins, respectively for productive and unproductive wells. The AUC estimated for the success rate curve is very good for productive wells (90.2%), and good for unproductive wells (77.2%).
In both the 2D and 3D cases the AUCs calculated from the prediction-rate curves for the testing dataset are similar to those estimated based on the training dataset, suggesting that the model performs well in predicting the typology of the productive/unproductive nature of a well depending on its 2D/3D distance from a seismogenic fault.
The results of our statistical analysis show a clear anticorrelation between the spatial distribution of productive wells and of seismogenic faults, whereas the correlation between unproductive wells and faults is more tenuous, albeit still statistically significant. In our opinion this circumstances reflect a specific characteristic of the data that will be elucidated later on.
Overview of the results. Unlike other applications of the WofE method, in this work we had to deal with two widely independent sources of uncertainty: (a) the incompleteness of the dataset of gas reservoirs, mainly concerning their depth, and (b) the inevitably limited and fragmentary knowledge on seismogenic faults; their location and geometry generally rely on good quality geological and exploration seismology data, but their activity is documented by earthquakes that are difficult to assign to a specific causative fault as they occurred up to four centuries ago.
Nevertheless, our analyses document a statistically significant relationship between the location of productive/unproductive gas wells and the occurrence of seismogenic faults deemed capable of M w 5.5 + earthquakes.
More specifically, the WofE shows that the highest spatial correlation between the vast majority of productive wells and the selected seismogenic faults is obtained for a 10-50 distance: in 402 out of 499 cases (80%) for the 2D distance scenario, and in 405 out of 499 (81%) for the 3D scenario. These figures imply that the existence of a productive well above an identified seismogenic source is a statistically rare occurrence throughout our study region. Conversely, unproductive wells show a high spatial correlation with the location of seismogenic faults, although-somewhat unexpectedly-the results of the WofE method for 2D distances are more robust than those obtained using 3D distances. These results do support our working hypotheses, but their interpretation requires a closer inspection of the geographical distribution of the data. . Success-rate curves and prediction-rate curves for the 3D scenario. Success-rate curves (in red for productive and black for unproductive) and prediction-rate curves (in orange for productive and blue for unproductive) of C distributions calculated for the 3D distance scheme. www.nature.com/scientificreports/ In summary, productive wells generally locate away from large seismogenic faults, whereas unproductive wells are often associated with them. The only significant exceptions reported for the 2D scenario are 14 productive wells lying in the class 0-2 km (Table 2a, training dataset only), 11 of which are located within the surface projection of the individual source ITIS100 Bagnacavallo, two fall within the ITIS141 Argenta, and one falls within the ITIS070 Offida (Fig. 1, Table 1). In the 3D scenario we find only 10 wells falling within 4 km of the fault.

Discussion
Statistical significance of the data and of the results. Our elaborations show that for productive wells, large C values are obtained only for relatively large well-to-fault distances, whereas unproductive wells correlate with seismogenic faults primarily for short distance ranges, but exhibit relatively high C values also for more distant ranges, up to 30 km. Also, productive wells tend to cluster in space, whereas unproductive wells exhibit a more scattered distribution over the whole study area (this is clearly seen in Figs. 1, 2 and S3-S6). These two conditions are a clear indication that both productive and unproductive wells may not provide a totally reliable portrait of their own distribution.
Why does this happen? Generally, once a newly drilled well is found productive, additional wells are drilled around it; conversely, if that well is found unproductive, the exploration generally stops, as further activity nearby is unlikely to be economically attractive. The result of this quite understandable imbalance is that, at least in our study area, the exploration has carefully circumvented large areas that in principle were suitable for hydrocarbon accumulation, based on their geological setting, but had already shown to be unproductive. This is the case of the nearly 120 km-long coastal strip stretching from Rimini to Ancona, a region where seismic hazard is consistently higher than in structurally similar, adjacent areas (http:// zones ismic he. mi. ingv. it/), but where there exist only a handful of unproductive wells and virtually no productive wells (Figs. 1, 2).
The combination of these circumstances shows (a) that the wells are not randomly distributed in space, as an ideal statistical analysis would require, and (b) that certain "extreme" areas, where the absence of productive reservoirs has been known for decades, are severely undersampled. Nevertheless, we maintain that this inevitable imbalance of our sample is at least partially compensated by its richness.
Our elaborations also show that productive gas wells-and therefore, major gas reservoirs-occasionally occur above a seismogenic fault. This departure from their expected behaviour could be related to local heterogeneities with respect to the predominant Plio-Pleistocene stratigraphic setting of the Apennines fold and thrust belt and foredeep. Such variations could indeed occur at a local scale, but the dataset we analysed reveals the nearly exclusive occurrence of Plio-Pleistocene gas-charged or sterile sandy-silty sedimentary successions 15,18,19 .
Aside from stratigraphic considerations, the occurrence of a gas reservoir above a seismogenic fault may simply indicate that the fault is poorly located, for example because it is deeper than assumed in the DISS database, or alternatively, that there is a flaw in the physical behaviour implied in the proposed anticorrelation between faults and productive wells. For ITIS100 and ITIS141 the first scenario is more likely, as these two sources are held responsible for earthquakes that occurred over three centuries ago; for ITIS070 one might consider that the productive well overlying it reaches a depth of 2.5 km, whereas the fault is assumed to extend between 4.5 and 8.7 km depth.
Since all fault data are inherently uncertain within a few km, and in consideration of the large number of data analysed in this work, we consider that the trend highlighted by the statistical analysis, and confirmed by the additional tests described in the "Methods" section and shown in the Supplementary Information, describes well a common behaviour. Individual outliers will of course deserve future, more focused investigations.

Structural interpretation of the results.
Having established the statistical significance of the results, the next question is: what relates earthquakes and hydrocarbon reservoirs? Unquestionably, the success or failure of hydrocarbon exploration in a fold and thrust belt is primarily controlled by the relative timing of source rock maturation, hydrocarbon migration and trap development. The outcomes of our study, however, strongly suggest that an equally crucial pre-requisite for the formation of an efficient gas reservoir is that the deposits overlying the reservoir-bearing formations be unaffected by active fractures and faults which might allow fluids to escape to the atmosphere. This condition is not warranted in earthquake-prone areas 21 , where sudden (seismogenic) fault slip may exceed the ductility of the sealing layers, causing brittle failure and fluid pressure breach.
A close inspection of Fig. 1-and of the ample literature that lies behind it (see the Introduction)-shows that most productive reservoirs are hosted in small-scale anticlines (Fig. 4a), generated by faults that are shorter and narrower with respect to the deep and large faults driving long-wavelength folds that may generate significant earthquakes, and where gas is generally not found (Fig. 4b). In the DISS database the larger and positively seismogenic faults responsible for creating potentially large folds are listed as Individual Seismogenic Sources (ISSs). Smaller folds/reservoirs are either delineated as Composite Seismogenic Sources (CSSs), implying that they may generate M w 5.5 + earthquakes but their ability to slip seismically vs. aseismically is currently unknown, or are not mapped at all, implying that they are considered just too small to generate M w 5.5 + earthquakes.
Mucciarelli et al. 21 contended that if the fault that controls the evolution of a potential hydrocarbon trap generates a significant earthquake, the integrity of the sealing horizons will inevitably be jeopardised. They also stressed that any subsequent gas loss is not caused by the earthquake shaking per se, but rather by the actual fault slip and its consequences. A M w 5.5 + earthquake is necessarily the result of a few tens of cm of coseismic slip of a fault that may extend over a considerable thickness of the upper crust-generally between 2-3 and 8-10 km depth in our study area-and potentially up to the surface; larger events will cause larger slip over a bigger fault, whereas the slip associated with smaller events will simply be too small and too deep to affect the integrity of the sealing horizons. www.nature.com/scientificreports/ Coseismic strain may propagate upward following different mechanisms, including the presence of overpressured fluids, leading to the creation or rejuvenation of fractures and faults that may immediately act as pathways for fluid escape. Such mechanisms are compatible with the limited distance we calculated between the location of unproductive wells and the location of known seismogenic faults-less than 2 km for the 2D case, less than 6 km for the 3D case (see Tables 2 and S2)-even taking into account the uncertainties discussed earlier. As we stated earlier, pushing our interpretations to areas where only earthquakes smaller than M w 5.5 are expected would be inappropriate, as the coseismic strain caused by their causative faults would simply be too small to activate the mechanisms advocated in this work.
Notice that the crustal volume located at the tip of a thrust fault has long been known to be characterised by extensive fracturing, allowing for an efficient migration of fluids toward the surface [24][25][26][27][28] . The density of fractures and the wide range of their orientation is typical of the fault damage zone, a volume of intense rock deformation resulting from the initiation, propagation and build-up of slip along the main underlying fault [29][30][31][32] .
All of these processes are well described in the literature and have been extensively investigated with analogue and analytical models, but they have seldom been observed in the field. A notable exception is discussed by Sciarra et al. 33 , who investigated in detail the release of underground fluids, including CH 4 , following the 29 May 2012, M w 5.9 earthquake near Mirandola (southern Po Plain). The event was generated by a thrust fault that lies beneath the Mirandola anticline and was assumed to be potentially seismogenic as early as 2000 in a prototype of the DISS database. At least six unproductive wells (Bignardi, Camurana, Cavone, Concordia, Medolla, San Biagio) lie on the surface projection of the fault, supporting the anticorrelation advocated in this work. The area located just northeast of the village of Medolla has been known at least since 1838 for the existence of gas macroseeps located exactly above the culmination of the buried Mirandola anticline (Fig. 5), as suggested by subsurface geology and topographic evidence and by the pattern of coseismic elevation changes following the 2012 earthquake 34 . The combination of this evidence with the data presented by Sciarra et al. 33 suggests that these macroseeps are the surface evidence of extrados normal faults associated with the buried Mirandola anticline (Fig. 5).
Their CH 4 flux measurements show that the sealing horizon of the local reservoir is partially efficient during the interseismic period, but fails catastrophically following an earthquake and the subsequent remobilization of the extrados faults, causing accelerated degassing for at least five years.
The case of Medolla illustrates a mechanism of progressive loss of underground fluid pressure, ultimately motivated by earthquake activity but not restricted to the coseismic phase. The leakage occurs along a vertical conduit, as observed also by Gartrell et al. 26 , as a result of the reactivation of smaller extrados faults by the underlying master and seismogenic fault.
Based on the number of wells that have been unsuccessfully drilled in the area one should conclude that the geological setting of the area is fully compatible with the existence of an effective hydrocarbon reservoir associated with the Mirandola anticline, but also that the integrity of this reservoir is long gone. www.nature.com/scientificreports/ Seismic hazard implications. Our dataset shows that in a fold and thrust hydrocarbon province the lack of productive gas reservoirs is likely to be controlled by seismogenic faulting. Conversely, the presence of significant productive reservoirs is in itself an indication of a predominantly aseismic behaviour of the underlying faults; in other words, the ability of a thrust fault to generate significant earthquakes depends primarily on the rheology of the basin deposits. Their behaviour ultimately controls the fate of the reservoirs formed at the core of the largest anticlines; favouring their integrity by promoting aseismic slip over the underlying fault, or causing gas to be lost to the atmosphere by promoting stick-slip. Alternatively, the presence of small-to mid-size reservoirs could be compatible with the presence of faults displaying stick-slip behaviour but capable of smaller earthquakes (< M w 5.5), i.e. faults whose width and slip-per-event are too small to jeopardise the reservoir integrity.
These results are consistent with the observations of Carafa et al. 7 , who showed that the seismic moment released by Italian contractional domains is about 50% of the total moment expected based on fault geometries and slip rates. Such conclusions may imply that Italian thrust faults alternate seismic and aseismic phases in equal proportions, but our findings suggest that this is an unlikely scenario. If a reservoir is unproductive, this is not necessarily an indication that the underlying fault is active, and even less so, that it may generate significant earthquakes, but the reciprocal still holds: if the reservoir is (or has been, before exploitation) fully productive, its controlling fault may be active but is very unlikely to be seismogenic, which is what counts in any seismic showing that the emissions fall on a local altimetric culmination and in the area of largest coseismic uplift. Such mechanism of continuous degassing is most likely due to extrados faulting; a mechanism that has been shown to dominate in the northern Apennines, at the western end of our study area 35,36 , and may be common elsewhere. We contend that a similar mechanism might explain many of our observations over the entire study area, and over any similar areas around the globe. See text for further discussion. Base map and profile courtesy of Google Earth Pro v. 7 www.nature.com/scientificreports/ hazard analysis. If no other indication of aseismic behaviour is available, for instance from GPS, this evidence alone could justify a significant seismic hazard reduction over thrust faulting systems that exhibit little or no long-term seismicity.

Worldwide analogs. Our hypotheses are qualitatively supported by other hydrocarbon-bearing fold and
thrust belts developed at convergent plate boundaries worldwide (see Figure 2 of Cooper 37 , and Figure 1 of Goffey et al. 38 ); they contain 14% of the global known reserves, half of which (49%) lie beneath the Zagros fold belt of Iran, Iraq, and Turkey 37 . We selected three specific actively deforming areas that share a comparable geological setting with our study area and reaveal, even at a rather low resolution, that an anticorrelation between the largest earthquakes and the location of gas fields does exist: the Aquitaine Basin (southwestern France), the Zagros-Persian Gulf area (southern Iran), and the Bengal Basin (Bangladesh-India boundary).
Aquitaine basin. This historical hydrocarbon province is located in the southwest of France, covering an area of about 35,000 km 2 (see Figure 1 in Bahnan et al. 39 ). It comprises several geological domains, including the Pyrenean active fold and thrust belt and its foreland, formed as a consequence of the subduction of the Iberian plate beneath the European plate 40 . To the north, the leading edge of the fold and thrust belt hosts the most important hydrocarbon field of metropolitan France, the Lacq gas field 41 . Gas was confined in a structural trap within the pre-Aptian carbonate sequence located in an anticline at a depth of 3500-4050 m, below the oil field, located at 650-700 m depth. Gas was exploited until 2013, when the field became one of the most important pilot sites for Carbon Capture and Storage (CCS 42 ).
Zagros fold and thrust belt. This large belt forms the external portion of the Zagros active orogenic wedge. It is the most recent result of the convergence and closure of the Neo-Tethys oceanic domain between Arabia and Eurasia 43 . In the Zagros fold and thrust belt the Arabian passive margin sequence has been decoupled from its basement and deformed by large-scale folding and thrusting. A rich seismicity record indicates that within the underlying Panafrican basement, shortening is presently accommodated by reverse faulting 5,44,45 .
The Zagros fold and thrust belt comprises one of the most prolific hydrocarbon provinces worldwide (see map by Esrafili-Dizaji and Rabbani 46 ). Most of the gas fields are concentrated in the Fars region and in the contiguous offshore area and accumulated in the Permo-Triassic carbonates, within the Kangan and Dalan formations, the most important carbonate reservoir rocks in the southwestern part of Iran 47,48 . Gas reservoirs lie at depths ranging from 2600 to 3600 m (e.g. Salman gas field 48 ).
Bengal basin. This large, highly productive and still largely unexplored hydrocarbon province lies on the eastern side of the Indian subcontinent, between the Shillong Plateau to the north, and the Indo-Burman Ranges to the east, in the Tripura district (e.g. Figure 2 in Brahma et al. 49 ). This basin originated during the collision of India with Eurasia and Burma, building the extensive Himalayan and Indo-Burman mountain ranges, and thereby loading the lithosphere to form flanking sedimentary basins 50 . The Bengal Basin is a prolific petroleumbearing basin where 25 economically-viable fields provide about 18% of India's known reserves. It contains up to 22,000 m of Cretaceous to Holocene sedimentary fill 51 , which includes 4000-5000 m of hydrocarbon-bearing Surma Group sandstones 52 , deposited as a consequence of major Miocene uplift in the Himalayas which funneled large volumes of sediments into the basin.
Unfortunately, none of these regions features rich, reliable and openly accessible databases such as ViDEPI and DISS. For this reason, our hypotheses here will need to be supported by targeted investigations. In Italy, hydrocarbon data have become available after a 60 years-phase of intense exploitation, and solid earthquake and seismogenic source data are regularly made available by INGV through an agreement with the Italian Civil Protection Department. Yet, the lesson learned from Italian data may be useful for scientists and hydrocarbon industry practitioners who are puzzled by the relations between the exploitation of gas reservoirs and the occurrence of significant earthquakes.
Understanding the relationships between gas-dominated hydrocarbon fields and seismogenic sources has crucial implications for three independent issues of great societal relevance: (a) supporting the decision-making in the exploitation of as yet unexplored gas pools; (b) offering crucial insight into the seismic hazard of areas undergoing active crustal shortening, and specifically on the seismic/aseismic ratio of the local strain release; and (c) providing guidelines for the identification of suitable underground energy and CO 2 storage sites, which represent one of the viable options for a gradual transition from fossil fuels to a carbon-neutral economy 53 . Our findings indicate that the best option in planning such facilities is to stay away from large seismogenic faults and opt for a depleted reservoir, as choosing a reservoir whose past performance is unknown would greatly increase the risk of dispersion of CH 4 (or CO 2 ) to the atmosphere. In fact, all existing Italian gas storage sites run by Stogit-SNAM and by Edison Stoccaggio SpA totally leave out the Rimini-Ancona stretch (white rectangle in Fig. 1), the highest seismic hazard portion of our study area.

Methods
To investigate any spatial correlation between the location of productive gas fields and the occurrence of strong earthquakes (M w 5.5 +) in our study area (Fig. 1) we adopted the Weight of Evidence (WofE) method, a bivariate statistical analysis based on the Bayesian probability framework 22  The DISS contains information on seismogenic faults located in the Italian peninsula and the surrounding regions. The information is supplied through two main categories of seismogenic sources 54 : Individual (ISSs), and Composite (CSSs). The ISSs are simplified (rectangular) 3D representations of seismogenic fault planes, generally associated with significant known earthquakes. For each ISS the database provides a full set of geometric (strike, dip, length, width, and depth), kinematic (rake) and seismological parameters (single event displacement, slip rate, recurrence interval), including the expected maximum earthquake size. In contrast, the CSSs are simplified 3D representations of crustal fault systems that are not segmented nor associated with one or more specific earthquakes: each CSS may hence contain an unknown number of ISSs. For this study we considered the eighteen ISSs falling within the study area (Fig. 1a), whose parameters (including the most recent earthquakes that can be associated with each of them) are summarised in Table 1. All ISS are also shown using a 3D perspective in Fig. 1b, which allows the reader to appreciate the relationships between their position and the location of the productive and unproductive gas wells, considered as proxies for the relevant reservoirs. The CSSs are not used in this work, but are shown in Fig. 1 to highlight the continuity of the main seismogenic trends.
The ViDEPI database of Italian hydrocarbon data. We carried out a comprehensive analysis of data from 1808 wells, a subset of a large body of information acquired since 1957 by several oil companies and made available by the Italian Ministry of Economic Development within the ViDEPI project (http:// www. videpi. com/). Current Italian regulations establish that the oil companies shall provide the Ministry with technical reports on the activities that have been carried out. The data, which were originally available only as hard copies, have been scanned, geo-referenced and distributed free of charge through the project website. The available data consist of composite logs containing the following information: (1) lithology, (2) geological formation name(s), (3) formation(s) age(s), (4) depths, (5) litho-stratigraphy, (6) fluid occurrence, (7) depositional environment of each formation, (8) biostratigraphy, and (9) geophysical logs. Pressure and temperature values are sometimes reported, but little or no information is provided concerning the actual exploitation history of the hydrocarbon.
We analysed in detail each well falling within the study area, gathering the parameters needed for our investigation and for assessing their reliability based on the quality of accompanying information. We first removed 49 poorly documented wells, 59 shallow gas wells (< 500 m) and 49 dominantly oil wells, as they do not provide relevant information for our scopes, thereby focusing exclusively on deeper gas wells. We then subdivided the remaining 1651 wells into two categories: (1) positively unproductive, when they encountered no exploitable hydrocarbons, and (2) positively productive, when they did encounter hydrocarbons, regardless of whether they have been or are currently being exploited.
Building a model for the GIS analysis. Information regarding the selected wells was loaded to ArcGis along with the 18 selected Individual Seismogenic Sources from the DISS database (out of a total of 127 ISSs in DISS' current version: Fig. 1a). Thanks to the information provided by the ViDEPI, all wells were first projected down to their actual maximum depth, then classified based on the criteria delineated above and on the depth of the exploited hydrocarbon pool (Fig. 1b).

Weight of Evidence (WofE) method. This method consists of a bivariate statistical analysis based on the
Bayesian probability framework 22 . It allows the association of potential evidence in support of a given hypothesis and is easy to run within common GIS software packages. We adopted it to evaluate the spatial relationships between the distribution of productive or unproductive gas wells and the location of major seismogenic faults.
The WofE method is commonly used in the geostatistic field, for example for identifying potentially productive mining areas, locating flowing wells, or mapping cliff instabilities associated with mine subsidence 57 . Recently it was used also for assessing landslide susceptibility [58][59][60] . The computational framework of the method 22,57 is briefly summarised below.
The method estimates the weight for evidence (W), which can be positive or negative: where P{F|E} and P F|E are the conditional probabilities of being and not being within the factor F, given the presence or the absence of the event E. P F|E and P F|E indicate the conditional probability of not being within the factor F, given the presence or the absence of the event E. The difference between the positive and negative weights is defined as the weight contrast C (= W+ − W−), which provides a measure of the spatial correlation between a certain factor and the occurrence of an event (see Tables 2, S2, and Figs. 2, S2). A negative C indicates the absence of a factor where an event occurs, whereas a C around zero implies no significant relationship with the occurrence of the event.
(1)  61 . Finally, the Studentised Contrast C/S(C) provides an assessment of the quality of the results.
To estimate the conditional probabilities of Eqs. (1) and (2) we first discretised the entire study area with a regular 10 × 10 m grid comprising 31,541 × 39,330 pixels (lines × columns). We then calculated the distance of each pixel to the nearest seismogenic fault, following two strategies: in 2D (planar), or in full 3D, which we consider more appropriate for investigating the motivations of the statistical evidence. These distances correspond to the factor F in Eqs. (1) and (2).
For the 3D case, we calculated the shortest distance from the bottom of each well to the surface of the nearest seismogenic fault plane. In contrast, for the 2D case we calculated the shortest distance between the head of each well and a point along the external boundary of a 2 km-wide buffer area encircling the surface projection of the nearest seismogenic fault. All calculated distances were then grouped into eight bins: 0-2, 2-5, 5-10, 10-20, 20-30, 30-40, 40-50, 10-50 and > 50 km for the 2D case; and 0-4, 4-6, 6-10, 10-20, 20-30, 30-40, 40-50, 10-50, > 50 km for the 3D case, respectively. Notice that the width of the distance bins was decided based on a series of trial-and-error tests, aimed at exploring any possible bias in the results due to the choice of the bin size. We maintain that the selected bin widths allow for a fine sampling of well-fault distances for the shorter distance values, which is crucial for any geological and geophysical interpretation of the results, without generating any undesired bias. Finally, each pixel was assigned to a group, depending on the above classification. Tables S1-S2 and Figs. S1-S2 show the distribution of gas wells and the spatial distribution of the different classes of pixels with respect to the binning scheme (Fig. S2 shows the 3D case only).
Based on Eqs. (1) and (2), the weights of evidence can be modified (when analysing one group at a time) in numbers of pixels, as in the following: where Npix 1 is the number of pixels-that is to say, wells-falling within the considered distance bin (e.g., the 0-2 km interval); Npix 2 is the number of pixels that correspond to a gas well (either productive or unproductive) but fall outside the considered distance bin; Npix 3 is the number of pixels related to a potential event predictive factor (selected group of distance) but without any wells; and Npix 4 is the number of pixels where neither the considered potential event predictive factor nor a well are observed.
The predictive capabilities of the proposed model were checked through a cross-validation procedure which consists in subdividing a data sample into two subsets 62 . The analysis is then performed on one subset, referred to as training dataset, which includes 60% of the wells, randomly selected from the whole available sample, while the validation is carried out on the remaining 40% of the data, referred to as test dataset. In order to cover the entirety of the training/validation data subsets, this random procedure was repeated ten times for the 2D case and 10 times for the 3D case: Table S3 summarises the variability of the C obtained for the ten tests and the associated standard deviation, which are encourangingly low.
Area under the curve test. The validity and the accuracy of the C distributions were tested using the success-rate and prediction-rate curves in combination with the Area Under the Curve (AUC): a method that provides a measure of the total accuracy based on the rate curves, where a total area equal to one indicates perfect accuracy 63 (Fig. 3). The success-rate curve, which is based on the training dataset, shows how good the selected parameter curves are in fitting the known wells (productive or unproductive). The prediction rate curve, which is based on the testing dataset, provides quantitative information for forecasting the position of a productive or unproductive well. The most common procedure involves sorting in descending order the calculated index values that refer to the total number of cells in the study area.
Moran's Index tests. Finally, we carried out both global and local Moran's Index tests, highlighting (a) that the dataset is clustered, and (b) that the unproductive and productive wells cluster differently as their location relative to the seismogenic faults changes (see Figs. S3-S6).

Data availability
All hydrocarbon data used in this work come from the ViDEPI project database (https:// www. videpi. com/ videpi/ videpi. asp). All seismogenic source data were obtained from DISS, the Database of Individual Seismogenic Sources, v. 3.3.0 (http:// diss. rm. ingv. it/ diss/). The underground storage facilities operated by Stogit-SNAM SpA