An updated and unified earthquake catalog from 1787 to 2018 for seismic hazard assessment studies in Mexico

Here we present a new, updated and unified Poissonian earthquake catalog for Mexico. We describe the catalog compilation, the removal of duplicate events, the unification of the magnitude scales, the removal of dependent events through declustering, and the completeness analysis. Earthquake and focal mechanism data have been compiled from various local, regional and international sources. The epicentral locations and magnitudes of large earthquakes (MW ≥ 6.5) have been carefully revised against trusted publications. Different magnitude-conversion relationships, compatible with available local and regional ones, have been established to obtain unified moment magnitude estimates for the whole catalog. Completeness periods for the declustered catalog were estimated to support the definition of appropriate seismic source models for the whole territory. The final unified Poissonian earthquake catalog spans from 1787 to 2018 and covers a spatial extent of 13° to 33°N and 91° to 117°W. This catalog is compatible with other published catalogs and provides a basis for new analyses related to seismicity, seismotectonics and seismic hazard assessment in Mexico.


Background and Summary
The occurrence of seismic events in numerous regions of the world, especially those resulting in loss of human life, has highlighted the urgent need to implement seismic design regulations specific to each region. Long-term earthquake hazard assessment is one of the most important tools for seismic risk mitigation and for reducing the financial and human losses related to such catastrophic events. In addition, the construction of early warning systems and public awareness of natural disasters are essential complementary actions.
The fundamental information necessary for any seismic hazard study is the most complete seismic record possible. This record, also termed the seismic catalog, should include at least the spatial coordinates of the epicenters, the times of occurrence and the magnitudes of the earthquakes that took place in the region of interest. The quality and homogeneity of this information are reflected directly in the final seismic hazard results. Earthquake catalogs, together with focal-mechanism catalogs that provide a deep understanding of the seismotectonic setting of the area of interest, are therefore essential for developing a reliable seismic source model. A seismic source model and a representative ground motion attenuation model that considers the local site conditions are the primary components required to carry out an appropriate seismic hazard study. Instrumental earthquake catalogs show the overall seismicity of the Earth since about 1904 (e.g., ISC-GEM catalog). However, examining the regional historical earthquakes, in addition to the instrumentally recorded events, is essential to understand the long-term seismicity.
Mexico is situated in one of the most active seismic belts of the planet. Its tectonic setting is highly complex. Most of the active seismic regions in and around Mexico are related to the interaction among five tectonic plates (Supplementary Figure I). One of the most important is the subduction of the Cocos and the Rivera tectonic plates beneath the North American plate along the Middle America Trench in the southern coast of Mexico.
Methods
Catalog compilation. During the past few decades, many researchers and institutions have made considerable efforts to improve earthquake catalogs (especially for the largest events) for specific regions and states in Mexico (e.g. 4-7, among many others). A number of local and national catalogs have been produced using different criteria and with different characteristics, time periods, data formats, and completeness intervals. One of the major objectives of this work is to develop a new, updated and unified earthquake catalog for Mexico based on the integration of international data sources, the SSN Mexican national network and any other related earthquake bulletins.
The first step towards the unified earthquake catalog was to survey all the available national and international data sources. The next step was to unify the format of the data collected from the different bulletins. All available parameters (e.g., origin time, geographic location, reported magnitude sizes and formats, and reference code for each data provider) have been included. The initial compiled data included all earthquakes with magnitudes equal to or greater than 4.0. The compiled catalog (including duplicated events) contained about 84,000 events. This work was a major challenge because of the huge number of reported earthquakes and the large differences in data format and quality among the data sources. The compiled data suffered from duplication, incompleteness, and errors in both the geographic locations and the focal depths. Considerable effort and time were necessary to evaluate and choose among the information from the different data sources (especially for events over M 6.5) and to remove the duplicate earthquake records. For historical events (before 1900) and large instrumental earthquakes (over M 6.5), previously available publications were inspected in detail to identify the most reliable locations (latitude and longitude), depths and magnitudes.
In the following, we list the different bulletins, catalogs and sources that have been used (arranged according to priority) in the compilation of our earthquake and focal mechanism catalogs.
Published peer-reviewed articles (for M ≥ 6.5 events). For historical events (before 1900) and large instrumental earthquakes (over M 6.5), previously available publications were inspected in detail to identify the most reliable locations (latitude and longitude), depths and magnitudes. The following published works have been inspected specifically for Mexican earthquakes:
• 31 and Ambraseys and Adams 32: The first paper presented the results of the Ms computation of Central American earthquakes for the period from 1898 to 1930, while the second paper discussed the re-examination of macroseismic information for large earthquakes (≥Ms 7.0) for the same region and for the time period from 1898 to 1994. They mentioned that the locations of the more important earthquakes were revised using a combination of macroseismic information and instrumental readings.
• Santoyo et al. 5: In this work, an estimation of the center of the rupture area of 24 shallow thrust earthquakes (Ms ≥ 6.9) was presented. This estimation was mainly based on their aftershock areas, or inferred from empirical relationships, e.g., Utsu and Seki 33 and Wells and Coppersmith 34. This useful information has been considered in the final catalog.
• Other references have been considered for specific regions during the compilation and revision of large earthquake events (M 6.5) in this work, for example for events that occurred in northwestern Mexico, in the region of Baja California 6,35,36. In addition, other references for the largest events along the Mexican subduction zone 30,37-41 were taken into account. Moreover, some global and regional catalogs 42,43 were also considered, in addition to the previously listed sources.
Catalog merging. When merging the previously mentioned earthquake data, collected from different catalogs and bulletins, it was crucial to avoid any duplication of earthquakes. The merged earthquake data list, for each event, its date (year/month/day), time (hour/minute/second), geographic location (longitude, latitude), depth (in km), and reported magnitudes (Mb, Ms, Mw, MD, and ML) (see Table 1). Different codes have been included to identify the source of each magnitude value for each event. Duplicated earthquakes were identified based on their geographic location and origin date/time, and the lower-priority records were then removed from the compiled catalog. This was done by carefully inspecting the records that correspond to the same event in the compiled catalog.
The merging process has been performed following the same criteria as in Sawires et al. 65-68. Records with a difference in origin time of less than one minute and a difference in location of less than one degree of latitude/longitude were identified as potential duplicates. All records satisfying these two conditions have been examined manually, case by case. Because the ISC bulletin uses earthquake data collected by different seismological networks all over the globe, its locations are generally taken as the basis for this work. For events not included in the ISC bulletin, the location provided by a local agency is used. However, the parameters (geographic coordinates and origin date/time) reported by local and national sources (especially for large events already studied and reported in published papers) have been preferred over those coming from regional or international sources.
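The two screening conditions above can be sketched in a few lines of Python. This is only an illustration and not the authors' code: the record layout (`id`, `time`, `lat`, `lon`), the function names and the sample values are hypothetical, and "less than one degree" is assumed to apply to latitude and longitude separately. The sketch only flags candidate pairs; the manual, case-by-case inspection described in the text is not automated.

```python
from datetime import datetime

def is_potential_duplicate(a, b, max_dt_s=60.0, max_deg=1.0):
    """True when two records differ by < max_dt_s seconds and < max_deg degrees."""
    dt = abs((a["time"] - b["time"]).total_seconds())
    dlat = abs(a["lat"] - b["lat"])
    dlon = abs(a["lon"] - b["lon"])
    return dt < max_dt_s and dlat < max_deg and dlon < max_deg

def find_duplicate_pairs(records, max_dt_s=60.0, max_deg=1.0):
    """Return (id, id) pairs that should be inspected manually."""
    ordered = sorted(records, key=lambda r: r["time"])
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Records are time-sorted, so stop once the time window is exceeded.
            if (b["time"] - a["time"]).total_seconds() >= max_dt_s:
                break
            if is_potential_duplicate(a, b, max_dt_s, max_deg):
                pairs.append((a["id"], b["id"]))
    return pairs

# Hypothetical records from two bulletins (values are illustrative only).
records = [
    {"id": "a", "time": datetime(2000, 1, 1, 12, 0, 0), "lat": 17.0, "lon": -100.0},
    {"id": "b", "time": datetime(2000, 1, 1, 12, 0, 25), "lat": 17.3, "lon": -100.4},
    {"id": "c", "time": datetime(2000, 1, 3, 6, 30, 0), "lat": 17.1, "lon": -100.1},
]
print(find_duplicate_pairs(records))
```

Sorting by origin time first keeps the comparison cheap for a catalog of tens of thousands of records, since each record is only compared with its neighbours inside the one-minute window.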
In terms of magnitudes (Table 1), the compiled earthquake data are described by a number of different magnitude scales. All these magnitude types have been included for the collected events, together with a specific code assigned to each magnitude source. The Mw has been preferred, followed by the Ms and Mb magnitudes. In some cases, more than one Mw value is available for the same earthquake from different sources. In such cases, the value from the Global CMT catalog has been chosen.
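The preference order described above can be written as a small selection routine. The sketch below is a hedged illustration: the dictionary layout, the function name and the source code `"GCMT"` are hypothetical labels rather than the codes used in the published catalog, and the fallback to the first listed entry when no Global CMT value exists is an assumption.

```python
PRIORITY = ("Mw", "Ms", "Mb")  # preference order stated in the text

def preferred_magnitude(reported, cmt_code="GCMT"):
    """Pick one magnitude from {scale: [(value, source_code), ...]}.

    Returns (scale, value, source_code), or None if no Mw, Ms or Mb is reported.
    """
    for scale in PRIORITY:
        entries = reported.get(scale) or []
        if not entries:
            continue
        if scale == "Mw":
            # Several Mw values may exist; the Global CMT one is preferred.
            for value, source in entries:
                if source == cmt_code:
                    return scale, value, source
        # Assumption: otherwise take the first listed entry for this scale.
        value, source = entries[0]
        return scale, value, source
    return None

print(preferred_magnitude({"Mw": [(7.1, "ISC"), (7.2, "GCMT")], "Ms": [(7.0, "ISC")]}))
```

Events reported only with MD or ML return `None` here, because the text treats those scales separately through a conversion relationship.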
Unifying the catalog. Different magnitude scales have been considered by several researchers (e.g. 9,10,69-72) during the past decades. ML is the earliest magnitude scale used as an instrumentally measured estimate of earthquake size 69. In the 1960s, the Mb was introduced and reported in the ISC and NEIC bulletins by the USGS and the National Oceanic and Atmospheric Administration (NOAA), in conjunction with the establishment of the World-Wide Standard Seismograph Network (WWSSN). Later, the Ms was introduced in the NEIC bulletin and subsequently adopted by the ISC bulletin 73. The main problem with these scales is that they saturate for large earthquakes, which leads to an underestimation of the magnitude of large events. In addition, their behaviors differ over the whole magnitude range 74,75. To overcome such problems, a new non-saturating magnitude scale (Mw) was proposed by Hanks and Kanamori 71. This scale is based on the total scalar seismic moment released during the rupture of an earthquake. The seismic moment, and thus the Mw, is mainly controlled by the fault/rupture area, the average dislocation, and the rigidity of the medium.
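For reference, the dependence of Mw on rupture area, average dislocation and rigidity can be written out explicitly. The expressions below are the standard textbook forms of the scalar seismic moment and of the Hanks and Kanamori moment magnitude; the symbols (\mu, A, \bar{D}, M_0) are notation introduced here and are not taken from the original text.

```latex
% Scalar seismic moment: rigidity x rupture area x average dislocation
M_0 = \mu \, A \, \bar{D}

% Moment magnitude, with M_0 expressed in dyne cm
M_w = \tfrac{2}{3} \log_{10} M_0 - 10.7

% Worked example: M_0 = 10^{27} dyne cm
M_w = \tfrac{2}{3} \times 27 - 10.7 = 18 - 10.7 = 7.3
```

Because M_0 grows with the physical size of the rupture without an instrumental upper limit, Mw does not saturate in the way that Mb and Ms do.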
In the present work, it was necessary to unify the magnitude scale and homogenize the earthquake catalog, as far as possible, with respect to the Mw scale, because prevailing seismic hazard assessments accept only a non-saturating magnitude. A huge number of empirical relationships between the Mw and other classical magnitude scales are available in the literature. Some of these relationships were derived from global earthquake data sets (e.g. 76-78), and others from earthquake records from different seismotectonic environments (e.g. 66-68,79-87). In this work, a number of regression relationships between the different reported magnitudes and the Mw have been specifically developed. These relationships were established from our database after studying and comparing them with other magnitude relationships (e.g. 80) in the scientific literature. In our final catalog (see the uploaded Microsoft Excel file entitled "Earthquake catalog (1787-2018) for Mexico" 88), the initially reported magnitudes have been included in addition to the final equivalent Mw*. This allows interested researchers to unify the catalog with respect to another magnitude scale, or to apply other empirical relationships directly to estimate the unified magnitude.
In this work, the equivalent Mw* values were computed for each earthquake from the reported magnitudes. First, for events originally defined with a reported Mw, this value was used as the equivalent one. Second, for earthquakes defined by a reported Ms magnitude, a second-degree polynomial fit between the Ms and the Mw magnitudes (Eq. 1 and Fig. 1a) was derived from the current catalog, using 458 events (4.0 ≤ Ms ≤ 7.9) covering the time period from 1900 to 2017 (Table 2). The derived empirical relationship is similar to the Johnston 80 and Scordilis 77 equations. The obtained equation was then used to convert the reported Ms values to the equivalent Mw* scale. Third, for events defined with a reported Mb magnitude, a linear "Ordinary Least Squares (OLS) bisector" fit 89 (Eq. 2 and Fig. 1b) between the Mb and the Mw values was performed. A total of 712 earthquakes (4.0 ≤ Mb ≤ 7.1) covering the time period from 1976 to 2017 (Table 2) were employed for this fit. Finally, for earthquakes with reported MD and ML magnitudes, the OLS bisector method has been used, as in the previous case, to establish a linear relationship (Equation 3 and  (Table 2) has been used to develop this relationship. This relation jointly fits both the MD data (from SSN-Mexico and SNET) and the ML data (from SNET-UCR). There is no remarkable difference in the behavior of the two data sets, so we applied the same relationship to convert all MD and ML data into the Mw magnitude scale.
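To make the OLS bisector idea concrete, the sketch below fits a straight line that bisects the two ordinary least-squares lines (Mw regressed on Mb, and Mb regressed on Mw), using the commonly used closed-form expression for the bisector slope. It is a minimal illustration under stated assumptions: the function name and the sample magnitudes are hypothetical, and the coefficients it returns are not those of Eq. 2 or Eq. 3 in the paper.

```python
import math

def ols_bisector(x, y):
    """Return (slope, intercept) of the OLS bisector line y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx  # slope of the OLS regression of y on x
    b2 = syy / sxy  # slope of the OLS regression of x on y, expressed as dy/dx
    # Slope of the line bisecting the angle between the two OLS lines.
    slope = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    intercept = my - slope * mx  # the bisector passes through the centroid
    return slope, intercept

# Hypothetical (Mb, Mw) pairs, for illustration only.
mb = [4.0, 5.0, 6.0, 7.0]
mw = [4.2, 4.8, 6.1, 6.9]
slope, intercept = ols_bisector(mb, mw)
print(round(slope, 4), round(intercept, 4))
```

Unlike a regression of Mw on Mb alone, the bisector treats the two magnitudes symmetrically, which is one reason it is often used when both variables carry measurement error.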
In the final unified catalog (see the attached Microsoft Excel file entitled "Earthquake catalog (1787-2018) for Mexico" 88), a specific code has been included to show which fitting relationship was applied to obtain the final equivalent Mw* for each reported event. The temporal distribution of the unified earthquakes included in the up-to-date catalog is plotted according to their magnitude (Supplementary Figure IVa) and number (Supplementary Figure IVb). The unified catalog, expressed in the Mw scale, has been plotted in Fig. 2a for different magnitude ranges. Although the largest earthquakes are mainly concentrated along the plate boundaries (Fig. 2a), seismicity also occurs in other regions. The seismically quietest areas are mainly located far from the plate boundaries, towards the north and northeastern regions of Mexico.

Catalog declustering. The spatial and temporal distribution of earthquakes is in general inhomogeneous.
Computations of probabilistic seismic hazard for any region are usually based on the assumption that earthquake recurrence follows an independent (memory-less) process in space and time, i.e., a Poissonian distribution (e.g. 90,91). Therefore, foreshocks, aftershocks and seismic swarms (as dependent events) should be identified and removed through what is called a "declustering process", since they violate the assumption of independence 92. Foreshocks and aftershocks are temporally and spatially dependent on the mainshock. However, their identification is to a large degree subjective, since there are no physical differences between foreshocks and aftershocks, as dependent events on the one hand, and the mainshock on the other. As a result, earthquake clusters are typically defined by their closeness in space and time. In the declustering process, with the earthquakes arranged in space and time, the mainshock is taken to be the event with the highest magnitude in a specific seismic sequence, i.e., in a specific spatial and temporal window. This process results in a new declustered catalog containing only independent events, i.e., mainshocks.
Concerning the declustering process, different methodologies and algorithms have been proposed by several researchers (e.g. 3,93-95). The main difference among these statistical methods is the choice of the size of the spatial and temporal windows, while their common factor is that the larger the magnitude of the independent "mainshock" event, the larger the spatial and temporal windows. In this work, the dependent events have been identified and removed from the compiled catalog using the spatial and temporal window parameters proposed by Gardner and Knopoff 3. For an earthquake of a given Mw, a full scan within a specified distance L(Mw) and time T(Mw) was performed over the whole unified catalog (e.g. 66,96,97). Throughout this scan (see the uploaded compressed file entitled "FORTRAN CODES" 88), the earthquake with the largest magnitude is considered to be the mainshock, and all events occurring within the L(Mw) and T(Mw) windows are declared dependent events and removed from the catalog. Spatial and temporal window sizes of 36 km and 188 days for an Mw 4.0 event, and 100 km and 900 days for an Mw 8.0 event, were used in the current declustering process. For earthquakes of intermediate magnitude, the spatial (L) and temporal (T) window sizes are computed from magnitude-dependent equations, L(Mw) and T(Mw).
Applying the Gardner and Knopoff 3 algorithm, a total of 5,160 events constitute the final set of mainshocks (≥Mw 4.0) in the declustered catalog for Mexico, covering the area between 91° and 117°W longitude and 13° and 33°N latitude, during the time period from 1787 to 2018 (see the uploaded Microsoft Excel file entitled "Earthquake catalog (1787-2018) for Mexico" 88).
Magnitudes below 4.0 are not considered in the current work, because such events are usually not included in seismic hazard studies and have a very short completeness period.
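The window scan can be illustrated with a short Python sketch. It is not a translation of the authors' FORTRAN codes. Two assumptions are made loudly here: (1) the window sizes for intermediate magnitudes are obtained by log-linear interpolation between the two endpoint values quoted in the text (36 km and 188 days at Mw 4.0; 100 km and 900 days at Mw 8.0), which is a stand-in for the paper's own L(Mw) and T(Mw) equations, and (2) the time window is applied symmetrically before and after the mainshock. The event layout and all names are hypothetical.

```python
import math

def window(mw):
    """ASSUMED log-linear interpolation between the two quoted endpoint windows."""
    f = (mw - 4.0) / (8.0 - 4.0)
    dist_km = 36.0 * (100.0 / 36.0) ** f
    time_days = 188.0 * (900.0 / 188.0) ** f
    return dist_km, time_days

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def decluster(events):
    """Return the mainshocks; events are dicts with t_days, lat, lon, mw."""
    # Scan from the largest magnitude downwards.
    order = sorted(range(len(events)), key=lambda i: events[i]["mw"], reverse=True)
    dependent, mainshocks = set(), set()
    for i in order:
        if i in dependent:
            continue
        mainshocks.add(i)
        main = events[i]
        dist_km, time_days = window(main["mw"])
        for j, ev in enumerate(events):
            if j in mainshocks or j in dependent:
                continue
            near_t = abs(ev["t_days"] - main["t_days"]) <= time_days
            near_x = haversine_km(main["lat"], main["lon"], ev["lat"], ev["lon"]) <= dist_km
            if near_t and near_x:
                dependent.add(j)  # foreshock or aftershock of this mainshock
    return [events[i] for i in sorted(mainshocks)]

# Hypothetical events (times in days from an arbitrary origin).
events = [
    {"id": "e0", "t_days": 0.0, "lat": 17.0, "lon": -100.0, "mw": 7.0},
    {"id": "e1", "t_days": 10.0, "lat": 17.1, "lon": -100.1, "mw": 5.0},
    {"id": "e2", "t_days": 5.0, "lat": 25.0, "lon": -110.0, "mw": 5.5},
    {"id": "e3", "t_days": 2000.0, "lat": 17.0, "lon": -100.0, "mw": 4.5},
]
print([e["id"] for e in decluster(events)])
```

In this example, `e1` falls inside the space-time window of the Mw 7.0 event and is flagged as dependent, while `e2` (too far away) and `e3` (too late) are kept as independent mainshocks.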
The epicentral distribution of the mainshocks has been plotted in Fig. 2a. In addition, the uploaded supplementary Microsoft Excel ™ file entitled "Largest Earthquakes" 88 lists the most energetic (≥Mw 6.5) earthquakes that took place in Mexico throughout the catalog period (1787-2018). References have been included specifically for each event (for the epicentral location, magnitude and depth values).
Focal-Mechanism solutions. Earthquake focal mechanisms are essential in seismotectonic studies. They illustrate the relationship between earthquakes and their causative faults, and thus provide very useful information about the tectonic activity of the studied region. Focal-mechanism solutions for significant earthquakes that took place in Mexico were collected mainly from the Global CMT catalog and the ISC online bulletin, as well as from peer-reviewed articles (e.g. 35,98-101). For the Global CMT catalog, solutions are provided by Harvard University 102,103 (http://www.globalcmt.org/). This catalog covers the time period from 1976 to 2014. All events included in this catalog are expressed in the Mw scale computed according to the Kanamori 70 procedure. In addition, Mb and Ms magnitudes are also included for some earthquakes. A total of 784 solutions (over Mw 4.0) have been compiled from the Global CMT catalog (Fig. 2b) for the Mexican earthquakes, expressed by the two nodal planes; for each nodal plane, the strike, dip and rake values are given. The solutions gathered from the ISC bulletin, on the other hand, are aggregated mainly from a number of national and international sources (e.g., Global CMT and NEIC-USGS bulletins). A total of 1,545 solutions expressed in the Mw scale (over Mw 4.0), covering the time period from 1963 to 2015, have been compiled for events taking place in and around Mexico (Fig. 2b).
Altogether, 1,236 events (over Mw 4.0, from 1963 to 2015) have been obtained from the Global CMT and ISC sources, as well as from published papers, after the removal of duplicated focal-mechanism solutions. An electronic supplement (see the uploaded Microsoft Excel ™ file entitled "A catalog of focal mechanism solutions (1963-2015) for Mexico" 88) has been attached to this work. It lists the focal mechanism solutions (values of strike, dip, and rake) for the studied events, which have been collected from different sources and publications and plotted in Fig. 2b.

Completeness analysis.
An earthquake catalog must be as complete as possible with respect to the relative frequency of earthquake occurrence with time. The threshold or cutoff magnitude, also known as the completeness magnitude (Mc), is defined as the lowest magnitude at which all earthquakes in a specific space-time domain are reported 104. Mc is a critical parameter in the estimation of the seismicity parameters (a- and b-values) when using the cumulative linear Gutenberg and Richter 105 relationship. Without appropriate completeness intervals for the catalog, the estimated seismicity recurrence parameters would be biased, which would in turn skew the probabilistic seismic hazard estimates. It is well known that earthquake catalogs become sparser and more uncertain further back in time. In fact, completeness periods vary with magnitude. For large earthquakes, the completeness period extends back to pre-instrumental or historical times, while for small-magnitude earthquakes, completeness is achieved only within the most recent decades of the instrumental epoch. This change in the level of completeness is mainly related to the deployment and development of the seismic networks, the increase in the sensitivity of seismographs, and the significant increase in network coverage during recent decades.
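For readers less familiar with it, the cumulative Gutenberg and Richter relationship mentioned above has the following standard form. The symbol N(\geq M) is notation introduced here for the number of earthquakes with magnitude greater than or equal to M in the period considered.

```latex
\log_{10} N(\geq M) = a - b\,M , \qquad M \geq M_c
```

Events below M_c are under-reported, so including them flattens the observed curve at small magnitudes and biases the fitted b-value low; this is why the fit is restricted to M \geq M_c within the corresponding completeness period.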
Identifying the threshold magnitude and its spatial and temporal variations is a controversial task for which no single procedure exists. The cumulative method (e.g. 106-108) is used here to estimate the completeness periods. In this method, the cumulative number of earthquakes is plotted against time for a specific magnitude range (e.g., ≥Mw 4.0 or ≥Mw 6.0). The catalog is considered complete (for this particular magnitude range) over the time interval in which the plotted data follow an approximately straight trend (constant average slope). The completeness period is then the number of years from the start of this straight-slope segment until the last year of the catalog. This method is considered to be accurate and efficient even when applied to a small set of earthquake data.
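The curve used by the cumulative method is straightforward to compute. The following sketch builds the cumulative number of events at or above a magnitude threshold, year by year, and reports the average slope (events per year) between two chosen years. The function names and the sample data are hypothetical, and the choice of where the straight segment begins is left to the analyst, as in the visual procedure described above.

```python
def cumulative_counts(years, mags, m_min, first_year, last_year):
    """Return [(year, cumulative number of events with magnitude >= m_min), ...]."""
    per_year = {y: 0 for y in range(first_year, last_year + 1)}
    for y, m in zip(years, mags):
        if m >= m_min and first_year <= y <= last_year:
            per_year[y] += 1
    curve, total = [], 0
    for y in range(first_year, last_year + 1):
        total += per_year[y]
        curve.append((y, total))
    return curve

def average_slope(curve, year_a, year_b):
    """Average number of events per year between two years on the curve."""
    lookup = dict(curve)
    return (lookup[year_b] - lookup[year_a]) / (year_b - year_a)

# Hypothetical mini-catalog, for illustration only.
years = [1990, 1990, 1992, 1993]
mags = [4.2, 5.1, 4.0, 3.9]
curve = cumulative_counts(years, mags, 4.0, 1990, 1993)
print(curve)
```

A roughly constant `average_slope` over successive sub-intervals of the recent part of the curve is the numerical counterpart of the straight trend that the method looks for.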
Completeness periods and threshold magnitudes were estimated for the entire catalog. Figure 3 shows the cumulative number of earthquakes above different magnitude levels (4.0, 4.5, 5.0, 5.5, 6.0, and 6.5) plotted against time for the current catalog. Completeness periods for the different magnitude intervals are listed in Table 3. The completeness periods obtained in the current work (Table 3) appear to be in good agreement with the values reported by Singh et al. 4, Zúñiga et al. 7 and Salgado-Gálvez et al. 108. In compiling a catalog of shallow (h ≤ 65 km) earthquakes covering the region from 15° to 20°N latitude and 94.5° to 105.5°W longitude, Singh et al. 4 stated that the catalog is mostly complete for earthquakes with Ms ≥ 6.5 from 1906 to 1981. Zúñiga et al. 7, in their work on the seismotectonic regionalization of Mexico, compiled a catalog (until 2014) from the ISC bulletin, the Mexican SSN, the Red Sísmica del Noroeste de México "RESNOM", PDE, and CMT catalogs, expressed in Ms magnitudes. In the completeness analysis of their catalog, they noticed changes in the years 1935, 1965, 1970, 1982 and 2003. According to their work, the catalog was considered to be complete for magnitudes Mw ≥ 6.5 and 7.0 since 1935 and 1900, respectively. On the other hand, in the course of a probabilistic seismic hazard analysis for Latin America and the Caribbean, Salgado-Gálvez et al. 109 assembled a catalog (from 1900 to 2015) that comes mainly from international sources 64,72,110. They stated that, for Mexico and Central America, the catalog is complete for Mw 4.0, 4.5 and 5.5 since 1972, and for Mw 6.5 and 7.5 since 1934 and 1906, respectively 44,110.
Some of the obtained completeness intervals coincide directly with the establishment, improvement or expansion of seismic networks, both locally and globally. Examples include 1918 (the end of World War I), the mid-1960s (the deployment and operation of the WWSSN), and the mid-1990s (activation of the Comprehensive Nuclear-Test-Ban Treaty Organization). Locally, data availability increased significantly once the Mexican SSN achieved broad coverage in 1925.

Data Records
The final declustered and unified earthquake catalog obtained in the current study was uploaded to the figshare repository under the title "Earthquake catalog (1787-2018) for Mexico" 88. It is a Microsoft Excel ™ workbook consisting of three sheets; the first two sheets contain the codes and references of the earthquakes, while the third sheet consists of 5160 rows organized into 25 columns. Each row describes a single main earthquake event, while each column describes one of the related parameters for this earthquake. The names of the columns in the third Microsoft Excel ™ sheet are the following:

Technical Validation
Original reported magnitudes for all earthquakes in our catalog are included in the final database as a reference for those researchers who might prefer to use other empirical relationships to unify the catalog other than those applied in the current study.
The declustering approach used in the current work is included in the "FORTRAN CODES" uploaded to figshare 88, so that it can be checked or replaced by another declustering algorithm applied to the entire catalog.
All references used during the compilation of the earthquake catalog are included as "Codes" in the final dataset, specifically for each parameter of the largest earthquakes. This allows each event to be checked against its original published references and bulletins.

Code availability
The input data used in this work can be accessed at the following web pages: Global CMT catalog, available at http://www.globalcmt.org/ (last accessed in April 2019); ISC bulletin, available at http://www.isc.ac.uk/iscbulletin/ (last accessed in April 2019); ISC-GEM catalog, available at http://www.isc.ac.uk/iscgem/ (last accessed in April 2019); and the USGS catalog, available at http://earthquake.usgs.gov/data/centennial/ (last accessed in April 2019). The SSN data were provided by the Mexican SSN authorities upon direct request. The FORTRAN CODES 88 used for the declustering process, as well as the final earthquake 88 and focal mechanism 88 catalogs published in this study, are available through the Supplementary Data Files on figshare.