Discriminating the occurrence of inundation in tsunami early warning with one-dimensional convolutional neural networks

Tsunamis are natural phenomena that, although occasional, can have large impacts on coastal environments and settlements, especially in terms of loss of life. An accurate, detailed and timely assessment of the hazard is essential as input for mitigation strategies both in the long term and during emergencies. This goal is hindered by the high computational cost of simulating an adequate number of scenarios to make robust assessments. To reduce this handicap, alternative methods could be used. Here, an enhanced method for estimating tsunami time series using a one-dimensional convolutional neural network model (1D CNN) is considered. While the use of deep learning for this problem is not new, most existing research has focused on assessing the capability of a network to reproduce the extrema of inundation metrics. However, in the context of Tsunami Early Warning, it is equally relevant to assess whether the networks can accurately predict whether inundation would occur or not, and its time series if it does. Hence, a set of 6776 scenarios with magnitudes in the range $M_w$ 8.0–9.2 was used to design several 1D CNN models at two bays with different hydrodynamic behavior. The models use inexpensive low-resolution numerical modeling of tsunami propagation as input to predict inundation time series at pinpoint locations. In addition, different configuration parameters were analyzed to outline a methodology for model testing and design that could be applied elsewhere. The results show that the network models are capable of reproducing inundation time series well, both for small and large flow depths and when no inundation was forecast, with minimal instances of false alarms or missed alarms.
To further assess the performance, the model was tested with two past tsunamis and compared with actual inundation metrics. The results obtained are promising, and the proposed model could become a reliable alternative for calculating tsunami intensity measures faster than real time. This could complement existing early warning systems by providing an approximate and fast procedure that allows a larger number of scenarios to be simulated within the always restrictive time frame of tsunami emergencies.


Input data for the 1D CNN. Stochastic seismic sources
In this study, a database of 6776 scenarios with magnitudes in the range $M_w$ 8.0–9.2 has been generated. These scenarios provide the initial conditions used to model tsunami inundation using Tsunami-HySEA 1 with four-level nested grids, with higher resolutions in the coastal cities of Valparaíso, Viña del Mar, La Serena and Coquimbo.
For the present implementation, the first step is to define the geometry along which synthetic (stochastic) seismic sources will be generated. Here, the geometry of the megathrust seismogenic zone lies within the mid-southern portion of Zone 2 (Z2) in Poulos et al. 2, shown as ZV in Figure 1 (main text). This segment has been discretized into 1418 subfaults of 10 km × 10 km, which is sufficient to resolve the slip distribution. For this task, the MudPy open-source code 3,4 has been used.
For the set of magnitudes $M_w$ 8.0–9.2, rupture sizes require scaling relations that transform magnitudes into areas 5, more specifically length and width. The MudPy code generates a variable slip pattern by assuming a normal distribution of the slip on each subfault. The slip vector of each realization is restricted to positive values by taking the exponential of the normal distribution, that is, a log-normal distribution. The resulting slip distribution is defined by a mean vector (µ) and a covariance matrix, which are set from a fraction of the mean slip and a rupture correlation function. As shown by LeVeque et al. 6, these parameters control the spatial statistics of slip variability. This generation of the stochastic seismic scenarios is similar to the approach described in Zamora et al. 7, assuming a maximum slip truncated at ∼50 m, which corresponds to a larger value of the expected geodetic slip deficit of the central Chile region 8 (ZV geometry). This region is of interest because large asperities have been identified there 9, and because it coincides with one of the suggested rupture areas of the 1730 $M_w$ 9.1–9.3 earthquake 10. This area has also been used in a probabilistic tsunami hazard assessment in central Chile 11.
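The log-normal slip construction described above can be illustrated with a minimal sketch. It is not MudPy's API or its actual algorithm: the function names, the one-dimensional strip of subfaults, the exponential correlation function, the value of `sigma`, the correlation length and the simple clipping at 50 m are all assumptions chosen for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def lognormal_slip(centers_km, mean_slip_m, sigma=0.5, corr_len_km=30.0,
                   max_slip_m=50.0, rng=None):
    """Draw one correlated log-normal slip realization (hypothetical helper).

    centers_km: along-strike positions of the subfault centers (1D strip).
    sigma, corr_len_km: assumed parameters of the underlying Gaussian field.
    """
    rng = rng or random.Random()
    n = len(centers_km)
    # Covariance of the Gaussian field; exponential correlation is an assumption.
    cov = [[sigma ** 2 * math.exp(-abs(centers_km[i] - centers_km[j]) / corr_len_km)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Correlated zero-mean Gaussian sample g = L z, with z standard normal.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # Shift so that the untruncated log-normal mean equals mean_slip_m.
    mu = math.log(mean_slip_m) - 0.5 * sigma ** 2
    # Exponentiate to enforce positive slip, then clip at the assumed maximum.
    return [min(math.exp(mu + gi), max_slip_m) for gi in g]


# Example: 20 subfaults of 10 km along strike, mean slip of 5 m.
centers = [10.0 * i + 5.0 for i in range(20)]
slip = lognormal_slip(centers, mean_slip_m=5.0, rng=random.Random(0))
```

Taking the exponential of a correlated Gaussian sample guarantees positive slip on every subfault, which is the property the text attributes to the log-normal choice. A real implementation would work on the full two-dimensional fault geometry.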
Tsunami numerical simulations are done using the Tsunami-HySEA code 1. Tsunami-HySEA solves the 2D nonlinear one-layer shallow water system in both spherical and Cartesian coordinates, based on a finite volume method. Here, the spherical-coordinates version is used. The Okada analytical equations 12 are implemented to obtain the ground deformation and, from it, the initial sea level condition for the numerical simulations. More information can be found at https://edanya.uma.es/hysea.
The following figures summarize statistics of the input data, which are finite-fault seismic sources. These data are used for training and testing the 1D CNN. Figure S1 shows the locations of the centroids of the 6776 ruptures in the magnitude range $M_w$ 8.0–9.2 (small cyan triangles). The centroid locations are generated during the first step of the MudPy 3 procedure. The centroid locations of the two historical events, taken from the Global Centroid Moment Tensor (GCMT) 13, are shown as large cyan triangles.
Figure S1. Centroid location of synthetic data and real cases used for testing the algorithms. The slab contours (in grey) are taken from Hayes et al. 14.
As an estimate of the variability among scenarios, the slip distributions of the 6776 seismic scenarios are analyzed jointly using different statistical metrics, as shown in Figure S2. Figure S2a shows some areas of large slip, up to the maximum value, but without any significant structure, which suggests low correlation. The 95th percentile (Fig. S2b) is significantly lower, at nearly half of the peak value. It shows a tendency to concentrate larger slip towards the shallower section of the interface, which is considered a conservative situation, as shallower events should lead to larger tsunamis 15. The median slip is well distributed, with a small variation in magnitude (Fig. S2c,d). These results suggest a good areal coverage and distribution of the slip among the 6776 sources, thereby providing a reasonable range of tsunami conditions.
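As a rough illustration of how such per-subfault statistics could be computed across the scenario set, the following sketch takes one slip vector per scenario and returns the maximum, 95th percentile and median at each subfault. The function names and the linear-interpolation percentile rule are assumptions; the exact metrics plotted in each panel of Figure S2 are not restated here.

```python
import statistics


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty sequence, by linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def slip_summary(scenarios):
    """Per-subfault statistics over all scenarios (hypothetical helper).

    scenarios: list of slip vectors (m), one per scenario, all of equal length.
    """
    # Transpose so that each column holds the slip of one subfault in every scenario.
    per_subfault = list(zip(*scenarios))
    return {
        "max": [max(col) for col in per_subfault],
        "p95": [percentile(col, 95) for col in per_subfault],
        "median": [statistics.median(col) for col in per_subfault],
    }


# Example with three toy scenarios on two subfaults.
summary = slip_summary([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
```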
To further explore the database, Figures S3 and S4 show the distribution of all events in terms of their length and width, as a function of moment magnitude. Length and width were estimated for each scenario as the distance between the extreme non-zero slip locations in the along-strike and along-dip directions, respectively. These distributions compare well with those presented by De Risi and Goda 16, for instance. It can be noted that the width saturates at large magnitudes, which is a result of the finite along-dip extent of the rupture zone. This saturation induces larger slip and/or longer ruptures 17. Figure S5 shows the finite fault models of the six historical source solutions used for testing. These are three    20 . Most of the models can be found at the SRCMOD database 19.
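The length and width estimate described above, based on the extreme non-zero slip locations, can be sketched as follows. The function name, the row and column layout of the slip array, and the choice to measure between the outer edges of the extreme subfaults (rather than between their centers) are assumptions made for illustration.

```python
def rupture_extent(slip, subfault_km=10.0):
    """Return (length_km, width_km) of a rupture (hypothetical helper).

    slip[i][j]: slip (m) at along-dip row i and along-strike column j.
    The extent is measured between the outer edges of the extreme
    subfaults with non-zero slip, which is an assumed convention.
    """
    rows = [i for i, row in enumerate(slip) if any(s > 0.0 for s in row)]
    cols = [j for j in range(len(slip[0])) if any(row[j] > 0.0 for row in slip)]
    if not rows:
        # No slip anywhere: the rupture has no extent.
        return 0.0, 0.0
    length_km = (cols[-1] - cols[0] + 1) * subfault_km
    width_km = (rows[-1] - rows[0] + 1) * subfault_km
    return length_km, width_km


# Example on a toy grid of 3 along-dip rows by 4 along-strike columns.
toy_slip = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
]
length_km, width_km = rupture_extent(toy_slip)
```

Because the fault model has a fixed number of along-dip rows, the width returned by such an estimate cannot exceed the number of rows times the subfault size, which is consistent with the saturation of width at large magnitudes noted in the text.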

2 Complementary Figures
In the main text, a sample figure is presented to show the distribution of scenarios among the training, validation and testing data sets. Here, Figure S6 shows the same plots for the remaining cases, for completeness. It can be noted that in all cases the distributions are similar among the three data sets. The most noticeable differences occur when flow depths are very small, which does not have an impact on the hazard categorization. On the other hand, the training data sets (red lines) typically reach larger extreme values of flow depth, which reduces the possibility of extrapolation. Finally, Fig. S7 shows the histograms of the performance metrics on the testing data set for CoB and ViB. It can be seen that most of the MSE and G values are very small, indicating good accuracy in reproducing the time series.
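For reference, the MSE between a predicted and a reference flow-depth time series can be computed as in the sketch below. It uses the plain textbook definition; whether the study normalizes or otherwise modifies the metric is not stated in this section, and the function name is hypothetical. The G metric is defined in the main text and is not reproduced here.

```python
def mean_squared_error(predicted, reference):
    """Plain mean squared error between two equally sampled time series."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("time series must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


# Example: flow depth (m) at three time steps.
mse = mean_squared_error([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
```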