GNGTS 2023 - Atti del 41° Convegno Nazionale
Session 1.1 - POSTER

errors that are expected to be present and admissible, as the authors also state. Therefore, before applying deep learning techniques, we visually inspected 1,000 randomly selected traces and estimated that 7% of them were affected by various problems, such as wrong P-wave picks, wrong polarity determination, or no clear P-wave onset. We then applied two unsupervised learning methods, PCA and SOM, to cluster the traces, assigning each trace to the nearest SOM node. This procedure allowed us to exclude all traces falling in nodes that were evaluated as unsuitable for our purposes. The SOM was fed a representation of the data in feature space. Features were extracted either by PCA (applied to normalized traces clipped in time windows of different lengths, each starting 15 samples before the P-wave arrival) or manually, by computing averages over 21-sample-long moving time windows. We analyzed traces with upward polarity separately from those with downward polarity. The clustering procedure discriminated nodes containing unsuitable traces from nodes containing suitable ones, and we discarded the unsuitable nodes by removing all traces belonging to them. We will refer to the remaining 150,320 traces as “Dataset_1”.

We used a second dataset (Napolitano et al., 2021; Fig. 1b) to further test our network, evaluating the model on different data that, unlike the INSTANCE catalog, were not specifically designed for machine learning techniques. The waveforms by Napolitano et al. (2021) were manually picked. We discarded traces with uncertain polarity. To obtain a reliable dataset, we selected only waveforms sampled at 100 Hz and with a weight less than 2. The number of useful traces in this dataset, hereafter named “Dataset_2”, is 4,072. We made sure that no waveforms were shared between the two datasets by removing the common traces from Dataset_1.
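
The Dataset_1 cleaning step described above (moving-window features, assignment of each trace to its nearest SOM node, removal of traces falling in unsuitable nodes) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the SOM grid size, learning-rate and neighbourhood schedules, the synthetic random traces, and all function names are assumptions made for illustration, and only the manual 21-sample moving-average features are shown (the PCA variant is omitted). The set of unsuitable nodes is a hypothetical placeholder for the evaluation of the nodes described in the text.

```python
import numpy as np


def moving_average_features(trace, window=21):
    """Averages over `window`-sample-long moving time windows of one trace."""
    kernel = np.ones(window) / window
    return np.convolve(trace, kernel, mode="valid")


def train_som(features, grid=(3, 3), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM (assumed settings); returns node weights."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly drawn samples.
    weights = features[rng.integers(0, len(features), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-2  # shrinking neighbourhood radius
        x = features[rng.integers(0, len(features))]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign_nodes(features, weights):
    """Index of the nearest SOM node for every trace."""
    dist = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dist, axis=1)


def keep_suitable(traces, nodes, unsuitable_nodes):
    """Drop every trace assigned to a node flagged as unsuitable."""
    mask = ~np.isin(nodes, list(unsuitable_nodes))
    return traces[mask]


# Synthetic stand-in for real waveforms: 200 random traces of 61 samples each.
rng = np.random.default_rng(42)
traces = rng.normal(size=(200, 61))
traces /= np.max(np.abs(traces), axis=1, keepdims=True)  # normalise each trace

features = np.array([moving_average_features(tr) for tr in traces])
weights = train_som(features)
nodes = assign_nodes(features, weights)

unsuitable_nodes = {0}  # hypothetical: in practice chosen after evaluating each node
kept = keep_suitable(traces, nodes, unsuitable_nodes)
```

Since upward and downward polarities were analyzed separately, a sketch like this would be run once per polarity class, each with its own map and its own list of unsuitable nodes.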
Fig. 1 – Location of the earthquakes in the two datasets used. (a) Histogram for Dataset_1. Each bin has dimensions of 6.5·10⁻² degrees in latitude and longitude. The orange and magenta boxes identify the portions of the seismic traces used as validation and test sets, respectively. (b) Map for Dataset_2, in which the analyzed seismic events are shown as red circles. Dataset_2 is used only as a second test set.