GNGTS 2015 - Atti del 34° Convegno Nazionale

the cutting level was chosen on the basis of criteria that involve the maximum slope detectable in the level bar chart and the width of the gap between two successive levels of the hierarchy in the dendrogram (Gan et al., 2007; Everitt et al., 2011). After the average HVSR curves were defined, a second multi-parametric clustering procedure was used to group peaks attributable to the same origin (stratigraphic, tectonic, topographic, anthropogenic or other sources). A non-hierarchical centroid-based algorithm was implemented (Capizzi et al., 2014). This clustering is carried out to delineate the areas within which a continuous trend can be assumed for the parameters used to describe the subsoil and for the seismic response of the medium. Hypotheses on the cause of the HVSR peaks are essential for extracting reliable information on the subsurface from this kind of data (Capizzi et al., 2014; Di Stefano et al., 2014; Martorana et al., 2014).

In centroid-based methods, clusters are represented by a central vector, which is not necessarily a member of the data set. When the number of clusters is fixed, the clustering can be formally regarded as an optimization problem: find the cluster centers and assign each object to a cluster so that the parameter distances from the cluster centroids are minimized. The algorithm alternates two steps: each object is assigned to the cluster with the nearest centroid, and the new centroids are then calculated as the means of the observations in the new clusters. The algorithm converges to a (local) optimum when the assignments no longer change; there is no guarantee that the global optimum is found. The results of cluster analysis are mostly affected by the number of clusters, the presence of outliers and the type of parameters used for distance measures. Centroid-based algorithms generally require the number k of clusters and the initial centroid coordinates to be specified in advance.
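The assignment and update steps described above can be illustrated with a minimal sketch. It is not the implementation of Capizzi et al. (2014): the function names (`assign`, `update`, `centroid_clustering`), the plain Euclidean distance, the `max_iter` cap and the toy data are all assumptions made here for illustration.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def centroid_clustering(points, initial_centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments unchanged: a (local) optimum was reached
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


# Hypothetical toy data: two well-separated groups in a 2-D parameter space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centroids = centroid_clustering(points, [(0.0, 0.0), (5.0, 5.0)])
```

In this sketch the returned centroids are means of the assigned points and need not coincide with any observation. Both the number of clusters (through the length of `initial_centroids`) and the initial centroid coordinates must be supplied by the caller.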
This requirement is considered one of the biggest drawbacks of these algorithms, because an inappropriate choice of k may yield poor results. Indeed, it is hard to choose k when external constraints are missing. The proposed algorithm does not fix the number of clusters and automatically chooses, for each possible value of k, the initial centroids from the data set. The distance of each unit from the initial centroids, and from those obtained after each iteration, was calculated as the weighted sum of the normalized Euclidean distances of all the variables considered: coordinates (x, y and z),

Fig. 1 – Dendrogram relative to HVSR curves determined in sliding time windows.