GNGTS 2015 - Atti del 34° Convegno Nazionale

- shape, size (absolute and relative) and number of clusters;
- the presence of outliers;
- the level of overlap between clusters;
- the type of similarity/distance measure chosen.

Various studies (Rand, 1971; Ohsumi, 1980) suggest that different grouping strategies often lead to similar results, while others highlight specific cases of strong divergence (Everitt, 2011; Fabbris, 1983). However, the criteria for choosing between the two types of algorithm (Hierarchical Clustering and Non-Hierarchical Clustering) have not yet been sufficiently explored, and positions in the literature differ widely. The criteria suggested by various authors include objectivity, whereby researchers working independently on the same data set must arrive at the same results, and stability, whereby equivalent data sets yield equivalent partitions (Silvestri and Hill, 1964). In practice, one should choose the methods that are least sensitive to small changes in the data. For example, it is considered important that the partition changes little when a single individual is removed from the analysis (of course, the removal of an outlier produces larger variations within groups), or that, when the analysis is repeated without an entire branch of the dendrogram, the structure of the other branches remains essentially unchanged. Broadly speaking, when one seeks groups of statistical units characterized by high internal consistency, hierarchical techniques are less effective than non-hierarchical ones.

Cluster analysis of HVSR data. Many HVSR data sets were acquired for seismic microzonation studies in various Sicilian urban centers. After many tests to assess the best clustering techniques for our dataset and purposes, we chose to apply an AHC (Agglomerative Hierarchical Clustering) algorithm to extract the frequency and amplitude of HVSR curves determined in sliding time windows (D'Alessandro et al., 2014) and an HC algorithm to group peaks attributable to the same seismic surface.
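The stability criterion discussed above can be illustrated with a toy leave-one-out check (on synthetic two-dimensional data, not the HVSR dataset): the full data set is partitioned by agglomerative clustering, each point is then removed in turn, the remaining points are re-clustered, and the two partitions are compared with the pairwise-agreement index of Rand (1971). All names and parameter values here are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# two well-separated synthetic groups of 20 points each
data = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                  rng.normal(3.0, 0.3, (20, 2))])

def rand_index(a, b):
    # fraction of point pairs on which two labelings agree (Rand, 1971)
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    n = len(a)
    agree = (same_a == same_b).sum() - n  # exclude the diagonal
    return agree / (n * (n - 1))

# partition of the full data set (agglomerative, average linkage, 2 clusters)
full = fcluster(linkage(data, method='average'), t=2, criterion='maxclust')

# leave-one-out stability: re-cluster without point i, compare on the rest
scores = []
for i in range(len(data)):
    keep = np.delete(np.arange(len(data)), i)
    part = fcluster(linkage(data[keep], method='average'),
                    t=2, criterion='maxclust')
    scores.append(rand_index(full[keep], part))

print(min(scores))  # close to 1.0 for a stable partition
```

For well-separated groups like these, every leave-one-out partition should coincide with the full one; strongly overlapping clusters would drive the worst-case score down.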
The choice is motivated by the fact that HC methods are explorative: they do not require defining the number of clusters a priori, and they allow the use of any proximity measure considered suitable for the data. In HC, the process of agglomeration or separation is carried out on the basis of a proximity measure and a linkage criterion. The proximity between two objects is quantified by measuring to what extent they are similar (similarity) or dissimilar (dissimilarity). Several proximity measures have been proposed in the literature for different types of objects (Gan et al., 2007; Everitt et al., 2011). Clearly, the choice of proximity measure must respect specific criteria and should be made on the basis of the main aims of the clustering (Gan et al., 2007; Everitt et al., 2011). Window selection for the estimation of the best average HVSR curve is generally done by visual inspection of the HVSR curves as a function of time. Starting from the full-length records, the HVSR curves are determined in consecutive time windows of appropriate length. Time windows whose HVSR curves appear "anomalous" on simple visual inspection are generally deleted and therefore not included in the calculation of the average HVSR curve. It is often very difficult to identify the correct time windows to be used for the calculation of the mean HVSR, and the lack of a non-arbitrary selection criterion makes the result clearly operator-dependent and therefore not optimal. To overcome this problem we applied the AHC to our data, using as proximity measure the Standard Correlation (SC_xy), defined as

SC_xy = ( Σ_i x_i y_i ) / sqrt( Σ_i x_i² · Σ_i y_i² ),

where x_i and y_i indicate the values of the spectral ratios at the i-th frequency for a generic pair of analysis windows. The main output of the hierarchical clustering is the dendrogram, which shows the progressive grouping of the data (Fig. 1).
The selection of clusters is done by cutting the dendrogram at a specific level of similarity/dissimilarity. In this application
