
Defining a k-means cluster analysis


In FCS Express, k-means cluster analysis is set up through the Transformations button on the Tools tab→Transformations. The steps below outline how to create a new k-means cluster analysis on your data.

1. Open the Transformation window by selecting the Tools tab→Transformations→Transformations command (Figure 29.27, Step 1).

2. Click the drop-down arrow adjacent to the blue plus button (Figure 29.27, Step 2).

3. Click Kmeans in the drop-down list (Figure 29.27, Step 3).

Figure 29.27 - A Kmeans transformation can be created via the Transformations navigator.

The Kmeans dialog will now appear in the Transformation window as seen in Figure 29.28.

Figure 29.28 - The Kmeans dialog opens when a new k-means transformation is created.

The new Kmeans transformation is named New Kmeans by default. It can be renamed by any of the following methods:

• Right-click the Kmeans transformation that needs to be renamed→Select Rename.

• Left-click the Kmeans transformation that needs to be renamed and press F2 on your keyboard.

• Left-click the Kmeans transformation that needs to be renamed and click the rename button in the Transformations toolbar.

4. Choose a Template File for the k-means clustering by clicking on the ellipsis to the right of the Template File edit box.

Once the file of interest has been selected, the dialog is populated with the parameters from the template file that are available for k-means clustering (Figure 29.29).

Figure 29.29 - A template file has been loaded into the New Kmeans dialog.

5. Choose the parameters to define the Kmeans clustering by checking the boxes next to the parameter names (Figure 29.30, Step 1). If required, choose or change the Parameter Scaling Options for each selected parameter (Figure 29.30, Step 2).

Figure 29.30 - Choosing the parameters to perform k-means cluster analysis.
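Parameter scaling can change which events a distance-based method such as k-means treats as similar, because a parameter with a large numeric range can dominate the distance. The sketch below is a minimal illustration in plain Python, not FCS Express code: the event values and the `log10_scale` helper are hypothetical, Euclidean distance is assumed, and a log transform stands in for whichever scaling option is chosen.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two events given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def log10_scale(event):
    """Hypothetical scaling: log10 of each value (all values must be positive)."""
    return [math.log10(v) for v in event]

# Hypothetical events with two parameters on very different numeric ranges.
event_a = [50000.0, 10.0]
event_b = [50000.0, 1000.0]  # 100-fold higher than A in the second parameter
event_c = [80000.0, 10.0]    # 1.6-fold higher than A in the first parameter

# Distances on the raw values: the first parameter dominates.
raw_ab = euclidean(event_a, event_b)
raw_ac = euclidean(event_a, event_c)

# Distances after log scaling: fold changes are compared instead.
scaled_ab = euclidean(log10_scale(event_a), log10_scale(event_b))
scaled_ac = euclidean(log10_scale(event_a), log10_scale(event_c))

print(f"raw:    A-B = {raw_ab:.1f}, A-C = {raw_ac:.1f}")
print(f"scaled: A-B = {scaled_ab:.3f}, A-C = {scaled_ac:.3f}")
```

On the raw values, event A is closer to B than to C; after log scaling the order is reversed. This is why the scaling chosen for each parameter can affect the clusters that are found.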

A number of options can be set at this point:

6. (Optional) Select a gate from the Gate drop-down to perform the clustering only on the events belonging to that gate.

7. (Optional) Set the Sampling Options. Multiple Sampling Methods are available (please refer to the Sampling Options chapter for more details), but the default Downsampling Method for k-means is None, so all available events (i.e. the ones defined with the Gate drop-down menu in Step 6) will be used.

8. Set the Number of Clusters into which the events should be grouped. This is equivalent to the number of centroids.

9. Set the number of Maximum Iterations. This value indicates the maximum number of iterations that FCS Express will perform before stopping the clustering process.

10. Set the Minimum Percent of Cells Per Cluster. This value indicates the minimum cluster size, expressed as a percentage.

11. Set the Output Style for the Kmeans clustering.

The available choices are Clustering Assignments and Clustering Statistics. Please see the detailed list below explaining each option.

• Clustering Assignments: By selecting this output style, the following statistics can be assigned to each event the clustering has been performed on:

o Cluster Assignment: An internal label indicating the cluster membership is automatically assigned to each event. This label is used internally by FCS Express for heatmap representation of clusters, but it is also accessible through a Data grid.

o Distance to Cluster Centroid: The distance of a given event to the centroid of its cluster. Values for all clusters are plotted as a single histogram.

o Distance to Centroid Histograms: The same as the Distance to Cluster Centroid, but in this case values are grouped, and thus plotted, separately for each cluster. Clusters are numbered from 1 to Number of Clusters.

o Gaussian CV SSE Scaled: In this representation each event is plotted based on its cluster membership, so clusters appear as separate histograms. Because cluster membership is an integer value that goes from 1 to Number of Clusters, artificial noise is introduced to spread out the data. Gaussian CV SSE Scaled means that the standard deviation of the Gaussian (normal distribution) used to represent each cluster is proportional to the Sum of Squared Error of the cluster.

o Gaussian CV Unscaled: In this representation each event is plotted based on its cluster membership, so clusters appear as separate histograms. Because cluster membership is an integer value that goes from 1 to Number of Clusters, artificial noise is introduced to spread out the data. Gaussian CV Unscaled means that the standard deviation of the Gaussian (normal distribution) used to represent each cluster is the same for all clusters.

o Gaussian Y Data: Events are plotted following a Gaussian (normal distribution).

• Clustering Statistics: By selecting this output style, the following statistics will be accessible through plots and the Data grid:

o Kmeans Cluster ID: An integer value that goes from 0 to Number of Clusters-1.

o Centroid coordinates: The coordinates of each centroid are given for each of the parameters considered for the clustering.

o Sum Of Square (WCSS): The Within-Cluster Sum of Squares is reported for each cluster.
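To make the quantities in the list above concrete, the following sketch computes a cluster assignment and distance to centroid for each event, and a within-cluster sum of squares for each cluster, from a set of known centroids. It is an illustration in plain Python under stated assumptions, not FCS Express's implementation: the `describe_clusters` function and the example data are hypothetical, and Euclidean distance is assumed.

```python
import math

def describe_clusters(events, centroids):
    """Return per-event assignments and distances, plus per-cluster WCSS.

    Cluster IDs are 0-based here, i.e. 0 to (number of clusters - 1).
    """
    assignments = []                 # nearest centroid index for each event
    distances = []                   # distance from each event to that centroid
    wcss = [0.0] * len(centroids)    # within-cluster sum of squares per cluster
    for event in events:
        dists = [math.dist(event, c) for c in centroids]
        nearest = min(range(len(centroids)), key=lambda i: dists[i])
        assignments.append(nearest)
        distances.append(dists[nearest])
        wcss[nearest] += dists[nearest] ** 2
    return assignments, distances, wcss

# Hypothetical events (two parameters each) and two centroids.
events = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 14.0]]
centroids = [[0.0, 1.0], [10.0, 12.0]]

assignments, distances, wcss = describe_clusters(events, centroids)
print(assignments)  # cluster ID per event
print(distances)    # distance to own centroid per event
print(wcss)         # sum of squared distances per cluster
```

A tight cluster gives a small WCSS, while a cluster whose events lie far from its centroid gives a large one, which is what makes the statistic useful for comparing clusters.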

The following table lists the objects suitable for displaying each output style:

                                                          Can be displayed as
Output style            Statistic                         Heatmap   1D or 2D plots   Data grid
Clustering Assignments  Cluster Assignment                Y         -                Y
                        Distance to Cluster Centroid      -         Y                Y
                        Distance to Centroid Histograms   -         Y                Y
                        Gaussian CV SSE Scaled            -         Y                Y
                        Gaussian CV Unscaled              -         Y                Y
                        Gaussian Y Data                   -         Y                Y
Clustering Statistics   (all statistics)                  -         Y                Y
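The Gaussian CV Unscaled and Gaussian CV SSE Scaled styles can be pictured as adding random noise to the integer cluster label, so that events from one cluster spread into a small peak instead of piling up on a single value. The sketch below is a rough illustration in plain Python, not FCS Express's implementation: the `jitter_labels` function, the `base_sd` value, and the rule that scales each cluster's standard deviation by its share of the largest SSE are all assumptions made for the example.

```python
import random

def jitter_labels(labels, sse_by_cluster=None, base_sd=0.1, seed=0):
    """Spread integer cluster labels with Gaussian noise for display.

    Without sse_by_cluster, every cluster uses the same standard deviation
    (the "unscaled" idea). With it, each cluster's standard deviation is
    proportional to its SSE (the "SSE scaled" idea; the exact rule is assumed).
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    largest_sse = max(sse_by_cluster.values()) if sse_by_cluster else 0.0
    jittered = []
    for label in labels:
        sd = base_sd
        if sse_by_cluster and largest_sse > 0:
            sd = base_sd * sse_by_cluster[label] / largest_sse
        jittered.append(rng.gauss(label, sd))
    return jittered

# Hypothetical cluster labels (1 to Number of Clusters) and per-cluster SSE.
labels = [1, 1, 2, 2, 3]
unscaled = jitter_labels(labels)
sse_scaled = jitter_labels(labels, sse_by_cluster={1: 0.0, 2: 5.0, 3: 10.0})

print(unscaled)
print(sse_scaled)
```

In the scaled case, a cluster with a larger SSE is drawn as a wider peak, so the width of each peak gives a visual hint of how dispersed that cluster is.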

The k-means clustering has now been defined and is calculated automatically. It can now be applied to plots for display and analysis (see Applying a k-means Cluster Analysis).
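As a rough illustration of what the Number of Clusters, Maximum Iterations, and Minimum Percent of Cells Per Cluster settings control, the sketch below runs a textbook k-means loop (Lloyd's algorithm) in plain Python. It is not FCS Express's implementation: the function names and data are hypothetical, the initial centroids are simply the first events (an assumption), and because this section does not describe how FCS Express handles clusters below the minimum percentage, the sketch only reports them.

```python
import math

def kmeans(events, n_clusters, max_iterations):
    """Textbook Lloyd's algorithm with a cap on the number of iterations."""
    # Assumption: use the first n_clusters events as the starting centroids.
    centroids = [list(e) for e in events[:n_clusters]]
    assignments = [None] * len(events)
    iterations_used = 0
    for _ in range(max_iterations):
        iterations_used += 1
        # Assign each event to its nearest centroid.
        new_assignments = [
            min(range(n_clusters), key=lambda i: math.dist(e, centroids[i]))
            for e in events
        ]
        if new_assignments == assignments:
            break  # nothing changed: converged before the iteration cap
        assignments = new_assignments
        # Move each centroid to the mean of its member events.
        for i in range(n_clusters):
            members = [e for e, a in zip(events, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids, iterations_used

def clusters_below_minimum(assignments, n_clusters, min_percent):
    """List the clusters whose share of events is below min_percent."""
    total = len(assignments)
    return [i for i in range(n_clusters)
            if 100.0 * assignments.count(i) / total < min_percent]

# Hypothetical events with two parameters each.
events = [[1.0, 1.0], [9.0, 9.0], [1.0, 2.0],
          [2.0, 1.0], [9.0, 10.0], [10.0, 9.0]]

assignments, centroids, iterations_used = kmeans(events, n_clusters=2, max_iterations=100)
print(assignments, centroids, iterations_used)
print(clusters_below_minimum(assignments, n_clusters=2, min_percent=5.0))
```

With well-separated data the loop stops long before the iteration cap. The cap matters mostly for large or poorly separated data sets, where it bounds the calculation time.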