The pipeline steps outlined in the table below are available in FCS Express. Please click on a step link or category to jump to a description and information on the variables accessible in the step.

 

 

Pipeline Step Table of Contents:

 

Category: Steps

Pre-Defined Algorithms: FlowAI, FlowSOM, SPADE

Downsampling: Interval Downsampling, Mask Downsampling, Random Downsampling, Target Density Downsampling, Weighted Density Downsampling

Dimensionality Reduction: PCA, tSNE, UMAP

Clustering: Consensus Clustering, Hierarchical Clustering, Kmeans, Self-Organizing Map

Visualization: Graph layout, Minimum Spanning Tree

Mathematical: 0 to 1 Scaling, Normalization, Scaling, Simple Parameter Math, Thresholding

Quality Controls: Dynamic Range Downsampling, Flow Rate Check Downsampling, Signal Acquisition Downsampling

Miscellaneous: Folder, Merge to Spectra, Parameter Removal, Unmixing, Virtual bandpass

 

 

 

 

Pipeline step full descriptions:

 

 Pre-defined Algorithms - For a full description please see the chapter on Pre-defined Algorithms

 

 

 

Downsampling

 

 

Step

 

 

Description

 

 

Interval Downsampling

 

Performs Interval Sampling on the population selected in the main pipeline body.

Please refer to the Sampling Options chapter of the manual for more details on interval sampling.

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value in that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.
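
The two Downsampling Action outcomes can be illustrated with a short sketch. It keeps every k-th event, which is only one plausible reading of interval sampling; the exact rule used by FCS Express is described in the Sampling Options chapter. The function name, the interval argument and the list-based event representation are assumptions made for this example.

```python
def interval_downsample(events, interval, create_mask=False):
    """Keep every `interval`-th event (hypothetical rule, for illustration).

    With create_mask=True, all events are returned together with a 0/1
    mask; otherwise only the sampled events are returned.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    # 1 marks a sampled event, 0 an unsampled one.
    mask = [1 if i % interval == 0 else 0 for i in range(len(events))]
    if create_mask:
        return events, mask
    return [e for e, m in zip(events, mask) if m == 1]


events = [10.5, 3.2, 7.7, 0.0, 4.1, 9.9, 2.2]
all_events, mask = interval_downsample(events, 3, create_mask=True)
sampled_only = interval_downsample(events, 3)
print(mask)          # [1, 0, 0, 1, 0, 0, 1]
print(sampled_only)  # [10.5, 0.0, 2.2]
```

With the mask, downstream steps still see every event and can gate on the mask value; without it, the unsampled events are simply absent from the result.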

 

 

pipeline_interval

 

 

Mask Downsampling

 

Mask downsampling allows users to remove events with a value of zero. This may be useful if any of the downstream steps can be negatively affected by zero values. This step performs downsampling on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

The Mask Merging Style section of the dialog allows the user to choose between the following options:

Cell must be non-zero in ALL selected parameters. Only cells with non-zero values in every selected parameter are kept: if a cell has at least one zero value among the selected parameters, that cell is removed.

Cell must be non-zero in ANY of the selected parameters. If a cell has a value of zero in all the selected parameters, then that cell is removed. If a cell has at least one value which is not zero, then the cell is kept.
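
The difference between the two Mask Merging Style options can be sketched as follows. This is an illustration of the rule described above, not the FCS Express implementation; the function name and the parameter names CD3 and CD4 are hypothetical.

```python
def mask_downsample(events, selected, style="ALL"):
    """Return a 0/1 mask: 1 = event kept, 0 = event removed.

    events:   list of dicts mapping parameter name to value
    selected: parameter names checked in the Parameter Options list
    style:    "ALL" requires a non-zero value in every selected parameter,
              "ANY" requires a non-zero value in at least one of them
    """
    if style not in ("ALL", "ANY"):
        raise ValueError("style must be 'ALL' or 'ANY'")
    combine = all if style == "ALL" else any
    return [1 if combine(ev[p] != 0 for p in selected) else 0 for ev in events]


cells = [
    {"CD3": 120.0, "CD4": 35.0},  # non-zero in both parameters
    {"CD3": 0.0, "CD4": 80.0},    # zero in one parameter
    {"CD3": 0.0, "CD4": 0.0},     # zero in both parameters
]
mask_all = mask_downsample(cells, ["CD3", "CD4"], "ALL")
mask_any = mask_downsample(cells, ["CD3", "CD4"], "ANY")
print(mask_all)  # [1, 0, 0]
print(mask_any)  # [1, 1, 0]
```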

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter, while unsampled events will be assigned with a "0" value in that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.

 

pipeline_maskdownsampling

 

 

Random Downsampling

 

Performs Random Downsampling on the population selected in the main pipeline body.

An internal seed is set so that when reopening the layout, the same sampled events are selected.

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value in that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.
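
The effect of a fixed internal seed can be sketched with Python's standard random module. This is an illustration of the principle only; the generator and seed handling inside FCS Express are not documented here, and the function name and arguments are assumptions.

```python
import random


def random_downsample_mask(n_events, n_keep, seed):
    """Return a 0/1 mask with exactly n_keep sampled events.

    A dedicated random.Random(seed) instance makes the selection
    repeatable without touching the global random state.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_events), n_keep))
    return [1 if i in chosen else 0 for i in range(n_events)]


first = random_downsample_mask(n_events=20, n_keep=5, seed=42)
again = random_downsample_mask(n_events=20, n_keep=5, seed=42)
print(sum(first))      # 5
print(first == again)  # True
```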

 

pipeline_randomsampling

 

 

Target Density Downsampling

 

Performs Density-Dependent Downsampling on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Please refer to the Sampling Options chapter of the manual for more details on Target Density Downsampling.

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value for that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.

 

pipeline_target

 

 

Weighted Density Downsampling

 

Performs Weighted Density Downsampling on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Please refer to the Sampling Options chapter of the manual for more details on Weighted Density Downsampling.

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value for that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.

 

pipeline_weighted

 

 

 

 

 

Dimensionality Reduction

 

 

Step

 

 

Description

 

 

PCA

 

Calculates Principal Components of the population selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

The Number of Principal Components to display can be set in the Number of Principal Components to keep field.

 

pipeline_PCA

 

tSNE

 

Performs dimensionality reduction using the tSNE algorithm on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

For additional detail on the tSNE algorithm please refer to the tSNE (viSNE) chapter of the manual.

 

pipeline_tSNE

 

The following settings may be customized:

 

tSNE Method to Use

Amount of Approximation (Applies Only to Barnes-Hut)

Perplexity

Number of Iterations

Use Opt-SNE

Generate New Random Seed

 

Please refer to the Defining a tSNE transformation chapter for additional details on tSNE settings.

 

 

UMAP

Performs dimensionality reduction using the UMAP (Uniform Manifold Approximation and Projection) algorithm on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu. The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

UMAP is a dimensionality reduction technique that allows users to create new UMAP X and UMAP Y parameters from a high-dimensional dataset. The two main steps of UMAP are:

1. Creation of a high-dimensional graph. This is a weighted graph in which each point is linked to its nearest neighbors. The size of the neighborhood is a crucial setting and is controlled by the Number of neighbors parameter (see below).

2. Creation of a low-dimensional (2-dimensional) graph that is as similar as possible to the high-dimensional graph, resulting in the UMAP X and UMAP Y parameters.

 

The following settings may be customized by the user:

 

Number of neighbors. Sets the number of approximate nearest neighbors used to create the initial high-dimension graph. Number of neighbors is a crucial parameter as low values will instruct the UMAP algorithm to focus more on local structure by constraining the number of neighboring points considered when analyzing the data in high dimensions, while high values will instruct the UMAP algorithm towards representing the overall structure while sacrificing fine detail.

Min Low Dim Distance. Sets the Minimum Distance between points in the low-dimensional map (i.e. in the UMAP map). By setting low values, points will be clustered closer together.

Number of Iterations. The number of cycles that the UMAP algorithm performs to refine the results.

Generate New Random Seed. The UMAP algorithm is stochastic. To make it reproducible, a fixed Seed may be set. If the same dataset and the same settings are used, retaining the same Random Seed value gives the same result. UMAP may be run multiple times with different Random Seeds to evaluate the stability and consistency of the population separations.

 

pipeline_UMAP

 

For more details on the UMAP algorithm, please refer to the following resources:

 

Leland McInnes et al. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 2018 arXiv:1802.03426

Becht E. et al. Dimensionality reduction for visualizing single-cell data using UMAP Nat Biotechnol. 2018;10.1038

Andy Coenen, Adam Pearce. Understanding UMAP. https://pair-code.github.io/understanding-umap/

 

 

 

 

 

Clustering

 

 

Step

 

 

Description

 

Consensus Clustering

 

Performs Consensus Clustering, on the population selected in the main pipeline body, using the parameters selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Consensus clustering is a technique to find a single clustering result (i.e. a consensus) when many clustering runs are performed.

Consensus clustering in FCS Express is performed with the Monti method (Monti et al. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning. 52. 91-118. 10.1023/A:1023949509487). Briefly, subsets of points are sampled from the input dataset and clustering is performed on each subset. A pairwise consensus matrix is then created, recording the frequency with which every pair of points is found in the same cluster. The consensus matrix (M) can then be converted into a distance/dissimilarity matrix (1-M) and used to perform a final clustering step, which gives the consensus clustering.
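
The consensus matrix step can be sketched in a few lines. The example covers only the construction of M and 1-M from clustering runs that have already been performed; the resampling and the final clustering are left out. The function name and the representation of a run as a dictionary are assumptions made for illustration.

```python
def consensus_matrix(runs, n_points):
    """Build the pairwise consensus matrix M from several clustering runs.

    runs: list of dicts {point_index: cluster_label}; each dict covers
          only the points that were sampled in that run.
    M[i][j] is the fraction of runs containing both i and j in which the
    two points received the same label (0.0 if never sampled together).
    """
    together = [[0] * n_points for _ in range(n_points)]
    sampled = [[0] * n_points for _ in range(n_points)]
    for labels in runs:
        points = sorted(labels)
        for i in points:
            for j in points:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_points)]
        for i in range(n_points)
    ]


# Three hypothetical runs over four points (0..3), each on a subset.
runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {0: "p", 1: "q", 2: "q", 3: "q"},
]
M = consensus_matrix(runs, 4)
D = [[1.0 - m for m in row] for row in M]  # dissimilarity matrix (1 - M)
print(M[1][2])  # 0.5 (same cluster in 1 of the 2 runs that sampled both)
print(M[2][3])  # 1.0
print(D[1][2])  # 0.5
```

Cluster labels only need to be comparable within a run, which is why each run above uses its own label names.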

 

This clustering approach is computationally heavy, so the maximum number of objects that can be clustered is set to 8000. The Consensus Clustering step is therefore best suited to clustering the clusters generated by other clustering steps (whose number is usually lower than 8000). This procedure is also called Meta-clustering.

 

The output of the Consensus Clustering step is a list of Cluster Assignments, internal labels indicating cluster membership that are automatically assigned to each event. The cluster assignment can be displayed by the user on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

 

The Cluster Assignment can be used as Input parameter for the Minimum Spanning Tree pipeline step.

 

When the input parameter is a cluster assignment (a common occurrence with this type of clustering; see comment above), the current clustering step performs a meta-clustering. In this case, the following options are particularly useful:

Add Meta-Clustering as New Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be added as new parameter and thus be displayed on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

Add Meta-Clustering to Selected Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be visualized as a colored border around nodes, when the input parameter is displayed on a Heatmap plot.

 

The following settings may also be customized by the user:

Number of Clusters. This defines the number of clusters which should be found.

Sampling Percentage. This defines the fraction of events sampled at every repetition.

Number of Samplings. This defines the number of repetitions.

New Parameter Name. The name of the new parameter (i.e. the one containing the cluster assignment for each input data) can be specified in this field.

Clustering Algorithm. This option allows a user to define whether the consensus clustering should be performed using k-Means or Hierarchical clustering.

Generate New Random Seed. Setting a Seed ensures reproducibility with algorithms that rely on random initialization. Changing the Seed is a good way to evaluate the impact of randomness on the clustering result.

 

pipeline_consensusclustering

 

Once the Consensus clustering is displayed on the Heat Map, the following actions can be performed on the Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create Well gates to select one or more clusters (i.e. wells) and use those gates/clusters for downstream analysis.

 

 

Hierarchical Clustering

 

Performs Single-linkage agglomerative clustering, on the population selected in the main pipeline body, using the parameter(s) selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

This clustering approach is computationally heavy, so the maximum number of objects that can be clustered is set to 8000. The Hierarchical Clustering step is therefore best suited to further clustering the clusters generated by other clustering steps (whose number is usually lower than 8000). This procedure is also called Meta-clustering.

 

The output of the Hierarchical Clustering step is a list of Cluster Assignments, internal labels indicating cluster membership that are automatically assigned to each event. The cluster assignment can be displayed by the user on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

 

The Cluster Assignment can be used as Input parameter for the Minimum Spanning Tree pipeline step.

 

When the input parameter is a cluster assignment (a common occurrence with this type of clustering), the current clustering step performs a meta-clustering. In this case, the following options are particularly useful:

Add Meta-Clustering as New Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be added as new parameter and thus be displayed on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

Add Meta-Clustering to Selected Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be visualized as a colored border around nodes, when the input parameter is displayed on a Heatmap plot.

 

The following settings may also be customized by the user:

 

Number of Clusters. Defines the number of clusters to attempt to find.

Linkage Method. Possible options for linkage are:

Single. The distance between two clusters is defined by the smallest distance between a point in the first cluster and a point in the second cluster. Also referred to as the "nearest-neighbor" method.

Complete. The distance between two clusters is defined by the largest distance between a point in the first cluster and a point in the second cluster. Also referred to as the "furthest-neighbor" method.

Average. The distance between two clusters is defined by the average distance between the points in the first cluster and the points in the second cluster.

Ward. The Ward method (also called minimum-variance method) minimizes the within-cluster variance. Two clusters are merged if the variance of the resulting cluster is the smallest variance possible as compared to the variance resulting from merging any other pair of clusters. The Ward method aims at finding compact, spherical clusters.
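
The Single, Complete and Average definitions above can be compared on a small example. The sketch assumes Euclidean distances between points (the distance used by FCS Express is not stated in this chapter) and omits the Ward method, which is based on variance rather than on pairwise distances. The function name is hypothetical.

```python
from itertools import product
from math import dist


def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters of points under a linkage method."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # furthest neighbor
        return max(pairwise)
    if method == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported method: {method}")


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(linkage_distance(a, b, "single"))    # 3.0
print(linkage_distance(a, b, "complete"))  # 6.0
print(linkage_distance(a, b, "average"))   # 4.5
```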

 

 

pipeline_hierarchical

 

Once the Hierarchical clustering is displayed on the Heat Map, the following actions can be performed on the Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create Well gates to select one or more clusters (i.e. wells) and use those gates/clusters for downstream analysis.

 

 

Kmeans

 

Performs Kmeans clustering, on the population selected in the main pipeline body, using the parameter(s) selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

The following settings may also be customized by the user (please refer to the Defining a k-Means cluster analysis chapter for more details):

Number of Clusters

Min percent Cells Per Cluster

Maximum Iterations

Number of Attempts

Keep-Non-Converged Data

 

The output of the Kmeans step is a list of Cluster Assignments, internal labels indicating cluster membership that are automatically assigned to each event. The cluster assignment can be displayed by the user on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

The Cluster Assignment can be used as Input parameter for the Minimum Spanning Tree pipeline step.

 

When the input parameter is a cluster assignment, the current clustering step performs a meta-clustering. In this case, the following options are particularly useful:

Add Meta-Clustering as New Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be added as new parameter and thus be displayed on 1D plots, 2D plots, Plate Heatmap plots and Data Grid.

Add Meta-Clustering to Selected Parameter. By checking this check box, the meta-cluster assignment generated by this clustering step can be visualized as a colored border around nodes, when the input parameter is displayed on a Heatmap plot.

 

Finally, the Kmeans algorithm is stochastic. To make it reproducible, a fixed Seed may be set. If the same dataset and the same settings are used, retaining the same Random Seed value gives the same result. Kmeans may be run multiple times with different Random Seeds to evaluate the stability and consistency of the result.

 

pipeline_Kmeans

 

Once the k-Means clustering is displayed on the Heat Map, the following actions can be performed on the Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create Well gates to select one or more clusters (i.e. wells) and use those gates/clusters for downstream analysis.

 

 

Self-Organizing Map

 

The output of the SOM (Self-Organizing Map) step is a list of Cluster Assignments, internal labels indicating the SOM node membership that are automatically assigned to each event. The cluster assignment can be displayed by the user on 1D plots, 2D plots, Plate Heatmap plots, and Data Grids.

 

The high-dimensional locations of the SOM nodes can be used as Input parameter for the Minimum Spanning Tree pipeline step.

 

The Input parameters for the SOM step can be selected individually by clicking on the boxes or by using the right click menu. The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Once the SOM clustering is displayed on the Plate Heat Map, the following actions can be performed on the Plate Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create Well gates to select one or more clusters (i.e. wells) and use those gates/clusters for downstream analysis.

 

 

Some background on Self-Organizing Map (i.e. SOM) can be found below, together with a description of all the settings available in FCS Express to customize the SOM calculations.

 

Self-Organizing Map (i.e. SOM) is a special class of artificial neural networks first introduced by Teuvo Kohonen and is used extensively as a clustering, dimensionality-reduction and visualization tool in exploratory data analysis.

The input and the output data are defined in the same high-dimensional space but the number of output data k (i.e. "nodes" or "neurons") is smaller than the number of input data n. For the sake of simplicity, output nodes can be thought of as centroids in the k-Means clustering and much like k-Means, output nodes for SOM are generally randomly initialized at the initialization of the SOM process.

 

One of the key points of SOM is that output nodes also exist in a two-dimensional space, where they form a grid in which each node is connected to its neighboring nodes. The size of the grid is defined by the user (see the Grid Width and the Grid Height options in the image below; the default setting is a 10x10 grid).

 

The basic steps of Kohonen’s SOM algorithm can be summarized by the following iterative procedure:

1. Initialization. The location in the high-dimensional space of each of the k output nodes is initialized by picking k random points. These can be k random locations in the space, generally chosen within a small ‘ball’, or k random points from the input dataset (see the Cluster Centroid Initialization Method in the table below).

2. Sampling. Randomly select a point from the input dataset to present to the SOM nodes.

3. Similarity Matching. Compute the Euclidean distances between the selected point and each SOM node to find the nearest node (i.e. the winner; also called Best Matching Unit, or BMU).

4. Weight Updating. The location of all SOM nodes in the high-dimensional space is updated so that nodes move towards the point sampled in Step 2. Each node moves by a fraction of its distance to the current sample point. This fraction is calculated for each node by a Learning function, which depends on both the 2D grid distance between the node and the winner node and a learning rate (see below). The Learning function decreases at every iteration t (see Step 5).

5. Repeat Steps 2, 3 and 4 until training is complete.

 

The Learning function in Step 4 above defines the amplitude of the movement of each node of the 2D grid towards the sample point (in other words, it defines how the 2D grid "learns" the shape of the multidimensional dataset). A generalized formula for this function is the following:

 

Learning = DistHD * α(t) *  h(t)

 

Where DistHD is the distance, calculated in the high-dimensional space, between the sample point and the node, α(t) is the Learning Rate function and h(t) is the Neighborhood function. Both the Learning Rate and the Neighborhood functions take values in the range 0 to 1 and are decreasing functions (i.e. they both decrease at every iteration t). Since the Neighborhood function is always equal to 1 for the winner node (see below), the Learning Rate function can be viewed as the fraction of the distance covered by the winner node towards the sampled point. For every other node, the distance covered towards the sampled point will be equal to or shorter than that (because of the further penalty caused by h(t), which is based on the distance of the node to the winner node).
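
A single Weight Updating step following the formula Learning = DistHD * α(t) * h(t) can be sketched as below. The values of α and h are passed in directly, so the sketch does not depend on a particular decay or neighborhood function. The function name and the tuple representation of node positions are assumptions made for illustration.

```python
from math import dist


def update_nodes(nodes, sample, alpha, h_values):
    """Move every node toward `sample` by the fraction alpha * h.

    nodes:    list of node positions in the high-dimensional space
    sample:   the sampled input point
    alpha:    current learning rate, between 0 and 1
    h_values: current neighborhood value per node (1.0 for the winner)
    The distance covered by a node equals DistHD * alpha * h.
    """
    return [
        tuple(x + alpha * h * (s - x) for x, s in zip(node, sample))
        for node, h in zip(nodes, h_values)
    ]


nodes = [(0.0, 0.0), (10.0, 0.0)]
sample = (4.0, 0.0)
# Node 0 is the winner (nearest to the sample), so its h is 1.0;
# the value 0.5 for node 1 is an arbitrary example.
moved = update_nodes(nodes, sample, alpha=0.5, h_values=[1.0, 0.5])
print(moved)                     # [(2.0, 0.0), (8.5, 0.0)]
print(dist(nodes[0], moved[0]))  # 2.0 = DistHD (4.0) * alpha (0.5) * h (1.0)
```

The winner covers exactly the fraction α of its distance to the sample, while the second node covers only α * h = 0.25 of its own distance.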

 

 

Parameter Options

SOM will be run on the population selected in the main pipeline body, using the parameter(s) selected in the Parameter Options list (see image below). Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

 

Transformation Options

 

Add Meta-Clustering...

The option allows users to control how the clustering information is accessed when the clustering is performed as Meta-Clustering (i.e. when clusters, and not single cell events, are clustered). Two (non-mutually exclusive) options are available:

 

Add Meta-Clustering as New Parameter. By checking this check box, the meta-cluster assignment generated by the clustering step will be added as a new parameter and can thus be displayed on 1D plots, 2D plots, Plate Heat Maps and Data Grids.

 

Add Meta-Clustering to Selected Parameter. By checking this check box, the meta-cluster assignment generated by the clustering step can be visualized as a colored border around nodes, when the input parameter is displayed on a Plate Heat Map plot.

 

Meta-clusters may also be automatically gated using Heat Map Well Gates.

Training Set Size

Defines how many events from the dataset will be sampled and thus how many iterations will be run. Possible choices are:

When Absolute is selected, the defined number of samplings will be performed (default is 1000). A number higher than the total number of input events can also be set.

When Percent (default) is selected, the number of samplings is defined as a percentage of the total number of input events (default is 100 when SOM is inserted into a pipeline as a single step, while it is 1000 when SOM is inserted as part of the FlowSOM Pre-Defined Algorithm step). Values higher than 100 are allowed. E.g. 1000 means that the number of samplings will be 10 times the total number of events in the input dataset.
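
The relationship between the Training Set Size setting and the number of samplings can be worked through with a small helper. The function name and the rounding of fractional results are assumptions; the arithmetic follows the description above.

```python
def n_samplings(n_events, size, mode="Percent"):
    """Number of SOM training samplings (iterations).

    mode "Absolute": `size` is used directly.
    mode "Percent":  `size` is a percentage of n_events; values above
                     100 mean some events are presented more than once.
    """
    if mode == "Absolute":
        return int(size)
    if mode == "Percent":
        return round(n_events * size / 100)
    raise ValueError("mode must be 'Absolute' or 'Percent'")


print(n_samplings(50_000, 100))               # 50000
print(n_samplings(50_000, 1000))              # 500000
print(n_samplings(50_000, 1000, "Absolute"))  # 1000
```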

SOM 2D Grid

Defines the size of the 2D grid (see description above).

Note: In FCS Express the 2D grid is always rectangular, which means that every node has at most 4 direct neighbors.

Learning Rate

This user-defined parameter (also called α; alpha) is a main parameter controlling the learning of SOM since it scales the contribution of the new information. Practically, it can be viewed as the fraction of the distance covered by the winner node towards the sampled point.

Learning rate ranges between 0 and 1, with both the Initial and the Final value being defined by the user.

The Learning rate decreases with time (i.e. at every iteration t) following a Decay Function (see below).

Neighborhood Spread

This user-defined parameter (also called σ; sigma) defines the amplitude of the neighborhood around the winner node in the 2D grid. The Neighborhood Spread is one of the variables of the Neighborhood Function (see below).

Both the Initial and the Final value can be defined by the user via the Neighborhood Spread fields (see image below) or calculated automatically by checking the Automatic Neighborhood Spread check box. The automatic calculation takes the 67th percentile of the Cumulative Distribution Function of the entire set of fixed distances on the 2D map as the Initial value and 0 as the Final value. Distances can be calculated using either the Euclidean or the Chebyshev method (see 2D Grid Distance Metric below).

As with the Learning Rate, the neighborhood spread also decreases with time (i.e. at every iteration t) following a Decay Function (see below).

New Parameter Name

The name of the new parameter (i.e. the one containing the cluster assignment for each input data) can be specified in the field.

Cluster centroid initialization Method

Defines how the centroid initialization occurs.

When Random is selected, centroids will be initialized with random values between 0.1 and 1. The idea of this approach is to let mapping evolve with the data.  

When Random Cells is selected, centroids will be initialized with random points (i.e. cells) from the input dataset.
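
The two initialization methods can be sketched as follows. A uniform distribution for the Random method and distinct picks for Random Cells are assumptions, as are the function name and arguments. The seed of 6 mirrors the default Random Seed mentioned in this chapter, but Python's generator will not reproduce the values generated by FCS Express.

```python
import random


def init_nodes(k, data, method="Random Cells", seed=6):
    """Initial high-dimensional positions for k SOM nodes.

    "Random":       each coordinate drawn uniformly from 0.1 to 1.
    "Random Cells": k distinct points picked from the input data.
    """
    rng = random.Random(seed)
    if method == "Random":
        n_dims = len(data[0])
        return [
            tuple(rng.uniform(0.1, 1.0) for _ in range(n_dims))
            for _ in range(k)
        ]
    if method == "Random Cells":
        return [tuple(point) for point in rng.sample(data, k)]
    raise ValueError("method must be 'Random' or 'Random Cells'")


data = [(0.2, 0.9), (0.4, 0.1), (0.8, 0.5), (0.3, 0.3), (0.6, 0.7)]
random_nodes = init_nodes(3, data, "Random")
cell_nodes = init_nodes(3, data, "Random Cells")
print(all(0.1 <= c <= 1.0 for node in random_nodes for c in node))  # True
print(all(node in data for node in cell_nodes))                      # True
print(cell_nodes == init_nodes(3, data, "Random Cells"))             # True
```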

Training Decay Function

Both the Learning Rate α and the Neighborhood Spread σ values decrease with time following this function.

When Asymptotic is selected, both α and σ decrease proportionally to 1/t, where t is the current iteration.

When Linear is selected, both α and σ decrease proportionally to t/T, where t is the current iteration and T is the total number of Iterations that has to be performed.
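
The two decay shapes can be compared with a small sketch. The exact formulas used by FCS Express are not given in this chapter, so the forms below are common textbook choices adopted as assumptions: a straight line from the Initial to the Final value for Linear, and a gap to the Final value that shrinks as 1/t for Asymptotic. The function name and arguments are hypothetical.

```python
def decayed_value(initial, final, t, total, function="Linear"):
    """Value of alpha or sigma at iteration t.

    Linear:     0 <= t <= total, reaches `final` exactly at t = total.
    Asymptotic: t >= 1, equals `initial` at t = 1 and approaches
                `final` as t grows (assumed form, see lead-in).
    """
    if function == "Linear":
        return initial + (final - initial) * t / total
    if function == "Asymptotic":
        if t < 1:
            raise ValueError("t must be >= 1 for the Asymptotic function")
        return final + (initial - final) / t
    raise ValueError("function must be 'Linear' or 'Asymptotic'")


print(decayed_value(1.0, 0.0, 50, 100, "Linear"))      # 0.5
print(decayed_value(1.0, 0.0, 50, 100, "Asymptotic"))  # 0.02
print(decayed_value(1.0, 0.0, 100, 100, "Linear"))     # 0.0
```

Under these assumptions the Asymptotic function drops quickly in the first iterations and then flattens, while the Linear function decreases by the same amount at every iteration.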

2D Grid Neighborhood Function

Varies the learning of the different nodes of the 2D grid based on their distance to the winner node: the farther a node is from the winner, the less it learns, i.e. the smaller the movement of that node toward the sampled point.

The 2D Grid Neighborhood Function ( h(t) ) can be either a Gaussian or Boxcar (also known as Bubble) function.

 

When Gaussian is selected the following formula will be used:

 

h(t) = exp( -Dist2D^2 / (2 * σ(t)^2) )

 

Where Dist2D is the distance on the 2D grid between the node and the winner node. σ(t) is the Neighborhood Spread, which starts at the value defined in the Initial Neighborhood Spread field (see above) and decreases following a Decay Function (see Training Decay Function above) until reaching the value defined in the Final Neighborhood Spread field.

By carefully looking at the above formula, the user can see that, since Dist2D is 0 for the winner node, the value for h(t) for the winner node is always 1.

 

When Boxcar is selected the following formula will be used:

 

h(t) = 1 when Dist2D ≤ σ(t)

h(t) = 0 when Dist2D > σ(t)

 

Where Dist2D is still the distance on the 2D grid between the node and the winner node and σ(t) is the Neighborhood Spread. When using Boxcar every node within the neighborhood will move as the winner node does. Every node outside the neighborhood will not move.
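
Both neighborhood functions can be evaluated side by side with the sketch below, which follows the Gaussian and Boxcar formulas given above. The handling of a spread of exactly 0 in the Gaussian case is an assumption added to avoid a division by zero, and the function name is hypothetical.

```python
from math import exp


def neighborhood(dist_2d, sigma, function="Gaussian"):
    """Neighborhood value h for a node at grid distance dist_2d from the winner."""
    if function == "Gaussian":
        if sigma == 0:
            # Assumption: with zero spread only the winner itself learns.
            return 1.0 if dist_2d == 0 else 0.0
        return exp(-dist_2d ** 2 / (2 * sigma ** 2))
    if function == "Boxcar":
        return 1.0 if dist_2d <= sigma else 0.0
    raise ValueError("function must be 'Gaussian' or 'Boxcar'")


for d in (0, 1, 2, 3):
    gaussian = round(neighborhood(d, 2.0, "Gaussian"), 3)
    boxcar = neighborhood(d, 2.0, "Boxcar")
    print(d, gaussian, boxcar)
# 0 1.0 1.0
# 1 0.882 1.0
# 2 0.607 1.0
# 3 0.325 0.0
```

With a spread of 2, the Gaussian function lets learning fade smoothly with grid distance, whereas Boxcar treats every node within 2 units exactly like the winner and ignores the rest.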

2D Grid Distance Metric

Defines the metric used to calculate distances on the 2D grid.

Provided that two horizontally-aligned adjacent nodes, or two vertically-aligned adjacent nodes, are 1 unit apart, the distance between a node and every other node can be calculated as either the Euclidean or the Chebyshev distance. With Euclidean distance every node has at most 4 direct neighbors, while with Chebyshev distance every node (with rectangular 2D grid) has at most 8 direct neighbors.
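
The neighbor counts quoted above can be checked with a short sketch that measures grid distances with both metrics and counts the nodes lying exactly 1 unit away. The function names and the (column, row) representation of nodes are assumptions made for illustration.

```python
from math import hypot


def grid_distance(a, b, metric="Euclidean"):
    """Distance between two grid nodes given as (column, row) pairs."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if metric == "Euclidean":
        return hypot(dx, dy)
    if metric == "Chebyshev":
        return max(dx, dy)
    raise ValueError("metric must be 'Euclidean' or 'Chebyshev'")


def direct_neighbors(node, width, height, metric):
    """Nodes of a width x height grid lying exactly 1 unit from `node`."""
    return [
        (x, y)
        for x in range(width)
        for y in range(height)
        if grid_distance(node, (x, y), metric) == 1
    ]


center, corner = (5, 5), (0, 0)
print(len(direct_neighbors(center, 10, 10, "Euclidean")))  # 4
print(len(direct_neighbors(center, 10, 10, "Chebyshev")))  # 8
print(len(direct_neighbors(corner, 10, 10, "Chebyshev")))  # 3
```

Diagonal nodes are about 1.41 units away under the Euclidean metric but exactly 1 unit away under Chebyshev, which is why the latter yields up to 8 direct neighbors.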

Generate New Random Seed

The Random Seed is a number which is used by the random number generator to generate the N random values. The random number generator will use the Random Seed value and perform a series of mathematical operations on it which result in N random values. 

Since SOM relies on a random initialization step to place the nodes in the multi-dimensional space, a seed can be set to make results reproducible over time when SOM is applied to the same data. The seed can be changed either by clicking on the Change Random Seed button or by manually entering a seed in the field.

In FCS Express, the default Random Seed is 6.

 

 

 

pipeline_SOM

 

The output of the SOM step is a Cluster Assignment for each event, i.e. an internal label, assigned automatically, that indicates which SOM node the event belongs to. The cluster assignment can be displayed by the user on 1D plots, 2D plots, Plate Heat Map plots, and Data Grid.

 

The high-dimensional locations of the nodes can be used as the input parameter for the Minimum Spanning Tree pipeline step.

 

The name of the new parameter created by the SOM algorithm can also be customized by the user via the "New Parameter Name" field.

 

The SOM algorithm is stochastic. To make it reproducible, a fixed Random Seed may be set: if the same dataset, the same settings, and the same Random Seed value are used, the same result will be obtained. SOM may also be run multiple times with different Random Seeds to evaluate the stability and consistency of the result.
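The effect of a fixed seed can be illustrated with Python's standard random number generator. This is a generic sketch, not the generator used by FCS Express; the init_nodes function and the node and dimension counts are assumptions made for the example.

```python
import random


def init_nodes(n_nodes, n_dims, seed):
    """Place n_nodes at random positions in an n_dims-dimensional space."""
    rng = random.Random(seed)  # generator fully determined by the seed
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_nodes)]


run_a = init_nodes(4, 3, seed=6)  # same seed ...
run_b = init_nodes(4, 3, seed=6)  # ... same initial node positions
run_c = init_nodes(4, 3, seed=7)  # different seed, different positions
```

Here run_a and run_b are identical, while run_c differs, which is the behavior one relies on when comparing runs with the same or different seeds.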

 

Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Once the SOM clustering is displayed on the Plate Heat Map, the following actions can be performed on the Plate Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create Well gates to select one or more clusters (i.e. wells) and use those gates/clusters for downstream analysis. Meta-clusters may also be automatically gated using Heat Map Well Gates

 

 

 

 

 

 

 

 

Visualization

 

 

Step

 

 

Description

 

 

Graph Layout

 

Graph Layout allows you to change the visualization of your Minimum Spanning Tree (MST) on a Heat Map plot using an MST output as the input parameter.

 

pipeline_graphlayout

 

Two different Graph Layout Methods can be selected: Arch (also called "U-shape") or Radio.

 

graph_layout_methods

 

Once the Graph is displayed on the Heat Map, the following actions can be performed on the Heat Map:

Modify the well size

Have well size dependent on a statistic such as number of events in the node

Set the Parameter and Statistic for display

Change the Color Level and Color Scheme

Create gate(s) to identify node(s) of interest, and edit other features

 

 

Minimum Spanning Tree

 

Creates a Minimum Spanning Tree (MST) using a Cluster Assignment as the input parameter. The Cluster Assignment parameter can be generated with any of the clustering steps available.

The MST step generates a new parameter that can be visualized on a Heat Map plot.

For more details on MST, please refer to this page on Wikipedia.
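As a general illustration of what a minimum spanning tree is, the sketch below applies Prim's algorithm to a handful of points. It assumes, purely for the example, that each cluster is summarized by a centroid and that Euclidean distances between centroids are used; how FCS Express derives the tree from the Cluster Assignment is not described here, and the function name and coordinates are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm: edges (i, j) connecting all points with minimal total length."""
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}  # start the tree from the first point
    edges = []
    while len(in_tree) < n:
        best = None  # (distance, node in tree, node outside tree)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges


# Hypothetical cluster centroids in a 2-parameter space.
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (1.0, 1.0)]
tree_edges = minimum_spanning_tree(centroids)
```

A tree over N nodes always has N - 1 edges and contains no cycles, so every cluster is connected to the others through a single path.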

 

pipeline_MST

 

 

 

 

 

 

Mathematical

 

 

Step

 

 

Description

 

 

0 to 1 Scaling

 

The 0 to 1 Scaling step allows users to scale the minimum of the selected parameter to 0 and the maximum to 1.

Scaling is performed for each input parameter on the population (gate) selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

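Specifically, for each of the selected input parameters, the minimum is subtracted from all the input events and the resulting values are then divided by their maximum (i.e. the original maximum minus the original minimum of that parameter), so that the minimum maps to 0 and the maximum maps to 1.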

 

For each of the input parameters, this step generates a new parameter containing the scaled values. The new parameter is labeled with the 0 to 1 Scaled suffix.
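The calculation can be sketched as follows for a single parameter. The sketch divides by the range (maximum minus minimum), which is what maps the maximum to 1; the function name and the handling of a constant parameter are assumptions made for the example.

```python
def scale_0_to_1(values):
    """Map the minimum of `values` to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: a constant parameter is mapped to 0 for every event.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Hypothetical single-parameter values for four events.
scaled = scale_0_to_1([10.0, 20.0, 30.0, 50.0])
```

For the four example events, the scaled values are 0, 0.25, 0.5, and 1.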

 

pipeline_Scaling0to1

 

 

 

 

Normalization

 

The Normalization step allows users to normalize the selected input parameters by dividing single-cell (event by event) values by:

Standard Deviation

Root Mean Square

None

 

In addition, the Mean can also be subtracted.

 

The Standard Deviation, Root Mean Square, and Mean are calculated independently for the population (gate) selected in the main pipeline body, using the parameters selected in the Parameter Options list. Parameters can be selected individually by clicking on the boxes or by using the right click menu. The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

For each of the input parameters, this step generates a new parameter containing the normalized values. The new parameter is labeled with the Normalized suffix.
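The options above can be sketched for a single parameter as shown below. Two points are assumptions made for the example and are not confirmed by this chapter: the population form of the Standard Deviation is used, and the divisor is computed on the original values rather than on the mean-subtracted ones. The function and argument names are hypothetical.

```python
import math
import statistics


def normalize(values, divide_by="sd", subtract_mean=False):
    """Divide each event by the SD, the RMS, or nothing; optionally subtract the mean first."""
    if divide_by == "sd":
        divisor = statistics.pstdev(values)  # assumption: population SD
    elif divide_by == "rms":
        divisor = math.sqrt(sum(v * v for v in values) / len(values))
    elif divide_by == "none":
        divisor = 1.0
    else:
        raise ValueError(f"unknown option: {divide_by}")
    shift = statistics.fmean(values) if subtract_mean else 0.0
    return [(v - shift) / divisor for v in values]


# Hypothetical single-parameter values (mean 5, population SD 2).
events = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = normalize(events, divide_by="sd", subtract_mean=True)
centered = normalize(events, divide_by="none", subtract_mean=True)
```

Subtracting the mean and dividing by the Standard Deviation yields the familiar z-score, while selecting None with mean subtraction only centers the data.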

 

pipeline_normalization

 

 

Scaling

 

The Scaling step allows users to scale the input parameters selected in the Parameter Options list with any of the scales available in FCS Express. Parameters can be selected individually by clicking on the boxes or by using the right click menu. The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

This is useful when the downstream steps of the pipeline benefit from input parameters being scaled with a specific scale.

 

If the Automatic check box is selected, the Instrument Specific Settings defined in the User Options will be used.

If the Automatic check box is unchecked, a scale from the available scales in FCS Express can be selected. The scale will be applied to all the selected parameters.

 

For each of the input parameters, this step generates a new parameter containing the scaled values. The new parameter is labeled with the scaled suffix.

 

pipeline_scaling_Fig1 pipeline_scaling_Fig2

 

 

Simple Parameter Math

 

The Simple Parameter Math step is a simplified version of the Parameter Math transformation and is intended to perform operations on parameter-derived values (and/or on constant values) at the single-cell (event by event) level.

 

pipeline_parametermath

 

The two Operand drop-down menus allow users to select any of the input parameters defined in the main pipeline step. Alternatively, either Operand may be a constant value, set by selecting the corresponding constant value radio button.

 

Allowed operations are addition, subtraction, multiplication, and division (+, -, *, and / respectively).

 

The pipeline step generates a new parameter containing the single-cell results of the defined calculation. The name of the new parameter can be specified in the New parameter name field.

 

*Note: The Simple Parameter Math pipeline step cannot use customized equations. Customized equations can be used in Parameter Math transformations. Please see Defining Parameter Math for step-by-step instructions on creating a custom formula.
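The event-by-event nature of the calculation can be sketched as follows. Each operand is either a list of per-event values or a constant; the function names, parameter names, and values are hypothetical and only illustrate the idea.

```python
import operator

# The four operations allowed by the step.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def as_events(operand, n_events):
    """Return per-event values: a parameter is used as-is, a constant is repeated."""
    if isinstance(operand, (list, tuple)):
        return list(operand)
    return [operand] * n_events


def simple_parameter_math(left, op, right, n_events):
    """Apply `op` event by event to the two operands."""
    func = OPERATIONS[op]
    return [func(a, b) for a, b in zip(as_events(left, n_events), as_events(right, n_events))]


# Hypothetical parameters for three events.
fsc_a = [100.0, 200.0, 400.0]
fsc_h = [50.0, 100.0, 100.0]

ratio = simple_parameter_math(fsc_a, "/", fsc_h, n_events=3)  # parameter / parameter
doubled = simple_parameter_math(fsc_h, "*", 2.0, n_events=3)  # parameter * constant
```

Each resulting list has one value per event, which corresponds to the new parameter created by the step.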

 

 

Thresholding

 

The Thresholding step allows users to replace values above a defined maximum threshold with a custom value and/or replace values below a defined minimum threshold with a custom value. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

The maximum and minimum thresholds are defined in the Maximum Allowed Value and the Minimum Allowed Value fields. Values above the former and below the latter will be replaced with the values defined in the corresponding replacement fields for higher and lower values, respectively.

 

* Note: when defining a threshold, please consider that input values may have been scaled in the main pipeline body and/or in a previous Scaling step. For example, to replace raw values above 1,000,000 when the data have been scaled with a Log scale, a threshold of 6 must be entered (i.e. 10^6 = 1,000,000).
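The interaction between scaling and thresholds can be sketched as follows. The example assumes a base-10 Log scale and uses hypothetical function and argument names; it is meant to show why the threshold is entered as 6 rather than 1,000,000 when the input has already been log-scaled.

```python
import math


def apply_threshold(values, max_allowed=None, replace_higher_with=None,
                    min_allowed=None, replace_lower_with=None):
    """Replace values above max_allowed and/or below min_allowed with custom values."""
    result = []
    for v in values:
        if max_allowed is not None and v > max_allowed:
            result.append(replace_higher_with)
        elif min_allowed is not None and v < min_allowed:
            result.append(replace_lower_with)
        else:
            result.append(v)
    return result


# Hypothetical raw values; the last one exceeds 1,000,000.
raw = [10.0, 1_000.0, 5_000_000.0]

# After a base-10 Log scaling, 1,000,000 corresponds to 6.
log_scaled = [math.log10(v) for v in raw]
clipped = apply_threshold(log_scaled, max_allowed=6.0, replace_higher_with=6.0)
```

Only the last event is above the threshold in log units, so only that value is replaced.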

 

pipeline_thresholding

 

 

 

 

 

 

 

 

Quality Controls

 

 

Step

 

 

Description

 

 

Dynamic Range Downsampling

 

Dynamic Range Downsampling allows users to remove events that are equal to, or higher than, the defined upper limit, and/or to remove events that are equal to, or lower than, the defined lower limit.

 

Downsampling is performed parameter-by-parameter among the selected input parameters, on the population selected in the main pipeline body. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Note: The Dynamic Range Downsampling method is used by the FlowAI pre-defined step to perform the "Dynamic Range check". When the step is run independently from the FlowAI algorithm, the Dynamic Range Downsampling step should be run on Linearly-scaled non-compensated data (please see the FlowAI pre-defined step section for further details on how to use Linearly-scaled non-compensated data as input data).

 

The following options are available:

Use PnR for Upper Limit: When checked, the range (i.e. the $PnR keyword) of the selected parameter(s) will be used as the upper limit. When unchecked, an Upper Limit field becomes active and allows a user-defined upper limit to be entered.

Use Z-Score for Lower Limit: When checked, the Lower Limit will be determined using the Z-Score method published in the original FlowAI paper (please see the FlowAI pre-defined step for more details and the bibliographic reference). When unchecked, a Lower Limit field becomes active and allows a user-defined lower limit to be entered.

 

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value in that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.
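The removal rule and the mask parameter can be sketched for a single parameter as follows. Both limits are user-defined in this sketch (the Z-Score method is not reproduced), and the function name and values are hypothetical.

```python
def dynamic_range_mask(values, upper_limit=None, lower_limit=None):
    """Return 1 for events that are kept and 0 for events that are removed."""
    mask = []
    for v in values:
        too_high = upper_limit is not None and v >= upper_limit
        too_low = lower_limit is not None and v <= lower_limit
        mask.append(0 if (too_high or too_low) else 1)
    return mask


# Hypothetical linear values; 262144 stands in for the $PnR range of the parameter.
values = [-120.0, 35.0, 900.0, 262144.0]
mask = dynamic_range_mask(values, upper_limit=262144.0, lower_limit=-100.0)

# Without a mask parameter, only the kept events would remain available.
kept = [v for v, m in zip(values, mask) if m == 1]
```

Events equal to a limit are removed as well, matching the "equal to, or higher than" and "equal to, or lower than" wording above.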

 

 

pipeline_dynamicrange

 

 

Flow Rate Check Downsampling

 

With the Flow Rate Check Downsampling step, the steadiness of the flow (i.e. the number of events acquired per unit of time) can be checked. The flow rate is reconstructed using the Time parameter and the $TIMESTEP keyword from the data file (which is contained in all standard FCS files with version 3.0 or higher). If the $TIMESTEP keyword is not available, a default timestep of 1/10 second will be used to recreate the flow rate. An anomaly detection algorithm (built upon the generalized Extreme Studentized Deviate (ESD) test) detects and removes the data acquired during flow rate surges and shifts from the median value.
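The reconstruction of the flow rate can be sketched as follows. The sketch only counts events per time bin and does not reproduce the ESD-based anomaly detection. The 1-second bin width, the function name, and the Time values are assumptions made for the example.

```python
from collections import Counter


def events_per_second(time_values, timestep=0.1):
    """Count the events falling in each 1-second bin.

    time_values are Time parameter ticks; timestep is the duration of one
    tick in seconds (the $TIMESTEP keyword, or 0.1 s when it is missing).
    """
    counts = Counter(int(t * timestep) for t in time_values)
    if not counts:
        return []
    last_second = max(counts)
    return [counts.get(second, 0) for second in range(last_second + 1)]


# Hypothetical Time values for eleven events.
time_ticks = [0, 1, 2, 3, 5, 10, 12, 25, 26, 27, 28]
flow_rate = events_per_second(time_ticks, timestep=0.1)
```

A steady acquisition gives similar counts in every bin; bins whose counts deviate strongly from the median are the candidates that an anomaly detection test would flag.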

 

The algorithm is run parameter-by-parameter on the population selected in the main pipeline body.

 

The Flow Rate Check Downsampling algorithm integrated in FCS Express is based on the Flow Rate Check algorithm reported in the original FlowAI paper (G. Monaco et al, flowAI: automatic and interactive anomaly discerning tools for flow cytometry data, Bioinformatics, Volume 32, Issue 16, 2016) and is used by the FlowAI pre-defined step to perform the Flow Rate Check.

 

The algorithm allows two inputs from the user:

Alpha: The level of statistical significance used to accept anomalies detected by the ESD method. The default value is 0.01. The highest accepted value is 0.1.

Max % of values that may be flagged as outliers: The highest percentage of events allowed to be flagged as outliers. The highest accepted value is 49.99%.

 

A new parameter, called by default "Downsampling Mask", can also be created by selecting the Create mask parameter radio button in the Downsampling Action section of the dialog.

If a mask parameter is created, sampled events will be assigned with a "1" value in that parameter while unsampled events will be assigned with a "0" value in that parameter. When no mask parameter is created, only the downsampled events will be available as result of the transformation.

 

The name of the downsampling mask parameter can be customized in the Downsampling mask name field.

 

pipeline_flowrate

 

Signal Acquisition Downsampling

 

Since the signal intensity of a given population (i.e. the median and variance of the intensity) should be constant during acquisition in the vast majority of flow cytometry experiments (kinetics experiments are an exception to the rule), changes in the median and/or in the variance of the signal usually indicate issues with fluidics or other problems during acquisition. Parameters can be selected individually by clicking on the boxes or by using the right click menu. The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

Note: The Signal Acquisition Downsampling method is used by the FlowAI pre-defined step to perform the "Signal Acquisition check". When the step is run independently from the FlowAI algorithm, the Signal Acquisition Downsampling step should be run on Linearly-scaled non-compensated data (please see the FlowAI pre-defined step section for further details on how to use Linearly-scaled non-compensated data as input data).

 

When using Signal Acquisition Downsampling, the median and the variance of the signal of equally-sized bins of events acquired sequentially are calculated parameter-by-parameter on the population selected in the main pipeline body. The number of sequential events in each bin is defined by the variables entered by the user (see below).

 

The Signal Acquisition Downsampling algorithm detects shifts in the median and/or in the variance of the fluorescence intensity between bins in each of the input parameters by using the Binary Segmentation algorithm of the changepoint R package.

 

The algorithm allows users to customize the following options:

Penalty Value: The value of the penalty for the changepoint detection algorithm. The higher the penalty value, the less strict the detection of anomalies. The default is 500.

Number of Events in Each Bin: The value defines the size of each bin, i.e. the number of sequential events that each bin should contain.

Max Number of Change Points: The maximum number of changepoints (i.e. changes in the median and/or in the variance of the intensity) that can be detected for each channel. The default is 3.

Min Number of Contiguous Bins: The minimum segment length (i.e. the number of bins between changes). The default is the minimum allowed by theory, i.e. 2.
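The binning step that precedes changepoint detection can be sketched as follows for a single parameter. The sketch only computes the per-bin median and variance and does not reproduce the Binary Segmentation algorithm. Dropping an incomplete last bin, using the population variance, and the function name are assumptions made for the example.

```python
import statistics


def bin_statistics(values, events_per_bin):
    """Median and variance of consecutive, equally sized bins of events.

    Assumption: a trailing bin with fewer than events_per_bin events is dropped.
    """
    stats = []
    for start in range(0, len(values) - events_per_bin + 1, events_per_bin):
        chunk = values[start:start + events_per_bin]
        stats.append((statistics.median(chunk), statistics.pvariance(chunk)))
    return stats


# Hypothetical signal, in acquisition order, with a shift halfway through.
signal = [10, 11, 9, 10, 10, 12, 30, 31, 29, 30, 32, 28]
per_bin = bin_statistics(signal, events_per_bin=3)
medians = [median for median, _ in per_bin]
```

In this example the bin medians jump between the second and third bin, which is the kind of shift a changepoint detection algorithm is designed to locate.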

 

pipeline_signalacquisition

 

 

 

 

 

Miscellaneous

 

 

Step

 

 

Description

 

 

Folder

 

The Folder element allows users to add a folder within the pipeline to better organize and group steps in the pipeline.

The Pipeline Folder has a check-box that allows users to easily activate/deactivate all the steps contained in the Pipeline Folder.

 

To rename a Pipeline Folder, select the folder by left clicking on it and then do one of the following:

Click on the Rename button in the Transformation dialog;

Press F2 on the keyboard;

Right click on the folder and select Rename.

 

Pipeline_folder

 

 

Merge to Spectra

 

The Merge to Spectra step allows users to create a virtual spectrum parameter by combining multiple input detectors, generally from an overdetermined system. The derived spectrum parameter can then be displayed using a Spectrum Plot and results used for the Unmixing pipeline step. Please see our chapter on Virtual Spectrum for more information. Parameters can be selected individually by clicking on the boxes or by using the right click menu.  The right click menu gives the user the option to Select All, Deselect All, or Invert Selection.

 

The virtual spectrum parameter will be named with the text defined in the New Spectral Parameter Name field. In the example below, the new parameter has been named Full Spectrum.

 

The Merged Spectra details can also be saved as an AllExtraKeywords.txt file and then be used to automatically recreate the spectrum parameter without using any pipeline. Please refer to the Virtual Spectrum chapter for more details.

pipeline_mergetospectra

 

Parameter Removal

 

The Parameter Removal step allows users to remove parameters from the parameter list.

 

Parameter Removal is particularly useful when one or more parameters are not needed in the analysis, or when a data file to be used in a template was acquired with more parameters than the original data file. In these cases, the extra parameters can be removed to shorten the parameter list or to make the data file adhere to the original template design.

 

pipeline_parameter_removal

 

 

Unmixing

 

The Unmixing step allows a user to perform spectral unmixing on the input spectral parameter using an unmixing matrix imported by the user.

 

The spectrum parameter may come from a data file acquired on a compatible spectral instrument or from a virtual spectrum parameter created via the Merge to Spectra step.

The unmixing matrix can also be one that was created in FCS Express using the Unmixing wizard.

 

In the Options section, the user can define:

Which algorithm to use to unmix the input parameter.

Which suffix to add to the unmixed parameters.

 

pipeline_unmixing

 

 

Virtual Bandpass

 

The Virtual Bandpass step allows users to combine multiple single detectors, generally from an overdetermined system (e.g. a spectral instrument) to recreate a virtual bandpass filter including a particular range of detectors.

 

Compatible input parameters for the Virtual Bandpass step are spectral parameters that are derived from a compatible spectral data file or from a Merge to Spectra pipeline step. If multiple spectral parameters are selected, the virtual bandpass filter will be applied to each of them independently.

 

Once the input parameters are defined, the user can specify which detectors to combine in the Spectral Detectors Range field. Detectors may be listed as comma separated values (e.g. 1,2,3,4,5) or with dash (e.g. 1-5 is equal to 1,2,3,4,5). Detector numbers are associated with the order in which they appear in a spectral plot. For instance, if your spectral plot displays FITC-A, PE-A, PeCy7-A, APC-A, and APC-Cy7-A, the detector numbers will be 1. FITC-A, 2. PE-A, 3. PeCy7-A, 4. APC-A, and 5. APC-Cy7-A (example below).

 

VirtualSpectrumPipeline

 

The calculated values for each Virtual Bandpass Filter step will be saved into a new parameter. The new parameter is defined by the user in the New parameter base name field (e.g. "UV-A Virtual bandpass").

 

Different summary statistics (i.e. Maximum, Minimum, Average, Standard Deviation) can be selected for the calculation of the new values.
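The detector range syntax and the summary statistics can be sketched as follows for a single event. Mixing dashes and commas in one range (e.g. "1-3,5"), the use of the population Standard Deviation, and the function names are assumptions made for the example.

```python
import statistics


def parse_detector_range(text):
    """Turn '1,2,3' or '1-5' into a list of 1-based detector numbers."""
    detectors = []
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(p) for p in part.split("-"))
            detectors.extend(range(first, last + 1))
        elif part:
            detectors.append(int(part))
    return detectors


# Summary statistics named as in the text above.
STATISTICS = {
    "Maximum": max,
    "Minimum": min,
    "Average": statistics.fmean,
    "Standard Deviation": statistics.pstdev,  # assumption: population SD
}


def virtual_bandpass(event_spectrum, detectors, stat="Average"):
    """Combine the selected detectors of one event into a single value."""
    selected = [event_spectrum[d - 1] for d in detectors]
    return STATISTICS[stat](selected)


# Hypothetical intensities of one event across five detectors.
spectrum = [120.0, 80.0, 40.0, 20.0, 10.0]
detectors = parse_detector_range("1-3")
average = virtual_bandpass(spectrum, detectors, "Average")
maximum = virtual_bandpass(spectrum, detectors, "Maximum")
```

Applying the same calculation to every event yields the per-event values stored in the new parameter.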

 

Two suffixes can be automatically added to the new parameter name via the following check boxes:

Add stats name to new parameter base name. When this check box is checked, the name of the statistics selected from the Chose stats to calculate list will be added as a suffix to the name defined in the New parameter base name field. If multiple statistics are selected, the name of the statistics will always be added as a suffix.

Add spectra name to new parameter base name. When this check box is checked, the name of the parameter selected from the Parameter Options list will be added as a suffix to the name defined in the New parameter base name field. If multiple input parameters are selected, the name of input parameter will always be added as a suffix.

 

 

pipeline_virtualbandpass