Clustering

Adaptive Binning values
Adaptive binning takes a set of events, creates a histogram on every parameter being clustered for those events, and then examines whether any of those histograms can usefully be divided. The following parameters control how this histogram examination is performed as events are divided into successively smaller bins.

Minimum Separation Channel:

Clustering will not divide any population below this fraction of the total scale of a parameter. Thus, a value of "0.25" means that no subset is divided within the first decade of a 4-decade log scale. Setting this value too high prevents the platform from resolving populations near the low end of the scale; setting it too low may create "biologically irrelevant" populations.

Range of values: 0 - 1
Default value: 0.03 (roughly, 1/8th of a decade in a 4-decade scale).
Other useful values: 0.06, 0.125, 0.25
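The arithmetic behind this setting can be sketched in Python. This is a minimal illustration, assuming that a split is simply disallowed below `fraction × resolution` channels; the function names are hypothetical and are not part of FlowJo.

```python
import math


def min_split_channel(fraction, resolution=256):
    """Lowest channel at which a division may be placed (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # Assumption: splits below fraction * resolution channels are not allowed.
    return math.ceil(fraction * resolution)


def fraction_in_decades(fraction, decades=4):
    """On a log scale spanning `decades` decades, convert a scale fraction to decades."""
    return fraction * decades
```

For example, `min_split_channel(0.25)` returns 64 on a 256-channel histogram, which is exactly one decade of a 4-decade scale, and `fraction_in_decades(0.03)` gives about 0.12 decades, close to the 1/8 decade quoted for the default.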

Histogram Resolution:

The number of channels used for histogramming. Channels are equally spaced for linear parameters and log-spaced for log parameters, exactly as when creating displays. Setting this value too high can cause subsets to be "orphaned" when empty space is "shaved" off histograms during adaptive binning, which can lead to clusters that cannot be joined. Larger values also make the clustering algorithm take longer. Setting this value too low makes it impossible to distinguish clusters.

Range of values: no fixed range (however, fewer than 16 is probably useless, as is more than 1024)
Default value: 256.
Other useful values: 128, 64.
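The difference between equally spaced and log-spaced channels can be sketched as below. The helper names are hypothetical, and this only illustrates how a given resolution partitions a scale; it is not FlowJo's internal histogramming code.

```python
import bisect
import math


def channel_edges(lo, hi, resolution, log=False):
    """Return resolution + 1 channel boundaries spanning [lo, hi]."""
    if resolution < 1:
        raise ValueError("resolution must be at least 1")
    if log:
        if lo <= 0:
            raise ValueError("log-spaced channels need lo > 0")
        a, b = math.log10(lo), math.log10(hi)
        # Equal steps in log10 space, i.e. log-spaced channels.
        return [10 ** (a + (b - a) * i / resolution) for i in range(resolution + 1)]
    # Equal steps in linear space.
    return [lo + (hi - lo) * i / resolution for i in range(resolution + 1)]


def histogram(values, edges):
    """Count values per channel; values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = bisect.bisect_right(edges, v) - 1
            counts[min(i, len(counts) - 1)] += 1  # the top edge falls in the last channel
    return counts
```

With `channel_edges(1, 10000, 4, log=True)` each of the four channels covers one decade; raising the resolution to 256 makes each channel 1/64 of a decade.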

Minimum Shave Fraction:

During adaptive binning, one major criterion for dividing a distribution into bins is whether there is "empty" space at one end of the histogram. Such empty space is "shaved off" into a bin separate from the data. Because bins that are not physically adjacent can never be rejoined into a cluster, shaving too aggressively can leave event-containing bins that cannot be rejoined into real clusters. The shaving step is one of the most critical steps in this clustering algorithm.

This value specifies how much of a distribution in a given parameter must be devoid of events before the platform considers "shaving" it off. Thus, a value of "0.1" means that 10% of the distribution's width (either upper or lower end) must be completely devoid of events before shaving is considered. A value that is too small will lead to too many orphaned clusters. A value that is too large will not allow sufficient resolution to segregate otherwise close clusters.

Range of values: 0 - 1
Default value: 0.1.
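A minimal sketch of the shave-fraction test, assuming the "distribution's width" is the number of channels in the histogram of the current bin. The function names are hypothetical, and the separate minimum-channel rule is not included here.

```python
def empty_run_lengths(counts):
    """Return (empty channels at the low end, empty channels at the high end)."""
    n = len(counts)
    low = 0
    while low < n and counts[low] == 0:
        low += 1
    if low == n:  # histogram is entirely empty
        return n, n
    high = 0
    while counts[n - 1 - high] == 0:
        high += 1
    return low, high


def shave_candidates(counts, min_shave_fraction=0.1):
    """Report which ends hold enough empty space to be considered for shaving."""
    n = len(counts)
    low, high = empty_run_lengths(counts)
    return {
        "low": low >= min_shave_fraction * n,
        "high": high >= min_shave_fraction * n,
    }
```

For a 100-channel histogram with 30 empty channels at the low end and 12 at the high end, both ends qualify at the default of 0.1, but only the low end qualifies at 0.2.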

Minimum Shave Channels:

See the discussion of "Minimum Shave Fraction" above. In addition to that criterion, the platform will never shave off empty space that spans fewer than a fixed number of channels, equal to this value times the histogram resolution. In other words, this value defines the smallest empty space that can be created during binning.

Range of values: 0 - 1
Default value: 0.063 (approximately 1/4th decade on a log scale)
Other useful values: smaller values, possibly as low as 0.01.
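How the two shaving criteria combine can be sketched as below. This assumes both conditions must hold at once, as the text describes; the function names and the `bin_width_channels` argument are hypothetical.

```python
def min_shave_channels(value=0.063, resolution=256):
    """Smallest empty run, in channels, that may ever be shaved off."""
    return value * resolution


def may_shave(empty_channels, bin_width_channels, min_shave_fraction=0.1,
              min_shave_value=0.063, resolution=256):
    """True only if the empty run satisfies both shaving criteria."""
    # Criterion 1: the empty run covers enough of the bin's width.
    enough_fraction = empty_channels >= min_shave_fraction * bin_width_channels
    # Criterion 2: the empty run spans at least the absolute minimum of channels.
    enough_channels = empty_channels >= min_shave_channels(min_shave_value, resolution)
    return enough_fraction and enough_channels
```

At the defaults, 0.063 × 256 ≈ 16.1 channels (about 1/4 decade on a 4-decade scale), so a 16-channel gap is never shaved, while a 17-channel gap is shaved in a 100-channel-wide bin but not in a 200-channel-wide one.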

Maximum Valley Height Ratio:

During adaptive binning, the other criterion that is always checked when dividing a distribution is whether the distribution is bimodal. This value determines what counts as bimodal: a distribution is bimodal if a valley exists between two peaks, and that valley must be no higher than this value times the height of the lower of the two peaks. (For example, a value of 0.5 means that the valley must be no higher than half the height of the lower peak.) If no such valley exists anywhere in the distribution, the distribution is not considered bimodal. A value that is too small will prevent the platform from finding useful peaks in a distribution; a value that is too high will divide events seemingly at random.

Range of values: 0 - 1
Default value: 0.5
Other useful values: 0.75
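The bimodality rule can be sketched as a direct check of every interior channel against the tallest channel on each side of it. This is a minimal illustration on an unsmoothed histogram with a hypothetical function name; FlowJo's actual search may differ.

```python
def find_valley(counts, max_valley_ratio=0.5):
    """Return the index of the deepest qualifying valley, or None if not bimodal."""
    best, best_score = None, None
    for i in range(1, len(counts) - 1):
        left_peak = max(counts[:i])        # tallest channel left of i
        right_peak = max(counts[i + 1:])   # tallest channel right of i
        lower_peak = min(left_peak, right_peak)
        if lower_peak == 0:
            continue  # no peak on one side, so channel i cannot be a valley
        if counts[i] <= max_valley_ratio * lower_peak:
            score = counts[i] / lower_peak  # smaller means a deeper valley
            if best_score is None or score < best_score:
                best, best_score = i, score
    return best
```

This brute-force scan is quadratic in the number of channels, which is acceptable at a resolution of 256; a prefix/suffix maximum would make it linear.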

Separation: Valley Weighting

Separation: Even Division Weighting

If more than one parameter is bimodal for a given set of events, these two values determine which parameter better separates the events. Because adaptive binning is iterative, it probably does not matter which parameter is divided first; the other will be divided shortly afterward. Nonetheless, these two values set the relative weighting used to compare two such divisions. A larger valley weighting places more emphasis on how deep the valley between the two peaks is. A larger even division weighting places more emphasis on divisions that split the events more evenly by number.

Range of values: 0 - infinity
Default values: 1 (valley weighting) and 10 (even division weighting)
Useful values: 10 and 1
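The manual does not give the formula that combines the two weightings, so the sketch below uses a hypothetical linear score: valley depth (1 when the valley reaches zero) and evenness (1 for a 50/50 split) are each scaled to [0, 1] and then weighted. All names, including the example parameters `FL1` and `FL2`, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CandidateSplit:
    parameter: str
    valley_height: float      # histogram height at the valley
    lower_peak_height: float  # height of the lower of the two peaks
    events_below: int         # events on the low side of the valley
    events_above: int         # events on the high side of the valley


def split_score(c, valley_weight=1.0, even_weight=10.0):
    """Hypothetical weighted score; higher means a better division."""
    depth = 1.0 - c.valley_height / c.lower_peak_height
    total = c.events_below + c.events_above
    evenness = 2.0 * min(c.events_below, c.events_above) / total
    return valley_weight * depth + even_weight * evenness


def best_split(candidates, valley_weight=1.0, even_weight=10.0):
    """Pick the candidate division with the highest weighted score."""
    return max(candidates, key=lambda c: split_score(c, valley_weight, even_weight))
```

With a deep but lopsided split on `FL1` and a shallow but even split on `FL2`, the default weights (1 and 10) favor `FL2`, while swapping them to 10 and 1 favors `FL1`.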

Allow division of uniform clusters
If set, adaptive binning is instructed to divide uniform clusters of events. A cluster is treated as uniform when no shaving can occur and no "valley" between peaks can be found in any parameter. If this option is selected, Cluster Joining is required.

Division percentile
If the platform decides to divide a uniform cluster, then it will do so at this percentile of the distribution. A value of 0.5 means to divide the distribution evenly. A value of 0.8 means to divide the cluster at either the 20th or the 80th percentile, whichever gives a larger division in terms of area. Values further from 0.5 will tend to create more clusters near edges of event densities.

Range of values: 0 - 1
Default value: 0.8
Useful values: 0.5, 0.9
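The two candidate cut points for a uniform division can be computed as below, using linear interpolation between sorted event values (an assumption; FlowJo's percentile method is not documented here). Because the rule for choosing between the two candidates is described only in terms of "area", the sketch returns both and leaves the choice open. The function names are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def uniform_split_candidates(values, division_percentile=0.8):
    """Return the candidate cut values at the (1 - p) and p percentiles."""
    if not 0.0 <= division_percentile <= 1.0:
        raise ValueError("division_percentile must lie in [0, 1]")
    p = max(division_percentile, 1.0 - division_percentile)
    xs = sorted(values)
    return quantile(xs, 1.0 - p), quantile(xs, p)
```

A value of 0.5 yields the median twice, i.e. a single even split; 0.8 and 0.2 yield the same pair of candidates.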

Minimum # of events to divide
If a uniform cluster contains fewer than this many events, it is not divided. In general, if uniform division is allowed, no cluster will end up with more than this number of events. This number therefore effectively sets a lower bound on the number of clusters: a file with N events yields at least N divided by this value (rounded up) clusters before joining. A small value will therefore create many more clusters and require correspondingly more computation time during joining.

Range of values: 1 - infinity
Default value: 100
Useful values: about 0.1 to 1.0% of the number of events in the file
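The lower bound and the suggested range above reduce to simple integer arithmetic, sketched here with hypothetical function names.

```python
def min_cluster_count(total_events, min_events_to_divide=100):
    """Lower bound on the number of clusters produced before joining."""
    # No cluster may hold more than M events, so N events need at least
    # ceil(N / M) clusters; integer arithmetic avoids float rounding.
    return -(-total_events // min_events_to_divide)


def suggested_min_events(total_events):
    """The suggested 0.1% to 1.0% of total events, as (low, high)."""
    return max(1, total_events // 1000), max(1, total_events // 100)
```

For a file of 100,000 events, the suggested range is 100 to 1,000 events, and the default of 100 implies at least 1,000 clusters to be joined.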

Do simple Peak Find Separation
If selected, the platform uses a more sophisticated peak-finding algorithm to find populations (more sophisticated, that is, than the valley search performed by default). This algorithm may allow "shoulder" clusters to be identified and separated. It works by scanning outward on either side of the mode of a histogram and computing the local slope of the histogram as it goes. By comparing that slope against the thresholds set by the following parameters, it determines whether a peak has been found and, if so, what its extent is.

Chan Width of Slope Function

The number of channels over which a running slope is calculated. A large value, in effect, looks for a peak in a heavily smoothed histogram. A very small value will undermine the usefulness of the peak find.

Range of values: 1 - infinity
Default value: 5
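A running slope over a window of channels can be sketched as a forward difference divided by the window width. The exact slope formula FlowJo uses is not stated, so this choice and the function name are assumptions.

```python
def running_slope(counts, width=5):
    """Slope per channel over a window of `width` channels.

    slope[i] = (counts[i + width] - counts[i]) / width for every i where the
    window fits inside the histogram.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return [(counts[i + width] - counts[i]) / width
            for i in range(len(counts) - width)]
```

A wider window averages over more channels, so channel-to-channel noise largely cancels; with `width=1` every fluctuation shows up as a sign change in the slope.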

Minimum Peak Height

A peak must contain at least this many cells in a single channel before it is considered a true "peak". Large values will tend to prevent the algorithm from finding any peak; small values will tend to identify too many peaks.

Range of values: 1 - infinity
Default value: 10

Trigger Slope: Mode Ratio

A peak has been found when local slope (per channel) is more than this factor times the mode value.

Range of values: 0 - 1
Default value: 0.1
Useful values: 0.01 - 0.5

Fire slope Ratio

The end of the peak is identified when the local slope falls below this factor times the maximum slope that has occurred since the mode. For example, a value of 0.1 means that the farthest reach of the peak is where the slope is only 10% (or less) of the maximum slope between that point and the mode. Larger values will find more subtle shoulders in a distribution, but are more prone to dividing the distribution at random. A value of 0 requires a true "valley", i.e., the distribution must start rising again.

Range of values: 0 - 1
Default value: 0.1
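The interplay of the four peak-find parameters can be sketched for the right-hand side of the mode. This is a minimal illustration under stated assumptions: the slope is a forward difference over the window, the peak's end is reported as the start channel of the window where the fire condition is met, and the function name is hypothetical.

```python
def peak_end_right(counts, width=5, min_peak_height=10,
                   trigger_ratio=0.1, fire_ratio=0.1):
    """Scan right of the mode; return the channel where the peak ends, or None."""
    mode = max(range(len(counts)), key=lambda i: counts[i])
    mode_height = counts[mode]
    if mode_height < min_peak_height:
        return None  # too few events in the tallest channel to be a peak
    triggered = False
    max_drop = 0.0
    for i in range(mode, len(counts) - width):
        # Downward slope per channel over the window [i, i + width].
        drop = (counts[i] - counts[i + width]) / width
        max_drop = max(max_drop, drop)
        if not triggered:
            # Peak is "found" once the fall-off exceeds trigger_ratio * mode height.
            triggered = drop > trigger_ratio * mode_height
        elif drop < fire_ratio * max_drop:
            # Peak ends once the slope falls below fire_ratio * the steepest slope so far.
            return i
    return None
```

The left side can be handled by running the same function on `counts[::-1]` and mapping the result back with `len(counts) - 1 - i`. Setting `fire_ratio=0.0` reproduces the "true valley" case: the scan stops only where the histogram starts rising again.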

Joining Criteria:
If joining is initiated, the algorithm attempts to join all physically adjacent bins into clusters. Bins that are not physically adjacent (tangent) are never considered for joining; hence, the "shaving" step above must not be too aggressive.
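Tangency can be sketched by modelling each bin as an axis-aligned box of channel intervals, one per clustered parameter. Under that assumption, two bins are tangent when their boundaries meet along exactly one parameter and their extents overlap along all the others; bins that meet only at an edge or corner do not count. The names are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # (low boundary, high boundary) in channels


def are_tangent(a: Sequence[Interval], b: Sequence[Interval]) -> bool:
    """True if two axis-aligned bins share a face."""
    if len(a) != len(b):
        raise ValueError("bins must span the same parameters")
    touching = 0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        if ahi == blo or bhi == alo:
            touching += 1  # boundaries meet along this parameter
        elif min(ahi, bhi) <= max(alo, blo):
            return False   # a gap separates the bins along this parameter
        # otherwise the extents overlap along this parameter
    return touching == 1
```

A bin separated from its neighbours by a shaved-off empty bin fails this test against all of them, which is why overly aggressive shaving leaves orphaned clusters.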

Minimum
Joining ceases when this many clusters remain. Since the order in which bins are joined is not well-defined, it is unlikely that this value should ever be changed from the default of 1.

Max InterClus Dist x
For two clusters to be joined, they must not only be tangent, but the distribution of each cluster must also not be significantly changed by the join. Currently, this is tested by checking that, for every parameter involved in the clustering, the distance between the centers of the two clusters is no more than this value times the width of each cluster. Larger values will therefore tend to join clusters that are more distinct; smaller values will tend to keep clusters apart.

Range of values: 0 - infinity.
Default value: 2
Useful values: 1 - 3.
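The distance test can be sketched as follows, assuming a cluster's "center" is the mean of its events and its "width" is the population standard deviation in each parameter. The manual defines neither, so both are assumptions, as is the function name. The tangency requirement is not checked here.

```python
from statistics import fmean, pstdev


def may_join(cluster_a, cluster_b, max_dist_factor=2.0):
    """Each cluster is a non-empty list of events; each event is a sequence of parameter values."""
    n_params = len(cluster_a[0])
    for p in range(n_params):
        a = [event[p] for event in cluster_a]
        b = [event[p] for event in cluster_b]
        dist = abs(fmean(a) - fmean(b))  # distance between centers along parameter p
        # The distance must not exceed max_dist_factor times EACH cluster's width.
        if dist > max_dist_factor * pstdev(a) or dist > max_dist_factor * pstdev(b):
            return False
    return True
```

Two clusters whose centers are two widths apart in one parameter are joined at the default factor of 2, but kept apart at a factor of 1.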


	FlowJo v9 Manual
©FlowJo, LLC 2006 - 2017 | ©Stanford University 1995 - 1996