Class SimpleKMeans

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler

public class SimpleKMeans extends RandomizableClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler
Clusters data using the k-means algorithm.
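
For orientation, the k-means procedure can be sketched in plain Java as below. This is a minimal illustration and not the code of this class: the class name KMeansSketch and its methods are hypothetical, it assumes dense numeric data, plain squared Euclidean distance, random selection of k distinct starting points, and k no larger than the number of points. It ignores instance weights, nominal attributes and missing values.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative k-means sketch (hypothetical helper, not part of Weka). */
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the point (ties go to the lower index). */
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means and returns the cluster index assigned to each point. */
    public static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        Random random = new Random(seed);
        int dim = points[0].length;

        // Pick k distinct points as initial centroids (partial Fisher-Yates shuffle).
        int[] order = new int[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(order.length - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignments = new int[points.length];
        Arrays.fill(assignments, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignments[i]) {
                    assignments[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: recompute each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignments[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignments[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0}, {2.0}, {3.0}, {10.0}, {11.0}, {12.0}};
        // Which group receives label 0 depends on the seed.
        System.out.println(Arrays.toString(cluster(points, 2, 100, 10L)));
    }
}
```

The seed and iteration limit in the sketch play the same role as the -S and -I options: a different seed can change the starting centroids, and the loop stops early once no assignment changes.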

Valid options are:

 -N <num>
  number of clusters.
  (default 2).
 
 -V
  Display std. deviations for centroids.
 
 -M
  Don't replace missing values with mean/mode.
 
 -S <num>
  Random number seed.
  (default 10)
 
 -A <classname and options>
  Distance function to be used for instance comparison
  (default weka.core.EuclideanDistance)
 
 -I <num>
  Maximum number of iterations.
 
 -O 
  Preserve order of instances.
 
Version:
$Revision: 10537 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
  • Constructor Details

    • SimpleKMeans

      public SimpleKMeans()
      The default constructor.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer.
      Returns:
      a description of the clusterer suitable for displaying in the explorer/experimenter gui
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the clusterer has not been generated successfully
    • clusterInstance

      public int clusterInstance(Instance instance) throws Exception
      Assigns a given instance to a cluster.
      Specified by:
      clusterInstance in interface Clusterer
      Overrides:
      clusterInstance in class AbstractClusterer
      Parameters:
      instance - the instance to be assigned to a cluster
      Returns:
      the index of the assigned cluster as an integer
      Throws:
      Exception - if the instance could not be assigned to a cluster successfully
    • numberOfClusters

      public int numberOfClusters() throws Exception
      Returns the number of clusters.
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters generated for a training dataset.
      Throws:
      Exception - if number of clusters could not be returned successfully
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClusterer
      Returns:
      an enumeration of all the available options.
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumClusters

      public void setNumClusters(int n) throws Exception
      Sets the number of clusters to generate.
      Specified by:
      setNumClusters in interface NumberOfClustersRequestable
      Parameters:
      n - the number of clusters to generate
      Throws:
      Exception - if number of clusters is negative
    • getNumClusters

      public int getNumClusters()
      Gets the number of clusters to generate.
      Returns:
      the number of clusters to generate
    • maxIterationsTipText

      public String maxIterationsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaxIterations

      public void setMaxIterations(int n) throws Exception
      Sets the maximum number of iterations to be executed.
      Parameters:
      n - the maximum number of iterations
      Throws:
      Exception - if the maximum number of iterations is smaller than 1
    • getMaxIterations

      public int getMaxIterations()
      Gets the maximum number of iterations to be executed.
      Returns:
      the maximum number of iterations
    • displayStdDevsTipText

      public String displayStdDevsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDisplayStdDevs

      public void setDisplayStdDevs(boolean stdD)
      Sets whether standard deviations and nominal counts should be displayed in the clustering output.
      Parameters:
      stdD - true if std. devs and counts should be displayed
    • getDisplayStdDevs

      public boolean getDisplayStdDevs()
      Gets whether standard deviations and nominal counts should be displayed in the clustering output.
      Returns:
      true if std. devs and counts should be displayed
    • dontReplaceMissingValuesTipText

      public String dontReplaceMissingValuesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDontReplaceMissingValues

      public void setDontReplaceMissingValues(boolean r)
      Sets whether missing values are to be left unreplaced.
      Parameters:
      r - true if missing values are not to be replaced
    • getDontReplaceMissingValues

      public boolean getDontReplaceMissingValues()
      Gets whether missing values are to be left unreplaced.
      Returns:
      true if missing values are not to be replaced
    • distanceFunctionTipText

      public String distanceFunctionTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getDistanceFunction

      public DistanceFunction getDistanceFunction()
      Returns the distance function currently in use.
      Returns:
      the distance function
    • setDistanceFunction

      public void setDistanceFunction(DistanceFunction df) throws Exception
      Sets the distance function to use for instance comparison.
      Parameters:
      df - the new distance function to use
      Throws:
      Exception - if instances cannot be processed
    • preserveInstancesOrderTipText

      public String preserveInstancesOrderTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setPreserveInstancesOrder

      public void setPreserveInstancesOrder(boolean r)
      Sets whether order of instances must be preserved
      Parameters:
      r - true if the order of instances must be preserved
    • getPreserveInstancesOrder

      public boolean getPreserveInstancesOrder()
      Gets whether order of instances must be preserved
      Returns:
      true if the order of instances must be preserved
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <num>
        number of clusters.
        (default 2).
       
       -V
        Display std. deviations for centroids.
       
       -M
        Don't replace missing values with mean/mode.
       
       -S <num>
        Random number seed.
        (default 10)
       
       -A <classname and options>
        Distance function to be used for instance comparison
        (default weka.core.EuclideanDistance)
       
       -I <num>
        Maximum number of iterations.
       
       -O
        Preserve order of instances.
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of SimpleKMeans
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClusterer
      Returns:
      an array of strings suitable for passing to setOptions()
    • toString

      public String toString()
      Returns a string describing this clusterer.
      Overrides:
      toString in class Object
      Returns:
      a description of the clusterer as a string
    • getClusterCentroids

      public Instances getClusterCentroids()
      Gets the cluster centroids.
      Returns:
      the cluster centroids
    • getClusterStandardDevs

      public Instances getClusterStandardDevs()
      Gets the standard deviations of the numeric attributes in each cluster
      Returns:
      the standard deviations of the numeric attributes in each cluster
    • getClusterNominalCounts

      public int[][][] getClusterNominalCounts()
      Returns for each cluster the frequency counts for the values of each nominal attribute
      Returns:
      the counts
    • getSquaredError

      public double getSquaredError()
      Gets the squared error for all clusters
      Returns:
      the squared error
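
      As a rough illustration of what a squared error over all clusters measures, the sketch below sums the squared Euclidean distance from each point to its assigned centroid. The class SquaredErrorSketch and its method are hypothetical and assume raw, unnormalized numeric values; the figure this class reports depends on the configured distance function and may differ.

```java
/** Illustrative within-cluster squared error (hypothetical helper, not part of Weka). */
public class SquaredErrorSketch {

    /**
     * Sums, over all points, the squared Euclidean distance between the
     * point and the centroid of the cluster it is assigned to.
     */
    public static double squaredError(double[][] points, double[][] centroids, int[] assignments) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] centroid = centroids[assignments[i]];
            for (int d = 0; d < points[i].length; d++) {
                double diff = points[i][d] - centroid[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0}, {3.0}, {10.0}, {12.0}};
        double[][] centroids = {{2.0}, {11.0}};
        int[] assignments = {0, 0, 1, 1};
        // Each point lies exactly 1.0 from its centroid, so the total is 4.0.
        System.out.println(squaredError(points, centroids, assignments)); // prints 4.0
    }
}
```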
    • getClusterSizes

      public int[] getClusterSizes()
      Gets the number of instances in each cluster
      Returns:
      The number of instances in each cluster
    • getAssignments

      public int[] getAssignments() throws Exception
      Gets the assignments for each instance
      Returns:
      Array of indexes of the centroid assigned to each instance
      Throws:
      Exception - if order of instances wasn't preserved or no assignments were made
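
      An assignment array of this shape (one centroid index per instance) can be tallied into per-cluster sizes. The sketch below uses a hypothetical helper class, ClusterSizesSketch, and assumes every index lies in the range 0 to numClusters - 1.

```java
import java.util.Arrays;

/** Illustrative tally of cluster sizes from an assignment array (hypothetical helper). */
public class ClusterSizesSketch {

    /** Counts how many instances were assigned to each cluster index. */
    public static int[] clusterSizes(int[] assignments, int numClusters) {
        int[] sizes = new int[numClusters];
        for (int cluster : assignments) {
            sizes[cluster]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 1, 1, 0, 1};
        System.out.println(Arrays.toString(clusterSizes(assignments, 2))); // prints [2, 3]
    }
}
```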
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments:

      -t training file [-N number of clusters]