Class sIB

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class sIB extends RandomizableClusterer implements TechnicalInformationHandler
Cluster data using the sequential information bottleneck algorithm.

Note: only a hard clustering scheme is supported. sIB assigns each instance to the cluster that has the minimum cost/distance to that instance. The trade-off parameter beta is set to infinity, so 1/beta is zero.
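
The hard assignment described in the note can be pictured as an argmin over per-cluster costs. The sketch below is illustrative only: the class `HardAssignment`, the method `argMin`, and the cost values are hypothetical, and the cost function that sIB actually evaluates is not reproduced here.

```java
public class HardAssignment {
    /**
     * Returns the index of the smallest cost, i.e. the cluster a hard
     * clustering scheme would pick. Ties go to the lowest index.
     * (Hypothetical helper, not part of the sIB API.)
     */
    public static int argMin(double[] costs) {
        if (costs == null || costs.length == 0) {
            throw new IllegalArgumentException("costs must be non-empty");
        }
        int best = 0;
        for (int k = 1; k < costs.length; k++) {
            if (costs[k] < costs[best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up costs of one instance with respect to three clusters.
        double[] costs = {0.7, 0.2, 0.5};
        System.out.println("assigned cluster: " + argMin(costs));
    }
}
```

Because only the single cheapest cluster is returned, there are no soft membership probabilities to report, which matches the hard clustering restriction stated in the note.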

For more information, see:

Noam Slonim, Nir Friedman, Naftali Tishby: Unsupervised document classification using sequential information maximization. In: Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, 129-136, 2002.

BibTeX:

 @inproceedings{Slonim2002,
    author = {Noam Slonim and Nir Friedman and Naftali Tishby},
    booktitle = {Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    pages = {129-136},
    title = {Unsupervised document classification using sequential information maximization},
    year = {2002}
 }
 

Valid options are:

 -I <num>
  maximum number of iterations
  (default 100).
 -M <num>
  minimum number of changes in a single iteration
  (default 0).
 -N <num>
  number of clusters.
  (default 2).
 -R <num>
  number of restarts.
  (default 5).
 -U
  set not to normalize the data
  (default true).
 -V
  set to output debug info
  (default false).
 -S <num>
  Random number seed.
  (default 1)
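
As a rough illustration of the option syntax listed above, the following sketch assembles a `String[]` in the `-flag value` form. The class `SibOptions` and its `build` method are hypothetical helpers written for this example; only the flag letters come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class SibOptions {
    /**
     * Builds an option array using the flags documented for sIB.
     * (Hypothetical helper, not part of Weka.)
     */
    public static String[] build(int maxIterations, int minChange, int numClusters,
                                 int numRestarts, int seed,
                                 boolean uFlag, boolean debug) {
        List<String> opts = new ArrayList<>();
        opts.add("-I"); opts.add(Integer.toString(maxIterations));
        opts.add("-M"); opts.add(Integer.toString(minChange));
        opts.add("-N"); opts.add(Integer.toString(numClusters));
        opts.add("-R"); opts.add(Integer.toString(numRestarts));
        opts.add("-S"); opts.add(Integer.toString(seed));
        if (uFlag) {
            opts.add("-U"); // flag without a value
        }
        if (debug) {
            opts.add("-V"); // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = build(100, 0, 3, 5, 1, false, false);
        System.out.println(String.join(" ", opts));
        // With Weka on the classpath, an array like this could be handed to
        // the setOptions(String[]) method documented under Method Details.
    }
}
```

Keeping each flag and its value as separate array elements mirrors the "list of options as an array of strings" that `setOptions` is documented to accept.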
Version:
$Revision: 5538 $
Author:
Noam Slonim, Anna Huang
  • Constructor Details

    • sIB

      public sIB()
  • Method Details

    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates a clusterer.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - the training instances
      Throws:
      Exception - if something goes wrong
    • clusterInstance

      public int clusterInstance(Instance instance) throws Exception
      Clusters a given instance. This is the method defined in the Clusterer interface; it does nothing but return the cluster assigned to the instance.
      Specified by:
      clusterInstance in interface Clusterer
      Overrides:
      clusterInstance in class AbstractClusterer
      Parameters:
      instance - the instance to be assigned to a cluster
      Returns:
      the number of the assigned cluster as an integer
      Throws:
      Exception - if instance could not be clustered successfully
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -I <num>
        maximum number of iterations
        (default 100).
       -M <num>
        minimum number of changes in a single iteration
        (default 0).
       -N <num>
        number of clusters.
        (default 2).
       -R <num>
        number of restarts.
        (default 5).
       -U
        set not to normalize the data
        (default true).
       -V
        set to output debug info
        (default false).
       -S <num>
        Random number seed.
        (default 1)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClusterer
      Returns:
      an enumeration of all the available options.
    • getOptions

      public String[] getOptions()
      Gets the current settings.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClusterer
      Returns:
      an array of strings suitable for passing to setOptions()
    • debugTipText

      public String debugTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebug

      public void setDebug(boolean v)
      Set debug mode - verbose output
      Parameters:
      v - true for verbose output
    • getDebug

      public boolean getDebug()
      Get debug mode
      Returns:
      true if debug mode is set
    • maxIterationsTipText

      public String maxIterationsTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMaxIterations

      public void setMaxIterations(int i)
      Set the max number of iterations
      Parameters:
      i - max number of iterations
    • getMaxIterations

      public int getMaxIterations()
      Get the max number of iterations
      Returns:
      max number of iterations
    • minChangeTipText

      public String minChangeTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMinChange

      public void setMinChange(int m)
      Set the minimum number of changes in a single iteration
      Parameters:
      m - the minimum number of changes
    • getMinChange

      public int getMinChange()
      Get the minimum number of changes in a single iteration
      Returns:
      the minimum number of changes
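
One way to read how the maximum-iterations and minimum-changes settings interact is sketched below. It assumes the loop ends once the iteration limit is reached or an iteration makes no more than the minimum number of changes; the exact comparison used by sIB is not stated on this page, so treat that rule as an assumption. The class `StoppingRule` and the simulated change counts are hypothetical.

```java
import java.util.function.IntUnaryOperator;

public class StoppingRule {
    /**
     * Runs passes until maxIterations is reached or a pass reports no more
     * than minChange reassignments, and returns the number of passes run.
     * changesInPass stands in for one optimisation pass: it maps the
     * 0-based pass index to the number of instances that changed cluster.
     * (Assumed stopping rule, not taken from the sIB source.)
     */
    public static int run(int maxIterations, int minChange,
                          IntUnaryOperator changesInPass) {
        int passes = 0;
        while (passes < maxIterations) {
            int changes = changesInPass.applyAsInt(passes);
            passes++;
            if (changes <= minChange) {
                break; // too few changes: treat the partition as stable
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        // Simulated number of reassignments per pass.
        int[] simulated = {12, 5, 1, 0, 0};
        int passes = run(100, 0, i -> simulated[Math.min(i, simulated.length - 1)]);
        System.out.println("passes executed: " + passes);
    }
}
```

Under this reading, the default minimum of 0 means the loop continues until a pass changes nothing, while a larger value trades some precision for an earlier stop.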
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setNumClusters

      public void setNumClusters(int n)
      Set the number of clusters
      Parameters:
      n - number of clusters
    • getNumClusters

      public int getNumClusters()
      Get the number of clusters
      Returns:
      the number of clusters
    • numberOfClusters

      public int numberOfClusters()
      Get the number of clusters
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters
    • numRestartsTipText

      public String numRestartsTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setNumRestarts

      public void setNumRestarts(int i)
      Set the number of restarts
      Parameters:
      i - number of restarts
    • getNumRestarts

      public int getNumRestarts()
      Get the number of restarts
      Returns:
      number of restarts
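
The role of restarts together with the random seed can be sketched as a best-of-N loop. Everything in this example is an assumption made for illustration: the class `Restarts`, the use of a single seeded `Random` shared across trials, and the convention that a higher score is better. How sIB scores and compares its restarts is not described on this page.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class Restarts {
    /**
     * Runs numRestarts trials driven by one seeded Random and returns the
     * best score seen. 'trial' stands in for a complete clustering run from
     * a random initial partition; higher scores are treated as better.
     * (Hypothetical helper, not part of the sIB API.)
     */
    public static double bestOf(int numRestarts, long seed,
                                ToDoubleFunction<Random> trial) {
        if (numRestarts < 1) {
            throw new IllegalArgumentException("numRestarts must be at least 1");
        }
        Random rng = new Random(seed);
        double best = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < numRestarts; r++) {
            best = Math.max(best, trial.applyAsDouble(rng));
        }
        return best;
    }

    public static void main(String[] args) {
        // A random draw plays the role of a run's quality score.
        double best = bestOf(5, 1L, Random::nextDouble);
        System.out.println("best score over 5 restarts: " + best);
    }
}
```

Fixing the seed makes the sequence of trials reproducible, so two calls with the same arguments return the same result, and adding restarts can only keep or improve the best score found.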
    • notUnifyNormTipText

      public String notUnifyNormTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setNotUnifyNorm

      public void setNotUnifyNorm(boolean b)
      Set whether to normalize instances to unify prior probability before building the clusterer
      Parameters:
      b - true to normalize, otherwise false
    • getNotUnifyNorm

      public boolean getNotUnifyNorm()
      Get whether to normalize instances to unify prior probability before building the clusterer
      Returns:
      true if set to normalize, false otherwise
    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer
      Returns:
      a description of the clusterer suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)