Class SubspaceCluster

All Implemented Interfaces:
Serializable, OptionHandler, Randomizable, RevisionHandler

public class SubspaceCluster extends ClusterGenerator
A data generator that produces data points in hyperrectangular subspace clusters.

Valid options are:

 -h
  Prints this help.
 -o <file>
  The name of the output file, otherwise the generated data is
  printed to stdout.
 -r <name>
  The name of the relation.
 -d
  Whether to print debug information.
 -S
  The seed for random function (default 1)
 -a <num>
  The number of attributes (default 1).
 -c
  Class flag; if set, the cluster is listed in an extra attribute.
 -b <range>
  The indices for boolean attributes.
 -m <range>
  The indices for nominal attributes.
 -P <num>
  The noise rate in percent (default 0.0).
  Can be between 0% and 30%. (Remark: The original 
  algorithm only allows noise up to 10%.)
 -C <cluster-definition>
  A cluster definition of class 'SubspaceClusterDefinition'
  (definition needs to be quoted to be recognized as 
  a single argument).
 
 Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
 
 -A <range>
  Generates randomly distributed instances in the cluster.
 -U <range>
  Generates uniformly distributed instances in the cluster.
 -G <range>
  Generates gaussian distributed instances in the cluster.
 -D <num>,<num>
  The attribute min/max (-A and -U) or mean/stddev (-G) for
  the cluster.
 -N <num>..<num>
  The range of number of instances per cluster (default 1..50).
 -I
  Uses integer instead of continuous values (default continuous).
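
To illustrate how the generator options and a quoted cluster definition fit together, here is a minimal sketch in plain Java that assembles an argument array. The class SubspaceClusterOptionsSketch and its buildOptions helper are hypothetical and not part of Weka. The attribute range "1" assumes 1-based attribute indices, and the mean, standard deviation, and instance range are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: assembles a SubspaceCluster argument array. */
public class SubspaceClusterOptionsSketch {

    /**
     * Builds generator arguments with one Gaussian cluster on a single attribute.
     * Keeping the whole definition in ONE array element plays the role that
     * quoting plays on a command line.
     */
    public static String[] buildOptions(int numAttributes, double noisePercent) {
        List<String> opts = new ArrayList<>();
        opts.add("-a");
        opts.add(Integer.toString(numAttributes));   // number of attributes
        opts.add("-P");
        opts.add(Double.toString(noisePercent));     // noise rate in percent
        opts.add("-c");                              // list the cluster in an extra attribute
        opts.add("-C");
        // Assumed example definition: Gaussian on attribute 1 (assumed 1-based),
        // mean 0.5, stddev 0.1, between 10 and 20 instances.
        opts.add("-G 1 -D 0.5,0.1 -N 10..20");
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Print one element per line to show that the definition stays a single argument.
        for (String opt : buildOptions(1, 5.0)) {
            System.out.println(opt);
        }
    }
}
```

On a shell command line the same definition would need to be wrapped in quotes after -C so that it reaches the generator as one argument.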
Version:
$Revision: 1.5 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
  • Field Details

    • UNIFORM_RANDOM

      public static final int UNIFORM_RANDOM
      cluster type: uniform/random
    • TOTAL_UNIFORM

      public static final int TOTAL_UNIFORM
      cluster type: total uniform
    • GAUSSIAN

      public static final int GAUSSIAN
      cluster type: gaussian
    • TAGS_CLUSTERTYPE

      public static final Tag[] TAGS_CLUSTERTYPE
      the tags for the cluster types
    • CONTINUOUS

      public static final int CONTINUOUS
      cluster subtype: continuous
    • INTEGER

      public static final int INTEGER
      cluster subtype: integer
    • TAGS_CLUSTERSUBTYPE

      public static final Tag[] TAGS_CLUSTERSUBTYPE
      the tags for the cluster subtypes
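
The cluster types and subtypes above determine how a single attribute value is drawn. The following is a rough sketch in plain Java of what uniform/random and Gaussian sampling with an optional integer subtype could look like, using the min/max and mean/stddev parameters described for the -D option. ClusterValueSketch and its methods are hypothetical names; this is not Weka's implementation, the rounding rule for the integer subtype is an assumption, and the total uniform type is not covered.

```java
import java.util.Random;

/** Hypothetical sketch, not Weka's implementation. */
public class ClusterValueSketch {

    /** Uniform/random type: a value drawn uniformly from [min, max). */
    public static double uniformRandom(Random rnd, double min, double max) {
        return min + rnd.nextDouble() * (max - min);
    }

    /** Gaussian type: a value drawn from a normal distribution with the given mean and stddev. */
    public static double gaussian(Random rnd, double mean, double stddev) {
        return mean + rnd.nextGaussian() * stddev;
    }

    /** Integer subtype (assumed rule): round the continuous value to the nearest integer. */
    public static double toIntegerSubtype(double value) {
        return (double) Math.round(value);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // fixed seed so that runs are repeatable
        System.out.println("uniform:  " + uniformRandom(rnd, 0.0, 1.0));
        System.out.println("gaussian: " + gaussian(rnd, 0.5, 0.1));
        System.out.println("integer:  " + toIntegerSubtype(uniformRandom(rnd, 0.0, 10.0)));
    }
}
```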
  • Constructor Details

    • SubspaceCluster

      public SubspaceCluster()
      Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this data generator.
      Returns:
      a description of the data generator suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class ClusterGenerator
      Returns:
      an enumeration of all the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a list of options for this object.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class ClusterGenerator
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
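
To make the cluster-definition syntax concrete, here is a minimal sketch in plain Java that parses a definition string such as "-G 1 -D 0.5,0.1 -N 10..20" into its parts. ClusterDefinitionParseSketch is a hypothetical class, not the parser Weka uses; it splits on whitespace only, handles a single pair of -D values, and applies the documented default of 1..50 for -N.

```java
/** Hypothetical sketch of parsing a cluster definition; not Weka's parser. */
public class ClusterDefinitionParseSketch {
    public final String type;            // "A", "U" or "G"
    public final String attributeRange;  // range string as given, e.g. "1"
    public final double first;           // min (-A, -U) or mean (-G)
    public final double second;          // max (-A, -U) or stddev (-G)
    public final int minInstances;
    public final int maxInstances;
    public final boolean integerValues;

    private ClusterDefinitionParseSketch(String type, String attributeRange,
            double first, double second, int minInstances, int maxInstances,
            boolean integerValues) {
        this.type = type;
        this.attributeRange = attributeRange;
        this.first = first;
        this.second = second;
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
        this.integerValues = integerValues;
    }

    /** Parses one definition; throws IllegalArgumentException on an unsupported option. */
    public static ClusterDefinitionParseSketch parse(String definition) {
        String[] tok = definition.trim().split("\\s+");
        String type = null;
        String range = null;
        double first = Double.NaN;
        double second = Double.NaN;
        int lo = 1;   // documented default for -N is 1..50
        int hi = 50;
        boolean integer = false;
        for (int i = 0; i < tok.length; i++) {
            switch (tok[i]) {
                case "-A":
                case "-U":
                case "-G":
                    type = tok[i].substring(1);
                    range = tok[++i];
                    break;
                case "-D": {
                    String[] p = tok[++i].split(",");
                    first = Double.parseDouble(p[0]);
                    second = Double.parseDouble(p[1]);
                    break;
                }
                case "-N": {
                    String[] p = tok[++i].split("\\.\\.");
                    lo = Integer.parseInt(p[0]);
                    hi = Integer.parseInt(p[1]);
                    break;
                }
                case "-I":
                    integer = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + tok[i]);
            }
        }
        return new ClusterDefinitionParseSketch(type, range, first, second, lo, hi, integer);
    }

    public static void main(String[] args) {
        ClusterDefinitionParseSketch d = parse("-G 1 -D 0.5,0.1 -N 10..20");
        System.out.println(d.type + " on " + d.attributeRange + ", instances "
                + d.minInstances + ".." + d.maxInstances);
    }
}
```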
    • getOptions

      public String[] getOptions()
      Gets the current settings of the data generator.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class ClusterGenerator
      Returns:
      an array of strings suitable for passing to setOptions
      See Also:
      • DataGenerator.removeBlacklist(String[])
    • setNumAttributes

      public void setNumAttributes(int numAttributes)
      Sets the number of attributes the dataset should have.
      Overrides:
      setNumAttributes in class ClusterGenerator
      Parameters:
      numAttributes - the new number of attributes
    • numAttributesTipText

      public String numAttributesTipText()
      Returns the tip text for this property
      Overrides:
      numAttributesTipText in class ClusterGenerator
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getNoiseRate

      public double getNoiseRate()
      Gets the percentage of noise set.
      Returns:
      the percentage of noise set
    • setNoiseRate

      public void setNoiseRate(double newNoiseRate)
      Sets the percentage of noise.
      Parameters:
      newNoiseRate - new percentage of noise
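
The documented range for the noise rate is 0% to 30%. As a rough sketch of how a caller might validate a value before passing it to setNoiseRate, the following plain-Java helper checks that range and converts a percentage into a number of noise instances. NoiseRateSketch and its methods are hypothetical; whether the generator itself rejects or adjusts out-of-range values is not stated here, and computing the noise count as a rounded-down percentage of the cluster instances is an assumption.

```java
/** Hypothetical sketch, not part of Weka: validates a noise rate given in percent. */
public class NoiseRateSketch {
    public static final double MIN_NOISE_PERCENT = 0.0;
    public static final double MAX_NOISE_PERCENT = 30.0;

    /** Returns the rate unchanged if it lies in [0, 30]; otherwise throws. */
    public static double checkNoiseRate(double percent) {
        if (Double.isNaN(percent) || percent < MIN_NOISE_PERCENT || percent > MAX_NOISE_PERCENT) {
            throw new IllegalArgumentException(
                "Noise rate must be between 0 and 30 percent, got " + percent);
        }
        return percent;
    }

    /** Assumed rule: noise instances = percentage of cluster instances, rounded down. */
    public static int noiseCount(int clusterInstances, double percent) {
        return (int) (clusterInstances * checkNoiseRate(percent) / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(noiseCount(200, 10.0));
    }
}
```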
    • noiseRateTipText

      public String noiseRateTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getClusterDefinitions

      public ClusterDefinition[] getClusterDefinitions()
      Returns the currently set clusters.
      Returns:
      the currently set clusters
    • setClusterDefinitions

      public void setClusterDefinitions(ClusterDefinition[] value) throws Exception
      sets the clusters to use
      Parameters:
      value - the clusters to use
      Throws:
      Exception - if clusters are not the correct class
    • clusterDefinitionsTipText

      public String clusterDefinitionsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getSingleModeFlag

      public boolean getSingleModeFlag()
      Gets the single mode flag.
      Specified by:
      getSingleModeFlag in class DataGenerator
      Returns:
      true if the method generateExample can be used.
    • defineDataFormat

      public Instances defineDataFormat() throws Exception
      Initializes the format for the dataset produced.
      Overrides:
      defineDataFormat in class DataGenerator
      Returns:
      the output data format
      Throws:
      Exception - data format could not be defined
      See Also:
      • DataGenerator.defaultRelationName()
    • isBoolean

      public boolean isBoolean(int index)
      Returns true if attribute is boolean
      Parameters:
      index - of the attribute
      Returns:
      true if the attribute is boolean
    • isNominal

      public boolean isNominal(int index)
      Returns true if attribute is nominal
      Parameters:
      index - of the attribute
      Returns:
      true if the attribute is nominal
    • getNumValues

      public int[] getNumValues()
      Returns the array that stores the number of values for each nominal attribute.
      Returns:
      the array that stores the number of values for a nominal attribute
    • generateExample

      public Instance generateExample() throws Exception
      Generate an example of the dataset.
      Specified by:
      generateExample in class DataGenerator
      Returns:
      the instance generated
      Throws:
      Exception - if the format is not defined, or if generating
      examples one by one is not possible because voting is chosen
    • generateExamples

      public Instances generateExamples() throws Exception
      Generate all examples of the dataset.
      Specified by:
      generateExamples in class DataGenerator
      Returns:
      the instances generated
      Throws:
      Exception - if format not defined
    • generateFinished

      public String generateFinished() throws Exception
      Compiles documentation about the data generation after the generation process
      Specified by:
      generateFinished in class DataGenerator
      Returns:
      string with additional information about generated dataset
      Throws:
      Exception - no input structure has been defined
    • generateStart

      public String generateStart()
      Compiles documentation about the data generation before the generation process
      Specified by:
      generateStart in class DataGenerator
      Returns:
      string with additional information
    • getRevision

      public String getRevision()
      Returns the revision string.
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - should contain arguments for the data producer: