Class MetaCost

All Implemented Interfaces:
Serializable, Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class MetaCost extends RandomizableSingleClassifierEnhancer implements TechnicalInformationHandler
This metaclassifier makes its base classifier cost-sensitive using the method described in:

Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999.

This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
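
The minimum-expected-cost relabeling mentioned above can be illustrated without Weka. The following is a minimal sketch, assuming the class probabilities have already been obtained by averaging over the bagged models. The class `MinExpectedCostSketch` and its methods are hypothetical and not part of the Weka API, and the convention that `cost[actual][predicted]` holds the cost of predicting `predicted` when the true class is `actual` is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration; not Weka code.
class MinExpectedCostSketch {

    /** Expected cost of predicting each class, given class probabilities. */
    static double[] expectedCosts(double[] probs, double[][] cost) {
        int n = probs.length;
        double[] expected = new double[n];
        for (int predicted = 0; predicted < n; predicted++) {
            for (int actual = 0; actual < n; actual++) {
                // Assumed layout: cost[actual][predicted]
                expected[predicted] += probs[actual] * cost[actual][predicted];
            }
        }
        return expected;
    }

    /** Index of the class with minimum expected cost (first index wins ties). */
    static int minExpectedCostClass(double[] probs, double[][] cost) {
        double[] expected = expectedCosts(probs, cost);
        int best = 0;
        for (int j = 1; j < expected.length; j++) {
            if (expected[j] < expected[best]) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] cost = { {0, 1}, {5, 0} };
        double[] probs = {0.7, 0.3};
        System.out.println(Arrays.toString(expectedCosts(probs, cost)));
        System.out.println(minExpectedCostClass(probs, cost));
    }
}
```

With these illustrative costs, an instance with probability 0.7 for class 0 is still relabeled as class 1, because wrongly predicting class 0 costs five times as much as wrongly predicting class 1.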

BibTeX:

 @inproceedings{Domingos1999,
    author = {Pedro Domingos},
    booktitle = {Fifth International Conference on Knowledge Discovery and Data Mining},
    pages = {155-164},
    title = {MetaCost: A general method for making classifiers cost-sensitive},
    year = {1999}
 }
 

Valid options are:

 -I <num>
  Number of bagging iterations.
  (default 10)
 -C <cost file name>
  File name of a cost matrix to use. If this is not supplied,
  a cost matrix will be loaded on demand. The name of the
  on-demand file is the relation name of the training data
  plus ".cost", and the path to the on-demand file is
  specified with the -N option.
 -N <directory>
  Name of a directory to search for cost files when loading
  costs on demand (default current directory).
 -cost-matrix <matrix>
  The cost matrix in Matlab single line format.
 -P
  Size of each bag, as a percentage of the
  training set size. (default 100)
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.rules.ZeroR)
 
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated classifier.
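
To make the -cost-matrix option more concrete, here is a minimal sketch of reading a matrix written on a single line. It assumes the format is a bracketed square matrix with rows separated by semicolons and cells separated by whitespace, for example `[0 1; 5 0]`. Both that reading of the format and the class `CostMatrixFormatSketch` are assumptions made for illustration; this is not Weka's own parser.

```java
// Hypothetical illustration; not Weka code.
class CostMatrixFormatSketch {

    /** Parses a square matrix such as "[0 1; 5 0]" into a 2-D array. */
    static double[][] parseSingleLine(String text) {
        String body = text.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Matrix must be enclosed in [ ]");
        }
        body = body.substring(1, body.length() - 1).trim();

        String[] rows = body.split(";");
        double[][] matrix = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            String[] cells = rows[i].trim().split("\\s+");
            if (cells.length != rows.length) {
                throw new IllegalArgumentException("Cost matrix must be square");
            }
            matrix[i] = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                matrix[i][j] = Double.parseDouble(cells[j]);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] m = parseSingleLine("[0 1; 5 0]");
        System.out.println(m.length + " x " + m[0].length);
    }
}
```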

Version:
$Revision: 1.24 $
Author:
Len Trigg (len@reeltwo.com)
  • Field Details

    • MATRIX_ON_DEMAND

      public static final int MATRIX_ON_DEMAND
      load cost matrix on demand
    • MATRIX_SUPPLIED

      public static final int MATRIX_SUPPLIED
      use explicit matrix
    • TAGS_MATRIX_SOURCE

      public static final Tag[] TAGS_MATRIX_SOURCE
      Specify possible sources of the cost matrix
  • Constructor Details

    • MetaCost

      public MetaCost()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableSingleClassifierEnhancer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
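
The options array mixes MetaCost's own options with those of the base classifier, which follow a "--" separator. A minimal sketch of that split is shown here; the class `OptionSplitSketch` and its method are hypothetical helpers written for illustration and are not how Weka itself partitions options.

```java
import java.util.Arrays;

// Hypothetical illustration; not Weka code.
class OptionSplitSketch {

    /**
     * Splits an options array at the first "--".
     * Element 0 holds the meta-classifier options, element 1 the options
     * intended for the base classifier (empty if there is no "--").
     */
    static String[][] splitAtDoubleDash(String[] options) {
        int idx = Arrays.asList(options).indexOf("--");
        if (idx < 0) {
            return new String[][] { options.clone(), new String[0] };
        }
        return new String[][] {
            Arrays.copyOfRange(options, 0, idx),
            Arrays.copyOfRange(options, idx + 1, options.length)
        };
    }

    public static void main(String[] args) {
        String[] options = {
            "-I", "10", "-S", "1",
            "-W", "weka.classifiers.rules.ZeroR",
            "--", "-D"
        };
        String[][] parts = splitAtDoubleDash(options);
        System.out.println("meta: " + Arrays.toString(parts[0]));
        System.out.println("base: " + Arrays.toString(parts[1]));
    }
}
```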
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an array of strings suitable for passing to setOptions
    • costMatrixSourceTipText

      public String costMatrixSourceTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getCostMatrixSource

      public SelectedTag getCostMatrixSource()
      Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.
      Returns:
      the cost matrix source.
    • setCostMatrixSource

      public void setCostMatrixSource(SelectedTag newMethod)
      Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.
      Parameters:
      newMethod - the cost matrix location method.
    • onDemandDirectoryTipText

      public String onDemandDirectoryTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getOnDemandDirectory

      public File getOnDemandDirectory()
      Returns the directory that will be searched for cost files when loading on demand.
      Returns:
      The cost file search directory.
    • setOnDemandDirectory

      public void setOnDemandDirectory(File newDir)
      Sets the directory that will be searched for cost files when loading on demand.
      Parameters:
      newDir - The cost file search directory.
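
As the option descriptions state, the on-demand cost file is named after the relation name of the training data plus ".cost" and is looked up in the on-demand directory. A minimal sketch of that path resolution follows; `OnDemandCostFileSketch`, the directory "costs", and the relation name "credit" are hypothetical and used only for illustration.

```java
import java.io.File;

// Hypothetical illustration; not Weka code.
class OnDemandCostFileSketch {

    /** Builds the expected cost file location: <directory>/<relationName>.cost */
    static File costFileFor(File onDemandDirectory, String relationName) {
        return new File(onDemandDirectory, relationName + ".cost");
    }

    public static void main(String[] args) {
        File costFile = costFileFor(new File("costs"), "credit");
        System.out.println(costFile.getPath());
    }
}
```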
    • bagSizePercentTipText

      public String bagSizePercentTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getBagSizePercent

      public int getBagSizePercent()
      Gets the size of each bag, as a percentage of the training set size.
      Returns:
      the bag size, as a percentage.
    • setBagSizePercent

      public void setBagSizePercent(int newBagSizePercent)
      Sets the size of each bag, as a percentage of the training set size.
      Parameters:
      newBagSizePercent - the bag size, as a percentage.
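
The relationship between the bag size percentage, the training set size, and the random seed can be sketched as follows. This is an illustration under stated assumptions, not the Weka implementation: the class `BagSizeSketch` is hypothetical, the bag size is assumed to be truncated by integer division, and bags are assumed to be drawn with replacement.

```java
import java.util.Random;

// Hypothetical illustration; not Weka code.
class BagSizeSketch {

    /** Number of instances per bag (assumes truncating integer division). */
    static int bagSize(int numInstances, int bagSizePercent) {
        return numInstances * bagSizePercent / 100;
    }

    /** Draws instance indices for one bag, sampling with replacement. */
    static int[] sampleBag(int numInstances, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int[] indices = new int[bagSize(numInstances, bagSizePercent)];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = random.nextInt(numInstances);
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] bag = sampleBag(150, 50, 1L);
        System.out.println("bag holds " + bag.length + " indices");
    }
}
```

Using the same seed reproduces the same bag, which is why a fixed seed makes runs repeatable.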
    • numIterationsTipText

      public String numIterationsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumIterations

      public void setNumIterations(int numIterations)
      Sets the number of bagging iterations.
      Parameters:
      numIterations - the number of iterations to use
    • getNumIterations

      public int getNumIterations()
      Gets the number of bagging iterations.
      Returns:
      the number of bagging iterations
    • costMatrixTipText

      public String costMatrixTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getCostMatrix

      public CostMatrix getCostMatrix()
      Gets the misclassification cost matrix.
      Returns:
      the cost matrix
    • setCostMatrix

      public void setCostMatrix(CostMatrix newCostMatrix)
      Sets the misclassification cost matrix.
      Parameters:
      newCostMatrix - the cost matrix
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class SingleClassifierEnhancer
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Builds the model of the base learner.
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      data - the training data
      Throws:
      Exception - if the classifier could not be built successfully
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Returns the class distribution for a given instance, as estimated by the base classifier built on the relabeled training data.
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance to be classified
      Returns:
      the class distribution for the given instance
      Throws:
      Exception - if instance could not be classified successfully
    • toString

      public String toString()
      Outputs a representation of this classifier.
      Overrides:
      toString in class Object
      Returns:
      a string representation of the classifier
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments: -t training file [-T test file] [-c class index]