Class LatentSemanticAnalysis

All Implemented Interfaces:
Serializable, AttributeEvaluator, AttributeTransformer, CapabilitiesHandler, OptionHandler, RevisionHandler

public class LatentSemanticAnalysis extends UnsupervisedAttributeEvaluator implements AttributeTransformer, OptionHandler
Performs latent semantic analysis and transformation of the data. Use in conjunction with a Ranker search. A low-rank approximation of the full data is found by specifying the number of singular values to use. The dataset may be transformed to give the relation of either the attributes or the instances (default) to the concept space created by the transformation.

Valid options are:

 -N
  Normalize input data.
 -R
  Rank approximation used in LSA. May be actual number of 
  LSA attributes to include (if greater than 1) or a proportion 
  of total singular values to account for (if between 0 and 1). 
  A value less than or equal to zero means use all latent variables.
  (default = 0.95)
 -A
  Maximum number of attributes to include in 
  transformed attribute names. (-1 = include all)
Version:
$Revision: 11821 $
Author:
Amri Napolitano
  • Constructor Details

    • LatentSemanticAnalysis

      public LatentSemanticAnalysis()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this attribute transformer
      Returns:
      a description of the evaluator suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.

      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N
        Normalize input data.
       -R
        Rank approximation used in LSA. May be actual number of 
        LSA attributes to include (if greater than 1) or a proportion 
        of total singular values to account for (if between 0 and 1). 
        A value less than or equal to zero means use all latent variables.
        (default = 0.95)
       -A
        Maximum number of attributes to include in 
        transformed attribute names. (-1 = include all)
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • normalizeTipText

      public String normalizeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNormalize

      public void setNormalize(boolean newNormalize)
      Set whether input data will be normalized.
      Parameters:
      newNormalize - true if input data is to be normalized
    • getNormalize

      public boolean getNormalize()
      Gets whether or not input data is to be normalized
      Returns:
      true if input data is to be normalized
    • rankTipText

      public String rankTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setRank

      public void setRank(double newRank)
      Sets the desired matrix rank (or coverage proportion) for feature-space reduction
      Parameters:
      newRank - the desired rank (or coverage) for feature-space reduction
    • getRank

      public double getRank()
      Gets the desired matrix rank (or coverage proportion) for feature-space reduction
      Returns:
      the rank (or coverage) for feature-space reduction
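The three ranges described for the -R option (zero or less, between 0 and 1, greater than 1) can be illustrated with a small stand-alone sketch. The class `LsaRankSketch` and method `resolveRank` are hypothetical names, not part of Weka. The sketch assumes that the proportion is measured over squared singular values (consistent with the merit defined in evaluateAttribute), that a value larger than the number of available singular values falls back to all of them, and that a value of exactly 1 is treated as a count. The actual implementation may differ on each of these points.

```java
final class LsaRankSketch {

    /**
     * Resolves a rank setting to the number of latent variables to keep.
     * singularValues is assumed to be sorted in descending order.
     */
    static int resolveRank(double rank, double[] singularValues) {
        int available = singularValues.length;

        // Zero or negative: use all latent variables.
        // Assumption: a request larger than what is available is capped too.
        if (rank <= 0 || rank > available) {
            return available;
        }

        // Strictly between 0 and 1: treat as a proportion to account for.
        // Assumption: coverage is computed on squared singular values.
        if (rank < 1.0) {
            double total = 0.0;
            for (double s : singularValues) {
                total += s * s;
            }
            double running = 0.0;
            for (int i = 0; i < available; i++) {
                running += singularValues[i] * singularValues[i];
                if (running / total >= rank) {
                    return i + 1;
                }
            }
            return available;
        }

        // 1 or greater: treat as an explicit number of latent variables.
        return (int) rank;
    }

    public static void main(String[] args) {
        double[] s = {4.0, 2.0, 1.0};
        System.out.println(resolveRank(0.95, s)); // proportion
        System.out.println(resolveRank(2, s));    // explicit count
        System.out.println(resolveRank(0, s));    // all latent variables
    }
}
```

With singular values 4, 2, 1 the squared values sum to 21, so the first two latent variables cover 20/21 of the total and a setting of 0.95 resolves to 2 under these assumptions.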
    • maximumAttributeNamesTipText

      public String maximumAttributeNamesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaximumAttributeNames

      public void setMaximumAttributeNames(int newMaxAttributes)
      Sets maximum number of attributes to include in transformed attribute names.
      Parameters:
      newMaxAttributes - the maximum number of attributes
    • getMaximumAttributeNames

      public int getMaximumAttributeNames()
      Gets maximum number of attributes to include in transformed attribute names.
      Returns:
      the maximum number of attributes
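To make the effect of this limit concrete, the sketch below builds a name for one transformed attribute from the original attribute names and their coefficients. Everything about the naming scheme here is an assumption for illustration: the class `LsaNameSketch`, the ordering by absolute coefficient, the three-decimal formatting, and the trailing "..." marker are hypothetical and are not taken from the Weka source. Only the meaning of -1 (include all) comes from the option description.

```java
import java.util.Arrays;
import java.util.Locale;

final class LsaNameSketch {

    /**
     * Builds a descriptive name such as "-0.700b+0.500c..." from the
     * coefficients of one latent variable, keeping at most maxAttrs terms.
     * A negative maxAttrs (for example -1) keeps every term.
     */
    static String name(String[] attrs, double[] coeffs, int maxAttrs) {
        // Order the original attributes by decreasing absolute coefficient.
        Integer[] order = new Integer[attrs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order,
            (a, b) -> Double.compare(Math.abs(coeffs[b]), Math.abs(coeffs[a])));

        int limit = (maxAttrs < 0 || maxAttrs > attrs.length)
            ? attrs.length : maxAttrs;

        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < limit; k++) {
            int i = order[k];
            // Negative coefficients carry their own sign.
            if (k > 0 && coeffs[i] >= 0) {
                sb.append('+');
            }
            sb.append(String.format(Locale.ROOT, "%.3f", coeffs[i]));
            sb.append(attrs[i]);
        }
        // Mark names that were truncated by the limit.
        if (limit < attrs.length) {
            sb.append("...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] attrs = {"a", "b", "c"};
        double[] coeffs = {0.1, -0.7, 0.5};
        System.out.println(name(attrs, coeffs, 2));
        System.out.println(name(attrs, coeffs, -1));
    }
}
```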
    • getOptions

      public String[] getOptions()
      Gets the current settings of LatentSemanticAnalysis
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions()
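As an illustration of what such an array could look like, the sketch below assembles the three documented flags (-N, -R, -A) from individual settings. The class `LsaOptionsSketch` and method `toOptions` are hypothetical; the exact ordering and number formatting that getOptions() produces are assumptions here, not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;

final class LsaOptionsSketch {

    /** Builds an option array using the flags listed under setOptions. */
    static String[] toOptions(boolean normalize, double rank, int maxAttributeNames) {
        List<String> opts = new ArrayList<>();
        if (normalize) {
            opts.add("-N");                       // flag only, no argument
        }
        opts.add("-R");
        opts.add(String.valueOf(rank));           // count or proportion
        opts.add("-A");
        opts.add(String.valueOf(maxAttributeNames)); // -1 = include all
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", toOptions(true, 0.95, -1)));
    }
}
```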
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the capabilities of this evaluator.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class ASEvaluation
      Returns:
      the capabilities of this evaluator
    • buildEvaluator

      public void buildEvaluator(Instances data) throws Exception
      Initializes the singular values/vectors and performs the analysis
      Specified by:
      buildEvaluator in class ASEvaluation
      Parameters:
      data - the instances to analyse/transform
      Throws:
      Exception - if analysis fails
    • transformedHeader

      public Instances transformedHeader() throws Exception
      Returns just the header for the transformed data (i.e., an empty set of instances). This lets AttributeSelection determine the structure of the transformed data without having to fetch all of the transformed data through transformedData().
      Specified by:
      transformedHeader in interface AttributeTransformer
      Returns:
      the header of the transformed data.
      Throws:
      Exception - if the header of the transformed data can't be determined.
    • transformedData

      public Instances transformedData(Instances data) throws Exception
      Transforms the supplied data set (assumed to be in the same format as the training data).
      Specified by:
      transformedData in interface AttributeTransformer
      Parameters:
      data - the data set to transform
      Returns:
      the transformed data
      Throws:
      Exception - if transformed data can't be returned
    • evaluateAttribute

      public double evaluateAttribute(int att) throws Exception
      Evaluates the merit of a transformed attribute. This is defined to be the square of the singular value for the latent variable corresponding to the transformed attribute.
      Specified by:
      evaluateAttribute in interface AttributeEvaluator
      Parameters:
      att - the attribute to be evaluated
      Returns:
      the merit of a transformed attribute
      Throws:
      Exception - if attribute can't be evaluated
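The merit definition above (the square of the singular value) can be sketched in plain Java, together with the descending ordering a ranking search would typically apply to it. The class `LsaMeritSketch` and its methods are hypothetical names used only for illustration and do not call into Weka.

```java
import java.util.Arrays;

final class LsaMeritSketch {

    /** Merit of each latent variable: its singular value squared. */
    static double[] merits(double[] singularValues) {
        double[] m = new double[singularValues.length];
        for (int i = 0; i < m.length; i++) {
            m[i] = singularValues[i] * singularValues[i];
        }
        return m;
    }

    /** Indices of the transformed attributes, best merit first. */
    static Integer[] rankByMerit(double[] merits) {
        Integer[] idx = new Integer[merits.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(merits[b], merits[a]));
        return idx;
    }

    public static void main(String[] args) {
        double[] m = merits(new double[] {3.0, 0.5, 2.0});
        System.out.println(Arrays.toString(m));
        System.out.println(Arrays.toString(rankByMerit(m)));
    }
}
```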
    • convertInstance

      public Instance convertInstance(Instance instance) throws Exception
      Transform an instance in original (unnormalized) format
      Specified by:
      convertInstance in interface AttributeTransformer
      Parameters:
      instance - an instance in the original (unnormalized) format
      Returns:
      a transformed instance
      Throws:
      Exception - if instance can't be transformed
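For readers unfamiliar with how a single instance can be mapped into a concept space, the sketch below shows the standard LSA fold-in, which projects a vector onto the retained left singular vectors and divides by the matching singular values. This is a general illustration, not a transcription of what convertInstance does: the class `LsaFoldInSketch`, the matrix layout (one row per original attribute, one column per latent variable), and the absence of any normalization step are all assumptions.

```java
final class LsaFoldInSketch {

    /**
     * Projects an instance vector x into a k-dimensional concept space.
     * uK has one row per original attribute and one column per latent
     * variable; sK holds the k retained singular values (all non-zero).
     */
    static double[] foldIn(double[] x, double[][] uK, double[] sK) {
        int k = sK.length;
        double[] out = new double[k];
        for (int j = 0; j < k; j++) {
            // Dot product of x with the j-th singular vector.
            double dot = 0.0;
            for (int i = 0; i < x.length; i++) {
                dot += x[i] * uK[i][j];
            }
            // Scale by the inverse of the j-th singular value.
            out[j] = dot / sK[j];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] uK = {{1.0, 0.0}, {0.0, 1.0}};
        double[] sK = {2.0, 4.0};
        double[] projected = foldIn(new double[] {6.0, 8.0}, uK, sK);
        System.out.println(java.util.Arrays.toString(projected));
    }
}
```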
    • toString

      public String toString()
      Returns a description of this attribute transformer
      Overrides:
      toString in class Object
      Returns:
      a String describing this attribute transformer
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class ASEvaluation
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class
      Parameters:
      argv - should contain the command line arguments to the evaluator/transformer (see AttributeSelection)