Class BFTree

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class BFTree
    extends RandomizableClassifier
    implements AdditionalMeasureProducer, TechnicalInformationHandler
    Class for building a best-first decision tree classifier. This class uses binary splits for both nominal and numeric attributes. Missing values are handled with the method of 'fractional' instances.
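
    The 'fractional' instances idea can be sketched as follows. This is a minimal illustration, not the BFTree source: the class FractionalSplitSketch, the method branchWeights and the parameter leftFraction are hypothetical names, and the sketch assumes that an instance with a missing split value is divided between the two branches in proportion to the training weight with known values that went each way.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of BFTree) illustrating fractional instances. */
final class FractionalSplitSketch {

    /**
     * Distributes the weight of one instance over the two branches of a
     * binary numeric split (value <= threshold goes left).
     *
     * @param value        attribute value, Double.NaN when missing
     * @param weight       current weight of the instance
     * @param threshold    split point of the numeric attribute
     * @param leftFraction assumed share of known-valued training weight sent left
     * @return {weightLeft, weightRight}
     */
    static double[] branchWeights(double value, double weight,
                                  double threshold, double leftFraction) {
        if (Double.isNaN(value)) {
            // Missing value: a fraction of the instance goes down each branch.
            return new double[] {weight * leftFraction, weight * (1.0 - leftFraction)};
        }
        // Known value: the whole instance follows a single branch.
        return value <= threshold
            ? new double[] {weight, 0.0}
            : new double[] {0.0, weight};
    }

    public static void main(String[] args) {
        // Assume 30 of 40 known-valued training instances went left: leftFraction = 0.75.
        double[] w = branchWeights(Double.NaN, 1.0, 2.5, 0.75);
        System.out.println(Arrays.toString(w)); // prints [0.75, 0.25]
    }
}
```

    The two returned weights always sum to the original weight, so the total weight in the tree is preserved.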

    For more information, see:

    Haijian Shi (2007). Best-first decision tree learning. Hamilton, NZ.

    Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression : A statistical view of boosting. Annals of statistics. 28(2):337-407.

    BibTeX:

     @mastersthesis{Shi2007,
        address = {Hamilton, NZ},
        author = {Haijian Shi},
        note = {COMP594},
        school = {University of Waikato},
        title = {Best-first decision tree learning},
        year = {2007}
     }
     
     @article{Friedman2000,
        author = {Jerome Friedman and Trevor Hastie and Robert Tibshirani},
        journal = {Annals of statistics},
        number = {2},
        pages = {337-407},
        title = {Additive logistic regression : A statistical view of boosting},
        volume = {28},
        year = {2000},
        ISSN = {0090-5364}
     }
     

    Valid options are:

     -S <num>
      Random number seed.
      (default 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -P <UNPRUNED|POSTPRUNED|PREPRUNED>
      The pruning strategy.
      (default: POSTPRUNED)
     -M <min no>
      The minimal number of instances at the terminal nodes.
      (default 2)
     -N <num folds>
      The number of folds used in the pruning.
      (default 5)
     -H
      Don't use heuristic search for nominal attributes in multi-class
      problems (by default, heuristic search is used).
     -G
      Don't use the Gini index for splitting; use information gain instead
      (by default, the Gini index is used).
     -R
      Don't use the error rate in internal cross-validation; use root mean
      squared error instead (by default, the error rate is used).
     -A
      Use the 1 SE rule to make the pruning decision.
      (default no).
     -C
      Training data size as a fraction in the range (0, 1]
      (default 1).
    Version:
    $Revision: 6947 $
    Author:
    Haijian Shi (hs69@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Detail

      • PRUNING_UNPRUNED

        public static final int PRUNING_UNPRUNED
        pruning strategy: un-pruned
        See Also:
        Constant Field Values
      • PRUNING_POSTPRUNING

        public static final int PRUNING_POSTPRUNING
        pruning strategy: post-pruning
        See Also:
        Constant Field Values
      • PRUNING_PREPRUNING

        public static final int PRUNING_PREPRUNING
        pruning strategy: pre-pruning
        See Also:
        Constant Field Values
      • TAGS_PRUNING

        public static final Tag[] TAGS_PRUNING
        pruning strategy
    • Constructor Detail

      • BFTree

        public BFTree()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Method for building a best-first decision tree classifier.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - set of instances serving as training data
        Throws:
        java.lang.Exception - if decision tree cannot be built successfully
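
        The best-first expansion order can be illustrated with a priority queue. This is a toy sketch rather than the BFTree implementation: the class BestFirstOrderSketch, its Node type, the fixed gain values and the maxExpansions stopping parameter are all hypothetical, and each node's impurity reduction is assumed to be known in advance instead of being computed from data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch (not part of BFTree) of best-first node expansion. */
final class BestFirstOrderSketch {

    /** A candidate node with the impurity reduction of its best split (toy value). */
    static final class Node {
        final String name;
        final double gain;
        final List<Node> children = new ArrayList<>();

        Node(String name, double gain) {
            this.name = name;
            this.gain = gain;
        }

        Node add(Node... kids) {
            for (Node k : kids) {
                children.add(k);
            }
            return this;
        }
    }

    /** Returns node names in the order in which a best-first learner would split them. */
    static List<String> expansionOrder(Node root, int maxExpansions) {
        // The open node with the largest impurity reduction is always expanded next.
        PriorityQueue<Node> open =
            new PriorityQueue<>(Comparator.comparingDouble((Node n) -> n.gain).reversed());
        open.add(root);
        List<String> order = new ArrayList<>();
        while (!open.isEmpty() && order.size() < maxExpansions) {
            Node best = open.poll();
            if (best.gain <= 0.0) {
                break; // no remaining split reduces impurity
            }
            order.add(best.name);
            open.addAll(best.children);
        }
        return order;
    }

    /** Small fixed tree used for the demonstration. */
    static Node toyTree() {
        Node a = new Node("A", 0.05).add(new Node("E", 0.0), new Node("F", 0.0));
        Node b = new Node("B", 0.20).add(new Node("C", 0.10), new Node("D", 0.0));
        return new Node("root", 0.30).add(a, b);
    }

    public static void main(String[] args) {
        System.out.println(expansionOrder(toyTree(), 10)); // prints [root, B, C, A]
    }
}
```

        A depth-first learner would fully expand one subtree before touching its sibling; here node C (under B) is split before node A because its impurity reduction is larger.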
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Computes class probabilities for instance using the decision tree.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance for which class probabilities are to be computed
        Returns:
        the class probabilities for the given instance
        Throws:
        java.lang.Exception - if something goes wrong
      • toString

        public java.lang.String toString()
        Returns a textual description of the decision tree, produced by the class's protected toString method.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a textual description of the classifier
      • numNodes

        public int numNodes()
        Computes the size of the tree.
        Returns:
        size of the tree
      • numLeaves

        public int numLeaves()
        Computes the number of leaf nodes.
        Returns:
        number of leaf nodes
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class RandomizableClassifier
        Returns:
        an enumeration describing the available options.
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses the options for this object.

        Valid options are:

         -S <num>
          Random number seed.
          (default 1)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
         -P <UNPRUNED|POSTPRUNED|PREPRUNED>
          The pruning strategy.
          (default: POSTPRUNED)
         -M <min no>
          The minimal number of instances at the terminal nodes.
          (default 2)
         -N <num folds>
          The number of folds used in the pruning.
          (default 5)
         -H
          Don't use heuristic search for nominal attributes in multi-class
          problems (by default, heuristic search is used).
         -G
          Don't use the Gini index for splitting; use information gain instead
          (by default, the Gini index is used).
         -R
          Don't use the error rate in internal cross-validation; use root mean
          squared error instead (by default, the error rate is used).
         -A
          Use the 1 SE rule to make the pruning decision.
          (default no).
         -C
          Training data size as a fraction in the range (0, 1]
          (default 1).
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClassifier
        Parameters:
        options - the options to use
        Throws:
        java.lang.Exception - if setting of options fails
      • enumerateMeasures

        public java.util.Enumeration enumerateMeasures()
        Returns an enumeration of the measure names.
        Specified by:
        enumerateMeasures in interface AdditionalMeasureProducer
        Returns:
        an enumeration of the measure names
      • measureTreeSize

        public double measureTreeSize()
        Returns the size of the tree.
        Returns:
        the size of the tree
      • getMeasure

        public double getMeasure(java.lang.String additionalMeasureName)
        Returns the value of the named measure.
        Specified by:
        getMeasure in interface AdditionalMeasureProducer
        Parameters:
        additionalMeasureName - the name of the measure to query for its value
        Returns:
        the value of the named measure
        Throws:
        java.lang.IllegalArgumentException - if the named measure is not supported
      • pruningStrategyTipText

        public java.lang.String pruningStrategyTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setPruningStrategy

        public void setPruningStrategy(SelectedTag value)
        Sets the pruning strategy.
        Parameters:
        value - the strategy
      • getPruningStrategy

        public SelectedTag getPruningStrategy()
        Gets the pruning strategy.
        Returns:
        the current strategy.
      • minNumObjTipText

        public java.lang.String minNumObjTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMinNumObj

        public void setMinNumObj(int value)
        Sets the minimal number of instances at the terminal nodes.
        Parameters:
        value - minimal number of instances at the terminal nodes
      • getMinNumObj

        public int getMinNumObj()
        Gets the minimal number of instances at the terminal nodes.
        Returns:
        minimal number of instances at the terminal nodes
      • numFoldsPruningTipText

        public java.lang.String numFoldsPruningTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumFoldsPruning

        public void setNumFoldsPruning(int value)
        Sets the number of folds used in internal cross-validation.
        Parameters:
        value - the number of folds
      • getNumFoldsPruning

        public int getNumFoldsPruning()
        Gets the number of folds used in internal cross-validation.
        Returns:
        number of folds in internal cross-validation
      • heuristicTipText

        public java.lang.String heuristicTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setHeuristic

        public void setHeuristic(boolean value)
        Sets whether heuristic search is used for nominal attributes in multi-class problems.
        Parameters:
        value - true if heuristic search should be used for nominal attributes in multi-class problems
      • getHeuristic

        public boolean getHeuristic()
        Gets whether heuristic search is used for nominal attributes in multi-class problems.
        Returns:
        true if heuristic search is used for nominal attributes in multi-class problems
      • useGiniTipText

        public java.lang.String useGiniTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setUseGini

        public void setUseGini(boolean value)
        Sets whether the Gini index is used as the splitting criterion.
        Parameters:
        value - true if the Gini index should be used as the splitting criterion
      • getUseGini

        public boolean getUseGini()
        Gets whether the Gini index is used as the splitting criterion.
        Returns:
        true if the Gini index is used as the splitting criterion
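
        The two splitting criteria can be compared on a single binary split. The sketch below is illustrative only: ImpuritySketch and its methods are hypothetical names, and it uses the textbook definitions (Gini index 1 - sum of p^2, entropy in bits, gain = parent impurity minus weighted child impurity), which may differ in scaling from what BFTree computes internally.

```java
/** Hypothetical helper (not part of BFTree): Gini gain and information gain of a binary split. */
final class ImpuritySketch {

    /** Gini index of a class-count vector: 1 - sum of squared class proportions. */
    static double gini(double[] counts) {
        double total = sum(counts);
        if (total == 0.0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (double c : counts) {
            double p = c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Entropy in bits of a class-count vector. */
    static double entropy(double[] counts) {
        double total = sum(counts);
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Impurity reduction of a binary split: parent impurity minus weighted child impurities. */
    static double gain(double[] left, double[] right, boolean useGini) {
        double[] parent = new double[left.length];
        for (int i = 0; i < left.length; i++) {
            parent[i] = left[i] + right[i];
        }
        double nL = sum(left);
        double nR = sum(right);
        double n = nL + nR;
        double before = useGini ? gini(parent) : entropy(parent);
        double after = useGini
            ? (nL / n) * gini(left) + (nR / n) * gini(right)
            : (nL / n) * entropy(left) + (nR / n) * entropy(right);
        return before - after;
    }

    private static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] left = {8, 2};   // class counts in the left branch
        double[] right = {2, 8};  // class counts in the right branch
        System.out.println("Gini gain: " + gain(left, right, true));   // about 0.18
        System.out.println("Info gain: " + gain(left, right, false));  // about 0.278
    }
}
```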
      • useErrorRateTipText

        public java.lang.String useErrorRateTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setUseErrorRate

        public void setUseErrorRate(boolean value)
        Sets whether the error rate is used in internal cross-validation.
        Parameters:
        value - true if the error rate should be used in internal cross-validation
      • getUseErrorRate

        public boolean getUseErrorRate()
        Gets whether the error rate is used in internal cross-validation.
        Returns:
        true if the error rate is used in internal cross-validation
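
        To show how the two cross-validation measures differ, the sketch below computes both from predicted class distributions. It is not taken from BFTree: PruningErrorSketch is a hypothetical name, and the root mean squared error is assumed to be averaged over classes and instances against 0/1 class indicators, which may not match the exact normalization used internally.

```java
/** Hypothetical helper (not part of BFTree): error rate vs. root mean squared error. */
final class PruningErrorSketch {

    /** Fraction of instances whose most probable class differs from the actual class. */
    static double errorRate(double[][] dist, int[] actual) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            int pred = 0;
            for (int c = 1; c < dist[i].length; c++) {
                if (dist[i][c] > dist[i][pred]) {
                    pred = c;
                }
            }
            if (pred != actual[i]) {
                wrong++;
            }
        }
        return (double) wrong / actual.length;
    }

    /** RMSE of the predicted probabilities against 0/1 class indicators (assumed normalization). */
    static double rmse(double[][] dist, int[] actual) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double sq = 0.0;
            for (int c = 0; c < dist[i].length; c++) {
                double target = (c == actual[i]) ? 1.0 : 0.0;
                double d = dist[i][c] - target;
                sq += d * d;
            }
            sum += sq / dist[i].length; // mean squared error over the classes
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        int[] actual = {0, 1};
        double[][] confident = {{0.9, 0.1}, {0.2, 0.8}};
        double[][] hesitant = {{0.6, 0.4}, {0.45, 0.55}};
        // Both models classify every instance correctly, so the error rate cannot separate them.
        System.out.println(errorRate(confident, actual) + " " + errorRate(hesitant, actual));
        // RMSE is lower for the model with sharper, correct probabilities.
        System.out.println(rmse(confident, actual) + " " + rmse(hesitant, actual));
    }
}
```

        The error rate only looks at the predicted label, while root mean squared error also rewards well-calibrated probabilities, so the two settings can prefer different tree sizes.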
      • useOneSETipText

        public java.lang.String useOneSETipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setUseOneSE

        public void setUseOneSE(boolean value)
        Sets whether the 1 SE rule is used to choose the final model.
        Parameters:
        value - true if the 1 SE rule should be used to choose the final model
      • getUseOneSE

        public boolean getUseOneSE()
        Gets whether the 1 SE rule is used to choose the final model.
        Returns:
        true if the 1 SE rule is used to choose the final model
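
        The 1 SE rule is commonly stated as: pick the smallest model whose estimated error is within one standard error of the minimum. The sketch below follows that common formulation and is not taken from the BFTree source; OneSeRuleSketch, select and the example numbers are hypothetical, and candidates are assumed to be ordered from the smallest tree to the largest.

```java
/** Hypothetical helper (not part of BFTree): model selection with and without the 1 SE rule. */
final class OneSeRuleSketch {

    /**
     * Selects a candidate tree from cross-validated error estimates.
     *
     * @param errors   estimated error of each candidate, ordered from smallest to largest tree
     * @param stdErrs  standard error of each estimate
     * @param useOneSE whether to apply the 1 SE rule
     * @return index of the selected candidate
     */
    static int select(double[] errors, double[] stdErrs, boolean useOneSE) {
        // Find the candidate with the minimum estimated error (first one wins on ties).
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) {
                best = i;
            }
        }
        if (!useOneSE) {
            return best;
        }
        // 1 SE rule: smallest tree whose error is within one standard error of the minimum.
        double limit = errors[best] + stdErrs[best];
        for (int i = 0; i < errors.length; i++) {
            if (errors[i] <= limit) {
                return i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] errors = {0.30, 0.22, 0.20, 0.21};
        double[] stdErrs = {0.03, 0.03, 0.03, 0.03};
        System.out.println(select(errors, stdErrs, false)); // prints 2
        System.out.println(select(errors, stdErrs, true));  // prints 1
    }
}
```

        With the rule enabled, the slightly less accurate but smaller candidate at index 1 is preferred over the minimum-error candidate at index 2.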
      • sizePerTipText

        public java.lang.String sizePerTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setSizePer

        public void setSizePer(double value)
        Sets the training set size as a fraction in the range (0, 1].
        Parameters:
        value - training set size as a fraction in the range (0, 1]
      • getSizePer

        public double getSizePer()
        Gets the training set size as a fraction in the range (0, 1].
        Returns:
        training set size as a fraction in the range (0, 1]
      • main

        public static void main(java.lang.String[] args)
        Main method.
        Parameters:
        args - the options for the classifier