Class PaceRegression

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

    public class PaceRegression
    extends Classifier
    implements OptionHandler, WeightedInstancesHandler, TechnicalInformationHandler
    Class for building pace regression linear models and using them for prediction.

    Under regularity conditions, pace regression is provably optimal when the number of coefficients tends to infinity. It consists of a group of estimators that are either overall optimal or optimal under certain conditions.

    The current state of pace regression theory, and therefore this implementation, does not handle:

    - missing values
    - non-binary nominal attributes
    - the case where n - k is small, where n is the number of instances and k is the number of coefficients (the threshold used in this implementation is 20)
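
    The limitations listed above can be checked on the data before training. The following is a minimal sketch, not part of Weka: the class PacePreconditions and its methods are hypothetical, the data is held in a plain double[][] with NaN standing in for a missing value, and it assumes that "small" means n - k strictly below the stated threshold of 20.

```java
/** Hypothetical helper, not part of Weka: checks the documented limitations on plain arrays. */
final class PacePreconditions {
    /** Threshold for n - k, taken from the class description. */
    static final int MIN_N_MINUS_K = 20;

    /** Returns true if any cell is NaN, used here as the missing-value marker (an assumption). */
    static boolean hasMissing(double[][] rows) {
        for (double[] row : rows) {
            for (double value : row) {
                if (Double.isNaN(value)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Assumes "small" means n - k is strictly below the threshold. */
    static boolean enoughInstances(int n, int k) {
        return n - k >= MIN_N_MINUS_K;
    }

    public static void main(String[] args) {
        double[][] rows = { {1.0, 2.0}, {3.0, Double.NaN} };
        System.out.println("has missing values: " + hasMissing(rows));
        System.out.println("enough instances:   " + enoughInstances(100, 5));
    }
}
```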

    For more information see:

    Wang, Y (2000). A new approach to fitting linear models in high dimensional spaces. Hamilton, New Zealand.

    Wang, Y., Witten, I. H.: Modeling for optimal probability prediction. In: Proceedings of the Nineteenth International Conference in Machine Learning, Sydney, Australia, 650-657, 2002.

    BibTeX:

     @phdthesis{Wang2000,
        address = {Hamilton, New Zealand},
        author = {Wang, Y},
        school = {Department of Computer Science, University of Waikato},
        title = {A new approach to fitting linear models in high dimensional spaces},
        year = {2000}
     }
     
     @inproceedings{Wang2002,
        address = {Sydney, Australia},
        author = {Wang, Y. and Witten, I. H.},
        booktitle = {Proceedings of the Nineteenth International Conference in Machine Learning},
        pages = {650-657},
        title = {Modeling for optimal probability prediction},
        year = {2002}
     }
     

    Valid options are:

     -D
      Produce debugging output.
      (default no debugging output)
     -E <estimator>
      The estimator can be one of the following:
       eb -- Empirical Bayes estimator for normal mixture (default)
       nested -- Optimal nested model selector for normal mixture
       subset -- Optimal subset selector for normal mixture
       pace2 -- PACE2 for Chi-square mixture
       pace4 -- PACE4 for Chi-square mixture
       pace6 -- PACE6 for Chi-square mixture
     
       ols -- Ordinary least squares estimator
       aic -- AIC estimator
       bic -- BIC estimator
       ric -- RIC estimator
       olsc -- Ordinary least squares subset selector with a threshold
     -S <threshold value>
      Threshold value for the OLSC estimator
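
    An option list in the form described above is an array of strings in which each flag is followed by its value. The sketch below assembles such an array and checks the estimator name against the documented list. PaceOptionsSketch and its build method are hypothetical helpers written for illustration; they do not reproduce Weka's own option parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Weka: builds an option array for the flags documented above. */
final class PaceOptionsSketch {
    /** Estimator names copied from the -E option description. */
    static final List<String> ESTIMATORS = Arrays.asList(
        "eb", "nested", "subset", "pace2", "pace4", "pace6",
        "ols", "aic", "bic", "ric", "olsc");

    /**
     * Builds an array of the form [-D] -E <estimator> [-S <threshold>].
     * Pass a null threshold to omit -S.
     */
    static String[] build(boolean debug, String estimator, Double threshold) {
        if (!ESTIMATORS.contains(estimator)) {
            throw new IllegalArgumentException("Unknown estimator: " + estimator);
        }
        List<String> options = new ArrayList<>();
        if (debug) {
            options.add("-D");
        }
        options.add("-E");
        options.add(estimator);
        if (threshold != null) {
            options.add("-S");
            options.add(String.valueOf(threshold));
        }
        return options.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build(false, "olsc", 2.0)));
    }
}
```

    An array built this way has the shape expected by setOptions(java.lang.String[]).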
    Version:
    $Revision: 5523 $
    Author:
    Yong Wang (yongwang@cs.waikato.ac.nz), Gabi Schmidberger (gabi@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Detail

      • TAGS_ESTIMATOR

        public static final Tag[] TAGS_ESTIMATOR
        estimator types
    • Constructor Detail

      • PaceRegression

        public PaceRegression()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this classifier
        Returns:
        a description of the classifier suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Builds a pace regression model for the given data.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - the training data to be used for generating the linear regression function
        Throws:
        java.lang.Exception - if the classifier could not be built successfully
      • checkForMissing

        public boolean checkForMissing(Instance instance,
                                       Instances model)
        Checks if an instance has a missing value.
        Parameters:
        instance - the instance
        model - the data
        Returns:
        true if missing value is present
      • classifyInstance

        public double classifyInstance(Instance instance)
                                throws java.lang.Exception
        Classifies the given instance using the linear regression function.
        Overrides:
        classifyInstance in class Classifier
        Parameters:
        instance - the test instance
        Returns:
        the classification
        Throws:
        java.lang.Exception - if classification can't be done successfully
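
        Prediction with a linear regression function amounts to an intercept plus a weighted sum of attribute values. The sketch below shows that arithmetic on plain arrays. The class LinearPredictionSketch is hypothetical, and placing the intercept at index 0 of the coefficient array is an assumption made for illustration, not a documented property of coefficients().

```java
/** Hypothetical illustration, not part of Weka: evaluates a linear function on plain arrays. */
final class LinearPredictionSketch {
    /**
     * Computes coefficients[0] + sum over i of coefficients[i + 1] * attributes[i].
     * The intercept-first layout is an assumption of this sketch.
     */
    static double predict(double[] coefficients, double[] attributes) {
        if (coefficients.length != attributes.length + 1) {
            throw new IllegalArgumentException(
                "Expected " + (attributes.length + 1) + " coefficients, got " + coefficients.length);
        }
        double result = coefficients[0];
        for (int i = 0; i < attributes.length; i++) {
            result += coefficients[i + 1] * attributes[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] coefficients = {1.0, 2.0, 0.5}; // intercept, then one weight per attribute
        double[] attributes = {3.0, 4.0};
        System.out.println(predict(coefficients, attributes)); // 1.0 + 2.0*3.0 + 0.5*4.0
    }
}
```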
      • toString

        public java.lang.String toString()
        Outputs the linear regression model as a string.
        Overrides:
        toString in class java.lang.Object
        Returns:
        the model as string
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class Classifier
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -D
          Produce debugging output.
          (default no debugging output)
         -E <estimator>
          The estimator can be one of the following:
           eb -- Empirical Bayes estimator for normal mixture (default)
           nested -- Optimal nested model selector for normal mixture
           subset -- Optimal subset selector for normal mixture
           pace2 -- PACE2 for Chi-square mixture
           pace4 -- PACE4 for Chi-square mixture
           pace6 -- PACE6 for Chi-square mixture
         
           ols -- Ordinary least squares estimator
           aic -- AIC estimator
           bic -- BIC estimator
           ric -- RIC estimator
           olsc -- Ordinary least squares subset selector with a threshold
         -S <threshold value>
          Threshold value for the OLSC estimator
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class Classifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • coefficients

        public double[] coefficients()
        Returns the coefficients for this linear model.
        Returns:
        the coefficients for this linear model
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of the classifier.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class Classifier
        Returns:
        an array of strings suitable for passing to setOptions
      • numParameters

        public int numParameters()
        Get the number of coefficients used in the model
        Returns:
        the number of coefficients
      • debugTipText

        public java.lang.String debugTipText()
        Returns the tip text for this property
        Overrides:
        debugTipText in class Classifier
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebug

        public void setDebug(boolean debug)
        Controls whether debugging output will be printed
        Overrides:
        setDebug in class Classifier
        Parameters:
        debug - true if debugging output should be printed
      • getDebug

        public boolean getDebug()
        Returns whether debugging output will be printed
        Overrides:
        getDebug in class Classifier
        Returns:
        true if debugging output should be printed
      • estimatorTipText

        public java.lang.String estimatorTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getEstimator

        public SelectedTag getEstimator()
        Gets the estimator
        Returns:
        the estimator
      • setEstimator

        public void setEstimator(SelectedTag estimator)
        Sets the estimator.
        Parameters:
        estimator - the new estimator
      • thresholdTipText

        public java.lang.String thresholdTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setThreshold

        public void setThreshold(double newThreshold)
        Set threshold for the olsc estimator
        Parameters:
        newThreshold - the threshold for the olsc estimator
      • getThreshold

        public double getThreshold()
        Gets the threshold for the olsc estimator
        Returns:
        the threshold
      • main

        public static void main(java.lang.String[] argv)
        Generates a linear regression function predictor.
        Parameters:
        argv - the options