Package weka.clusterers
Class XMeans
- java.lang.Object
  - weka.clusterers.AbstractClusterer
    - weka.clusterers.RandomizableClusterer
      - weka.clusterers.XMeans
- All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler
public class XMeans extends RandomizableClusterer implements TechnicalInformationHandler
Cluster data using the X-means algorithm.
X-Means is K-Means extended by an Improve-Structure part. In this part of the algorithm, an attempt is made to split each center within its region. The decision between keeping a center and replacing it with its children is made by comparing the BIC values of the two structures.
For more information see:
Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.
BibTeX:
@inproceedings{Pelleg2000,
  author = {Dan Pelleg and Andrew W. Moore},
  booktitle = {Seventeenth International Conference on Machine Learning},
  pages = {727-734},
  publisher = {Morgan Kaufmann},
  title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
  year = {2000}
}
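The BIC comparison between a center and its two children can be sketched in plain Java. This is a simplified illustration for one-dimensional data, assuming a hard-assignment spherical Gaussian model with one shared variance, as formulated in the cited paper. It is not the code used by this class, and the names `BicSplitSketch`, `bic` and `childrenWin` are hypothetical.

```java
import java.util.Arrays;

/** Simplified BIC comparison: one parent center versus two child centers (1-D data). */
final class BicSplitSketch {

    /**
     * BIC of a hard-assignment, spherical-Gaussian model with a shared variance.
     * clusters[n] holds the points assigned to center n. Every cluster is assumed
     * to be non-empty and the points must not all coincide with their centers.
     */
    static double bic(double[][] clusters) {
        int k = clusters.length;
        int dims = 1; // this sketch handles one-dimensional points only
        int total = 0;
        double sse = 0.0;
        for (double[] c : clusters) {
            double mean = Arrays.stream(c).average().orElse(0.0);
            for (double x : c) {
                sse += (x - mean) * (x - mean);
            }
            total += c.length;
        }
        // Shared variance estimate, dividing by (number of points - number of centers).
        double variance = sse / (total - k);
        // Gaussian terms; with this variance, sse / (2 * variance) equals (total - k) / 2.
        double logLik = -0.5 * total * dims * Math.log(2 * Math.PI * variance)
                - 0.5 * (total - k);
        // Mixing-weight terms: each point contributes log(clusterSize / total).
        for (double[] c : clusters) {
            logLik += c.length * Math.log((double) c.length / total);
        }
        // Free parameters: (k - 1) mixing weights, k * dims coordinates, 1 variance.
        int params = (k - 1) + k * dims + 1;
        return logLik - 0.5 * params * Math.log(total);
    }

    /** Returns true when the two-children structure scores a higher BIC than the parent. */
    static boolean childrenWin(double[] left, double[] right) {
        double[] parent = new double[left.length + right.length];
        System.arraycopy(left, 0, parent, 0, left.length);
        System.arraycopy(right, 0, parent, left.length, right.length);
        return bic(new double[][] {left, right}) > bic(new double[][] {parent});
    }

    public static void main(String[] args) {
        double[] left = {-0.1, 0.0, 0.1};
        double[] right = {9.9, 10.0, 10.1};
        System.out.println("split accepted: " + childrenWin(left, right));
    }
}
```

With two well-separated groups such as the ones in `main`, the two-center structure should score the higher BIC, so the split is kept.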
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level (default 0).
-Y <file name> The debug vectors file.
-S <num> Random number seed (default 10).
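As an illustration of how a few of these flags combine, the following sketch assembles an option array of the kind that `setOptions(java.lang.String[])` accepts. It uses only the flags listed above. The helper class `XMeansOptionsSketch`, its method and the min/max sanity check are hypothetical and not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds an option array from typed values (hypothetical helper, not part of Weka). */
final class XMeansOptionsSketch {

    static String[] buildOptions(int maxIterations, int minClusters, int maxClusters,
                                 int seed, boolean useKdTree) {
        // Sanity check added for this sketch only.
        if (minClusters > maxClusters) {
            throw new IllegalArgumentException("minClusters must not exceed maxClusters");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-I");
        opts.add(Integer.toString(maxIterations)); // overall iterations
        opts.add("-L");
        opts.add(Integer.toString(minClusters));   // minimum number of clusters
        opts.add("-H");
        opts.add(Integer.toString(maxClusters));   // maximum number of clusters
        opts.add("-S");
        opts.add(Integer.toString(seed));          // random number seed
        if (useKdTree) {
            opts.add("-use-kdtree");               // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", buildOptions(5, 2, 10, 42, true)));
    }
}
```

Each flag and its value are separate array elements, which is the shape an option array for `setOptions` is expected to have.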
- Version:
- $Revision: 9986 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
- See Also:
RandomizableClusterer, Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
static int D_CONVCHCLOSER
have a closer look at converge children.
static int D_CURR
for current debug.
static int D_FOLLOWSPLIT
follows the splitting of the centers.
static int D_GENERAL
general debugging.
static int D_ITERCOUNT
follow iterations.
static int D_KDTREE
check on kdtree.
static int D_METH_MISUSE
functions were maybe misused.
static int D_PRINTCENTERS
print the centers.
static int D_RANDOMVECTOR
check on random vectors.
boolean m_CurrDebugFlag
Flag: I'm debugging.
static int R_HIGH
Index in ranges for HIGH.
static int R_LOW
Index in ranges for LOW.
static int R_WIDTH
Index in ranges for WIDTH.
-
Constructor Summary
Constructors (constructor, description):
XMeans()
The default constructor.
-
Method Summary
Methods (modifier and type, method, description):
java.lang.String binValueTipText()
Returns the tip text for this property.
void buildClusterer(Instances data)
Generates the X-Means clusterer.
boolean checkForNominalAttributes(Instances data)
Checks for nominal attributes in the dataset.
int clusterInstance(Instance instance)
Assigns a given instance to a cluster.
java.lang.String cutOffFactorTipText()
Returns the tip text for this property.
java.lang.String debugLevelTipText()
Returns the tip text for this property.
java.lang.String debugVectorsFileTipText()
Returns the tip text for this property.
java.lang.String distanceFTipText()
Returns the tip text for this property.
double getBinValue()
Gets value that represents true in a new numeric attribute.
Capabilities getCapabilities()
Returns default capabilities of the clusterer.
Instances getClusterCenters()
Returns the centers of the clusters as an Instances object.
double getCutOffFactor()
Gets the cutoff factor.
int getDebugLevel()
Gets the debug level.
java.io.File getDebugVectorsFile()
Gets the file name for a file that has the random vectors stored.
DistanceFunction getDistanceF()
Gets the distance function.
java.io.File getInputCenterFile()
Gets the file to read the list of centers from.
KDTree getKDTree()
Gets the KDTree class.
int getMaxIterations()
Gets the maximum number of iterations.
int getMaxKMeans()
Gets the maximum number of iterations in KMeans.
int getMaxKMeansForChildren()
Gets the maximum number of KMeans iterations performed on the child centers.
int getMaxNumClusters()
Gets the maximum number of clusters to generate.
int getMinNumClusters()
Gets the minimum number of clusters to generate.
Instance getNextDebugVectorsInstance(Instances model)
Reads an instance from the debug vectors file.
java.lang.String[] getOptions()
Gets the current settings of XMeans.
java.io.File getOutputCenterFile()
Gets the file to write the list of centers to.
java.lang.String getRevision()
Returns the revision string.
TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
boolean getUseKDTree()
Gets whether the KDTree is used or not.
java.lang.String globalInfo()
Returns a string describing this clusterer.
void initDebugVectorsInput()
Initialises the debug vector input.
java.lang.String inputCenterFileTipText()
Returns the tip text for this property.
java.lang.String KDTreeTipText()
Returns the tip text for this property.
java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
Main method for testing this class.
java.lang.String maxIterationsTipText()
Returns the tip text for this property.
java.lang.String maxKMeansForChildrenTipText()
Returns the tip text for this property.
java.lang.String maxKMeansTipText()
Returns the tip text for this property.
java.lang.String maxNumClustersTipText()
Returns the tip text for this property.
java.lang.String minNumClustersTipText()
Returns the tip text for this property.
int numberOfClusters()
Returns the number of clusters.
java.lang.String outputCenterFileTipText()
Returns the tip text for this property.
void setBinValue(double value)
Sets the distance value between true and false of binary attributes.
void setCutOffFactor(double i)
Sets a new cutoff factor.
void setDebugLevel(int d)
Sets the debug level.
void setDebugVectorsFile(java.io.File value)
Sets the file that has the random vectors stored.
void setDistanceF(DistanceFunction distanceF)
Sets the distance function.
void setInputCenterFile(java.io.File value)
Sets the file to read the list of centers from.
void setKDTree(KDTree k)
Sets the KDTree class.
void setMaxIterations(int i)
Sets the maximum number of iterations to perform.
void setMaxKMeans(int i)
Sets the maximum number of iterations to perform in KMeans.
void setMaxKMeansForChildren(int i)
Sets the maximum number of KMeans iterations performed on the child centers.
void setMaxNumClusters(int n)
Sets the maximum number of clusters to generate.
void setMinNumClusters(int n)
Sets the minimum number of clusters to generate.
void setOptions(java.lang.String[] options)
Parses a given list of options.
void setOutputCenterFile(java.io.File value)
Sets the file to write the list of centers to.
void setUseKDTree(boolean value)
Sets whether to use the KDTree or not.
java.lang.String toString()
Returns a string describing this clusterer.
java.lang.String useKDTreeTipText()
Returns the tip text for this property.
-
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
-
Methods inherited from class weka.clusterers.AbstractClusterer
distributionForInstance, forName, makeCopies, makeCopy
-
-
-
-
Field Detail
-
R_LOW
public static int R_LOW
Index in ranges for LOW.
-
R_HIGH
public static int R_HIGH
Index in ranges for HIGH.
-
R_WIDTH
public static int R_WIDTH
Index in ranges for WIDTH.
-
D_PRINTCENTERS
public static int D_PRINTCENTERS
print the centers.
-
D_FOLLOWSPLIT
public static int D_FOLLOWSPLIT
follows the splitting of the centers.
-
D_CONVCHCLOSER
public static int D_CONVCHCLOSER
have a closer look at converge children.
-
D_RANDOMVECTOR
public static int D_RANDOMVECTOR
check on random vectors.
-
D_KDTREE
public static int D_KDTREE
check on kdtree.
-
D_ITERCOUNT
public static int D_ITERCOUNT
follow iterations.
-
D_METH_MISUSE
public static int D_METH_MISUSE
functions were maybe misused.
-
D_CURR
public static int D_CURR
for current debug.
-
D_GENERAL
public static int D_GENERAL
general debugging.
-
m_CurrDebugFlag
public boolean m_CurrDebugFlag
Flag: I'm debugging.
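
The R_LOW, R_HIGH and R_WIDTH fields above are described as indices into a ranges structure. A minimal sketch of how a per-attribute ranges array could be filled and read with such indices is shown here. The index values 0, 1 and 2, the array layout and the class name `RangesSketch` are assumptions made for illustration, since this page does not state them.

```java
/** Per-attribute {low, high, width} ranges, indexed by assumed constants. */
final class RangesSketch {
    // Assumed index values; this page does not state the actual constants.
    static final int R_LOW = 0;
    static final int R_HIGH = 1;
    static final int R_WIDTH = 2;

    /** Computes low, high and width for every attribute (column) of a non-empty dataset. */
    static double[][] computeRanges(double[][] data) {
        int numAttributes = data[0].length;
        double[][] ranges = new double[numAttributes][3];
        for (int j = 0; j < numAttributes; j++) {
            ranges[j][R_LOW] = Double.POSITIVE_INFINITY;
            ranges[j][R_HIGH] = Double.NEGATIVE_INFINITY;
        }
        // Track the smallest and largest value seen in each column.
        for (double[] row : data) {
            for (int j = 0; j < numAttributes; j++) {
                ranges[j][R_LOW] = Math.min(ranges[j][R_LOW], row[j]);
                ranges[j][R_HIGH] = Math.max(ranges[j][R_HIGH], row[j]);
            }
        }
        for (int j = 0; j < numAttributes; j++) {
            ranges[j][R_WIDTH] = ranges[j][R_HIGH] - ranges[j][R_LOW];
        }
        return ranges;
    }

    public static void main(String[] args) {
        double[][] ranges = computeRanges(new double[][] {{1.0, 10.0}, {3.0, 4.0}});
        System.out.println("width of attribute 0: " + ranges[0][R_WIDTH]);
    }
}
```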
-
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this clusterer.
- Returns:
- a description of the clusterer suitable for displaying in the explorer/experimenter GUI
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the clusterer.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Specified by:
getCapabilities in interface Clusterer
- Overrides:
getCapabilities in class AbstractClusterer
- Returns:
- the capabilities of this clusterer
- See Also:
Capabilities
-
buildClusterer
public void buildClusterer(Instances data) throws java.lang.Exception
Generates the X-Means clusterer.
- Specified by:
buildClusterer in interface Clusterer
- Specified by:
buildClusterer in class AbstractClusterer
- Parameters:
data - set of instances serving as training data
- Throws:
java.lang.Exception - if the clusterer has not been generated successfully
-
checkForNominalAttributes
public boolean checkForNominalAttributes(Instances data)
Checks for nominal attributes in the dataset. Class attribute is ignored.
- Parameters:
data - the data to check
- Returns:
- false if no nominal attributes are present
-
clusterInstance
public int clusterInstance(Instance instance) throws java.lang.Exception
Assigns a given instance to a cluster.
- Specified by:
clusterInstance in interface Clusterer
- Overrides:
clusterInstance in class AbstractClusterer
- Parameters:
instance - the instance to be assigned to a cluster
- Returns:
- the number of the assigned cluster as an integer
- Throws:
java.lang.Exception - if the instance could not be assigned to a cluster successfully
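
Conceptually, assigning an instance to a cluster means finding the closest center under the configured distance function. The sketch below shows that idea with a plain, unnormalised squared Euclidean distance. It is an assumption-laden illustration, not this class's implementation: the configured distance function may, for example, normalise attribute ranges, and the names `NearestCenterSketch` and `nearestCenter` are hypothetical.

```java
/** Nearest-center assignment with a plain squared Euclidean distance. */
final class NearestCenterSketch {

    /** Returns the index of the center closest to the point, or -1 if there are no centers. */
    static int nearestCenter(double[][] centers, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = centers[c][j] - point[j];
                dist += diff * diff; // squared distance preserves the ordering
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println("assigned cluster: " + nearestCenter(centers, new double[] {9.0, 8.0}));
    }
}
```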
-
numberOfClusters
public int numberOfClusters()
Returns the number of clusters.
- Specified by:
numberOfClusters in interface Clusterer
- Specified by:
numberOfClusters in class AbstractClusterer
- Returns:
- the number of clusters generated for a training dataset.
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class RandomizableClusterer
- Returns:
- an enumeration of all the available options
-
minNumClustersTipText
public java.lang.String minNumClustersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMinNumClusters
public void setMinNumClusters(int n)
Sets the minimum number of clusters to generate.
- Parameters:
n - the minimum number of clusters to generate
-
getMinNumClusters
public int getMinNumClusters()
Gets the minimum number of clusters to generate.
- Returns:
- the minimum number of clusters to generate
-
maxNumClustersTipText
public java.lang.String maxNumClustersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxNumClusters
public void setMaxNumClusters(int n)
Sets the maximum number of clusters to generate.
- Parameters:
n - the maximum number of clusters to generate
-
getMaxNumClusters
public int getMaxNumClusters()
Gets the maximum number of clusters to generate.
- Returns:
- the maximum number of clusters to generate
-
maxIterationsTipText
public java.lang.String maxIterationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxIterations
public void setMaxIterations(int i) throws java.lang.Exception
Sets the maximum number of iterations to perform.
- Parameters:
i - the number of iterations
- Throws:
java.lang.Exception - if i is less than 1
-
getMaxIterations
public int getMaxIterations()
Gets the maximum number of iterations.
- Returns:
- the number of iterations
-
maxKMeansTipText
public java.lang.String maxKMeansTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxKMeans
public void setMaxKMeans(int i)
Sets the maximum number of iterations to perform in KMeans.
- Parameters:
i - the number of iterations
-
getMaxKMeans
public int getMaxKMeans()
Gets the maximum number of iterations in KMeans.
- Returns:
- the number of iterations
-
maxKMeansForChildrenTipText
public java.lang.String maxKMeansForChildrenTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxKMeansForChildren
public void setMaxKMeansForChildren(int i)
Sets the maximum number of KMeans iterations performed on the child centers.
- Parameters:
i - the number of iterations
-
getMaxKMeansForChildren
public int getMaxKMeansForChildren()
Gets the maximum number of KMeans iterations performed on the child centers.
- Returns:
- the number of iterations
-
cutOffFactorTipText
public java.lang.String cutOffFactorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setCutOffFactor
public void setCutOffFactor(double i)
Sets a new cutoff factor.
- Parameters:
i - the new cutoff factor
-
getCutOffFactor
public double getCutOffFactor()
Gets the cutoff factor.
- Returns:
- the cutoff factor
-
binValueTipText
public java.lang.String binValueTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getBinValue
public double getBinValue()
Gets value that represents true in a new numeric attribute. (False is always represented by 0.0.)
- Returns:
- the value that represents true in a new numeric attribute
-
setBinValue
public void setBinValue(double value)
Sets the distance value between true and false of binary attributes, and between "same" and "different" of nominal attributes.
- Parameters:
value - the distance
-
distanceFTipText
public java.lang.String distanceFTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDistanceF
public void setDistanceF(DistanceFunction distanceF)
Sets the distance function.
- Parameters:
distanceF - the distance function with all options set
-
getDistanceF
public DistanceFunction getDistanceF()
Gets the distance function.
- Returns:
- the distance function
-
debugVectorsFileTipText
public java.lang.String debugVectorsFileTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebugVectorsFile
public void setDebugVectorsFile(java.io.File value)
Sets the file that has the random vectors stored. Only used for debugging purposes.
- Parameters:
value - the file to read the random vectors from
-
getDebugVectorsFile
public java.io.File getDebugVectorsFile()
Gets the file name for a file that has the random vectors stored. Only used for debugging purposes.
- Returns:
- the file to read the vectors from
-
initDebugVectorsInput
public void initDebugVectorsInput() throws java.lang.Exception
Initialises the debug vector input.
- Throws:
java.lang.Exception - if there is an error opening the debug input file.
-
getNextDebugVectorsInstance
public Instance getNextDebugVectorsInstance(Instances model) throws java.lang.Exception
Reads an instance from the debug vectors file.
- Parameters:
model - the data model for the instance.
- Returns:
- the next debug vector.
- Throws:
java.lang.Exception - if there is no debug vector in m_DebugVectors.
-
inputCenterFileTipText
public java.lang.String inputCenterFileTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setInputCenterFile
public void setInputCenterFile(java.io.File value)
Sets the file to read the list of centers from.
- Parameters:
value - the file to read centers from
-
getInputCenterFile
public java.io.File getInputCenterFile()
Gets the file to read the list of centers from.
- Returns:
- the file to read the centers from
-
outputCenterFileTipText
public java.lang.String outputCenterFileTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutputCenterFile
public void setOutputCenterFile(java.io.File value)
Sets the file to write the list of centers to.
- Parameters:
value - file to write centers to
-
getOutputCenterFile
public java.io.File getOutputCenterFile()
Gets the file to write the list of centers to.
- Returns:
- the file to write centers to
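
The option list for this class states that center files use the ARFF format. As a rough illustration of what a file of numeric centers could look like, the sketch below renders a small set of centers as ARFF text. It assumes all attributes are numeric and that attribute names need no quoting. The relation name, the attribute names and the class `CentersArffSketch` are hypothetical, and the exact header this class expects or writes is not stated on this page.

```java
/** Renders numeric cluster centers as ARFF text (illustrative only). */
final class CentersArffSketch {

    static String toArff(String relation, String[] attributeNames, double[][] centers) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // One numeric attribute declaration per column.
        for (String name : attributeNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        // One comma-separated row per center.
        for (double[] center : centers) {
            for (int j = 0; j < center.length; j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(center[j]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] centers = {{1.0, 2.0}, {3.5, 4.0}};
        System.out.print(toArff("centers", new String[] {"x", "y"}, centers));
    }
}
```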
-
KDTreeTipText
public java.lang.String KDTreeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setKDTree
public void setKDTree(KDTree k)
Sets the KDTree class.
- Parameters:
k - a KDTree object with all options set
-
getKDTree
public KDTree getKDTree()
Gets the KDTree class.
- Returns:
- the configured KDTree
-
useKDTreeTipText
public java.lang.String useKDTreeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setUseKDTree
public void setUseKDTree(boolean value)
Sets whether to use the KDTree or not.
- Parameters:
value - if true the KDTree is used
-
getUseKDTree
public boolean getUseKDTree()
Gets whether the KDTree is used or not.
- Returns:
- true if KDTrees are used
-
debugLevelTipText
public java.lang.String debugLevelTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebugLevel
public void setDebugLevel(int d)
Sets the debug level. A debug level of 0 means no output.
- Parameters:
d - the debug level
-
getDebugLevel
public int getDebugLevel()
Gets the debug level.
- Returns:
- the debug level
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level (default 0).
-Y <file name> The debug vectors file.
-S <num> Random number seed (default 10).
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class RandomizableClusterer
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of XMeans.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class RandomizableClusterer
- Returns:
- an array of strings suitable for passing to setOptions
-
toString
public java.lang.String toString()
Returns a string describing this clusterer.
- Overrides:
toString in class java.lang.Object
- Returns:
- a description of the clusterer as a string
-
getClusterCenters
public Instances getClusterCenters()
Returns the centers of the clusters as an Instances object.
- Returns:
- the cluster centers.
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class AbstractClusterer
- Returns:
- the revision
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class.
- Parameters:
argv - should contain options
-
-