Package weka.core.tokenizers
Class WordTokenizer
- java.lang.Object
  - weka.core.tokenizers.Tokenizer
    - weka.core.tokenizers.CharacterDelimitedTokenizer
      - weka.core.tokenizers.WordTokenizer
- All Implemented Interfaces:
  java.io.Serializable, java.util.Enumeration, OptionHandler, RevisionHandler
public class WordTokenizer extends CharacterDelimitedTokenizer
A simple tokenizer that uses the java.util.StringTokenizer class to tokenize strings. Valid options are:
-delimiters <value>
  The delimiters to use (default ' \r\n\t.,;:'"()?!').
- Version:
- $Revision: 1.4 $
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
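The class description says tokens are produced by java.util.StringTokenizer over a set of delimiter characters. The following is a minimal JDK-only sketch of that behaviour, not the WEKA source. It assumes the documented default for -delimiters corresponds to the Java string " \r\n\t.,;:'\"()?!"; the class name DelimiterSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical sketch of the documented WordTokenizer behaviour using only the JDK:
 * split a string with java.util.StringTokenizer on a set of delimiter characters.
 */
class DelimiterSketch {

    // Assumed equivalent of the documented -delimiters default:
    // space, carriage return, newline, tab and . , ; : ' " ( ) ? !
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    /** Splits s on every character contained in delimiters; the delimiters themselves are dropped. */
    static List<String> tokenize(String s, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world! (It's a test.)";
        // Default set: the apostrophe is a delimiter, so "It's" becomes "It" and "s".
        System.out.println(tokenize(text, DEFAULT_DELIMITERS));
        // Space-only set: punctuation stays attached to the words.
        System.out.println(tokenize(text, " "));
    }
}
```

Because every delimiter character is dropped, the default set splits contractions such as "It's" into "It" and "s"; a narrower set (the equivalent of -delimiters " ") keeps punctuation attached to the words.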
Constructor Summary
WordTokenizer()
Method Summary
Modifier and Type | Method | Description
java.lang.String | getRevision() | Returns the revision string.
java.lang.String | globalInfo() | Returns a string describing the tokenizer.
boolean | hasMoreElements() | Tests if this enumeration contains more elements.
static void | main(java.lang.String[] args) | Runs the tokenizer with the given options and strings to tokenize.
java.lang.Object | nextElement() | Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
void | tokenize(java.lang.String s) | Sets the string to tokenize.
Methods inherited from class weka.core.tokenizers.CharacterDelimitedTokenizer
delimitersTipText, getDelimiters, getOptions, listOptions, setDelimiters, setOptions
Methods inherited from class weka.core.tokenizers.Tokenizer
runTokenizer, tokenize
Method Detail
globalInfo
public java.lang.String globalInfo()
Returns a string describing the tokenizer.
- Specified by:
  globalInfo in class Tokenizer
- Returns:
  a description suitable for displaying in the Explorer/Experimenter GUI
hasMoreElements
public boolean hasMoreElements()
Tests if this enumeration contains more elements.
- Specified by:
  hasMoreElements in interface java.util.Enumeration
- Specified by:
  hasMoreElements in class Tokenizer
- Returns:
  true if and only if this enumeration object contains at least one more element to provide; false otherwise.
nextElement
public java.lang.Object nextElement()
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
- Specified by:
  nextElement in interface java.util.Enumeration
- Specified by:
  nextElement in class Tokenizer
- Returns:
  the next element of this enumeration.
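Callers read WordTokenizer's output through the java.util.Enumeration methods documented above: guard each nextElement() call with hasMoreElements() and cast the returned Object. As a sketch of that calling pattern, the code below uses java.util.StringTokenizer, which also implements Enumeration<Object>, in place of the WEKA class. The drain helper and class name are hypothetical and assume every element is a String. This page does not say what WordTokenizer does when exhausted; the NoSuchElementException shown is StringTokenizer's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

/**
 * Hypothetical sketch of the java.util.Enumeration calling pattern.
 * StringTokenizer stands in for WordTokenizer here.
 */
class EnumerationSketch {

    /** Drains any Enumeration whose elements are Strings (assumption), in order. */
    static List<String> drain(Enumeration<?> e) {
        List<String> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            // nextElement() is declared to return Object, so the caller casts.
            out.add((String) e.nextElement());
        }
        return out;
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("to be or not", " ");
        System.out.println(drain(st));
        try {
            st.nextElement(); // the enumeration is already exhausted
        } catch (NoSuchElementException ex) {
            System.out.println("exhausted: NoSuchElementException");
        }
    }
}
```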
tokenize
public void tokenize(java.lang.String s)
Sets the string to tokenize. Tokenization happens immediately.
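"Tokenization happens immediately" suggests that tokenize(s) performs all of the splitting up front and that the enumeration methods only walk the stored results. The hypothetical EagerWordTokenizer below illustrates that design under this assumption. It mirrors the documented method names but is not the WEKA implementation; tokenCount() exists only for this sketch.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

/**
 * Hypothetical stand-in for WordTokenizer (not the WEKA source): tokenize()
 * splits the whole string right away; the Enumeration methods walk the stored tokens.
 */
class EagerWordTokenizer implements Enumeration<Object> {

    private String delimiters = " \r\n\t.,;:'\"()?!"; // assumed documented default
    private final List<String> tokens = new ArrayList<>();
    private int position = 0;

    void setDelimiters(String delimiters) {
        this.delimiters = delimiters;
    }

    /** Sets the string to tokenize; every token is produced before this method returns. */
    void tokenize(String s) {
        tokens.clear();
        position = 0;
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
    }

    @Override
    public boolean hasMoreElements() {
        return position < tokens.size();
    }

    @Override
    public Object nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        return tokens.get(position++);
    }

    /** Number of tokens produced by the last tokenize() call (sketch-only helper). */
    int tokenCount() {
        return tokens.size();
    }

    public static void main(String[] args) {
        EagerWordTokenizer tok = new EagerWordTokenizer();
        tok.tokenize("Eager: all tokens exist before the first nextElement().");
        System.out.println(tok.tokenCount() + " tokens");
        while (tok.hasMoreElements()) {
            System.out.println(tok.nextElement());
        }
    }
}
```

Splitting eagerly makes each hasMoreElements() call a cheap index check and lets a second tokenize() call reset the enumeration, at the cost of holding every token of the input in memory at once.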
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Returns:
  the revision
main
public static void main(java.lang.String[] args)
Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
- Parameters:
  args - the command-line options and strings to tokenize
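To make the command-line contract concrete, here is a simplified stand-in for main: an optional -delimiters <value> pair followed by the strings to tokenize, with each token printed to stdout on its own line. The argument parsing is an assumption for illustration only; in the real class, option handling presumably goes through the inherited setOptions and runTokenizer methods, whose exact rules are not shown on this page. The class name TokenizerMainSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical, simplified stand-in for WordTokenizer.main (not WEKA's option
 * handling): an optional "-delimiters <value>" pair, then the strings to tokenize.
 */
class TokenizerMainSketch {

    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!"; // assumed default

    /** Returns the tokens of all non-option arguments, in order. */
    static List<String> run(String[] args) {
        String delimiters = DEFAULT_DELIMITERS;
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-delimiters") && i + 1 < args.length) {
                delimiters = args[++i]; // consume the option value
            } else {
                inputs.add(args[i]);
            }
        }
        List<String> tokens = new ArrayList<>();
        for (String input : inputs) {
            StringTokenizer st = new StringTokenizer(input, delimiters);
            while (st.hasMoreTokens()) {
                tokens.add(st.nextToken());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token on its own line, as the documentation describes for stdout.
        for (String token : run(args)) {
            System.out.println(token);
        }
    }
}
```

For example, `java TokenizerMainSketch -delimiters "," "a,b" "c d"` prints `a`, `b` and `c d` on three lines, since the space is no longer a delimiter.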