Package org.carrot2.text.suffixtree
Class SuffixTree
java.lang.Object
org.carrot2.text.suffixtree.SuffixTree
Builds a suffix tree (or generalized suffix tree) on a sequence of any integers (or objects that
can be represented as unique integers). A direct implementation of Esko Ukkonen's algorithm, but
optimized for Java to use primitive data types instead of objects (or boxed types).
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interface
Progress callback is invoked when iterating forward through the input sequence elements.static interface
A callback invoked when new states are added to the tree.static interface
Visitor interface for traversals.static class
Empty implementation recursively walking the entire suffix tree. -
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final int
Marker for the state's last edge intransitions
. -
Constructor Summary
ConstructorsConstructorDescriptionSuffixTree
(Sequence sequence, SuffixTree.IStateCallback newStateCallback, SuffixTree.IProgressCallback progressCallback) Build a suffix tree for a given input sequence of symbols. -
Method Summary
Modifier and TypeMethodDescriptionboolean
containsSuffix
(Sequence seq) final int
findEdge
(int state, int symbol) Find a transition from statestate
, labeled with a given symbol.final int
firstEdge
(int state) Returns the index of the first edge from a given state orNO_EDGE
if a given state has no edges.int
getEndIndex
(int edge) Returns the edge label's end index (inclusive).int
For procedural traversals (not visitors).int
getStartIndex
(int edge) Returns the edge label's start index (inclusive).final int
int
getToState
(int edge) Returns the target state for a given edge.final int
final boolean
isLeaf
(int state) Check ifstate
is a leaf (has no outgoing edges).final int
nextEdge
(int edge) Returns the index of the next edge (sibling) orNO_EDGE
ifedge
is the last edge in its state.final void
visit
(SuffixTree.IVisitor visitor) Walks the states and edges of the suffix tree, depth-first.final void
visitState
(int state, SuffixTree.IVisitor visitor) Start visiting from a given state.
-
Field Details
-
NO_EDGE
public static final int NO_EDGEMarker for the state's last edge intransitions
.- See Also:
-
-
Constructor Details
-
SuffixTree
public SuffixTree(Sequence sequence, SuffixTree.IStateCallback newStateCallback, SuffixTree.IProgressCallback progressCallback) Build a suffix tree for a given input sequence of symbols.
-
-
Method Details
-
getTransitionsCount
public final int getTransitionsCount()- Returns:
- Return the number of transitions (edges) in the tree.
-
getStatesCount
public final int getStatesCount()- Returns:
- Return the number of states in the tree.
-
containsSuffix
- Returns:
true
if this suffix tree has a path from the root state to a leaf state corresponding to a given sequence of objects. This indicates the input sequence had a suffix identical tosequence
.
-
visit
Walks the states and edges of the suffix tree, depth-first. -
visitState
Start visiting from a given state. -
getRootState
public int getRootState()For procedural traversals (not visitors). -
isLeaf
public final boolean isLeaf(int state) Check ifstate
is a leaf (has no outgoing edges). -
firstEdge
public final int firstEdge(int state) Returns the index of the first edge from a given state orNO_EDGE
if a given state has no edges. Does not perform any sanity check on the input state. -
nextEdge
public final int nextEdge(int edge) Returns the index of the next edge (sibling) orNO_EDGE
ifedge
is the last edge in its state. -
findEdge
public final int findEdge(int state, int symbol) Find a transition from statestate
, labeled with a given symbol.NO_EDGE
is returned if there is no such edge. -
getToState
public int getToState(int edge) Returns the target state for a given edge. -
getStartIndex
public int getStartIndex(int edge) Returns the edge label's start index (inclusive). -
getEndIndex
public int getEndIndex(int edge) Returns the edge label's end index (inclusive).
-