Class PreprocessingContext

java.lang.Object
org.carrot2.text.preprocessing.PreprocessingContext
All Implemented Interfaces:
Closeable, AutoCloseable

public final class PreprocessingContext extends Object implements Closeable
Document preprocessing context provides low-level (usually integer-coded) data structures useful for further processing.

Internals of PreprocessingContext

  • Field Details

    • languageComponents

      public final LanguageComponents languageComponents
      Language model to be used.
    • documentCount

      public int documentCount
      Count of documents processed by the tokenizer.
    • allTokens

      public final PreprocessingContext.AllTokens allTokens
      Information about all tokens of the input documents.
    • allFields

      public final PreprocessingContext.AllFields allFields
      Information about all fields processed for the input documents.
    • allWords

      public final PreprocessingContext.AllWords allWords
      Information about all unique words found in the input documents.
    • allStems

      public final PreprocessingContext.AllStems allStems
      Information about all unique stems found in the input documents.
    • allPhrases

      public final PreprocessingContext.AllPhrases allPhrases
      Information about all frequently appearing sequences of words found in the input documents.
    • allLabels

      public final PreprocessingContext.AllLabels allLabels
      Information about words and phrases that might be good cluster label candidates.
  • Constructor Details

    • PreprocessingContext

      public PreprocessingContext(LanguageComponents languageComponents)
      Creates a preprocessing context that uses the provided languageComponents.
  • Method Details

    • hasWords

      public boolean hasWords()
      Returns true if this context contains any words.
    • hasLabels

      public boolean hasLabels()
      Returns true if this context contains any label candidates.
    • format

      public String format(LabelFormatter formatter, int featureIndex)
      Applies the label formatter to the word or phrase identified by the given feature index.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • toFieldIndexes

      public static int[] toFieldIndexes(byte b)
      Converts the selected bits in a byte to an array of indexes.
    • close

      public void close()
      This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
    • intern

      public char[] intern(MutableCharArray chs)
      Returns a unique char buffer representing the given character sequence.
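
The idea behind intern can be illustrated with a minimal, self-contained sketch. This is not the Carrot2 implementation: CharBufferInterner and its size method are hypothetical names, and a plain CharSequence stands in for MutableCharArray. The sketch only shows the general technique, in which equal character sequences map to one shared char[] instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not part of Carrot2. */
final class CharBufferInterner {
  // One canonical buffer per distinct character sequence.
  private final Map<String, char[]> pool = new HashMap<>();

  /** Returns the shared char[] for the given sequence, creating it on first use. */
  char[] intern(CharSequence chs) {
    return pool.computeIfAbsent(chs.toString(), String::toCharArray);
  }

  /** Number of distinct sequences interned so far. */
  int size() {
    return pool.size();
  }
}
```

With such a pool, two calls that pass equal character sequences receive the same array instance, so callers can compare buffers by reference and avoid storing duplicate copies. The returned array is shared, so callers should treat it as read-only.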