Class Tokenizer

java.lang.Object
com.codename1.ai.Tokenizer

public final class Tokenizer extends Object

Rough, best-effort token counting. Useful for answering the common question "am I likely to exceed this model's context window?" without shipping the full BPE table (cl100k_base is ~1.7 MB, which is substantial for a mobile binary).

The rule of thumb is 1 token ~= 4 characters of English text, which holds within ~10-15% for typical chat traffic. For non-Latin scripts the ratio is closer to 1:1 (about one token per character), so the estimate is clamped to be no lower than the approximate word count. Apps that need exact accounting should read the usage value from the API response and adjust their budget accordingly.
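The heuristic described above can be sketched as follows. This is a minimal illustration, not the library's source: the class name `TokenEstimateSketch`, the round-up division, and the whitespace-based word count are all assumptions made for the example.

```java
// Hypothetical sketch of the heuristic; NOT the actual Tokenizer implementation.
class TokenEstimateSketch {

    /** Roughly one token per 4 characters, never lower than the word count. */
    static int estimate(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        // Assumption: round up so that short non-empty strings count as at least 1 token.
        int byChars = (text.length() + 3) / 4;

        // Assumption: a "word" is a maximal run of non-whitespace characters.
        int words = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean whitespace = Character.isWhitespace(text.charAt(i));
            if (!whitespace && !inWord) {
                words++;
            }
            inWord = !whitespace;
        }

        // Clamp the lower bound at the word count.
        return Math.max(byChars, words);
    }
}
```

With this sketch, text made of many very short words is governed by the word count rather than the character count, which is the effect of the lower-bound clamp.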

  • Method Details

    • estimate

      public static int estimate(String text)
      Approximate token count for text.
    • estimateMessages

      public static int estimateMessages(List<ChatMessage> messages)
      Estimate the prompt-token cost of an entire conversation. Adds a small fixed overhead per message to approximate the role and formatting tokens the provider includes.
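A conversation-level estimate with a fixed per-message overhead might look like the sketch below. Everything here is hypothetical: `Msg` stands in for `ChatMessage` (whose accessors are not shown in this page), the overhead of 4 tokens is an assumed value, and `fits` is an illustrative helper rather than part of the documented API.

```java
import java.util.List;

// Hypothetical sketch; NOT the actual Tokenizer.estimateMessages implementation.
class ConversationEstimateSketch {

    /** Stand-in for ChatMessage; the real class's fields and accessors may differ. */
    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    // ASSUMPTION: the real per-message overhead is not stated in the documentation.
    static final int PER_MESSAGE_OVERHEAD = 4;

    /** Simplified ~4 characters per token estimate, rounded up. */
    static int estimateText(String text) {
        return (text == null || text.isEmpty()) ? 0 : (text.length() + 3) / 4;
    }

    /** Sum of per-message content estimates plus the fixed overhead for each message. */
    static int estimateMessages(List<Msg> messages) {
        int total = 0;
        for (Msg m : messages) {
            total += PER_MESSAGE_OVERHEAD + estimateText(m.content);
        }
        return total;
    }

    /** Illustrative budget check: does the prompt plus a reserved reply fit the window? */
    static boolean fits(List<Msg> messages, int contextWindow, int reservedForReply) {
        return estimateMessages(messages) + reservedForReply <= contextWindow;
    }
}
```

Because the estimate is approximate, a budget check like `fits` is best used with some headroom; exact counts should still come from the usage value in the API response.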