com.codename1.ai.Tokenizer

public final class Tokenizer extends Object

Rough, best-effort token counting. Useful for the common question "am I likely to exceed this model's context window?" without shipping the full BPE table (cl100k_base is about 1.7 MB, which is substantial for a mobile binary).
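A context-window check built on an estimate like this usually pads the estimate and reserves room for the reply. The sketch below is not part of this class's API. `ContextBudgetSketch`, `likelyFits`, the safety margin, and the output reservation are all hypothetical, caller-chosen values.

```java
final class ContextBudgetSketch {
    private ContextBudgetSketch() {}

    /**
     * Hypothetical helper: returns true if the padded prompt estimate plus the
     * tokens reserved for the reply fit in the model's context window.
     */
    static boolean likelyFits(int estimatedPromptTokens, int reservedOutputTokens,
                              int contextWindow, double safetyMargin) {
        // Inflate the estimate to absorb heuristic error (e.g. 0.15 adds 15%).
        long padded = (long) Math.ceil(estimatedPromptTokens * (1.0 + safetyMargin));
        // Use long arithmetic so large inputs cannot overflow int.
        return padded + reservedOutputTokens <= contextWindow;
    }
}
```

In practice the first argument would come from something like `Tokenizer.estimateMessages(history)`, where `history` is the app's own conversation list.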

The rule of thumb is 1 token ~= 4 characters of English text, which holds within ~10-15% for typical chat traffic. For non-Latin scripts the ratio is closer to 1:1, so we clamp the lower bound at the rough number of words. Apps that need exact accounting should fetch a usage value from the API response and adjust their budget.
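The heuristic described above can be sketched as "characters divided by 4, but never fewer than the number of whitespace-separated words." This is an illustration, not the library's source. The class name `TokenEstimateSketch`, the round-up behaviour, and the whitespace-based word count are assumptions.

```java
final class TokenEstimateSketch {
    private TokenEstimateSketch() {}

    /** Rough token estimate: about 4 characters per token, never below the word count. */
    static int estimate(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        // Rule of thumb for English: ~4 characters per token (rounding up is an assumption).
        int byChars = (int) Math.ceil(text.length() / 4.0);
        // Lower bound for scripts where the character/token ratio is closer to 1:1.
        int words = countWords(text);
        return Math.max(byChars, words);
    }

    /** Counts maximal runs of non-whitespace characters. */
    static int countWords(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                count++;
            }
        }
        return count;
    }
}
```

For "hello world" the character rule gives 3 and the word count gives 2, so the estimate is 3. For a string of many one-letter words, the word count takes over.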

Method Summary

Modifier and Type | Method | Description
static int | estimate(String text) | Approximate token count for text.
static int | estimateMessages(List<ChatMessage> messages) | Estimate the prompt-token cost of an entire conversation.

Methods inherited from class Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- estimate
  
  public static int estimate(String text)
  
  Approximate token count for text.
- estimateMessages
  
  public static int estimateMessages(List<ChatMessage> messages)
  
  Estimate the prompt-token cost of an entire conversation. Adds a small fixed overhead per message to approximate the role and formatting tokens the provider includes.
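A conversation-level estimate of this kind can be sketched as the sum of each message's text estimate plus a fixed per-message overhead. The documentation does not state the overhead value or the shape of `ChatMessage`, so `PER_MESSAGE_OVERHEAD = 4` and the `Msg` stand-in class below are hypothetical.

```java
import java.util.List;

final class ConversationEstimateSketch {
    private ConversationEstimateSketch() {}

    /** Hypothetical per-message overhead for role/formatting tokens; the real value is undocumented. */
    static final int PER_MESSAGE_OVERHEAD = 4;

    /** Hypothetical stand-in for ChatMessage: a role and its text content. */
    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Characters / 4 rounded up, clamped below by the whitespace-separated word count. */
    static int estimateText(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        int byChars = (int) Math.ceil(text.length() / 4.0);
        int words = trimmed.split("\\s+").length;
        return Math.max(byChars, words);
    }

    /** Sums content estimates; role tokens are folded into the fixed overhead. */
    static int estimateMessages(List<Msg> messages) {
        int total = 0;
        for (Msg m : messages) {
            total += PER_MESSAGE_OVERHEAD + estimateText(m.content);
        }
        return total;
    }
}
```

With these assumptions, a two-message exchange of "abcd" and "hello world" comes to (4 + 1) + (4 + 3) = 12 tokens.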
