What are tokens?

By Jerom Kok

15-12-2023 11:40

Tokens can be thought of as pieces of words. Before the API processes a prompt, it breaks the input down into tokens. Token boundaries do not necessarily fall where words start or end: a token can include a trailing space or cover only part of a word. Here are some helpful rules of thumb for understanding tokens in terms of length:

  • 1 token ~= 4 characters in English

  • 1 token ~= ¾ of a word

  • 100 tokens ~= 75 words

    Or

  • 1-2 sentences ~= 30 tokens

  • 1 paragraph ~= 100 tokens

  • 1,500 words ~= 2,048 tokens
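
The rules of thumb above can be turned into a quick estimator. The following is a minimal sketch, not how the API actually counts tokens: the function names and constants are illustrative assumptions, and the results are only rough estimates for English text.

```python
import math

# Assumed constants, taken directly from the rules of thumb above.
CHARS_PER_TOKEN = 4      # 1 token ~= 4 characters in English
WORDS_PER_TOKEN = 0.75   # 1 token ~= 3/4 of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate based on character count (English only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Rough token estimate based on whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = "Tokens can be thought of as pieces of words."
    print("by characters:", estimate_tokens_from_chars(sample))
    print("by words:", estimate_tokens_from_words(sample))
```

The two estimates will usually differ slightly, and both can be far off for code, numbers, or non-English text. For exact counts, use a real tokenizer.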

How words are split into tokens also depends on the language. For example, ‘Cómo estás’ (‘How are you’ in Spanish) contains 5 tokens for 10 characters. This higher token-to-character ratio can make it more expensive to use Eloquent in languages other than English. To explore tokenization further, you can use OpenAI's interactive Tokenizer tool, which counts the tokens in a piece of text and shows how the text is broken into them.
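
To see why the token-to-character ratio matters for cost, the Spanish example can be compared with the English rule of thumb. This is a minimal sketch with hypothetical helper names. It uses only the two figures quoted in this article (1 token per 4 characters for English, 5 tokens per 10 characters for ‘Cómo estás’) and assumes that cost scales linearly with token count.

```python
def tokens_per_char(token_count: int, char_count: int) -> float:
    """Token-to-character ratio for a piece of text."""
    return token_count / char_count


# Figures quoted in the article; a single short phrase is indicative only.
ENGLISH_RATIO = tokens_per_char(1, 4)    # rule of thumb for English
SPANISH_RATIO = tokens_per_char(5, 10)   # 'Cómo estás'


def relative_cost(ratio: float, baseline: float = ENGLISH_RATIO) -> float:
    """How many times more tokens the same number of characters uses
    compared with the baseline, assuming cost is proportional to tokens."""
    return ratio / baseline


if __name__ == "__main__":
    print(f"English: {ENGLISH_RATIO:.2f} tokens per character")
    print(f"Spanish example: {SPANISH_RATIO:.2f} tokens per character")
    print(f"Relative cost: {relative_cost(SPANISH_RATIO):.1f}x")
```

On these figures, the Spanish phrase uses about twice as many tokens per character as typical English text. Real ratios vary by language, by text, and by tokenizer, so measure with representative samples before estimating costs.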
