Count tokens
count_tokens(text, model_name='cl100k_base')
Count the number of tokens in the given text.
References: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
Encoding name | OpenAI models |
---|---|
cl100k_base | gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large |
p50k_base | Codex models, text-davinci-002, text-davinci-003 |
r50k_base (or gpt2) | GPT-3 models like davinci |
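Because the default value of model_name is an encoding name rather than a model name, callers who start from a model name need to translate it using the table above. The lookup can be sketched as follows. The dictionary and the helper encoding_name_for are hypothetical illustrations and are not part of Docs2KG. tiktoken itself offers tiktoken.encoding_for_model for the same purpose; it returns an Encoding object whose .name attribute holds the encoding name.

```python
# Hypothetical lookup built from the encoding table above;
# not part of Docs2KG or tiktoken.
ENCODING_BY_MODEL = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
}


def encoding_name_for(model: str) -> str:
    """Return the encoding name listed in the table for an OpenAI model."""
    try:
        return ENCODING_BY_MODEL[model]
    except KeyError:
        # Models missing from the table are rejected instead of guessed.
        raise ValueError(f"no encoding listed for model {model!r}") from None
```

With this helper, a call such as count_tokens(text, model_name=encoding_name_for("gpt-4")) would tokenize with cl100k_base.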
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text whose tokens are counted. | required |
model_name | str | The name of the encoding used for tokenization, such as one of the encoding names in the table above. Default is "cl100k_base". | 'cl100k_base' |
Returns:
Name | Type | Description |
---|---|---|
total_token | int | The number of tokens in the text. |
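If count_tokens follows the tiktoken cookbook referenced above, its core amounts to looking up the encoding by name and counting the ids returned by encode. The sketch below assumes that approach. The name count_tokens_sketch and the extra get_encoding parameter are hypothetical. The parameter exists only so the function can be exercised with a stand-in encoder when tiktoken is not installed. When that parameter is omitted, the sketch calls tiktoken.get_encoding.

```python
from typing import Callable, Optional


def count_tokens_sketch(
    text: str,
    model_name: str = "cl100k_base",
    get_encoding: Optional[Callable] = None,
) -> int:
    """Count the tokens in text using the encoding named by model_name.

    Sketch only: assumes the encoding object offers tiktoken's
    encode(text) -> list[int] interface. get_encoding is a hypothetical
    hook for swapping in another encoder (e.g. in tests).
    """
    if get_encoding is None:
        # Lazy import so the sketch can be loaded without tiktoken installed.
        import tiktoken

        get_encoding = tiktoken.get_encoding
    encoding = get_encoding(model_name)
    # The token count is the length of the encoded id sequence.
    total_token = len(encoding.encode(text))
    return total_token
```

Calling count_tokens_sketch("hello world") with tiktoken installed returns the cl100k_base token count. tiktoken downloads and caches the encoding files on first use. In addition, encode raises an error for text containing special-token strings such as <|endoftext|> unless disallowed_special=() is passed.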
Source code in Docs2KG/utils/llm/count_tokens.py