Please share Gemini tokenizer information

Hello.

Thank you so much for the recently announced Gemini-Pro API availability.

We use a lot of APIs, and in the case of OpenAI, we expose cl100k_base so that users can pre-calculate the number of tokens and avoid API errors.
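For reference, the pre-calculation on the OpenAI side looks roughly like this (a minimal sketch using tiktoken; the prompt string is just an example):

import tiktoken

# Pre-count tokens locally with the cl100k_base encoding before calling the API
encoding = tiktoken.get_encoding("cl100k_base")
num_tokens = len(encoding.encode("why is sky blue?"))
print(num_tokens)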

But for Gemini-Pro we don't know anything about its tokenizer, so we have to rely on character counts. 😅

Is it possible to share tokenizer information, like OpenAI does with tiktoken?

Thank you for creating a good model. 😃

 
ACCEPTED SOLUTION

With the Vertex AI SDK (Python), we compute the number of tokens (and characters) like this (see the Token Count docs):
# Ask the hosted Gemini model to count the tokens in a prompt
from vertexai.preview.generative_models import GenerativeModel

gemini_pro_model = GenerativeModel("gemini-pro")
print(gemini_pro_model.count_tokens("why is sky blue?"))
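If you need the individual values rather than the printed response object, it exposes them as attributes; a minimal sketch continuing from the snippet above (assuming the Vertex AI SDK field names total_tokens and total_billable_characters):

# Assumed field names; check the Token Count docs linked above
response = gemini_pro_model.count_tokens("why is sky blue?")
print(response.total_tokens)               # token count of the prompt
print(response.total_billable_characters)  # billable character count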
I do miss having a local implementation like tiktoken that we could use; it would be great if one existed (I am not aware of any).
I hope this helps.

