I'm quite proud of my "chunker" for my custom RAG, which has some elegant recursive mechanisms based on tiktoken:
`tokenizer = tiktoken.encoding_for_model("text-embedding-3-large")`
I want to upgrade to gemini-embedding-exp-03-07, but there's no way to count tokens without the ~1,000,000x slowdown of online API calls.
Is there an official local library that is guaranteed to count tokens correctly, without exhausting our API quota and slowing down all our code unnecessarily?
I'm asking not just for myself, but for everyone who seriously wants to work with these models: "hello world" examples where token counts are basically ignored are nice and all, but a production environment depends on the existence of robust tooling... not actually having/releasing those tools makes all these Gemini-* models a non-starter for serious business use cases...
Specifically: the `location=location` requirement of the lib's `count_tokens` method needs to be removed (or the entire `vertexai.init(project=project_id, location=location)` call).
```
import vertexai
from vertexai.generative_models import GenerativeModel, Part

def count_tokens(project_id: str, location: str, model_name: str, prompt: str):
    """Counts the number of tokens in the given text."""
    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)
    response = model.count_tokens(Part.from_text(prompt))
    return response.total_tokens
```
Here is (or should be) the answer:
Hi @cndg,
Welcome to Google Cloud Community!
There's currently no offline token counting method for Gemini embeddings like gemini-embedding-exp-03-07, making precise chunking for RAG applications difficult.
For now, you can submit a feature request so that our Engineering Team can help you further. Please note that I cannot specify when this enhancement will be implemented. For future updates, I recommend monitoring the tracker and release notes regularly.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Looks like a new offline feature to do this has just been released:
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/list-token