
Can we please get an offline token counter so RAG chunkers can work reliably with gemini-embedding-exp-03-07?

I'm quite proud of my "chunker" for my custom RAG, which has some elegant recursive mechanisms based on tiktoken:

```
tokenizer = tiktoken.encoding_for_model("text-embedding-3-large")
```
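For context, here's a minimal sketch of the recursive idea (the `recursive_chunk` helper, the bisect-on-whitespace strategy, and the 512-token budget are illustrative, not my exact production code):

```
import tiktoken

# Local, offline tokenizer - this is exactly what's missing for Gemini models.
tokenizer = tiktoken.encoding_for_model("text-embedding-3-large")


def recursive_chunk(text: str, max_tokens: int = 512) -> list[str]:
    """Split text in half recursively until every chunk fits the token budget."""
    if len(tokenizer.encode(text)) <= max_tokens:
        return [text]
    words = text.split()
    if len(words) < 2:  # can't split further; return the oversized chunk as-is
        return [text]
    mid = len(words) // 2
    left, right = " ".join(words[:mid]), " ".join(words[mid:])
    return recursive_chunk(left, max_tokens) + recursive_chunk(right, max_tokens)
```

The point is that the tokenizer gets called on every candidate split, so it has to be local and fast.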

I want to upgrade to gemini-embedding-exp-03-07, but there's no way to count tokens without the ~1,000,000x slowdown of making an online API call for every count.

Is there an official local library, guaranteed to count tokens correctly, that we can use without exhausting our API quota or slowing all our code down unnecessarily?

I'm asking not just for myself, but for everyone who seriously wants to work with these models: "hello world" examples where token counts are basically ignored are nice and all, but a production environment depends on robust tooling. Not having (or not releasing) those tools makes all these Gemini-* models a non-starter for serious business use cases.

Specifically, the `location=location` requirement of the library's `count_tokens` flow needs to be removed (or, better, the entire `vertexai.init(project=project_id, location=location)` call):

```
import vertexai
from vertexai.generative_models import GenerativeModel, Part


def count_tokens(project_id: str, location: str, model_name: str, prompt: str):
    """Counts the number of tokens in the given text."""
    # Everything here is bound to a project/location endpoint,
    # so each count is a full online API call.
    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)
    response = model.count_tokens(Part.from_text(prompt))
    return response.total_tokens
```
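To make the cost concrete: plugged into a recursive chunker, that's one network round-trip per split decision, per document. A toy illustration (the project ID and location are placeholders, and whether `count_tokens` even accepts the embedding model name is part of what's unclear):

```
# Placeholders - every boundary check below is a full network round-trip.
PROJECT_ID = "my-project"    # hypothetical
LOCATION = "us-central1"     # hypothetical

for candidate in ["first candidate chunk ...", "second candidate chunk ..."]:
    total = count_tokens(PROJECT_ID, LOCATION, "gemini-embedding-exp-03-07", candidate)
    print(total)
```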

ACCEPTED SOLUTION

Here is (or should be) the answer: [screenshot: pic_2025-03-28_08.48.05_1500.png]
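The screenshot itself isn't recoverable here. For what it's worth, the Vertex AI SDK does ship a local tokenizer in its preview namespace; a minimal sketch, assuming `get_tokenizer_for_model` supports the model you need (at the time of writing it documents support for the gemini-1.5/1.0 generation, so coverage of gemini-embedding-exp-03-07 is an open question):

```
# Local (offline) token counting via the Vertex AI SDK preview API.
# NOTE: model support is an assumption - check the SDK docs for the
# current list before relying on this for gemini-embedding-exp-03-07.
from vertexai.preview import tokenization

tokenizer = tokenization.get_tokenizer_for_model("gemini-1.5-flash")
result = tokenizer.count_tokens("hello world")
print(result.total_tokens)  # computed locally, no network call involved
```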

