
Support Needed: Discrepancy in Vertex AI Context Caching (Node.js vs. Python)

Hi team,

I've been experimenting with Vertex AI function calling and the preview context caching feature. When I follow the documentation and the Python examples (codelabs), everything works as expected. However, when I implement the same flow in Node.js, I'm seeing a major discrepancy, particularly with context caching.

  1. When using the example PDF URIs from the codelab, the token count in Python meets the required minimum, but in Node.js the token count is significantly lower. Am I missing something here? I'm attaching screenshots for reference, and a sketch of my Node.js token check follows this list.
  2. From the documentation, my understanding is that context caching should improve response times, since the cached prompt prefix is reused instead of being reprocessed on every request. However, in my tests I'm seeing longer response times, not improvements. Is my assumption incorrect?
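For context, this is roughly how I'm measuring tokens on the Node.js side. The project ID, region, model name, and PDF URI below are placeholders; substitute whatever you're testing:

```js
// Rough sketch of my Node.js token check (@google-cloud/vertexai 1.9.x, ADC auth).
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({ project: 'my-project', location: 'us-central1' });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-pro-002' });

// One of the sample PDFs referenced by the codelab (placeholder)
const pdf = {
  fileData: {
    fileUri: 'gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf',
    mimeType: 'application/pdf',
  },
};

model
  .countTokens({ contents: [{ role: 'user', parts: [pdf] }] })
  .then((r) => console.log('totalTokens:', r.totalTokens))
  .catch(console.error);
```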

Looking forward to your thoughts!

[Screenshots attached: image (3).png, image (4).png]

Hi @lovee93,

Welcome to Google Cloud Community!

The cached-content create call fails in Node.js because the calculated token count falls below the required minimum for context caching. Tokenization itself happens server-side, so identical requests should yield identical counts; a discrepancy between Python and Node.js therefore points to the two SDKs packaging or sending the content differently.
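One quick way to narrow this down is a pre-flight check: run countTokens over the exact contents you intend to cache, and compare the number against what Python reports before attempting the create call. A minimal sketch, assuming @google-cloud/vertexai with Application Default Credentials and placeholder project/region values (the 32,768 figure is the documented minimum for gemini-1.5 models; verify it against the current docs for your model):

```js
// Pre-flight token check before creating a context cache.
// Usage: await checkBeforeCaching([{ role: 'user', parts: [...] }]);
const { VertexAI } = require('@google-cloud/vertexai');

const MIN_CACHE_TOKENS = 32768; // documented minimum for gemini-1.5 models; verify in current docs

async function checkBeforeCaching(contents) {
  const vertexAI = new VertexAI({ project: 'my-project', location: 'us-central1' }); // placeholders
  const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-pro-002' });

  const { totalTokens } = await model.countTokens({ contents });
  console.log(`countTokens reports ${totalTokens} tokens`);

  if (totalTokens < MIN_CACHE_TOKENS) {
    throw new Error(`Below the ${MIN_CACHE_TOKENS}-token minimum; the cache create will fail`);
  }
  return totalTokens;
}
```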

Here are some potential reasons and suggestions you might consider to address the issue:

  • Incorrect Content Handling (Node.js): Check how the Node.js client handles the PDF content and its encoding. Inconsistencies in PDF parsing, whether in Vertex AI's internal processing during download or in any pre-processing you do, can also lead to inaccurate token counts.
  • API Version/SDK Differences: Make sure you're on the latest Vertex AI Node.js SDK, as older versions may contain bugs or inconsistencies that show up as differences between Python and Node.js.
  • Reproducibility and Isolation: To make debugging easier, create a minimal Node.js example using a single, simple PDF URI (see the sketch after this list), and keep your Node.js and Python environments as consistent as possible, including operating system and network configuration.
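A minimal, SDK-independent repro might look like the following. It goes straight to the v1beta1 REST endpoint, since I'm not certain the Node.js SDK exposes cache creation directly; it assumes Node 18+ (for global fetch), google-auth-library for credentials, and placeholder project/region/URI values:

```js
// Create a context cache from a single PDF URI via the v1beta1 REST endpoint.
const { GoogleAuth } = require('google-auth-library');

const PROJECT = 'my-project';   // placeholder
const LOCATION = 'us-central1'; // placeholder
const MODEL = `projects/${PROJECT}/locations/${LOCATION}/publishers/google/models/gemini-1.5-pro-002`;

async function createCache(fileUris) {
  const auth = new GoogleAuth({ scopes: 'https://www.googleapis.com/auth/cloud-platform' });
  const token = await auth.getAccessToken();

  const body = {
    model: MODEL,
    contents: [{
      role: 'user',
      parts: fileUris.map((uri) => ({ fileData: { fileUri: uri, mimeType: 'application/pdf' } })),
    }],
    ttl: '3600s',
  };

  const res = await fetch(
    `https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/cachedContents`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  );

  const json = await res.json();
  // usageMetadata.totalTokenCount is the figure the console reports for the cache
  console.log(res.status, json.usageMetadata ?? json.error);
  return json;
}

createCache(['gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf']).catch(console.error);
```

Comparing the totalTokenCount this returns against the same request issued from Python should tell you whether the discrepancy lies in the SDKs or in the service itself.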

You can also refer to the Vertex AI context caching documentation for more details.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Thank you so much for replying and suggesting things to try. I'm on the latest version of the SDK (1.9.3) and have tried a minimal example. What's confusing me is that with just 1 PDF file the token count is higher, while with 2 PDF files the token count is lower. Attaching screenshots for your reference:

Testing with 1 PDF file, the cached content is 23,258 tokens:

[Screenshot: lovee93_0-1741984757362.png]

Testing with 2 PDF files, the cached content is 19,904 tokens:

[Screenshot: lovee93_1-1741984929816.png]

Here's the repository with this example: https://github.com/Lovee93/context-caching-bug
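To rule out the caching endpoint itself, the same comparison can be run against countTokens alone; if the drop shows up there too, the miscount happens upstream of cache creation. A trimmed-down sketch (placeholder project/region; substitute the exact URIs you're testing):

```js
// Compare countTokens for one PDF vs. two PDFs, independent of cachedContents.
const { VertexAI } = require('@google-cloud/vertexai');

const model = new VertexAI({ project: 'my-project', location: 'us-central1' })
  .getGenerativeModel({ model: 'gemini-1.5-pro-002' });

const part = (uri) => ({ fileData: { fileUri: uri, mimeType: 'application/pdf' } });
const uri1 = 'gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf'; // placeholder
const uri2 = 'gs://cloud-samples-data/generative-ai/pdf/2312.11805.pdf'; // placeholder

(async () => {
  for (const uris of [[uri1], [uri1, uri2]]) {
    const { totalTokens } = await model.countTokens({
      contents: [{ role: 'user', parts: uris.map(part) }],
    });
    console.log(`${uris.length} file(s): ${totalTokens} tokens`);
  }
})().catch(console.error);
```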

Thank you!