
Support Needed: Discrepancy in Vertex AI Context Caching (Node.js vs. Python)

Hi team,

I've been experimenting with Vertex AI function calling and the context caching preview feature. When I follow the documentation and the Python codelab examples, everything works as expected. However, when I implement the same flow in Node.js, I see a major discrepancy, particularly with context caching:

  1. When I use the example PDF URIs from the codelab, the token count in Python meets the expected minimum for caching, but in Node.js the token count is significantly lower. Am I missing something here? I'm attaching screenshots for reference, and a minimal repro sketch follows this list.
  2. From the documentation, my understanding is that context caching should improve response times, since the model reads the context from the cache instead of reprocessing it on every request. In my tests, however, I'm seeing longer response times, not shorter ones. Is my assumption incorrect?
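
For context, here is a stripped-down version of what I'm running in Node.js. It's a minimal sketch, assuming the @google-cloud/vertexai package and a Gemini 1.5 model; the project ID, bucket, and gs:// URIs are placeholders standing in for the two codelab PDFs. It counts tokens for the PDF parts (point 1) and times a few requests (point 2):

// Minimal repro sketch (Node.js). Assumptions: the @google-cloud/vertexai
// package, a Gemini 1.5 model, and placeholder project/bucket/URI values.
import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project: 'my-project-id', location: 'us-central1' });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-pro-002' });

// Both PDFs go in as fileData parts. (Sending the URIs as plain text, or only
// one of the two files, would explain a much lower token count than Python.)
const contents = [{
  role: 'user',
  parts: [
    { fileData: { fileUri: 'gs://my-bucket/first.pdf', mimeType: 'application/pdf' } },
    { fileData: { fileUri: 'gs://my-bucket/second.pdf', mimeType: 'application/pdf' } },
  ],
}];

async function main() {
  // Point 1: token count -- should match Python for identical contents.
  const { totalTokens } = await model.countTokens({ contents });
  console.log('totalTokens:', totalTokens);

  // Point 2: latency -- time several calls and compare medians, since a
  // single request can vary a lot; treat the first call as a warm-up.
  for (let i = 0; i < 3; i++) {
    const start = Date.now();
    await model.generateContent({ contents });
    console.log(`request ${i}: ${Date.now() - start} ms`);
  }
}

main().catch(console.error);

My expectation is that Python and Node.js should return identical token counts for the same fileData parts, so a large gap suggests the two requests aren't actually equivalent (different parts, or a different model version).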

Looking forward to your thoughts! (Screenshots attached: image (3).png and image (4).png.)
