Dear support,
I have a Vertex AI project that I use for predictions. Yesterday I gave the model a prompt and it generated the required number of tokens. Today I gave the model the exact same prompt, and the output dropped to roughly a quarter of that length. Temperature is 0, top-p is 0, top-k is 40, and max output tokens is 8190. I want to know the reason for this shrinkage in the number of generated tokens, since I didn't change anything related to the prediction parameters.
There are a few possible reasons for the drop in the number of generated tokens even though the prompt and prediction parameters are unchanged:
Sometimes AI models are updated or fine-tuned, which can change how they respond to a given prompt. The underlying model may have been modified since your last use, and that alone can affect how many tokens it generates. Even with identical parameters, the model's internal token-selection behavior can vary between runs, so it may simply settle on a different, shorter continuation.
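If you call the model by its base name rather than a pinned version, a new default version can start serving your requests without any change on your side. Below is a minimal sketch using the Vertex AI Python SDK, assuming a PaLM text model such as text-bison; the project ID, region, model name, and version are placeholders, so substitute the ones you actually use:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholder project and region - replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

# "text-bison" on its own resolves to the current default version, which can
# change over time; pinning a version such as "@002" keeps the serving model
# fixed between runs.
model = TextGenerationModel.from_pretrained("text-bison@002")

response = model.predict(
    "Your prompt here",
    temperature=0,
    top_p=0,
    top_k=40,
    max_output_tokens=8190,
)
print(response.text)
```

If the longer output comes back once the version is pinned, the change was most likely a model update rather than anything in your parameters.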
AI platforms also have safeguards to prevent excessively long or unsafe responses. The platform might enforce token limits or content policies that weren't applied as strictly to your previous query, and a response that trips a filter can come back noticeably shorter.
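One way to see whether a filter or limit is involved is to inspect the response object rather than only its text. A rough sketch under the same assumptions as above, and assuming your SDK version exposes is_blocked and safety_attributes on the response:

```python
from vertexai.language_models import TextGenerationModel

# Assumes vertexai.init(...) was already called as in the earlier sketch.
model = TextGenerationModel.from_pretrained("text-bison@002")

response = model.predict(
    "Your prompt here",
    temperature=0,
    top_p=0,
    top_k=40,
    max_output_tokens=8190,
)

# Compare these fields between yesterday's and today's runs: a blocked
# response or unusual safety scores can explain a sudden drop in length.
print("blocked:", response.is_blocked)
print("safety attributes:", response.safety_attributes)
print("output length (chars):", len(response.text))
```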
To troubleshoot this issue, you might want to check the platform's release notes or documentation for any recent updates or changes that could affect response length.