I use code-bison-32k and set max_output_tokens = 8192 in my Python code. The prompt asks the model to generate its output in JSON format. For larger outputs, the response is always truncated. I checked with a couple of free token-count websites, and they showed the output only has around 3,000 tokens. Can code-bison-32k really generate up to 8,192 output tokens as advertised?
Hi @taihuang,
Welcome to Google Cloud Community!
The issue you’re encountering with the code-bison-32k model in Vertex AI is related to token limits.
According to this documentation:
There is no charge for using the CountTokens API. The maximum quota for the CountTokens API and the ComputeTokens API is 3000 requests per minute.
With regards to whether code-bison-32k can really generate up to 8,192 output tokens as advertised:
The 8,192 figure is the cap on output tokens, not the model's context length. code-bison-32k supports a combined total of 32,768 tokens for input plus output, with the output portion limited to 8,192 tokens. Note that max_output_tokens = 8192 is a ceiling, not a guarantee: the model may stop earlier when it considers the response complete, and the response can be cut off if generation actually hits that cap.
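If you suspect the cap is being hit, one quick way to confirm it is to check whether the returned JSON is actually parseable, since hitting max_output_tokens typically cuts the JSON off mid-structure. Below is a minimal sketch: the Vertex AI call is shown only in comments (it needs Google Cloud credentials and the `google-cloud-aiplatform` package), and the helper name `looks_truncated` is my own, not part of any SDK.

```python
import json

def looks_truncated(text: str) -> bool:
    """Return True if `text` is not complete, parseable JSON --
    the usual symptom of output being cut off at max_output_tokens."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True

# Calling the model (sketch only -- requires Cloud credentials):
# from vertexai.language_models import CodeGenerationModel
# model = CodeGenerationModel.from_pretrained("code-bison-32k")
# response = model.predict(prefix, max_output_tokens=8192)
# if looks_truncated(response.text):
#     print("Output likely hit the cap; try asking for a smaller JSON payload.")

print(looks_truncated('{"a": 1}'))         # complete JSON -> False
print(looks_truncated('{"a": 1, "b": ['))  # cut off mid-array -> True
```

If the JSON consistently fails to parse at the same rough length, that points to the output cap; if it parses but is incomplete in content, the model simply stopped early.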
I hope the above information is helpful.