I use code-bison-32k and set max_output_tokens = 8192 in my Python code. The prompt asks the model to generate its output in JSON format. For larger outputs, the response is always truncated. I checked with a couple of free token-count websites, and they showed the output only has around 3,000 tokens. Can code-bison-32k really generate up to 8,192 output tokens as advertised?
Hi @taihuang,
Welcome to Google Cloud Community!
The issue you’re encountering with the code-bison-32k model in Vertex AI is related to token limits.
According to this documentation:
There is no charge for using the CountTokens API. The maximum quota for the CountTokens API and the ComputeTokens API is 3000 requests per minute.
With regards to whether code-bison-32k can really generate up to 8,192 output tokens as advertised:
The 8,192 figure is the cap on output tokens, not the model's context length. code-bison-32k supports a combined total of 32,768 tokens for input plus output, with the output portion limited to 8,192 tokens. Note that max_output_tokens = 8192 is a ceiling, not a guarantee: the model may stop earlier when it considers the response complete, and the response can be cut off if generation actually hits that cap.
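If you suspect the cap is being hit, one quick way to confirm it is to check whether the returned JSON is actually parseable, since hitting max_output_tokens typically cuts the JSON off mid-structure. Below is a minimal sketch: the Vertex AI call is shown only in comments (it needs Google Cloud credentials and the `google-cloud-aiplatform` package), and the helper name `looks_truncated` is my own, not part of any SDK.

```python
import json

def looks_truncated(text: str) -> bool:
    """Return True if `text` is not complete, parseable JSON --
    the usual symptom of output being cut off at max_output_tokens."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True

# Calling the model (sketch only -- requires Cloud credentials):
# from vertexai.language_models import CodeGenerationModel
# model = CodeGenerationModel.from_pretrained("code-bison-32k")
# response = model.predict(prefix, max_output_tokens=8192)
# if looks_truncated(response.text):
#     print("Output likely hit the cap; try asking for a smaller JSON payload.")

print(looks_truncated('{"a": 1}'))         # complete JSON -> False
print(looks_truncated('{"a": 1, "b": ['))  # cut off mid-array -> True
```

If the JSON consistently fails to parse at the same rough length, that points to the output cap; if it parses but is incomplete in content, the model simply stopped early.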
I hope the above information is helpful.