
Mistral Large (2407): Inference: context length error

I am using Mistral Large (2407) for inference. According to the Vertex AI page for this model (here), it has a context length of 128k, which the Mistral docs also confirm.

When I send a "large" request (e.g. one of about 65k tokens), I get the following error:

{"object":"Error","message":"Prompt contains 65673 tokens, too large for model with 32768 maximum context length","type":"invalid_request_error","code":3051}

The API seems to accept only a 32k context length. Here is a minimal curl command that reproduces the issue:

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://europe-west4-aiplatform.googleapis.com/v1/projects/basebox-llm-api/locations/europe-west4/publishers/mistralai/models/mistral-large@2407:streamRawPredict \
  --data '{
    "model": "mistral-large",
    "messages": [
      {"role": "user", "content": '"$(jq -Rs . large-text-file.txt)"'}
    ]
  }'

(You would have to supply your own large-text-file.txt.)
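Until the limit question is resolved, one workaround sketch is to truncate the input so the prompt stays under the 32,768 tokens the endpoint currently enforces. This assumes the same ~4 characters/token heuristic, and the 28,000-token budget is an arbitrary choice that leaves headroom for the chat template and the response:

# Workaround sketch: cut the file down to roughly 28,000 tokens
# (~112,000 characters by the ~4 chars/token heuristic) so the prompt
# fits within the 32,768-token limit the endpoint currently enforces.
head -c 112000 large-text-file.txt > truncated.txt
# Then substitute truncated.txt for large-text-file.txt in the curl
# command above.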

Anyone come across this?

1 REPLY

Hello,

Thank you for contacting the Google Cloud Community.

I have gone through your reported issue; however, it appears to be specific to your environment and would need more detailed debugging and analysis. To ensure a faster resolution and dedicated support, I kindly request that you file a support ticket by clicking here. Our support team will prioritize your request and provide the assistance you need.

For individual support issues, it is best to use the support ticketing system. We appreciate your cooperation!