
Mistral Large (2407): Inference: context length error

I am using Mistral Large (2407) for inference. According to the Vertex AI page for this model (here), it has a context length of 128k, which the Mistral docs also confirm.

When I send a "large" request (e.g. one of about 65k tokens), I get the following error:

{"object":"Error","message":"Prompt contains 65673 tokens, too large for model with 32768 maximum context length","type":"invalid_request_error","code":3051}

The API seems to accept only a 32k context length. Here is a minimal curl command that reproduces the issue:

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://europe-west4-aiplatform.googleapis.com/v1/projects/basebox-llm-api/locations/europe-west4/publishers/mistralai/models/mistral-large@2407:streamRawPredict \
  --data '{
    "model": "mistral-large",
    "messages": [
      {"role": "user", "content": '"$(jq -Rs . large-text-file.txt)"'}
    ]
  }'

(You would have to supply your own large-text-file.txt.)
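Until the limit question is resolved, one workaround sketch is to truncate the input so the prompt stays under the 32,768 tokens the endpoint currently enforces. This assumes the same ~4 characters/token heuristic, and the 28,000-token budget is an arbitrary choice that leaves headroom for the chat template and the response:

# Workaround sketch: cut the file down to roughly 28,000 tokens
# (~112,000 characters by the ~4 chars/token heuristic) so the prompt
# fits within the 32,768-token limit the endpoint currently enforces.
head -c 112000 large-text-file.txt > truncated.txt
# Then substitute truncated.txt for large-text-file.txt in the curl
# command above.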

Anyone come across this?

1 REPLY

Hello,

Thank you for contacting the Google Cloud Community.

I have gone through your reported issue; however, it appears to be specific to your environment and would need more detailed debugging and analysis. To ensure a faster resolution and dedicated support, I kindly request that you file a support ticket by clicking here. Our support team will prioritize your request and provide the assistance you need.

For individual support issues, it is best to use the support ticketing system. We appreciate your cooperation!