Mistral-7B-instruct on Vertex AI endpoint often truncates responses

I'm using the prompt template directly from Mistral:

{
  "instances": [
    {
      "prompt": "<s>[INST] What is your favourite colour and why? [/INST]My favorite color is blue.</s>[INST] And which one after that? [/INST]"
    }
  ],
  "parameters": {
    "max_tokens": -1
  }
}

Then I call the endpoint with this command:

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://europe-west4-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/europe-west4/endpoints/${ENDPOINT_ID}:predict -d "@request.json"
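
In case it's relevant, here is roughly what the equivalent call looks like through the Vertex AI Python SDK. This is just a sketch with placeholder project and endpoint values; I have mainly been testing with curl:

from google.cloud import aiplatform

# Same project, region, and endpoint as in the curl call above (placeholders here).
aiplatform.init(project="PROJECT_ID", location="europe-west4")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

response = endpoint.predict(
    instances=[
        {
            "prompt": "<s>[INST] What is your favourite colour and why? [/INST]"
                      "My favorite color is blue.</s>[INST] And which one after that? [/INST]"
        }
    ],
    parameters={"max_tokens": 500},
)

# predictions mirrors the "predictions" field of the REST response shown below.
print(response.predictions[0])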

And this is a typical response:

{
  "predictions": [
    "Prompt:\n\u003cs\u003e[INST] What is your favourite colour and why? [/INST]My favorite color is blue.\u003c/s\u003e[INST] And which one after that? [/INST]\nOutput:\n My second favorite color is green. This is because it is a color that represents"
  ],
  "deployedModelId": "[REDACTED]",
  "model": "projects/[REDACTED]/locations/europe-west4/models/mistral-7b-instruct-v0_1",
  "modelDisplayName": "mistral-7b-instruct-v0_1",
  "modelVersionId": "1"
}

As you can see, the response is short, which is fine, but it is also cut off mid-sentence. What causes this?

I have tried the following:

  • Different values for max_tokens, such as -1, 500, and 2048, as well as omitting max_tokens entirely (see the sketch after this list).
  • Escaping the prompt in different ways, including escaping the forward slashes.
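
One more variant I still want to try, on the assumption (and it is only a guess) that the serving container behind the deployment reads sampling parameters like max_tokens from each instance rather than from the top-level "parameters" object:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="europe-west4")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

# max_tokens moved inside the instance instead of the top-level "parameters" object,
# in case the serving container only honours per-instance sampling settings.
response = endpoint.predict(
    instances=[
        {
            "prompt": "<s>[INST] What is your favourite colour and why? [/INST]"
                      "My favorite color is blue.</s>[INST] And which one after that? [/INST]",
            "max_tokens": 500,
        }
    ],
)
print(response.predictions[0])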

Am I still doing something wrong? Or does the Vertex AI endpoint mangle my JSON somehow?
