I'm using the prompt template directly from Mistral:
```json
{
  "instances": [
    {
      "prompt": "<s>[INST] What is your favourite colour and why? [/INST]My favorite color is blue.</s>[INST] And which one after that? [/INST]"
    }
  ],
  "parameters": {
    "max_tokens": -1
  }
}
```
Then I send it (saved as `request.json`) with this command:
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://europe-west4-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/europe-west4/endpoints/${ENDPOINT_ID}:predict \
  -d "@request.json"
```
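The prompt in `request.json` follows a fixed pattern: each completed turn is written as `[INST] user message [/INST]` followed directly by the answer and `</s>`, and the prompt ends with an open `[INST] ... [/INST]` for the model to complete. A minimal Python sketch that rebuilds exactly that string from a list of turns, assuming the spacing used in this post; `build_mistral_prompt` is a hypothetical helper, not part of any Mistral or Vertex AI library, and the body it prints keeps the original `parameters` block unchanged:

```python
import json
from typing import List, Tuple


def build_mistral_prompt(history: List[Tuple[str, str]], next_question: str) -> str:
    """Rebuild a Mistral-instruct prompt string (hypothetical helper).

    history: completed (user message, assistant answer) turns.
    next_question: the user message the model should answer next.
    """
    parts = ["<s>"]  # single <s> marker at the very start of the conversation
    for user_msg, assistant_msg in history:
        # Completed turn: [INST] user [/INST] followed directly by the answer and </s>
        parts.append(f"[INST] {user_msg} [/INST]{assistant_msg}</s>")
    # Open turn: no answer and no </s>, so the model continues from here
    parts.append(f"[INST] {next_question} [/INST]")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_mistral_prompt(
        [("What is your favourite colour and why?", "My favorite color is blue.")],
        "And which one after that?",
    )
    # Same body as request.json above, including the original "parameters" block
    body = {"instances": [{"prompt": prompt}], "parameters": {"max_tokens": -1}}
    print(json.dumps(body, indent=2))
```

With an empty history it produces a single-turn prompt such as `<s>[INST] Hi? [/INST]`.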
And this is a typical response:
```json
{
  "predictions": [
    "Prompt:\n\u003cs\u003e[INST] What is your favourite colour and why? [/INST]My favorite color is blue.\u003c/s\u003e[INST] And which one after that? [/INST]\nOutput:\n My second favorite color is green. This is because it is a color that represents"
  ],
  "deployedModelId": "[REDACTED]",
  "model": "projects/[REDACTED]/locations/europe-west4/models/mistral-7b-instruct-v0_1",
  "modelDisplayName": "mistral-7b-instruct-v0_1",
  "modelVersionId": "1"
}
```
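The prediction comes back as one string that echoes the prompt, then an `Output:` label, then the generated text (the `\u003cs\u003e` sequences are only JSON escapes for `<s>`). When comparing several responses, a small helper that separates the two parts and flags text that stops mid-sentence makes the truncation easier to spot. A minimal Python sketch, assuming the `Prompt:`/`Output:` layout seen in this one response holds for others; `split_prediction` and `looks_truncated` are hypothetical names, and the punctuation check is a rough heuristic, not a token count:

```python
import json
from typing import Tuple

MARKER = "\nOutput:\n"
PREFIX = "Prompt:\n"


def split_prediction(prediction: str) -> Tuple[str, str]:
    """Split one prediction string into (echoed prompt, generated text).

    Assumes the Prompt:/Output: layout seen in the response above.
    """
    head, sep, output = prediction.partition(MARKER)
    if not sep:
        # Marker missing: treat the whole string as generated text
        return "", prediction
    prompt = head[len(PREFIX):] if head.startswith(PREFIX) else head
    return prompt, output


def looks_truncated(output: str) -> bool:
    """Rough heuristic: True if non-empty text does not end with sentence punctuation."""
    text = output.rstrip()
    return bool(text) and text[-1] not in ".!?\"')"


if __name__ == "__main__":
    # Abbreviated stand-in for the JSON text returned by the :predict call
    raw = r'{"predictions": ["Prompt:\n\u003cs\u003e[INST] Hi [/INST]\nOutput:\n Hello, how"]}'
    prompt, output = split_prediction(json.loads(raw)["predictions"][0])
    print(repr(prompt), repr(output.strip()), looks_truncated(output))
```

Applied to the response above, the generated part is ` My second favorite color is green. This is because it is a color that represents`, which the heuristic flags as truncated.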
As you can see, the response is short, which is fine, but it is also cut off mid-sentence. What causes this?
I have tried the following:
Am I still doing something wrong? Or does the Vertex AI endpoint mangle my JSON somehow?