Hello. I have deployed the LLaMa 2 7B chat model on Vertex AI and have been successfully hitting its endpoint. However, I am running into a couple of issues. I sent this body in a request:
{
  "instances": [
    { "prompt": "this is the prompt" }
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}
And received this response:
{
  "predictions": [
    "Prompt:\nthis is the prompt\nOutput:\n for class today:\n\nPlease write a 1-2 page reflection on"
  ],
  "deployedModelId": "8051409189878104064",
  "model": "projects/563127813488/locations/us-central1/models/llama2-7b-chat-base",
  "modelDisplayName": "llama2-7b-chat-base",
  "modelVersionId": "1"
}
Why is this response cutting off mid-sentence? I have adjusted the maxOutputTokens parameter, but no matter what I set it to, the response cuts off in roughly the same place. How can I fix this?
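For reference, here is a minimal Python sketch of how I build that request body. The parameter names match the JSON above; the helper function itself is just for illustration (it is not part of any SDK), and whether the serving container actually honors "maxOutputTokens" is part of what I'm asking:

```python
import json

def build_predict_body(prompt: str, max_output_tokens: int = 256) -> str:
    """Build the JSON body I POST to the Vertex AI predict endpoint.

    The parameter names below are the ones I'm currently sending;
    I don't know whether the Llama 2 container reads all of them.
    """
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": 0.2,
            "maxOutputTokens": max_output_tokens,
            "topK": 40,
            "topP": 0.95,
        },
    }
    return json.dumps(body)

# The body shown above:
print(build_predict_body("this is the prompt"))
```

Even when I raise max_output_tokens well past 256 here, the output still stops in roughly the same place.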
I would also like to pass prior conversation turns to the LLaMa model. I can do this with chat-bison using a body like this:
{
"instances": [
{
"context": "",
"examples": [],
"messages": [
{
"author": "user",
"content": "hello my name is tim"
},
{
"author": "bot",
"content": " Hello Tim, how can I help you today?
",
"citationMetadata": {
"citations": []
}
},
{
"author": "user",
"content": "what is my name"
}
]
}
],
"parameters": {
"candidateCount": 1,
"maxOutputTokens": 1024,
"temperature": 0.2,
"topP": 0.8,
"topK": 40
}
}
The model will "remember" that my name is Tim. What is the syntax for doing the equivalent with LLaMa? Right now I am constrained to a single "prompt" field like this:
{
  "instances": [
    { "prompt": "this is the prompt" }
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}
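One workaround I have been considering is flattening the history into that single "prompt" field myself, using the [INST] / <<SYS>> chat template from Meta's Llama 2 reference code. The sketch below is my own assumption about the format; I don't know whether the Vertex AI container already applies this template, or whether hand-formatting the prompt like this is the supported approach:

```python
def to_llama2_prompt(system: str,
                     turns: list[tuple[str, str]],
                     next_user_msg: str) -> str:
    """Flatten a chat history into a single Llama 2-style prompt string.

    Template format (<s>[INST] ... [/INST] ... </s>, with an optional
    <<SYS>> block inside the first [INST]) is my assumption based on
    Meta's reference implementation, not documented Vertex AI behavior.
    """
    sys_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
    prompt = ""
    for i, (user_msg, assistant_msg) in enumerate(turns):
        prefix = sys_block if i == 0 else ""
        prompt += f"<s>[INST] {prefix}{user_msg} [/INST] {assistant_msg} </s>"
    # The new user message opens an [INST] block the model should complete.
    first_prefix = sys_block if not turns else ""
    prompt += f"<s>[INST] {first_prefix}{next_user_msg} [/INST]"
    return prompt

# The chat-bison conversation above, flattened into one prompt string:
flat = to_llama2_prompt(
    system="",
    turns=[("hello my name is tim", "Hello Tim, how can I help you today?")],
    next_user_msg="what is my name",
)
print(flat)
```

I could then put that string into the "prompt" field, but I'd prefer to know the intended syntax rather than rely on this.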
How can I additionally pass prior queries and responses, or even a system prompt? Thank you in advance for your help!