
How to pass prior conversation to the LLaMa 2 7B chat API? How to increase output length?

Hello. I have deployed and been successfully hitting an endpoint for the LLaMa 2 7B chat model on Vertex AI. However, I am having a couple of issues. I sent this body in a request:
 
{
  "instances": [
    { "prompt": "this is the prompt" }
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}
 
And received this response: 
 
{
    "predictions": [
        "Prompt:\nthis is the prompt\nOutput:\n for class today:\n\nPlease write a 1-2 page reflection on"
    ],
    "deployedModelId": "8051409189878104064",
    "model": "projects/563127813488/locations/us-central1/models/llama2-7b-chat-base",
    "modelDisplayName": "llama2-7b-chat-base",
    "modelVersionId": "1"
}
 
 Why is this response cutting off mid-sentence? I have adjusted the maxOutputTokens parameter, but no matter what I set it to, the response cuts off in roughly the same place. How can I fix this?
 
I would also like to pass prior conversation to the LLaMa model. I can do this with chat-bison using a body like this:
 
{
    "instances": [
        {
            "context": "",
            "examples": [],
            "messages": [
                {
                    "author": "user",
                    "content": "hello my name is tim"
                },
                {
                    "author": "bot",
                    "content": " Hello Tim, how can I help you today?
",
                    "citationMetadata": {
                        "citations": []
                    }
                },
                {
                    "author": "user",
                    "content": "what is my name"
                }
            ]
        }
    ],
    "parameters": {
        "candidateCount": 1,
        "maxOutputTokens": 1024,
        "temperature": 0.2,
        "topP": 0.8,
        "topK": 40
    }
}
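For reference, this body can be POSTed to the chat-bison predict endpoint with Python, along the lines of the sketch below (PROJECT_ID and the us-central1 region are placeholders to replace with your own values):

import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token with the standard Cloud Platform scope.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Publisher-model predict endpoint for chat-bison; PROJECT_ID is a placeholder.
url = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID"
    "/locations/us-central1/publishers/google/models/chat-bison:predict"
)
body = {
    "instances": [{
        "context": "",
        "examples": [],
        "messages": [
            {"author": "user", "content": "hello my name is tim"},
            {"author": "bot", "content": " Hello Tim, how can I help you today?\n"},
            {"author": "user", "content": "what is my name"},
        ],
    }],
    "parameters": {
        "candidateCount": 1,
        "maxOutputTokens": 1024,
        "temperature": 0.2,
        "topP": 0.8,
        "topK": 40,
    },
}
response = requests.post(
    url, json=body, headers={"Authorization": f"Bearer {credentials.token}"}
)
print(response.json()["predictions"][0])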
 
The model will "remember" that my name is Tim. What is the syntax for doing the equivalent with LLaMa? Right now I am constrained to a single "prompt" field like this:
 
{
  "instances": [
    { "prompt": "this is the prompt" }
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}
 
How can I additionally pass prior queries and responses, or even a system prompt? Thank you in advance for your help!
Solved
2 ACCEPTED SOLUTIONS

Hi - I did figure out how to pass the conversation, but I haven't solved the issue of the responses getting cut off yet. This is the format I used for passing conversations:

{
  "instances": [
    { "prompt": "[SYS]This is the system prompt[/SYS][INST]Here is the user's first prompt[/INST]This is the model's first response[INST]This is the next prompt[/INST]" }
  ]
}
 
By using the [SYS] and [INST] tags I was able to pass the conversation and a system prompt. I hope this helps!
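To generalize this, a small helper can assemble the prompt string from a message history. This is a sketch using the same tag style as the post above (note that Meta's official Llama 2 chat template uses slightly different markers, <<SYS>> ... <</SYS>> inside [INST] blocks, but the helper mirrors the format that worked here):

def build_llama_prompt(system_prompt, turns):
    """Assemble a single prompt string from a system prompt and a list of
    (user_message, model_reply) pairs, using the tag style shown above.
    The final pair's reply should be None for the turn awaiting an answer."""
    prompt = f"[SYS]{system_prompt}[/SYS]"
    for user_message, model_reply in turns:
        prompt += f"[INST]{user_message}[/INST]"
        if model_reply is not None:
            prompt += model_reply
    return prompt

# Reproduces the instance body shown above.
body = {
    "instances": [
        {
            "prompt": build_llama_prompt(
                "This is the system prompt",
                [
                    ("Here is the user's first prompt", "This is the model's first response"),
                    ("This is the next prompt", None),
                ],
            )
        }
    ]
}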


Hey, I think I solved the cut-off responses. This is my input:

# `endpoint` is the aiplatform.Endpoint already deployed for the LLaMa 2 model.
text = endpoint.predict(
    instances=[
        {
            "prompt": "[SYS]Be respectful and answer, use emojis[/SYS][INST]Hey[/INST]Hey[INST]How is your day going so far?[/INST]",
            "max_tokens": 1000,
        }
    ]
)

The generation parameters (such as max_tokens) go inside the instance dictionary itself, rather than in a separate "parameters" object.
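Putting the two solutions together, a minimal end-to-end call might look like this (a sketch; the project, region, and endpoint ID are placeholders to replace with your own):

from google.cloud import aiplatform

# Placeholders: replace with your own project, region, and endpoint ID.
aiplatform.init(project="your-project-id", location="us-central1")
endpoint = aiplatform.Endpoint("your-endpoint-id")

response = endpoint.predict(
    instances=[
        {
            # System prompt and prior turns are packed into one prompt string.
            "prompt": (
                "[SYS]Be respectful and answer, use emojis[/SYS]"
                "[INST]Hey[/INST]Hey"
                "[INST]How is your day going so far?[/INST]"
            ),
            # Generation parameters live inside the instance dict.
            "max_tokens": 1000,
        }
    ]
)
print(response.predictions[0])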

