
Llama 3 on Vertex AI returning bogus responses

Hi,

I have a llama3-70b-001 model deployed to Vertex AI via the Model Garden. I want to get predictions via the REST API from a Node.js application.

Here's the request I am making:

 

// region should match the endpoint's region (e.g. 'us-west4');
// project, endpoint, token, maxTokens and temperature are defined elsewhere.
const response = await fetch(
  `https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}/endpoints/${endpoint}:predict`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      instances: [
        {
          prompt: 'You are a career advisor. Give me 10 tips for a good CV.',
        },
      ],
      parameters: {
        max_output_tokens: maxTokens,
        temperature,
      },
    }),
    cache: 'no-store',
  }
);
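
In case it's relevant, the bearer token comes from Application Default Credentials via google-auth-library, roughly like this:

// Sketch of how the access token is obtained (google-auth-library,
// Application Default Credentials, standard cloud-platform scope).
const { GoogleAuth } = require('google-auth-library');

const auth = new GoogleAuth({
  scopes: 'https://www.googleapis.com/auth/cloud-platform',
});
const token = await auth.getAccessToken();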

 

Here's the response I am getting:

 

{
  predictions: [
    'Prompt:\n' +
      'You are a career advisor. Give me 10 tips for a good CV.\n' +
      'Output:\n' +
      ' _Use the phrases in the box_.\n' +
      '\\begin{tabular}{l'
  ],
  deployedModelId: <redacted>,
  model: <redacted>,
  modelDisplayName: 'llama3-70b-001',
  modelVersionId: '1'
}
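
For context, that object is just the parsed response body, logged straight from the fetch call:

// How the output above is produced
const data = await response.json();
console.log(data);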

 

I have a few questions:

  • Why is the response cut off? I am passing a large max tokens value.
  • Why is the response seemingly unrelated to the question?
  • Why does the response repeat the prompt?
  • Why do I get a wildly different response every time, even though I pass temperature 0?

I have also tried llama-3-70b-chat-001, with similar results. The documentation on how to pass parameters to specific models is lacking, or at least I couldn't find it.

Thanks!

 

 
