
Llama 3 on Vertex AI returning bogus responses

Hi,

I have a llama3-70b-001 model deployed to Vertex AI via the Model Garden. I want to get predictions via the REST API from a Node.js application.

Here's the request I am making:

 

// region should match the endpoint's region (e.g. 'us-west4');
// project, endpoint, token, maxTokens and temperature are defined elsewhere.
const response = await fetch(
  `https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}/endpoints/${endpoint}:predict`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      instances: [
        {
          prompt: 'You are a career advisor. Give me 10 tips for a good CV.',
        },
      ],
      parameters: {
        max_output_tokens: maxTokens,
        temperature,
      },
    }),
    cache: 'no-store',
  }
);
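
In case it's relevant, the bearer token comes from Application Default Credentials via google-auth-library, roughly like this:

// Sketch of how the access token is obtained (google-auth-library,
// Application Default Credentials, standard cloud-platform scope).
const { GoogleAuth } = require('google-auth-library');

const auth = new GoogleAuth({
  scopes: 'https://www.googleapis.com/auth/cloud-platform',
});
const token = await auth.getAccessToken();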

 

Here's the response I am getting:

 

{
  predictions: [
    'Prompt:\n' +
      'You are a career advisor. Give me 10 tips for a good CV.\n' +
      'Output:\n' +
      ' _Use the phrases in the box_.\n' +
      '\\begin{tabular}{l'
  ],
  deployedModelId: <redacted>,
  model: <redacted>,
  modelDisplayName: 'llama3-70b-001',
  modelVersionId: '1'
}
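
For context, that object is just the parsed response body, logged straight from the fetch call:

// How the output above is produced
const data = await response.json();
console.log(data);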

 

I have a few questions:

  • Why is the response cut off? I am passing a large max tokens value.
  • Why is the response seemingly unrelated to the question?
  • Why does the response repeat the prompt?
  • Why do I get a wildly different response every time, even though I pass temperature 0?

I have also tried llama-3-70b-chat-001, with similar results. The documentation on how to pass parameters to specific models is lacking, or at least I couldn't find it.

Thanks!

 

 
