How to understand the JSON returned from Gemini with respect to citations

lambdalove · 09-06-2024 02:05 PM

I'd like some help understanding the meaning of the JSON returned by Gemini with respect to citations. (I checked Google's API documentation, but I found it to be sparse and unclear.)

I've grounded Gemini 1.5 Pro in a Vertex AI Search app and data store. I've observed that when I prompt the model, the JSON response may contain the following information pertaining to citations:

1) candidates[].groundingMetadata.groundingChunks[]
This is a list of objects with a retrievedContext property, which indicates the title and URI of a particular document in the data store.

2) candidates[].groundingMetadata.groundingSupports[]
As far as I've seen, this list contains a single object, which includes:
-- the model's text response to the prompt (segment.text)
-- a groundingChunkIndices[] list (Google's documentation indicates that this is a list of indexes into the groundingChunks[] list mentioned above.)
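
To make the relationship between the two lists concrete, here is a minimal sketch of how the indexes could be resolved, assuming the response has already been parsed into a Python dict. The function name resolve_supports and the sample data are hypothetical, and the pairing of confidenceScores with groundingChunkIndices by position is an assumption, not something taken from Google's SDK.

```python
from typing import Any


def resolve_supports(grounding_metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each support segment with the chunks its indexes point to."""
    chunks = grounding_metadata.get("groundingChunks", [])
    resolved = []
    for support in grounding_metadata.get("groundingSupports", []):
        indices = support.get("groundingChunkIndices", [])
        scores = support.get("confidenceScores", [])
        sources = []
        for position, chunk_index in enumerate(indices):
            context = chunks[chunk_index].get("retrievedContext", {})
            sources.append({
                "title": context.get("title"),
                "uri": context.get("uri"),
                # Assumption: scores line up one-to-one with the indexes.
                "confidence": scores[position] if position < len(scores) else None,
            })
        resolved.append({
            "text": support.get("segment", {}).get("text"),
            "sources": sources,
        })
    return resolved


# Hypothetical, shortened metadata in the shape described above.
sample = {
    "groundingChunks": [
        {"retrievedContext": {"uri": "gs://my-bucket/0.pdf", "title": "A"}},
        {"retrievedContext": {"uri": "gs://my-bucket/1.pdf", "title": "B"}},
        {"retrievedContext": {"uri": "gs://my-bucket/2.pdf", "title": "C"}},
    ],
    "groundingSupports": [
        {
            "segment": {"endIndex": 15, "text": "Example answer."},
            "groundingChunkIndices": [0, 2],
            "confidenceScores": [0.9, 0.8],
        }
    ],
}

for entry in resolve_supports(sample):
    print(entry["text"], [source["title"] for source in entry["sources"]])
```

Read this way, each entry in groundingSupports[] names a span of the answer plus the positions of the groundingChunks[] entries offered as support for that span.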

My questions:
Can someone please explain the actual meaning of groundingChunks[] vs. groundingChunkIndices[]? I don't know how to interpret these fields in terms of what they mean for citations.

For example, I've seen something like the following:

"groundingChunks": [
  {
    "retrievedContext": {
      "uri": "gs://my-bucket/0.pdf",
      "title": "A"
    }
  },
  {
    "retrievedContext": {
      "uri": "gs://my-bucket/1.pdf",
      "title": "B"
    }
  },
  {
    "retrievedContext": {
      "uri": "gs://my-bucket/2.pdf",
      "title": "C"
    }
  },
  {
    "retrievedContext": {
      "uri": "gs://my-bucket/3.pdf",
      "title": "D"
    }
  },
  {
    "retrievedContext": {
      "uri": "gs://my-bucket/4.pdf",
      "title": "E"
    }
  }
],

"groundingSupports": [
  {
    "segment": {
      "endIndex": 69,
      "text": "(Here is the text that the model produced in response to the prompt.)"
    },
    "groundingChunkIndices": [ 0, 3 ],
    "confidenceScores": [ 0.9919262, 0.9919262 ]
  }
]

As you can see, there are five documents in groundingChunks[], but only two of them (indexes 0 and 3) are listed in groundingChunkIndices[]. What does this mean?
Did the model produce its response based on all five documents in groundingChunks[]? If so, what does it mean that only two of them are referenced in groundingChunkIndices[]?
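
For reference, the bookkeeping behind this question can be sketched as follows: collect every index that appears in any groundingChunkIndices[] list and compare it with the full groundingChunks[] list. The function name split_by_reference is hypothetical, and the data is the example above with URIs omitted. The sketch only separates referenced from unreferenced chunks; it does not settle whether the unreferenced ones influenced the answer.

```python
def split_by_reference(grounding_metadata):
    """Return (referenced, unreferenced) chunk indexes across all supports."""
    chunk_count = len(grounding_metadata.get("groundingChunks", []))
    referenced = set()
    for support in grounding_metadata.get("groundingSupports", []):
        referenced.update(support.get("groundingChunkIndices", []))
    unreferenced = [i for i in range(chunk_count) if i not in referenced]
    return sorted(referenced), unreferenced


# The example from this post, reduced to titles and indexes.
example = {
    "groundingChunks": [
        {"retrievedContext": {"title": title}} for title in "ABCDE"
    ],
    "groundingSupports": [
        {"groundingChunkIndices": [0, 3], "confidenceScores": [0.9919262, 0.9919262]}
    ],
}

referenced, unreferenced = split_by_reference(example)
titles = [c["retrievedContext"]["title"] for c in example["groundingChunks"]]
print("referenced:", [titles[i] for i in referenced])      # A and D
print("unreferenced:", [titles[i] for i in unreferenced])  # B, C and E
```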

Thanks for your help!
