Layout Parser in Document AI: Bounding box issue

vguleria · 11-28-2024 12:06 AM

I'm using the Layout Parser in Document AI, but it doesn't seem to return bounding box details. Is there a way to retrieve these details from the API?

Currently, the response I receive looks like this:

{
  "document": {
    "documentLayout": {
      "blocks": [
        {
          "blockId": "1",
          "textBlock": {
            "text": "Firefox",
            "type": "header"
          },
          "pageSpan": {
            "pageStart": 1,
            "pageEnd": 1
          }
        }
      ]
    }
  }
}

ruthseki

Hi @vguleria,

Welcome to Google Cloud Community!

Here's what you should look for:

boundingPoly: This is the most common field name for the bounding box. It usually contains normalized coordinates relative to the page.
layout (Optional): In some older versions or slightly different API implementations, bounding box information might be nested within a layout object.

You may try these approaches to fix your code:

Check your API response carefully: Examine the complete JSON response from the Document AI Layout Parser API. Search for "boundingPoly" or "layout" within the blocks array. The boundingPoly is usually a direct sibling of blockId, textBlock, and pageSpan.
Update your code to access the bounding box: Once you've located the boundingPoly field, update your code to extract the vertex coordinates.
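To make the "search the response" step concrete, here is a minimal sketch that walks a parsed JSON response and reports the path of every property with a given name. The helper `findKeyPaths` and the `Json` type are hypothetical names made up for this illustration, not part of the Document AI client library, and the key names searched for are only the ones suggested above. Whether the API returns them at all is exactly what the check is meant to tell you.

```typescript
// Hypothetical helper (not part of any Google library): collect the JSON
// paths of every property whose name matches one of `keys`.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function findKeyPaths(value: Json, keys: string[], path: string = "$"): string[] {
  // Primitives and null cannot contain properties.
  if (value === null || typeof value !== "object") {
    return [];
  }
  const hits: string[] = [];
  if (Array.isArray(value)) {
    // Recurse into each array element, recording its index in the path.
    for (let i = 0; i < value.length; i++) {
      hits.push(...findKeyPaths(value[i], keys, `${path}[${i}]`));
    }
    return hits;
  }
  for (const key of Object.keys(value)) {
    const childPath = `${path}.${key}`;
    // Exact name match only, so "documentLayout" does not match "layout".
    if (keys.indexOf(key) !== -1) {
      hits.push(childPath);
    }
    hits.push(...findKeyPaths(value[key], keys, childPath));
  }
  return hits;
}

// The response from the original question.
const response: Json = {
  document: {
    documentLayout: {
      blocks: [
        {
          blockId: "1",
          textBlock: { text: "Firefox", type: "header" },
          pageSpan: { pageStart: 1, pageEnd: 1 },
        },
      ],
    },
  },
};

console.log(findKeyPaths(response, ["boundingPoly", "layout"]));
```

For the response posted in the question this prints an empty array, which confirms that neither key is present anywhere in that payload. Running the same check on your own full response shows quickly whether the field is nested somewhere unexpected or missing entirely.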

Additionally, you may check these links:

I also found this case that you might find useful.

I hope the above information is helpful.

nguyen_toan

Hello friend, I have the same problem. Have you managed to solve it? If so, could you please tell me how you fixed it?

divekarsc

Has anyone found a solution to the original question?

verasupport

It appears that `returnBoundingBoxes` is exposed in `layoutConfig`, but I'm still not seeing bounding boxes in the returned payload. `returnImages` is working as expected.

const request: protos.google.cloud.documentai.v1beta3.IProcessRequest = {
  name,
  rawDocument: {
    content: documentContent,
    mimeType,
  },
  processOptions: {
    layoutConfig: {
      returnBoundingBoxes: true, // not working
      returnImages: true,
    },
  },
};

const [result] = await client.processDocument(request);

@ruthseki Any ideas?