Vertex AI Search Agent Builder: Indexing Failure After Successful Imports

I've built a Vertex AI search application using the Agent Builder via Python. Initially, datastore creation, app creation, linking, and document import (via datastore URIs) worked correctly, taking ~10 minutes.

However, subsequent attempts to recreate the entire process or simply update documents (which previously functioned) now fail. The process takes ~40 minutes and returns the following error:

"Document projects/<project id>/locations/eu/collections/default_collection/dataStores/<Datastore id>/branches/0/documents/<id> (uri: gs://<bucket>/<server>/<namespace>/documents/<key>/Workplace Health and Safety.pdf) is imported but not yet indexed. Its index status is not found." with a status code 5 and @type: "type.googleapis.com/google.rpc.Status"

I have verified that the GCS URI is valid and that it has not changed since the imports last worked.


Type: Unstructured data
Serving state: Enabled
Region: eu

labels: {
credential_id: ""
location: "eu"
method: "google.cloud.discoveryengine.v1beta.DocumentService.ImportDocuments"
project_id: ""
service: "discoveryengine.googleapis.com"
version: ""
}
I've already tried recreating the datastore and application. The issue persists after multiple attempts.

I have tried importing multiple documents as well as a single document; the issue persists in both cases.

I'm seeking assistance in resolving this indexing failure. The documents are successfully imported, but Discovery Engine fails to index them. The missing index status is unusual, and I am not sure what it means.


I've been getting the same error and I found another post related to this.

Were you able to find a solution? The issue seems to have temporarily resolved for me after some waiting, though the lack of clarity on why it happened is concerning. I’ve observed that the error may simply mean the document hasn't fully indexed yet, rather than a full upload failure. To manage this, I plan to monitor the list of successfully uploaded documents in the datastore and re-import any missing ones. If anyone has tips for handling or preventing these indexing delays, I’d really appreciate it!
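The monitor-and-re-import idea above could be sketched roughly as follows. This is only a minimal sketch: `find_missing_uris` is a hypothetical helper, the bucket paths are made up, and how you obtain the list of URIs currently in the datastore (for example, by listing documents with the Discovery Engine client) is left out and would need to be checked against the API docs.

```python
from typing import Iterable, List


def find_missing_uris(expected_uris: Iterable[str], listed_uris: Iterable[str]) -> List[str]:
    """Return the expected URIs that do not appear in the datastore listing.

    Order follows expected_uris; duplicates are reported once.
    """
    listed = set(listed_uris)
    seen = set()
    missing = []
    for uri in expected_uris:
        if uri not in listed and uri not in seen:
            missing.append(uri)
            seen.add(uri)
    return missing


if __name__ == "__main__":
    # Hypothetical values: what we tried to import vs. what the datastore reports.
    expected = [
        "gs://example-bucket/docs/a.pdf",
        "gs://example-bucket/docs/b.pdf",
    ]
    listed = ["gs://example-bucket/docs/a.pdf"]
    # Anything printed here is a candidate for another import attempt.
    print(find_missing_uris(expected, listed))
```

The returned list can then be fed into another import request. Note that a document showing up in the listing only tells you it was imported, not necessarily that it has finished indexing.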

No, apparently this isn't something we're doing wrong. This is a bug that needs fixing.

That's a relief. Where did you get this insight? Also, thanks for the quick replies 🙂

If you are using “Linked unstructured documents (JSONL with metadata)”, please make sure that the "structData" property is written in camel case as described in the documentation https://cloud.google.com/generative-ai-app-builder/docs/prepare-data#storage-unstructured and according to the Google JSON Style Guide https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Property_Name_Format#Property_Name_F.... I had the same problem that you described and have been troubleshooting since Friday and this has helped.

Until now I had always written the property as "struct_data" in snake case, and that worked without any problems. Since Friday this no longer works, and the cause was difficult to identify because the error message is so short. What also made troubleshooting difficult was that the metadata from struct_data was displayed correctly in the App Builder and the automatically derived schema also looked valid.
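In case it is useful, here is a rough sketch of how an existing JSONL metadata file could be rewritten so that the key is "structData" instead of "struct_data". The function name `fix_struct_data_key` and the sample record (id, title, bucket path) are made up for illustration; only the top-level key is renamed, and everything else is passed through unchanged.

```python
import json


def fix_struct_data_key(jsonl_text: str) -> str:
    """Rename a top-level "struct_data" key to "structData" on every JSONL line."""
    fixed_lines = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Only rename when the camel-case key is not already present.
        if "struct_data" in record and "structData" not in record:
            record["structData"] = record.pop("struct_data")
        fixed_lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    # Hypothetical metadata line for one linked PDF.
    sample = (
        '{"id": "doc-1", "struct_data": {"title": "Example"}, '
        '"content": {"mimeType": "application/pdf", '
        '"uri": "gs://example-bucket/docs/example.pdf"}}'
    )
    print(fix_struct_data_key(sample))
```

Run it over the contents of the metadata file and upload the result before starting the import again.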

Hope this helps!

Does anyone know how long it typically takes the search agent to index a simple website?