Dear Community,
I'm facing an issue with batch updates of a Vertex AI Vector Search index. I'm only able to add new vectors; neither updates nor deletions of existing vectors are applied. The update process doesn't fail, but the index remains untouched.
Has anybody experienced this before? Any suggestions as to what could be responsible?
Are you updating via the instructions found in this documentation?
To update the content of an existing Index, use the IndexService.UpdateIndex method. To replace the existing content of an existing Index:
- Set Index.metadata.contentsDeltaUri to the Cloud Storage URI that includes the vectors you want to update.
- Set isCompleteOverwrite to true.
If you set the contentsDeltaUri field when calling IndexService.UpdateIndex, then no other index fields (such as displayName, description, or userLabels) can be also updated as part of the same call.
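The last rule in that quote can be made concrete as a request builder that sends only the metadata when contentsDeltaUri is set. This is a minimal sketch that assumes the request is shaped like the UpdateIndex call (an index plus an update mask); build_index_update and the example names are hypothetical, and in practice the request would go through the official client (for example IndexServiceClient.update_index in google-cloud-aiplatform) rather than a hand-built dict.

```python
import json


def build_index_update(index_name: str, contents_delta_uri: str,
                       complete_overwrite: bool = False) -> dict:
    """Build a hypothetical metadata-only update for a Vector Search index.

    Only `metadata` is included, because the documentation quoted above says
    no other index field (displayName, description, userLabels) can be
    updated in the same call that sets contentsDeltaUri.
    """
    if not contents_delta_uri.startswith("gs://"):
        # The documentation asks for a Cloud Storage URI, not a bare bucket name.
        raise ValueError("contentsDeltaUri must be a gs:// URI")
    return {
        "index": {
            "name": index_name,
            "metadata": {
                "contentsDeltaUri": contents_delta_uri,
                # False: incremental upsert/delete; True: replace all content.
                "isCompleteOverwrite": complete_overwrite,
            },
        },
        # Assumption: restrict the update to the metadata field only.
        "update_mask": ["metadata"],
    }


if __name__ == "__main__":
    request = build_index_update(
        "projects/my-project/locations/us-central1/indexes/123",  # hypothetical
        "gs://my-bucket/batch_root",  # hypothetical
    )
    print(json.dumps(request, indent=2))
```

Keeping displayName and description out of the request entirely is the simplest way to stay within the quoted rule.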
I've also tested with isCompleteOverwrite set to true, once for updating and separately for deleting a subset of vectors. Each time the option worked as documented, overwriting the whole index, whereas my aim is to update or delete only some of the vectors stored in the index.
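The difference between the two modes can be pinned down with a toy in-memory model of the behavior the original poster is expecting: with isCompleteOverwrite false, records in the data files are added or replace records with the same ID, and IDs listed under delete/ are removed; with it true, the data files become the whole index. This only illustrates the expected semantics under those assumptions; it is not how the service is implemented, and apply_batch_update is a hypothetical name.

```python
from typing import Dict, Iterable, List

Vector = List[float]


def apply_batch_update(index: Dict[str, Vector],
                       upserts: Dict[str, Vector],
                       deletes: Iterable[str],
                       complete_overwrite: bool) -> Dict[str, Vector]:
    """Toy model of the expected batch-update semantics, not the real service."""
    delete_ids = set(deletes)
    overlap = delete_ids.intersection(upserts)
    if overlap:
        # Mirrors the documented rule: an ID cannot be in both a data file
        # and a delete file.
        raise ValueError(f"IDs in both data and delete files: {sorted(overlap)}")
    if complete_overwrite:
        # Content is replaced by the data files; anything absent from them is
        # gone, whether or not it was listed for deletion.
        return dict(upserts)
    # Incremental update: drop deleted IDs, then add or replace the upserts.
    updated = {k: v for k, v in index.items() if k not in delete_ids}
    updated.update(upserts)
    return updated


if __name__ == "__main__":
    current = {"a": [0.1], "b": [0.2], "intent_1": [0.3]}
    # {'a': [0.1], 'b': [0.9], 'c': [0.4]}
    print(apply_batch_update(current, {"b": [0.9], "c": [0.4]}, ["intent_1"], False))
    # {'b': [0.9]}
    print(apply_batch_update(current, {"b": [0.9]}, [], True))
```

In this model only the incremental mode gives "update or delete part of the index", which is the behavior being asked about.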
I am also experiencing some strange behaviors.
Following the instructions on how to structure the input data in the bucket, I have observed the following behavior:
First update (index has no data).
The bucket contains a test_data_1.json file with 4 entries. The update goes well and my index has a dense count of 4.
Second update (index has a dense count of 4).
The bucket contains the following:
delete/delete_intent_1.txt
test_data_2.json
test_data_without_intent_1.json
I trigger the update using the following snippet:
from google.protobuf import struct_pb2

# Point the index at the Cloud Storage folder holding the delta files.
metadata = struct_pb2.Struct(
    fields={
        "contentsDeltaUri": struct_pb2.Value(string_value=bucket_name),
    }
)
matching_engine_index_update_request = {
    "name": name,
    "display_name": index.display_name,
    "description": index.description,
    "metadata": struct_pb2.Value(struct_value=metadata),
}
I end up with a dense count of 12, and intent_1 is still there (NB: in this iteration the ID of intent_1 is listed in delete_intent_1.txt and is not in test_data_without_intent_1.json, so why it is still in the index is beyond me).
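To reproduce this kind of test, here is a sketch that renders a batch folder in the same layout as the bucket above: JSON data files with one record per line, plus a delete/ subfolder whose files list one ID per line. It assumes the record shape {"id": ..., "embedding": [...]} from the Vector Search input-data documentation; render_batch_files, write_batch_root, and the file names are hypothetical, and the folder still has to be copied to Cloud Storage before it can be used as contentsDeltaUri.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def render_batch_files(records: Dict[str, List[float]],
                       delete_ids: Iterable[str] = ()) -> Dict[str, str]:
    """Map paths relative to the batch root to their text content."""
    files = {
        # One JSON record per line; "data.json" is a hypothetical file name.
        "data.json": "".join(
            json.dumps({"id": rid, "embedding": emb}) + "\n"
            for rid, emb in records.items()),
    }
    delete_ids = list(delete_ids)
    if delete_ids:
        # IDs to remove, one per line, inside a delete/ subfolder
        # (the file name here is hypothetical).
        files["delete/delete_ids.txt"] = "".join(i + "\n" for i in delete_ids)
    return files


def write_batch_root(root: Path, files: Dict[str, str]) -> None:
    """Write the rendered files below a local folder, ready to be uploaded."""
    for rel_path, content in files.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
```

Rendering to a dict first keeps the layout easy to inspect before anything touches the bucket.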
Third update (index has a dense count of 12).
This time I decided to try with isCompleteOverwrite set to true, so I am using this snippet to update the index:
from google.protobuf import struct_pb2

# Same request, but asking for the whole index content to be replaced.
metadata = struct_pb2.Struct(
    fields={
        "contentsDeltaUri": struct_pb2.Value(string_value=bucket_name),
        "isCompleteOverwrite": struct_pb2.Value(bool_value=True),
    }
)
name = f"projects/{self.project_id}/locations/{LOCATION}/indexes/{index.name}"
matching_engine_index_update_request = {
    "name": name,
    "display_name": index.display_name,
    "description": index.description,
    "metadata": struct_pb2.Value(struct_value=metadata),
}
The bucket contains the following:
delete/delete_intent_1.txt
test_data_2.json
test_data_without_intent_1.json
After the update I get a dense count of 8 and intent_1 is not there.
At this point I get curious and wonder why it is necessary to include the delete/ folder when the documentation states that `An ID cannot appear both in a regular data file and a delete data file.`
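The quoted rule can at least be checked locally before uploading. A minimal sketch, assuming the batch folder has been read into a dict of relative path to text, with JSON-lines data files (each record carrying an "id") and one-ID-per-line files under delete/; find_conflicting_ids is a hypothetical helper, not part of any Google library.

```python
import json
from typing import Dict, Set


def find_conflicting_ids(files: Dict[str, str]) -> Set[str]:
    """Return IDs that appear both in a data file and in a delete/ file.

    `files` maps paths relative to the batch root to their text content.
    """
    data_ids: Set[str] = set()
    delete_ids: Set[str] = set()
    for rel_path, content in files.items():
        lines = [line.strip() for line in content.splitlines() if line.strip()]
        if rel_path.startswith("delete/"):
            delete_ids.update(lines)  # one ID per line
        elif rel_path.endswith(".json"):
            # One JSON record per line; only the "id" field matters here.
            data_ids.update(json.loads(line)["id"] for line in lines)
    return data_ids & delete_ids


if __name__ == "__main__":
    bucket = {
        "test_data_2.json": '{"id": "a", "embedding": [0.1]}\n',
        "delete/delete_intent_1.txt": "intent_1\n",
    }
    print(find_conflicting_ids(bucket))  # set()
```

An empty result means the folder respects the rule; it says nothing about how the service then applies the deletions.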
So I tried a fourth update with:
metadata = struct_pb2.Struct(
    fields={
        "contentsDeltaUri": struct_pb2.Value(string_value=bucket_name),
        "isCompleteOverwrite": struct_pb2.Value(bool_value=True),
    }
)
name = f"projects/{self.project_id}/locations/{LOCATION}/indexes/{index.name}"
matching_engine_index_update_request = {
    "name": name,
    "display_name": index.display_name,
    "description": index.description,
    "metadata": struct_pb2.Value(struct_value=metadata),
}
The bucket contains the following:
test_data_2.json
test_data_without_intent_1.json
After the update I get a dense count of 8 and intent_1 is not there.
At this point my questions are:
Please let me know and don't hesitate to contact me if you need more information.
Kind regards,
Enrico
Hello,
I also have a similar challenge when executing upsert operations on an index. Are there any updates on this issue?