Matching Engine: Queries with filtering do not work as expected

federicoprt · 11-10-2022 07:31 AM

I tried to run this notebook: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/matching_engi...
I got an issue with the filtering step:
Let's say glove100.json is:

{"id":"0","embedding":[-0.99544,-2.3651],"restricts":[{"namespace": "class", "allow_list": ["0"]}],"crowding_tag":"a"}
{"id":"1","embedding":[0.42052,-1.1817],"restricts":[{"namespace": "class", "allow_list": ["1"]}],"crowding_tag":"b"}
{"id":"2","embedding":[-0.10185,0.59817],"restricts":[{"namespace": "class", "allow_list": ["2"]}],"crowding_tag":"a"}

If I try to filter in this way:

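request = match_service_pb2.MatchRequest()
request.deployed_index_id = DEPLOYED_INDEX_ID
for val in query:
    request.float_val.append(val)

# Restrict the search to datapoints whose "class" namespace allows token "1".
restrict = match_service_pb2.Namespace()
restrict.name = "class"
restrict.allow_tokens.append("1")
request.restricts.append(restrict)
response = stub.Match(request)
response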

I do not get any response (empty result). If from the above code I remove the "restrict block", it works (of course without filtering).

request = match_service_pb2.MatchRequest()
request.deployed_index_id = DEPLOYED_INDEX_ID
for val in query:
    request.float_val.append(val)
# restrict = match_service_pb2.Namespace()
# restrict.name = "class"
# restrict.allow_tokens.append("1")
# request.restricts.append(restrict)
response = stub.Match(request)
response

response:

neighbor {
id: "0"
distance: 17.592369079589844
}
neighbor {
id: "1"
distance: 17.592369079589844
}

But if I add a new vector with the Vertex AI SDK for Python, like this:

insert_datapoints_payload = aiplatform_v1.IndexDatapoint(
    datapoint_id="3",
    feature_vector=query,
    restricts=[{"namespace": "class", "allow_list": ["3"]}],
    # crowding_tag=aiplatform_v1.IndexDatapoint.CrowdingTag(crowding_attribute="b"),  <-- this does not seem to change anything
)

upsert_request = aiplatform_v1.UpsertDatapointsRequest(
index=INDEX_RESOURCE_NAME, datapoints=[insert_datapoints_payload]
)

index_client.upsert_datapoints(request=upsert_request)

and then filter in the same way as above, but with class == "3", I get the right response.

It seems that the allow tokens are "seen" by Vertex AI only when I insert a vector through the Vertex AI SDK, and not when I specify them in the initial glove100.json.

Moreover, if I update a datapoint where the filter did not work, for example id=1:

update_datapoints_payload = aiplatform_v1.IndexDatapoint(
    datapoint_id="1",
    feature_vector=embedding,
    restricts=[{"namespace": "class", "allow_list": ["1"]}],
    # crowding_tag=aiplatform_v1.IndexDatapoint.CrowdingTag(crowding_attribute="b"),
)

upsert_request = aiplatform_v1.UpsertDatapointsRequest(
index=INDEX_RESOURCE_NAME, datapoints=[update_datapoints_payload]
)

index_client.upsert_datapoints(request=upsert_request)

response = stub.Match(request)
response

The filter for class=1 starts to work.

Is there a way to see what is actually stored in the index? I mean something like "SELECT * FROM myindex", so that I can check the stored embeddings and tokens.

Any ideas on how to solve this issue?

Thanks in advance

Specifications

I tried both from my local machine and from Workbench; the result is the same.

- Version: Python 3.7.9
- Platform: Matching Engine. zone: europe-west1

ricconoel

Hi,

Thank you for reaching out. You can file an issue in the public issue tracker for this question, since it might be unexpected behavior and you have a workaround to demonstrate it.

federicoprt

Hi @ricconoel ,
Thank you for replying!

Since I do not see a "Matching Engine" section, which one do you suggest I open the issue in?
