
Vertex AI Agent Builder Data Store Tool + programmatic querying returning extra results

Hi, I am having trouble working with a data store tool that uses indexed data from a CSV file. I'm planning to invoke this code from a tool via Vertex AI Agent Builder.

When I query the data store, I get more results back than I expected.

For example, csv data:

id,pets,enriched_result
1,I have a question about dogs,please check out dog information https://acme/dogs
2,I have a question about cats,please check out cat information https://acme/cats

My tests:
Dogs, returns both cats and dogs:

query: "any info about dogs?",
want: "please check out dog information https://acme/dogs",
got: "please check out dog information https://acme/dogs,please check out cat information https://acme/cats"

Cats, returns both cats and dogs:

query: "any info about cats?",
want: "please check out cat information https://acme/cats",
got: "please check out dog information https://acme/dogs,please check out cat information https://acme/cats"

Birds, returns just cats:

query: "any info about birds?",
want: "no results found",
got: "please check out cat information https://acme/cats"

My code is in Go (I can convert it if that's easier), but I hope it demonstrates the approach. I used this example as a base:

package pets

import (
	"context"
	"errors"
	"fmt"
	"strings"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1beta"
	"cloud.google.com/go/discoveryengine/apiv1beta/discoveryenginepb"
	"github.com/chainguard-dev/clog"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
)

const (
	projectID      = "foo"
	searchEngineID = "pets-ds_123" // data store ID, used in the serving config path
	location       = "us"
	endpointBase   = "discoveryengine.googleapis.com:443"
)

// SearchQuery runs a query against the data store and returns the
// enriched_result field of every matching row, one per line.
func SearchQuery(ctx context.Context, query string) (string, error) {
	log := clog.FromContext(ctx)

	// Non-global locations use a regional endpoint, e.g. us-discoveryengine.googleapis.com:443.
	endpoint := endpointBase
	if location != "global" {
		endpoint = fmt.Sprintf("%s-%s", location, endpointBase)
	}

	client, err := discoveryengine.NewSearchClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return "", fmt.Errorf("creating Vertex AI Search client: %w", err)
	}
	defer client.Close()

	// Full resource name of the data store's default serving config.
	servingConfig := fmt.Sprintf("projects/%s/locations/%s/collections/default_collection/dataStores/%s/servingConfigs/default_serving_config",
		projectID, location, searchEngineID)

	searchRequest := &discoveryenginepb.SearchRequest{
		ServingConfig:      servingConfig,
		Query:              query,
		RelevanceThreshold: discoveryenginepb.SearchRequest_HIGH,
	}

	var extraPetInfo []string

	it := client.Search(ctx, searchRequest)
	for {
		resp, err := it.Next()
		if errors.Is(err, iterator.Done) {
			break
		}
		if err != nil {
			return "", fmt.Errorf("searching data store: %w", err)
		}
		log.Infof("%+v", resp)

		// Pull the enriched_result column from the matched row's structured data.
		fields := resp.GetDocument().GetStructData().GetFields()
		extraPetInfo = append(extraPetInfo, fields["enriched_result"].GetStringValue())
	}
	// The original counter was never incremented; len() gives the real count.
	log.Infof("%d results, no more results", len(extraPetInfo))

	if len(extraPetInfo) == 0 {
		return "No results found", nil
	}
	return strings.Join(extraPetInfo, "\n"), nil
}

This is a simplified example, but it shows the same behaviour I'm seeing in a real-world scenario.

Are my assumptions about how this should fit together correct? Should I be able to pass a query to the search, match it against the pets column, and return another field from the matched row?

Any suggestions on how I can adapt the code? Thanks.


Hi @rawlingsj,

Welcome to Google Cloud Community!

Based on your description, the unexpected results most likely come from the query not being targeted at the field you intend to search. In a structured data store, Vertex AI Search matches the query against the fields marked as searchable in the schema; it has no way of knowing that only the pets column matters to you. Targeting that field explicitly avoids generic matches and makes the results more precise. You could consider the following:

  • Query Relevance:
    • Relevance Threshold: HIGH is already the strictest RelevanceThreshold setting, yet non-relevant results are still coming back, so the threshold alone is not enough to filter them out. Lowering it to ‘discoveryenginepb.SearchRequest_MEDIUM’ or ‘discoveryenginepb.SearchRequest_LOW’ allows broader matches (more results, not fewer), so that is only worth trying if HIGH turns out to be too restrictive for natural-language queries on your real data.
    • Query Breadth: Your query might be too broad. For example, "any info about dogs?" may also match the cats row because both ‘pets’ values share most of their wording ("I have a question about ..."). Consider refining the query logic so it matches the specific terms that distinguish the rows.
  • Data Structure:
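    • Ensure your CSV data is indexed correctly in the data store. In the data store schema, the ‘pets’ field should be marked searchable so queries are matched against it, and indexable if you want to filter on it. If ‘enriched_result’ is only something you want to return, it can be retrievable without being searchable, so its text does not influence which rows match.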
  • Search Logic:
    • Modify your search request to target the ‘pets’ field specifically. Consider setting the ‘Filter’ field of your ‘SearchRequest’ so that only documents whose ‘pets’ value matches are returned. Note that filter expressions compare exact field values rather than free text, so this works best if ‘pets’ holds a short category value (for example dog or cat) that your tool derives from the user's query.
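As a rough sketch of the filter idea, the snippet below assumes the ‘pets’ column has been changed to hold short, indexable category values such as dog or cat (the sample CSV stores full sentences instead). The helpers knownPets, petFromQuery and petsFilter are hypothetical, and the keyword lookup is deliberately naive. The pets: ANY("...") form follows the Vertex AI Search filter syntax for text fields; please check it against your own schema.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPets is a hypothetical allowlist of category values, assuming the
// data store's "pets" column holds short values like "dog" or "cat".
var knownPets = []string{"dog", "cat"}

// petFromQuery returns the first known pet category found in the query,
// or "" if none is mentioned. Naive substring match, for illustration only.
func petFromQuery(query string) string {
	q := strings.ToLower(query)
	for _, p := range knownPets {
		if strings.Contains(q, p) {
			return p
		}
	}
	return ""
}

// petsFilter builds a filter expression restricting results to documents
// whose "pets" field equals the given value. No escaping is done because
// values only ever come from the knownPets allowlist.
func petsFilter(pet string) string {
	return fmt.Sprintf(`pets: ANY("%s")`, pet)
}

func main() {
	queries := []string{"any info about dogs?", "any info about cats?", "any info about birds?"}
	for _, q := range queries {
		pet := petFromQuery(q)
		if pet == "" {
			// No known category: the tool can answer without calling the API.
			fmt.Printf("%q -> no results found\n", q)
			continue
		}
		fmt.Printf("%q -> %s\n", q, petsFilter(pet))
	}
}
```

In SearchQuery, the resulting string would be set as the Filter field of the SearchRequest, with Query left as the user's text. A substring match like this also fires on words such as "category", so a real implementation would need something sturdier, for example having the agent pass the pet category as a separate tool parameter.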

I hope the above information is helpful.