Hello,
I'm currently working on a Vertex AI Search application backed by an unstructured data store (imported with a metadata JSONL file). I've noticed that the PageNumber value is missing from my search results. In my previous experience with similar search implementations, this value is typically present and indicates which page of the document a search result comes from.
I would appreciate clarification on:
Has anyone else encountered similar behavior or could explain the expected behavior regarding PageNumber values in Vertex AI Search?
Thank you in advance for any insights.
Search Code Ref :
response = client.search(
    request=discoveryengine.SearchRequest(
        query=user_query,
        # filter=f"category: ANY(\"{filter}\")",
        page_size=10,
        serving_config=serving_config,
        content_search_spec=discoveryengine.SearchRequest.ContentSearchSpec(
            extractive_content_spec=discoveryengine.SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
                max_extractive_segment_count=4,
                return_extractive_segment_score=True,
                num_previous_segments=1,
                num_next_segments=1,
            )
        ),
        query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(
            condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO,
            pin_unexpanded_results=True,
        ),
        spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)
print(response)
SearchPager Ref :
SearchPager<results {
  id: "doc-164"
  document {
    name: "projects/REPLACE VALUE/locations/global/collections/default_collection/dataStores/REPLACE VALUE_1733723908768/branches/0/documents/doc-164"
    id: "doc-164"
    struct_data {
      fields {
        key: "title"
        value {
          string_value: "REPLACE VALUE.pdf"
        }
      }
      fields {
        key: "start_year"
        value {
          string_value: "2020"
        }
      }
      fields {
        key: "model"
        value {
          list_value {
            values {
              string_value: "VENUE"
            }
            values {
              string_value: "ALL"
            }
          }
        }
      }
      fields {
        key: "end_year"
        value {
          string_value: "Now"
        }
      }
      fields {
        key: "category"
        value {
          list_value {
            values {
              string_value: "QX1.6"
            }
            values {
              string_value: "ALL"
            }
          }
        }
      }
    }
    derived_struct_data {
      fields {
        key: "link"
        value {
          string_value: "gs://REPLACE VALUE.pdf"
        }
      }
      fields {
        key: "extractive_segments"
        value {
          list_value {
            values {
              struct_value {
                fields {
                  key: "relevanceScore"
                  value {
                    number_value: 0.83465969562530518
                  }
                }
                fields {
                  key: "id"
                  value {
                    string_value: "c1"
                  }
                }
                fields {
                  key: "content"
                  value {
                    string_value: "REPLACE VALUE"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Hi @MaxChen1126,
Welcome to Google Cloud Community!
It sounds like you're working on an interesting project with Vertex AI Search, and I can understand why you'd be concerned about the missing PageNumber value in your search results. The PageNumber can be crucial for understanding the context of a search result, especially when dealing with large documents or datasets. To answer your questions:
1. Under what conditions should the PageNumber value be present in the search results?
In general, PageNumber should appear in search results when:
2. Are there specific scenarios or document types where PageNumber information might not be available?
Yes, there are some cases where the PageNumber may not be available or may not be automatically included in the search results:
3. Has anyone else encountered similar behavior or could explain the expected behavior regarding PageNumber values in Vertex AI Search?
Yes, this is a known issue that some users have encountered when working with large or segmented documents in Vertex AI Search. The issue can often be traced back to how the documents are structured and how metadata, including PageNumber, is handled during ingestion. Here are the common causes:
To troubleshoot and resolve the issue, make sure that your search request explicitly asks for the PageNumber field in its return parameters when you query the search index. Some fields might be excluded by default, so you need to request PageNumber explicitly in the response. You can check this document to learn more about the concept of indexing.
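As a first troubleshooting step, it can help to confirm exactly which fields each extractive entry carries. In the response posted above, the extractive_segments entry contains only relevanceScore, id and content. The snippet below is a minimal sketch that works on a plain Python dict shaped like that derived_struct_data block. The helper name find_page_numbers and the key name "pageNumber" are assumptions made for illustration, not confirmed field names; adjust them to whatever your response actually returns.

```python
from typing import Any, Dict, List, Optional

# ASSUMPTION: "pageNumber" is a guess at the key under which a page would be
# reported; it does not appear in the response posted in this thread.
PAGE_KEY = "pageNumber"

# Lists inside derived_struct_data that may hold extracted content.
# "extractive_segments" comes from the posted response; "extractive_answers"
# is included as a second place to look (also an assumption here).
CONTENT_LISTS = ("extractive_segments", "extractive_answers")


def find_page_numbers(derived: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Report, for every extractive entry, whether a page number is present."""
    report = []
    for list_name in CONTENT_LISTS:
        for entry in derived.get(list_name, []):
            page = entry.get(PAGE_KEY)
            report.append({
                "source": list_name,
                "id": entry.get("id"),
                # None means the entry has no page number field at all.
                "page": None if page is None else str(page),
            })
    return report


# Dict shaped like the derived_struct_data in the question (score shortened).
sample = {
    "link": "gs://REPLACE VALUE.pdf",
    "extractive_segments": [
        {"relevanceScore": 0.83, "id": "c1", "content": "REPLACE VALUE"},
    ],
}

for row in find_page_numbers(sample):
    print(row)
```

If every row reports page as None, the field is absent from the response itself rather than lost in your own parsing. How to convert the real response object into a plain dict depends on the client library version, so that step is left out of this sketch.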
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.