
RAG | Vector Search | Vertex AI Search | Grounding

My use case:

  • Uploading 500 GB of documents (structured and unstructured) into a vector DB/data store.
  • Ability to query on provided document_ids only (to make it multi-user).
  • Conversational style (a thread of sorts; if a user comes back after a year, they must be able to continue that thread if they want).
  • Importing documents: a Pub/Sub implementation for real-time ingestion updates for each document.

What I've tried

  • Vector Search: https://cloud.google.com/vertex-ai/generative-ai/docs/use-vertexai-vector-search
    • It's costly: I need to keep a machine on even when no user is querying. It doesn't support xlsx or larger document sizes. I can't create an index with anything other than 768-dim embeddings because the corpus doesn't support it. It doesn't provide page numbers in retrieval.
    • A few observations:
      • To reindex a file with the same name, use the file path instead of the folder path in GCS.
      • A corpus with the same name can be created.
      • I don't know how many vectors an index can hold (10B according to the pricing calculator).
      • No machine is assigned when an index is deployed (I don't know what happened exactly).
  • Vertex AI Search X RAG: Colab Docs 
    • Since the Vector Search index was costly for me, I switched to data stores in Vertex AI Search as the backend. But RAG here doesn't support any tweaking with data stores, such as importing through the corpus, listing file_ids, etc. The docs mention that one can pass rag_file_ids so it searches only those vectors, but instead it throws this error:

      Rag file ids are not supported for Vertex AI Search

      What's the purpose of using a corpus then?

  • It created a schema automatically, which is visible in the activity on the console. How can I make use of this schema? Can I put my metadata in it to filter my file IDs? (Please don't just share links; give some explanation as well.)
  • Is there any cost for creating a corpus if I have attached a data store to it?
  • Vertex AI Search x Grounding API: Docs

import vertexai

from vertexai.preview.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Tool,
    grounding,
)

# TODO(developer): Update and un-comment below lines
PROJECT_ID = "PRO"
data_store_id = "PRO_ID"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

tool = Tool.from_retrieval(
    grounding.Retrieval(
        grounding.VertexAISearch(
            datastore=data_store_id,
            project=PROJECT_ID,
            location="global",
        )
    )
)

prompt = "What is the name of the company?"
response = model.generate_content(
    prompt,
    tools=[tool],
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)

print(response.text)
print(response)

  • This works quite well, but I want to achieve the rag_file_ids functionality. Is it possible here?
  • How can I apply Pub/Sub to the data store import?
  • What's the best way to achieve conversational style? (multi-turn?)
  • How do I calculate cost for this combo, Vertex AI Search x Grounding API? (The pricing page has two kinds, so I am confused.)
  • Miscellaneous
    • Can I create a persona for each ingested file? If yes, provide some snippet example. [https://cloud.google.com/generative-ai-app-builder/docs/filter-search-metadata]
    • Do Conversational Agents help with my use case, with Generative Fallback?
    • How does this work for rag_file_ids? What is the grounding multi-turn cost for this? [data store + input and output tokens?]
    • [image: hardikitis_0-1738318499178.png]
    • What's the difference between the third and fourth rows in the image? Share links to both from your docs.
    • Is the Grounding API built into the Grounded Generation API? When should each be used, and which one corresponds to the links shared above?

    • Recommend what I should use.

The miscellaneous questions may be vague as I haven't deep-dived into them. Thanks in advance.

3 REPLIES

Hi @hardikitis,

Welcome to the Google Cloud Community!

I see you have a detailed set of questions regarding your document search implementation using Vertex AI Search and related services. This includes vector search, data stores, the Grounding API, and Pub/Sub integration. I understand you've also included details on cost, multi-user access, and conversational aspects. Let's go through each of your questions for possible solutions.

1. What's the purpose of using a corpus?

When using Vertex AI Search data stores with the RAG pipeline, direct filtering based on rag_file_ids is not supported. The corpus is the underlying storage and indexing mechanism. Even if you create a data store, the corpus is still there. Data stores provide an abstraction layer to interact with the corpus, not a method to bypass it.

2. Automatic Schema

  • Schema Usage - The schema, visible in the console activity, helps you understand how your documents are indexed and structured. You can define which metadata fields should be indexed.
  • Metadata for Filtering - Since rag_file_ids isn't supported, you can achieve a similar result by adding a file ID to the document's metadata. Vertex AI Search uses this metadata to filter documents. By including a file ID as part of the metadata, you can imitate the functionality of rag_file_ids effectively (a filter sketch appears under question 4 below).

3. Corpus Costs with Data Store

In your scenario, when you attach your data store to the corpus and index your documents, you are not bypassing costs. The underlying corpus is what's being used to store and process your documents. The cost will depend on the amount of indexed data, the amount of storage, and the number of queries you perform.

4. Vertex AI Search x Grounding API and rag_file_ids

While you can't use rag_file_ids directly, you can achieve the same outcome by using the alternative approach we discussed:

  • Add file_id as Metadata - Make sure that when you ingest a document, you include its file ID as a metadata field, for example metadata.file_id.
  • Filter on file_id Metadata During Search - When you formulate your search request, include a filter clause that targets the file_id metadata field so that only the desired documents are considered, based on the ID(s). See the sketch right after this list.
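
A minimal sketch of that filtering approach, going through the Vertex AI Search API directly. It assumes a hypothetical metadata field named file_id that has been marked indexable and filterable in the data store schema; the project, data store, and document IDs are placeholders.

from google.cloud import discoveryengine_v1 as discoveryengine

PROJECT_ID = "your-project"          # placeholder
DATA_STORE_ID = "your-data-store"    # placeholder

client = discoveryengine.SearchServiceClient()

# Serving config of the data store (not an engine/app).
serving_config = client.serving_config_path(
    project=PROJECT_ID,
    location="global",
    data_store=DATA_STORE_ID,
    serving_config="default_search",
)

request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="What is the name of the company?",
    # Restrict retrieval to specific documents via their metadata,
    # imitating rag_file_ids. file_id is a hypothetical structData field.
    filter='file_id: ANY("doc-001", "doc-002")',
    page_size=10,
)

for result in client.search(request):
    print(result.document.id, result.document.struct_data)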

5. Pub/Sub for Data Store Import

Create a Pub/Sub topic to publish updates about new documents. Then, set up a Cloud Function subscriber that listens for these messages. When a new document is uploaded or updated, the Cloud Function is triggered, and it uses the Vertex AI Search API to ingest, update, or delete the document in your data store. 
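
A minimal sketch of that flow, assuming a 2nd-gen Cloud Function with a Pub/Sub trigger and a publisher that sends a JSON payload like {"gcs_uri": "gs://bucket/file.pdf"}. The project/data store IDs and the message format are placeholders; adapt them to your setup.

import base64
import json

import functions_framework
from google.cloud import discoveryengine_v1 as discoveryengine

PROJECT_ID = "your-project"          # placeholder
DATA_STORE_ID = "your-data-store"    # placeholder


@functions_framework.cloud_event
def import_document(cloud_event):
    # Pub/Sub delivers the publisher's message base64-encoded.
    payload = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))

    client = discoveryengine.DocumentServiceClient()
    parent = client.branch_path(
        project=PROJECT_ID,
        location="global",
        data_store=DATA_STORE_ID,
        branch="default_branch",
    )

    operation = client.import_documents(
        request=discoveryengine.ImportDocumentsRequest(
            parent=parent,
            gcs_source=discoveryengine.GcsSource(
                input_uris=[payload["gcs_uri"]],
                data_schema="content",
            ),
            # INCREMENTAL adds/updates documents without wiping the rest.
            reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
        )
    )
    print(f"Import started for {payload['gcs_uri']}: {operation.operation.name}")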

6. Conversational Style (Multi-Turn)

Store previous prompts and responses for each user/thread in a database or cache. Then, include this history in your next prompt. Be mindful of context window limits.
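
As a rough sketch, the stored turns can be rehydrated into a chat session like this. The load_history and save_history helpers are hypothetical placeholders for whatever per-user/per-thread storage you choose (Firestore, Cloud SQL, etc.).

import vertexai
from vertexai.preview.generative_models import Content, GenerativeModel, Part

vertexai.init(project="your-project", location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")


def answer(thread_id: str, user_message: str) -> str:
    # Rebuild the chat from previously stored turns ({"role": ..., "text": ...}).
    history = [
        Content(role=turn["role"], parts=[Part.from_text(turn["text"])])
        for turn in load_history(thread_id)  # hypothetical storage helper
    ]
    chat = model.start_chat(history=history)
    response = chat.send_message(user_message)

    # Persist both turns so the thread can be resumed later (even a year on).
    save_history(thread_id, user_message, response.text)  # hypothetical helper
    return response.text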

7. Cost Calculation (Vertex AI Search x Grounding API)

  • Input prompt - Cost of processing your input prompt for the grounding, which includes the grounding facts. This is based on character count.
  • Output - Cost of the output generated by the model. This is also based on character count.
  • Grounded Generation for grounding on your own retrieved data - This is the cost for using the grounding functionality to generate grounded answers.
  • Data Retrieval: Vertex AI Search (Enterprise edition) - This is the query cost for the retrieval of documents by Vertex AI Search.

These costs are usually per 1,000 requests. The input prompt cost relates to the model's processing of the prompt, while data retrieval is the request to Vertex AI Search.
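
As a back-of-the-envelope model only (every rate below is a placeholder to be filled in from the current pricing pages, which change over time), the per-1,000-request cost for this combo can be estimated roughly like this:

# All rates are PLACEHOLDERS -- copy the current values from the Vertex AI and
# Vertex AI Search pricing pages before relying on any number this produces.
GEMINI_INPUT_RATE_PER_1K_CHARS = 0.0     # model input price
GEMINI_OUTPUT_RATE_PER_1K_CHARS = 0.0    # model output price
SEARCH_QUERY_RATE_PER_1K_REQUESTS = 0.0  # Vertex AI Search (Enterprise) queries
GROUNDING_RATE_PER_1K_REQUESTS = 0.0     # grounded generation / grounding


def cost_per_1k_requests(avg_input_chars: float, avg_output_chars: float) -> float:
    """Rough cost of 1,000 grounded requests, excluding data store storage."""
    model_cost = 1000 * (
        avg_input_chars / 1000 * GEMINI_INPUT_RATE_PER_1K_CHARS
        + avg_output_chars / 1000 * GEMINI_OUTPUT_RATE_PER_1K_CHARS
    )
    return model_cost + SEARCH_QUERY_RATE_PER_1K_REQUESTS + GROUNDING_RATE_PER_1K_REQUESTS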

8. File Personas (Metadata)

Yes, you can create personas by adding metadata such as "file_category," "author," "user_group". During querying, you can use filters on these metadata fields to tailor results to specific use cases. 
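
For example, a persona can be attached to each file through the metadata JSONL used at import time. The field names below (file_id, persona, user_group) are illustrative only, and whichever fields you want to filter on must be marked indexable/filterable in the schema.

import json

docs = [
    {
        "id": "doc-001",
        "structData": {"file_id": "doc-001", "persona": "legal", "user_group": "team-a"},
        "content": {"mimeType": "application/pdf", "uri": "gs://your-bucket/contracts/msa.pdf"},
    },
    {
        "id": "doc-002",
        "structData": {"file_id": "doc-002", "persona": "finance", "user_group": "team-b"},
        "content": {"mimeType": "application/pdf", "uri": "gs://your-bucket/reports/q4.pdf"},
    },
]

# Write the metadata JSONL that the unstructured-data import expects, upload it
# to GCS, and point the data store import at it.
with open("metadata.jsonl", "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# At query time, filter on those fields, e.g.:
#   filter='persona: ANY("legal") AND user_group: ANY("team-a")'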

9. Conversational Agents and Generative Fallback

Conversational Agents could be useful if you intend to expand the application beyond document Q&A. For this particular use case, you would need to customize it a lot, so grounding is better.

Generative Fallback - Useful when search results have low relevancy; you can then provide an answer using the model.

10. Grounding Multi-Turn and Cost

  • Multi-Turn - The Grounding API uses the history of your conversation as the context for the next request.
  • Cost - The cost works the same as for single requests, but it is based on the entire input prompt (including conversation history), the output length, and the number of documents retrieved.

11. Grounded Generation API vs. Grounding API

The Grounding API is what you are using in your snippet; it is not the same as the Grounded Generation API. It uses the underlying search engine to fetch data relevant to your prompt and grounds the response accordingly.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Appreciate your solution, @MJane.

I tried unstructured documents with JSONL while creating the data store and made the id field in structData indexable.

I tried both APIs. First, I implemented the filter method using the Grounded Generation API (for rag_file_ids). Second, for the Grounding API, I wasn't able to include a filter. As you suggested, when ingesting my docs to GCS I put my documentId as metadata for each document, but I couldn't figure out how to filter on it when querying.

The Grounded Generation API uses an engine_id/app_id and the Grounding API uses a data store ID. Since the Grounding API doesn't need an app attached to it, can I save cost by using the Grounding API instead of the Grounded Generation API?

Cost:

Scenario 1: Data store ($5 per GiB) + Layout Parser + Search App, since the Grounded Generation API needs one [search app ($2 standard + $4 because by default it uses the basic LLM? How can I remove that?) + Grounded Generation API ($2.5)] per 1,000 requests = data store + $8.5 per 1,000 requests.

Scenario 2: Data store ($5 per GiB) + Layout Parser + Gemini 1.5 Flash | input prompt: 2,500 chars [$0.3125 per 1,000 requests] + output: Grounding API [$0.150 per 1,000 requests] = data store + $0.4625 per 1,000 requests.

Do both scenarios cover the use case I mentioned earlier, and are the calculations correct?

### Grounded Generation API

from google.cloud import discoveryengine_v1 as discoveryengine

project_number = "80"
engine_id = "unstructured-jsonl-17"

client = discoveryengine.GroundedGenerationServiceClient()

request = discoveryengine.GenerateGroundedContentRequest(
    # The full resource name of the location.
    # Format: projects/{project_number}/locations/{location}
    location=client.common_location_path(project=project_number, location="global"),
    generation_spec=discoveryengine.GenerateGroundedContentRequest.GenerationSpec(
        model_id="gemini-1.5-flash-001",
    ),
    # Conversation between user and model
    contents=[
        discoveryengine.GroundedGenerationContent(
            role="user",
            parts=[
                discoveryengine.GroundedGenerationContent.Part(
                    text="What are the names of CHAPTER VI and CHAPTER VII?"
                )
            ],
        )
    ],
    system_instruction=discoveryengine.GroundedGenerationContent(
        parts=[
            discoveryengine.GroundedGenerationContent.Part(
                text="Don't make up new information by yourself. Add a smiley emoji after the answer."
            )
        ],
    ),
    # What to ground on.
    grounding_spec=discoveryengine.GenerateGroundedContentRequest.GroundingSpec(
        grounding_sources=[
            discoveryengine.GenerateGroundedContentRequest.GroundingSource(
                search_source=discoveryengine.GenerateGroundedContentRequest.GroundingSource.SearchSource(
                    # Restrict grounding to a specific document via a metadata filter.
                    filter='id: ANY("6c310a44-a02f-4210-a48d-53568afca394")',
                    max_result_count=10,
                    # The full resource name of the serving config for a Vertex AI Search app.
                    serving_config=f"projects/{project_number}/locations/global/collections/default_collection/engines/{engine_id}/servingConfigs/default_search",
                ),
            ),
        ]
    ),
)
response = client.generate_grounded_content(request)

# Handle the response
print(response)

"""Output"""

(base) apple@Apples-MacBook-Pro vertex % python xbdg.py 
candidates {
  content {
    role: "model"
    parts {
      text: "The document you provided does not contain the names of CHAPTER VI and CHAPTER VII.  It does, however, contain information about CHAPTER XVII, CHAPTER XXIX, and CHAPTER XXVIII.  😊 \n"
    }
  }
  grounding_score: 0.196530014
  grounding_metadata {
  }
}

Grounding API

### Grounding API (searching the full data store; I want to use documentId as a filter)
### It takes a data store ID, not an engine_id/app_id


import vertexai

from vertexai.preview.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Tool,
    grounding,
)

# TODO(developer): Update and un-comment below lines
PROJECT_ID = "80"
data_store_id = "unstructured-jsonl-17"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

tool = Tool.from_retrieval(
    grounding.Retrieval(
        grounding.VertexAISearch(
            datastore=data_store_id,
            project=PROJECT_ID,
            location="global",
            
        )
    )
)

prompt = "What are the names of CHAPTER VI and CHAPTER VII?"
response = model.generate_content(
    prompt,
    tools=[tool],
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)

print(response.text)
print(response)

"""Output"""

The provided source does not contain information about the names of CHAPTER VI and CHAPTER VII. 

The source does mention CHAPTER X and CHAPTER XVII. 

candidates {
  content {
    role: "model"
    parts {
      text: "The provided source does not contain information about the names of CHAPTER VI and CHAPTER VII. \n\nThe source does mention CHAPTER X and CHAPTER XVII. \n"
    }
  }
  finish_reason: STOP
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
    probability_score: 0.166992188
    severity: HARM_SEVERITY_LOW
    severity_score: 0.245117188
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
    probability_score: 0.34765625
    severity: HARM_SEVERITY_NEGLIGIBLE
    severity_score: 0.099609375
  }
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
    probability_score: 0.20703125
    severity: HARM_SEVERITY_LOW
    severity_score: 0.215820312
  }
  safety_ratings {
    category: HARM_CATEGORY_SEXUALLY_EXPLICIT
    probability: NEGLIGIBLE
    probability_score: 0.27734375
    severity: HARM_SEVERITY_LOW
    severity_score: 0.296875
  }
  grounding_metadata {
    retrieval_queries: "what are the names of CHAPTER VI and CHAPTER VII"
    grounding_chunks {
      retrieved_context {
        uri: "gs://doc-trying1/doc-trying1/Almost Done/CompaniesAct2013.pdf"
        title: "CompaniesAct2013"
        text: "(2) The Central Government may, by rules, prescribe the manner and the intervals in which the internal audit shall be conducted and reported to the Board. # CHAPTER X \n...\n154………….."
      }
    }
    grounding_supports {
      segment {
        start_index: 98
        end_index: 149
        text: "The source does mention CHAPTER X and CHAPTER XVII."
      }
      grounding_chunk_indices: 0
      confidence_scores: 0.617767155
    }
  }
  avg_logprobs: -0.35637292554301603
}
usage_metadata {
  prompt_token_count: 11
  candidates_token_count: 31
  total_token_count: 42
  prompt_tokens_details {
    modality: TEXT
    token_count: 11
  }
  candidates_tokens_details {
    modality: TEXT
    token_count: 31
  }
}
model_version: "gemini-1.5-flash-001"

This is the document. However, neither of the above implementations answers my simple question. I have used Layout Parser + document chunking, yet it isn't able to retrieve a basic chunk that was just normal text in the PDF. How can I make it reliable? [PS: The Grounding API searches the whole data store, as I wasn't able to figure out how to use a filter while querying.]

Pub/Sub for data store import: On the console, when I ingest a metadata.jsonl, the activity log shows 15 docs imported, but it doesn't tell me about each document individually. I suppose it's the same with the APIs. I'm interested in knowing the status of each file instead of 15 docs at once. (I could be completely wrong about all this.)

Miscellaneous:

Since we have all the document metadata and its content indexed (as embeddings, though), is there a way to achieve on-the-fly keyword full-text search similar to Elasticsearch?

tl;dr-

  1. Cost, and how to make both implementations robust?
  2. How to achieve Pub/Sub for the data store at the document level?
  3. How to achieve rag_file_ids-style filtering in the Grounding API?
  4. Can we achieve on-the-fly keyword full-text search similar to Elasticsearch? Maybe some workaround?
  5. If I'm not using a search app and instead use Scenario 2, can I create my own database for session persistence?

@MJane Can you answer?