
Struggling to build a simple RAG solution

I'm trying to build a solution that accomplishes the following:

  • Passes text files from a GCS bucket to the embeddings API (I think the files will need to be chunked first? Not sure.)
  • Saves the returned embeddings into a .json file in the same GCS bucket
  • Loads the .json file into Vector search
  • Allows me to have multi-turn conversations with my data

So I guess the first question is, are the steps I've listed above the appropriate steps to build a RAG solution from data in a GCS bucket?
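For step 1, here's the naive chunking and output shape I had in mind -- purely a sketch on my part (chunk size and overlap are guesses, embed() is a stand-in for the embeddings API call, and I believe Vector Search wants one JSON object with "id" and "embedding" per line, but correct me if I'm wrong):

```python
import json

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size character windows (naive)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

def to_vector_search_jsonl(chunks, embed):
    """Build JSONL records for Vector Search: one {"id", "embedding"} object per line.

    embed is a callable turning one chunk into a list of floats
    (stand-in for the Vertex AI embeddings API).
    """
    lines = []
    for i, chunk in enumerate(chunks):
        lines.append(json.dumps({"id": str(i), "embedding": embed(chunk)}))
    return "\n".join(lines)
```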

I've gone through several notebooks in the Google Gen AI GitHub repo. I can get those to work just fine, but I can't seem to get anywhere when I attempt to customize them to accomplish what I've listed above. Is anyone aware of good step-by-step documentation or code samples that do what I'm trying to do?


Hi Jason -- Google Cloud has a couple of different offerings for building a RAG app. Based on your description above, Vertex AI Search & Conversation (VASC) might be a good pick. (This product previously went by a few other names: Discovery Engine, Gen AI App Builder, Enterprise Search.)

  • First, create a VASC Data Store. You give VASC a Cloud Storage bucket with your text files; no need to pre-chunk them yourself. VASC takes care of processing and creating the underlying vector embeddings in your Data Store. (Docs)
  • Then you can write code to query your VASC Data Store with regular text queries. This is the retrieval step that augments your prompt for RAG.
  • Once you get back search results and augment your prompt, you can use the Gemini API on Vertex for the multi-turn chat. (Docs)

Here's a code example in Java Spring, from some recent experimentation I did.


import com.google.cloud.discoveryengine.v1.SearchRequest;
import com.google.cloud.discoveryengine.v1.SearchResponse;
import com.google.cloud.discoveryengine.v1.SearchServiceClient;
import com.google.cloud.discoveryengine.v1.SearchServiceSettings;
import com.google.cloud.discoveryengine.v1.ServingConfigName;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.preview.ChatSession;
import com.google.cloud.vertexai.generativeai.preview.GenerativeModel;
import com.google.cloud.vertexai.generativeai.preview.ResponseHandler;
// Protobuf Struct types used when unpacking search results:
import com.google.protobuf.ListValue;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.util.Map;
...

 @PostMapping(value = "/chat", consumes = "application/json", produces = "application/json")
    public ChatMessage message(@RequestBody ChatMessage message) {
        String userPrompt = message.getPrompt();
        logger.info("💬 POST /chat, prompt: " + userPrompt);
        // 1 - Query Vertex AI Search (VASC aka discoveryengine API) for matching
        // documents
        String projectId = "YOUR_PROJECT_ID";
        String location = "global";
        String collectionId = "default_collection";
        String dataStoreId = "YOUR_VASC_DATASTORE";
        String servingConfigId = "default_search";
        String searchQuery = userPrompt;
        String endpoint = "discoveryengine.googleapis.com:443"; // the "global" location uses the regionless endpoint
        String augment = "";
        // try-with-resources so the client is closed when we're done
        try (SearchServiceClient searchServiceClient = SearchServiceClient.create(
                SearchServiceSettings.newBuilder().setEndpoint(endpoint).build())) {
            SearchRequest request = SearchRequest.newBuilder()
                    .setServingConfig(
                            ServingConfigName.formatProjectLocationCollectionDataStoreServingConfigName(
                                    projectId, location, collectionId, dataStoreId, servingConfigId))
                    .setQuery(searchQuery)
                    .setPageSize(10)
                    .build();
            SearchResponse response = searchServiceClient.search(request).getPage().getResponse();
            for (SearchResponse.SearchResult element : response.getResultsList()) {
                Struct derivedStructData = element.getDocument().getDerivedStructData();
                Map<String, Value> fields = derivedStructData.getFieldsMap();
                Value extractiveAnswersValue = fields.get("extractive_answers");
                if (extractiveAnswersValue == null) {
                    continue; // result has no extractive answers
                }
                ListValue listValue = extractiveAnswersValue.getListValue();
                Value firstValue = listValue.getValues(0);
                Struct structValue = firstValue.getStructValue();
                Map<String, Value> innerFields = structValue.getFieldsMap();
                Value contentValue = innerFields.get("content");
                String stringValue = contentValue.getStringValue();
                augment += stringValue;
            }
        } catch (Exception e) {
            logger.error("⚠️ Vertex AI ERROR: " + e);
        }
        // 2 - Use augmented prompt to query Gemini (Vertex AI API)
        String geminiPrompt = "You are a helpful car manual chatbot. Answer the car owner's question about their car. Human prompt: "
                + userPrompt
                + ",\n Use the following grounding data as context. This came from the relevant vehicle owner's manual: "
                + augment;
        logger.info("🔮 GEMINI PROMPT: " + geminiPrompt);
        String geminiLocation = "us-central1";
        String modelName = "gemini-pro";
        try {
            VertexAI vertexAI = new VertexAI(projectId, geminiLocation);
            GenerateContentResponse response;
            GenerativeModel model = new GenerativeModel(modelName, vertexAI);
            ChatSession chatSession = new ChatSession(model);
            response = chatSession.sendMessage(geminiPrompt);
            String strResp = ResponseHandler.getText(response);
            logger.info("🔮 GEMINI RESPONSE: " + strResp);
            message.setResponse(strResp);
        } catch (Exception e) {
            logger.error("⚠️ GEMINI ERROR: " + e);
        }
        return message;
    }
}

 

Hi,

Using VASC from the console, it's possible to configure the search type with "search with a response" in the widget configuration tab.

Is it possible to enable this feature through the SearchServiceClient? The SearchResponse object has a getSummary() method, but it returns an empty string.

Thanks

[SOLVED]
Looking at the curl call in the Integration section, I found the missing parameters. Here's the example code (prompt holds the preamble string; with the summary spec set, the generated answer comes back via response.getSummary().getSummaryText()):

 

SearchRequest request = SearchRequest.newBuilder()
        .setServingConfig(
                ServingConfigName.formatProjectLocationCollectionDataStoreServingConfigName(
                        projectId, location, collectionId, dataStoreId, servingConfigId))
        .setQuery(searchQuery)
        .setPageSize(5)
        .setQueryExpansionSpec(SearchRequest.QueryExpansionSpec.newBuilder()
                .setCondition(SearchRequest.QueryExpansionSpec.Condition.AUTO)
                .build())
        .setContentSearchSpec(SearchRequest.ContentSearchSpec.newBuilder()
                .setSummarySpec(SearchRequest.ContentSearchSpec.SummarySpec.newBuilder()
                        .setSummaryResultCount(5)
                        // note: build(), not getDefaultInstanceForType() --
                        // the latter returns an empty message and discards the preamble
                        .setModelPromptSpec(SearchRequest.ContentSearchSpec.SummarySpec.ModelPromptSpec.newBuilder()
                                .setPreamble(prompt)
                                .build())
                        .setModelSpec(SearchRequest.ContentSearchSpec.SummarySpec.ModelSpec.newBuilder()
                                .setVersion("preview")
                                .build())
                        .setIncludeCitations(true)
                        .setIgnoreAdversarialQuery(true)
                        .build())
                .build())
        .build();

 


You can also use Vector Search with LangChain, but in my tests it got much more expensive than Search & Conversation.
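If anyone wants to compare for themselves, here's a rough sketch of the LangChain + Vector Search route. Caveat: the class and parameter names below are from the langchain-google-vertexai package as I remember them, and every ID is a placeholder -- treat the whole thing as an assumption and check the current package docs before using it.

```python
# Sketch only -- assumes an existing Vector Search index + endpoint and a
# staging bucket; names may have drifted, verify against the current docs.
from langchain_google_vertexai import VertexAIEmbeddings, VectorSearchVectorStore

embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@003")

store = VectorSearchVectorStore.from_components(
    project_id="YOUR_PROJECT_ID",
    region="us-central1",
    gcs_bucket_name="YOUR_STAGING_BUCKET",
    index_id="YOUR_INDEX_ID",
    endpoint_id="YOUR_ENDPOINT_ID",
    embedding=embeddings,
)

# Retrieval step: top-k chunks to paste into the LLM prompt.
docs = store.similarity_search("how do I reset the oil light?", k=4)
```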

I'm using Vertex AI generic search with structured data. The moment I use order_by, the results go haywire.

Could anyone steer me in the right direction? As soon as I add order_by on a numeric field, the results go haywire and everything seems to be returned, in no particular order. modelYear below is a numeric field. In the sample below I also tried boosting, but that influences everything as well.

# Refer to the `SearchRequest` reference for all supported fields:
# https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.types.SearchRequest
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query=search_query,
    page_size=10,
    content_search_spec=content_search_spec,
    query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(
        condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.DISABLED,
    ),
    spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(
        mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.MODE_UNSPECIFIED,
    ),
    # Optional: Boost search results based on conditions
    boost_spec=discoveryengine.SearchRequest.BoostSpec(
        condition_boost_specs=[
            discoveryengine.SearchRequest.BoostSpec.ConditionBoostSpec(
                condition="modelYear = 2023",
                boost=1,
            ),
        ]
    ),
    # Optional: Use fine-tuned model for this request
    # custom_fine_tuning_spec=discoveryengine.CustomFineTuningSpec(
    #     enable_search_adaptor=True
    # ),
    # order_by="modelYear desc",
    # filter="modelYear = 2025",
)
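Not a fix, but one way to isolate the problem: issue a stripped-down request with only order_by set, so boosting, query expansion, and spell correction can't re-rank anything. The resource names below are placeholders. If the results are still unordered with everything else removed, the issue is more likely how the field is treated in the data store schema than the request itself.

```python
from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder resource names -- substitute your own project and data store.
serving_config = (
    "projects/YOUR_PROJECT_ID/locations/global/collections/default_collection/"
    "dataStores/YOUR_DATASTORE/servingConfigs/default_search"
)

# Minimal request: only the query plus order_by on the numeric field.
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="suv",
    page_size=10,
    order_by="modelYear desc",
)
```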