
Struggling to build a simple RAG solution

I'm trying to build a solution that accomplishes the following:

  • Passes text files from a GCS bucket to the embeddings API (I think the files will need to be chunked first? Not sure.)
  • Saves the returned embeddings into a .json file in the same GCS bucket
  • Loads the .json file into Vector search
  • Allows me to have multi-turn conversations with my data

So I guess the first question is, are the steps I've listed above the appropriate steps to build a RAG solution from data in a GCS bucket?
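For step 1, here's the naive chunking and output shape I had in mind -- purely a sketch on my part (chunk size and overlap are guesses, embed() is a stand-in for the embeddings API call, and I believe Vector Search wants one JSON object with "id" and "embedding" per line, but correct me if I'm wrong):

```python
import json

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size character windows (naive)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

def to_vector_search_jsonl(chunks, embed):
    """Build JSONL records for Vector Search: one {"id", "embedding"} object per line.

    embed is a callable turning one chunk into a list of floats
    (stand-in for the Vertex AI embeddings API).
    """
    lines = []
    for i, chunk in enumerate(chunks):
        lines.append(json.dumps({"id": str(i), "embedding": embed(chunk)}))
    return "\n".join(lines)
```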

I've gone through several notebooks in the Google Gen AI GitHub repo. I can get those to work just fine, but I can't seem to get anywhere when I attempt to customize them to accomplish what I've listed above. Is anyone aware of good step-by-step documentation or code samples that do what I'm trying to do?


Hi Jason -- Google Cloud has a couple of different offerings for building a RAG app. Based on your description above, Vertex AI Search & Conversation (VASC) might be a good pick. (This product previously went by a few other names: Discovery Engine, Gen AI App Builder, Enterprise Search.)

  • First, create a VASC Data Store. You give VASC a Cloud Storage bucket with your text files; no need to pre-chunk them yourself. VASC takes care of processing and creating the underlying vector embeddings in your Data Store. (Docs)
  • Then you can write code to query your VASC Data Store with regular text queries. This is the retrieval step that augments your prompt for RAG.
  • Once you get back search results and augment your prompt, you can use the Gemini API on Vertex for the multi-turn chat. (Docs)

Here's a code example in Java Spring, from some recent experimentation I did.


import com.google.cloud.discoveryengine.v1.SearchRequest;
import com.google.cloud.discoveryengine.v1.SearchResponse;
import com.google.cloud.discoveryengine.v1.SearchServiceClient;
import com.google.cloud.discoveryengine.v1.SearchServiceSettings;
import com.google.cloud.discoveryengine.v1.ServingConfigName;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.preview.ChatSession;
import com.google.cloud.vertexai.generativeai.preview.GenerativeModel;
import com.google.cloud.vertexai.generativeai.preview.ResponseHandler;
// Protobuf Struct types used when unpacking search results:
import com.google.protobuf.ListValue;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.util.Map;
...

 @PostMapping(value = "/chat", consumes = "application/json", produces = "application/json")
    public ChatMessage message(@RequestBody ChatMessage message) {
        String userPrompt = message.getPrompt();
        logger.info("💬 POST /chat, prompt: " + userPrompt);
        // 1 - Query Vertex AI Search (VASC aka discoveryengine API) for matching
        // documents
        String projectId = "YOUR_PROJECT_ID";
        String location = "global";
        String collectionId = "default_collection";
        String dataStoreId = "YOUR_VASC_DATASTORE";
        String servingConfigId = "default_search";
        String searchQuery = userPrompt;
        String endpoint = "discoveryengine.googleapis.com:443"; // the "global" location uses the regionless endpoint
        String augment = "";
        // try-with-resources so the client is closed when we're done
        try (SearchServiceClient searchServiceClient = SearchServiceClient.create(
                SearchServiceSettings.newBuilder().setEndpoint(endpoint).build())) {
            SearchRequest request = SearchRequest.newBuilder()
                    .setServingConfig(
                            ServingConfigName.formatProjectLocationCollectionDataStoreServingConfigName(
                                    projectId, location, collectionId, dataStoreId, servingConfigId))
                    .setQuery(searchQuery)
                    .setPageSize(10)
                    .build();
            SearchResponse response = searchServiceClient.search(request).getPage().getResponse();
            for (SearchResponse.SearchResult element : response.getResultsList()) {
                Struct derivedStructData = element.getDocument().getDerivedStructData();
                Map<String, Value> fields = derivedStructData.getFieldsMap();
                Value extractiveAnswersValue = fields.get("extractive_answers");
                if (extractiveAnswersValue == null) {
                    continue; // result has no extractive answers
                }
                ListValue listValue = extractiveAnswersValue.getListValue();
                Value firstValue = listValue.getValues(0);
                Struct structValue = firstValue.getStructValue();
                Map<String, Value> innerFields = structValue.getFieldsMap();
                Value contentValue = innerFields.get("content");
                String stringValue = contentValue.getStringValue();
                augment += stringValue;
            }
        } catch (Exception e) {
            logger.error("⚠️ Vertex AI ERROR: " + e);
        }
        // 2 - Use augmented prompt to query Gemini (Vertex AI API)
        String geminiPrompt = "You are a helpful car manual chatbot. Answer the car owner's question about their car. Human prompt: "
                + userPrompt
                + ",\n Use the following grounding data as context. This came from the relevant vehicle owner's manual: "
                + augment;
        logger.info("🔮 GEMINI PROMPT: " + geminiPrompt);
        String geminiLocation = "us-central1";
        String modelName = "gemini-pro";
        try {
            VertexAI vertexAI = new VertexAI(projectId, geminiLocation);
            GenerateContentResponse response;
            GenerativeModel model = new GenerativeModel(modelName, vertexAI);
            ChatSession chatSession = new ChatSession(model);
            response = chatSession.sendMessage(geminiPrompt);
            String strResp = ResponseHandler.getText(response);
            logger.info("🔮 GEMINI RESPONSE: " + strResp);
            message.setResponse(strResp);
        } catch (Exception e) {
            logger.error("⚠️ GEMINI ERROR: " + e);
        }
        return message;
    }
}

 

Hi,

Using VASC from the console, it's possible to configure the search type with "search with a response" in the widget configuration tab.

Is it possible to enable this feature through the SearchServiceClient? The SearchResponse object has a getSummary() method, but it returns an empty string.

Thanks

[SOLVED]
Looking at the curl call in the Integration section, I found the missing parameters. Here's the example code (prompt holds the preamble string; with the summary spec set, the generated answer comes back via response.getSummary().getSummaryText()):

 

SearchRequest request = SearchRequest.newBuilder()
        .setServingConfig(
                ServingConfigName.formatProjectLocationCollectionDataStoreServingConfigName(
                        projectId, location, collectionId, dataStoreId, servingConfigId))
        .setQuery(searchQuery)
        .setPageSize(5)
        .setQueryExpansionSpec(SearchRequest.QueryExpansionSpec.newBuilder()
                .setCondition(SearchRequest.QueryExpansionSpec.Condition.AUTO)
                .build())
        .setContentSearchSpec(SearchRequest.ContentSearchSpec.newBuilder()
                .setSummarySpec(SearchRequest.ContentSearchSpec.SummarySpec.newBuilder()
                        .setSummaryResultCount(5)
                        // note: build(), not getDefaultInstanceForType() --
                        // the latter returns an empty message and discards the preamble
                        .setModelPromptSpec(SearchRequest.ContentSearchSpec.SummarySpec.ModelPromptSpec.newBuilder()
                                .setPreamble(prompt)
                                .build())
                        .setModelSpec(SearchRequest.ContentSearchSpec.SummarySpec.ModelSpec.newBuilder()
                                .setVersion("preview")
                                .build())
                        .setIncludeCitations(true)
                        .setIgnoreAdversarialQuery(true)
                        .build())
                .build())
        .build();

 


You can also use Vector Search with LangChain, but in my tests it got much more expensive than Search & Conversation.
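If anyone wants to compare for themselves, here's a rough sketch of the LangChain + Vector Search route. Caveat: the class and parameter names below are from the langchain-google-vertexai package as I remember them, and every ID is a placeholder -- treat the whole thing as an assumption and check the current package docs before using it.

```python
# Sketch only -- assumes an existing Vector Search index + endpoint and a
# staging bucket; names may have drifted, verify against the current docs.
from langchain_google_vertexai import VertexAIEmbeddings, VectorSearchVectorStore

embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@003")

store = VectorSearchVectorStore.from_components(
    project_id="YOUR_PROJECT_ID",
    region="us-central1",
    gcs_bucket_name="YOUR_STAGING_BUCKET",
    index_id="YOUR_INDEX_ID",
    endpoint_id="YOUR_ENDPOINT_ID",
    embedding=embeddings,
)

# Retrieval step: top-k chunks to paste into the LLM prompt.
docs = store.similarity_search("how do I reset the oil light?", k=4)
```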

I'm using Vertex AI generic search with structured data. The moment I use order_by, the results go haywire.

Could anyone steer me in the right direction? As soon as I add order_by on a numeric field, the results go haywire and everything seems to be returned, in no particular order. modelYear below is a numeric field. In the sample below I also tried boosting, but that influences everything as well.

# Refer to the `SearchRequest` reference for all supported fields:
# https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.types.SearchRequest
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query=search_query,
    page_size=10,
    content_search_spec=content_search_spec,
    query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(
        condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.DISABLED,
    ),
    spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(
        mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.MODE_UNSPECIFIED,
    ),
    # Optional: Boost search results based on conditions
    boost_spec=discoveryengine.SearchRequest.BoostSpec(
        condition_boost_specs=[
            discoveryengine.SearchRequest.BoostSpec.ConditionBoostSpec(
                condition="modelYear = 2023",
                boost=1,
            ),
        ]
    ),
    # Optional: Use fine-tuned model for this request
    # custom_fine_tuning_spec=discoveryengine.CustomFineTuningSpec(
    #     enable_search_adaptor=True
    # ),
    # order_by="modelYear desc",
    # filter="modelYear = 2025",
)
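Not a fix, but one way to isolate the problem: issue a stripped-down request with only order_by set, so boosting, query expansion, and spell correction can't re-rank anything. The resource names below are placeholders. If the results are still unordered with everything else removed, the issue is more likely how the field is treated in the data store schema than the request itself.

```python
from google.cloud import discoveryengine_v1 as discoveryengine

# Placeholder resource names -- substitute your own project and data store.
serving_config = (
    "projects/YOUR_PROJECT_ID/locations/global/collections/default_collection/"
    "dataStores/YOUR_DATASTORE/servingConfigs/default_search"
)

# Minimal request: only the query plus order_by on the numeric field.
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="suv",
    page_size=10,
    order_by="modelYear desc",
)
```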