Dialogflow CX - Manage structured CSV data in Data Stores

Hi Team,

I have a CX agent connected to a Conversation App, which is linked to a data store backed by Cloud Storage, where I have a CSV file. The CSV file has 60+ FAQs and is in the correct format (question, answer). Assume the file below is my CSV (GenAI FAQ.csv), referenced as a structured CSV in the data store.

question,answer
What is GenAI?,Generative Artificial Intelligence refers to AI systems that have the ability to generate new content using techniques like neural networks and machine learning.
How does GenAI work?,GenAI works by training models on large datasets and learning the underlying patterns and structures within the data. These models can then generate new content by extrapolating from the learned patterns.

I need guidance on the following scenarios:
1) At a later stage, if I want to update the question or answer for an existing FAQ, e.g. #1 (What is GenAI?), what is the suggested way to do it?
2) Is there an automated way to import data into the data store? For example, if I upload new FAQs to the storage bucket, can the data store automatically import and train on them?
3) I'm not sure this exists, but is there a way to delete/remove an existing incorrect document from the data store?


Let me reply to your questions:

1) At a later stage, if I want to update the question or answer for an existing FAQ, e.g. #1 (What is GenAI?), what is the suggested way to do it?
You can update your bucket content, and that will trigger a re-index of your data store so it reads the up-to-date information: https://cloud.google.com/dialogflow/vertex/docs/concept/data-store#import

2) Is there an automated way to import data into the data store? For example, if I upload new FAQs to the storage bucket, can the data store automatically import and train on them?
There is an API that you can use to add documents to your data store: https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/projects.locations.co...

3) I'm not sure this exists, but is there a way to delete/remove an existing incorrect document from the data store?
For now, you can delete a single document only through the API: https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/projects.locations.co... In the GUI you can only delete all the documents (purge).
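
For the single-document delete in point 3, here is a minimal sketch of what the request could look like. It only builds the HTTP method and URL and sends nothing. The API host, the `v1alpha` version, the resource-name layout and every name in the example (`my-project`, `faq-store`, `doc-123`) are assumptions for illustration, so please check them against the API reference linked above.

```python
from __future__ import annotations

# Assumed REST root; verify the host and version in the API reference.
API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_delete_request(document_name: str) -> tuple[str, str]:
    """Return (method, url) for deleting one document by its full resource name."""
    if "/documents/" not in document_name:
        raise ValueError("expected a full document resource name")
    return "DELETE", f"{API_ROOT}/{document_name}"


# Hypothetical resource name, for illustration only.
name = (
    "projects/my-project/locations/global/collections/default_collection/"
    "dataStores/faq-store/branches/default_branch/documents/doc-123"
)
method, url = build_delete_request(name)
print(method, url)
```

An authenticated HTTP client (or one of the official client libraries) would then send this request.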

Thanks @xavidop for the prompt response. I will look into it.

Hi @Vinoth1097  

Yes, we can automate the process so that documents are imported into the data store as soon as they arrive in GCS.

I recently worked on the same use case.

Please refer to this module. It also lets you delete documents:

https://github.com/googleapis/google-cloud-node/blob/main/packages/google-cloud-discoveryengine/samp...

It will definitely help you.
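
To give a rough idea of the automation, here is a minimal Python sketch of the step that turns a Cloud Storage upload event into an import request body. It is not taken from the linked module. The body field names (`gcsSource`, `inputUris`, `dataSchema`, `reconciliationMode`) and the `csv` schema value are my assumptions about the import API, and the bucket name is hypothetical. Sending the request from a function triggered by the bucket is left out.

```python
import json


def build_import_request(event: dict, incremental: bool = True) -> dict:
    """Build an import request body from a Cloud Storage object event.

    `event` is expected to carry the "bucket" and "name" of the uploaded object.
    """
    object_name = event["name"]
    if not object_name.lower().endswith(".csv"):
        raise ValueError(f"not a CSV file: {object_name}")
    return {
        # Assumed field names; verify against the import API reference.
        "gcsSource": {
            "inputUris": [f"gs://{event['bucket']}/{object_name}"],
            "dataSchema": "csv",
        },
        # INCREMENTAL adds/updates documents; FULL replaces everything.
        "reconciliationMode": "INCREMENTAL" if incremental else "FULL",
    }


# Hypothetical event for the FAQ file from the original post.
event = {"bucket": "my-faq-bucket", "name": "GenAI FAQ.csv"}
body = build_import_request(event)
print(json.dumps(body, indent=2))
```

Rejecting non-CSV objects early keeps unrelated uploads to the same bucket from triggering an import.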

Best

Piyush Garg

Thanks @piyush_garg it's really helpful.

@xavidop, I have a question about the add/update data option mentioned in the documentation. This link (https://cloud.google.com/dialogflow/vertex/docs/concept/data-store#import) includes the following point:

The following Data Import Options are available:
Add/Update Data: The provided documents are added to the data store. If a new document has the same ID as an old document, the new document replaces the old document.

I have a CSV file stored in a storage bucket, which is then imported into my data store (structured CSV). The CSV has only 2 columns (question, answer); there is no column like id. I updated the answer for one of the questions, re-uploaded the CSV file under the same name (overwriting the existing object), and then imported the same file into the data store. I thought this would update the existing document, but instead it created a new document entry. I don't understand how to update answers for existing questions, and I'm not sure how to define the document ID in a CSV file. Can you please help with this scenario? Currently, to update answers, I'm performing a (Full) import rather than an (Incremental) one, which deletes all the data and re-imports it.

Hi,

The ID of a document is an internal ID. By this, I mean that the ID is not the name of the document.

The ID of the document is the one you can find in your data store (right column):

[Screenshot: document list in the data store, with the document ID in the right column]

The full ID of a document has this format:

projects/{project}/locations/{location}/collections/{collection}/dataStores/{dataStore}/branches/{branch}/documents/{document}
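
As a small illustration of that format, here is a Python sketch that builds a full document name from its parts and extracts the short ID back out. The default values `default_collection` and `default_branch`, and the project, data store and document names in the example, are assumptions for illustration.

```python
def document_name(project: str, location: str, data_store: str, document: str,
                  collection: str = "default_collection",
                  branch: str = "default_branch") -> str:
    """Build the full resource name of a document from its parts."""
    return (
        f"projects/{project}/locations/{location}/collections/{collection}"
        f"/dataStores/{data_store}/branches/{branch}/documents/{document}"
    )


def document_id(name: str) -> str:
    """Extract the short document ID from a full resource name."""
    parts = name.split("/")
    # Six "key/value" pairs give 12 path segments in total.
    if len(parts) != 12 or parts[-2] != "documents":
        raise ValueError(f"not a full document name: {name}")
    return parts[-1]


# Hypothetical values, for illustration only.
full_name = document_name("my-project", "global", "faq-store", "doc-123")
print(full_name)
print(document_id(full_name))
```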

Another option is to upload a document containing only the new changes. If you want to update an existing document, you can also use the patch method: https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/projects.locations.co...

Best,

Xavi

If I understand correctly:
To add new FAQs, we need a new CSV file > add question-answer pairs > upload to the bucket > import into the data store by pointing to that bucket file.
To update an existing FAQ in the data store, is using the API (through patch) the way to achieve it?

Hi,

If it is a NEW file, you should use the create method: https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/projects.locations.co...

If it is an update of an existing file, you should use the patch method: https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/projects.locations.co...
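
The create-or-patch decision could be sketched like this in Python. The code only describes the request and sends nothing. The API host, the `documentId` query parameter, the `structData` body field and all names in the example are assumptions to verify against the reference pages above. `plan_write` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations

from urllib.parse import quote

# Assumed REST root; verify the host and version in the API reference.
API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def plan_write(parent: str, doc_id: str, fields: dict,
               existing_ids: set[str]) -> dict:
    """Describe the request that writes one FAQ: patch if it exists, else create."""
    body = {"structData": fields}  # assumed field for structured data
    if doc_id in existing_ids:
        # Update: address the existing document by its full name.
        url = f"{API_ROOT}/{parent}/documents/{doc_id}"
        return {"method": "PATCH", "url": url, "body": body}
    # New document: post to the branch and pass the ID you choose.
    url = f"{API_ROOT}/{parent}/documents?documentId={quote(doc_id, safe='')}"
    return {"method": "POST", "url": url, "body": body}


# Hypothetical branch path and IDs, for illustration only.
parent = ("projects/my-project/locations/global/collections/default_collection"
          "/dataStores/faq-store/branches/default_branch")
existing = {"faq-1", "faq-2"}

update = plan_write(parent, "faq-1",
                    {"question": "What is GenAI?", "answer": "Updated answer."},
                    existing)
create = plan_write(parent, "faq-3",
                    {"question": "New question?", "answer": "New answer."},
                    existing)
print(update["method"], update["url"])
print(create["method"], create["url"])
```

The set of existing IDs would come from listing the documents in the data store, where each document shows its internal ID.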

Best,

Xavi