Hi, I'm trying to automate the creation of my data store and the import of data into it. I assembled the JSONL file as follows:
{"_id": "d001", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}, "structData": {"title": "Como Cadastrar um administrador na revenda", "url": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}}
{"_id": "d002", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar um Dominio.txt"}, "structData": {"title": "Como criar um Dominio", "url": "gs://storage_processados_txt/Como criar um Dominio.txt"}}
{"_id": "d003", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma empresa.txt"}, "structData": {"title": "Como criar uma empresa", "url": "gs://storage_processados_txt/Como criar uma empresa.txt"}}
{"_id": "d004", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma revenda.txt"}, "structData": {"title": "Como criar uma revenda", "url": "gs://storage_processados_txt/Como criar uma revenda.txt"}}
I'm running the following script to import the data
from google.cloud import discoveryengine
from google.api_core.client_options import ClientOptions

# Path to the service account JSON key
GOOGLE_APPLICATION_CREDENTIALS = './chave.json'

project_id = "rodrigo-estudos"
location = "global"  # Values: "global"
data_store_id = "data-store-gpt-01"
# Format: `gs://bucket/directory/object.json` or `gs://bucket/directory/*.json`
gcs_uri = "gs://clean/lista_arquivos_bucket_2.json"

client_options = ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
client = discoveryengine.DocumentServiceClient(client_options=client_options)
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)
request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        input_uris=[gcs_uri],
        data_schema="custom",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)
operation = client.import_documents(request=request)
response = operation.result()
print("Importação concluída:", response)
and I'm getting the following error message
Importação concluída: error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:2"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:3"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:4"
  }
}
error_config {
  gcs_prefix: "gs://748489500091_us_import_custom/errors44188869074447421"
}
I confess that I have already read the documentation on the data import process, but I am having difficulty assembling the import file. I would like to know whether I can simply pass the path to the bucket where the txt files are stored directly.
Hi,
Welcome and thank you for reaching out to our community.
I understand that you are having trouble importing unstructured data, and we appreciate you providing the reference document that you are using. I've looked into your use case, and it seems that you may need to change the "data_schema" value in your code: instead of "custom", as written in the sample code, try "document".
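A minimal sketch of that change, reusing the project and data store names from the question. The helper names `api_endpoint`, `branch_parent` and `import_from_gcs` are made up for illustration, the resource path assumes the data store lives in `default_collection`, and skipping the regional endpoint for "global" is an assumption on my side. As far as I know, `GcsSource.data_schema` also accepts "content", which treats every matched file as its own document and may be the closest thing to simply pointing at the bucket of txt files; please verify both options against your own data store.

```python
from typing import Optional


def api_endpoint(location: str) -> Optional[str]:
    """Return a regional endpoint, or None to use the library default."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def branch_parent(project_id: str, location: str, data_store_id: str) -> str:
    """Build the default branch resource name (assumes default_collection)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/branches/default_branch"
    )


def import_from_gcs(project_id, location, data_store_id, gcs_uri,
                    data_schema="document"):
    """Start an import and block until the long-running operation finishes."""
    # Imported here so the helpers above work without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    endpoint = api_endpoint(location)
    client_options = ClientOptions(api_endpoint=endpoint) if endpoint else None
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    request = discoveryengine.ImportDocumentsRequest(
        parent=branch_parent(project_id, location, data_store_id),
        gcs_source=discoveryengine.GcsSource(
            input_uris=[gcs_uri],
            # "document": one Document JSON per line of a JSONL file.
            # "content": each matched file is ingested as a document.
            data_schema=data_schema,
        ),
        reconciliation_mode=(
            discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL
        ),
    )
    operation = client.import_documents(request=request)
    return operation.result()
```

To try the bucket-only route, the call would look like import_from_gcs("rodrigo-estudos", "global", "data-store-gpt-01", "gs://storage_processados_txt/*.txt", data_schema="content").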
Adding this reference guide as it contains useful information for unstructured data stores.
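If you want to keep the title and url metadata, the import file can be generated instead of written by hand. A rough sketch, assuming the "document" schema expects the identifier under `id` (the file in the question uses `_id`) and that `text/plain` is an accepted mimeType for your data store. `build_metadata_lines` is a hypothetical helper, not part of any Google library.

```python
import json


def build_metadata_lines(bucket, filenames, mime_type="text/plain"):
    """Return one JSON string per file, ready to be written as JSONL."""
    lines = []
    for index, name in enumerate(filenames, start=1):
        uri = f"gs://{bucket}/{name}"
        # Use the file name without its extension as the title.
        title = name.rsplit(".", 1)[0]
        record = {
            "id": f"d{index:03d}",  # assumed key name for the document schema
            "structData": {"title": title, "url": uri},
            "content": {"mimeType": mime_type, "uri": uri},
        }
        # ensure_ascii=False keeps accented characters readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


if __name__ == "__main__":
    names = ["Como criar um Dominio.txt", "Como criar uma empresa.txt"]
    for line in build_metadata_lines("storage_processados_txt", names):
        print(line)
```

Write the returned lines to a .jsonl file, one per line, upload it to Cloud Storage, and pass its gs:// path as the import URI.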
Hope this helps.
I'm getting the same error. I'm using the BigQuery console to create a table and populate it.
Here is the schema:
[
  {
    "name": "id",
    "mode": "REQUIRED",
    "type": "STRING",
    "fields": []
  },
  {
    "name": "jsonData",
    "mode": "NULLABLE",
    "type": "STRING",
    "fields": []
  }
]
and the JSONL:
{"id":"unique-id-001","jsonData":"{\"nomproduit\":\"Laptop\",\"prix\":1200.99,\"sku\":\"LPT12345\"}"}
{"id":"unique-id-002","jsonData":"{\"nomproduit\":\"Smartphone\",\"prix\":799.49,\"sku\":\"SMP67890\"}"}
{"id":"unique-id-003","jsonData":"{\"nomproduit\":\"Headphones\",\"prix\":199.99,\"sku\":\"HD789123\"}"}
{"id":"unique-id-004","jsonData":"{\"nomproduit\":\"Tablet\",\"prix\":450.00,\"sku\":\"TBL98765\"}"}
{"id":"unique-id-005","jsonData":"{\"nomproduit\":\"Smartwatch\",\"prix\":249.99,\"sku\":\"SW123456\"}"}
The table is created successfully, but when I try to import data into a data store from BigQuery, I get this error:
To create document without content, content config of data store must be NO_CONTENT.
Hello,
Thank you for contacting the Google Cloud Community.
I have gone through your reported issue; however, it seems to be specific to your environment and would need more targeted debugging and analysis. To ensure a faster resolution and dedicated support for your issue, I kindly request that you file a support ticket by clicking here[1]. Our support team will prioritize your request and provide you with the assistance you need.
For individual support issues, it is best to utilize the support ticketing system. We appreciate your cooperation!
[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases
I'm having a similar issue...
I'm not sure if this helps anyone, but I had the same issue when sending an API call to my data store from my Flask app.
"in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 To create document without content, content config of data store must be NO_CONTENT."
Here's what I found and how I solved the issue. First, don't waste your time like I did looking for a content config setting anywhere, because it is immutable: it is determined when the data store is first created and cannot be changed later. I had set up my data store as unstructured data in JSONL format, so when it was created, Agent Builder set its content_config to CONTENT_REQUIRED behind the scenes, because it expects every entry to have file content. So the API rejects the call because I am trying to add a content-less entry to a data store that demands content for every entry (even though I am not sending empty content, I am sending editing instructions/content).
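To check which content config an existing data store ended up with, something like the sketch below may help. It assumes the data store sits in `default_collection` in the "global" location and that the default credentials can read it. `data_store_name`, `explain_content_config` and `fetch_content_config` are names I made up, and the explanations in the dictionary are my own summary, not official wording.

```python
def data_store_name(project_id: str, location: str, data_store_id: str) -> str:
    """Build the data store resource name (assumes default_collection)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
    )


# Informal summary of what each content config means for imported documents.
CONTENT_CONFIG_NOTES = {
    "NO_CONTENT": "documents carry only structured data, no file content",
    "CONTENT_REQUIRED": "every document must reference file content",
    "PUBLIC_WEBSITE": "documents come from crawled public web pages",
}


def explain_content_config(config_name: str) -> str:
    """Return a short note for a ContentConfig enum name."""
    return CONTENT_CONFIG_NOTES.get(config_name, "unrecognized content config")


def fetch_content_config(project_id, location, data_store_id) -> str:
    """Read the content config enum name from the live data store."""
    # Imported here so the helpers above work without the client library.
    from google.cloud import discoveryengine

    client = discoveryengine.DataStoreServiceClient()
    data_store = client.get_data_store(
        name=data_store_name(project_id, location, data_store_id)
    )
    return data_store.content_config.name
```

If fetch_content_config returns "CONTENT_REQUIRED", an entry without content would be expected to trigger the error quoted above.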
Basically, the only way around it is to create a new data store that is specifically designed to handle metadata-only or structured entries: a structured data store, not an unstructured one, because a structured data store has its content_config set to NO_CONTENT by default.
This is the ONLY workaround I could figure out for this issue.
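For anyone who prefers to create that data store from code rather than the console, here is a sketch with the Python client. It assumes a generic search data store in `default_collection` in the "global" location; the function names and the chosen vertical and solution type are my assumptions, so adjust them to your setup.

```python
def collection_parent(project_id: str, location: str) -> str:
    """Build the parent collection resource name (assumes default_collection)."""
    return f"projects/{project_id}/locations/{location}/collections/default_collection"


def create_no_content_data_store(project_id, location, data_store_id, display_name):
    """Create a data store whose documents carry structured data only."""
    # Imported here so collection_parent works without the client library.
    from google.cloud import discoveryengine

    client = discoveryengine.DataStoreServiceClient()
    data_store = discoveryengine.DataStore(
        display_name=display_name,
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # Cannot be changed after creation, so it has to be set here.
        content_config=discoveryengine.DataStore.ContentConfig.NO_CONTENT,
    )
    request = discoveryengine.CreateDataStoreRequest(
        parent=collection_parent(project_id, location),
        data_store_id=data_store_id,
        data_store=data_store,
    )
    # Creation is a long-running operation; result() waits for it to finish.
    operation = client.create_data_store(request=request)
    return operation.result()
```

After it is created, point the import or API calls at the new data store ID instead of the unstructured one.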
Good Luck!
Aivee