Hi, I'm trying to automate the creation of my data store and the import of data into it. I assembled the JSONL file as follows:
{"_id": "d001", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}, "structData": {"title": "Como Cadastrar um administrador na revenda", "url": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}}
{"_id": "d002", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar um Dominio.txt"}, "structData": {"title": "Como criar um Dominio", "url": "gs://storage_processados_txt/Como criar um Dominio.txt"}}
{"_id": "d003", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma empresa.txt"}, "structData": {"title": "Como criar uma empresa", "url": "gs://storage_processados_txt/Como criar uma empresa.txt"}}
{"_id": "d004", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma revenda.txt"}, "structData": {"title": "Como criar uma revenda", "url": "gs://storage_processados_txt/Como criar uma revenda.txt"}}
I'm running the following script to import the data
import os

from google.cloud import discoveryengine
from google.api_core.client_options import ClientOptions

# Path to the service account JSON key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./chave.json"

project_id = "rodrigo-estudos"
location = "global"  # Values: "global"
data_store_id = "data-store-gpt-01"

# Format: `gs://bucket/directory/object.json` or `gs://bucket/directory/*.json`
gcs_uri = "gs://clean/lista_arquivos_bucket_2.json"

client_options = ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
client = discoveryengine.DocumentServiceClient(client_options=client_options)

parent = client.branch_path(project=project_id, location=location, data_store=data_store_id, branch="default_branch")

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        input_uris=[gcs_uri],
        data_schema="custom",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

operation = client.import_documents(request=request)
response = operation.result()
print("Import finished:", response)
and I'm getting the following error message
Import finished: error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:2"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:3"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:4"
  }
}
error_config {
  gcs_prefix: "gs://748489500091_us_import_custom/errors44188869074447421"
}
I confess I have already read the documentation on the data import process, but I'm having difficulty assembling the import file. I would also like to know whether I can simply point the import directly at the bucket where the .txt files are stored.
Hi,
Welcome and thank you for reaching out to our community.
I understand that you are having challenges importing unstructured data, and we appreciate you providing the reference document you are using. I've looked into your use case, and it seems you may need to change the "data_schema" in your code: instead of "custom", as written in the sample code, you can try using "document".
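For example, the only change in the ImportDocumentsRequest from your script would be the data_schema value; a minimal sketch of that part, with everything else unchanged:

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        input_uris=[gcs_uri],
        # "document" tells the import that each JSONL line is a Document whose
        # content.uri points at the file in Cloud Storage
        data_schema="document",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)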
Adding this reference guide as it contains useful information for unstructured data stores.
Hope this helps.
I'm getting the same error. I'm using the BigQuery console to create a table and populate it.
Here is the schema:
[
  {
    "name": "id",
    "mode": "REQUIRED",
    "type": "STRING",
    "fields": []
  },
  {
    "name": "jsonData",
    "mode": "NULLABLE",
    "type": "STRING",
    "fields": []
  }
]
and the JSONL:
{"id":"unique-id-001","jsonData":"{\"nomproduit\":\"Laptop\",\"prix\":1200.99,\"sku\":\"LPT12345\"}"}
{"id":"unique-id-002","jsonData":"{\"nomproduit\":\"Smartphone\",\"prix\":799.49,\"sku\":\"SMP67890\"}"}
{"id":"unique-id-003","jsonData":"{\"nomproduit\":\"Headphones\",\"prix\":199.99,\"sku\":\"HD789123\"}"}
{"id":"unique-id-004","jsonData":"{\"nomproduit\":\"Tablet\",\"prix\":450.00,\"sku\":\"TBL98765\"}"}
{"id":"unique-id-005","jsonData":"{\"nomproduit\":\"Smartwatch\",\"prix\":249.99,\"sku\":\"SW123456\"}"}
The table is created successfully, but when I try to import the data into a data store using BigQuery, I get this error:
To create document without content, content config of data store must be NO_CONTENT.
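For reference, the equivalent import with the Python client would be roughly the sketch below (the project, dataset and table names are placeholders, not the real ones):

from google.cloud import discoveryengine

client = discoveryengine.DocumentServiceClient()

parent = client.branch_path(
    project="my-project",          # placeholder
    location="global",
    data_store="my-data-store-id", # placeholder
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id="my-project",   # placeholder
        dataset_id="my_dataset",   # placeholder
        table_id="products",       # placeholder
        # assuming the default "document" schema; the table has the id and jsonData columns shown above
        data_schema="document",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

operation = client.import_documents(request=request)
print(operation.result())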
Hello,
Thank you for contacting the Google Cloud Community.
I have gone through your reported issue; however, it seems this is an issue observed specifically on your end and would need more specific debugging and analysis. To ensure a faster resolution and dedicated support, I kindly request that you file a support ticket by clicking here[1]. Our support team will prioritize your request and provide you with the assistance you need.
For individual support issues, it is best to utilize the support ticketing system. We appreciate your cooperation!
[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases