
Failed to import unstructured data into the DataStore To create document without content, content c

Hi, I'm trying to automate the creation of my data store and the import of data into it. I assembled the JSONL file as follows:

 

{"_id": "d001", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}, "structData": {"title": "Como Cadastrar um administrador na revenda", "url": "gs://storage_processados_txt/Como cadastrar um administrador na revenda.txt"}}
{"_id": "d002", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar um Dominio.txt"}, "structData": {"title": "Como criar um Dominio", "url": "gs://storage_processados_txt/Como criar um Dominio.txt"}}
{"_id": "d003", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma empresa.txt"}, "structData": {"title": "Como criar uma empresa", "url": "gs://storage_processados_txt/Como criar uma empresa.txt"}}
{"_id": "d004", "content": {"mimeType": "text/plain", "uri": "gs://storage_processados_txt/Como criar uma revenda.txt"}, "structData": {"title": "Como criar uma revenda", "url": "gs://storage_processados_txt/Como criar uma revenda.txt"}}
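A file in this shape could be generated with a short script instead of being assembled by hand. The sketch below is only an illustration: `build_jsonl_lines` and its `id_key` parameter are hypothetical names, the bucket name is taken from the sample above, and it assumes each document title is simply the file name without its extension. The `id_key` parameter is there because the identifier field name depends on the import schema; the sample above uses `_id`, while the documented `Document` resource names the field `id`.

```python
import json
import os


def build_jsonl_lines(bucket, file_names, id_key="_id"):
    """Return one JSON string per file, mirroring the records shown above."""
    lines = []
    for index, name in enumerate(file_names, start=1):
        uri = f"gs://{bucket}/{name}"
        # Assumption: the title is the file name without its extension.
        title = os.path.splitext(name)[0]
        record = {
            id_key: f"d{index:03d}",
            "content": {"mimeType": "text/plain", "uri": uri},
            "structData": {"title": title, "url": uri},
        }
        # ensure_ascii=False keeps accented characters readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


lines = build_jsonl_lines(
    "storage_processados_txt",
    ["Como criar um Dominio.txt", "Como criar uma empresa.txt"],
)
print("\n".join(lines))
```

Each returned string is one line of the JSONL file, so writing them out joined by newlines gives a file ready to upload to the bucket.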

 

I'm running the following script to import the data

 

from google.cloud import discoveryengine
from google.api_core.client_options import ClientOptions

# Path to the service account JSON key
GOOGLE_APPLICATION_CREDENTIALS = './chave.json'

project_id = "rodrigo-estudos"
location = "global" # Values: "global"
data_store_id = "data-store-gpt-01"
# Format: `gs://bucket/directory/object.json` or `gs://bucket/directory/*.json`
gcs_uri = "gs://clean/lista_arquivos_bucket_2.json"

client_options = ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
client = discoveryengine.DocumentServiceClient(client_options=client_options)

parent = client.branch_path(project=project_id, location=location, data_store=data_store_id, branch="default_branch")

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        input_uris=[gcs_uri],
        data_schema="custom",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

operation = client.import_documents(request=request)
response = operation.result()

print("Importação concluída:", response)

 

I'm getting the following error message:

 

Importação concluída: error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:2"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:3"
  }
}
error_samples {
  code: 3
  message: "To create document without content, content config of data store must be NO_CONTENT."
  details {
    type_url: "type.googleapis.com/google.rpc.ResourceInfo"
    value: "\0229gs://clean-rodrigo-estudos/lista_arquivos_bucket_2.json:4"
  }
}
error_config {
  gcs_prefix: "gs://748489500091_us_import_custom/errors44188869074447421"
}

 


I have already read the documentation on the data import process, but I am having difficulty assembling the import file. I would like to know whether I can simply pass the path to the bucket where the txt files are stored directly.
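On that last point, the `GcsSource` reference describes a `data_schema` value of `"content"`, under which each file matched by `input_uris` becomes its own document, so no JSONL metadata file would be needed. The following is a minimal sketch under that assumption, not a verified fix: `gcs_source_args` and `import_raw_files` are hypothetical helpers, the `*.txt` wildcard is an assumption about how the bucket is laid out, and the import call has not been run against this data store.

```python
def gcs_source_args(bucket, pattern="*.txt"):
    """Build GcsSource arguments for importing raw files directly."""
    return {
        "input_uris": [f"gs://{bucket}/{pattern}"],
        # "content": every matched file becomes one document (no metadata file).
        "data_schema": "content",
    }


def import_raw_files(project_id, location, data_store_id, bucket):
    """Import every matching file in the bucket as an unstructured document."""
    # Imported lazily so gcs_source_args works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # A regional endpoint is only set for non-global locations.
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    request = discoveryengine.ImportDocumentsRequest(
        parent=client.branch_path(
            project=project_id,
            location=location,
            data_store=data_store_id,
            branch="default_branch",
        ),
        gcs_source=discoveryengine.GcsSource(**gcs_source_args(bucket)),
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )
    # Blocks until the long-running import operation finishes.
    return client.import_documents(request=request).result()


print(gcs_source_args("storage_processados_txt"))
```

With this schema the document IDs are derived from each file's URI rather than chosen by hand, and metadata such as the `title` and `url` fields in the JSONL file above would not be attached. Keeping that metadata would instead mean keeping the JSONL file and checking which `data_schema` value matches its record layout.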


