Hello Google Cloud Community,
I am facing a persistent issue with Google Cloud Document AI batch processing, even after setting up a brand new, clean project. I am encountering a `ValueError` when running a Python script, and I've also noticed a limitation in the Google Cloud Console UI for the Document AI processor.
**1. Overview of the Problem:**
I'm unable to perform asynchronous batch processing with Document AI.
* **Python Script Error:** When running my Python script in Cloud Shell, I get the following specific error: `ValueError: Unknown field for GcsDocuments: gcs_uri`.
* **Console UI Limitation:** In the Google Cloud Console for my Document AI processor, only a "Test" button is displayed. I cannot find or click "Start Batch Processing" or other typical operation buttons, which significantly restricts UI-based management.
**2. Project and Resource Details:**
* **Google Cloud Project ID:** `docai-test-XXXXXX-new` (Project Number: `619893412057`)
* **Document AI Processor Type:** Document OCR
* **Processor Name:** `document-ocr-XXXXXX-2025`
* **Processor ID:** `bf52b8e242a187ff`
* **Processor Region:** `us`
* **Input GCS Bucket Name:** `docai-new-input-XXXXXX-2025-final`
* **Input File URI:** `gs://docai-new-input-XXXXXX-2025-final/ZZZ07_600dpi.pdf` (This is a ~75MB PDF file.)
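For completeness, here is a quick sanity check I can run from Cloud Shell to confirm the input object exists and see its exact size (a minimal sketch using the `google-cloud-storage` client, with the same redacted bucket/object names as above):

```python
from google.cloud import storage

# Sanity check: confirm the input PDF exists in the bucket and print its size.
client = storage.Client(project="docai-test-XXXXXX-new")
blob = client.bucket("docai-new-input-XXXXXX-2025-final").get_blob("ZZZ07_600dpi.pdf")
print(f"{blob.name}: {blob.size} bytes" if blob else "object not found")
```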
**3. Steps Taken and Observations:**
I have done thorough troubleshooting, including a full environment reset:
* **New, Clean Project:** I created a completely new Google Cloud Project (`docai-test-XXXXXX-new`) and performed all setup (API enablement, new Document OCR processor, new GCS bucket, PDF upload) within this fresh environment.
* **Python Script & Cloud Shell:**
* My Python script builds the input with `gcs_documents=documentai.GcsDocuments(gcs_uri=gcs_input_uri)`, which I understood to be correct for the current API (an alternative field layout suggested by the error message is sketched after this list).
* I ensured all old script files were removed (`rm -f`) and then re-uploaded the latest script to a fresh Cloud Shell session tied to the new project.
* The Cloud Shell environment runs Python 3.12.
* The Document AI API is enabled.
* **Processor Type:** Switched from "Form Parser" to "Document OCR".
* **Console UI:** The "Start Batch Processing" button is consistently missing, showing only "Test".
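For reference, here is a minimal sketch of the alternative field layout the error message seems to point at: as far as I can tell from the `google-cloud-documentai` library, `GcsDocuments` expects a `documents` list of `GcsDocument` objects (each with `gcs_uri` and `mime_type`) rather than a `gcs_uri` field directly. I have not confirmed that this resolves the issue, so please treat it as an assumption on my part:

```python
from google.cloud import documentai

# Assumed alternative construction (unverified): GcsDocuments may expect a
# repeated `documents` field of GcsDocument objects instead of `gcs_uri`.
input_config = documentai.BatchDocumentsInputConfig(
    gcs_documents=documentai.GcsDocuments(
        documents=[
            documentai.GcsDocument(
                gcs_uri="gs://docai-new-input-XXXXXX-2025-final/ZZZ07_600dpi.pdf",
                mime_type="application/pdf",
            )
        ]
    )
)
```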
**Screenshot:**
For illustration, please see the attached screenshot showing the Document AI processor page in the Google Cloud Console. You can observe that only the "Test" button is available, with no option to "Start Batch Processing" or other management features.
**4. Full Python Script Code (for reference):**
```python
import os
from urllib.parse import urlparse
from google.cloud import documentai
from google.api_core.client_options import ClientOptions
from google.cloud import storage
# --- My Configuration ---
PROJECT_ID = "docai-test-akihiro-new"
LOCATION = "us"
PROCESSOR_ID = "bf52b8e242a187ff"
GCS_INPUT_FILE_URI = "gs://docai-new-input-XXXXXX-2025-final/ZZZ07_600dpi.pdf"
GCS_OUTPUT_PREFIX = "processed_results/"
def batch_process_document(project_id, location, processor_id, gcs_input_uri, gcs_output_prefix):
    # Use the regional endpoint that matches the processor's location.
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)
    processor_name = client.processor_path(project_id, location, processor_id)

    parsed_uri = urlparse(gcs_input_uri)
    gcs_input_bucket = parsed_uri.netloc

    # This is the construction that raises the ValueError.
    input_config = documentai.BatchDocumentsInputConfig(
        gcs_documents=documentai.GcsDocuments(
            gcs_uri=gcs_input_uri
        )
    )

    # Write the results back into the input bucket under the given prefix.
    gcs_output_bucket_uri = f"gs://{gcs_input_bucket}/{gcs_output_prefix}"
    output_config = documentai.DocumentOutputConfig(
        gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
            gcs_uri=gcs_output_bucket_uri
        )
    )

    request = documentai.BatchProcessRequest(
        name=processor_name,
        input_documents=input_config,
        document_output_config=output_config,
    )

    print("Starting asynchronous processing with Document AI...")
    try:
        operation = client.batch_process_documents(request=request)
        print(f"Processing started. Operation ID: {operation.operation.name}")
        print("Waiting for completion... (this may take a while)")
        operation.result()

        print("Document AI batch processing finished!")
        print(f"Results are stored under '{gcs_output_prefix}' in the GCS bucket '{gcs_input_bucket}'.")

        # List the output JSON files produced by the batch job.
        storage_client = storage.Client()
        blobs_in_output = storage_client.list_blobs(gcs_input_bucket, prefix=gcs_output_prefix)
        found_result_files = False
        print("\nLooking for result files under the GCS output path...")
        for blob_item in blobs_in_output:
            if blob_item.name.endswith(".json"):
                print(f"-> Found result file: gs://{gcs_input_bucket}/{blob_item.name}")
                found_result_files = True
        if not found_result_files:
            print("Note: no result files were found under the specified GCS output path. Please check Cloud Logging and GCS.")

    except Exception as e:
        print(f"An error occurred during Document AI batch processing: {e}")
        print("Please check Cloud Logging for detailed error logs.")
        print("In particular, verify that the Document AI processor has access to the Cloud Storage bucket.")


if __name__ == "__main__":
    batch_process_document(
        PROJECT_ID,
        LOCATION,
        PROCESSOR_ID,
        GCS_INPUT_FILE_URI,
        GCS_OUTPUT_PREFIX
    )
```
**5. Seeking Assistance:**
I suspect this might be an issue related to the Cloud Shell environment itself, client library versioning, or possibly a configuration/permission issue with the Document AI processor that affects both API calls and the Console UI.
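In case it is relevant to the versioning question, this is how I am checking which client library version Cloud Shell has installed (standard-library only, so it should work on Python 3.12):

```python
import sys
from importlib import metadata

# Print the Python version and the installed google-cloud-documentai version,
# to rule out (or confirm) a client library versioning issue.
print(sys.version)
print(metadata.version("google-cloud-documentai"))
```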
Any guidance or insights from the community would be highly appreciated.
Thank you