
Loading data from Datastore exports taking time

I have a table (kind) in Datastore with about 15k records. It's reference data, so the same table exists in both A-production and B-test. I have a script that exports it to Cloud Storage and then imports it into BigQuery. On B-test, the export to Cloud Storage takes a few seconds. On production it takes minutes, if it finishes at all. I can see the files being created on production, so it seems like something is stuck finalizing the export. The export task just stays in the PROCESSING state.

Accepted solution:

A Datastore export that finishes in seconds in your test project but stays in PROCESSING for minutes in production usually points to something specific to the production project rather than to the export mechanism itself. Several factors could contribute to this difference. Here are some steps and considerations to help you diagnose and potentially resolve the issue:

1. Investigate the Production Environment:

  • Data Volume and Complexity: Compare the size, complexity, and number of entities between the production and test environments. Larger or more complex datasets in production can lead to longer processing times.
  • Datastore Indexing: Ensure that the production environment has proper indexing. Inefficient indexing can slow down data retrieval and export processes.
  • Resource Allocation: Assess if the production environment has adequate CPU, memory, and network resources compared to the test environment.
  • Cloud Storage Limits: Check for any storage quota limitations or throttling in the production environment that might impede data transfer.
  • Concurrent Operations: Look for other operations in production that might be consuming significant resources, affecting the export process.
  • Logs and Monitoring: Analyze logs and monitoring data for any errors or warnings that could shed light on the export issues.
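When checking logs and monitoring, the export operation's own progress counters are a useful first signal. The sketch below assumes you have the operation as JSON, for example from `gcloud datastore operations describe OPERATION_NAME --format=json`, and that it follows the Datastore Admin API shape (`metadata.common.state`, `metadata.progressEntities.workCompleted` / `workEstimated`). The `ExportStatus` class and `summarize_export_operation` function are hypothetical helpers, not part of any Google library.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportStatus:
    state: str
    done: bool
    entities_completed: int
    entities_estimated: int

    @property
    def entity_fraction(self) -> Optional[float]:
        # workEstimated can be missing or zero early in the operation.
        if self.entities_estimated <= 0:
            return None
        return self.entities_completed / self.entities_estimated


def _as_int(value) -> int:
    # int64 fields are serialized as JSON strings, e.g. "15000".
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def summarize_export_operation(op_json: str) -> ExportStatus:
    """Extract state and entity progress from an export operation's JSON."""
    op = json.loads(op_json)
    metadata = op.get("metadata", {})
    common = metadata.get("common", {})
    progress = metadata.get("progressEntities", {})
    return ExportStatus(
        state=common.get("state", "OPERATION_STATE_UNSPECIFIED"),
        done=bool(op.get("done", False)),
        entities_completed=_as_int(progress.get("workCompleted")),
        entities_estimated=_as_int(progress.get("workEstimated")),
    )
```

If the entity counter stops moving across several checks while the state remains PROCESSING, that is consistent with the "stuck finalizing" symptom and worth including in a support case.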

2. Retry the Export:

  • Re-run the Export Script: Transient issues can stall an export. Re-running it helps rule out a one-off failure.
  • Incremental Exports: If the dataset is large, consider exporting in smaller batches to isolate potential problem areas.
  • Task Management: If the export remains stuck, cancel the operation (for example with `gcloud datastore operations cancel OPERATION_NAME`) and start a new one.
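To keep the script from waiting indefinitely on a stuck export, it can poll with a deadline and hand control back when time runs out. This is a minimal sketch: `wait_for_export` is a hypothetical helper, and `fetch_state` is assumed to be any callable you supply that returns the operation's current state string (for instance by wrapping `gcloud datastore operations describe`). The terminal states listed are those of the Datastore Admin API's operation state enum.

```python
import time
from typing import Callable

# Terminal states of a Datastore Admin export operation.
TERMINAL_STATES = {"SUCCESSFUL", "FAILED", "CANCELLED"}


def wait_for_export(
    fetch_state: Callable[[], str],
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal state is reached or the deadline passes.

    Returns the terminal state, or "TIMED_OUT" so the caller can decide
    to cancel the operation and start a fresh export.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        remaining = deadline - clock()
        if remaining <= 0:
            return "TIMED_OUT"
        # Exponential backoff, never sleeping past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting. On "TIMED_OUT", the script would cancel the operation and retry rather than block the BigQuery import step.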

3. Compare Configuration and Permissions:

  • Script Configurations: Ensure that script configurations and permissions in the production environment match those in the test environment.
  • Network Configuration: Verify that network settings, such as firewall rules, are not hindering the export process.
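One way to compare configurations is to build the export request each environment actually sends and diff it, along with any other settings you track. The sketch below assumes the request body shape of the Datastore Admin API `projects.export` method (`outputUrlPrefix`, `entityFilter.kinds`, `entityFilter.namespaceIds`). The helper names and the idea of collecting per-environment settings into a dict are illustrative assumptions, not part of your script or any Google library.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_export_body(
    bucket: str,
    prefix: str,
    kinds: List[str],
    namespace_ids: Optional[List[str]] = None,
) -> dict:
    """Request body for the Datastore Admin API projects.export method."""
    entity_filter: dict = {"kinds": list(kinds)}
    if namespace_ids is not None:
        entity_filter["namespaceIds"] = list(namespace_ids)
    return {
        "outputUrlPrefix": f"gs://{bucket}/{prefix}",
        "entityFilter": entity_filter,
    }


def diff_settings(
    test: Dict[str, object],
    prod: Dict[str, object],
    ignore: Iterable[str] = (),
) -> Dict[str, Tuple[object, object]]:
    """Return {key: (test_value, prod_value)} for every key whose values differ."""
    skip = set(ignore)
    return {
        key: (test.get(key), prod.get(key))
        for key in sorted(set(test) | set(prod))
        if key not in skip and test.get(key) != prod.get(key)
    }
```

Passing values that are expected to differ (such as the bucket name) in `ignore` keeps the output focused on differences that might explain the slowdown, like an extra kind or namespace in the production filter.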

4. Additional Considerations:

  • Datastore Maintenance: Regular maintenance, like cleaning up unused indexes and optimizing entity designs, can improve Datastore performance.
