Dataform project Compilation failure

When compiling our Dataform project through the API, we get this error:
{
    "error": {
        "code": 400,
        "message": "Retrieving remote files took too long; repository or diff is too large.",
        "status": "INVALID_ARGUMENT"
    }
}

We expect the project to compile and the API to return a compilation ID.

For context, when run locally:
dataform compile --timeout 4m completes in 25.99 s
dataform compile --timeout 4m --json > graph.json generates a 17 MB JSON file


Hi @mks0ff,

Welcome to Google Cloud Community!

The error message has three parts, each pointing to a possible cause:

  • Retrieving remote files took too long: This indicates that Dataform had difficulty fetching the necessary files or information to complete the compilation process.

  • Repository or diff is too large: This suggests that either the repository or the changes (diff) between versions are too large for the API to process within the specified time limit.

  • HTTP Status Code 400 (INVALID_ARGUMENT): This typically means that the arguments you're sending to the API are invalid, likely due to the size or complexity of the data.
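
Since the message points at repository size, it may help to see which files dominate a local checkout. The following is only a rough sketch, not an official Dataform tool: the function name largest_files, the script name in the usage line, and the choice to skip .git and node_modules are all assumptions made for illustration.

```python
import os
import sys

# Directories skipped by default. This choice is an assumption; adjust as needed.
SKIP_DIRS = {".git", "node_modules"}


def largest_files(root, top_n=10, skip_dirs=SKIP_DIRS):
    """Return up to top_n (size_in_bytes, relative_path) pairs, largest first."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks to avoid counting targets twice
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    # Sort by size descending, then by path for a stable order.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return sizes[:top_n]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python largest_files.py <repo_dir>")
    else:
        for size, path in largest_files(sys.argv[1]):
            print(f"{size / 1_000_000:10.2f} MB  {path}")
```

Running it against the repository root lists the ten largest files, which is usually enough to spot large data files or generated artifacts that were committed by mistake.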

While your local dataform compile command completes in about 26 seconds and generates a 17MB JSON file, the remote API operates in a sandboxed environment with stricter resource constraints and no internet access. This can sometimes cause timeouts or performance issues.

Here’s what you can try to resolve the issue:

  1. Break Down the Compilation: Consider splitting your data into smaller chunks or files. You can also optimize your project to reduce the overall size of the graph or its dependencies.
  2. Review Dataform Core Package: Check the configuration and version of the Dataform core package. Outdated or incompatible configurations may contribute to slower performance.
  3. Monitor Repository Resource Limits: Ensure your repository isn’t hitting any resource limits (like memory or processing time) during compilation. You might need to adjust settings to better handle larger repositories.
  4. Investigate Compilation Failures: If the issue persists, examine logs for specific errors or bottlenecks that could indicate where the process is failing.
  5. Best Practices for Repositories: Review the best practices for structuring and managing Dataform repositories, as this can help streamline the compilation process and prevent issues like this.
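
To decide where to start when reducing the size of the graph, one option is to measure which parts of the 17 MB graph.json are largest. The sketch below assumes only that the file holds a single JSON object; it makes no assumption about the key names inside it, and the function name section_sizes and the script name are hypothetical. Sizes are approximate because they are measured on re-serialized JSON.

```python
import json
import sys


def section_sizes(graph):
    """Return (key, serialized_bytes, item_count) per top-level key, largest first."""
    rows = []
    for key, value in graph.items():
        # Approximate size: bytes of the value when re-serialized as JSON.
        size = len(json.dumps(value).encode("utf-8"))
        # Lists and objects report their length; scalars count as one item.
        count = len(value) if isinstance(value, (list, dict)) else 1
        rows.append((key, size, count))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python graph_sections.py <graph.json>")
    else:
        with open(sys.argv[1], encoding="utf-8") as handle:
            graph = json.load(handle)
        for key, size, count in section_sizes(graph):
            print(f"{size / 1_000_000:8.2f} MB  {count:7d} items  {key}")
```

If one section accounts for most of the file, that is a reasonable place to start splitting or simplifying definitions.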

Additionally, you may find this documentation useful for resolving common issues with Dataform.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.