
Dataform Fundamental question

Hi all and @ms4446 

I have a very fundamental question about Dataform.

Let us say I have Dataform open in my GCP project. Several users (say 10) are accessing the same repository. Each of the 10 users creates a workspace. They have the same dataset/project in each workspace.

Now they start debugging the same queries and, in the process, run the full execution and check BigQuery to see whether the tables are being created.

Can we do as I described?

Because each user is creating tables in BigQuery by running an execution, will this cause any interference?


Can someone please help, if possible @ms4446 

It has been less than a full day, and directly pinging a community leader will not expedite the process.

To answer your question: yes, 10 unique users can concurrently develop and execute the same workflows. Each workspace is effectively a repo branch that you're checking out and developing in. If you're not familiar with Git fundamentals, that may be the first step. Within the Workflow Execution Log tab, you will see all executions, each associated with an entry in the Source column that identifies the user who triggered the workflow.

However, is that the best approach? Likely not. If two users execute the same workflow, the most recent execution will overwrite the output of the previous one, leading to conflicts during development. In most cases, multiple developers should not be debugging the same SQLX query simultaneously while executing workflows. An alternative approach is to have users "run" the query to analyze the output, or create temp tables, without triggering a full execution.

 

Hi @ayushmaheshwari,

Welcome to Google Cloud Community!

I agree with @DataEngineer: users can execute workflows concurrently in Dataform. Users should avoid writing to the same tables or datasets, since that may lead to conflicts or overwrites.

Note: A high volume of queries from multiple users may also impact the performance of BigQuery and Dataform. To help you mitigate performance issues, you may refer to this documentation.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

I can't help but find it comical that your post simply says you agree with me and then tells the original poster to accept your post as the "Solution". I'm assuming it's just a template response, but I couldn't help but laugh.

You can do as described.

As mentioned, "Start execution" from within your workspace will write to the live (production) tables by default. To give each workspace its own output location instead:

Go to Settings in the top nav bar of your Repository
Under "Workspace compilation overrides", click Edit

Fill in `${workspaceName}` under either Schema suffix or Table prefix

This will create a dataset with `_yourworkspacename` appended to your default output dataset, or `yourworkspacename_` prefixed to your output tables. See https://cloud.google.com/dataform/docs/workspace-compilation-overrides
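
To make the effect of the two overrides concrete, here is a small Python sketch of the naming rule described above. It is only an illustration, not Dataform's own code: the function `resolve_target` and the names `analytics`, `orders`, `alice` and `bob` are made up for the example.

```python
def resolve_target(dataset, table, workspace, schema_suffix=False, table_prefix=False):
    """Illustrative only: mimic how a workspace override renames the output target.

    schema_suffix appends "_<workspace>" to the dataset name;
    table_prefix prepends "<workspace>_" to the table name.
    """
    if schema_suffix:
        dataset = f"{dataset}_{workspace}"
    if table_prefix:
        table = f"{workspace}_{table}"
    return dataset, table


# Hypothetical workspaces, default dataset and table.
workspaces = ["alice", "bob"]

# No override: every workspace resolves to the same target, so executions collide.
no_override = {resolve_target("analytics", "orders", w) for w in workspaces}

# Schema suffix: one dataset per workspace.
with_suffix = {
    resolve_target("analytics", "orders", w, schema_suffix=True) for w in workspaces
}

# Table prefix: same dataset, one table per workspace.
with_prefix = {
    resolve_target("analytics", "orders", w, table_prefix=True) for w in workspaces
}

print(sorted(no_override))
print(sorted(with_suffix))
print(sorted(with_prefix))
```

With no override both workspaces point at the same table, which is where the overwriting comes from. With either override each workspace gets its own target, so 10 users can run full executions without touching each other's tables or the production ones.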