These best practices reflect recommendations shared by a cross-functional team of seasoned Googlers. The practices are written to work for many GCP deployments, but as always use best judgment when implementing.
We are often asked for best practices on how to set up projects in GCP to support scaling out different workloads running on BigQuery. A common practice for new Data Analytics customers is to store enterprise data and run data ingestion and analytic workloads in the same project. This practice has a few unintended consequences: it creates a resource bottleneck (BigQuery slots and quotas are allocated at the project level), and it makes it harder to allocate costs to specific departments or applications. So, we’re here to help with some high-level recommendations for project setup!
Create Separate Projects for Data and Consumption
We recommend creating a project that will be used primarily for storage. Often this project will hold the end result of a complex ETL/ELT process that ingests data from numerous sources for consumption by multiple business units and applications. Having all of your “Enterprise Data Warehouse” (EDW) data in one project simplifies applying permissions to your enterprise data assets. From there, we recommend creating projects for specific business units or workloads that will query data from the data project. Creating separate projects for data consumers is beneficial because each BigQuery on-demand project has access to its own separate pool of 2,000 BigQuery slots, fresh project-based quotas, and project-based billing. In the case of Looker, it can look like the image below:
In this case, our “Data” projects store the data and the “Consumption” projects will be used to issue queries. Again, benefits include slots / resources being assigned at the project level so your Looker instances are not competing against your ML pipelines, for example. So, how do we implement this?
Create a project for data storage. Likely, this is already completed. See documentation for creating and managing projects here.
Create project(s) for Looker workloads. We might need to segment out a different project for dev, test, prod or even for different business units and applications. This allows separation of billing and allocation of slots for Looker workloads, in an on-demand environment.
Create a separate Looker Service Account in each of your “Looker Projects”. Each service account will require the following roles in its own project:
BigQuery JobUser to issue queries.
BigQuery Data Editor for the use of persistent derived tables (PDTs).
Note: For more information on BigQuery roles, see documentation here.
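As a rough sketch of this step, the Python below builds (but does not run) the gcloud commands that would create a Looker service account and grant it the two roles listed above. The project ID `looker-prod-project` and the account name `looker-sa` are hypothetical placeholders, and granting both roles at the project level is an assumption made for simplicity.

```python
import shlex
from typing import List

# Hypothetical names, used only for illustration.
LOOKER_PROJECT = "looker-prod-project"
SERVICE_ACCOUNT_NAME = "looker-sa"

# Roles named in this step: issue query jobs, and write PDTs.
ROLES = ["roles/bigquery.jobUser", "roles/bigquery.dataEditor"]


def service_account_email(name: str, project_id: str) -> str:
    """Return the email address of a user-managed service account."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"


def setup_commands(name: str, project_id: str) -> List[List[str]]:
    """Build gcloud invocations that create the account and grant its roles."""
    member = f"serviceAccount:{service_account_email(name, project_id)}"
    commands = [
        ["gcloud", "iam", "service-accounts", "create", name,
         f"--project={project_id}"],
    ]
    for role in ROLES:
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in setup_commands(SERVICE_ACCOUNT_NAME, LOOKER_PROJECT):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; repeat it once per “Looker Project” with that project's ID.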
Grant each Looker Service Account DataViewer in “Data Project”. You just created service accounts in the “consumption” projects. Now, in the “Data Project”, you will need to grant each Looker Service Account “BigQuery DataViewer” access to the required datasets or tables.
a.) Note: BigQuery DataViewer can be applied at the project, dataset or table level. We always recommend the principle of least privilege when granting data access to resources.
b.) To do this for a dataset (screenshots below), select “Sharing” >> “Permissions” >> “Add Principal” and insert the Looker service account email you created in Step 3.
c.) To complete this at the project level, select the project in console > IAM & Admin > Grant Access > Add Principals
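The dataset-level or table-level grant can also be expressed as a BigQuery SQL GRANT statement instead of console clicks. The sketch below only renders that statement as text; the dataset path `data-project.sales` and the service account email are hypothetical, and the helper name `grant_viewer_sql` is ours, not part of any library.

```python
def grant_viewer_sql(resource_type: str, resource_path: str,
                     service_account_email: str) -> str:
    """Render a GRANT statement giving a service account read access.

    resource_type is "SCHEMA" (a dataset) or "TABLE"; project-level grants
    go through IAM rather than SQL, so they are rejected here.
    """
    if resource_type not in ("SCHEMA", "TABLE"):
        raise ValueError("resource_type must be 'SCHEMA' or 'TABLE'")
    return (
        "GRANT `roles/bigquery.dataViewer`\n"
        f"ON {resource_type} `{resource_path}`\n"
        f'TO "serviceAccount:{service_account_email}";'
    )


if __name__ == "__main__":
    # Hypothetical dataset in the "Data Project" and Looker service account.
    print(grant_viewer_sql(
        "SCHEMA",
        "data-project.sales",
        "looker-sa@looker-prod-project.iam.gserviceaccount.com",
    ))
```

Granting on a dataset or table rather than on the whole project keeps each Looker service account scoped to only the data it needs, in line with the least-privilege note above.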
Optional: Create looker_scratch dataset in each “Looker Project”. If using Persistent Derived Tables (PDTs), we recommend placing the scratch schemas in the “Looker Projects”, since each of those projects already contains the compute resources and application-specific workloads. This has the advantage of simplifying permissions by only allowing Looker write access to its own project. If PDTs are being used by other applications or workloads, a case could be made to write back to the “Data Project” instead. In that situation, though, we would generally consider creating views in BigQuery instead of using PDTs.
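For the optional scratch dataset, a minimal sketch is to render a CREATE SCHEMA statement that can be run in the “Looker Project”. The project ID and the `US` location below are hypothetical; the location should match wherever the data being queried actually lives.

```python
from typing import Optional


def create_scratch_dataset_sql(project_id: str,
                               dataset: str = "looker_scratch",
                               location: Optional[str] = None) -> str:
    """Render DDL that creates the PDT scratch dataset if it is missing."""
    statement = f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset}`"
    if location:
        # Optional: pin the dataset to an explicit location.
        statement += f"\nOPTIONS (location = '{location}')"
    return statement + ";"


if __name__ == "__main__":
    # Hypothetical Looker project ID and location.
    print(create_scratch_dataset_sql("looker-prod-project", location="US"))
```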
Create Connection in Looker.
Project ID: “Looker Project”
Dataset: “Data Project”
Service Account: Service Account created in “Looker Project”
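Before wiring up the connection, it can help to confirm that a query billed to the “Looker Project” can read a table in the “Data Project”. The sketch below builds (but does not run) a bq command for that check; every project, dataset and table name is a hypothetical placeholder, and the command would need to be run with credentials for the Looker service account to be a meaningful test.

```python
import shlex
from typing import List


def smoke_test_command(looker_project: str, data_project: str,
                       dataset: str, table: str) -> List[str]:
    """Build a bq invocation that runs in one project and reads from another."""
    # Fully qualified reference: the data stays in the data project.
    table_ref = f"`{data_project}.{dataset}.{table}`"
    sql = f"SELECT COUNT(*) AS row_count FROM {table_ref}"
    return [
        "bq",
        f"--project_id={looker_project}",  # the query job runs (and bills) here
        "query",
        "--use_legacy_sql=false",
        sql,
    ]


if __name__ == "__main__":
    # Hypothetical names; print the command for review instead of running it.
    print(shlex.join(
        smoke_test_command("looker-prod-project", "data-project",
                           "sales", "orders")
    ))
```

If the query succeeds, the split is working as intended: compute and billing land in the consumption project while the data remains in the data project.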