This content, written by Ken Cunanan, was initially posted in Looker Blog on Jul 25, 2018. The content is subject to limited support.
At Google Cloud Next ‘18 today, Google took a step toward more accessible machine learning with the announcement of a new feature for Google BigQuery called BigQuery Machine Learning (BQML). BQML is a fully managed service that makes it easier for data scientists to build and train machine learning models in BigQuery using SQL syntax.
Most organizations have because the data science workflow requires a lot of resources, and the largest resource consumption often has little to do with the actual discipline of data science or the creation of machine learning models.
A typical data science workflow can look like this:
You might guess that the most important part of the workflow, building and validating a data model, takes the most time in a data scientist’s workflow. However, the breakdown of time actually looks like this:
Frequently the most interesting portion of a data scientist’s job (really their core competency as data scientists)—analyzing and interpreting data—is only a small fraction of their day-to-day responsibilities. Much more of their time is spent munging and cleaning dirty data. In fact, “dirty data” was by far the biggest barrier faced by respondents in .
This is because data environments within many companies are messy. Data is strewn across various tools and departments, so data scientists spend a vast amount of time simply preparing the dataset for their analysis and moving that data into a place where they can do their work.
Google, one of the leaders in AI and machine learning, is leveraging its BigQuery database solution to help address this problem.
With BigQuery Machine Learning, data scientists can now build machine learning (ML) models directly where their data lives, in Google BigQuery. For certain types of predictive models, this eliminates the need to move the data to another data science environment.
Data scientists will still want to leverage dedicated data science environments such as RStudio and Jupyter Notebooks for more complex analyses. However, for common types of linear and logistic regression models, a data scientist can dramatically reduce time spent moving and consolidating data by iterating on their machine learning models directly in BigQuery.
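To make "building a model with SQL syntax" a little more concrete, here is a minimal sketch in Python that assembles a BQML `CREATE MODEL` statement for a linear or logistic regression. The helper `build_create_model_sql` and every dataset, table, model, and column name are hypothetical placeholders chosen for illustration; none of them come from the original post.

```python
# A sketch only: all dataset, table, model, and column names below are
# hypothetical placeholders, not taken from the original post.

def build_create_model_sql(model_name, source_table, label_col, feature_cols,
                           model_type="logistic_reg"):
    """Return a BQML CREATE MODEL statement as a string."""
    # The post mentions linear and logistic regression, so only those two
    # model types are accepted in this sketch.
    if model_type not in ("linear_reg", "logistic_reg"):
        raise ValueError(f"unsupported model_type: {model_type}")
    # The label column is selected alongside the feature columns.
    columns = ",\n    ".join([label_col] + list(feature_cols))
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}',\n"
        f"         input_label_cols = ['{label_col}']) AS\n"
        f"SELECT\n    {columns}\n"
        f"FROM `{source_table}`"
    )


create_model_sql = build_create_model_sql(
    model_name="my_dataset.churn_model",          # hypothetical
    source_table="my_dataset.customer_history",   # hypothetical
    label_col="churned",                          # hypothetical
    feature_cols=["tenure_months", "monthly_spend"],
)
print(create_model_sql)

# To run the statement (assumes the google-cloud-bigquery package is
# installed and credentials are configured):
#   from google.cloud import bigquery
#   bigquery.Client().query(create_model_sql).result()
```

Because training is just a query, iterating on the model means editing the `SELECT` and running the statement again, with no export step in between.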
Once the model has been built and is ready for testing, a data scientist must ensure that the outputs of the model are piped back into the database and made available to business users. Traditionally, this step might require pushing the data back into a data warehouse or setting up a new data pipeline to bring the data scientist’s work closer to the broader organization.
With Looker on top of BigQuery, this step is eliminated. Because the data never leaves BigQuery, data scientists can immediately deliver the output of their models to business users through the same methods already employed on top of BigQuery.
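As a rough illustration of how model output can stay inside BigQuery, the sketch below builds a statement that writes `ML.PREDICT` results to a table, which a tool sitting on top of BigQuery could then query like any other table. The helper `build_prediction_table_sql` and all table and model names are hypothetical, and materializing predictions as a table is only one possible approach, not necessarily the one Looker uses.

```python
# A sketch only: the model and table names are hypothetical placeholders.

def build_prediction_table_sql(output_table, model_name, input_table):
    """Return SQL that stores ML.PREDICT output in a BigQuery table."""
    return (
        f"CREATE OR REPLACE TABLE `{output_table}` AS\n"
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"                (SELECT * FROM `{input_table}`))"
    )


prediction_sql = build_prediction_table_sql(
    output_table="my_dataset.churn_predictions",  # hypothetical
    model_name="my_dataset.churn_model",          # hypothetical
    input_table="my_dataset.current_customers",   # hypothetical
)
print(prediction_sql)

# To run the statement (assumes the google-cloud-bigquery package is
# installed and credentials are configured):
#   from google.cloud import bigquery
#   bigquery.Client().query(prediction_sql).result()
```

The predictions land next to the source data, so no separate pipeline is needed to bring them back into the warehouse.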
Now, with BQML + Looker, the workflow for data science looks like this:
Connecting directly with Google BQML reduces complexity for data scientists by eliminating the need to move the outputs of predictive models back into the database for use. It also shortens the time to value for business users, allowing them to operationalize the outputs of predictive metrics to make better decisions every day.
We believe the future of data lies in amplifying the capabilities of everyone, from data scientists to analysts, to deliver more value and insights to their organizations, and we’re proud to work with Google to make this vision a reality.