
Logging "Save Results" operations in BigQuery

Good afternoon. Inside our company we want to log clicks on the "Save Results" button in the BigQuery browser UI ("CSV", "JSON", "Copy to Clipboard", etc.). We need this to build a data-leak control process.

The problem is that when we analyze the logs from the "Logging" service (for example, with Logs Explorer or Splunk), we find no difference between an ordinary "SELECT" and "SAVE RESULTS".

We found a difference only when routing browser requests through a proxy server: the request parameters include a "downloadFormat" field.

How can we log this using the "Logging (Stackdriver)" service?

Screenshot 2023-02-13 at 15.14.13.png


Here is an example of a browser request intercepted by the proxy: Screenshot 2023-02-13 at 15.06.25.png, Screenshot 2023-02-13 at 15.06.43.png, Screenshot 2023-02-13 at 15.04.12.png
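For anyone who wants to apply the same check to proxy logs, here is a minimal sketch of how an intercepted request could be classified. It assumes that "downloadFormat" appears in the URL query string (it may instead be in the request body, in which case the parsing would differ). The function name `download_format` and the example URLs are hypothetical and are not real BigQuery endpoints.

```python
from urllib.parse import urlsplit, parse_qs


def download_format(url):
    """Return the value of the downloadFormat parameter, or None if absent.

    Assumption: the parameter is carried in the URL query string.
    """
    params = parse_qs(urlsplit(url).query)
    values = params.get("downloadFormat")
    return values[0] if values else None


# Hypothetical URLs, for illustration only.
save_request = "https://proxy.example.invalid/query?jobId=abc&downloadFormat=CSV"
plain_request = "https://proxy.example.invalid/query?jobId=abc"

print(download_format(save_request))   # value of downloadFormat
print(download_format(plain_request))  # None: an ordinary query
```

A request for which this returns a value would be treated as a "Save Results" action; everything else is treated as an ordinary query.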

Sadly, I don't think we are going to find a good answer.  Here is my thinking.

The Console UI is (presumably) a Google-written application that uses the BigQuery APIs. When you submit a SELECT statement, the Console UI sends it to BigQuery, and BigQuery executes it. The resulting data is then returned to the Console UI app, which displays the results. At this point, the Console UI app already has the data. The "Save Results" button is a "contract" between the user and the Console UI app to save those results.

Now let us imagine that the same user opened Cloud Shell, ran the "bq" command to execute the same statement, directed the output to a file, and then downloaded the file through Cloud Shell. Now we have TWO mechanisms to extract the data. I think we are likely to end up playing "whack-a-mole" trying to detect downloads of data. It has also just dawned on me that we might have to look at "Connected Sheets" as well.

One solution would be to prevent users from running queries directly by taking away their BigQuery permissions, but that would likely not fly because I presume they need those permissions to perform their roles.

I'm wondering whether you have VPC Service Controls implemented? It is commonly used to help prevent data exfiltration by restricting where requests to services like BigQuery can originate and where exports of data from BigQuery can go. There might be something in VPC Service Controls that can assist us.

 
VPC Service Controls was the first thing we adopted in the company to define perimeters and block leaks. Before that, it was easy to run "bq cp" and copy a terabyte-sized dataset to a personal project.
Now we do not want to block all exports; we want to monitor them so that we can accumulate statistics and see who saves data to their personal computer too much and too often.
I was hoping to see a parameter in the Stackdriver logs that would let me separate regular "SELECT" queries from "SAVE RESULTS", but they are exactly the same. I don't know what to do or how to monitor this with GCP tools.
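To illustrate the kind of statistics I mean, here is a rough sketch that counts "Save Results" requests per user from proxy log records. It is only an assumption-laden example: the record shape (a user and a URL per entry), the function names `count_exports` and `heavy_exporters`, the threshold, and the URLs are all hypothetical, and it assumes "downloadFormat" is visible in the URL query string. It works on proxy logs, not on Cloud Logging entries, since those did not show the difference.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def count_exports(records):
    """Count requests carrying downloadFormat, per user.

    records: iterable of (user, url) pairs from a proxy log (hypothetical shape).
    """
    counts = Counter()
    for user, url in records:
        if "downloadFormat" in parse_qs(urlsplit(url).query):
            counts[user] += 1
    return counts


def heavy_exporters(records, threshold):
    """Return the sorted list of users with at least `threshold` exports."""
    return sorted(u for u, n in count_exports(records).items() if n >= threshold)


# Hypothetical sample data, for illustration only.
sample = [
    ("alice", "https://proxy.example.invalid/query?jobId=1&downloadFormat=CSV"),
    ("alice", "https://proxy.example.invalid/query?jobId=2&downloadFormat=JSON"),
    ("bob", "https://proxy.example.invalid/query?jobId=3"),
    ("bob", "https://proxy.example.invalid/query?jobId=4&downloadFormat=CSV"),
]

print(count_exports(sample))
print(heavy_exporters(sample, threshold=2))
```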

So is there any workaround that lets us control data leaks through the export options provided in BigQuery?

Is there any report we can create to monitor which users performed the extraction?