
Guidance on optimizing queries

I'm new to BigQuery and have just been slapped with a bill for $1,400 for 9 days of queries on my Workspace logs. I have a script pulling back my Workspace Gmail logs every minute with the following query:

SELECT gmail, CURRENT_TIMESTAMP(), TIMESTAMP_MICROS(gmail.event_info.timestamp_usec)
FROM `workspace-logs-XXXXX.wsl.activity`
WHERE TIMESTAMP(TIMESTAMP_MICROS(gmail.event_info.timestamp_usec)) > '2023-12-16 11:11:52.776000'
ORDER BY gmail.event_info.timestamp_usec ASC;

It's costing me a fortune, so I'm looking for ways to optimise this query. Sure, I can query less regularly, but is there anything else I can do?

More info for each query:
Duration: 1 sec
Bytes processed: 22.12 GB
Bytes billed: 22.12 GB
Slot milliseconds: 599,902

2 ACCEPTED SOLUTIONS

Looking at your query, you seem to be retrieving all records after some date (2023-12-16), and you are telling me that you are doing this once a minute ... every minute. This feels like you are returning the same records over and over again, as opposed to returning just the new / changed records. In the on-demand billing model, Google charges $6.25 for every terabyte scanned. Looking at your query and the data you are showing, we have:

22.12 GB / 1,024 GB per TB * 60 calls/hour * 24 hours * $6.25/TB ≈ $194.41/day => ~$1,749 for 9 days
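
If you want to confirm where the spend is going from inside BigQuery itself, something along these lines (just a sketch ... adjust the `region-us` qualifier to wherever your jobs actually run) lists your most expensive recent jobs:

-- Sketch: show the most expensive jobs from the last day.
-- Assumes the jobs run in the US multi-region; change `region-us` if not.
SELECT
  creation_time,
  job_id,
  total_bytes_billed,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY total_bytes_billed DESC
LIMIT 20;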

The ways to reduce your costs are:

1. Make fewer calls
2. Reduce the data scanned per call (see the sketch just after this list)
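
On point 2, one cheap win is to stop selecting the entire nested gmail record. BigQuery storage is columnar and bills for every column it has to read, so pulling back only the fields you actually use can cut the bytes scanned considerably. A sketch, reusing your table name (swap in whichever gmail.* fields you really need):

-- Sketch: select only the fields used downstream instead of the whole
-- nested `gmail` record, to reduce bytes scanned.
SELECT
  gmail.event_info.timestamp_usec AS event_usec,
  TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) AS event_time
FROM `workspace-logs-XXXXX.wsl.activity`
WHERE TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) > TIMESTAMP '2023-12-16 11:11:52.776000'
ORDER BY event_usec;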

What is the schema/structure of the table you are querying? I'd be looking to see if the table is partitioned. Partitioning means that when you perform a query you only need to consider a subset of the data. I'd also be interested in hearing how the table is being updated. The thought that keeps coming to me is that the data being returned seems to be the same data over and over.
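
I don't know how (or whether) your activity table is partitioned, so treat this as a sketch only: assuming it is ingestion-time partitioned by day, a filter on the _PARTITIONTIME pseudo column would let BigQuery prune every partition except the ones you actually need, instead of scanning the full 22 GB on every call:

-- Sketch: assumes `activity` is ingestion-time partitioned by day.
-- Scans at most the last two daily partitions instead of the whole table.
SELECT
  gmail.event_info.timestamp_usec AS event_usec,
  TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) AS event_time
FROM `workspace-logs-XXXXX.wsl.activity`
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) > TIMESTAMP '2023-12-16 11:11:52.776000'
ORDER BY event_usec;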

Let's also consider the value proposition ... if you consider $194/day a poor value, what would you consider a good value? How much time and energy are you willing to invest to change the situation? (A year at this rate would be ~$70K ... so it feels like a good investment.)


I think if I'm reading this article correctly ...  I am seeing the following:

BigQuery Exports collect Google Workspace data from the previous day's events. The result shows data from the previous day to the export date.

This seems to say to me that the audit log data is written to BigQuery once per day and doesn't then change for a 24-hour period. I'd be tempted to suggest that you run a scheduled job, once per day, that does the equivalent of:

CREATE TABLE <archive_root>_YYYY_MM_DD AS (
   SELECT * FROM <AuditTable> WHERE <AuditRecordDate> >= 'YYYY-MM-DD'
);

You can use Cloud Scheduler, BigQuery Scheduled Queries, or Cloud Composer (or another scheduler) to schedule and run the job.
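
If you go with BigQuery Scheduled Queries, the body of the daily job could look roughly like this. It's only a sketch: it assumes you have already created a date-partitioned archive table (the wsl_archive.activity_archive name here is hypothetical) and that each run only needs the previous day's events:

-- Sketch: run once per day as a scheduled query.
-- Assumes `wsl_archive.activity_archive` already exists with columns
-- (event_date DATE, gmail <same STRUCT as the source>); the names are hypothetical.
INSERT INTO `workspace-logs-XXXXX.wsl_archive.activity_archive`
SELECT
  DATE(TIMESTAMP_MICROS(gmail.event_info.timestamp_usec)) AS event_date,
  gmail
FROM `workspace-logs-XXXXX.wsl.activity`
WHERE DATE(TIMESTAMP_MICROS(gmail.event_info.timestamp_usec))
      = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);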

