Hello!
I'm currently working on setting up a sink for BigQuery Audit Logs, and I'm wondering if it's possible to configure a single sink to collect audit logs from either entire organizations or multiple projects into a single BigQuery table. Is there a way to achieve this, or do I need to create separate sinks for each project individually? I appreciate your assistance in addressing my query!
Hi @ylee1212 ,
Aggregated Log Sink Definition: An aggregated log sink is a sink that collects logs from multiple resources, such as projects, folders, or an entire organization. It is created at a higher level (organization or folder) with the --include-children flag (if using gcloud).
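For reference, a minimal gcloud sketch of the same setup (the sink name, destination project, dataset, and organization ID below are placeholders, not values from this thread):

```
# Create an organization-level aggregated sink that routes BigQuery audit logs
# from all child projects into a single BigQuery dataset.
gcloud logging sinks create bq-audit-sink \
  bigquery.googleapis.com/projects/MY_LOGGING_PROJECT/datasets/bq_audit_logs \
  --organization=ORGANIZATION_ID \
  --include-children \
  --log-filter='resource.type="bigquery_resource" AND protoPayload.serviceName="bigquery.googleapis.com"'
```

The same command works at the folder level if you pass --folder=FOLDER_ID instead of --organization.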
Console Steps: To create an aggregated log sink to collect BigQuery audit logs from multiple projects or an entire organization in the Google Cloud Console, follow these steps:
1. Go to the Google Cloud console: https://console.cloud.google.com/.
2. Click the hamburger menu (three horizontal lines) in the top-left corner of the page.
3. Select Logging.
4. Click the Sinks tab.
5. Click the Create sink button.
6. In the Create sink dialog box, enter the following information: under Filter, select Create a new filter.
7. In the Filter expression field, enter the following filter expression to collect all BigQuery audit logs:

resource.type="bigquery_resource" AND protoPayload.serviceName="bigquery.googleapis.com"
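If you'd like to sanity-check the filter before creating the sink, a quick read at the organization level should return matching entries whenever there has been recent BigQuery activity (the organization ID is a placeholder):

```
# Preview a few recent log entries that match the sink filter.
gcloud logging read \
  'resource.type="bigquery_resource" AND protoPayload.serviceName="bigquery.googleapis.com"' \
  --organization=ORGANIZATION_ID \
  --limit=5 \
  --freshness=1d
```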
Service Account Permissions: After you have created the sink, you will need to grant the sink's service account permission to write to the destination BigQuery dataset. To do this, grant the sink's writer identity the BigQuery Data Editor role (roles/bigquery.dataEditor) on the destination dataset.
Once you have granted the sink service account permission to write to BigQuery, it will start collecting audit logs. You can then query the BigQuery tables to view your audit logs.
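As a rough gcloud sketch of those steps (sink name, organization ID, and destination project are placeholders): look up the sink's writer identity, then grant it the role on the project that hosts the destination dataset.

```
# The writer identity is the service account Cloud Logging uses for this sink;
# the returned value already includes the "serviceAccount:" prefix.
WRITER_IDENTITY=$(gcloud logging sinks describe bq-audit-sink \
  --organization=ORGANIZATION_ID \
  --format='value(writerIdentity)')

# Grant it BigQuery Data Editor on the project containing the destination dataset.
gcloud projects add-iam-policy-binding MY_LOGGING_PROJECT \
  --member="$WRITER_IDENTITY" \
  --role='roles/bigquery.dataEditor'
```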
Different Sink Output Schema Between Project and Organization Level Audit Logs
Firstly, I'd like to extend my gratitude to @ms4446 for the comprehensive response provided earlier.
To give some context, I recently set up a sink using Terraform following the guidance you've shared. During the implementation, I consulted a Google blog post and noticed some disparities in the output schema of our sinks.
For the project level sink, I can query the field protopayload_auditlog.metadataJson. However, this particular field seems to be absent from our organization level schema. Here are the snapshots for reference:
Organization BigQuery Audit Log Sink Snapshot:
Project Level Sink Snapshot:
Given the absence of the metadataJson field, I'm finding it challenging to execute the following query from the aforementioned blog:
SELECT
  timestamp AS time_of_access,
  protopayload_auditlog.authenticationInfo.principalEmail AS user_email,
  protopayload_auditlog.requestMetadata.callerIp AS ip,
  auth.permission AS auth_permission,
  auth.granted AS auth_granted,
  data_access.resource.labels.project_id AS job_execution_project,
  SPLIT(protopayload_auditlog.resourceName, '/')[SAFE_OFFSET(1)] AS referenced_project,
  SPLIT(protopayload_auditlog.resourceName, '/')[SAFE_OFFSET(3)] AS referenced_dataset,
  SPLIT(protopayload_auditlog.resourceName, '/')[SAFE_OFFSET(5)] AS referenced_table,
  ARRAY_LENGTH(SPLIT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.tableDataRead'), '$.fields'), ',')) AS num_fields,
  SPLIT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.tableDataRead'), '$.fields'), ",") AS fields
FROM `my-project`.my_dataset.cloudaudit_googleapis_com_data_access AS data_access,
  UNNEST(protopayload_auditlog.authorizationInfo) AS auth
WHERE
  protopayload_auditlog.methodName = "google.cloud.bigquery.v2.JobService.InsertJob"
  AND data_access.resource.type = 'bigquery_dataset'
  AND JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.tableDataRead'), '$.reason') = '"JOB"';
I'm curious to understand if there's an inherent difference between the project level and organization level audit logs. Why might we be witnessing these inconsistencies?
Hi @ylee1212 ,
Thank you for the additional information. The presence or absence of the protopayload_auditlog.metadataJson field in audit logs should not inherently differ based on whether the logs are collected at the project or organization level. Both levels should capture the same fields for the same types of events.
If you're observing a discrepancy, it might be due to the specific types of logs being captured, the filters applied, or other configuration differences between the two sinks. The metadataJson field provides additional metadata for certain types of audit log entries, particularly data access logs.
To troubleshoot, ensure both sinks are capturing the same types of logs and have consistent configurations.
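One quick way to compare what each sink actually produced is to dump the schema of the data_access table behind each one and diff them; a sketch with the bq CLI (project and dataset names are placeholders):

```
# Export the schema of the data_access table created by each sink, then diff.
bq show --schema --format=prettyjson \
  org-logging-project:org_audit_logs.cloudaudit_googleapis_com_data_access > org_schema.json
bq show --schema --format=prettyjson \
  my-project:my_dataset.cloudaudit_googleapis_com_data_access > project_schema.json
diff org_schema.json project_schema.json
```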
Happy Friday @ms4446 !
It seems like the `resource.type="bigquery_resource"` filter is what's excluding the newer version of the log, BigQueryAuditMetadata. Thanks for your advice! I am removing this field and about to apply the filter:
protoPayload.serviceName="bigquery.googleapis.com"
AND protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
so we get only the newer version of the log.
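In case it helps anyone managing the sink with gcloud rather than Terraform, the equivalent update would be roughly the following (sink name and organization ID are placeholders):

```
# Narrow the aggregated sink to the newer BigQueryAuditMetadata-format entries only.
gcloud logging sinks update bq-audit-sink \
  --organization=ORGANIZATION_ID \
  --log-filter='protoPayload.serviceName="bigquery.googleapis.com" AND protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
```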
Yes, Happy Friday @ylee1212 😊
That's right. Using the resource.type="bigquery_resource" filter might exclude the newer version of the BigQuery audit log, which is represented by the bigqueryAuditMetadata field.
The filter you've provided:
protoPayload.serviceName="bigquery.googleapis.com"
AND protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
should capture only the newer BigQueryAuditMetadata-format entries.
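A quick way to confirm is to read a few entries with that filter at the organization level (the organization ID is a placeholder); every match should carry the BigQueryAuditMetadata payload:

```
# Verify that only new-format (BigQueryAuditMetadata) entries match the filter.
gcloud logging read \
  'protoPayload.serviceName="bigquery.googleapis.com" AND protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"' \
  --organization=ORGANIZATION_ID \
  --limit=3 \
  --freshness=1h
```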
Hello, I'm attempting to set this up but am not seeing the log tables actually created in my target dataset (I called it bigquery_audit_logs).
I have granted the following roles to the log agent service account: Logging Admin, BigQuery Data Editor, BigQuery User.
My inclusion filter is the same BigQueryAuditMetadata filter shown above.
Do I need to create the tables within the dataset manually, or something?
Here are potential causes and steps to investigate why your BigQuery audit log tables aren't being created:
Sink Configuration:
Sink Destination: Confirm that the sink points to the correct project and dataset (bigquery_audit_logs). This ensures tables are created in the intended location.
Parent Resource: Verify that the sink's "parent" is correctly set to the organization or folder intended for log collection. Incorrect settings here could affect log aggregation.
Sink Status: Check the sink's status in the Cloud Logging console to ensure it's active and error-free.
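A quick way to check the destination, filter, and writer identity from the command line (sink name and organization ID are placeholders):

```
# List org-level sinks, then inspect the one routing BigQuery audit logs.
gcloud logging sinks list --organization=ORGANIZATION_ID
gcloud logging sinks describe bq-audit-sink --organization=ORGANIZATION_ID
```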
Permissions:
Table Creation Permissions: The roles you've assigned should generally cover the necessary permissions. However, ensure the logging service account has the BigQuery Data Editor role (roles/bigquery.dataEditor) on the target dataset. This role implicitly allows for table creation and data insertion.
Log Availability:
Activity Monitoring: Make sure there's ongoing activity in the projects monitored by your sink. Without BigQuery actions, no audit logs (and consequently no tables) will be generated.
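A simple way to generate that activity is to run any query in one of the monitored projects; for example, against a public dataset:

```
# Run a query in a monitored project to generate BigQuery data-access audit log entries.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `bigquery-public-data.samples.shakespeare`'
```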
Filters:
While your filter appears correctly configured to capture BigQuery audit metadata logs, consider broadening your filter temporarily to see if it affects log flow.
Troubleshooting Steps:
Cloud Logging Console: Inspect the Cloud Logging console for any errors related to your sink. Look for logs or warnings that might indicate issues with log delivery.
IAM Permissions: Navigate to the IAM page for your dataset (bigquery_audit_logs). Ensure the service account used by your logging sink has the BigQuery Data Editor role. This role is crucial for allowing the service account to create tables and insert data into BigQuery.
Perform some BigQuery operations within the projects your sink monitors. This can help generate audit logs and trigger table creation.
There might be a delay from when logs are generated to when they are visible in BigQuery due to processing times.
The sink should automatically manage table creation upon receiving the first relevant log entry. Manual creation is not necessary and could complicate the process.
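Once the first matching entries are delivered, the audit log tables should appear in the destination dataset on their own; a quick check, assuming the dataset name used in this thread (the project ID is a placeholder):

```
# The cloudaudit_googleapis_com_* tables appear automatically after the first delivery.
bq ls MY_PROJECT:bigquery_audit_logs
```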
Thank you @ms4446 ! This was a log availability problem 😅. I'm good to go now.
Hello, based on the above content, I'm trying to set up an audit log sink in a BigQuery dataset.
An issue persists, producing errors like the following: "Cloud Logging sink configuration error in orgid, sink all_auditlogs_sink_bq: dataset_org_policy_denied ()".
I have granted the service account used by the sink the BigQuery Data Editor role in the project where the BigQuery dataset is located. What could be the problem? Please check.