Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Bigquery INFORMATION_SCHEMA.SHARED_DATASET_USAGE view not recording the usage of shared dataset

Former Community Member
Not applicable

I read the authorised dataset tables via view from different project, but the data usage of the authorised dataset is not recorded in the  INFORMATION_SCHEMA.SHARED_DATASET_USAGE. when i query INFORMATION_SCHEMA.SHARED_DATASET_USAGE, it returns no data display.

Could someone please advise the reason?

Solved Solved
0 3 1,365
2 ACCEPTED SOLUTIONS

In Google Cloud BigQuery, if you read the authorized dataset tables via a view from a different project, the data usage of the authorized dataset will not be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

Here are a few workarounds for this:

  1. Create a materialized view of the shared dataset in your own project. When you query the materialized view, the data usage will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
  2. Use the BigQuery audit logs to track data usage for shared datasets. The audit logs record all API calls made to BigQuery,including calls to read data from shared datasets.
  3. Use a third-party tool to track data usage for shared datasets.There are a number of tools available that can help you track data usage across multiple projects and datasets.

Here is an example of how to create a materialized view of a shared dataset:

CREATE MATERIALIZED VIEW my_project.my_dataset.my_view
AS
SELECT *
FROM project_id.dataset_id.shared_table;

Once you have created the materialized view, you can query it as follows:

SELECT *
FROM my_project.my_dataset.my_view;

The data usage for the materialized view will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

To use the BigQuery audit logs to track data usage for shared datasets, you can use the following query:

SELECT
  jsonPayload.authenticationInfo.principalEmail,
  jsonPayload.request.resource.table.projectId,
  jsonPayload.request.resource.table.datasetId,
  jsonPayload.request.resource.table.tableId,
  jsonPayload.request.resource.table.bytesRead
FROM
  `my_project_id.audit_dataset_id.audit_table_id`
WHERE
  jsonPayload.request.resource.type = "table"
  AND jsonPayload.request.resource.table.projectId = "project_id"
  AND jsonPayload.request.resource.table.datasetId = "dataset_id"
  AND jsonPayload.request.resource.table.tableId = "shared_table"
ORDER BY
  jsonPayload.authenticationInfo.principalEmail;

This query will return a list of all users and applications that have read data from the shared dataset, as well as the amount of data that was read.

Finally, there are a number of third-party tools available that can help you track data usage for shared datasets. I recommend looking into the Google Cloud Marketplace for monitoring solutions tailored for BigQuery.

View solution in original post

The audit_dataset_id and audit_table_id are the names of the dataset and table where your BigQuery audit logs are stored. If you've set up an export of BigQuery audit logs to a BigQuery dataset, you can find the export location as follows:

  1. Go to the Google Cloud Console: https://console.cloud.google.com/
  2. Click the Menu button (three horizontal lines) in the top left corner of the page.
  3. Navigate to Logging > Logs Router.
  4. Here, look for an existing sink that directs logs to BigQuery. If one exists, the "Destination" column will indicate the destination dataset (and potentially table) where logs are being exported. This is where you'll find your audit_dataset_id and potentially audit_table_id.

By default, BigQuery audit logs are enabled, but if you want to export them to a specific location for long-term storage and analysis, you'd create a sink in the Logs Router.

Once you have found the audit_dataset_id and audit_table_id, you can use them in the query provided to track data usage for shared datasets. The SQL query you provided is accurate and offers a good example of how to utilize these IDs.

View solution in original post

3 REPLIES 3

In Google Cloud BigQuery, if you read the authorized dataset tables via a view from a different project, the data usage of the authorized dataset will not be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

Here are a few workarounds for this:

  1. Create a materialized view of the shared dataset in your own project. When you query the materialized view, the data usage will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
  2. Use the BigQuery audit logs to track data usage for shared datasets. The audit logs record all API calls made to BigQuery,including calls to read data from shared datasets.
  3. Use a third-party tool to track data usage for shared datasets.There are a number of tools available that can help you track data usage across multiple projects and datasets.

Here is an example of how to create a materialized view of a shared dataset:

CREATE MATERIALIZED VIEW my_project.my_dataset.my_view
AS
SELECT *
FROM project_id.dataset_id.shared_table;

Once you have created the materialized view, you can query it as follows:

SELECT *
FROM my_project.my_dataset.my_view;

The data usage for the materialized view will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

To use the BigQuery audit logs to track data usage for shared datasets, you can use the following query:

SELECT
  jsonPayload.authenticationInfo.principalEmail,
  jsonPayload.request.resource.table.projectId,
  jsonPayload.request.resource.table.datasetId,
  jsonPayload.request.resource.table.tableId,
  jsonPayload.request.resource.table.bytesRead
FROM
  `my_project_id.audit_dataset_id.audit_table_id`
WHERE
  jsonPayload.request.resource.type = "table"
  AND jsonPayload.request.resource.table.projectId = "project_id"
  AND jsonPayload.request.resource.table.datasetId = "dataset_id"
  AND jsonPayload.request.resource.table.tableId = "shared_table"
ORDER BY
  jsonPayload.authenticationInfo.principalEmail;

This query will return a list of all users and applications that have read data from the shared dataset, as well as the amount of data that was read.

Finally, there are a number of third-party tools available that can help you track data usage for shared datasets. I recommend looking into the Google Cloud Marketplace for monitoring solutions tailored for BigQuery.

Former Community Member
Not applicable

Thank you so much for you reply.
Could you let me know where i can find "audit_dataset_id" and "audit_table_id"?

The audit_dataset_id and audit_table_id are the names of the dataset and table where your BigQuery audit logs are stored. If you've set up an export of BigQuery audit logs to a BigQuery dataset, you can find the export location as follows:

  1. Go to the Google Cloud Console: https://console.cloud.google.com/
  2. Click the Menu button (three horizontal lines) in the top left corner of the page.
  3. Navigate to Logging > Logs Router.
  4. Here, look for an existing sink that directs logs to BigQuery. If one exists, the "Destination" column will indicate the destination dataset (and potentially table) where logs are being exported. This is where you'll find your audit_dataset_id and potentially audit_table_id.

By default, BigQuery audit logs are enabled, but if you want to export them to a specific location for long-term storage and analysis, you'd create a sink in the Logs Router.

Once you have found the audit_dataset_id and audit_table_id, you can use them in the query provided to track data usage for shared datasets. The SQL query you provided is accurate and offers a good example of how to utilize these IDs.