I read the authorised dataset tables via view from different project, but the data usage of the authorised dataset is not recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE. when i query INFORMATION_SCHEMA.SHARED_DATASET_USAGE, it returns no data display.
Could someone please advise the reason?
Solved! Go to Solution.
In Google Cloud BigQuery, if you read the authorized dataset tables via a view from a different project, the data usage of the authorized dataset will not be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
Here are a few workarounds for this:
Here is an example of how to create a materialized view of a shared dataset:
CREATE MATERIALIZED VIEW my_project.my_dataset.my_view
AS
SELECT *
FROM project_id.dataset_id.shared_table;
Once you have created the materialized view, you can query it as follows:
SELECT *
FROM my_project.my_dataset.my_view;
The data usage for the materialized view will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
To use the BigQuery audit logs to track data usage for shared datasets, you can use the following query:
SELECT
jsonPayload.authenticationInfo.principalEmail,
jsonPayload.request.resource.table.projectId,
jsonPayload.request.resource.table.datasetId,
jsonPayload.request.resource.table.tableId,
jsonPayload.request.resource.table.bytesRead
FROM
`my_project_id.audit_dataset_id.audit_table_id`
WHERE
jsonPayload.request.resource.type = "table"
AND jsonPayload.request.resource.table.projectId = "project_id"
AND jsonPayload.request.resource.table.datasetId = "dataset_id"
AND jsonPayload.request.resource.table.tableId = "shared_table"
ORDER BY
jsonPayload.authenticationInfo.principalEmail;
This query will return a list of all users and applications that have read data from the shared dataset, as well as the amount of data that was read.
Finally, there are a number of third-party tools available that can help you track data usage for shared datasets. I recommend looking into the Google Cloud Marketplace for monitoring solutions tailored for BigQuery.
The audit_dataset_id
and audit_table_id
are the names of the dataset and table where your BigQuery audit logs are stored. If you've set up an export of BigQuery audit logs to a BigQuery dataset, you can find the export location as follows:
audit_dataset_id
and potentially audit_table_id
.By default, BigQuery audit logs are enabled, but if you want to export them to a specific location for long-term storage and analysis, you'd create a sink in the Logs Router.
Once you have found the audit_dataset_id
and audit_table_id
, you can use them in the query provided to track data usage for shared datasets. The SQL query you provided is accurate and offers a good example of how to utilize these IDs.
In Google Cloud BigQuery, if you read the authorized dataset tables via a view from a different project, the data usage of the authorized dataset will not be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
Here are a few workarounds for this:
Here is an example of how to create a materialized view of a shared dataset:
CREATE MATERIALIZED VIEW my_project.my_dataset.my_view
AS
SELECT *
FROM project_id.dataset_id.shared_table;
Once you have created the materialized view, you can query it as follows:
SELECT *
FROM my_project.my_dataset.my_view;
The data usage for the materialized view will be recorded in the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
To use the BigQuery audit logs to track data usage for shared datasets, you can use the following query:
SELECT
jsonPayload.authenticationInfo.principalEmail,
jsonPayload.request.resource.table.projectId,
jsonPayload.request.resource.table.datasetId,
jsonPayload.request.resource.table.tableId,
jsonPayload.request.resource.table.bytesRead
FROM
`my_project_id.audit_dataset_id.audit_table_id`
WHERE
jsonPayload.request.resource.type = "table"
AND jsonPayload.request.resource.table.projectId = "project_id"
AND jsonPayload.request.resource.table.datasetId = "dataset_id"
AND jsonPayload.request.resource.table.tableId = "shared_table"
ORDER BY
jsonPayload.authenticationInfo.principalEmail;
This query will return a list of all users and applications that have read data from the shared dataset, as well as the amount of data that was read.
Finally, there are a number of third-party tools available that can help you track data usage for shared datasets. I recommend looking into the Google Cloud Marketplace for monitoring solutions tailored for BigQuery.
Thank you so much for you reply.
Could you let me know where i can find "audit_dataset_id" and "audit_table_id"?
The audit_dataset_id
and audit_table_id
are the names of the dataset and table where your BigQuery audit logs are stored. If you've set up an export of BigQuery audit logs to a BigQuery dataset, you can find the export location as follows:
audit_dataset_id
and potentially audit_table_id
.By default, BigQuery audit logs are enabled, but if you want to export them to a specific location for long-term storage and analysis, you'd create a sink in the Logs Router.
Once you have found the audit_dataset_id
and audit_table_id
, you can use them in the query provided to track data usage for shared datasets. The SQL query you provided is accurate and offers a good example of how to utilize these IDs.