Hello everyone!
I am currently utilizing the Analytics Hub (AH) to exchange multiple datasets across several projects. We intend to track the usage of the listings to identify the most frequently used datasets and queries made by users.
After a brief inspection of the users' project logs, I found that they reference only the new names of the linked datasets, not the original dataset name or any identifier pointing back to AH. In practice, I cannot see any difference between a query made against a normal dataset and one made against a linked dataset. This is also a problem because subscribers could remove a linked dataset and create a new one with a different name.
Is there a way to retrieve this information?
Good day @DarioTra ,
Welcome to Google Cloud Community!
A linked dataset in BigQuery is a read-only dataset that acts as a symbolic link to a shared dataset. Subscribers can view the data but cannot update, add, or remove items in it. Subscribing to a listing creates a linked dataset in the subscriber's project rather than a copy of the dataset, so the subscriber does not get access to the dataset in your project. As a result, any query they run is only available in their own project. You can learn more here: https://cloud.google.com/bigquery/docs/analytics-hub-introduction#subscriber_workflow
However, you can see the subscriptions to your listings. Check this link to learn more: https://cloud.google.com/bigquery/docs/analytics-hub-manage-listings#view_all_subscriptions
Additionally, subscribers can't edit the data or metadata in a linked dataset. They can only update the labels and description of the linked dataset, which does not affect the publisher's shared dataset. They also can't take snapshots of the linked dataset, which means they can't preserve the contents of its tables. You can check these links to learn more: https://cloud.google.com/bigquery/docs/analytics-hub-view-subscribe-listings
https://cloud.google.com/bigquery/docs/analytics-hub-introduction#limitations
Hope this helps!
Hello @kvandres,
Thanks for your answer and the linked documentation!
The advantages and limitations of using a linked dataset are clear, and it is also easy to manage the subscriptions from the UI.
We are already using this type of architecture to share our datasets, but now we would like to retrieve some statistical/audit data about the usage of these datasets. We are interested in the linked datasets and not in the users' private datasets.
I was wondering whether it is possible to set a specific filter in the logs to retrieve only the queries run against a linked dataset, knowing only the listing and not the exact name of the linked dataset.
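To make the goal concrete, here is a rough sketch of the kind of result I would like to end up with. It is only an illustration under assumptions: I assume the tables referenced by each query job appear in the logs as resource names of the form `projects/<project>/datasets/<dataset>/tables/<table>`, and I assume I somehow have a mapping from each linked dataset to its listing. That mapping (`linked_to_listing`) and the function name `count_listing_usage` are hypothetical; obtaining the mapping is exactly the part I am missing.

```python
import re
from collections import Counter

# Assumed shape of a referenced-table resource name in the query logs.
TABLE_RE = re.compile(r"^projects/([^/]+)/datasets/([^/]+)/tables/([^/]+)$")


def count_listing_usage(referenced_tables_per_query, linked_to_listing):
    """Count how many query jobs touched each listing.

    referenced_tables_per_query: iterable of lists of table resource names,
        one list per query job (already extracted from the logs).
    linked_to_listing: hypothetical dict mapping (project_id, dataset_id)
        of a linked dataset to the name of the listing it came from.
    """
    usage = Counter()
    for tables in referenced_tables_per_query:
        listings = set()  # count each listing at most once per query job
        for name in tables:
            match = TABLE_RE.match(name)
            if match is None:
                continue  # skip names that do not follow the assumed shape
            key = (match.group(1), match.group(2))
            if key in linked_to_listing:
                listings.add(linked_to_listing[key])
        usage.update(listings)
    return usage


# Made-up example data: two jobs hit a linked dataset, one hits a private one.
queries = [
    ["projects/sub-a/datasets/linked_sales/tables/orders"],
    ["projects/sub-a/datasets/private_ds/tables/scratch"],
    [
        "projects/sub-a/datasets/linked_sales/tables/orders",
        "projects/sub-a/datasets/linked_sales/tables/items",
    ],
]
mapping = {("sub-a", "linked_sales"): "sales_listing"}
usage = count_listing_usage(queries, mapping)
```

With a mapping like this, queries against private datasets would be ignored and a renamed or recreated linked dataset would still be attributed to the same listing, as long as the mapping is kept up to date.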