Hi,
I am using Dataplex to run both Data profile and Data Quality scans. I want to be able to publish these results so have ticked this box when setting the scan settings.
My expectation is that, every time I run the scans, the results will then show in BigQuery here, but there is no data showing.
Can anyone advise where I am going wrong or if any additional settings are required?
Thanks
If your Dataplex scan results are not showing up in BigQuery, there could be several reasons behind this issue. Below is a breakdown of potential causes and troubleshooting steps to help you resolve the problem:
Permissions
Verify Dataplex Service Account Permissions: The Dataplex service account needs specific roles in BigQuery to publish scan results. Ensure the service account has the roles/bigquery.dataEditor role, which allows it to create tables and insert data. The roles you mentioned, such as roles/bigquery.dataViewer and roles/bigquery.jobUser, are important for other tasks but not sufficient for publishing scan results. To grant these permissions:
In the BigQuery console, navigate to the dataset intended for the results.
Click "Share" and grant the Dataplex service account the roles/bigquery.dataEditor role.
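As a rough way to double-check the grant described above, the sketch below inspects a dataset's access entries (in the shape the BigQuery REST API returns them, for example from `bq show --format=prettyjson`) and reports whether the Dataplex service account has write access. The service agent address format and the helper names are assumptions for illustration; the function only looks at direct dataset-level grants and does not see roles inherited from the project.

```python
# Dataset-level roles that allow creating tables and inserting rows.
# BigQuery reports roles/bigquery.dataEditor on a dataset as the legacy role "WRITER".
WRITE_ROLES = {
    "WRITER",
    "OWNER",
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataOwner",
}


def dataplex_service_agent(project_number: str) -> str:
    """Build the Dataplex service agent address (assumed naming pattern)."""
    return f"service-{project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"


def can_publish(access_entries: list, service_account: str) -> bool:
    """Return True if the account has a direct write grant on the dataset."""
    wanted = service_account.lower()
    for entry in access_entries:
        email = entry.get("userByEmail", "").lower()
        if email == wanted and entry.get("role") in WRITE_ROLES:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical project number and access list, for illustration only.
    agent = dataplex_service_agent("123456789")
    access = [
        {"role": "READER", "specialGroup": "projectReaders"},
        {"role": "WRITER", "userByEmail": agent},
    ]
    print(can_publish(access, agent))
```

If this returns False for the results dataset, granting roles/bigquery.dataEditor through "Share" as described above is the first thing to try.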
Table Creation
Existing Table: If you've directed Dataplex to use an existing BigQuery table for the results, ensure the table's schema is compatible with the expected format of Dataplex scan results. The required schema details can be found in the Dataplex documentation.
New Table: If Dataplex is configured to create a new table for the results, verify that the table has been successfully created in your BigQuery dataset. Tables created by Dataplex typically follow specific naming conventions, which might include a prefix like dataplex_, but you should refer to the scan configuration or Dataplex documentation for the exact naming pattern.
Publishing Issues
Verify Checkbox: Double-check that the "Publish results to the BigQuery and Dataplex Catalog UI" option was indeed selected when setting up the scan. This step is crucial for ensuring results are published.
Scan Completion: Confirm that the Data Profile or Data Quality scans have completed successfully. Scan results will not be published if the scans fail or do not complete for any reason.
Delays
Processing Time: Be aware that there might be a delay before the results appear in BigQuery, especially for large datasets or complex scans. Allow some time for the process to complete.
Additional Tips
Check Dataplex Logs: Review the Dataplex logs for any error messages. These logs can provide specific clues about any issues encountered during the scan or the publishing process.
Consult Dataplex Documentation: For the most up-to-date instructions and potential limitations, refer to the official Dataplex documentation.
Example Scenario
Imagine you've set up publishing, but the results are not visible in BigQuery. Assuming you did not specify a table, meaning Dataplex is responsible for creating one, here's how you might proceed:
Check Permissions: Ensure the Dataplex service account has roles/bigquery.dataEditor permissions for the target dataset in BigQuery.
Look for Tables: In your BigQuery dataset, search for a new table that matches the naming convention used by Dataplex (e.g., starting with dataplex_). If no such table exists, the creation process might have encountered issues.
Examine Logs: Review the Dataplex logs for any errors related to table creation in BigQuery. This can provide insights into what went wrong.
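For the log review step, a filter string along these lines could be pasted into the Logs Explorer. This is only a sketch: the resource type `dataplex.googleapis.com/DataScan` and the `datascan_id` label are assumptions that should be confirmed against the entries your own scans produce, and the scan ID is hypothetical.

```python
def build_scan_error_filter(scan_id=None, min_severity="ERROR"):
    """Build a Cloud Logging query for Dataplex data scan errors.

    The resource type and label name are assumptions; verify them against
    a real log entry before relying on the filter.
    """
    clauses = [
        'resource.type="dataplex.googleapis.com/DataScan"',
        f"severity>={min_severity}",
    ]
    if scan_id:
        # Narrow the query to a single scan when its ID is known.
        clauses.append(f'resource.labels.datascan_id="{scan_id}"')
    return "\nAND ".join(clauses)


if __name__ == "__main__":
    # "my-dq-scan" is a placeholder scan ID.
    print(build_scan_error_filter("my-dq-scan"))
```

Dropping the scan ID gives a broader query, which helps when it is unclear which scan failed to publish.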
Sorry for the late reply, your answer is most appreciated.
I found the main reason was that I have two projects for which I am using Dataplex. I was creating quality scans for a dataset under one project and trying to view the results via the BigQuery UI for the other project, hence they were not visible.
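A small check like the one below can catch this kind of project mismatch. It is a sketch that assumes the scan's export table is given as a full resource name of the form //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID; the project, dataset, and table names used here are hypothetical.

```python
import re

# Assumed shape of the results table resource name in the scan configuration.
_TABLE_RE = re.compile(
    r"^//bigquery\.googleapis\.com/projects/(?P<project>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)/tables/(?P<table>[^/]+)$"
)


def parse_results_table(resource: str) -> dict:
    """Split a results table resource name into project, dataset, and table."""
    match = _TABLE_RE.match(resource)
    if match is None:
        raise ValueError(f"Unrecognized table resource: {resource!r}")
    return match.groupdict()


def results_in_project(resource: str, console_project: str) -> bool:
    """Return True if the results table lives in the project being browsed."""
    return parse_results_table(resource)["project"] == console_project


if __name__ == "__main__":
    # Hypothetical names: the scan exports to project-a, the UI shows project-b.
    table = (
        "//bigquery.googleapis.com/projects/project-a"
        "/datasets/dq_results/tables/scan_output"
    )
    parts = parse_results_table(table)
    if not results_in_project(table, "project-b"):
        print(f"Results are in {parts['project']}.{parts['dataset']}.{parts['table']}")
```

When the check fails, switching the BigQuery console to the project named in the resource (or querying the table by its fully qualified name) should show the published results.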