
What does "BigQuery compatible" in Dataplex's Entry details mean?

ra

Dataplex discovery is interesting, but I do not quite understand why an asset from BigQuery itself is flagged as not compatible with BigQuery.

Is there a rule for determining compatibility with BigQuery? I suspect the reason is that there is no column that can be used for partitioning.

Hello Ra, did you manage to answer this question? If so, I'd like to know what you discovered!

Hi, I'm looking for a solution to this since I can't see the tables in BigQuery from Dataplex, and the same message appears: not compatible with BigQuery. Did you find a solution?

In the context of Google Cloud Dataplex, "Entry details: BigQuery compatible" refers to the data assets that are discoverable and queryable by BigQuery within the Dataplex platform. Dataplex enables you to manage, integrate, and analyze data across different storage systems, such as Google Cloud Storage, BigQuery, and more.

When Dataplex ingests data from various sources, it can create and manage BigQuery tables for the data. This means the data is stored in a format optimized for querying in BigQuery, which allows you to run analytics and gain insights from your data using BigQuery's features and performance capabilities.
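
As a concrete illustration, here is a minimal sketch of querying such a Dataplex-published table with the standard google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical placeholders:

```python
# Minimal sketch: querying a Dataplex-published table like any other
# BigQuery table. All project/dataset/table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

query = """
    SELECT *
    FROM `my-project.my_zone_dataset.my_discovered_table`
    LIMIT 10
"""
for row in client.query(query).result():
    print(dict(row))
```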

For data to be considered BigQuery compatible within Dataplex, it should follow BigQuery's general rules and considerations, such as file format, schema, data structure, data size and limits, data consistency, partitioning and clustering, and special characters and encoding.

In addition to these rules, Dataplex offers some specific features to ensure compatibility with BigQuery:

  1. Data format handling: Dataplex can surface data stored in Google Cloud Storage (GCS) in file formats such as Avro, Parquet, ORC, CSV, and JSON as BigQuery-queryable tables, without converting the underlying files.

  2. Schema detection and inference: Dataplex can automatically detect and infer schema information for data stored in GCS, ensuring that it's compatible with BigQuery.

  3. Data lake integration: Dataplex can create "data lake" tables in BigQuery, allowing you to query data stored in GCS directly from BigQuery using external tables or materialized views, making the data more easily accessible and compatible with BigQuery.
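
One way to check how a given entry was actually published is to inspect the table's type with the BigQuery client: tables Dataplex publishes over Cloud Storage data appear as EXTERNAL, while native BigQuery tables report TABLE. The table ID below is a hypothetical placeholder:

```python
# Minimal sketch: distinguishing a Dataplex-published external table from
# a native BigQuery table. The table ID is a hypothetical placeholder.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_zone_dataset.my_discovered_table")

print(table.table_type)  # "EXTERNAL" for GCS-backed tables, "TABLE" for native
if table.external_data_configuration is not None:
    cfg = table.external_data_configuration
    print(cfg.source_format)  # e.g. "PARQUET", "AVRO"
    print(cfg.source_uris)    # the underlying GCS paths
```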

Thank you very much for the answer. In the zones I am loading BigQuery dataset assets, and Dataplex then discovers all the tables in the dataset. Once the entire lake is configured, when I search for and open a table belonging to the asset, I have no way to open it in BigQuery, and that "not compatible" label appears. Where should I check?

When you're encountering a "non-compatible" message or cannot open the table in BigQuery after loading a dataset into Dataplex, there could be several reasons for this issue. Here's a list of common checks to identify and resolve the problem:

  1. Access permissions: Ensure that the Google Cloud Platform (GCP) service account used by Dataplex has appropriate permissions to access the dataset and its tables in BigQuery. You can verify and update the permissions in the GCP Console under the IAM & Admin section.

  2. Dataset location: Make sure that the dataset is stored in a region that is supported by both BigQuery and Dataplex. If the dataset is in an unsupported region, consider transferring it to a supported region or creating a new dataset there (a programmatic sketch for checks 1 and 2 follows this list).

  3. File formats and schema: Verify that the tables in the dataset are using supported file formats and have a well-defined schema that is compatible with BigQuery. You can check this in the BigQuery Console under the dataset details.

  4. Data storage location: Confirm that the underlying data is stored in a Google Cloud Storage (GCS) bucket that is accessible by both BigQuery and Dataplex. Check the bucket permissions and ensure that the GCP service account used by Dataplex has read access to the bucket.

  5. Data freshness: If the dataset has been recently updated or its schema has changed, there might be a delay in reflecting these changes in Dataplex. Try refreshing the metadata in Dataplex to ensure that the most recent version of the dataset is being used.

  6. Configuration issues: Double-check the configurations in Dataplex, such as the connection settings and zone configurations, to ensure that they are set up correctly. Make sure that the dataset assets are correctly linked to the BigQuery dataset.
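
For checks 1 and 2 above, here is a minimal programmatic sketch using the google-cloud-bigquery Python client; the dataset ID is a hypothetical placeholder:

```python
# Minimal sketch for checks 1 and 2: inspect a dataset's location and its
# access entries. "my-project.my_dataset" is a hypothetical placeholder.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")

print("Location:", dataset.location)  # should be a region Dataplex supports

# Look for the Dataplex service account among the grantees
for entry in dataset.access_entries:
    print(entry.role, entry.entity_type, entry.entity_id)
```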

Greetings, thank you for the answer, but I am not sure it is addressing the heart of the question. I have a similar issue to nahuel0688's: the asset discovered into the Lake => Zone is a BigQuery dataset itself. In my case, the tables are generated in that dataset through CDC using Datastream. The question that needs to be answered is: how can a BigQuery dataset not be compatible with BigQuery? On the surface, it does not seem to make sense.

I look forward to hearing more about this, as it seems to be a commonly encountered problem.

Kind Regards,

edeneault

It's certainly unusual for a BigQuery dataset to display as incompatible with BigQuery itself. Here are a few potential reasons for such a scenario:

  • The dataset might be in a file format that BigQuery does not support, such as the .zip format.
  • The schema of the dataset could be incompatible with BigQuery. This may occur if the dataset includes columns with data types not recognized by BigQuery.
  • The dataset might be encrypted with a customer-managed encryption key (CMEK) that the Dataplex service account cannot access.
  • The dataset could sit behind a VPC Service Controls perimeter that blocks access from BigQuery or Dataplex.

Furthermore, here are a few more checks you can carry out:

  • Ensure that the dataset's region aligns with the Dataplex lake's region.
  • Confirm that any customer-managed encryption key (CMEK) on the dataset is accessible to the Dataplex service account (see the sketch after this list).
  • Double-check that the dataset isn't behind a VPC Service Controls perimeter that blocks access from BigQuery or Dataplex.
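
Here is a minimal sketch of the region and encryption checks above, again using the google-cloud-bigquery client with a hypothetical dataset ID:

```python
# Minimal sketch: confirm region alignment and inspect any customer-managed
# encryption key (CMEK) on the dataset. Names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")

print("Dataset location:", dataset.location)  # compare with the lake's region

enc = dataset.default_encryption_configuration
if enc is not None:
    # A CMEK is set; the Dataplex service account needs access to this key
    print("CMEK:", enc.kms_key_name)
else:
    print("Google-managed encryption (no CMEK)")
```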

Greetings,

Thank you for your reply.
- The dataset is not in .zip format; it is in the format that a GCP Datastream replication of Cloud SQL Postgres creates in BigQuery.
- All of the columns should be valid, as they are valid in the dataset's tables.
- The dataset region is 'us-central1' and the lake region is 'us-central1'.

Checking on the VPC possibility.

If I can't solve it, I will contact our support through CloudWerx.

Kind regards,
Etienne Deneault
Senior Software Engineer, PurePM

Here are a couple of additional troubleshooting steps you could take:

  1. Check IAM Permissions: Ensure that the IAM roles assigned to the service accounts for Dataplex and BigQuery have appropriate permissions. The service account associated with Dataplex should have necessary permissions to view and manage BigQuery resources.

  2. Inspect Schema Changes: As Datastream handles the CDC, any schema changes at the source (Cloud SQL Postgres) after the Datastream job was set up could result in compatibility issues. Review any recent schema changes and compare them with the schema in BigQuery (see the sketch after this list).

  3. Review Datastream Configuration: Review the Datastream configuration to ensure there's no issue during the data replication process. Check for any failures or error messages.

  4. Check BigQuery and Dataplex Configurations: Validate all the configurations again, including the zone setup in Dataplex and the BigQuery dataset settings, to make sure nothing is amiss.
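
For step 2, here is a minimal sketch of a schema-drift check: read the replicated table's current schema and diff it against a snapshot taken when the stream was known to be healthy. The table ID and the expected snapshot are hypothetical placeholders:

```python
# Minimal sketch for step 2: detect schema drift on a Datastream-replicated
# table. The table ID and "expected" snapshot are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.cdc_dataset.replicated_table")

current = {field.name: field.field_type for field in table.schema}

# Snapshot captured when the stream was known to be healthy (hypothetical)
expected = {"id": "INTEGER", "name": "STRING", "updated_at": "TIMESTAMP"}

for name, ftype in expected.items():
    if current.get(name) != ftype:
        print(f"Drift on column {name!r}: expected {ftype}, got {current.get(name)}")
```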

I'm experiencing the same issue. All scanned data assets based on Google Cloud Storage are visible in BigQuery and allow me to leverage Dataplex's governance capabilities. However, assets that are originally BigQuery tables are not reflected in BigQuery. I've validated permissions and ensured the service account has BigQuery Admin privileges, even on the data source. According to the documentation https://cloud.google.com/dataplex/docs/common-explore-errors, only tables residing in Cloud Storage and compatible with BigQuery will be published as external tables. Any ideas on how to resolve this issue?

While you've correctly identified the documentation stating only Cloud Storage tables compatible with BigQuery are published as external tables, the issue might lie beyond simple compatibility. Here are some additional things to check:

  1. Table Schema Compatibility:
    • Even though the table resides in BigQuery, ensure its schema is fully compatible with BigQuery data types.
    • Dataplex might not recognize or process tables with unsupported data types like complex nested structures or specific custom types.
    • Double-check the schema definition of the BigQuery table and compare it with the supported data types listed in the BigQuery documentation.
  2. Dataplex Lake and BigQuery Region Alignment:
    • Verify that the Dataplex lake and the BigQuery dataset containing the table reside in the same region.
    • Dataplex discovery and processing might be limited within the same region for optimal performance and data access.
  3. Table Encryption and VPC Network Access:
    • Confirm that the BigQuery table is not encrypted using methods unsupported by Dataplex.
    • Additionally, ensure the BigQuery table isn't inside a VPC Service Controls perimeter that restricts access from Dataplex services.
  4. Dataplex Service Account Permissions:
    • While you've mentioned BigQuery Admin privileges, double-check if the Dataplex service account also has specific permissions granted on the individual BigQuery dataset and table you're trying to access.
    • Granular dataset- and table-level permissions might be required for complete visibility and governance within Dataplex. Ensure roles like BigQuery Data Viewer, BigQuery Metadata Viewer, and any Dataplex-specific roles are assigned (see the sketch after this list).
  5. Dataplex Lake Configuration and Refresh:
    • Review the Dataplex lake configuration specifically related to BigQuery data discovery and ensure it's properly enabled and configured.
    • Sometimes, a manual refresh of the Dataplex lake metadata might be necessary to pick up any recent changes or updates to the BigQuery table.
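
For point 4, here is a minimal sketch that lists the IAM bindings on a specific table so you can confirm the Dataplex service account appears with the expected roles; the table ID is a hypothetical placeholder:

```python
# Minimal sketch for point 4: list table-level IAM bindings to verify the
# Dataplex service account's roles. The table ID is a hypothetical placeholder.
from google.cloud import bigquery

client = bigquery.Client()
table_ref = bigquery.TableReference.from_string("my-project.my_dataset.my_table")
policy = client.get_iam_policy(table_ref)

for binding in policy.bindings:
    print(binding["role"], sorted(binding["members"]))
```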

Additional Tips:

  • Utilize the Dataplex logs and monitoring tools to identify any specific error messages or warnings related to the BigQuery table discovery process (a logging sketch follows these tips).
  • Consider reaching out to Google Cloud support for further assistance if the issue persists after trying these troubleshooting steps. They can provide more in-depth analysis and potential solutions based on your specific environment and configuration.
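
For the first tip, here is a minimal sketch that pulls recent Dataplex warnings through the Cloud Logging client. The filter string, in particular the resource type, is an assumption; confirm the exact resource type and log names in the Logs Explorer for your project:

```python
# Minimal sketch: surface recent Dataplex warnings/errors via Cloud Logging.
# The resource type in the filter is an assumption -- verify it in the
# Logs Explorer for your project before relying on it.
from google.cloud import logging

client = logging.Client()
log_filter = (
    'resource.type="dataplex.googleapis.com/Zone"'  # assumed resource type
    ' AND severity>=WARNING'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.severity, entry.payload)
```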

Thanks for your help. I followed the instructions, but there were no changes. To rule out other problems, I created a new GCP environment with a new BigQuery dataset (simple structure), Dataplex, lake, zone, and scan for the BigQuery asset. I double-checked access and permissions throughout. Still, Dataplex reports all scanned BigQuery tables as incompatible.

Regards

John

Since the issue with Dataplex and BigQuery persists even in a new Google Cloud environment, it looks like there might be a configuration setting or limitation that needs to be addressed. Here is an approach you can take:

Potential Issues and Additional Checks

  1. Dataplex and BigQuery: How do they work together?

    • Data Asset Types: Double-check how Dataplex handles your data. Data in Google Cloud Storage is treated differently from tables within BigQuery itself. Make sure your setup is in sync with what you're trying to manage.
    • Integration: Is Dataplex directly integrated with the BigQuery datasets, rather than just scanning them like external sources?
  2. Scanning Capabilities of Dataplex:

    • Discovery Settings: Verify the settings for scanning BigQuery assets in Dataplex. Is it configured to include BigQuery tables, not just GCS data?
    • Asset Handling: When scanning, does Dataplex apply specific rules to BigQuery tables, and does it treat native tables differently from external tables?
  3. Compatibility Requirements:

    • Compatibility Matrix: Have you checked the Dataplex documentation for any special requirements for BigQuery tables?
    • Feature Availability: Are all the features you need for this integration enabled and up-to-date in Dataplex?

Next Steps to Troubleshoot

  • Verify Dataplex Setup:
    • Check your Dataplex configuration specifically for BigQuery datasets (not just GCS).
    • Ensure scanning includes BigQuery tables as native assets.
  • Validate Metadata:
    • Double-check metadata syncing in Dataplex.
    • Make sure BigQuery table metadata is up-to-date and configured properly.
  • Inspect Logs:
    • Look closely at Dataplex logs for any errors or warnings about BigQuery table integration.
  • Test a Minimal Setup:
    • Create a very basic BigQuery table and a simple Dataplex configuration to rule out complex issues.
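
For the minimal-setup test, here is a sketch that creates a bare-bones dataset and table for discovery to scan; all names are hypothetical placeholders:

```python
# Minimal sketch: create a bare-bones dataset and table so Dataplex discovery
# has an unambiguous asset to scan. All names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

dataset = bigquery.Dataset("my-project.dataplex_smoke_test")
dataset.location = "us-central1"  # keep aligned with the lake's region
client.create_dataset(dataset, exists_ok=True)

schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField("name", "STRING"),
]
table = bigquery.Table("my-project.dataplex_smoke_test.minimal_table", schema=schema)
client.create_table(table, exists_ok=True)
```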

Refreshing and Synchronizing Metadata:

  • Open the Dataplex console, go to your Lake and Zone, then find the option to refresh or resync metadata.
  • Keep an eye on logs during this process.