Dataplex DataScan Pricing and Publishing

Hello, I'm new to GCP and I'm trying to use Data Profiling from Dataplex. I have a couple of questions:

  1. Is there a way to know how much a specific DataScan costs? For example, I created and ran a DataScan to profile a table. I would like to know how much it costs.
  2. I created a Python script that creates and runs a data profiling DataScan, but I haven't found a way to publish the profiling results into the BigQuery UI through code.
Solved
1 ACCEPTED SOLUTION

The cost of running a DataScan in Google Cloud's Dataplex is influenced by a variety of factors, including:

  • Dataplex Compute Unit (DCU) usage: DCUs are a measure of the compute resources used by Dataplex. The cost of DCU usage is based on the type of usage (standard, premium, or shuffle storage) and the duration of the usage.
  • Usage of underlying services: When you run a DataScan, Dataplex may use other Google Cloud services, such as BigQuery and Dataproc Serverless. The cost of using these services is in addition to the cost of DCU usage.
  • Data size and complexity: The size and complexity of the data being scanned can also affect the cost of a DataScan. For example, scanning a large dataset with a complex schema will typically be more expensive than scanning a small dataset with a simple schema.

Unfortunately, there is no single formula that can be used to calculate the exact cost of a DataScan. However, you can estimate the cost by considering the factors listed above.
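
As a rough illustration of that estimate, here is a minimal sketch in Python. The arithmetic is simply DCU-hours multiplied by a per-DCU-hour rate; the rate below is a placeholder, not a current list price, so substitute the actual rates from the Dataplex pricing page:

```python
# Back-of-the-envelope DataScan cost estimate.
# ASSUMPTION: the rate below is a placeholder, not a current list price.
# Look up the real per-DCU-hour rates on the Dataplex pricing page.
PREMIUM_DCU_HOUR_RATE_USD = 0.089  # hypothetical premium processing rate

def estimate_datascan_cost(dcu_hours: float,
                           rate: float = PREMIUM_DCU_HOUR_RATE_USD) -> float:
    """Estimated cost = DCU-hours consumed by the scan * per-DCU-hour rate."""
    return dcu_hours * rate

# Example: a profile scan that consumed 0.5 DCU-hours.
print(f"Estimated cost: ~${estimate_datascan_cost(0.5):.4f}")
```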

In addition to the above, the following factors may also affect the cost of a DataScan:

  • Frequency of scans: Running DataScans on a regular schedule multiplies the DCU-hours you consume, so the scan cadence directly affects your total cost.
  • Commitment level: If you make a commitment to use a certain amount of Dataplex resources, you may be able to get a discounted rate.

To get a more accurate estimate of the cost of running DataScans, you can use the Google Cloud Pricing Calculator.
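
If you have also enabled Cloud Billing export to BigQuery, you can query the actual charges after the fact instead of estimating. A hedged sketch, assuming a standard billing export table (the table name is a placeholder, and the `service.description = 'Dataplex'` filter is an assumption you should verify against your own export):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder -- replace with your own billing export table.
BILLING_TABLE = "my-project.billing_dataset.gcp_billing_export_v1_XXXXXX"

# Sum Dataplex charges by SKU over a time window; narrow the window
# to bracket a specific DataScan run.
query = f"""
SELECT
  sku.description AS sku,
  SUM(cost) AS total_cost
FROM `{BILLING_TABLE}`
WHERE service.description = 'Dataplex'
  AND usage_start_time >= TIMESTAMP('2023-10-01')
GROUP BY sku
ORDER BY total_cost DESC
"""

for row in client.query(query).result():
    print(row.sku, row.total_cost)
```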

Regarding your second question:

There is currently no documented way to publish DataScan profiling results into the BigQuery UI through code. However, you may be able to automate parts of the workflow using Apache Airflow or the Dataplex APIs and client libraries; a sketch follows below.
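
As a starting point, here is a minimal sketch of creating and running a profile scan with the google-cloud-dataplex Python client. The project, location, table, and scan ID are placeholders, and note that this only creates and runs the scan; it does not, by itself, surface the results in the BigQuery UI:

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataScanServiceClient()

# Placeholders -- substitute your own project, location, and table.
parent = "projects/my-project/locations/us-central1"
table = ("//bigquery.googleapis.com/projects/my-project"
         "/datasets/my_dataset/tables/my_table")

scan = dataplex_v1.DataScan(
    data=dataplex_v1.DataSource(resource=table),
    data_profile_spec=dataplex_v1.DataProfileSpec(),
)

# Create the scan (a long-running operation), then trigger an on-demand run.
operation = client.create_data_scan(
    request=dataplex_v1.CreateDataScanRequest(
        parent=parent,
        data_scan=scan,
        data_scan_id="my-profile-scan",
    )
)
operation.result()  # wait for creation to finish

run = client.run_data_scan(
    request=dataplex_v1.RunDataScanRequest(
        name=f"{parent}/dataScans/my-profile-scan"
    )
)
print(run.job.name)
```

Newer versions of DataProfileSpec also expose post-scan actions (for example, exporting results to a BigQuery table you own); it is worth checking the current API reference to see whether those cover your publishing use case.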
