What is incurring Dataplex costs?

I have a small Hello World Dataplex project where I've created a lake, added 2 zones, and added a few assets to each zone in the form of Cloud Storage buckets. I've added a couple of very small JSON files (10 or so items per file) to the assets. I also intentionally added a malformed JSON file to one bucket to test what would happen. Overall, I have maybe a few megabytes of data stored.

I've also been playing with BigQuery. I've loaded a few tens of thousands of rows of data (fewer than 100,000 rows total) directly into BigQuery, either by adding JSON through the file storage API or by manually uploading JSON files. I've run a few queries and played around with a few Looker worksheets.

The last time I played with the system, I also turned on the beta features that were being advertised. I think they were Python notebooks and maybe something along the lines of versioning.

The last time I played with the system was on a Wednesday. On Friday I got an alert that I had incurred one dollar's worth of Dataplex charges. I thought maybe there was a delay before some action of mine reached the billing page. Over the next 24 hours, I incurred an additional $3 of Dataplex charges. I didn't even log into the system, so something is running automatically and incurring charges, but I cannot think of what it could be.

workmaster2n_0-1708203836225.png

This is the only Dataplex charge on my account.

Any ideas on how to track down where this processing is coming from?

I've disabled billing on the project so as to stop accruing charges.

Thanks!


We can look up the Dataplex pricing page, which lists the distinct charges that can be levied based on Dataplex usage. In there, we find a description of the SKU for Dataplex Processing. That points us to the Discover data page. What is likely happening is that when you configured your assets in zones and lakes, you switched on (or left at its default) the discovery feature. Discovery periodically and automatically scans the GCS buckets looking for new or changed objects. When they are detected, Dataplex indexes those objects into the Data Catalog and creates external BigQuery tables referencing them. Since Google has to spend resources to perform this work, these costs are passed on to you. It looks like there is an option to either switch off discovery or change its schedule (e.g., to once a day or once a week).

Hi Kolban - Thank you for that reply! I had suspected it might be related to auto discovery, but that didn't fit with the amount I was being charged. If I read the pricing correctly, the charge is $0.06 per hour of compute time, and we get 100 free hours.

workmaster2n_0-1708396271483.png

I checked my data lake, and I have 1 zone with 3 assets in that zone. Each asset was configured for hourly data discovery. Each discovery run took less than 10 minutes.

workmaster2n_1-1708396338968.png

workmaster2n_2-1708396409516.png

If we multiply that out, we have 3 assets, 10 minutes per hour, 24 hours a day, at 6 cents per hour. That is 720 minutes, or 12 hours of processing per day, which at 6 cents per hour should be 72 cents per day. Instead, I was charged $3.95 in one day.
Am I misreading the logs about Discovery? Or are DCUs calculated differently than I am thinking?
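For anyone who wants to redo this estimate with their own numbers, here is a minimal Python sketch of the same arithmetic. It assumes each discovery run consumes one DCU for its whole duration, which is my reading of the pricing rather than anything confirmed. The function name and the `dcus_per_run` parameter are hypothetical, added only so that assumption can be varied.

```python
def daily_discovery_cost(assets, minutes_per_run, runs_per_day,
                         rate_per_dcu_hour, dcus_per_run=1):
    """Estimate one day's discovery usage and cost.

    dcus_per_run is a hypothetical knob: 1 means a run is billed as a
    single DCU for its duration (an assumption, not a documented fact).
    """
    dcu_hours = assets * minutes_per_run * runs_per_day * dcus_per_run / 60
    return dcu_hours, dcu_hours * rate_per_dcu_hour


# Figures from this thread: 3 assets, under 10 minutes per run, hourly runs.
hours, cost = daily_discovery_cost(
    assets=3, minutes_per_run=10, runs_per_day=24, rate_per_dcu_hour=0.06
)
print(f"{hours:.1f} DCU-hours/day, ${cost:.2f}/day")
# prints "12.0 DCU-hours/day, $0.72/day"
```

Since 10 minutes is an upper bound on each run, this is the most the estimate can be under that one-DCU assumption.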

*EDIT* I just verified that Data Lineage is not enabled, so I don't think I'd be using the Premium DCU.

Thanks so much!

I've noticed that this is converting to Australian Dollars (I thought I had my account billed in USD). USD->AUD is about 1.5, so instead of 6 cents per hour, it should be 9 cents per hour.  That would be just over $1 per day. I'm still off by a factor of 4 somehow.

workmaster2n_3-1708400506371.png
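The currency adjustment can be sketched the same way in Python. This assumes the $3.95 on the bill is already in AUD, that the exchange rate is roughly 1.5, and that the 12 processing hours per day estimated earlier in the thread holds. All variable names are illustrative.

```python
USD_RATE_PER_HOUR = 0.06      # standard processing rate as read from the pricing page
USD_TO_AUD = 1.5              # approximate exchange rate (assumption)
ESTIMATED_HOURS_PER_DAY = 12  # 3 assets x 10 min x 24 runs, from the earlier estimate
OBSERVED_AUD_PER_DAY = 3.95   # daily charge on the bill, assumed to be in AUD

aud_rate_per_hour = USD_RATE_PER_HOUR * USD_TO_AUD
estimated_aud_per_day = ESTIMATED_HOURS_PER_DAY * aud_rate_per_hour
gap_factor = OBSERVED_AUD_PER_DAY / estimated_aud_per_day
implied_hours_per_day = OBSERVED_AUD_PER_DAY / aud_rate_per_hour

print(f"Estimated: A${estimated_aud_per_day:.2f}/day")        # Estimated: A$1.08/day
print(f"Observed is {gap_factor:.1f}x the estimate")          # Observed is 3.7x the estimate
print(f"Implied usage: {implied_hours_per_day:.1f} hours/day")  # Implied usage: 43.9 hours/day
```

Under these assumptions, the bill corresponds to roughly 44 billed hours per day at the standard rate, which is the gap still to be explained.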

Well ... I have many projects that I use for sandbox testing and I seemed to remember I created one for playing with Dataplex.  Like you, I did NOT load in huge amounts of data, merely samples.  I looked back through my project history of spend and, just like you, I found charges of about $3-$4 a day.  Apparently, I switched something off (likely I deleted the project) ... but I can't remember for sure what I did.  The point of this post is to also share that you are not alone in having a consumption cost that neither of us can fully explain.  The SKU that I am seeing being charged for is: "Dataplex Processing (milli DCU-hr) Iowa".

kolban_0-1708402021009.png

I have the same issue. I have no idea why I'm using premium processing instead of standard, and I couldn't find any place to switch. I also tried setting up a new project, and I'm not aware of anywhere to choose between premium and standard. It's terrible.

I have the same issue. I already disabled the Dataplex API and deleted the data lake as well, but I still see these $3-$4 charges and can't find out exactly why we're receiving them. Any update here?