Does Datastream increase physical storage?

I have an original table with a physical storage size of around 26.85 MB, which we sync by simply running a create or replace table statement from Cloud SQL. I have synced the same table to BigQuery through Datastream, but there it has a physical storage size of around 17.47 GB. What is causing this huge difference in physical storage? Can we prevent it, or do we have to include the increased storage cost when calculating Datastream pricing?


Hey Pranavi,

You are observing higher physical storage costs because of your "create or replace table from Cloud SQL" approach. As called out in our documentation [ref], "When you set your storage billing model to use physical bytes, the total active storage costs you are billed for include the bytes used for time travel and fail-safe storage." This means that by overwriting your destination table every day, you are paying for the historical data retained by both time travel and fail-safe storage for a combined 14 days (7 days of time travel followed by 7 days of fail-safe). In effect, you are paying for roughly 14 additional historical copies of your table.
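If you want to stay on the physical billing model but reduce that overhead, you can also shorten the dataset's time travel window (the fail-safe period itself is fixed at 7 days and cannot be changed). A minimal sketch, assuming a dataset named my_dataset; 48 hours is the minimum allowed window:

  -- Shorten time travel from the default 168 hours (7 days) to 48 hours.
  -- Fail-safe storage (7 days) is not configurable and still applies.
  ALTER SCHEMA my_dataset
  SET OPTIONS (max_time_travel_hours = 48);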

For this use case, the logical bytes storage billing model may be more beneficial, as there is no additional cost for time travel or fail-safe storage (both features remain available under the logical bytes billing model). Alternatively, you could update your data pipeline to append new data instead of overwriting the whole table.
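If you decide to switch the billing model, something like the following should work. This is just a sketch assuming a dataset named my_dataset; also note that after changing a dataset's billing model you have to wait 14 days before you can change it again:

  -- Switch the dataset from physical to logical storage billing.
  -- Time travel and fail-safe remain available but are no longer billed separately.
  ALTER SCHEMA my_dataset
  SET OPTIONS (storage_billing_model = 'LOGICAL');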

Hi @nickorlove,

I think the question has been misunderstood here. We are syncing the same table to BigQuery in two ways:
1. Aurora SQL to Cloud SQL through DMS, then Cloud SQL to BigQuery using a Python script that runs "create or replace from external_query". This script syncs the data to BigQuery once a day. This table has 26.85 MB of active physical bytes.
2. Aurora SQL to BigQuery using Datastream. This table is synced continuously, and it currently has 17.47 GB of active physical bytes.
So I do not understand why there is such a huge difference in active physical bytes when both tables hold the same data.
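For reference, this is roughly how I am reading those numbers, using the TABLE_STORAGE view (a sketch; the region and table names below are placeholders for our actual ones):

  -- Break physical storage down into active, time travel and fail-safe bytes
  -- for both copies of the table.
  SELECT
    table_name,
    ROUND(active_logical_bytes / POW(1024, 2), 2)       AS active_logical_mb,
    ROUND(active_physical_bytes / POW(1024, 2), 2)      AS active_physical_mb,
    ROUND(time_travel_physical_bytes / POW(1024, 2), 2) AS time_travel_physical_mb,
    ROUND(fail_safe_physical_bytes / POW(1024, 2), 2)   AS fail_safe_physical_mb
  FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
  WHERE table_name IN ('table_from_script', 'table_from_datastream');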

Hey Pranavi,

What is your configured max_staleness [ref] for your Datastream setup in #2? Is it very low, like 15 minutes or less? Also, how much of your table is being updated or overwritten? Is it a large ratio?
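If you are not sure of the current value, you should be able to read it from the table options. A sketch, with placeholder dataset and table names:

  -- Returns the max_staleness interval if one is set on the Datastream target table.
  SELECT option_name, option_value
  FROM my_dataset.INFORMATION_SCHEMA.TABLE_OPTIONS
  WHERE table_name = 'table_from_datastream'
    AND option_name = 'max_staleness';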

If a large ratio of your table is being overwritten and your staleness is configured with a low value (let's assume 5 minutes), then Datastream is updating large portions of your table every 5 minutes, and each update is retained for the duration of time travel and fail-safe. Your physical storage in this environment would therefore be much larger than in your manually replicated environment, which only replicates the data once per day.

That would explain the delta.
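If that is what is happening, the usual lever is to relax the data freshness / max_staleness setting so that BigQuery merges the pending Datastream changes (and rewrites storage) less often. Adjusting the stream's data freshness in Datastream is probably the cleaner route; purely as an illustration of the underlying table option, here is a sketch with placeholder names (I have not verified whether Datastream would override a manual change):

  -- Raise max_staleness so CDC changes are merged into the table less frequently,
  -- which reduces how much rewritten data accumulates in time travel and fail-safe.
  ALTER TABLE my_dataset.table_from_datastream
  SET OPTIONS (max_staleness = INTERVAL 1 HOUR);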