
BigQuery storage conversion from logical to compressed

Hi,

We have schemas in BigQuery where we do daily ingestion into a raw layer and then a refined layer.

We are currently planning to move from logical storage to compressed storage in order to save costs, but I wanted to check:

1. Would it impact our daily ingestion process in the same BQ schemas?

2. Will the newly ingested data belong to logical storage or compressed storage?

Thanks.

1 ACCEPTED SOLUTION

  1. Moving from logical to compressed storage will not impact your daily ingestion process. You can continue to ingest data into the same schemas as before.
  2. The newly ingested data will belong to compressed storage. This is because compressed storage is the default storage type for new tables.

Here are some things to keep in mind when moving to compressed storage:

  • Compressed storage can save you up to 50% on storage costs.
  • Compressed storage can slightly increase query latency.
  • You cannot downgrade a table from compressed storage to logical storage.


Thanks so much for the help. Just one last question: can we also perform other DML operations like DELETE or UPDATE directly on compressed data?

Yes, you can perform other DML operations such as UPDATE and DELETE directly on compressed data.

There should be no latency changes when moving between managed logical and managed physical storage. Only a billing flag is being changed; the underlying data is not changed.

Also, you can change back and forth between logical and physical, although there is a 14-day waiting period between each change. This was a new change as of GA on July 5.
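
For reference, the billing model is a dataset-level option that can be changed with a DDL statement. Below is a minimal Python sketch that only builds the statement text. The project and dataset names are placeholders, and the helper name `billing_model_ddl` is made up for illustration; actually running the statement (in the console or through a client library) is left out.

```python
VALID_MODELS = {"LOGICAL", "PHYSICAL"}


def billing_model_ddl(project: str, dataset: str, model: str) -> str:
    """Build an ALTER SCHEMA statement that sets a dataset's storage billing model.

    Hypothetical helper: it only formats the SQL text and does not talk to BigQuery.
    """
    model = model.upper()
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {sorted(VALID_MODELS)}, got {model!r}")
    # Backticks quote the fully qualified dataset name (needed for hyphenated project IDs).
    return (
        f"ALTER SCHEMA `{project}.{dataset}` "
        f"SET OPTIONS (storage_billing_model = '{model}');"
    )


# Placeholder names, not taken from this thread.
print(billing_model_ddl("my-project", "raw_layer", "physical"))
```

Since the option is set per dataset, you would repeat this for each dataset (for example the raw and refined ones) that should be billed on physical bytes.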

 

Moving from logical storage to compressed storage in BigQuery should not directly impact your daily ingestion process. The change in storage type mainly affects how the data is stored and billed, rather than the ingestion process itself.

When you ingest new data into BigQuery, it will still be stored in the same tables within your schemas. However, the storage type of the new data will depend on the table configuration.

If you configure your tables to use compressed storage, any new data ingested will be stored in compressed format. The compressed storage type uses columnar storage and advanced compression techniques to reduce the amount of storage required. This can lead to cost savings, especially for data that has high redundancy or compressibility.
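
Whether the switch actually saves money depends on how well your data compresses, because physical bytes are billed at a higher per-GiB rate than logical bytes. Here is a rough Python sketch of that comparison. The default prices are illustrative placeholders (please check the current pricing page for your region), and the function names are hypothetical.

```python
GIB = 1024 ** 3


def monthly_cost(num_bytes: int, price_per_gib: float) -> float:
    """Monthly storage cost for a byte count at a given per-GiB price."""
    return num_bytes / GIB * price_per_gib


def physical_is_cheaper(
    logical_bytes: int,
    physical_bytes: int,
    logical_price: float = 0.02,   # assumed placeholder price per GiB per month
    physical_price: float = 0.04,  # assumed placeholder price per GiB per month
) -> bool:
    """Return True if billing on physical bytes would cost less than logical.

    physical_bytes should include everything billed under the physical model,
    which as far as I know also covers time travel and fail-safe storage.
    """
    return monthly_cost(physical_bytes, physical_price) < monthly_cost(
        logical_bytes, logical_price
    )


# Made-up example: 100 GiB logical that compresses to 20 GiB physical.
print(physical_is_cheaper(100 * GIB, 20 * GIB))
```

With these placeholder prices, physical billing only wins when the data compresses to less than half of its logical size, so it is worth checking the real byte counts per dataset before switching.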

It's worth noting that existing data in your tables will remain in logical storage unless you explicitly convert it to compressed storage. The storage type is determined at the table level, so you can have a mix of logical and compressed storage within the same dataset and schema.

To convert existing data from logical to compressed storage, you would need to create a new table with the desired compressed storage configuration and then copy the data from the logical storage table to the new compressed storage table. Once the data is copied, you can drop the original table if it's no longer needed.

It's recommended to thoroughly test the impact of the storage conversion on your specific workload and query patterns to ensure it meets your performance and cost requirements. If my advice did not help you, you can seek help from experts in Data Analytics: https://tech-stack.com/services/big-data-and-analytics

Hi there,

Just to clarify and make a few corrections for BigQuery:

Is it common to see a delay in the billing information shown in the billing console reports? I enabled it across all datasets in a project 3 days ago, but I had to wait 2 days before some (not all) bytes were reported as physical.