Cloud Bigtable: Is there a Garbage Collection Callback or Listener?

I was looking at Bigtable Garbage collection based on TTL.

Bigtable GC is configured at the column family level: https://cloud.google.com/bigtable/docs/garbage-collection#overview

A garbage collection policy is a set of rules you create that state when data in a specific column family is no longer needed.

Expiring values (age-based): https://cloud.google.com/bigtable/docs/garbage-collection#age-based

  • You can set a garbage collection rule based on the timestamp for each cell. For example, you might not want to keep any cells with timestamps more than 30 days before the current date and time. With this type of garbage collection rule, you set the time to live (TTL) for data. Bigtable looks at each column family during garbage collection and deletes any cells that have expired.
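
As an illustration, a minimal sketch of setting such an age-based rule with the Python client (all resource names here are hypothetical):

    from datetime import timedelta

    from google.cloud import bigtable
    from google.cloud.bigtable import column_family

    # Hypothetical resource names; GC rules are set via the admin client.
    client = bigtable.Client(project="my-project", admin=True)
    table = client.instance("my-instance").table("my-table")

    # Age-based rule: cells older than 30 days become eligible for GC.
    rule = column_family.MaxAgeGCRule(timedelta(days=30))
    cf = table.column_family("cf", rule)
    cf.create()  # use cf.update() if the family already exists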

Is a row automatically deleted if all the values are garbage-collected?

https://stackoverflow.com/questions/54616872/is-a-row-automatically-deleted-if-all-the-values-are-ga...

>> If all cells are garbage collected for a row key, then the row is indeed deleted. Note that garbage collection is asynchronous, and it can take up to a week for data to be totally removed.

When data is deleted:
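
To make the row-deletion behavior concrete, here is a minimal read-back sketch with the Python client (hypothetical project, instance, table, and row key):

    from google.cloud import bigtable

    # Hypothetical resource names.
    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    # Once garbage collection has removed every cell under this row key,
    # the row no longer exists: read_row returns None.
    row = table.read_row(b"user789")
    if row is None:
        print("row has been fully garbage-collected")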

Question:

I am trying to find out if there is a way for an application to listen for (or be called back with) the rowKey(s) being deleted by Bigtable's async GC process... Or is there a way to set up a Pub/Sub or Kafka topic where the deleted rowKeys will be posted?

I need this information to sync up some other application data that we keep in Elasticsearch, keyed by these rowKeys.

ACCEPTED SOLUTION

Hi @viveksharma0wmt,

Welcome to Google Cloud Community!

I would suggest using Google Cloud Pub/Sub for this, as you can create a schema and view the change logs.

Please check the documentation on Stream changes to Pub/Sub using optional Cloud Function trigger, as it contains the steps and a sample schema that could be useful for your setup.
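
For illustration, here is a minimal sketch of the optional Cloud Function part of that setup (Python, hypothetical names, and assuming the topic's schema is configured with JSON encoding rather than binary Avro) that just prints each change record:

    import base64
    import json

    import functions_framework

    # Minimal sketch: a Cloud Function triggered by the Pub/Sub topic
    # that receives the Bigtable change-stream records.
    @functions_framework.cloud_event
    def handle_change_record(cloud_event):
        # Pub/Sub delivers the message body base64-encoded inside the event.
        payload = base64.b64decode(cloud_event.data["message"]["data"])
        record = json.loads(payload)
        print(record["rowKey"], record["modType"], record.get("isGC"))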

Hope this helps.


Thanks @robertcarlos for your reply.

I looked at the document you mentioned; it looks good.

Looking at "Configure change stream" doc it says :

>> A change stream tracks data changes made by calls to the Bigtable Data API, during garbage collection, and when ranges of rows are dropped. Changes resulting from schema changes, like deletions from dropping a column family, are not captured in a change stream.

A few follow-up questions:

  1. I suppose change records will be sent for every type of change: inserts into the table, updates to cells, and cell deletions. I was wondering if there is an option to enable the change stream only for garbage collection, and specifically only when row deletes happen. This is mainly about reducing the cost of maintaining the Pub/Sub topic when all we are interested in is row-deletion events from GC; in usual day-to-day business there would be mostly row inserts and updates to cells. (For now I am considering filtering on the consumer side, sketched below.)
  2. What would a row-delete event look like? Do you have a sample handy? I see one example where the Pub/Sub topic has data like:
    Pub/Sub message: {"rowKey":"user789","modType":"SET_CELL","isGC":false,"tieBreaker":0,"columnFamily":"cf","commitTimestamp":1695653833064548,"sourceInstance":"YOUR-INSTANCE","sourceCluster":"YOUR-INSTANCE-c1","sourceTable":"change-streams-pubsub-tutorial","column":{"bytes":"col1"},"timestamp":{"long":1695653832278000},"timestampFrom":null,"timestampTo":null,"value":{"bytes":"ghi"}}

and I see in that document that the possible values for the ModType enum are: {"name": "ModType", "type": "enum", "symbols": ["SET_CELL", "DELETE_FAMILY", "DELETE_CELLS", "UNKNOWN"]}

It is not immediately clear what the mod type will be when GC asynchronously deletes the entire rowKey after all of its cells have been deleted.
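
In the meantime, the consumer-side filter I have in mind looks roughly like this (Python, hypothetical project and subscription names, and assuming the topic's schema uses JSON encoding):

    import json

    from google.cloud import pubsub_v1

    # Hypothetical resource names.
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        "my-project", "bigtable-changes-sub"
    )

    def callback(message):
        record = json.loads(message.data)
        # Keep only garbage-collection deletions; ignore inserts/updates.
        if record.get("isGC") and record.get("modType") in (
            "DELETE_CELLS",
            "DELETE_FAMILY",
        ):
            row_key = record["rowKey"]
            # e.g. delete the matching Elasticsearch document here
            print("GC deleted data for row:", row_key)
        message.ack()

    streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
    streaming_pull.result()  # block and process messages as they arrive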

One other question: is there a way to use a Kafka topic instead of Pub/Sub for this change-event stream from Bigtable?