
How to filter Bigtable data based on the absence of a column qualifier?

Hi,

Is there a way to filter, on the server side, Bigtable data based on the absence of a column qualifier?

For example, scan the table and give me all row keys of rows without column "price".

Thanks

1 REPLY

Bigtable has no dedicated "column does not exist" filter. Most filters select cells that meet a condition; they do not exclude rows that lack something.

You can still do this on the server side with a condition filter (RowFilter.Condition). It takes a predicate filter, a true filter, and a false filter. For each row, Bigtable runs the predicate: if it outputs any cells, the true filter is applied to the row; otherwise the false filter is applied. With a predicate that matches the column "price", a true filter that blocks everything, and a false filter that lets cells through, only the rows without "price" are returned.

RowFilter.Interleave is not suitable here. It returns the union of what its branches output, so combining "rows with price" and "all rows" simply returns every row.

Keep in mind that condition filters are comparatively slow, especially when a false filter is set, and that the predicate is not evaluated atomically with the true and false filters, so concurrent writes can lead to inconsistent results.

Here's an example of how you can find all rows that do not have the column "price", using the Java client. The project, instance, and table IDs and the column family "product" are placeholders:

import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Filters.Filter;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import java.io.IOException;

public class MyClass {

  public static void main(String[] args) throws IOException {
    // Predicate: outputs cells only for rows that have the column product:price.
    Filter hasPrice = FILTERS.chain()
        .filter(FILTERS.family().exactMatch("product"))
        .filter(FILTERS.qualifier().exactMatch("price"));

    // For rows without "price", keep a single cell with its value stripped,
    // since only the row key is needed.
    Filter keyOnly = FILTERS.chain()
        .filter(FILTERS.limit().cellsPerRow(1))
        .filter(FILTERS.value().strip());

    // Rows with "price" are blocked; rows without it pass through keyOnly.
    Filter notPriceFilter = FILTERS.condition(hasPrice)
        .then(FILTERS.block())
        .otherwise(keyOnly);

    // Scan the table and print every row key that passes the filter.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      Query query = Query.create("my-table").filter(notPriceFilter);
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}

This code scans the table and prints all row keys of rows that do not have the column "price".

Please note that this filter still has to evaluate every row in the scanned range, so it is best suited to small tables or a restricted row range. If it proves too slow, the alternative is a two-step process:

  1. Use a filter to read the keys of all rows that have the column qualifier.
  2. Scan the entire table and take the difference between the full set of row keys and the keys from step 1.

This process can be resource-intensive for large tables, since it scans the entire table twice.
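
The difference in step 2 of the two-step process does not need both key sets in memory. Bigtable returns rows in ascending row-key order, so the two scans can be merged as sorted streams. Below is a minimal sketch of that merge in plain Java. The class and method names are hypothetical, and the keys are modeled as ASCII strings for brevity; real row keys are byte strings and would need an unsigned lexicographic comparison.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the Bigtable client library.
class RowKeyDifference {

  // Returns the keys that appear in allKeys but not in keysWithColumn.
  // Both iterators must yield keys in the same ascending order.
  static List<String> keysWithoutColumn(
      Iterator<String> allKeys, Iterator<String> keysWithColumn) {
    List<String> missing = new ArrayList<>();
    String next = keysWithColumn.hasNext() ? keysWithColumn.next() : null;
    while (allKeys.hasNext()) {
      String key = allKeys.next();
      // Skip keys from the filtered scan that sort before the current key,
      // e.g. rows deleted between the two scans.
      while (next != null && next.compareTo(key) < 0) {
        next = keysWithColumn.hasNext() ? keysWithColumn.next() : null;
      }
      if (next != null && next.equals(key)) {
        // The row has the column: consume the match and do not report it.
        next = keysWithColumn.hasNext() ? keysWithColumn.next() : null;
      } else {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Stand-ins for the two scans; in practice these would be the row keys
    // streamed from the full scan and from the filtered scan of step 1.
    List<String> all = List.of("sku#001", "sku#002", "sku#003", "sku#004");
    List<String> withPrice = List.of("sku#002", "sku#004");
    System.out.println(keysWithoutColumn(all.iterator(), withPrice.iterator()));
  }
}
```

Because the two scans run at different times, the result is only a snapshot; rows written or deleted between the scans may be misreported.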