Whats the best way to read rows with multiple filters (more than two) in JAVA?

We have a scenario where from UI, the user can send multiple filters in request payload & on the backend data should be read from bigtable adhering to passed filters.
Considering the same example from documentation page :

ashwanikumar415_0-1699267680212.png

But how can I get all rows with (with all columns or at least some of them mentioned as part of the query object) where "data_plan_05gb = true AND connected_wifi = 1 AND connected_cell = 0 " ?
So far, I have not found even a single example anywhere with multiple filters 
Please help me with the above code our company is considering big table
Thank you in advance

1 8 1,462
8 REPLIES 8

Here's how you can read rows with multiple filters in Google Cloud Bigtable using Java:

 

import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Result;

import java.io.IOException;

public class BigtableReadWithFilters {
    public static void main(String[] args) {
        // Configure Bigtable connection
        String projectId = "your-project-id";
        String instanceId = "your-instance-id";
        Connection connection = BigtableConfiguration.connect(projectId, instanceId);

        try {
            TableName tableName = TableName.valueOf("mobile-time-series");
            Table table = connection.getTable(tableName);

            // Create a new Scan object
            Scan scan = new Scan();

            // Create a filter list with MUST_PASS_ALL (AND) operator
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);

            // Filter for data_plan_05gb = true
            SingleColumnValueFilter dataPlanFilter = new SingleColumnValueFilter(
                Bytes.toBytes("stats_summary"),
                Bytes.toBytes("data_plan_05gb"),
                CompareFilter.CompareOp.EQUAL,
                Bytes.toBytes("true"));
            filters.addFilter(dataPlanFilter);

            // Filter for connected_wifi = 1
            SingleColumnValueFilter wifiFilter = new SingleColumnValueFilter(
                Bytes.toBytes("stats_summary"),
                Bytes.toBytes("connected_wifi"),
                CompareFilter.CompareOp.EQUAL,
                Bytes.toBytes("1"));
            filters.addFilter(wifiFilter);

            // Filter for connected_cell = 0
            SingleColumnValueFilter cellFilter = new SingleColumnValueFilter(
                Bytes.toBytes("stats_summary"),
                Bytes.toBytes("connected_cell"),
                CompareFilter.CompareOp.EQUAL,
                Bytes.toBytes("0"));
            filters.addFilter(cellFilter);

            // Add the filter list to the scan
            scan.setFilter(filters);

            // Execute the scan
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    // Process each row
                    // ...
                }
            }
        } catch (IOException e) {
            // Handle exception
        } finally {
            // Close the connection
            try {
                if (connection != null && !connection.isClosed()) {
                    connection.close();
                }
            } catch (IOException e) {
                // Handle exception
            }
        }
    }
}

thanks for your reply. Is it not possible without hadoop.hBase ?
Is it possible to use condition like below : 

Filter filter =
        FILTERS
            .condition(
                FILTERS
                    .chain()
                    .filter(FILTERS.qualifier().exactMatch("data_plan_05gb"))
                    .filter(FILTERS.value().exactMatch("true")))
            .then(FILTERS.label("passed-filter"))
            .otherwise(FILTERS.label("filtered-out"));
// i dont know how further i can add more filters like : "connected_wifi = 1 AND connected_cell = 0"

Yes, however the Java client for Bigtable provides a different way to interact with Bigtable. The Java client for Bigtable provides the Filters class for building complex filter conditions. While Filters.condition() serves specific needs like applying filters based on matched conditions, it's not suitable for a straightforward AND operation.

Implementing AND Logic with Filters:

For achieving an AND operation, you can leverage the following methods:

  • Filters.interleave(): Combines multiple filters, returning rows that match any of the provided filters.
  • Filters.chain(): Creates a sequence of filters where a row must match all filters to be included. This effectively performs an AND operation.

Here's an example demonstrating how to chain multiple filters using Filters.chain() in the Java client:

 
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Filters;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.api.gax.rpc.ServerStream;

public class BigtableReadWithFilters {
    public static void main(String[] args) {
        try (BigtableDataClient dataClient = BigtableDataClient.create("your-project-id", "your-instance-id")) {
            // Create a query with chained filters
            Query query = Query.create("mobile-time-series")
                    .filter(Filters.FILTERS.chain()
                            .filter(Filters.FILTERS.qualifier().exactMatch("data_plan_05gb"))
                            .filter(Filters.FILTERS.value().exactMatch("true"))
                            .filter(Filters.FILTERS.qualifier().exactMatch("connected_wifi"))
                            .filter(Filters.FILTERS.value().exactMatch("1"))
                            .filter(Filters.FILTERS.qualifier().exactMatch("connected_cell"))
                            .filter(Filters.FILTERS.value().exactMatch("0"))
                    );

            // Read rows based on the query
            ServerStream<Row> rows = dataClient.readRows(query);
            for (Row row : rows) {
                // Process each row
                // ...
            }
        } catch (Exception e) {
            // Handle exception
        }
    }
}

hello @ms4446 
multiple chaining like this is not working and will not work.
because the first filter will only return columns adhering to the first condition. i.e 
with the below filter :

                            .filter(Filters.FILTERS.qualifier().exactMatch("data_plan_05gb"))
                            .filter(Filters.FILTERS.value().exactMatch("true"))

 Only data_plan_05gb column will be returned. And next filter chaining will look for the connected_wifi column which will not be present.
Have you tried running the above code?

Hi @ashwanikumar415 

Apologies, I see chaining filters as I previously suggested would not work because each filter would narrow down the data to the point where subsequent filters might not have any data to operate on.

To apply multiple filters in Bigtable, you should use a FilterList with a MUST_PASS_ALL operator (which acts as an AND operator between filters). However, this is specific to the HBase client for Bigtable.

For the native Google Cloud Bigtable client, you would use Interleave for OR conditions and Chain for AND conditions, but with the understanding that each filter in a Chain operates on the previously matched data. To match multiple conditions across different columns, you should use Condition filters within the Chain.

Here's an example of how you might apply multiple filters across different columns using the native Google Cloud Bigtable client:

 

import com.google.api.gax.rpc.NotFoundException;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Filters;
import com.google.cloud.bigtable.data.v2.models.Filters.Filter;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class BigtableReadWithFilters {
    public static void main(String[] args) {
        // Initialize the Bigtable data client
        try (BigtableDataClient dataClient = BigtableDataClient.create("your-project-id", "your-instance-id")) {
            // Create a filter for data_plan_05gb = true
            Filter dataPlanFilter = Filters.FILTERS
                    .qualifier().exactMatch("data_plan_05gb")
                    .then(Filters.FILTERS.value().exactMatch("true"));

            // Create a filter for connected_wifi = 1
            Filter wifiFilter = Filters.FILTERS
                    .qualifier().exactMatch("connected_wifi")
                    .then(Filters.FILTERS.value().exactMatch("1"));

            // Create a filter for connected_cell = 0
            Filter cellFilter = Filters.FILTERS
                    .qualifier().exactMatch("connected_cell")
                    .then(Filters.FILTERS.value().exactMatch("0"));

            // Combine filters with a condition that all must pass
            Filter allFilters = Filters.FILTERS.chain()
                    .filter(dataPlanFilter)
                    .filter(wifiFilter)
                    .filter(cellFilter);

            // Create a query with the combined filter
            Query query = Query.create("mobile-time-series").filter(allFilters);

            // Read rows using the query
            for (Row row : dataClient.readRows(query)) {
                // Process each row
                // ...
            }
        } catch (NotFoundException e) {
            System.err.println("Failed to read from Bigtable: " + e.getMessage());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The code you suggested above is exactly the same as the previous code (the only difference is different filters in different variables now).
The problem with chain() is it will exclude all other columns and only return the column from the first chain().
So is it really not possible to have a set of rows with the below condition?
"data_plan_05gb = true AND connected_wifi = 1 AND connected_cell = 0 ".
Could you please share a code snippet of how it's done? So far I have not seen anywhere code with multiple conditions, that was my original question also.
Please help me with if it possible with the code 

In practice, to achieve an AND-like behavior, you may need to perform additional client-side filtering after retrieving the rows from Bigtable. This is because Bigtable filters are designed to work on a single column at a time, and combining multiple filters to work across different columns is not straightforward.

If you need to perform complex multi-column filtering, you might need to reconsider your data model or use a different database that supports such queries natively. Alternatively, you could retrieve the rows that match any of the conditions and then perform additional filtering in your application code to find rows that match all conditions.

Filters.Filter filter = FILTERS.chain()
        .filter(FILTERS.condition(
                        FILTERS.chain().filter(FILTERS.qualifier().exactMatch(columnName))
                                .filter(FILTERS.value().exactMatch(value))
            ).then(FILTERS.pass()))
        .filter(FILTERS.condition(
                FILTERS.chain().filter(FILTERS.qualifier().exactMatch(columnName))
                                .filter(FILTERS.value().exactMatch(value))
            ).then(FILTERS.pass()))


Query query = Query.create(tableName) // use .prefix() or .rowKey() here to narrow search    
                .filter(filter)