Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Feature Request: Support for Approximate Quantile Extensions (DataSketches KLL, t-digest) in Cloud S

Hi Google Cloud SQL Team & Community,

We are using Cloud SQL for PostgreSQL and rely heavily on its capabilities. We have a use case where we need to efficiently calculate approximate quantiles (like median, percentiles, and potentially averages based on unique underlying entities) directly within our database queries.

Our Scenario:

We aggregate user feedback data (e.g., ratings) which is linked to interaction spans and organized into hierarchical clusters. A key requirement is to aggregate metrics up the hierarchy, ensuring each unique interaction span contributes only once. While the supported postgresql-hll extension works perfectly for calculating unique counts (like the volume of unique spans) efficiently using HLL sketches, we lack a comparable solution for value-based aggregations like average ratings or percentile distributions that respect this uniqueness rule at scale.

The Need:

Performing exact calculations (percentile_cont, etc.) or joins that retain unique IDs across the hierarchy are often too slow or resource-intensive due to the scale of the data. We need an efficient, in-database method for approximate quantile/distribution analysis, similar to how HLL handles counts.

Feature Request:

We request that Google Cloud consider adding support for one or both of the following well-established PostgreSQL extensions to Cloud SQL for PostgreSQL:

  1. Apache DataSketches (datasketches-postgresql): Specifically, the KLL sketch functionality within this library provides mergeable, approximate quantile sketches. This would align well with the existing HLL support.
  2. t-digest (tvondra/tdigest): Another popular extension offering mergeable sketches for approximate quantiles and rank-based statistics.

Why in Cloud SQL?

Having these capabilities directly within Cloud SQL would allow us to perform these complex aggregations efficiently without resorting to:

  • Moving large datasets out of Cloud SQL for external processing.
  • Implementing significantly slower, less accurate approximations within standard SQL.
  • Migrating to self-managed PostgreSQL on Compute Engine solely to install these extensions (which adds considerable operational overhead).

Adding support for robust approximate quantile sketching extensions like DataSketches KLL or t-digest would significantly enhance Cloud SQL's capabilities for analytical workloads dealing with large datasets and complex aggregation needs.

Could the Cloud SQL team investigate the feasibility of supporting these extensions?

Thank you!

Solved Solved
0 2 183
1 ACCEPTED SOLUTION

Hi @romainrey,

Welcome to Google Cloud Community!

Since you're relying on Cloud SQL and need approximate quantile capabilities that respect uniqueness (similar to how HLL handles counts), the best next step is to submit a feature request through the Google Cloud Issue Tracker. Google actively monitors this channel, and many existing extensions were added based on customer demand through that process.

Make sure to clearly describe your use case, the performance and scalability needs, and how the lack of KLL or t-digest limits your current workflows. Framing it around enabling efficient in-database analytics without moving to self-managed solutions will help your case.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

View solution in original post

2 REPLIES 2

Hi @romainrey,

Welcome to Google Cloud Community!

Since you're relying on Cloud SQL and need approximate quantile capabilities that respect uniqueness (similar to how HLL handles counts), the best next step is to submit a feature request through the Google Cloud Issue Tracker. Google actively monitors this channel, and many existing extensions were added based on customer demand through that process.

Make sure to clearly describe your use case, the performance and scalability needs, and how the lack of KLL or t-digest limits your current workflows. Framing it around enabling efficient in-database analytics without moving to self-managed solutions will help your case.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Thank you @mcbsalceda , just created an issue here (https://issuetracker.google.com/issues/413322052).

 

Thanks!