Hi Google Cloud SQL Team & Community,
We are using Cloud SQL for PostgreSQL and rely heavily on its capabilities. We have a use case where we need to efficiently calculate approximate quantiles (like median, percentiles, and potentially averages based on unique underlying entities) directly within our database queries.
Our Scenario:
We aggregate user feedback data (e.g., ratings) which is linked to interaction spans and organized into hierarchical clusters. A key requirement is to aggregate metrics up the hierarchy, ensuring each unique interaction span contributes only once. While the supported postgresql-hll extension works perfectly for calculating unique counts (like the volume of unique spans) efficiently using HLL sketches, we lack a comparable solution for value-based aggregations like average ratings or percentile distributions that respect this uniqueness rule at scale.
The Need:
Performing exact calculations (percentile_cont, etc.) or joins that retain unique IDs across the hierarchy are often too slow or resource-intensive due to the scale of the data. We need an efficient, in-database method for approximate quantile/distribution analysis, similar to how HLL handles counts.
Feature Request:
We request that Google Cloud consider adding support for one or both of the following well-established PostgreSQL extensions to Cloud SQL for PostgreSQL:
Why in Cloud SQL?
Having these capabilities directly within Cloud SQL would allow us to perform these complex aggregations efficiently without resorting to:
Adding support for robust approximate quantile sketching extensions like DataSketches KLL or t-digest would significantly enhance Cloud SQL's capabilities for analytical workloads dealing with large datasets and complex aggregation needs.
Could the Cloud SQL team investigate the feasibility of supporting these extensions?
Thank you!
Solved! Go to Solution.
Hi @romainrey,
Welcome to Google Cloud Community!
Since you're relying on Cloud SQL and need approximate quantile capabilities that respect uniqueness (similar to how HLL handles counts), the best next step is to submit a feature request through the Google Cloud Issue Tracker. Google actively monitors this channel, and many existing extensions were added based on customer demand through that process.
Make sure to clearly describe your use case, the performance and scalability needs, and how the lack of KLL or t-digest limits your current workflows. Framing it around enabling efficient in-database analytics without moving to self-managed solutions will help your case.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi @romainrey,
Welcome to Google Cloud Community!
Since you're relying on Cloud SQL and need approximate quantile capabilities that respect uniqueness (similar to how HLL handles counts), the best next step is to submit a feature request through the Google Cloud Issue Tracker. Google actively monitors this channel, and many existing extensions were added based on customer demand through that process.
Make sure to clearly describe your use case, the performance and scalability needs, and how the lack of KLL or t-digest limits your current workflows. Framing it around enabling efficient in-database analytics without moving to self-managed solutions will help your case.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Thank you @mcbsalceda , just created an issue here (https://issuetracker.google.com/issues/413322052).
Thanks!