Solved: Re: Google Cloud SQL Monitoring Affecting PostgreS...

DimitarTachev · 09-26-2024 01:57 AM

Hi,

While reviewing sporadically slow queries, we found that the cloudsqladmin user is constantly executing several monitoring queries over our database. There is a perfect correlation between these monitoring queries and the sporadic performance issues we are investigating.

We have approximately 10,000 schemas and 1,300,000 tables, and the monitoring queries are constantly selecting data from system tables with millions of rows.

For example, the following Cloud SQL monitoring query was slowing down even the simplest selects by approximately 2000ms when the pg_stat_statements the extension was enabled, most likely causing lock contention:

SELECT COALESCE(SUM(pg_stat_get_live_tuples(c.oid)),0) AS n_live_tup, COALESCE(SUM(pg_stat_get_dead_tuples(c.oid)),0) AS n_dead_tup, current_timestamp FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char", 'p'::"char"]) AND (n.nspname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND n.nspname !~ '^pg_toast'::text;

Additionally, the following query is executed every 30 seconds but is always aborted (canceled due to user request) as its execution takes more than 30 seconds, leading to a 24/7 running query that never returns anything to the Cloud SQL monitoring:

SELECT d.datname, pg_catalog.pg_database_size(d.datname), current_timestamp FROM pg_catalog.pg_database d ORDER BY d.datname LIMIT $1

The two queries above are the most problematic, but we are detecting at least 25 queries constantly executed by cloudsqladmin and the application cloudsqlagent.

Is there a way to stop these queries or at least modify their frequency and timeout settings?

ms4446

In Cloud SQL, the cloudsqladmin user runs built-in monitoring queries to gather metrics for database performance and health monitoring. Unfortunately, these queries are essential for Cloud SQL's functionality, and their frequency, execution, or timeout settings are not configurable directly by the user.

Given your database's size—with approximately 10,000 schemas and 1,300,000 tables—these monitoring queries can become resource-intensive, as they frequently query system tables with millions of rows, leading to performance degradation. Here are some steps you can take to mitigate their impact:

Contact Google Cloud Support.
Optimize Database Structure:
- Consolidate Schemas and Tables: Consider reducing the number of schemas and tables if possible. This high count significantly impacts performance due to system catalog bloat.
- Partitioning: Use partitioned tables instead of creating numerous separate tables to manage data more efficiently.
Disable Resource-Intensive Extensions:
- Disable pg_stat_statements: This extension, while useful for performance monitoring, adds overhead and can cause lock contention. Disabling it can reduce strain on system resources.
Improve Autovacuum Settings:
- Ensure autovacuum is effectively cleaning up dead tuples in system catalogs. Adjust autovacuum settings specifically for system tables to reduce query overhead.
Scale Resources:
- Consider scaling up your Cloud SQL instance by increasing CPU and memory resources to handle the additional load from monitoring queries.
Optimize Logging and Flags:
- Review and adjust logging settings conservatively to minimize performance impacts while still capturing necessary metrics.

View solution in original post

ms4446

In Cloud SQL, the cloudsqladmin user runs built-in monitoring queries to gather metrics for database performance and health monitoring. Unfortunately, these queries are essential for Cloud SQL's functionality, and their frequency, execution, or timeout settings are not configurable directly by the user.

Given your database's size—with approximately 10,000 schemas and 1,300,000 tables—these monitoring queries can become resource-intensive, as they frequently query system tables with millions of rows, leading to performance degradation. Here are some steps you can take to mitigate their impact:

Contact Google Cloud Support.
Optimize Database Structure:
- Consolidate Schemas and Tables: Consider reducing the number of schemas and tables if possible. This high count significantly impacts performance due to system catalog bloat.
- Partitioning: Use partitioned tables instead of creating numerous separate tables to manage data more efficiently.
Disable Resource-Intensive Extensions:
- Disable pg_stat_statements: This extension, while useful for performance monitoring, adds overhead and can cause lock contention. Disabling it can reduce strain on system resources.
Improve Autovacuum Settings:
- Ensure autovacuum is effectively cleaning up dead tuples in system catalogs. Adjust autovacuum settings specifically for system tables to reduce query overhead.
Scale Resources:
- Consider scaling up your Cloud SQL instance by increasing CPU and memory resources to handle the additional load from monitoring queries.
Optimize Logging and Flags:
- Review and adjust logging settings conservatively to minimize performance impacts while still capturing necessary metrics.

Google Cloud SQL Monitoring Affecting PostgreSQL Performance