Cloud SQL instance failed to upgrade Postgres and is now stuck in maintenance mode.

We ran a routine database version upgrade from Postgres 15 to Postgres 17. 21 instances upgraded without issues, but one did not.

Instance information:

  • db-custom-1-3840
  • maintenanceVersion: POSTGRES_15_8.R20240910.01_02
  • 10 GB SSD
  • HA mode
  • database size approx. 4 GB
  • located in europe-west3-c

Symptoms:

About 15 minutes after the upgrade was initiated, the following error was displayed in Operations and in the logs:

[Screenshot of the error: akorolkovs_0-1731781507882.png]

The instance itself shows that it is under maintenance.

Troubleshooting:

  1. Actions such as restart, patch, and failover do not work from either the web UI or gcloud, which is expected for an instance in maintenance mode.
  2. The logs did not show any new information. They end at the instance shutdown.
  3. After waiting a few hours, we initiated a restore from backup into a new instance with the same specifications in the same region. The process did not finish in two hours. This time no errors are visible, but the new instance is now also stuck in maintenance mode.
  4. We then created a third instance, this time in europe-north1-b. Same result as with the previous attempt. It's stuck for 20 at this time.

Edit: The second restore attempt failed with the same error in Operations and logs after 2 hours.
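
For anyone who wants to pull the failed operations without clicking through the console, here is a minimal sketch. It assumes the gcloud CLI is installed and authenticated, and that the JSON returned by `gcloud sql operations list --format=json` follows the Cloud SQL Admin API Operation resource (`name`, `operationType`, `error.errors[].message`). The helper names `failed_operations` and `list_operations` are made up for illustration.

```python
import json
import subprocess


def failed_operations(operations):
    """Return (name, operationType, messages) for every operation that carries an error.

    `operations` is the parsed JSON list from `gcloud sql operations list --format=json`.
    The field names are assumed to match the Cloud SQL Admin API Operation resource.
    """
    failed = []
    for op in operations:
        errors = op.get("error", {}).get("errors", [])
        if errors:
            # Prefer the human-readable message, fall back to the error code.
            messages = [e.get("message") or e.get("code", "") for e in errors]
            failed.append((op.get("name"), op.get("operationType"), messages))
    return failed


def list_operations(instance):
    """Fetch all operations of one instance via the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "sql", "operations", "list",
         f"--instance={instance}", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `failed_operations(list_operations("my-instance"))`, where `my-instance` is a placeholder name, would list each operation that ended with an error together with its messages, which is handy to attach to a support case.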

Hi @a-korolkovs

It looks like your Cloud SQL instance got stuck due to a few potential issues, according to the troubleshooting documentation. Here are some common reasons this might happen:

  • Large Temporary Data Size: The instance can get stuck if there's a lot of temporary data being created during the upgrade or high query load, especially when it exceeds the available disk space. This can happen if many temporary tables are created at once.
  • Fatal Upgrade Error: Sometimes, a fatal error can occur during an upgrade, which leaves the instance stuck in maintenance mode, unable to complete the upgrade process.
  • Running Out of Disk Space: If the instance runs out of disk space, especially during an upgrade, it can get stuck on restart.

You may try the following options to resolve the issue:

  • Temporary Tables and Storage: If large temporary tables are causing the issue, one workaround is to create them with ROW_FORMAT=COMPRESSED. This stores the temporary tables in file-per-table tablespaces within the temporary file directory, which can help reduce the load on the instance. However, note that this might come with a performance tradeoff, as creating and removing these tablespaces can be slower. Also note that ROW_FORMAT=COMPRESSED is a MySQL (InnoDB) table option, so this workaround applies to Cloud SQL for MySQL rather than to a Postgres instance.
  • Restart the Instance: Unfortunately, the only way to shrink the ibtmp1 file (the InnoDB temporary tablespace, so again MySQL-specific) is by restarting the service, which can help clear up any excess temporary data that’s clogging up the system.
  • Automatic Storage Increase: If your instance runs out of storage, and automatic storage increase isn’t enabled, the instance will go offline. To prevent this, you can enable automatic storage increase for future instances. This ensures your instance can scale up automatically when needed, avoiding outages.
  • Logs Are Limited: If the logs aren’t providing much insight, it may be time to reach out to Google Cloud Support. They can help force the recreation of the instance if needed.
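
As a rough sketch of the storage option above, the snippet below builds two gcloud commands: one that reads the instance state and one that enables automatic storage increase. It assumes the gcloud CLI is installed and authenticated. The helper names are made up for illustration, and `my-instance` in the usage note is a placeholder. Keep in mind that a patch is rejected while the instance is still in maintenance mode, so the second command only helps on a running instance or on future ones.

```python
import subprocess


def describe_state_cmd(instance):
    """Command that prints only the instance state (for example RUNNABLE or MAINTENANCE)."""
    return ["gcloud", "sql", "instances", "describe", instance,
            "--format=value(state)"]


def enable_auto_increase_cmd(instance):
    """Command that turns on automatic storage increase.

    --quiet suppresses the interactive confirmation prompt, so the change is
    applied without asking; drop it if you want to confirm manually.
    """
    return ["gcloud", "sql", "instances", "patch", instance,
            "--storage-auto-increase", "--quiet"]


def run(cmd):
    """Run a gcloud command and return its trimmed standard output."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `run(describe_state_cmd("my-instance"))` returns the current state, and `run(enable_auto_increase_cmd("my-instance"))` applies the setting once the instance accepts patches again.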

Hope this helps!