Re: CloudSQL PostgreSQL in reboot - try to recover - loop

bmaehr · 05-11-2023 04:27 PM

Help!

Our production CloudSQL database is not able to start up. It starts a recovery, and after about half an hour the database shuts down automatically and starts the recovery again.

INFO 2023-05-11T22:41:43.735402Z 2023-05-11 22:41:43.735 UTC [7]: [5-1] db=,user= LOG: received fast shutdown request
INFO 2023-05-11T22:41:43.746234Z 2023-05-11 22:41:43.746 UTC [29349]: [1-1] db=,user= LOG: shutting down
ALERT 2023-05-11T22:41:43.803170Z 2023-05-11 22:41:43.802 UTC [29348]: [1-1] db=postgres,user=pubx FATAL: the database system is starting up
INFO 2023-05-11T22:41:43.863410Z 2023-05-11 22:41:43.863 UTC [7]: [6-1] db=,user= LOG: database system is shut down
INFO 2023-05-11T22:41:56.857256Z 2023-05-11 22:41:56.856 UTC [7]: [1-1] db=,user= LOG: starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
INFO 2023-05-11T22:41:56.857916Z 2023-05-11 22:41:56.857 UTC [7]: [2-1] db=,user= LOG: listening on IPv4 address "0.0.0.0", port 5432
INFO 2023-05-11T22:41:56.858011Z 2023-05-11 22:41:56.857 UTC [7]: [3-1] db=,user= LOG: listening on IPv6 address "::", port 5432
INFO 2023-05-11T22:41:56.862951Z 2023-05-11 22:41:56.862 UTC [7]: [4-1] db=,user= LOG: listening on Unix socket "/pgsql/.s.PGSQL.5432"
INFO 2023-05-11T22:41:56.871604Z 2023-05-11 22:41:56.871 UTC [28]: [1-1] db=,user= LOG: database system was shut down in recovery at 2023-05-11 22:41:43 UTC
INFO 2023-05-11T22:41:56.873870Z 2023-05-11 22:41:56.873 UTC [28]: [2-1] db=,user= LOG: database system was not properly shut down; automatic recovery in progress
INFO 2023-05-11T22:41:56.879619Z 2023-05-11 22:41:56.879 UTC [28]: [3-1] db=,user= LOG: redo starts at 5FAB/1201EA00

ms4446

Here are some steps you can take to address this problem:

Check available disk space: Insufficient disk space can cause recovery to fail. Verify that your CloudSQL instance has enough free disk space for the recovery to complete. If the "Enable automatic storage increases" option is not enabled, increase the storage size manually.
Review recovery settings: Check whether any database flags or instance settings that affect recovery have been changed from their defaults and might be causing the problem. Make sure the configuration fits your database's requirements.
Examine database corruption: Corruption can prevent a successful recovery. Note that pg_rewind is not an integrity checker (it resynchronizes a data directory with another copy of the same cluster), and pg_checksums (named pg_verify_checksums before PostgreSQL 12) needs file-system access to a stopped cluster, which a managed CloudSQL instance does not give you. Once the instance accepts connections again, checks that run over SQL are the practical option, for example the amcheck extension if it is available on your instance, or a full pg_dump that reads every table.
Consider instance size and performance: An undersized or overloaded instance can slow down or break the recovery. Evaluate the instance size and consider moving to a larger machine type so that recovery has enough CPU and memory.
Review database logs: Read the database logs written during recovery and look for error messages, warnings, or error codes that point to the underlying cause.
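
For the log review step, here is a minimal Python sketch that summarizes a restart loop from exported log lines. It assumes the lines look like the excerpt in the original post; the names LINE_RE, parse_lsn and summarize are made up for this example and are not part of any CloudSQL or PostgreSQL tool. It collects the startup times, the shutdown request times, and the "redo starts at" positions (converted to byte offsets) so they can be compared between attempts.

```python
import re
from datetime import datetime

# Matches the PostgreSQL-side timestamp and message of lines shaped like the
# excerpt in this thread (assumption: your log prefix has the same layout).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC \[\d+\]: "
    r"\[\d+-\d+\] db=[^,]*,user=\S* (?P<level>[A-Z]+): (?P<msg>.*)"
)


def parse_lsn(text):
    """Convert an LSN such as '5FAB/1201EA00' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def summarize(lines):
    """Collect startup times, shutdown request times and redo start LSNs."""
    events = {"startup": [], "shutdown_request": [], "redo_lsn": []}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        msg = m["msg"]
        if msg.startswith("starting PostgreSQL"):
            events["startup"].append(ts)
        elif msg.startswith("received fast shutdown request"):
            events["shutdown_request"].append(ts)
        elif msg.startswith("redo starts at "):
            events["redo_lsn"].append(parse_lsn(msg.split()[-1]))
    return events


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < exported_log.txt
    result = summarize(sys.stdin)
    print("startups:", len(result["startup"]))
    print("shutdown requests:", len(result["shutdown_request"]))
    print("distinct redo start positions:", len(set(result["redo_lsn"])))
```

If every attempt reports the same redo start position, that suggests each restart replays the WAL from the same checkpoint instead of making progress, and the gap between a startup and the next shutdown request shows how long each attempt ran before it was stopped.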

If the issue persists despite your troubleshooting efforts, it's recommended to reach out to Google Cloud Support for further assistance.

bmaehr

Hello ms4446,

Thank you for your support.
The problem was solved by the "Product Engineering Team" (2nd level support) after opening a ticket.
I don't know what they did, but as far as I understood it was not a setting that I as a customer would have been able to change. The recovery ran successfully. It took about 4 hours, whereas before the database was restarted every 30 minutes.
Support suspected that the recovery crashed because of too little memory (the instance had 80 GB!): during the recovery process it used more memory than the limit. The available memory seems to be reserved for the database and cache, but during recovery something else apparently needs additional memory that was not taken into account in the initial memory allocation.

Could you explain which "recovery settings" you mean? Am I, as a customer, able to change them?

ms4446

In Google Cloud SQL for PostgreSQL, there are several recovery settings that you can configure to manage the data recovery process. Options that are configurable can be found here:

https://cloud.google.com/sql/docs/postgres/instance-settings

https://cloud.google.com/sql/docs/postgres/flags
