Re: Never-ending maintenance for GCP SQL instance

purusottam · 06-20-2023 06:29 PM

I need help from the community.

We have a database instance (SQL Server 2019 Standard) that has been under maintenance for the last 2 days. I can see that the last maintenance event failed with an INTERNAL_ERROR, but the console still shows the instance as under maintenance, and I am unable to edit or delete it.

I ran "gcloud sql operations list --instance=<instance_name>" and I can see that the last operation failed with INTERNAL_ERROR.
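To make failed operations easier to spot, the same list can be requested as JSON (`gcloud sql operations list --instance=<instance_name> --format=json`) and filtered. The following is a minimal sketch, assuming each operation object carries `name`, `operationType`, `status`, and an optional `error.errors` list with `code`/`message` fields, as in the Cloud SQL Admin API Operation resource. The sample data and the `failed_operations` helper are hypothetical.

```python
import json

# Hypothetical sample shaped like `gcloud sql operations list --format=json` output.
# Field names are assumed to follow the Cloud SQL Admin API Operation resource.
SAMPLE = """
[
  {"name": "op-1", "operationType": "MAINTENANCE", "status": "DONE",
   "startTime": "2023-06-18T09:00:00Z",
   "error": {"errors": [{"code": "INTERNAL_ERROR", "message": "Internal error"}]}},
  {"name": "op-2", "operationType": "BACKUP_VOLUME", "status": "DONE",
   "startTime": "2023-06-17T04:10:00Z"}
]
"""


def failed_operations(operations):
    """Return (name, operationType, [error codes]) for every operation that reports errors."""
    failed = []
    for op in operations:
        # Failures are reported under error.errors; an operation without errors has no "error" key.
        errors = op.get("error", {}).get("errors", [])
        if errors:
            codes = [e.get("code", "UNKNOWN") for e in errors]
            failed.append((op.get("name"), op.get("operationType"), codes))
    return failed


if __name__ == "__main__":
    for name, op_type, codes in failed_operations(json.loads(SAMPLE)):
        print(f"{name} {op_type} failed: {', '.join(codes)}")
```

With the sample above, only `op-1` (the MAINTENANCE operation) is reported.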

We are still being billed for the instance while we wait for the maintenance operation to complete.

Any help is much appreciated.

ms4446

I'm sorry to hear about the issues you're experiencing with your Cloud SQL for SQL Server instance. A maintenance period this long, especially one that ends with an INTERNAL_ERROR, is unusual.

I recommend checking the logs in the Google Cloud Console for any additional error messages. They may provide more information about the issue you're facing.

Here are the steps to access the logs:

1. Open the Google Cloud Console in your web browser.
2. In the left-side menu, go to "Logging" > "Logs Explorer".
3. In the resource drop-down list, select "Cloud SQL Database".
4. Apply any additional filters as needed, and then click "Run query" to view the logs.
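Instead of clicking through the resource drop-down, the same filter can be typed directly into the Logs Explorer query box. The sketch below only assembles a query string in the Logging query language; it assumes the `cloudsql_database` resource type (the "Cloud SQL Database" option above) and a `database_id` label of the form `<project>:<instance>`. The `cloudsql_error_query` helper is hypothetical.

```python
def cloudsql_error_query(project, instance, since=None):
    """Build a Logs Explorer query for ERROR-or-worse entries from one Cloud SQL instance."""
    clauses = [
        'resource.type="cloudsql_database"',
        # Assumption: the database_id label has the form "<project>:<instance>".
        f'resource.labels.database_id="{project}:{instance}"',
        "severity>=ERROR",
    ]
    if since:
        # RFC 3339 timestamp, e.g. "2023-06-19T00:00:00Z".
        clauses.append(f'timestamp>="{since}"')
    # Join the clauses with explicit AND operators, one per line.
    return "\nAND ".join(clauses)


if __name__ == "__main__":
    print(cloudsql_error_query("<gcp_project>", "<sql_instance>", since="2023-06-19T00:00:00Z"))
```

The resulting string can be pasted into the query box, or passed as the filter argument to `gcloud logging read`.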

Look for any error messages or warnings that occurred around the time the issue began. Please share your findings here.

purusottam

Thanks for your quick response. There are a few errors in the logs.

There are 2 failures (for an instance update and an instance delete) from when the instance was entering maintenance mode:

Error message: Instance '<gcp_project>:<sql_instance>' is not accessible to user: MAINTENANCE

There are 5 errors during the maintenance, one every hour:

Error message: The instance isn’t running. Make sure that the instance is running.

Logs for the 2 failures:

{
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"status": {
"code": 9,
"message": "Instance '<gcp_project>:<sql_instance>' is not accessible to user: MAINTENANCE "
},
"authenticationInfo": {
"principalEmail": "<user_email>",
"principalSubject": "user:<user_email>"
},
"requestMetadata": {
"callerIp": "35.198.194.90",
"requestAttributes": {
"time": "2023-06-19T10:38:10.491779Z",
"auth": {}
},
"destinationAttributes": {}
},
"serviceName": "cloudsql.googleapis.com",
"methodName": "cloudsql.instances.delete",
"authorizationInfo": [
{
"resource": "projects/<gcp_project>/instances/<sql_instance>",
"permission": "cloudsql.instances.delete",
"granted": true,
"resourceAttributes": {
"service": "sqladmin.googleapis.com",
"name": "projects/<gcp_project>/instances/<sql_instance>",
"type": "sqladmin.googleapis.com/Instance"
}
}
],
"resourceName": "projects/<gcp_project>/instances/<sql_instance>",
"request": {
"instance": "<sql_instance>",
"project": "<gcp_project>",
"@type": "type.googleapis.com/google.cloud.sql.v1beta4.SqlInstancesDeleteRequest"
},
"response": {}
},
"insertId": "-x0do1ad89vv",
"resource": {
"type": "cloudsql_database",
"labels": {
"region": "us-east1",
"project_id": "<gcp_project>",
"database_id": "<gcp_project>:<sql_instance>"
}
},
"timestamp": "2023-06-19T10:38:10.456659Z",
"severity": "ERROR",
"logName": "projects/<gcp_project>/logs/cloudaudit.googleapis.com%2Factivity",
"receiveTimestamp": "2023-06-19T10:38:11.147487806Z"
}

Logs for the 5 errors:

{
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"status": {
"code": 13,
"message": "INTERNAL"
},
"authenticationInfo": {},
"requestMetadata": {
"requestAttributes": {},
"destinationAttributes": {}
},
"serviceName": "cloudsql.googleapis.com",
"methodName": "cloudsql.instances.automatedBackup",
"resourceName": "projects/<gcp_project>/instances/<instance>",
"metadata": {
"windowEndTime": "2023-06-20T08:00:00Z",
"windowStartTime": "2023-06-20T04:00:00Z",
"@type": "type.googleapis.com/speckle.AutomatedBackupEventLog",
"message": "The instance isn’t running. Make sure that the instance is running.",
"windowStatus": "STATUS_ATTEMPT_FAILED"
}
},
"insertId": "-4pjgybe79knq",
"resource": {
"type": "cloudsql_database",
"labels": {
"database_id": "<gcp_project>:<instance>",
"region": "US_EAST1",
"project_id": "<gcp_project>"
}
},
"timestamp": "2023-06-20T04:50:17.549830Z",
"severity": "ERROR",
"logName": "projects/<gcp_project>/logs/cloudaudit.googleapis.com%2Fsystem_event",
"receiveTimestamp": "2023-06-20T04:50:18.386248877Z"
}
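The numeric `status.code` values in these audit entries are canonical google.rpc status codes: 9 is FAILED_PRECONDITION and 13 is INTERNAL. Below is a minimal sketch that turns entries like the two above into one-line summaries, assuming the entries have been exported as JSON (for example with `gcloud logging read ... --format=json`). The abbreviated sample entries and the `summarize` helper are hypothetical, and the code table covers only a subset of the codes.

```python
import json

# A subset of the canonical google.rpc.Code values.
RPC_CODES = {
    0: "OK",
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    9: "FAILED_PRECONDITION",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}

# Abbreviated, hypothetical copies of the two entries posted above.
SAMPLE_ENTRIES = json.loads("""
[
  {"timestamp": "2023-06-19T10:38:10.456659Z",
   "protoPayload": {"methodName": "cloudsql.instances.delete",
                    "status": {"code": 9,
                               "message": "Instance '<gcp_project>:<sql_instance>' is not accessible to user: MAINTENANCE "}}},
  {"timestamp": "2023-06-20T04:50:17.549830Z",
   "protoPayload": {"methodName": "cloudsql.instances.automatedBackup",
                    "status": {"code": 13, "message": "INTERNAL"}}}
]
""")


def summarize(entry):
    """Return 'timestamp method CODE_NAME: message' for one audit log entry."""
    payload = entry.get("protoPayload", {})
    status = payload.get("status", {})
    code = status.get("code", 0)
    # Fall back to a generic label for codes not in the subset above.
    name = RPC_CODES.get(code, f"CODE_{code}")
    message = status.get("message", "").strip()
    return f'{entry.get("timestamp")} {payload.get("methodName")} {name}: {message}'


if __name__ == "__main__":
    for e in SAMPLE_ENTRIES:
        print(summarize(e))
```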

Let me know if you need any additional details.

ms4446

Thank you for sharing these log details. I can see that there are two main types of errors here:

1. Instance '<gcp_project>:<sql_instance>' is not accessible to user: MAINTENANCE - This message (status code 9, FAILED_PRECONDITION) typically appears when an action is attempted on an instance that is undergoing maintenance. As you've mentioned, it's unusual for maintenance to last this long; the instance should normally become accessible again once maintenance completes.
2. The instance isn’t running. Make sure that the instance is running. - This message suggests that the database instance is not currently running, which may be why the maintenance operation cannot complete.

Note also that the automated backup job (cloudsql.instances.automatedBackup) is failing because the instance is not running. This points to a problem with the instance's underlying resources or state.

Taken together, these errors suggest that the instance is stuck in a state that prevents it from completing maintenance and returning to a running state. This likely requires intervention from Google Cloud Support, because it may be an infrastructure-side issue that you cannot resolve yourself.
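One way to tell slow maintenance from stuck maintenance is to sample the instance state periodically, for example with `gcloud sql instances describe <instance_name> --format="value(state)"`, which reports values such as RUNNABLE or MAINTENANCE, and to flag the instance once it has stayed in MAINTENANCE past a threshold. Below is a minimal sketch, assuming the samples have already been collected as (timestamp, state) pairs. The `stuck_in_maintenance` helper, the sample data, and the 6-hour threshold are hypothetical choices, not documented Cloud SQL limits.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse a UTC timestamp such as '2023-06-19T10:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def stuck_in_maintenance(samples, threshold=timedelta(hours=6)):
    """Return True if the latest uninterrupted MAINTENANCE run lasts longer than threshold.

    samples: list of (timestamp, state) tuples in chronological order.
    The 6-hour default threshold is an arbitrary, hypothetical choice.
    """
    run_start = None
    last_ts = None
    for ts, state in samples:
        t = parse_ts(ts)
        if state == "MAINTENANCE":
            if run_start is None:
                run_start = t  # a new MAINTENANCE run begins
        else:
            run_start = None  # any other state (e.g. RUNNABLE) ends the run
        last_ts = t
    if run_start is None:
        return False
    return last_ts - run_start > threshold


# Hypothetical hourly/daily samples resembling the situation in this thread.
SAMPLES = [
    ("2023-06-18T10:00:00Z", "RUNNABLE"),
    ("2023-06-18T11:00:00Z", "MAINTENANCE"),
    ("2023-06-19T11:00:00Z", "MAINTENANCE"),
    ("2023-06-20T11:00:00Z", "MAINTENANCE"),
]

if __name__ == "__main__":
    print(stuck_in_maintenance(SAMPLES))  # True: 48 hours in MAINTENANCE
```

Any non-MAINTENANCE sample resets the run, so a brief return to RUNNABLE followed by a new maintenance window is not counted as one long stall.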

I recommend reaching out to Google Cloud Support with these log details and explaining the situation. They should be able to investigate further and help resolve the issue.

purusottam

Shouldn't the instance return to a valid state after a few hours if the maintenance event failed?

We do not have a paid Support plan, which prevents us from contacting the Cloud Support team.

ms4446

Since you mentioned that you don't have a paid Support plan, the options are limited.

The official documentation suggests that if a backup operation has been stuck for many hours, you can ask Customer Support to force-restart the instance. See the troubleshooting page: https://cloud.google.com/sql/docs/troubleshooting

This may not apply directly to a maintenance event, but Google Support may have tools to intervene in long-running operations.