We are using the Cloud SQL Proxy as a sidecar container in a Kubernetes pod to access a Cloud SQL PostgreSQL instance. The proxy uses a Kubernetes service account linked to an IAM service account that has the roles "Cloud SQL Instance User" and "Cloud SQL Client". An IAM policy binding has also been added so that the Kubernetes service account can use the IAM account as a workloadIdentityUser.
Usually the connection works well: our Spring Boot application (using HikariCP) opens a JDBC connection with a PostgreSQL user account. When the connection is established successfully, the audit log shows that it was made with the IAM service account as principal, delegated by the Kubernetes service account.
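For reference, the Workload Identity link described above consists of two pieces: an IAM policy binding on the IAM service account and an annotation on the Kubernetes service account. Below is a minimal Python sketch of the values one would expect to find, assuming the standard role roles/iam.workloadIdentityUser and the annotation key iam.gke.io/gcp-service-account. The function name and all sample values are hypothetical and only serve as a checklist to compare against the real configuration.

```python
def workload_identity_link(project, namespace, ksa, gsa_email):
    """Return the IAM binding and KSA annotation a Workload Identity link needs.

    project, namespace, ksa and gsa_email are placeholders for your own values.
    """
    return {
        # Binding that must exist on the IAM service account's own policy.
        "binding": {
            "role": "roles/iam.workloadIdentityUser",
            "member": f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]",
        },
        # Annotation that must exist on the Kubernetes service account.
        "annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }
```

Calling workload_identity_link("my-project", "my-namespace", "my-ksa", "my-gsa@my-project.iam.gserviceaccount.com") yields the member string in the same form that shows up as principalSubject in the audit log, which makes a mismatch in namespace or account name easy to spot.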
HikariCP refreshes its connections at least once every 30 minutes: the old connection is closed and a new one is established. Sometimes this fails: the connection is rejected with a status containing code 7 and the message "boss::NOT_AUTHORIZED: Not authorized to access resource. Possibly missing permission cloudsql.instances.connect on resource instances/<NAME_OF_OUR_CLOUDSQL_INSTANCE>". The Cloud SQL Proxy container logs the following: "[PROJECT:REGION:INSTANCE] failed to connect to instance: failed to get instance: Refresh error: failed to get instance metadata (connection name = \"PROJECT:REGION:INSTANCE\"): googleapi: Error 403: boss::NOT_AUTHORIZED: Not authorized to access resource. Possibly missing permission cloudsql.instances.get on resource instances/INSTANCE., forbidden". Meanwhile, our application logs errors whenever it tries to use a connection from the HikariPool.
This goes on for a random amount of time (between half an hour and a couple of hours) before it suddenly works again for hours or a day without issues. We cannot explain this behavior. I already tried creating a new IAM service account, since I suspected the original one might have been broken somehow, but the new account shows the same unreliable behavior.
The Cloud SQL Proxy container is version 2.1.0. We are using Google Cloud SQL Postgres 15.2 in private IP mode.
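Note that the two messages above name different permissions (cloudsql.instances.connect in the rejected connection, cloudsql.instances.get in the proxy log). To track when and how often each one occurs, the proxy log lines could be parsed with something like the following Python sketch. The function name is hypothetical, and the pattern is only derived from the two messages quoted in this post, so other wordings may not match.

```python
import re

# Pattern based only on the two error messages quoted in this thread.
_NOT_AUTHORIZED = re.compile(
    r"Possibly missing permission (?P<permission>[\w.]+) "
    r"on resource (?P<resource>\S+)"
)

def parse_not_authorized(line):
    """Extract the permission and resource from a NOT_AUTHORIZED log line.

    Returns a dict, or None if the line does not contain the message.
    """
    match = _NOT_AUTHORIZED.search(line)
    if match is None:
        return None
    return {
        "permission": match.group("permission"),
        # The proxy log ends the resource with '.,' so strip trailing punctuation.
        "resource": match.group("resource").rstrip('.,"'),
    }
```

Feeding each log line through parse_not_authorized and recording the timestamps of the non-None results gives a simple timeline of the failure windows.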
Thanks in advance for all your input!
The error message "boss::NOT_AUTHORIZED: Not authorized to access resource. Possibly missing permission cloudsql.instances.connect on resource instances/<NAME_OF_OUR_CLOUDSQL_INSTANCE>" indicates that the Cloud SQL Proxy is not authorized to connect to your Cloud SQL instance. This can happen for a few reasons:
The IAM service account needs a role that includes the cloudsql.instances.connect permission, such as 'Cloud SQL Client' or 'Cloud SQL Admin'.
Additionally, consider the following:
Look at the logs for both the Cloud SQL Proxy container and the Cloud SQL instance. Logs often provide valuable insights into connection failures and can be instrumental in troubleshooting.
The protoPayload in the failed connect requests seems to indicate that access to Cloud SQL was attempted without delegation to the IAM account:
authenticationInfo: {
  principalSubject: "serviceAccount:PROJECT.svc.id.goog[NAMESPACE/SERVICEACCOUNT]"
  serviceAccountDelegationInfo: [
    0: {
    }
  ]
}
A successful login looks like this:
authenticationInfo: {
  principalEmail: "IAM-SERVICE-ACCOUNT@PROJECT.iam.gserviceaccount.com"
  principalSubject: "serviceAccount:IAM-SERVICE-ACCOUNT@PROJECT.iam.gserviceaccount.com"
  serviceAccountDelegationInfo: [
    0: {
      principalSubject: "serviceAccount:PROJECT.svc.id.goog[NAMESPACE/SERVICEACCOUNT]"
    }
  ]
}
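The difference between the two payloads can be checked mechanically when going through exported audit log entries. The following is a minimal Python sketch, assuming each entry's authenticationInfo has already been loaded as a dict with the field names shown above. The function name and the three result labels are hypothetical.

```python
def classify_auth(auth_info):
    """Classify an audit-log authenticationInfo dict.

    Returns "delegated" when the IAM service account is the principal and a
    Kubernetes service account appears in the delegation chain, "undelegated"
    when the Kubernetes service account itself is the principal, and
    "unknown" otherwise.
    """
    def is_ksa(subject):
        # Workload Identity principals look like PROJECT.svc.id.goog[NS/KSA].
        return ".svc.id.goog[" in subject

    email = auth_info.get("principalEmail", "")
    subject = auth_info.get("principalSubject", "")
    delegates = [
        entry.get("principalSubject", "")
        for entry in auth_info.get("serviceAccountDelegationInfo", [])
    ]

    if email.endswith(".iam.gserviceaccount.com") and any(map(is_ksa, delegates)):
        return "delegated"
    if is_ksa(subject):
        return "undelegated"
    return "unknown"
```

Applied to the two payloads above, the failed request would be classified as "undelegated" and the successful one as "delegated", so counting the labels per time window shows exactly when the delegation drops out.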
However, this only happens for short periods (around 2 hours) and with no recognizable time pattern.
We also have two containers with separate applications, each running its own sidecar container; while one is having the issue, the other works fine at the same time (although it may simply not have been disconnected and still be living on an "old" session).
We have updated to the latest Cloud SQL Proxy container (2.6.1) and double-checked roles and bindings. The application works most of the time; it just sometimes stops working, and we are unsure how to proceed.
We will consult with a Google partner company later today, and we are discussing alternatives such as not using the Cloud SQL Proxy at all. We would prefer to do it "the intended way", though; it just needs to work reliably.
Thank you for the additional information. It is very helpful to know that the protoPayload in the failed connect requests indicates that access to Cloud SQL was attempted without delegation to the IAM account. This suggests that the Cloud SQL Proxy is not correctly authenticating with Cloud SQL using the IAM service account.
There are a few possible reasons for this:
Here are some things you can check:
The --service_account flag in the Cloud SQL Proxy configuration.
The cloudsql.instances.connect permission on the Cloud SQL instance. You can check this in the IAM & Admin console.
If you are still having problems, you can contact Google Cloud support for assistance.
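One more check that can be run from inside the affected pod while the error is occurring is to ask the GKE metadata server which identity it is currently handing out. The following is a minimal Python sketch, assuming the pod can reach metadata.google.internal and that an unlinked Kubernetes service account is reported as the workload identity pool (PROJECT.svc.id.goog) rather than an IAM service account email. The function names are hypothetical.

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)

def fetch_identity(url=METADATA_URL, timeout=2.0):
    """Ask the metadata server which identity the pod's default credentials use."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()

def is_impersonating(identity):
    """True if the identity is an IAM service account email."""
    return identity.endswith(".iam.gserviceaccount.com")

def check(fetch=fetch_identity):
    """Return (identity, ok); 'fetch' can be swapped out for testing."""
    identity = fetch()
    return identity, is_impersonating(identity)
```

Running check() periodically and logging the result alongside the proxy errors would show whether the failure windows coincide with the metadata server returning the pool identity instead of the IAM service account.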
Here are some additional suggestions:
Hi all, we are facing exactly the same problem. We have a Java Spring Boot 3 app hosted on GKE that runs under a dedicated Kubernetes service account bound to a Google service account. It tries to connect, through workload identity and the Cloud SQL Auth Proxy sidecar, to a Cloud SQL PostgreSQL instance in a different project that only has private IP mode enabled. Istio is enabled on the GKE cluster and namespace. The network rules are all in place and the IAM roles have been applied, but the connection fails with the same error described here.