My company recently upgraded to a MySQL8 cloud SQL database. We wanted to make it compatible with 5.7 as well, so we are using utf8mb4_unicode_ci as the collation for the db. We set our collation_connection to match as well. We also have used the skip client handshake flag as well.
When I connect through the CLI and query the collation, it reflects utf8mb4_unicode_ci, as it takes the server default collation. When we connect through the proxy, it also takes MySQL8 default (utf8mb4_0900_ai_ci), and I am unsure why it doesn't take the default since we ignore the client handshake.
The Cloud SQL Proxy does not manage MySQL configuration settings, including collation settings. The collation_connection parameter is a system variable within MySQL, and it cannot be set within the Cloud SQL Proxy configuration because the proxy does not have a configuration file for MySQL settings.
In Google Cloud SQL, you do not have direct access to the MySQL configuration files. The settings are managed by Google Cloud, and you can only change certain settings via the Google Cloud Console or the gcloud command-line tool.
To ensure that your connections through the Cloud SQL Proxy use the utf8mb4_unicode_ci collation, you will need to set this collation for your database within the Google Cloud Console or by using the gcloud command-line tool.
Here's how you can set the character set and collation for a database in Google Cloud SQL:
Using the Google Cloud Console:
Using the gcloud Command-Line Tool:
To create a new database with a specific character set and collation:
gcloud sql databases create [DATABASE_NAME] --instance=[INSTANCE_NAME] --charset=utf8mb4 --collation=utf8mb4_unicode_ci
To modify an existing database:
gcloud sql databases patch [DATABASE_NAME] --instance=[INSTANCE_NAME] --charset=utf8mb4 --collation=utf8mb4_unicode_ci
After setting the character set and collation for the database, ensure that your application's database connection settings are also correctly configured to use utf8mb4 and utf8mb4_unicode_ci.
If you are connecting to the database using a client library or a database management tool, you can execute the SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
command after establishing the connection to ensure that the session uses the correct character set and collation.
So the database is currently set to utf8mb4 and utf8mb4_unicode_ci for collation. If you connect through the Google Cloud Console, the collation_connection is set to utf8mb4_unicode_ci; however, if you connect to the CloudSQL instance through the proxy, it still shows the utf8mb4_0900_ai_ci.
If the client handshake specifies a different character set or collation than the database instance is configured to use, the database instance will use the client's preferred character set and collation. This is why you are seeing the utf8mb4_0900_ai_ci collation when you connect through the proxy, even though the database is configured to use the utf8mb4_unicode_ci collation.
To ensure that all connections to the database use the utf8mb4_unicode_ci collation, you can configure your database client to explicitly set the collation for each connection.
After connecting to the database through the Cloud SQL Proxy, execute the following SQL command:
SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
It sets the character set and collation for the current connection to utf8mb4 and utf8mb4_unicode_ci, respectively.
By ensuring that the collation is explicitly set for each connection, you can avoid issues related to the default collation used by the Cloud SQL Proxy and ensure that all connections to the database use the utf8mb4_unicode_ci collation.