Can we use the Google SQL Java JDBC connector with IAM Authentication from Amazon EMR?

Hi There,

We see that the Google SQL Java JDBC connector provides SSL/TLS internally during the call. Please confirm whether that is true. I also see that authentication is done using IAM Authentication. Since authentication goes through IAM, could we use the Google SQL Java JDBC connector from PySpark code running on Amazon EMR? Does IAM Authentication work from Amazon EMR?

Google SQL Java JDBC Connector Documentation: https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/blob/main/docs/jdbc-mysql.md

Also, if possible, what parameters need to be passed from PySpark to the Google SQL Java JDBC connector to connect to Google MySQL?

Any help provided is appreciated!

Thanks,

Avinash

2 REPLIES

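Yes, the Google SQL Java JDBC connector sets up SSL/TLS for the connection internally, so you do not have to manage server or client certificates yourself. It authorizes the connection with Google Cloud IAM credentials (not AWS IAM). This means that you can use the connector from PySpark code running on an Amazon EMR cluster, as long as the cluster has the connector installed and Google Cloud credentials, such as a service account key, are available to the Spark driver and executors.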
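
One part of that configuration is making Google Cloud credentials visible to Spark on the cluster. Here is a minimal sketch, assuming the connector picks up Application Default Credentials through the GOOGLE_APPLICATION_CREDENTIALS environment variable and that a service account key file has already been copied to the same path on every EMR node. The helper name and the key path are hypothetical; the helper only builds Spark configuration entries and does not contact Google Cloud.

```python
def google_credentials_conf(key_path):
    """Build Spark conf entries that point Spark processes at a Google key file.

    Assumption: the key file exists at the same absolute path on every node.
    """
    if not key_path.startswith("/"):
        raise ValueError("key_path must be an absolute path")
    return {
        # spark.executorEnv.<NAME> sets an environment variable on executors
        "spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS": key_path,
        # On YARN in cluster mode, the driver runs inside the application master
        "spark.yarn.appMasterEnv.GOOGLE_APPLICATION_CREDENTIALS": key_path,
    }


# Hypothetical key location on the EMR nodes
conf = google_credentials_conf("/home/hadoop/gcp-key.json")
for name, value in conf.items():
    print(f"--conf {name}={value}")
```

The printed flags could be appended to a spark-submit call. In client mode the driver reads the variable from its own shell environment instead, so it would need to be exported there as well.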

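To connect to Google SQL from PySpark using the Google SQL Java JDBC connector, you will need to pass the following parameters:

  • url: The JDBC URL for your database. Per the linked documentation, the base URL is jdbc:mysql:///<DATABASE_NAME>; the instance is identified through the cloudSqlInstance property rather than a host name.
  • driver: The class name of the JDBC driver. The connector plugs into MySQL Connector/J as a socket factory, so this is com.mysql.cj.jdbc.Driver.
  • socketFactory: The socket factory to use. This is com.google.cloud.sql.mysql.SocketFactory.
  • cloudSqlInstance: The instance connection name, in the format <PROJECT_ID>:<REGION>:<INSTANCE_NAME>.
  • user: The database user name. With IAM database authentication, this is the IAM user or service account email address, shortened as described in the documentation.
  • password: The database user's password. It is not needed when IAM database authentication is enabled, and it is not the service account's private key.

You can also pass additional parameters to the connector, such as:

  • enableIamAuth: Set to true to use IAM database authentication instead of a password.
  • ipTypes: The IP types to try when connecting, for example PUBLIC or PRIVATE. A cluster outside Google Cloud normally needs the public IP unless the networks are linked.
  • sslmode: The driver-level SSL setting. The connector already wraps the connection in TLS, so check the documentation for the value it recommends.

Here is an example of how to connect to Google SQL from PySpark using the Google SQL Java JDBC connector:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Set the JDBC connection parameters
jdbc_url = (
    "jdbc:mysql:///my_database"
    "?cloudSqlInstance=my-project:us-central1:my-instance"
    "&socketFactory=com.google.cloud.sql.mysql.SocketFactory"
    "&enableIamAuth=true"
)
jdbc_driver = "com.mysql.cj.jdbc.Driver"
# Service account email without the @my-project.iam.gserviceaccount.com suffix
jdbc_user = "my-service-account"

# Create a Spark DataFrame from a table in the Google SQL database
df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", jdbc_driver)
    .option("user", jdbc_user)
    .option("dbtable", "my_table")
    .load()
)

# Do something with the DataFrame
df.show()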
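
If the connection string is assembled in code rather than typed by hand, a small helper can keep the pieces straight. This is only a sketch, assuming the URL format described in the linked cloud-sql-jdbc-socket-factory documentation (a jdbc:mysql:/// base URL with cloudSqlInstance, socketFactory and, optionally, enableIamAuth properties). The function name and the sample database and instance names are made up for illustration.

```python
from urllib.parse import urlencode

# Socket factory class named in the linked connector documentation
SOCKET_FACTORY = "com.google.cloud.sql.mysql.SocketFactory"


def cloud_sql_jdbc_url(database, instance_connection_name, iam_auth=False):
    """Assemble a jdbc:mysql:/// URL carrying the connector properties."""
    if instance_connection_name.count(":") < 2:
        raise ValueError("expected <PROJECT_ID>:<REGION>:<INSTANCE_NAME>")
    props = {
        "cloudSqlInstance": instance_connection_name,
        "socketFactory": SOCKET_FACTORY,
    }
    if iam_auth:
        props["enableIamAuth"] = "true"
    # safe=":" keeps the instance connection name readable in the URL
    return f"jdbc:mysql:///{database}?{urlencode(props, safe=':')}"


url = cloud_sql_jdbc_url(
    "my_database", "my-project:us-central1:my-instance", iam_auth=True
)
print(url)
```

The resulting string can be passed as the url option of spark.read.format("jdbc").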

Please note that you will need to install the Google SQL Java JDBC connector on your Amazon EMR cluster before you can use it. You can do this by following the instructions in the documentation.
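
One possible way to do the installation is to let Spark download the connector at submit time with the --packages flag of spark-submit. The sketch below only builds the command line. It assumes the Maven coordinates com.google.cloud.sql:mysql-socket-factory-connector-j-8 for the connector, the script name is hypothetical, and the version is a placeholder you would replace with a real release. MySQL Connector/J itself may also need to be listed through extra_packages if it is not already on the cluster's classpath.

```python
import shlex

# Assumed Maven coordinates (without version) for the connector artifact
CONNECTOR_ARTIFACT = "com.google.cloud.sql:mysql-socket-factory-connector-j-8"


def spark_submit_command(script, connector_version, extra_packages=()):
    """Return a spark-submit argument list that pulls in the connector JAR."""
    packages = [f"{CONNECTOR_ARTIFACT}:{connector_version}", *extra_packages]
    return ["spark-submit", "--packages", ",".join(packages), script]


# "X.Y.Z" is a placeholder, not a real release number
cmd = spark_submit_command("read_cloud_sql.py", "X.Y.Z")
print(shlex.join(cmd))
```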

Thank you, @ms4446! I already had SSL certificates but couldn't make JDBC calls because of some issues, which is why I wanted to take this route. However, I was able to figure out the SSL certificate issues, and the code worked fine with MySQL Connector/J. Here is the link for connecting to Google MySQL using MySQL Connector/J: https://www.googlecloudcommunity.com/gc/Databases/Unable-to-connect-to-Google-MySQL-using-JDBC-Conne...