
Error connecting to JDBC with PySpark in Dataproc

Hi,

I am trying to connect to a Microsoft SQL Server instance over JDBC from PySpark in Dataproc.

I am getting this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o79.jdbc.
: java.lang.ClassNotFoundException: mssql-jdbc-12.4.0.jre11.jar

 

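The main file (main.py):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('my_app').getOrCreate()
# JDBC URL for the SQL Server instance (host, port, and database are placeholders)
connection_string = 'jdbc:sqlserver://1.2.3.4:1433;databaseName=my_db;'
properties = { 'user':'my_user', 'password':'my_password' }
# Read the whole table into a DataFrame over JDBC
df = spark.read.jdbc(
    url=connection_string,
    table='my_table',
    properties=properties
)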

 

The gcloud command:

gcloud dataproc batches submit pyspark \
--batch my_batch main.py \
--jars mssql-jdbc-12.4.0.jre11.jar \
--properties driver=mssql-jdbc-12.4.0.jre11.jar
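One possible reading of the ClassNotFoundException is that the jar file name ended up where Spark expects a JDBC driver class name, since the "class" it fails to find is literally mssql-jdbc-12.4.0.jre11.jar. The sketch below is only an assumption about how main.py could pass the driver class explicitly through the JDBC properties. The helper names build_jdbc_config and read_table are hypothetical and not part of the original post. The class name com.microsoft.sqlserver.jdbc.SQLServerDriver is the one shipped in Microsoft's mssql-jdbc jar, and the jar itself would still need to be supplied to the batch (for example via --jars as in the command above).

```python
# Driver class contained in the mssql-jdbc jar. Note this is a Java class
# name, not the jar file name.
SQLSERVER_DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def build_jdbc_config(host, port, database, user, password):
    """Return (url, properties) suitable for spark.read.jdbc.

    Hypothetical helper for illustration only.
    """
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database};"
    properties = {
        "user": user,
        "password": password,
        # Spark's JDBC source reads the driver class name from this key.
        "driver": SQLSERVER_DRIVER_CLASS,
    }
    return url, properties


def read_table(table):
    """Read a table over JDBC using the placeholder values from the post."""
    # Imported here so build_jdbc_config can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my_app").getOrCreate()
    url, properties = build_jdbc_config(
        "1.2.3.4", 1433, "my_db", "my_user", "my_password"
    )
    return spark.read.jdbc(url=url, table=table, properties=properties)
```

With this approach the driver is named inside the job itself, so whether the driver=... entry under --properties is still needed is something to verify against the actual batch rather than assume.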

 

 

 

 
