Hi,
I have a Dataproc cluster, and from the JupyterLab web interface inside the cluster I am trying to read a table from a SQL Server instance hosted on Cloud SQL into a PySpark dataframe.
When I run the query to read the table from SQL Server, it fails with a "Failed to find data source" error.
Below is the code I am trying to run:
server_name = "jdbc:sqlserver://<servername>"
database_name = "name"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "table"
username = "testserver"
password = "password"  # placeholder; the real password is set here
DF = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()
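For context, here is a minimal sketch of the same read using Spark's built-in "jdbc" source, which as far as I understand needs only the mssql-jdbc driver JAR on the classpath (not the separate spark-mssql-connector). The helper function and connection values below are placeholders, not my real setup:

```python
# Placeholder connection details, same shape as above.
server_name = "jdbc:sqlserver://<servername>"
database_name = "name"
url = server_name + ";" + "databaseName=" + database_name + ";"

def jdbc_read_options(url, table, user, password):
    """Build the option dict for Spark's built-in 'jdbc' data source.

    This source ships with Spark itself, so no extra connector JAR is
    needed; only the SQL Server JDBC driver must be on the classpath.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

opts = jdbc_read_options(url, "table", "testserver", "password")
# In the notebook this would become:
# df = spark.read.format("jdbc").options(**opts).load()
```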
Below is the error I am getting:
Py4JJavaError: An error occurred while calling o125.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark.
Caused by: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.spark.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
Can you guide me on what might be going wrong here?
PS: I am not submitting a Dataproc job; I am reading the SQL Server table from a Jupyter notebook inside the cluster.
Thanks