Hi all,
I have connected to my Spanner instance, but when I read from or write to a table I get a "string literal" error. From what I can tell, the problem is "column_name" vs `column_name`, that is, double quotes vs backticks. How can I create a custom JDBC dialect in PySpark? The connection itself is fine; only reading and writing fail. Can you help me solve this?
2. I am using Cloud Spanner. I have created a Spanner instance, a GoogleSQL/PostgreSQL-dialect database, and a table, and I connect to the database through the Cloud Spanner JDBC driver in PySpark.
The problem is that every read/write operation fails with a "string literal" error. For example, if a table column is named id, the error reports something like: String literal issue "id".
From my research, the Spanner table column name should be quoted as `id`, but in the PySpark DataFrame it is quoted as "id", so the backticks-vs-double-quotes mismatch seems to be the problem.
Can you suggest the best way to run read/write operations through the JDBC driver in PySpark?
Hi @umeshchandra,
Welcome to Google Cloud Community!
I have connected to my Spanner instance, but when I read from or write to a table I get a "string literal" error.
I'm a bit confused here, as you mentioned in your title that you want to connect to Cloud Spanner via the JDBC driver in PySpark. Did you manage to connect? You may want to check this documentation on connecting JDBC to a PostgreSQL-dialect database.
The problem is that every read/write operation fails with a "string literal" error.
Backticks are used for quoted identifiers. String literals use either single (') or double (") quotation marks, or are triple-quoted with groups of three single (''') or three double (""") quotation marks. Please check this documentation on Lexical structure and syntax in GoogleSQL.
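To make the distinction concrete, here is a minimal, illustrative sketch in plain Python (not a real GoogleSQL parser; the classification function is hypothetical) of how the lexer treats a quoted token:

```python
# Illustrative sketch of GoogleSQL's quoting rules (not a real parser):
# backticks -> quoted identifier; single/double quotes -> string literal.

def classify_token(tok: str) -> str:
    """Classify a quoted token the way GoogleSQL's lexer would."""
    if tok.startswith("`") and tok.endswith("`"):
        return "quoted identifier"
    # Check triple quotes before single/double quotes.
    if tok.startswith(("'''", '"""')) and tok.endswith(("'''", '"""')):
        return "string literal (triple-quoted)"
    if tok.startswith(("'", '"')) and tok.endswith(("'", '"')):
        return "string literal"
    return "unquoted identifier or keyword"

print(classify_token("`id`"))   # quoted identifier
print(classify_token('"id"'))   # string literal
print(classify_token("'id'"))   # string literal
```

This is the likely source of the trouble: Spark's generic JDBC dialect wraps column names in double quotes, which GoogleSQL then lexes as string literals rather than identifiers.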
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./credentials_dev.json"

# Put the Spanner JDBC driver on the classpath. (addFile only ships the
# file to executors; it does not add the jar to the classpath.)
spark = SparkSession.builder \
    .appName("Submit local CSV to Spanner") \
    .config("spark.jars", "google-cloud-spanner-jdbc-2.8.0.jar") \
    .getOrCreate()

jdbc_url = ("jdbc:cloudspanner:/projects/dev-data/"
            "instances/spanner-test/databases/testdb?"
            "credentials=credentials_dev.json;autocommit=false")
connection_properties = {
    'driver': 'com.google.cloud.spanner.jdbc.JdbcDriver'
}

# ---------read operation-----------------------------------------------
# df = spark.read.jdbc(url=jdbc_url, table='stories', properties=connection_properties)
# df.show()

schema = StructType([
    StructField("author", StringType(), True),
    StructField("s_by", StringType(), True),
    StructField("dead", StringType(), True),
])

# reading local csv file (explicit schema, so inferSchema is not needed)
df1 = spark.read.schema(schema).csv("smalldata.csv", header=True)
# df1.show()

# write operation
# Note: write.jdbc() performs the write itself and returns None,
# so there must be no trailing .save() call.
df1.write.jdbc(url=jdbc_url, table="stories", mode="append",
               properties=connection_properties)
Hi @robertcarlos,
I have shared the code I'm using for this task above.
Could you point out what is missing and why I am getting the string literal issue?
PySpark version: 3.3.2
Cloud Spanner JDBC version: 2.8.0
Like I mentioned in my previous reply, you need to check your CSV values against the Lexical structure and syntax rules of GoogleSQL: something that should be a string literal is being read as a quoted identifier.
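If the root cause is indeed the identifier quoting (Spark's generic JDBC dialect wraps column names in double quotes, which GoogleSQL parses as string literals), one possible workaround on the read side is to bypass Spark's generated SQL and pass your own backtick-quoted query through the JDBC `query` option. This is only a sketch under that assumption, reusing the `jdbc_url` and table names from the code above, and it has not been tested against a live Spanner instance; the `backtick_query` helper is hypothetical:

```python
# Sketch: avoid Spark's double-quoted identifiers by supplying the SELECT
# ourselves, with GoogleSQL backtick quoting. Hypothetical helper; the
# Spark call below assumes the SparkSession and jdbc_url defined earlier.

def backtick_query(table, columns):
    """Build a GoogleSQL SELECT whose identifiers are backtick-quoted."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return f"SELECT {cols} FROM `{table}`"

query = backtick_query("stories", ["author", "s_by", "dead"])
print(query)  # SELECT `author`, `s_by`, `dead` FROM `stories`

# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("driver", "com.google.cloud.spanner.jdbc.JdbcDriver")
#       .option("query", query)
#       .load())
```

Writes are harder to redirect this way, since Spark generates the INSERT statement itself; that is where a custom JdbcDialect (registered on the JVM side via Scala/Java, as PySpark has no Python API for dialects) would come in.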