
Serverless Dataproc Batch Request Always Returns com.mysql.cj.jdbc.Driver ClassNotFoundException

Hello!

I am a new user of the GCP Dataproc API, and I have been trying to test serverless batch processing to solve a business problem: moving data out of a MySQL database via the Dataproc JDBCTOGCS template.

I have no issues creating and authenticating the requests, but every config I try returns this exception:

ClassNotFoundException: com.mysql.cj.jdbc.Driver

I have provided a snippet below, with the sensitive details obfuscated to xxx.

This code is not intended to be used in production; I'm just trying to get my head around how to make the requests at all.

From reading around the error, I think this has something to do with how I'm pointing to the MySQL connector .jar file (the second `jarFileUris` entry in the snippet below), but nothing I have tried gets me past this error. As far as I can tell, I'm doing everything the quick-start guide asks of me.

I'm clearly doing something wrong, so any pointers would be greatly appreciated.

import os

import requests as rq
import google.auth
import google.auth.transport.requests

# Point google.auth at the (obfuscated) service-account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "xxx.json"

# Mint an OAuth2 access token for the REST request
credentials, project_id = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/cloud-platform.read-only",
    ]
)
request = google.auth.transport.requests.Request()
credentials.refresh(request)
token = credentials.token

url = "https://dataproc.googleapis.com/v1/projects/xxx/locations/europe-west2/batches"
headers = {"Authorization": f"Bearer {token}"}
body = {
    "environmentConfig": {
        "executionConfig": {
            "subnetworkUri": "projects/xxx/regions/europe-west2/subnetworks/xxx",
            "serviceAccount": "xxx@appspot.gserviceaccount.com"
        }
    },
    "runtimeConfig": {
        "version": "1.1"
    },
    "sparkBatch": {
        "mainClass": "com.google.cloud.dataproc.templates.main.DataProcTemplate",
        "args": [
            "--template=JDBCTOGCS",
            "--templateProperty", "log.level=DEBUG",
            "--templateProperty", "project.id=xxx",
            "--templateProperty", "jdbctogcs.jdbc.url=jdbc:mysql://xxx.xxxx.xxxx.xxx:3306/xxx?user=xxx&password=xxx",
            "--templateProperty", "jdbctogcs.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver",
            "--templateProperty", "jdbctogcs.output.location=gs://bbt-test-bucket/",
            "--templateProperty", "jdbctogcs.write.mode=overwrite",
            "--templateProperty", "jdbctogcs.output.format=json",
            "--templateProperty", "jdbctogcs.jdbc.fetchsize=10",
            "--templateProperty", "jdbctogcs.sql=xxx"
        ],
        "jarFileUris": [
            "gs://dataproc-templates-binaries/latest/java/dataproc-templates.jar",
            "gs://shareable_files/drivers/mysql/mysql-connector-java-5.1.30-bin.jar"
        ]
    }
}

res = rq.post(url=url, json=body, headers=headers)
res.json()
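One sanity check worth adding to a harness like this (a sketch, not part of the original snippet): the `com.mysql.cj.jdbc.Driver` class only ships with MySQL Connector/J 6.0 and later, while the 5.1.x jars register `com.mysql.jdbc.Driver` instead. A hypothetical pre-flight helper can flag a mismatch between the driver class property and the connector jar named in `jarFileUris` before the batch is submitted:

```python
import re

def driver_class_matches_jar(jar_uri: str, driver_class: str) -> bool:
    """Return True if the driver class name is plausible for the jar version.

    Connector/J 5.1.x contains com.mysql.jdbc.Driver; the cj-package
    driver class only exists from Connector/J 6.0 onwards.
    """
    m = re.search(r"mysql-connector-(?:java|j)-(\d+)\.", jar_uri)
    if not m:
        return True  # version not recognisable from the filename; can't judge
    major = int(m.group(1))
    if driver_class == "com.mysql.cj.jdbc.Driver":
        return major >= 6
    if driver_class == "com.mysql.jdbc.Driver":
        return major < 6
    return True

# The jar in the post is 5.1.30, so the cj class cannot be found in it:
print(driver_class_matches_jar(
    "gs://shareable_files/drivers/mysql/mysql-connector-java-5.1.30-bin.jar",
    "com.mysql.cj.jdbc.Driver",
))  # prints False
```

Under this check, the batch above would fail the pre-flight: either the jar needs to be a Connector/J 8.x build, or the driver class property needs to be `com.mysql.jdbc.Driver` for the 5.1.x jar.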


Hi @Tomalbon,

You need to make sure that the JDBC jar file is downloaded and hosted inside the GCS bucket. You may want to check this link for reference.

Here is a helpful article that walks you through the prerequisites for importing data from databases into GCS (via JDBC) using Dataproc Serverless.

Hope this helps.
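The download-and-host step described above can be sketched with the `google-cloud-storage` Python client; the bucket and paths here are placeholders, not details from the original post:

```python
def gcs_jar_uri(bucket_name: str, dest_path: str) -> str:
    """Build the gs:// URI to list under jarFileUris in the batch request."""
    return f"gs://{bucket_name}/{dest_path}"

def upload_driver_jar(bucket_name: str, local_jar: str, dest_path: str) -> str:
    """Upload a locally downloaded JDBC driver jar to GCS and return its URI."""
    # Imported here so the pure URI helper above works without the library.
    from google.cloud import storage  # pip install google-cloud-storage
    client = storage.Client()
    client.bucket(bucket_name).blob(dest_path).upload_from_filename(local_jar)
    return gcs_jar_uri(bucket_name, dest_path)

# Example with placeholder names; the returned URI goes into jarFileUris:
# upload_driver_jar("shareable_files", "mysql-connector-j-8.0.33.jar",
#                   "drivers/mysql/mysql-connector-j-8.0.33.jar")
```

The service account running the batch also needs read access to that bucket, otherwise the jar is unreachable even when the URI is correct.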

Thanks, @anjelisa 

Sorry for the long delay. I had to put this project down for a while.

Unfortunately, I'm still having no luck. I have hosted the .jar in a GCS bucket in the same region as the job and used the quick-start guide to submit a job over the API, and I still get the error:

ClassNotFoundException: com.mysql.cj.jdbc.Driver

The job is created and submitted to Dataproc Batches and attempts to run, but the class error fails the job each time.
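For debugging failures like this over the same REST API, the batch resource can be polled to see its state before digging into the driver log; a minimal sketch (project, region, batch ID, and token are placeholders) might look like:

```python
import requests as rq

def batch_url(project: str, region: str, batch_id: str) -> str:
    """REST resource URL of a Dataproc Serverless batch."""
    return (f"https://dataproc.googleapis.com/v1/projects/{project}"
            f"/locations/{region}/batches/{batch_id}")

def get_batch_state(project: str, region: str, batch_id: str, token: str) -> str:
    # Returns the batch's current state, e.g. PENDING, RUNNING, FAILED.
    res = rq.get(batch_url(project, region, batch_id),
                 headers={"Authorization": f"Bearer {token}"})
    res.raise_for_status()
    return res.json().get("state")
```

A FAILED batch's driver output (linked from the batch details in the console) should show the full `ClassNotFoundException` stack trace, which confirms whether the connector jar was ever on the classpath.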