
Files are not available in classpath for Dataproc Serverless Spark job submission

When submitted using the REST API, the JARs seem to be available on the classpath and the main class is called, but it fails to find the config file that is passed via fileUris as suggested in the Google docs.

I tried passing these files via archiveUris and also via jarFileUris, but neither worked.

{
    "name": "projects/****/locations/us-central1/batches/batch-4",
    "sparkBatch": {
        "fileUris": [
            "gs://****/conf/application.conf",
            "gs://****conf/log4j2.properties"
        ],
        "jarFileUris": [
            "gs://****/lib/*****ons-1.23.jar",
            "gs://***/lib/***tions-4.1.1.4.jar"
        ]
    }
}

 


The problem appears to be related to the use of the fileUris field for passing configuration files: files distributed this way end up in the job's working directory rather than on the classpath, so the configuration files are not accessible to your application the way the JARs are.

Here's a potential approach to address the issue:

  1. Package Configuration Files with JAR: You can package the configuration files (application.conf, log4j2.properties) within one of the JAR files that contain your main class. This ensures that the configurations are always available wherever the JAR file is executed. You can then access them using the class loader within your application.

  2. Use Custom Container Image: Since Dataproc Serverless for Spark allows the use of custom container images, you can create a custom image that includes both the JAR files and the configuration files. You can then specify the paths to these files within your application.

  3. Explicitly Load Configuration Files: If packaging the configuration files within the JAR or custom container image is not suitable, you may want to modify your application to explicitly load the configuration files from the specified URIs. You can pass the URIs of the configuration files as arguments to your main class and then use appropriate methods to read them (see the sketch after this list).
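
For options 1 and 3, here is a minimal Scala sketch, assuming the Typesafe Config library is used for application.conf (the object name and the default local path are illustrative). It parses a local copy if one is present in the working directory (for example, one staged via fileUris) and otherwise falls back to whatever ConfigFactory.load() finds on the classpath when the file is packaged inside the JAR:

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object AppConfig {
  // Option 1: application.conf packaged inside the application JAR is found
  // on the classpath by ConfigFactory.load().
  // Option 3: a copy distributed via fileUris should land in the Spark
  // working directory and can be parsed explicitly as a plain file.
  def load(localPath: String = "application.conf"): Config = {
    val classpathConfig = ConfigFactory.load()
    val localFile = new File(localPath)
    if (localFile.exists())
      ConfigFactory.parseFile(localFile).withFallback(classpathConfig).resolve()
    else
      classpathConfig
  }
}

The local path could also be passed in as a program argument if the file is staged under a different name or location.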

Here's an example of how you might modify the sparkBatch configuration to include the configuration files as JAR files:

{
    "name": "projects/****/locations/us-central1/batches/batch-4",
    "sparkBatch": {
        "jarFileUris": [
            "gs://****/conf/application.jar",
            "gs://****/conf/log4j2.jar",
            "gs://****/lib/*****ons-1.23.jar",
            "gs://***/lib/***tions-4.1.1.4.jar"
        ]
    }
}
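
Note that this assumes each .conf file is first wrapped in its own small JAR archive (for example with the jar tool), so that once the archive is listed in jarFileUris the file sits at the root of the classpath and can be read as a classpath resource.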

Thanks for providing these alternatives.

I have tried adding those config files to all the URIs, including jarFileUris (with the original .conf extension though), and it just does not make them available on the classpath.

Adding the config files to the code image reduces flexibility. By the way, fileUris and jarFileUris work fine with Dataproc cluster job submission, so I am hoping this is intended functionality that is just not working as documented for serverless job submission?

Hi @JSeeker, I am having the same issue now and I just wonder if you ever found a solution to it.