I have a dataproc cluster with livy installed and it is working.
what is the command to run a 'hello world' script via livy?
Here's how to run a 'hello world' script via Livy on a Google Cloud Dataproc cluster, with step-by-step instructions and examples for both Python and Scala.
Prerequisites
A Dataproc cluster with Livy installed and running (which you already have), a Cloud Storage bucket to hold your script, and network access to the cluster's master node on Livy's default port, 8998.
Step 1: Create Your 'Hello World' Script
For Python (hello_world.py):
print("Hello, world!")
For Scala (HelloWorld.scala):
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
Important (Scala): You'll need to compile your Scala script into a JAR file using a build tool like SBT or Maven. For example, the sbt assembly command can be used to create the JAR.
Step 2: Submit Your Script to Livy
For Python scripts, submit directly as a batch job:
curl -X POST -d '{ "file": "gs://your-bucket/hello_world.py" }' -H "Content-Type: application/json" http://<cluster-name-m>:8998/batches
For Scala (or Java) JARs, include the className:
curl -X POST -d '{ "file": "gs://your-bucket/hello_world.jar", "className": "HelloWorld" }' -H "Content-Type: application/json" http://<cluster-name-m>:8998/batches
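The same submissions can be scripted instead of typed into curl. A minimal Python sketch using only the standard library (the bucket paths, class name, and host are placeholders, exactly as in the curl commands above):

```python
import json
import urllib.request

def make_batch_payload(file, class_name=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    payload = {"file": file}
    if class_name is not None:
        payload["className"] = class_name
    return payload

def submit_batch(livy_url, payload):
    """POST the payload to Livy; the parsed response includes the batch id."""
    req = urllib.request.Request(
        livy_url + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a reachable Livy endpoint):
# submit_batch("http://<cluster-name-m>:8998",
#              make_batch_payload("gs://your-bucket/hello_world.py"))
```

For a JAR, pass the class name as well: make_batch_payload("gs://your-bucket/hello_world.jar", "HelloWorld").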
Replace gs://your-bucket/hello_world.py or gs://your-bucket/hello_world.jar with the actual Google Cloud Storage path to your Python script or Scala JAR, and replace <cluster-name-m> with the hostname of your Dataproc cluster's master node.
Step 3: Monitor the Job Status
The submission response includes a batch id; use it to check the job's state:
curl http://<cluster-name-m>:8998/batches/<batch-id>
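Rather than re-running that curl command by hand, you can poll until the batch finishes. A sketch, assuming the standard Livy batch states, where "success", "dead", and "killed" are terminal:

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"success", "dead", "killed"}

def is_terminal(state):
    """True once a Livy batch has finished, successfully or not."""
    return state in TERMINAL_STATES

def wait_for_batch(livy_url, batch_id, interval=5):
    """Poll GET /batches/<id> until the job reaches a terminal state."""
    while True:
        with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}") as resp:
            state = json.load(resp)["state"]
        if is_terminal(state):
            return state
        time.sleep(interval)
```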
Step 4: Retrieve the Output
Once the job completes, the output can typically be found in the Spark job logs. For batch jobs submitted through Livy, check the Dataproc cluster's YARN or Spark UI for the detailed logs and output.
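Livy also exposes the captured driver log over REST (GET /batches/<id>/log, paged with "from" and "size" parameters), which can be quicker than digging through the YARN UI. A small sketch, with the host a placeholder as above:

```python
import json
import urllib.request

def batch_log_url(livy_url, batch_id, start=0, size=100):
    """URL for Livy's batch log endpoint (paged via 'from' and 'size')."""
    return f"{livy_url}/batches/{batch_id}/log?from={start}&size={size}"

def fetch_batch_log(livy_url, batch_id):
    """Return the driver log lines recorded for a batch."""
    with urllib.request.urlopen(batch_log_url(livy_url, batch_id)) as resp:
        return json.load(resp)["log"]
```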
Alternative Method: Using gcloud
As an alternative to Livy, you can submit Spark jobs directly to Dataproc using the gcloud command-line tool. For more information, consult the Google Cloud documentation: https://cloud.google.com/sdk/gcloud/reference/dataproc/jobs/submit
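For reference, the direct route bypasses Livy entirely with gcloud dataproc jobs submit pyspark. A sketch of driving it from Python (the cluster and region names are placeholders):

```python
import subprocess

def gcloud_submit_cmd(script, cluster, region):
    """argv for submitting a PySpark job directly to Dataproc (no Livy)."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark",
        script, f"--cluster={cluster}", f"--region={region}",
    ]

# Example (requires the gcloud CLI and an existing cluster):
# subprocess.run(gcloud_submit_cmd("gs://your-bucket/hello_world.py",
#                                  "my-cluster", "us-central1"), check=True)
```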
Important Considerations
If your script depends on additional files or libraries, include them when submitting the job: use --py-files for Python or --jars for Scala/Java (in the Livy batch API these correspond to the "pyFiles" and "jars" JSON fields).
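For a job with dependencies, the extra files go into the batch request body. An example payload sketch, assuming the Livy batch API's "pyFiles" array; the dependency path is hypothetical:

```python
import json

# Batch payload for a Python job with one extra dependency module.
payload = {
    "file": "gs://your-bucket/hello_world.py",
    "pyFiles": ["gs://your-bucket/helpers.py"],  # hypothetical dependency
}
print(json.dumps(payload, indent=2))
```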