Hi All,
I am trying to create a SparkSession in Vertex AI Workbench's JupyterLab, but I receive the error below. Locally, my JAVA_HOME and PATH environment variables are already set, and everything works when I run Jupyter on my own machine. I only get this error on Vertex AI Workbench's JupyterLab.
Code:
from pyspark.sql import SparkSession

# Build a SparkSession that loads the Spark-BigQuery connector jar from GCS
spark = SparkSession.builder \
    .appName('Jupyter BigQuery Storage') \
    .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar') \
    .getOrCreate()
Full Error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_3404/1949393828.py in <module>
      9 spark = SparkSession.builder \
     10     .appName('Jupyter BigQuery Storage')\
---> 11     .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar') \
     12     .getOrCreate()
     13 

/opt/conda/lib/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    226                     sparkConf.set(key, value)
    227                 # This SparkContext may be an existing one.
--> 228                 sc = SparkContext.getOrCreate(sparkConf)
    229                 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                 # by all sessions.

/opt/conda/lib/python3.7/site-packages/pyspark/context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

/opt/conda/lib/python3.7/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/opt/conda/lib/python3.7/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

/opt/conda/lib/python3.7/site-packages/pyspark/java_gateway.py in launch_gateway(conf, popen_kwargs)
    106 
    107         if not os.path.isfile(conn_info_file):
--> 108             raise RuntimeError("Java gateway process exited before sending its port number")
    109 
    110     with open(conn_info_file, "rb") as info:

RuntimeError: Java gateway process exited before sending its port number
Any advice or help would be appreciated, thank you!
You need Java installed on your machine, whether Mac, Linux, or Windows. If Java is not installed, if the JAVA_HOME environment variable does not point to your Java installation, or if PYSPARK_SUBMIT_ARGS is not set, you will get this exception.
Setting PYSPARK_SUBMIT_ARGS with a master usually resolves "Exception: Java gateway process exited before sending the driver its port number":
export PYSPARK_SUBMIT_ARGS="--master local[3] pyspark-shell"
Open ~/.bashrc with vi, add the line above, and reload the file with source ~/.bashrc.
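Note that on Vertex AI Workbench the Jupyter kernel may not pick up variables from ~/.bashrc, so you can also set them from the notebook itself, before the first getOrCreate() call. Here is a minimal sketch, assuming OpenJDK lives under /usr/lib/jvm (that path is an assumption; verify it on your instance, e.g. with readlink -f $(which java)):

import os

# Hypothetical JAVA_HOME path; adjust to your instance's actual Java installation.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-11-openjdk-amd64'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[3] pyspark-shell'

from pyspark.sql import SparkSession

# These environment variables must be set before the JVM gateway launches,
# i.e. before the first SparkSession/SparkContext is created in the kernel.
spark = SparkSession.builder \
    .appName('Jupyter BigQuery Storage') \
    .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar') \
    .getOrCreate()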
If the issue is still not resolved, check your Java installation and the JAVA_HOME environment variable.
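You can do that check directly from the notebook. This sketch uses only the standard library and assumes nothing beyond java being on PATH if it is installed:

import os
import shutil
import subprocess

# Show where (if anywhere) the java binary resolves, and what JAVA_HOME points to.
print('java on PATH:', shutil.which('java'))
print('JAVA_HOME =', os.environ.get('JAVA_HOME'))

# `java -version` writes to stderr; if this fails, the Spark Java gateway
# cannot launch a JVM either.
if shutil.which('java'):
    result = subprocess.run(['java', '-version'], capture_output=True, text=True)
    print(result.stderr.strip())
else:
    print('Java is not installed or not on PATH.')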
You can also refer to the Vertex AI troubleshooting documentation[1].
[1] https://cloud.google.com/vertex-ai/docs/general/troubleshooting