Hi,
I am currently converting some Alteryx workflows into PySpark and want to test the code for debugging and results validation.
For testing purposes, I've created a Jupyter notebook instance in Workbench, but I am not able to install PySpark in the Vertex AI Jupyter notebook.
I have also searched a lot but haven't found a good resource yet.
It would be really helpful if you could guide me.
Thanks
This looks like a walkthrough of how to run a PySpark job against Dataproc from a Jupyter notebook. If this doesn't match what you are looking for, can you post the exact steps/recipes you have followed so far and where you are blocked?
You can also use the Jupyter notebooks under Vertex AI Workbench Instances and enable the Dataproc plug-in (it's enabled by default). They are a bit more feature-rich than the notebooks you can find under Dataproc. Instructions here
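For the installation itself, PySpark can usually be installed straight into the notebook kernel's environment with pip. A minimal sketch, assuming the default Python kernel on the Workbench instance and outbound network access (in a notebook cell you can use the `%pip` magic instead of plain `pip`):

```shell
# Install PySpark into the active kernel's environment.
# From a notebook cell, run:  %pip install pyspark
pip install pyspark

# Sanity check: confirm the package imports and print its version.
python -c "import pyspark; print(pyspark.__version__)"
```

Note that running Spark locally like this also requires a Java runtime on the instance (most Workbench images ship one; otherwise install OpenJDK). Local mode is fine for small validation runs of your converted Alteryx logic; for real workloads, submitting the job to a Dataproc cluster is the better path.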