
Execute a Python script from Cloud Storage in a Dataflow job

Hi everyone, I'm working on a POC that executes Python scripts stored in Cloud Storage. I first want to test in Cloud Shell and then create a Dataflow job for the same scripts.

I tried the commands below, but no luck. Can anyone help correct them?

--Cloud Shell

$ python -m gs://gluemigration/scripts/Pyspark.py --region us-central1 --runner DataflowRunner --project temp-1 --usePublicIps=false --no_use_public_ips

--Dataflow job

$ gcloud dataflow jobs run run_custom_scripts --gcs-location=gs://gluemigration/scripts/Pyspark.py --disable-public-ips --max-workers=1 --region=us-central1 --runner=DataflowRunner --project=temp-1

I know I must be missing something somewhere. Any help would be greatly appreciated.

Thank you.

1 REPLY

You can stream objects from Google Cloud Storage without saving the objects themselves to files.

This also means you can pipe the data to the Python interpreter and run the script:

 

gsutil cp gs://<BUCKET_NAME>/<OBJECT_NAME>.py - | python - <ARGS>
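
If you would rather stay in Python, roughly the same streaming idea can be sketched with the google-cloud-storage client library. This is only an illustrative sketch, assuming that library is installed and the caller can read the bucket; the bucket name, object path, and helper function are placeholders, not values from this thread.

from google.cloud import storage

def run_script_from_gcs(bucket_name, object_name):
    # Stream the script's text straight from Cloud Storage; nothing is written to disk.
    client = storage.Client()
    source = client.bucket(bucket_name).blob(object_name).download_as_text()
    # Execute the downloaded source in-process, as if it were run as a main module.
    exec(compile(source, object_name, "exec"), {"__name__": "__main__"})

run_script_from_gcs("my-bucket", "scripts/hello.py")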

 

However, for your second command, what error were you receiving when you ran it? It appears to be a valid gcloud command.

Can you elaborate on the types of scripts you are trying to execute? In your first command, you are also using flags that belong to gcloud dataflow jobs create.
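
For background, a Python Dataflow job is normally an Apache Beam pipeline: you run the script with the Python interpreter and pass the Dataflow options (runner, project, region, and so on) to the script itself rather than to gcloud. Below is a minimal sketch under that assumption; it needs the apache-beam[gcp] package, and the bucket, file, and step names are placeholders rather than anything from your post.

import sys

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def main():
    # Flags such as --runner, --project, --region, and --temp_location are
    # read from the command line and turned into pipeline options here.
    options = PipelineOptions(sys.argv[1:])
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
            | "SplitWords" >> beam.FlatMap(lambda line: line.split())
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "CountPerWord" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
            | "Write" >> beam.io.WriteToText("gs://my-bucket/counts")
        )

if __name__ == "__main__":
    main()

With a script structured like this, the launch command would look closer to the following, with the placeholders filled in for your project:

python pipeline.py --runner DataflowRunner --project <PROJECT_ID> --region us-central1 --temp_location gs://<BUCKET_NAME>/tmp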