
Updating a Google Dataflow job from the Google Cloud console not working

Hi all,
I am trying to update a running Dataflow pipeline job. The doc below describes updating a job with Java, Python, Go, or the gcloud CLI.

https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#Launching

The Google Cloud console also has the option below for updating a job, but when I set it to true and submit the job, I get an error saying that a job with the same name already exists:

Update Running Job
true
Set this to update currently running streaming job. Updates to batch jobs are not supported.

Is updating a running job not supported through the Google Cloud console options? Please clarify.


Updating batch pipelines is not supported; only streaming pipelines can be updated. Could you clarify whether you are using a batch or a streaming pipeline?
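If you are not sure which kind of job is running, a minimal sketch like the one below can check it from the CLI. It assumes you already know the job ID and region, and that `gcloud dataflow jobs describe` reports the job type as `JOB_TYPE_STREAMING` or `JOB_TYPE_BATCH` in its `type` field; the function names are hypothetical helpers, not part of gcloud.

```shell
# Sketch: check whether an existing Dataflow job is a streaming job,
# since only streaming jobs can be updated in place.
# Assumption: the job's `type` field is JOB_TYPE_STREAMING or JOB_TYPE_BATCH.

# Prints the job type reported by Dataflow for the given job ID and region.
dataflow_job_type() {
  local job_id="$1" region="$2"
  gcloud dataflow jobs describe "$job_id" \
    --region="$region" \
    --format='value(type)'
}

# Succeeds (exit status 0) only when the job is a streaming job.
is_streaming_job() {
  [ "$(dataflow_job_type "$1" "$2")" = "JOB_TYPE_STREAMING" ]
}
```

Usage, with placeholder values: `if is_streaming_job "$JOB_ID" "$REGION"; then echo "job can be updated"; fi`.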

I am using a streaming job.
When creating a new job with the update option set, the console offers to show the equivalent gcloud CLI command. I tried that command in the gcloud CLI as well, and it does not work either.

Please see the examples below.

This is the gcloud CLI command extracted from the console for creating the job. It does not work and fails with the error below:

gcloud dataflow flex-template run test-pipeline \
--template-file-gcs-location gs://[BUCKET]/templates/gcloud-to-gcloud.json \
--region [REGION] \
--num-workers 2 \
--temp-location gs://[BUCKET]/tmp \
--subnetwork [SUBNET HTTP URL] \
--parameters inputTopic=[INPUT TOPIC],\
windowDuration=5m,\
outputDirectory=gs://[BUCKET],\
outputFilenamePrefix=test,\
numShards=1,\
outputShardTemplate=W-P-SS-of-NN,\
stagingLocation=gs://[BUCKET]/staging,\
autoscalingAlgorithm=NONE,\
serviceAccount=[SERVICE ACCOUNT],\
update=true
ERROR: (gcloud.dataflow.flex-template.run) ALREADY_EXISTS: (xxxx): There is already an active job named test-pipeline. If you want to  submit a second job, try again by setting a different name.

I looked into the gcloud SDK documentation, tried the following, and it worked:

gcloud dataflow flex-template run test-pipeline \
--template-file-gcs-location gs://[BUCKET]/templates/gcloud-to-gcloud.json \
--region [REGION] \
--num-workers 2 \
--temp-location gs://[BUCKET]/tmp \
--subnetwork [SUBNET HTTP URL] \
--parameters inputTopic=[INPUT TOPIC],\
windowDuration=5m,\
outputDirectory=gs://[BUCKET],\
outputFilenamePrefix=test,\
numShards=1,\
outputShardTemplate=W-P-SS-of-NN,\
stagingLocation=gs://[BUCKET]/staging,\
autoscalingAlgorithm=NONE,\
serviceAccount=[SERVICE ACCOUNT] \
--update \
--disable-public-ips \
--max-workers 1

It's great that you managed to get your job to update successfully using the gcloud SDK!

As per Google's documentation, only one active job with a given name can exist in a project at a given time, so launching a new job with the same name as a running job fails with the ALREADY_EXISTS error shown above.
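To see whether a name is already taken by an active job before launching, a minimal sketch can list active jobs filtered by name. It assumes the `--status=active` and `--filter` options of `gcloud dataflow jobs list` and that the job resource exposes `name` and `id` fields; the exact matching semantics of the `name=` filter are an assumption worth verifying against your own job names, and `active_job_id` is a hypothetical helper.

```shell
# Sketch: print the ID of the active Dataflow job with the given name in a
# region. Prints nothing if no active job uses that name.
active_job_id() {
  local job_name="$1" region="$2"
  gcloud dataflow jobs list \
    --region="$region" \
    --status=active \
    --filter="name=${job_name}" \
    --format='value(id)'
}
```

Usage, with placeholder values: `active_job_id test-pipeline "$REGION"`. An empty result means a new job with that name can be launched without `--update`.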

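It's important to note that --update is a gcloud command-line flag, not a pipeline parameter: it tells gcloud to update the existing job with the same name rather than create a new one. Passing update=true inside --parameters, as in your first command, does not have this effect.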
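The same flag-versus-parameter distinction shows up in the Dataflow REST API. Below is a minimal sketch of a `flexTemplates:launch` request body, assuming (from the API reference, not from this thread) that `update` is a boolean field of `launchParameter` next to `parameters`, and that `containerSpecGcsPath` points at the template file. `launch_body` and all argument values are hypothetical placeholders.

```shell
# Sketch: print a flexTemplates:launch request body.
# Assumption: `update` belongs to launchParameter itself, as a sibling of
# `parameters`, rather than being an entry inside `parameters`.
launch_body() {
  local job_name="$1" template_path="$2" input_topic="$3"
  cat <<EOF
{
  "launchParameter": {
    "jobName": "${job_name}",
    "containerSpecGcsPath": "${template_path}",
    "parameters": {
      "inputTopic": "${input_topic}"
    },
    "update": true
  }
}
EOF
}
```

One way to send it, with placeholder project and region values, is to pipe it into `curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d @- "https://dataflow.googleapis.com/v1b3/projects/[PROJECT]/locations/[REGION]/flexTemplates:launch"`.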

Yes, I was able to use the update option from the gcloud SDK.

However, I was not able to do the same from the Google Cloud console UI in a web browser. When I set the update option to true in the UI, I get the error that a job with that name already exists.

It would be really helpful if there were a way to update the pipeline from the UI.

Please note that, per Google's documentation, the job name must stay the same when updating a Dataflow pipeline. Please refer to the link below:

https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline

"Set the JOB_NAME to the same name as the job that you want to update."
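Because an update must reuse the running job's name, the CLI workaround can be wrapped so the same name is always used and `--update` is added only when an active job with that name exists. A minimal sketch, assuming gcloud is installed and authenticated; `run_or_update` is a hypothetical helper, and the `name=` filter is an assumption worth verifying against your job names.

```shell
# Sketch: launch a Flex Template under a fixed job name, adding the gcloud
# --update flag only when an active job with that name already exists.
# Arguments: JOB_NAME REGION TEMPLATE_GCS_PATH [extra gcloud flags...]
run_or_update() {
  local job_name="$1" region="$2" template="$3"
  shift 3
  # Look for an active job that already uses this name in the region.
  if [ -n "$(gcloud dataflow jobs list --region="$region" --status=active \
        --filter="name=${job_name}" --format='value(id)')" ]; then
    # --update is a gcloud flag, not an entry inside --parameters.
    set -- --update "$@"
  fi
  gcloud dataflow flex-template run "$job_name" \
    --template-file-gcs-location="$template" \
    --region="$region" \
    "$@"
}
```

Usage, with placeholder values: `run_or_update test-pipeline "$REGION" gs://[BUCKET]/templates/gcloud-to-gcloud.json --parameters=inputTopic=[INPUT TOPIC]`. Extra flags such as `--parameters` are passed through unchanged.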