hi all,
I am trying to update a running Dataflow pipeline job. As per the doc below, I see options to update it via Java/Python/Go/gcloud CLI:
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#Launching
However, the Google Cloud console also has the option below to update a job. When I set it to true and submit, I get the error that a job with the same name already exists:
Update Running Job
true
Set this to update currently running streaming job. Updates to batch jobs are not supported.
I would like to know whether updating a running job is supported through the Google Cloud console options. Please clarify.
Updating batch pipelines is not supported; streaming pipelines are. Please clarify whether you are using a batch or a streaming job.
I am using a streaming job.
While creating a new job with the update option, the console offers to extract the equivalent gcloud CLI command for that create operation. I tried that in the gcloud CLI as well, and it does not work either.
Please see the examples below.
This is the gcloud CLI command extracted from the console for creating the job; it fails with the error shown after it:
gcloud dataflow flex-template run test-pipeline \
--template-file-gcs-location gs://[BUCKET]/templates/gcloud-to-gcloud.json \
--region [REGION] \
--num-workers 2 \
--temp-location gs://[BUCKET]/tmp \
--subnetwork [SUBNET HTTP URL] \
--parameters inputTopic=[INPUT TOPIC],\
windowDuration=5m,\
outputDirectory=gs://[BUCKET],\
outputFilenamePrefix=test,\
numShards=1,\
outputShardTemplate=W-P-SS-of-NN,\
stagingLocation=gs://[BUCKET]/staging,\
autoscalingAlgorithm=NONE,\
serviceAccount=[SERVICE ACCOUNT],\
update=true
ERROR: (gcloud.dataflow.flex-template.run) ALREADY_EXISTS: (xxxx): There is already an active job named test-pipeline. If you want to submit a second job, try again by setting a different name.
I looked into the gcloud SDK documentation, tried the command below, and it worked:
gcloud dataflow flex-template run test-pipeline \
--template-file-gcs-location gs://[BUCKET]/templates/gcloud-to-gcloud.json \
--region [REGION] \
--num-workers 2 \
--temp-location gs://[BUCKET]/tmp \
--subnetwork [SUBNET HTTP URL] \
--parameters inputTopic=[INPUT TOPIC],\
windowDuration=5m,\
outputDirectory=gs://[BUCKET],\
outputFilenamePrefix=test,\
numShards=1,\
outputShardTemplate=W-P-SS-of-NN,\
stagingLocation=gs://[BUCKET]/staging,\
autoscalingAlgorithm=NONE,\
serviceAccount=[SERVICE ACCOUNT] \
--update \
--disable-public-ips \
--max-workers 1
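One detail worth noting in the working command: the last --parameters entry must not keep a trailing comma before the --update line. The shell removes a backslash-newline with no space in between, so ",\" followed directly by "--update" on the next line would be fused into the parameters value instead of reaching gcloud as a flag. A minimal sketch of that behavior (printf stands in for gcloud; the values are hypothetical):

```python
# Demonstrate that a backslash-newline continuation is removed with NO
# space inserted, so a trailing comma plus a flag on the next line fuse
# into a single token. printf stands in for gcloud here.
import subprocess

# The script below is literally: printf "%s\n" a=1,\<newline>--update
script = 'printf "%s\\n" a=1,\\\n--update'
result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
print(result.stdout)  # -> a=1,--update
```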
It's great that you managed to get your job to update successfully using the gcloud SDK!
As per Google's documentation, only one job with a given name can exist in a project at a time, and attempting to create a job with the same name as an already-existing job returns the existing job. It's important to note that the --update flag tells the command to update an existing job rather than create a new one.
Yes, I was able to use the update option from the gcloud SDK, but I was not able to do the same from the Google Cloud console UI in a web browser. When I set the update option to true in the UI, I get the error that the job name already exists.
It would be really helpful to know if there is any option to update the pipeline from the UI.
Please note that, as per Google's documentation, the job name must be the same when updating a Dataflow pipeline. Please refer to the link below:
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline
"Set the JOB_NAME to the same name as the job that you want to update."
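As a quick sanity check before re-submitting with --update, you can confirm the exact name of the currently running job first. A sketch, assuming the region placeholder is filled in and the job name matches the thread above:

```shell
# List active Dataflow jobs in the region and filter by name; the name
# shown here must match the one passed to flex-template run with --update.
gcloud dataflow jobs list \
  --region [REGION] \
  --status active \
  --filter "name=test-pipeline"
```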