I'm currently using a GCP Workflows workflow, scheduled to run every 30 minutes, to create and run Batch jobs. The problem is that a new job is sometimes created while the previous one is still executing. How can I disable this concurrency so that only one job runs at a time, with no new job being created while another is still running? Any guidance on how to achieve this within the GCP Workflows setup would be greatly appreciated.
```yaml
- createAndRunBatchJob:
    call: http.post
    args:
      url: ##{batchApiUrl}
      query:
        job_id: ##{jobId}
      headers:
        Content-Type: application/json
      auth:
        type: OAuth2
      body:
        taskGroups:
          - taskSpec:
              runnables:
                - container:
                    imageUri: ##{imageUri}
                    commands:
                      - "--script-location"
                      - "/mnt/disks/batch-scripts-${project_sha}/google_tv_aep/batch/code"
                  environment:
                    variables:
                      job_id: ##{jobId}
                      secret: projects/${project}/secrets/${secret_nm}/versions/latest
                      local_path_1: /mnt/disks/${local_path_1}
                      local_path_2: /mnt/disks/${local_path_2}
                      query_file: "tv_mkt_campaign_events_load_query.sql"
                      output_dataset_name: "tv_mkt_campaign_events_incremental_data.json"
                      project: "${project}"
                      aep_sandbox: "${aep_sandbox}"
                      aep_dataset_name: "TV mkt campaign Events"
                      load_mode: "incremental"
                      aep_connection_id: "${aep_connection_id}"
                      checkpoint_file: "mkt_campaign_checkpoint.param"
                      aep_flow_id: "${aep_flow_id}"
                      checkpoint_field: timestamp
              volumes:
                - mountPath: /mnt/disks/batch-scripts-${project_sha}
                  gcs:
                    remotePath: batch-scripts-${project_sha}
                - mountPath: /mnt/disks/${bucket_nm}
                  gcs:
                    remotePath: ${bucket_nm}
              computeResource:
                cpuMilli: 2000
                memoryMib: 16384
            taskCount: 1
            parallelism: 2
        allocationPolicy:
          network:
            networkInterfaces:
              - network: projects/${project}/global/networks/spark-network
                subnetwork: projects/${project}/regions/${location}/subnetworks/spark-subnet-pr
                noExternalIpAddress: true
          serviceAccount:
            email: ${builder}
        logsPolicy:
          destination: CLOUD_LOGGING
    result: createAndRunBatchJobResponse
```
Hi @sugesh,
If you want the tasks within a job to be executed in sequential order, you can set the task group's `schedulingPolicy` field to `IN_ORDER` in the Batch API:
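As a rough sketch of where that field sits in the job payload (the image name and task counts below are placeholders, not values from this thread) — note this orders tasks *within* one job, not separate jobs:

```yaml
# Hypothetical Batch job body fragment: run this group's tasks one at a time,
# in order, by setting the TaskGroup schedulingPolicy to IN_ORDER.
taskGroups:
  - taskSpec:
      runnables:
        - container:
            imageUri: gcr.io/my-project/my-image   # placeholder image
    taskCount: 3
    parallelism: 1
    schedulingPolicy: IN_ORDER
```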
@wenyhu Thanks for the reply. While I wait for feature support in the Batch API, I will take your suggestion of adding a conditional check in the workflow; would you have any sample that I can refer to?
Hi @sugesh,
Below is one example of submitting a new Job when all listed existing Jobs are completed.
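A minimal sketch of that guard, assuming `batchApiUrl` is the Batch v1 jobs collection URL (e.g. `https://batch.googleapis.com/v1/projects/PROJECT/locations/LOCATION/jobs`) and that the `taskGroups`/`allocationPolicy` payload from the question is passed in as `args.jobBody` — the step and parameter names here are illustrative, not from the original reply:

```yaml
# Guard around job submission: list the existing Batch jobs and only create
# a new one when none of them are still queued, scheduled, or running.
main:
  params: [args]
  steps:
    - listJobs:
        call: http.get
        args:
          url: ${args.batchApiUrl}
          auth:
            type: OAuth2
        result: listJobsResponse
    - initCounter:
        assign:
          - activeJobs: 0
    - checkJobStates:
        for:
          value: job
          in: ${default(map.get(listJobsResponse.body, "jobs"), [])}
          steps:
            - tallyActive:
                switch:
                  - condition: ${job.status.state == "QUEUED" or job.status.state == "SCHEDULED" or job.status.state == "RUNNING"}
                    steps:
                      - increment:
                          assign:
                            - activeJobs: ${activeJobs + 1}
    - skipIfBusy:
        switch:
          - condition: ${activeJobs > 0}
            return: "An earlier job is still active; skipping this run."
    - createAndRunBatchJob:
        call: http.post
        args:
          url: ${args.batchApiUrl}
          query:
            job_id: ${args.jobId}
          headers:
            Content-Type: application/json
          auth:
            type: OAuth2
          body: ${args.jobBody}   # the taskGroups/allocationPolicy payload from the question
        result: createAndRunBatchJobResponse
    - done:
        return: ${createAndRunBatchJobResponse.body}
```

Since the workflow simply returns without submitting when another job is active, the next scheduled 30-minute run will pick the work up once the earlier job reaches a terminal state (`SUCCEEDED` or `FAILED`).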
Ref:
- https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#State
- https://cloud.google.com/workflows/docs/reference/syntax/conditions#yaml.
Hope this helps,
Wenyan
@wenyhu Thanks much!