
Elevated Number of ListWorkflowInvocation API Calls

JVL

Our cloud environment uses Dataform for ETL/ELT. I have a Cloud Function that accepts some arguments, starts an invocation of a Dataform workflow, polls the invocation until it completes, and returns a success/failure message to the caller. Part of the orchestration is to avoid concurrent runs of the same workflow:

 

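import logging

from google.cloud import dataform_v1beta1

dataform_client = dataform_v1beta1.DataformClient()


def is_run_in_progress(parent_name):
    """Return True if any workflow invocation under parent_name is RUNNING."""
    list_invocation_request = dataform_v1beta1.ListWorkflowInvocationsRequest(parent=parent_name)
    # The pager issues one ListWorkflowInvocations request per page as the loop advances.
    result_pager = dataform_client.list_workflow_invocations(request=list_invocation_request)

    for invocation in result_pager:
        if invocation.state == dataform_v1beta1.WorkflowInvocation.State.RUNNING:
            logging.warning(f"Instance of {parent_name} already running, waiting to avoid concurrent run...")
            return True
    logging.info(f"No instances of {parent_name} running, proceeding.")
    return False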

 

That is the only piece of code that calls dataform_client.list_workflow_invocations(). According to the Cloud Function's logs, is_run_in_progress executed about 18 times over the last hour. The Dataform API/Service details screen shows 1,541 ListWorkflowInvocations requests over the same period (!!), roughly 85 requests per check. We noticed this because we began to see failures from hitting the quota of 6,000 total API calls per project per region.

Are there any known issues regarding this? 

Edit: I've verified that all of the traffic to this API endpoint comes from the service account that runs this Cloud Function, and that service account runs nothing else.

Solved

3 REPLIES

Hi @JVL,

Welcome to Google Cloud Community!

The issue might lie within the is_run_in_progress function, specifically in the loop that iterates over result_pager. The object returned by dataform_client.list_workflow_invocations() is not a single page of results but a pager that fetches further pages on demand.

This means that whenever the for loop exhausts the current page, the pager makes another API call to fetch the next page of workflow invocations. If no invocation is running, the loop walks every page of the repository's history, so a single check can issue many ListWorkflowInvocations requests, especially when there are many workflow invocations or the page size is small.
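A self-contained simulation makes the request count concrete. FakeInvocationPager is a hypothetical stand-in for the real pager, not a Dataform type: it assumes exactly one ListWorkflowInvocations request per page of 500 invocations and simply counts them, and the 42,500-invocation history is an illustrative number. It shows that the early return already caps the cost at one request when a running invocation is on page 1; the expensive case is when nothing is running.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeInvocationPager:
    """Hypothetical stand-in for a list pager that fetches pages lazily."""
    states: List[str]                 # state of every invocation in the repository
    page_size: int = 500
    api_calls: int = field(default=0, init=False)

    def __iter__(self) -> Iterator[str]:
        for start in range(0, len(self.states), self.page_size):
            self.api_calls += 1       # one simulated ListWorkflowInvocations request per page
            yield from self.states[start:start + self.page_size]


def is_run_in_progress(pager: FakeInvocationPager) -> bool:
    # Same early-return loop as the original function.
    for state in pager:
        if state == "RUNNING":
            return True
    return False


# Nothing running: every page of the history is fetched.
idle = FakeInvocationPager(["SUCCEEDED"] * 42_500)
print(is_run_in_progress(idle), idle.api_calls)  # False 85

# A RUNNING invocation on the first page: the loop stops after one request.
busy = FakeInvocationPager(["RUNNING"] + ["SUCCEEDED"] * 42_499)
print(is_run_in_progress(busy), busy.api_calls)  # True 1
```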

Here's a workaround you can try:

Check only the first page: you only need to know whether any invocation is running, not to list all of them. Make the loop stop as soon as it finds a running invocation, or check only the first page of results (this is only reliable if running invocations are guaranteed to appear on that page). Either way, you'll drastically reduce the number of ListWorkflowInvocations requests and stay within your quota limits.
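A sketch of the "first page only" option follows. is_running_on_first_page is a hypothetical helper; it assumes the pager exposes a pages iterator of response objects with a workflow_invocations field, which is how GAPIC-generated Python pagers are usually structured (confirm against the google-cloud-dataform reference). The offline demo uses SimpleNamespace fakes and shows the catch: a running invocation listed on a later page is missed, so this approach depends on the server's result ordering.

```python
from types import SimpleNamespace


def is_running_on_first_page(pager, running_state) -> bool:
    """Hypothetical helper: inspect only the first page of a list pager.

    Assumes `pager.pages` yields response objects with a `workflow_invocations`
    list, and that each invocation has a `state` attribute.
    """
    first_page = next(iter(pager.pages), None)  # only the first page is read
    if first_page is None:
        return False
    return any(inv.state == running_state for inv in first_page.workflow_invocations)


# With the real client this might look like (assumption, not verified here):
#   request = dataform_v1beta1.ListWorkflowInvocationsRequest(parent=parent_name, page_size=100)
#   pager = dataform_client.list_workflow_invocations(request=request)
#   is_running_on_first_page(pager, dataform_v1beta1.WorkflowInvocation.State.RUNNING)

# Offline demo with stand-in objects: the RUNNING invocation sits on page 2.
fake_pager = SimpleNamespace(pages=iter([
    SimpleNamespace(workflow_invocations=[SimpleNamespace(state="SUCCEEDED")]),
    SimpleNamespace(workflow_invocations=[SimpleNamespace(state="RUNNING")]),
]))
print(is_running_on_first_page(fake_pager, "RUNNING"))  # False: page 2 is never read
```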

I hope the above information is helpful.

Thanks for the response!

The code above already returns True when it finds a single running instance, but I think your answer has led me to the heart of the issue: this repo has had tens of thousands of invocations, so the result set from the ListWorkflowInvocations request is massive. The default page size seems to be 500. Even raising it to 1,000, which seems to be the maximum page size, still results in far too many pages.
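A back-of-the-envelope check using only the figures from this thread shows why raising the page size alone does not fix it. It assumes that a check which finds nothing running walks the whole history at the default page size of 500; checks that returned early pull the average down, so the history size below is a lower bound.

```python
import math

checks = 18        # is_run_in_progress executions in one hour (Cloud Function logs)
requests = 1_541   # ListWorkflowInvocations requests in the same hour (API dashboard)

# Average requests, i.e. pages fetched, per check.
pages_per_check = requests / checks
print(round(pages_per_check, 1))  # 85.6

# Assume a full walk at page_size=500; early returns lower the average,
# so this estimate of the history size is a lower bound.
history_size = round(pages_per_check * 500)
print(history_size)  # 42806

# Pages per full check after raising page_size to the 1,000 maximum.
pages_at_1000 = math.ceil(history_size / 1_000)
print(pages_at_1000)  # 43
```

At 43 or more requests per check, 18 checks an hour is still at least 774 requests, which is why shrinking the history helps more than tuning page_size.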

I'm in the process of deleting old invocations. I believe that if I delete any invocations older than, say, 5 days, I should be able to keep this pattern working as intended.
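The 5-day retention rule can be written as a small pure function. invocations_to_delete is a hypothetical helper; it assumes you have already extracted each invocation's resource name and start time (where that timestamp lives on WorkflowInvocation should be checked in the API reference). The deletion call itself is only mentioned in a comment so the sketch runs offline, and the resource names in the demo are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def invocations_to_delete(
    invocations: Iterable[Tuple[str, datetime]],
    now: datetime,
    max_age: timedelta = timedelta(days=5),
) -> List[str]:
    """Hypothetical helper: names of invocations that started before now - max_age."""
    cutoff = now - max_age
    return [name for name, started_at in invocations if started_at < cutoff]


# Illustrative data; the resource names are made up.
now = datetime(2024, 5, 10, tzinfo=timezone.utc)
sample = [
    ("projects/p/locations/l/repositories/r/workflowInvocations/old", now - timedelta(days=30)),
    ("projects/p/locations/l/repositories/r/workflowInvocations/recent", now - timedelta(days=1)),
]
for name in invocations_to_delete(sample, now):
    # A real cleanup might call dataform_client.delete_workflow_invocation(name=name) here.
    print("would delete", name)
```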

The deletion of old invocations seems to have done the trick!