
Google pubsub push subscription ack

I have a Cloud Run endpoint that gets triggered by a push-type Pub/Sub subscription and receives the user-specified message from the publisher. I am trying to get the message acknowledged within the acknowledgement deadline, but the Cloud Run endpoint is getting triggered multiple times, and I am not sure whether my message is getting acknowledged or not. I want to know the process, in Python, to make sure the message gets acknowledged with a 200 OK response to Pub/Sub so that there will not be any retries after one successful attempt.

Solved
1 ACCEPTED SOLUTION

Here is how to acknowledge a message in a Cloud Run endpoint triggered by a push-type Pub/Sub subscription and prevent retries:

Acknowledgment in Push Subscriptions:

For push subscriptions, where Pub/Sub sends messages to your Cloud Run service endpoint, acknowledgment occurs implicitly through an HTTP response. There's no need for the .ack() or .nack() methods used in pull subscriptions.

Acknowledgement with 200 OK:

To acknowledge a message successfully and prevent retries, your Cloud Run service needs to respond with an HTTP 200 OK status code. This tells Pub/Sub that the message was processed successfully and shouldn't be redelivered.

Example Code:

The following Python code demonstrates how to handle message processing and acknowledgment in a Cloud Run service:

import os

from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['POST'])
def index():
    # Extract the Pub/Sub message from the push request body
    envelope = request.get_json(silent=True)
    if not envelope or 'message' not in envelope:
        # Malformed push request: 400 tells Pub/Sub the message is unusable
        return 'Bad Request: invalid Pub/Sub message format', 400

    message = envelope['message']

    try:
        # Process message
        # ...

        # Acknowledge message with 200 OK
        return '', 200
    except Exception:
        # Log exception
        # ...

        # Message not acknowledged, will be retried
        return '', 500

if __name__ == '__main__':
    # Cloud Run provides the listening port via the PORT environment variable
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

Additional Considerations:

  • Error Handling: Implement robust error handling and logging to capture any issues during message processing.
  • Idempotency: Ensure your message processing logic is idempotent, meaning it can be safely executed multiple times without unintended consequences. This is important as Pub/Sub might redeliver messages even after successful acknowledgment in certain scenarios.
  • Response Time: Be mindful of your service's response time. If it takes too long to respond, exceeding the acknowledgment deadline configured in the subscription, Pub/Sub might consider the message unreceived and send it again.
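The idempotency point above can be sketched as follows. This is only an illustration, not an official pattern: it dedupes on the Pub/Sub messageId using an in-memory set, which works only while a single instance stays alive; a production service would keep this state in a shared store such as Firestore or Memorystore.

```python
import threading

from flask import Flask, request

app = Flask(__name__)

# In-memory record of processed message IDs. Illustration only: Cloud Run
# instances are stateless and can scale out or restart, so a real service
# would use a shared store (Firestore, Memorystore, ...) instead.
processed_ids = set()
ids_lock = threading.Lock()

@app.route('/', methods=['POST'])
def index():
    envelope = request.get_json(silent=True)
    if not envelope or 'message' not in envelope:
        # Malformed push request: 400 so Pub/Sub does not treat it as transient
        return 'Bad Request: invalid Pub/Sub envelope', 400

    message_id = envelope['message'].get('messageId')
    with ids_lock:
        if message_id in processed_ids:
            # Redelivered duplicate: acknowledge again without reprocessing
            return '', 200
        processed_ids.add(message_id)

    # Process the message exactly once here
    # ...
    return '', 200
```

With this in place a redelivered message is acknowledged immediately instead of being processed a second time.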


11 REPLIES


I use this code but my messages are still not ACKed. 

I suspect that there's a time limit on how long the server's operation can run before the session is considered invalid. I updated my Pub/Sub ACK deadline and it is still NOT WORKING.

If your Google Cloud Pub/Sub messages aren't being acknowledged even though your Cloud Run endpoint returns a 200 OK status, there could be a few reasons:

  • Processing Delays: Your service might take too long to process messages, exceeding the Pub/Sub acknowledgment deadline. Ensure your service responds quickly, even if it just returns a 200 OK initially.
  • Concurrency Issues: If your Cloud Run service scales to multiple instances, the same message could be processed multiple times, causing acknowledgment problems. Try setting concurrency to 1 for testing.
  • Subscription Configuration: Check your Pub/Sub subscription settings, including retry policy and dead letter topic configuration. Also, ensure your Cloud Run service account has the necessary permissions to acknowledge messages.
  • External Dependencies: If your message processing relies on external services, make sure they aren't causing delays or failures that prevent timely acknowledgments.

Troubleshooting:

  • Implement immediate acknowledgment: Start by acknowledging messages immediately in your code and gradually add back your processing logic to identify the cause of the issue.
  • Use comprehensive logging: Detailed logs can help you track message processing and identify potential bottlenecks.
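Combining both troubleshooting suggestions, a minimal sketch of "acknowledge immediately, then process" hands the message to a background thread before returning 200. Caveat: on Cloud Run, CPU may be throttled after the response is sent unless "CPU always allocated" is enabled, so treat this as a diagnostic aid rather than a durable design (`results` below is a stand-in for real side effects).

```python
import threading

from flask import Flask, request

app = Flask(__name__)
results = []  # stand-in for real side effects, so the worker is observable

def process_message(message):
    # Long-running work would go here; it runs after the 200 OK is sent.
    results.append(message.get('messageId'))

@app.route('/', methods=['POST'])
def index():
    envelope = request.get_json(silent=True)
    if not envelope or 'message' not in envelope:
        return 'Bad Request', 400

    # Hand the message to a background thread and ack right away, well
    # inside the Pub/Sub acknowledgement deadline.
    threading.Thread(target=process_message, args=(envelope['message'],)).start()
    return '', 200
```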

Thank you for your quick response.

My situation:

I am using EVENTARC (if this changes anything). I updated my ACK deadline to 600s, and my processing takes about 1 min 30 s. I tested with an immediate 200 OK; it works. When trying it in the real environment (with my 1 min 30 s processing, or a time.sleep), the behavior I get is that messages are received almost every ten seconds until the first ACK returns. Then it stops, but all the other messages are still processed (they are duplicates). It behaves exactly like the default, BUT I configured the ACK deadline to be 600s; it looks like it does not even consider it.

I hope this gives you a better view of my problem. Can you please give me hints on it: what other parameters need to be changed?

Some people talk about the minimum retry timeout, but it does not seem right: https://stackoverflow.com/a/77332635/11406744

Also, Pub/Sub is said to be "at least once delivery"; should we consider that this happens often, or that IT JUST HAPPENS? I don't know if my problem is simply Pub/Sub doing its "at least once" on each message.

PS:

This is a problem found over and over on the web; PLEASE add thorough documentation on this behavior (examples and response cases in different scenarios, and how the different parameters come into play). It is very frustrating to look for answers on Stack Overflow and forums when the documentation could have explained it without doubt.

https://stackoverflow.com/questions/73235468/duplicate-events-in-eventarc-triggered-google-cloud-run...


https://stackoverflow.com/questions/73164123/pub-sub-re-sending-message-after-10-sec-even-setting-ac... 

https://stackoverflow.com/questions/72074775/why-does-gcp-pub-sub-publish-a-message-twice 

Hello, I have a similar problem.
I have a Google Cloud setup where a Cloud Function is triggered whenever a new object is created in a Cloud Storage bucket. The function does some processing, and when it detects a specific file (e.g., export_complete.txt), it publishes a message (with the bucket name and directory path) to a Pub/Sub topic.

This Pub/Sub topic then triggers a Cloud Run service, which downloads and zips all files in that directory. The Pub/Sub subscription is pull type.

I have also tried an alternative approach where:

  • An Eventarc trigger fires on Cloud Storage object creation.
  • A single Cloud Run service handles both the detection and zipping process.

The Problem

The Cloud Run service is receiving multiple POST requests for the same event, leading to duplicate processing. Some things I have tried to mitigate this:

  1. Increased Pub/Sub timeout – Didn't help with duplicate triggers.
  2. Threading in Flask (Cloud Run service) – To send ACKs immediately, but the process just stops randomly and does not process all the files (it works for 10–15 files, roughly 150 MB, but my actual workload will be about 7 GB).
  3. Optimized download & zip logic – So it zips faster; still no difference.
  4. Cloud Function triggering a Cloud Run Job instead of a Service – But I'm struggling to pass parameters like the bucket name and file path to the Cloud Run Job.

What I Need Help With

  • How do I ensure a single execution per event?
  • Is there a way to deduplicate events in Pub/Sub or Eventarc before triggering Cloud Run?
  • How can I correctly pass parameters from a Cloud Function to a Cloud Run Job?

Would really appreciate any insights or better practices to prevent multiple triggers and ensure efficient execution, or an alternative approach.

PS: The only solution that has worked so far (sort of) is using a mutex lock, so once a process acquires it, no other requests are entertained.
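For reference, the mutex approach described in the PS can be sketched like this (my own reconstruction, not the poster's code). Note it only guards within one container instance, so it should be paired with a low max-instances setting:

```python
import threading

from flask import Flask

app = Flask(__name__)

# Non-blocking lock: one zip job at a time per instance. This does not
# deduplicate across instances; cap max instances (and concurrency) for that.
busy = threading.Lock()

@app.route('/', methods=['POST'])
def index():
    if not busy.acquire(blocking=False):
        # A job is already running. Returning 2xx acks the duplicate so
        # Pub/Sub stops redelivering; return 429 instead to have it retried.
        return 'Already processing', 200
    try:
        # download-and-zip work goes here
        # ...
        return '', 200
    finally:
        busy.release()
```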

@ms4446 I have a similar scenario where Eventarc is triggering my Cloud Run service (push subscription). My code does some data-processing tasks, and it takes over 3 minutes for the process to complete before it sends a 2xx status code. So unless my logic has finished executing, I cannot return the status code; hence my Cloud Run service gets triggered multiple times. I would appreciate a solution for this.

Hi @ambeshsingh,

In your scenario where a Cloud Run service triggered by Eventarc takes more than 3 minutes to process a task, leading to multiple triggers due to the service not sending a 2xx status code in time, you can consider the following solutions:

  1. Increase Cloud Run Timeout: If your processing time is less than 60 minutes, the first step is to increase the timeout setting of your Cloud Run service. By default, Cloud Run has a 5-minute timeout, which can be extended up to 60 minutes. This might be sufficient for your needs.

  2. Asynchronous Processing with Pub/Sub:

    • Publish to Pub/Sub: Modify your Cloud Run service to quickly acknowledge the Eventarc trigger by returning a 2xx status code and then publish the task details to a Pub/Sub topic.
    • Separate Worker Service: Create another Cloud Run service or Cloud Function that subscribes to this Pub/Sub topic. This service will handle the actual data processing task.
    • Benefits: This approach decouples the receipt of the trigger from the processing task, allowing your initial service to respond quickly and avoid multiple triggers.
  3. Use Cloud Tasks for Task Queuing:

    • Enqueue Tasks: On receiving the Eventarc trigger, your Cloud Run service should quickly enqueue the processing task to Cloud Tasks and return a 2xx status code.
    • Process Tasks Separately: Have another Cloud Run service or Cloud Function process these tasks. This service can have a longer timeout to accommodate the processing time.
    • Advantages: This method also decouples the event handling from the processing and can handle tasks that take longer than the Cloud Run timeout limit.
  4. Implement State Management:

    • Track Processing State: Implement a system to track whether a task for a specific event has already been started or completed. This could involve using a database or a cache.
    • Check Before Processing: When your Cloud Run service is triggered, it should check if the task for the incoming event is already being processed or has been completed. If so, it can return a 2xx status code immediately to avoid duplicate processing.
  5. Optimize Processing Logic:

    • If possible, optimize your data processing logic to complete within the Cloud Run timeout limits. This might involve algorithmic improvements or more efficient use of resources.
  6. Cloud Run Jobs for Batch Processing:

    • If your workload is suitable for batch processing, consider using Cloud Run Jobs, which can handle tasks with a longer execution time (up to 24 hours).

Each of these solutions has its own trade-offs and complexities. The choice depends on your specific requirements, such as the nature of the data processing tasks, the acceptable response time, and the architectural changes you are willing to implement.

Hi @ms4446, thank you for your response. Appreciate it. Here are some queries based on your suggestions:
1) Increase Cloud Run Timeout: The issue is not Cloud Run timing out, but the Pub/Sub acknowledgment that Cloud Run has to send back. Since Pub/Sub will wait a maximum of 600 seconds to receive the ack, it will retrigger Cloud Run if it does not receive a 2xx status code within that span. If I modify my Cloud Run service to quickly acknowledge the Eventarc trigger by returning a 2xx status code and then publish the task details to a separate Pub/Sub topic, the issue remains the same, as the new Pub/Sub subscription will expect the ack from the Cloud Run service within 10 minutes. The 10-minute timeout for Pub/Sub acknowledgments on push subscriptions seems to be a big blocker for us, and we also cannot switch to a pull subscription, as that would require the Cloud Run service to be running continuously.


Given your constraints and the challenges with the 10-minute acknowledgment timeout for Pub/Sub in a push subscription model, here are some revised strategies:

  1. Immediate Acknowledgment with Asynchronous Processing:

    • Modify your Cloud Run service to immediately acknowledge the Eventarc trigger by returning a 2xx status code as soon as it receives the event.
    • Then, enqueue the task for asynchronous processing within the same service or dispatch it to another internal component or service. This could be done using in-memory queues, a database, or another internal mechanism that doesn't rely on Pub/Sub.
    • This approach ensures that Pub/Sub receives an acknowledgment within its timeout window, while the actual processing happens independently.
  2. State Management with Database or Cache:

    • Implement a state management system using a database or cache (like Cloud Firestore or Cloud Memorystore).
    • When your Cloud Run service receives an event, it should first record the event details and its processing state in the database/cache.
    • After recording the state, immediately respond with a 2xx status code to acknowledge the event to Pub/Sub.
    • Continue processing the event asynchronously. Once the processing is complete, update the state in the database/cache.
  3. Cloud Tasks for Deferred Processing:

    • Instead of using Pub/Sub for the second stage of processing, use Google Cloud Tasks.
    • Upon receiving the event, your Cloud Run service should quickly create a task in Cloud Tasks and return a 2xx status code to acknowledge the event to Pub/Sub.
    • Cloud Tasks can then trigger another Cloud Run service or Cloud Function to handle the actual processing. This service can have a longer timeout, and Cloud Tasks allows for flexible scheduling and retry policies.
  4. Optimize Processing Time:

    • While this might not always be feasible, look for ways to optimize the processing logic to complete within the Pub/Sub acknowledgment window.
    • Consider parallelizing the workload, optimizing algorithms, or preprocessing data to reduce processing time.
  5. Hybrid Approach with Event-Driven and Scheduled Jobs:

    • Use the initial Cloud Run service to handle quick tasks and immediate acknowledgments.
    • For longer-running tasks, schedule them as jobs (using Cloud Scheduler or a similar tool) to be processed independently by another Cloud Run service or Cloud Function.

The choice of strategy will depend on the specifics of your workload and the architectural changes you're willing to make.

@ms4446, can you help confirm that the "Acknowledgement deadline" set on the Pub/Sub push subscription is actually honoured by the call? Extending the deadline doesn't seem to take effect.

Just want to share my finding. When I created an Eventarc trigger with a Pub/Sub destination, the acknowledgement deadline of the push subscription, once created, could not be adjusted, even though I tried modifying the push subscription directly. It seems Eventarc hardcodes the ack deadline somehow, i.e. to 10 seconds.

When I changed to creating the push subscription manually myself (without Eventarc), the acknowledgement deadline could actually be extended to a longer duration.
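For anyone hitting the same wall, creating the push subscription manually looks roughly like this (SUB_NAME, TOPIC_NAME, and SERVICE_URL are placeholders for your own values); the --ack-deadline flag accepts values from 10 to 600 seconds:

gcloud pubsub subscriptions create SUB_NAME \
    --topic=TOPIC_NAME \
    --push-endpoint=https://SERVICE_URL/ \
    --ack-deadline=600

A manually created subscription can also be adjusted afterwards with `gcloud pubsub subscriptions update SUB_NAME --ack-deadline=600`.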