Hi all, I have a question re: pub/sub.
My setup is as follows:
- I have a publisher creating about 3000 messages, which get published to a single subscription
- That subscription has >3000 subscription clients open, each with a streaming pull connection set up and ready to receive messages
- Each client has MaxOutstandingMessages set to 1, and its thread count is limited to 1
- I am using exactly-once delivery. Client side, I have the max ack deadline set to 60 minutes, with the min deadline extension set to 9 minutes and the max set to 10 minutes (the job I process for each message is time-intensive). Messages come in bursts. Server side, I have the ack deadline set to 180 seconds
Given the number of streaming pull connections I have open, I would expect my subscription to send out all the messages in one go. Instead, about 99% of my messages get sent immediately, and the remaining 1% seem to be sent after about 10 minutes. I have noticed, on occasion, that this also seems to coincide with ack deadline expiries (which doesn't really make sense to me, given that my deadlines are so high).
My questions:
(1) Why are not all of my messages being disseminated at once? Is there a configuration setting I should be using?
(2) Is my usage of the max deadline, the min extension / max extension, and the server-side ack deadline correct?
I'm stumped, and would appreciate any help. Happy to share more info. If it's helpful, I'm using google-cloud-cpp v2.1.0 for my client library.
(1) Why are not all my messages being disseminated at once?
There are a few possible reasons why your messages are not being disseminated at once:
(2) Is my usage of max deadline, min extension / max extension / the server side extension correct?
Yes, your usage of max deadline, min extension / max extension / the server side extension is correct.
The max deadline is the total time the client library will keep extending a message's ack deadline while your handler works on it; once that time has passed, the library stops extending and the message becomes eligible for redelivery. The min extension and max extension bound the length of each individual extension the library requests, so they operate within the max deadline rather than beyond it.
The server-side ack deadline is the subscription's default: the time a subscriber has to acknowledge a message when no extension has been requested. Clients can extend this time based on their processing needs, so the server-side setting acts as the initial threshold.
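To make the relationship between the job duration, the extension length, and the max deadline concrete, here is a minimal arithmetic sketch. It is a simplified model and not the client library's actual scheduling: it assumes every lease lasts exactly `extension` and that the library stops renewing once `max_deadline` has elapsed. `LeaseSettings`, `LeasesNeeded`, and `CoveredByMaxDeadline` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical container for the two client-side values discussed above.
struct LeaseSettings {
  std::chrono::seconds max_deadline;  // total time the client keeps renewing
  std::chrono::seconds extension;     // assumed length of each individual lease
};

// Number of leases (the first one plus any renewals) needed to cover `job`,
// rounding up to whole leases.
inline long long LeasesNeeded(LeaseSettings const& s, std::chrono::seconds job) {
  assert(s.extension.count() > 0);
  return (job.count() + s.extension.count() - 1) / s.extension.count();
}

// True if the job finishes before the client stops renewing the lease.
inline bool CoveredByMaxDeadline(LeaseSettings const& s,
                                 std::chrono::seconds job) {
  return job <= s.max_deadline;
}
```

With the values from the question (60 minute max deadline, 10 minute extension), a job that takes 25 minutes would need three leases under this model and would still be covered; a 70 minute job would outlive the max deadline.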
Recommendations
Here are a few recommendations to help you improve the dissemination of messages from your subscription:
Re: your answer to 1, will pub/sub throttle the delivery rate even if I have extra servers/subscribers available for my subscription? I.e. I'm sending out bursts of 3000 messages every N minutes (after the prior batch of 3000 have been handled), but I have 3300 streaming pull connections/servers available (all of which are able to handle one message at a time), shouldn't all the messages be disseminated before pub/sub is able to adjust the delivery rate?
Yes, Pub/Sub does manage the rate of message delivery based on several factors, even if extra servers/subscribers are available. Here's a breakdown:
- Message backlog: A significant backlog means there are many messages waiting to be delivered. Pub/Sub aims to distribute these messages evenly among subscribers. However, having a backlog doesn't inherently lead to throttling. Instead, if subscribers are slow to process and acknowledge messages, causing the backlog to grow continuously, then throttling may come into play to prevent overwhelming subscribers.
- Flow control and subscriber processing time: If subscribers take a long time to process messages or if the flow control settings are restrictive, then the rate of message delivery might be affected. Pub/Sub uses flow control to prevent subscribers from becoming overwhelmed.
- Internal factors: While Google Cloud Pub/Sub is designed to scale and handle massive volumes of messages efficiently, occasional internal latencies or optimizations can affect the immediate dissemination of messages.
Given your scenario, where you're sending bursts of 3,000 messages but have 3,300 available streaming pull connections, it would be atypical to see significant delays in message delivery.
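The capacity argument here comes down to simple arithmetic, sketched below. This only models the client-side flow control limit (clients times outstanding messages per client); it says nothing about how the service actually assigns messages to streams. `InFlightCapacity` and `WavesNeeded` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on messages that can be outstanding across all clients at once,
// if the per-client flow control limit were the only constraint.
inline std::int64_t InFlightCapacity(std::int64_t clients,
                                     std::int64_t max_outstanding_per_client) {
  assert(clients >= 0 && max_outstanding_per_client >= 0);
  return clients * max_outstanding_per_client;
}

// Number of delivery "waves" a burst would need at that capacity,
// rounding up to whole waves.
inline std::int64_t WavesNeeded(std::int64_t burst, std::int64_t capacity) {
  assert(capacity > 0);
  return (burst + capacity - 1) / capacity;
}
```

With 3,300 clients each limited to 1 outstanding message, the capacity is 3,300, so a burst of 3,000 fits in a single wave. Flow control alone therefore does not account for the delayed 1%.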
Recommendations:
- Reduce any existing message backlog: Increase the number of subscribers or optimize the message processing time on the subscriber side.
- Adjust flow control settings: You can modify settings like the maximum number of outstanding messages or the acknowledgment deadline to better suit your subscribers' processing capabilities.
- Monitoring: Use Google Cloud's Monitoring and Logging tools to keep an eye on message states and to identify any potential bottlenecks or issues with subscribers.
I've done all 3 of the recommendations. What else can I do?
Is there any way I can adjust the delivery rate on the subscription side, i.e. force it not to throttle?
If you've already implemented the primary recommendations and continue to experience consistent and significant delays in message delivery, contacting Google Cloud support might be the next logical step.
While you cannot directly prevent Pub/Sub from managing delivery rates, there are steps to optimize message throughput:
Lastly, it's important to remember that Pub/Sub's management of delivery rates is designed to ensure system stability and prevent subscribers from becoming overwhelmed. Balancing message throughput with subscriber capacity is crucial to prevent issues like excessive message redelivery or potential message loss.
That makes sense. I noticed that the issue stops showing up when I disable exactly-once delivery. Any ideas why exactly-once could be causing problems here?
"Exactly once" delivery ensures that messages are delivered to subscribers precisely one time, providing guarantees against message duplication. This feature, while powerful, introduces additional complexities to the message delivery process.
The overhead from "exactly once" semantics doesn't necessarily cause Pub/Sub to throttle delivery, but it can introduce latencies due to the acknowledgment requirements and potential redelivery of unacknowledged messages. If you observe improved delivery rates upon disabling "exactly once" delivery, it suggests that these inherent latencies might be contributing to the observed behavior.
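One way to reason about the redelivery latency mentioned above is to compute when an unacknowledged message can first go to another client. The sketch below is a simplified model under stated assumptions: the message is leased for `lease`, the lease is renewed `renewals` times, and the message is never acknowledged. `EarliestRedelivery` is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <chrono>

// Earliest time, measured from the first delivery, at which an
// unacknowledged message can be redelivered in this simplified model:
// the initial lease plus `renewals` further leases of the same length.
inline std::chrono::seconds EarliestRedelivery(std::chrono::seconds lease,
                                               int renewals) {
  assert(lease.count() > 0 && renewals >= 0);
  return lease * (renewals + 1);
}
```

Under this model, a message held on a 10 minute lease by a client that never acknowledges it would not reach another client for about 10 minutes, which is the same order as the delay described in the original question. Whether that is what actually happens in this setup is not established in this thread.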
For scenarios with bursts of messages, it's crucial to ensure not only rapid message delivery but also timely acknowledgment from subscribers. If "exactly once" delivery is essential for your use case, consider:
Lastly, consulting with Google Cloud Support can provide deeper insights and tailored recommendations to optimize your Pub/Sub configuration.