Our credentials for accessing Google Pub/Sub are rotated every month, with a 10-day overlap during which both the old and new keys are valid before the old key expires.
We have a Pub/Sub subscriber that uses FixedCredentialsProvider to set the credentials (code is below). The subscriber is started when the application boots. I am guessing the subscriber will fail to read messages once the credentials expire, and we don't want to restart our service every month just to reinitialize the subscriber with new credentials. What is the best way to refresh the key without restarting the service? And how do we go about reinitializing the subscriber with new credentials without interrupting messages that are in flight?
Subscriber subscriber = Subscriber.newBuilder(projectSubscriptionName, messageReceiver)
        .setCredentialsProvider(FixedCredentialsProvider.create(getKeyFromVault()))
        .build();
subscriber.startAsync().awaitRunning();
The Google Pub/Sub client libraries do not provide a built-in way to change the credentials of a running Subscriber. The FixedCredentialsProvider is designed to be, as the name suggests, fixed. Once the credentials are set, they cannot be changed. This means that the Subscriber must be recreated with new credentials if they are changed.
However, you can refresh the credentials without restarting the entire service. Here is a high-level approach, with a code sketch after the steps:
Track the Expiration Date: Keep track of when the credentials are going to expire, possibly by saving the expiration date along with the credentials when you get them from the vault.
Create a New Subscriber: Before the old credentials expire (for example, when you're within the 10-day overlap period), create a new Subscriber using the new credentials. It can exist alongside the old Subscriber.
Transition Traffic: Gradually transition traffic from the old Subscriber to the new Subscriber. You can do this by having some kind of flag or switch that determines which Subscriber new incoming messages are directed to. Start by directing a small percentage of messages to the new Subscriber, then gradually increase the percentage over time.
Stop the Old Subscriber: Once all traffic has been successfully transitioned to the new Subscriber, and all in-flight messages have been processed by the old Subscriber, you can stop the old Subscriber.
Cleanup: Finally, clean up any resources associated with the old Subscriber.
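A minimal sketch of steps 1 through 5, assuming a scheduler calls rotate() during the 10-day overlap with credentials freshly fetched from the vault. It collapses the gradual transition into an immediate cutover, since Pub/Sub delivers to whichever subscribers are currently running; SubscriberRotator and its members are illustrative names, not part of the client library:

import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SubscriberRotator {

    private final ProjectSubscriptionName subscriptionName;
    private final MessageReceiver receiver;
    private volatile Subscriber current; // null until the first rotation at boot

    public SubscriberRotator(ProjectSubscriptionName subscriptionName, MessageReceiver receiver) {
        this.subscriptionName = subscriptionName;
        this.receiver = receiver;
    }

    // Invoked once at boot with the initial key, then again by the
    // scheduler whenever a new key becomes available in the overlap window.
    public synchronized void rotate(GoogleCredentials freshCredentials) {
        // The replacement runs alongside the old subscriber for a moment.
        Subscriber replacement = Subscriber.newBuilder(subscriptionName, receiver)
                .setCredentialsProvider(FixedCredentialsProvider.create(freshCredentials))
                .build();
        replacement.startAsync().awaitRunning();

        // Retire the old subscriber once the new one is live.
        Subscriber old = current;
        current = replacement;
        if (old != null) {
            old.stopAsync();
            try {
                old.awaitTerminated(30, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                // Shutdown did not finish in time; any messages the old
                // subscriber received but never acked will be redelivered
                // and picked up by the replacement.
            }
        }
    }
}

Stopping the old subscriber halts its streaming pull; messages it received but never acknowledged become eligible for redelivery and flow to the replacement.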
Remember that during this process, both the old and new Subscriber can exist at the same time and process messages independently. This means that your service can continue to process messages without interruption while the credentials are being refreshed.
Note: This is a high-level approach. The actual implementation will depend on the specifics of your architecture and infrastructure, and you will need to handle error handling, logging, and monitoring in your own way.
Also, always ensure that you are adhering to best practices when it comes to managing and rotating secrets. Avoid exposing them in logs or other outputs, and always transport them securely.
Lastly, you may want to consider using Google Cloud's Secret Manager, which is designed for this kind of use case. Secret Manager provides a secure and convenient method for storing API keys, passwords, certificates, and other sensitive data. It supports versioning and automated rotation, which might simplify your credential rotation process.
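For illustration, fetching the key from Secret Manager in place of getKeyFromVault() could look roughly like this; the project and secret names are hypothetical placeholders, and the "latest" alias resolves to the newest enabled version, which pairs naturally with monthly rotation:

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.secretmanager.v1.AccessSecretVersionResponse;
import com.google.cloud.secretmanager.v1.SecretManagerServiceClient;
import com.google.cloud.secretmanager.v1.SecretVersionName;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class SecretManagerKeySource {

    // Reads the current service account key and turns it into credentials.
    public static GoogleCredentials loadCurrentKey() throws IOException {
        try (SecretManagerServiceClient client = SecretManagerServiceClient.create()) {
            // "my-project" and "pubsub-sa-key" are placeholder names.
            SecretVersionName version =
                    SecretVersionName.of("my-project", "pubsub-sa-key", "latest");
            AccessSecretVersionResponse response = client.accessSecretVersion(version);
            return GoogleCredentials.fromStream(
                    new ByteArrayInputStream(response.getPayload().getData().toByteArray()));
        }
    }
}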
For the third step, Transition Traffic: my understanding is that we will not have control over which subscriber receives a message. When you have multiple subscribers (one with the old key and the other with the new key) on the same subscription, incoming messages are distributed across both subscribers. How exactly can we direct the traffic to the new subscriber? Can you please elaborate on this? You mentioned a switch, but can you clarify where that switch would be located: in the subscriber, in the receiver, or somewhere else?
Just giving this a bump; still looking for insights on this.
Rotating credentials for Pub/Sub subscribers can be a bit tricky, especially when aiming to avoid message loss or service interruptions. A common misconception is that you need to actively redirect traffic from one subscriber to another during this update. However, when updating credentials for the same subscription, Pub/Sub’s built-in redelivery mechanism provides a seamless solution.
Here's how it works: establish a scheduled task, typically monthly, that initiates the credential rotation. This task creates a new subscriber configured with the updated credentials. Importantly, the existing subscriber is not disabled immediately. Instead, its behavior is adjusted so that it only acknowledges messages it is already processing and ignores new ones; this step is what keeps operations running smoothly during the handover.
When the old subscriber stops acknowledging new messages, Pub/Sub automatically redistributes those unacknowledged messages through its redelivery system. The newly created subscriber, configured with the updated credentials, takes over and processes these messages, ensuring continuity without the need for complex routing or traffic switching.
After confirming that the new subscriber is functioning as expected, the old subscriber can be safely decommissioned. This approach simplifies the credential rotation process, minimizes setup complexity, and significantly reduces the risk of message loss or duplication. The key lies in leveraging Pub/Sub’s redelivery system to hand off the workload seamlessly from the old subscriber to the new one, ensuring a smooth and efficient transition.
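As a sketch, the "only acknowledge messages already in progress" behavior can live in the MessageReceiver itself; DrainingReceiver and its draining flag are illustrative names, with the flag flipped by the rotation task once the replacement subscriber is running:

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.atomic.AtomicBoolean;

public class DrainingReceiver implements MessageReceiver {

    private final AtomicBoolean draining = new AtomicBoolean(false);

    @Override
    public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
        if (draining.get()) {
            // Refuse new work: nack() makes the message eligible for
            // redelivery, so the new subscriber picks it up instead.
            consumer.nack();
            return;
        }
        process(message); // existing business logic (placeholder)
        consumer.ack();
    }

    // Flipped by the rotation task after the new subscriber is running.
    public void startDraining() {
        draining.set(true);
    }

    private void process(PubsubMessage message) {
        // ...
    }
}

Note that nacking is faster than simply leaving messages unacknowledged, since a nack signals Pub/Sub right away rather than waiting for the ack deadline to lapse.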
Thanks for the detailed explanation. It is very helpful.
>Start by directing a small percentage of messages to the new Subscriber, then gradually increase the percentage over time.
Any reason why the transition has to be gradual? Why can't we have the old subscriber stop accepting new messages once the scheduled month has elapsed? Here is the flow I am thinking of:
1. A scheduled task kicks in during the overlap window and builds a new subscriber with the new credentials from the vault.
2. A flag is flipped so the old subscriber stops acknowledging new messages and only finishes the ones already in flight.
3. The unacknowledged messages get redelivered and are picked up by the new subscriber.
4. The old subscriber is stopped and its resources are cleaned up.
Do you see any flaw with this approach?
We also make use of Cloud Storage. Here is what we have:
this.storage = StorageOptions.newBuilder()
        .setCredentials(keyManager.getGoogleCredentials())
        .build()
        .getService();
The storage object is cached, and it will stop working when the credentials expire. Is it advisable to build a new Storage object with the latest credentials for every interaction with GCS?
Your approach of flipping a flag to stop new messages from being acknowledged by the old subscriber and having them redelivered to the new subscriber can work, especially if your system can tolerate the additional latency of message redelivery. Here are a few points to consider:
Message Ordering: If your system relies on the order of messages, redelivery might not be suitable, as it can disrupt the order in which messages are processed.
Message Processing Time: Also consider how long it takes to process a message. If it takes a long time, you might have messages that were delivered to the old subscriber but have not been processed yet when the flag is flipped. You need to ensure that these messages are not lost or duplicated.
Redelivery Latency: Pub/Sub does not necessarily redeliver unacknowledged messages right away. If the old subscriber simply lets messages expire, redelivery waits for the ack deadline, and if the subscription is configured with an exponential-backoff retry policy, the delay grows with each attempt. It can therefore take a noticeable amount of time for messages to reach the new subscriber; explicitly nacking messages makes them eligible for redelivery sooner.
As for your question about GCS: creating a new Storage object for every interaction with GCS would work, but it's not very efficient. Each new client repeats setup work (and potentially connection establishment) that could otherwise be amortized across calls.
A better approach would be to keep a cache of Storage objects with different credentials and switch between them when necessary, similar to how you would switch between Subscribers. This would allow you to reuse connections and avoid the overhead of creating a new object for every operation.
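As a minimal sketch of that idea, a single swappable holder (rather than a full cache) covers the common case; StorageHolder is an illustrative name, and the credentials are assumed to come from the keyManager.getGoogleCredentials() helper in your post:

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.util.concurrent.atomic.AtomicReference;

public class StorageHolder {

    private final AtomicReference<Storage> storage = new AtomicReference<>();

    public StorageHolder(GoogleCredentials initialCredentials) {
        storage.set(build(initialCredentials));
    }

    // Invoked by the same scheduled rotation task that recreates the Subscriber.
    public void refresh(GoogleCredentials freshCredentials) {
        storage.set(build(freshCredentials));
    }

    // Callers always go through get(), so they transparently pick up
    // the client built with the fresh credentials after a rotation.
    public Storage get() {
        return storage.get();
    }

    private static Storage build(GoogleCredentials credentials) {
        return StorageOptions.newBuilder()
                .setCredentials(credentials)
                .build()
                .getService();
    }
}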
Remember to ensure that your system can handle any errors that might occur during the transition period. For example, if you attempt to use an expired Storage object, the operation will fail, so your system needs to be able to handle that error, possibly by retrying the operation with the new credentials.
Also, be aware of Google Cloud's rate limits. If you are creating new connections too frequently, you might hit these rate limits which would cause your requests to be throttled or denied.