Hello!
I have a question on how Pub/Sub works in case of disaster recovery.
As an example, let's say that I have all my services distributed across 2 different regions in the US (us-west1 and us-east1) with no cross-region dependencies. In both regions I'm sending messages to the global Pub/Sub endpoint.
If my primary region is us-west1, that means that all the messages sent to the Pub/Sub topic will be stored in the us-west1 region, right? If I'm not wrong, Pub/Sub by default stores messages regionally with zonal replication (not sure if this is a bucket or something like that).
In case of an outage in us-west1, I'm going to use my secondary region us-east1, sending messages to the same Pub/Sub topic (global endpoint).
According to this documentation: https://cloud.google.com/pubsub/docs/reference/service_apis_overview#service_endpoints
If I'm using the global Pub/Sub endpoint in us-west1 and Pub/Sub becomes unavailable in us-west1:
- Will my services in us-west1 fail when they try to send the message to the global Pub/Sub endpoint?
- Is there an automatic way to make the global endpoint send messages to the us-east1 Pub/Sub region if Pub/Sub becomes unavailable in the us-west1 region?
Regards!
Within a region, zones are designed to minimize the risk of correlated failures with other zones, and a service interruption in one zone would usually not affect service from another zone in the same region. An outage scoped to a zone doesn't necessarily mean that the entire zone is unavailable; it just defines the boundary of the incident. It is possible for a zone outage to have no tangible effect on your particular resources in that zone.
Regional resources are designed to be resistant to zone outages by delivering service from a composition of multiple zones. If one of the zones backing a regional resource is interrupted, the resource automatically makes itself available from another zone. Carefully check the product capability description in the appendix for further details.
If you are using regional resources and need to remain resilient to regional outages, then you must perform your own resource composition by designing, building, and testing failover and recovery between regional resources located in multiple regions.
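As a rough illustration of building your own failover between regional resources, here is a minimal Python sketch that tries a list of Pub/Sub endpoints in order and falls back to the next one when a publish fails. The names `publish_with_failover`, `regional_endpoint`, and `AllEndpointsFailed` are hypothetical helpers made up for this example, and the actual publish call is injected as `publish_fn` so the logic can run without Google Cloud credentials. The hostnames follow the `REGION-pubsub.googleapis.com:443` pattern described in the Pub/Sub endpoints documentation. This is a sketch under those assumptions, not a recommended production design.

```python
from typing import Callable, List, Sequence, Tuple

GLOBAL_ENDPOINT = "pubsub.googleapis.com:443"


def regional_endpoint(region: str) -> str:
    """Build a locational endpoint hostname such as us-east1-pubsub.googleapis.com:443."""
    return f"{region}-pubsub.googleapis.com:443"


class AllEndpointsFailed(Exception):
    """Raised when no endpoint in the list accepted the message."""


def publish_with_failover(
    data: bytes,
    endpoints: Sequence[str],
    publish_fn: Callable[[str, bytes], str],
) -> Tuple[str, str]:
    """Try each endpoint in order; return (endpoint, message_id) for the first success.

    publish_fn is a hypothetical hook. In a real service it could wrap
    google.cloud.pubsub_v1.PublisherClient(client_options={"api_endpoint": endpoint})
    and block on the publish future with a timeout.
    """
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return endpoint, publish_fn(endpoint, data)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


if __name__ == "__main__":
    # Fake publisher that simulates a us-west1 outage.
    def fake_publish(endpoint: str, data: bytes) -> str:
        if endpoint.startswith("us-west1"):
            raise ConnectionError("simulated regional outage")
        return "fake-message-id"

    order = [regional_endpoint("us-west1"), regional_endpoint("us-east1")]
    print(publish_with_failover(b"hello", order, fake_publish))
```

Note that retrying on a second endpoint can publish the same message twice if the first attempt was accepted but its response was lost, so subscribers should be able to tolerate duplicates.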
Multi-regional resources are designed to be resistant to region outages by delivering service from multiple regions. Multi-region products trade off between latency, consistency, and cost. The most common trade-off is between synchronous and asynchronous data replication. Asynchronous replication offers lower latency at the cost of risk of data loss during an outage. So, it is important to check the product capability description in the appendix for further details.
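To make the synchronous vs. asynchronous trade-off concrete, here is a toy Python model of a log replicated from a primary to a replica. It is purely illustrative: the `ReplicatedLog` class and its methods are hypothetical and do not describe how Pub/Sub or any other Google Cloud product replicates data internally. In synchronous mode a write is acknowledged only after the replica has it; in asynchronous mode the write is acknowledged immediately and copied later.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedLog:
    """Toy primary/replica pair used to compare replication modes."""

    synchronous: bool
    primary: List[str] = field(default_factory=list)
    replica: List[str] = field(default_factory=list)
    acked: List[str] = field(default_factory=list)

    def write(self, record: str) -> None:
        """Store a record on the primary and acknowledge it to the caller."""
        self.primary.append(record)
        if self.synchronous:
            # Synchronous mode: the replica is updated before the ack,
            # which in a real system adds a cross-region round trip.
            self.replica.append(record)
        self.acked.append(record)

    def replicate(self) -> None:
        """Background copy of records the replica has not received yet."""
        self.replica.extend(self.primary[len(self.replica):])

    def lost_after_primary_outage(self) -> List[str]:
        """Acknowledged records that exist only on the (now lost) primary."""
        survivors = set(self.replica)
        return [r for r in self.acked if r not in survivors]


if __name__ == "__main__":
    for mode in (True, False):
        log = ReplicatedLog(synchronous=mode)
        log.write("a")
        log.write("b")
        log.replicate()
        log.write("c")  # acknowledged, but the outage hits before the next replicate()
        print("synchronous" if mode else "asynchronous", log.lost_after_primary_outage())
```

In this model the asynchronous log loses the acknowledged record written after the last background copy, while the synchronous log loses nothing but pays the replication cost on every write.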
See this part of the document [1] for a better understanding of what you can do to avoid losses in a regional or multi-regional outage.
[1] https://cloud.google.com/architecture/disaster-recovery#common_themes
Hello Joel,
I have the same doubts that you mentioned.
How did you manage the problem?
Regards
Hello @JoelAvalos ,
The same document that @josegutierrez mentioned has a section that describes Pub/Sub behavior in the event of zonal and regional outages: https://cloud.google.com/architecture/disaster-recovery#pubsub. Please take a look. I hope that helps.
Hello,
I've reviewed all of the disaster recovery documentation for Pub/Sub and am looking for some clarification on the use of the Pub/Sub global endpoint. My question is similar to the one originally posted in this thread.
@JoelAvalos wrote: According to this documentation https://cloud.google.com/pubsub/docs/reference/service_apis_overview#service_endpoints If I'm using the global PubSub endpoint in us-west1 and PubSub becomes unavailable in us-west1:
- Will my services in us-west1 fail when they try to send the message to the global PubSub endpoint?
- Is there an automatic way to make the global endpoint send messages to the us-east1 PubSub region if PubSub becomes unavailable in the us-west1 region?
If a component running in a GCP region uses the Pub/Sub global endpoint for publishing and the Pub/Sub control plane is not available in the region where the component is running, will the global endpoint load-balance the request to another available Pub/Sub region? If so, is that controlled by message storage policies, or how does the load balancing work? If not, and the documentation suggests it doesn't, what is the advantage of using the global endpoint over locational (regional) endpoints?
Best Regards!