Managing Capacity, Quota, and Stockouts in the Cloud: Concepts and Tips

Lauren_vdv · 09-09-2022 07:40 AM

Have you heard the phrase “a failure to plan is a plan to fail”? Well, this is very much the case when it comes to managing capacity in the cloud at scale.

In our latest Community event, Technical Account Manager Stephen Tolan (@stolan) shared key concepts and best practices for planning and managing capacity, quotas, and stockouts in the cloud so you can avoid disruptions to your business.

In this article, we share key takeaways from the presentation, the session recording, written questions and answers, and supporting resources so you can refer back to them at any time.

Session recording
Key concepts: Regions, zones, services, and definitions
Regions and zones
Capacity and quota
Quota Increase Requests (QIR)
Stockout
Best practices: Capacity planning, quota management, stockouts, and reservations
1. Capacity planning
2. Quota management
3. Managing stockouts
4. Reservations
Common use cases: Organic vs burst
Organic growth
Burst workloads
Resource summary
Questions and answers

If you have any further questions, please add a comment below and we’d be happy to help!

It's our goal to provide a trusted space where you can receive support and guidance along your cloud journey. So if you have any feedback or topic requests for our next sessions, please let us know in the comments, or by submitting the feedback form.

You can keep an eye on upcoming sessions led by Technical Account Managers in the Community here. Thank you!

Session recording

Key concepts: Regions, zones, services, and definitions

Before diving into capacity management best practices, it’s important to understand key concepts and definitions.

Regions and zones

Zone: Deployment area for Google Cloud resources within a region. A zone can be considered a single failure domain within a region. Example: us-central1-a or us-central1-b
Region: Independent geographic area that consists of zones. Example: us-central1 for Iowa in North America or asia-southeast1 for Singapore in Asia

Zones and regions are logical abstractions of underlying physical resources provided in one or more physical data centers.
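Because a zone name is the region name plus a zone suffix, the region can be recovered from the zone name alone. The following is a minimal sketch of that naming convention; the helper name split_zone is hypothetical and not part of any Google Cloud library.

```python
def split_zone(zone: str) -> tuple:
    """Split a zone name such as 'us-central1-a' into (region, zone_suffix)."""
    # The region is everything before the last hyphen; the suffix is what follows.
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region, suffix


if __name__ == "__main__":
    print(split_zone("us-central1-a"))  # ('us-central1', 'a')
    print(split_zone("asia-east1-c"))   # ('asia-east1', 'c')
```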

Multi-Region: Broad geographic area that includes more than one region. Example: A Cloud Storage multi-region bucket, which stores objects in more than one region for better performance or lower latency.
Global: Can be any location. Example: Content Delivery Network (CDN) or Global Cloud Load Balancing

Capacity and quota

What are capacity and quota?

Capacity: Total availability of regional and zonal hardware resources that power our Google Cloud services.

Shared across all customers
Example: For Compute Engine, this would be a VM family made up of cores, RAM, disk, and GPUs

Quota: Upper bounds of resources set for a project or a particular Google Cloud service.

Can apply to physical resources or to API usage
Example: Number of API requests per day (API) or the number of VMs used by your project at a given time (physical)

Capacity does not equal quota.

Even when you have been granted the quota, you might still face something called a “stockout,” which we cover in more detail below.
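The difference between the two limits can be illustrated with a toy model. This is only a sketch: the class names, the request_cores function, and the result strings are made up for illustration and do not correspond to real Google Cloud APIs or status codes.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    free_cores: int  # capacity: hardware shared across all customers


@dataclass
class Project:
    core_quota: int        # quota: upper bound granted to this project
    cores_in_use: int = 0


def request_cores(project: Project, zone: Zone, cores: int) -> str:
    """Grant a request only if it fits within quota AND the zone has capacity."""
    if project.cores_in_use + cores > project.core_quota:
        return "QUOTA_EXCEEDED"
    if cores > zone.free_cores:
        return "STOCKOUT"  # quota was fine, but the hardware is not available
    project.cores_in_use += cores
    zone.free_cores -= cores
    return "OK"


if __name__ == "__main__":
    zone = Zone(free_cores=8)
    project = Project(core_quota=100)
    print(request_cores(project, zone, 16))   # STOCKOUT
    print(request_cores(project, zone, 4))    # OK
    print(request_cores(project, zone, 200))  # QUOTA_EXCEEDED
```

The first request is well within the project's quota of 100 cores, yet it still fails because the zone only has 8 cores free.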

Quota Increase Requests (QIR)

If you need to raise a quota above its current value, you can request an increase for most quotas directly from the Google Cloud Console.

Navigate to the IAM & Admin page and then the Quotas tab in the Console. From there, you can find the quota for a specific service, API, or region and request an increase. See step-by-step instructions here.

Occasionally, some quotas need to be requested through a support case. Once you make a request, you will typically receive an approval or denial within a few business days, depending on the size of the request and the resources involved.

For large requests, such as 10,000 cores of N2Ds or more, we recommend working with your Account Manager. If you don’t have one, you can reach out to sales to start this discussion.

Also, please be aware that there are limits for certain quotas, so we recommend you check the product’s quota and limits documentation.

Learn more about working with quotas here.

Stockout

Even when your quota is set, you may face something called a stockout, which is when requested resources are not currently available. For example, with Compute Engine, the resources could be the VM family, the specific VM type or shape, or even the type and amount of disk needed.

When you face a stockout, you might see messages like the following in the Google Cloud Console or in Cloud Logging.

resource pool exhausted
The zone … does not have enough resources available to fulfill the request
ZONE_RESOURCE_POOL_EXHAUSTED
ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
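If you process error text or log entries in your own tooling, the messages above can be used to tell a stockout apart from other failures such as a quota error. This is a minimal sketch; the function name is_stockout is hypothetical, and the marker list contains only the strings quoted in this article, so it may not be exhaustive.

```python
# Substrings taken from the stockout messages listed in this article (lowercased).
STOCKOUT_MARKERS = (
    "resource pool exhausted",
    "does not have enough resources available to fulfill the request",
    "zone_resource_pool_exhausted",  # also matches the _WITH_DETAILS variant
)


def is_stockout(message: str) -> bool:
    """Return True if an error message looks like a stockout."""
    text = message.lower()
    return any(marker in text for marker in STOCKOUT_MARKERS)


if __name__ == "__main__":
    print(is_stockout("ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS"))  # True
    print(is_stockout("Quota limit has been exceeded. Limit: 0"))    # False
```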

Jump to the next section for best practices to help prevent and mitigate stockouts.

Best practices: Capacity planning, quota management, stockouts, and reservations

Now that we’ve laid the foundation for what capacity, quotas, and stockouts are, let’s dive into four best practices to help make sure you have resources when you need them to avoid any disruptions to your business.

1. Capacity planning

Planning is one of the most important practices for effective capacity management. Below is an example you can use yourself; if you have a Technical Account Manager, they can share a template like this with you. The template provides a way to collect the current and future capacity needs for your workloads that are running or will run on Google Cloud.

During the capacity planning phase, you’ll work with your Google Account team to:

Choose the right VM family and shape. This can mean choosing compute-optimized VMs like our C2 or C2D VMs for CPU-heavy workloads, or memory-optimized VMs like M1 or M2 for memory-heavy workloads.
Assess and evaluate capacity requirements. You’ll want to identify spikes in usage patterns, the difference between your average and peak capacity requirements, and what you should expect to need at least three months ahead of planned launches, peak usage, or seasonal spikes. This will help ensure that the capacity will be available when needed.
Schedule a quarterly checkpoint with your Account Team to forecast capacity needs and organic growth demand. Starting the conversation early will greatly help with your capacity experience on the cloud and alleviate the potential for stockouts.
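The assessment step above can be sketched with a few lines of arithmetic. The function names and sample numbers below are hypothetical, and treating "three months" as 90 days is an assumption made for simplicity.

```python
from datetime import date, timedelta
from statistics import mean


def summarize_usage(daily_peak_cores):
    """Compare average and peak usage for a series of daily peak core counts."""
    average = mean(daily_peak_cores)
    peak = max(daily_peak_cores)
    return {"average": average, "peak": peak, "peak_to_average": peak / average}


def planning_deadline(event_date: date, lead_days: int = 90) -> date:
    """Latest date to start the capacity conversation (assumes ~3 months = 90 days)."""
    return event_date - timedelta(days=lead_days)


if __name__ == "__main__":
    usage = [100, 120, 110, 400, 130]  # made-up sample: one spike day
    print(summarize_usage(usage))
    print(planning_deadline(date(2022, 11, 25)))
```

A large peak-to-average ratio is a signal that quota and capacity discussions should be based on the peak, not the average.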

2. Quota management

There are three key components of effective quota management:

Plan: As part of the capacity planning exercise, review and set quotas for resources to match capacity requirements. Plan for spikes in usage and include them as part of the quota planning.
Submit: Based on your quota plan, submit any necessary quota increase requests (QIR) from the Cloud Console. QIRs should not be reactive fire drills! Submit QIRs proactively before the limit is reached.
Monitor: Use Cloud Monitoring to monitor quota usage and set alerting policies so you are notified when usage approaches a threshold. Leverage the Quota Monitoring Solution to manage quota usage across multiple Google Cloud projects.
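The logic behind a quota alert is a simple ratio check. The sketch below is not the Cloud Monitoring API; the function name quota_alerts, the quota names, and the 80% threshold are all assumptions chosen for illustration.

```python
def quota_alerts(usage_by_quota, threshold=0.8):
    """Return a message for every quota whose usage is at or above the threshold.

    usage_by_quota maps a quota name to a (used, limit) pair.
    """
    alerts = []
    for name, (used, limit) in sorted(usage_by_quota.items()):
        if limit > 0 and used / limit >= threshold:
            alerts.append(f"{name}: {used}/{limit} ({used / limit:.0%})")
    return alerts


if __name__ == "__main__":
    sample = {
        "CPUS us-central1": (850, 1000),          # hypothetical values
        "SSD_TOTAL_GB us-central1": (2000, 10000),
    }
    print(quota_alerts(sample))  # ['CPUS us-central1: 850/1000 (85%)']
```

Choosing a threshold well below 100% leaves time to submit a QIR before the limit is actually reached.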

3. Managing stockouts

Stockouts can happen when there’s no capacity planning or quota management. As mentioned above, a stockout occurs when your requested resources are not currently available. When this happens, follow the recommendations below:

Wait and try again - capacity is dynamic! The wait can be anywhere from 15 minutes to an hour, depending on the resources, region, and popularity.
Use a different zone or region. Having another zone or even region where you can deploy your workload will increase the pool of resources to which you have access. This also lowers the risk of a single point of failure when you have multiple zones available (high availability (HA) best practice).
Consider a different machine family or shape. This helps to diversify the resources your workload can use and decouples the workload from the requirement of a specific VM family type/size. This again increases the pool of resources to which you have access.
If using a regional managed instance group (MIG) for autoscaling, consider changing the target distribution shape to BALANCED. This helps with scaling out your workload by balancing the VMs needed across more than one zone, if the MIG is configured to use multiple zones.
File a Google Cloud support case to report your issue and help track major impact. This helps to track major stockouts that you can bring to your Google Account team for review.
Engage your Google Account team to discuss other options. Often, there are times we can help out in a pinch.

Most importantly, proactively prepare and plan for your capacity requirements with your Google Account team to avoid experiencing a stockout.
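The first three recommendations (wait and retry, try another zone, try another machine type) can be combined into one fallback loop. This is a sketch under stated assumptions: create_vm is a hypothetical callable you would supply, StockoutError is a made-up exception it is assumed to raise, and the 15-minute wait comes from the lower end of the range mentioned above.

```python
import time
from itertools import product


class StockoutError(Exception):
    """Raised by the (hypothetical) create_vm callable when capacity is unavailable."""


def create_with_fallback(create_vm, zones, machine_types,
                         attempts=3, wait_seconds=900, sleep=time.sleep):
    """Try each machine type in each zone; wait and repeat if all are stocked out."""
    for attempt in range(attempts):
        # Preferred machine type first, across all zones, then the next type.
        for machine_type, zone in product(machine_types, zones):
            try:
                return create_vm(zone=zone, machine_type=machine_type)
            except StockoutError:
                continue  # try the next zone / machine type
        if attempt < attempts - 1:
            sleep(wait_seconds)  # capacity is dynamic, so wait before retrying
    raise StockoutError("no capacity found; consider filing a support case")
```

Passing sleep as a parameter keeps the function easy to test without real waiting. Only stockouts are retried; any other error from create_vm propagates immediately.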

4. Reservations

A reservation is a capacity fulfillment offering that provides Google Cloud customers very high assurance in obtaining capacity for their business critical workloads by “reserving” Google Cloud resources.

For example, with Compute Engine, you can reserve VMs in a specific zone. By default, a reservation is associated with a single project, but you can also create a “shared reservation” whose resources can be shared across projects, similar to a Shared VPC.

Reservations are billed at the same rate as the running resources they’re reserving, and as such, qualify for committed-use discounts and sustained-use discounts. Consider combining reservations with resource commitments to get discounted and reserved resources.

Please note that a VM instance can only use a reservation if its properties exactly match the properties of the reservation, including the region, zone, machine type, etc.
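The exact-match rule can be pictured as an equality check over a set of properties. The field list below is illustrative only and is an assumption; the reservations documentation defines which properties actually have to match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmProperties:
    # Illustrative fields only; see the reservations documentation for the real list.
    zone: str
    machine_type: str
    gpu_type: str = ""
    gpu_count: int = 0
    local_ssd_count: int = 0


def can_consume(vm: VmProperties, reservation: VmProperties) -> bool:
    """A VM can use a reservation only if every property matches exactly."""
    return vm == reservation  # dataclass equality compares all fields


if __name__ == "__main__":
    reserved = VmProperties(zone="us-central1-a", machine_type="n2-standard-8")
    same = VmProperties(zone="us-central1-a", machine_type="n2-standard-8")
    other_zone = VmProperties(zone="us-central1-b", machine_type="n2-standard-8")
    print(can_consume(same, reserved))        # True
    print(can_consume(other_zone, reserved))  # False
```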

Currently, reservations apply only to Compute Engine, Dataproc, and Google Kubernetes Engine VM usage.

Reservations are an especially good option when you need high assurance of resource availability for:

Business critical workloads that require specificity (machine shape, location, time)
New workload launch or migration
Peak traffic surge events (e.g. Black Friday, Cyber Monday, new product launch, etc.)
Protecting minimum capacity of managed instance groups (MIGs)
Guaranteed capacity during instance recreations (e.g. rolling updates)

Common use cases: Organic vs burst

Organic growth

Organic growth is when a workload grows gradually over time rather than in spikes. With organic growth, capacity planning should continue on an ongoing basis, both for existing workloads and for new workloads that are migrating to Google Cloud with steady growth patterns.

Organic growth is generally easier to handle when it comes to capacity planning and quota management, because it is predictable based on past trends.

Burst workloads

On the other hand, burst workloads represent a sudden increase in consumption that’s 5 to 20 times the normal volume. Burst workloads can include onboarding new SaaS customers, a disaster recovery exercise, experimentation or proofs of concept, and performance and load testing.

With burst workloads, it’s very important to start capacity planning and quota management as early as possible.
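To get a rough sense of scale, the 5x to 20x range can be applied to a baseline and compared with the quota you currently hold. The function names and numbers below are hypothetical.

```python
def burst_range(baseline, low=5, high=20):
    """Estimate burst demand as 5x to 20x the normal volume."""
    return baseline * low, baseline * high


def quota_shortfall(baseline, current_quota, multiplier):
    """How much more quota a burst of the given multiplier would require."""
    return max(0, baseline * multiplier - current_quota)


if __name__ == "__main__":
    print(burst_range(200))                # (1000, 4000)
    print(quota_shortfall(200, 1000, 5))   # 0
    print(quota_shortfall(200, 1000, 20))  # 3000
```

In this made-up case, a quota of 1,000 cores covers the low end of the burst range but falls 3,000 cores short of the high end, which is the kind of gap worth raising early.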

Resource summary

Google Cloud Technical Account Managers event series: Subscribe to events hosted by Technical Account Managers at Google Cloud, answering and addressing the most commonly-asked questions and topics from Cloud customers.
Event Management Service (EMS): Premium Support offers EMS for planned peak events, such as a product launch or major sales event. Contact your Technical Account Manager to get started.
Preparing for peak traffic events: Event recording to learn how to successfully plan for peak traffic and launch events to avoid any disruptions for your customers and your business.
About reservations
Working with quotas
Cloud Monitoring
Quota Monitoring Solution

Questions and answers

“I’m trying to create a managed notebook. Although I have a quota increase, I haven’t been able to create the notebook in any US regions and always get the same error:

Could not create instance: Quota limit 'GPUsA100PerProjectPerRegion' has been exceeded. Limit: 0 in region us-central1

How much quota do I need so I can use this type of machine?

a2-highgpu-1g (Accelerator Optimized: 1 NVIDIA Tesla A100 GPU, 12 vCPUs, 85GB RAM)”

This is an example where one service actually depends on two Google Cloud resources: 1) Notebooks (also known as Jupyter Notebooks), which run on a VM, and 2) GPUs.

You’ll need to make sure you have both quotas raised - one for the Notebook and another for the GPU. In this case, the error shows an A100 GPU limit of 0 in us-central1, so you need to raise the A100 GPU quota for region us-central1. The a2-highgpu-1g machine type has one A100 GPU, so the quota must be at least 1 for each such VM you plan to run. The A100 is one of our latest GPUs on the cloud and is often used for ML training.
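As a rough guide to the "how much quota" question, the regional GPU quota has to cover the number of GPUs per VM multiplied by the number of VMs, plus any GPUs already in use. The sketch below assumes the GPU counts implied by the A2 high-GPU machine type names; the function name is hypothetical.

```python
# GPUs attached to each machine type, as indicated by the "-Ng" suffix.
A2_GPUS_PER_VM = {
    "a2-highgpu-1g": 1,
    "a2-highgpu-2g": 2,
    "a2-highgpu-4g": 4,
    "a2-highgpu-8g": 8,
}


def required_gpu_quota(machine_type, vm_count, gpus_in_use=0):
    """Minimum regional A100 quota needed to run vm_count VMs of this type."""
    return gpus_in_use + A2_GPUS_PER_VM[machine_type] * vm_count


if __name__ == "__main__":
    print(required_gpu_quota("a2-highgpu-1g", 1))  # 1
    print(required_gpu_quota("a2-highgpu-4g", 3))  # 12
```

With a limit of 0, as in the error message, even a single a2-highgpu-1g VM cannot be created.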

After submitting the Quota Increase Request, you should get a response from Google Cloud within a few business days.

“I have a TPU VM that I last used a couple months ago. When I try and start it now, I get:

There is no more capacity in the zone "asia-east1-c"; you can try in another zone where Cloud TPU Nodes are offered (see https://cloud.google.com/tpu/docs/regions) [EID: 0xa4caf2ea47ece68e]

Is there any way to access the data? All the docs about migrating VMs don't seem to apply to TPU VMs.”

It sounds like you were able to use the TPU or “Tensor Processing Unit,” but after shutting it down, you weren’t able to relaunch it. This is a classic example of capacity being dynamic in the cloud.

This message says there are no TPUs currently available in asia-east1-c. This means zone “C” in region asia-east1.

You can wait some time and try again later, or you can try another zone if you need to stay in that region. The message contains the URL to check which other zones offer TPUs.

“Is there a way to create alerts when your quota is about to reach its limit?”

This is one of the best practices for quota management. You can create alerting policies in Cloud Monitoring (previously called Stackdriver) that check quota usage every minute or so. This will create an alert you can see in the Google Cloud console when you’re about to reach your limit.

You can further enhance this by integrating Slack using notification channels in Alerts under Cloud Monitoring. This way, any alert can be sent to your team within minutes of being triggered. This is common for SREs or infrastructure teams who own quota monitoring.

“Is there a history or logs you can check for Quotas/Limit changes? I’ve been checking for quite some time and can't see one. I need to know what quotas were changed on the project I’m currently handling for documentation.”

This sounds like auditing quota changes. To my knowledge, there is no way to use audit logs in Google Cloud to see when a quota was changed or who changed it.

You can control who has access to change a quota by limiting the IAM permissions a user has, following the IAM security best practice of “least privilege.” This is the concept of granting only the IAM permissions a person needs and nothing more. It is closely related to “separation of duties.”

One idea is to compare your project’s current quotas with those of a newly created Google Cloud project. Each Google Cloud project gets a default quota set for all services, so any differences show which quotas were changed. You could compare the quotas side by side in the console using two browser windows.
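If you copy the quota values from both projects into a simple mapping, the comparison can be automated. This is a sketch only: the function name quota_diff and the sample quota names and values are hypothetical, and how you collect the values is left open.

```python
def quota_diff(default_quotas, current_quotas):
    """Return {quota_name: (default_value, current_value)} for every difference."""
    changes = {}
    for name in sorted(set(default_quotas) | set(current_quotas)):
        before = default_quotas.get(name)  # None if missing in the new project
        after = current_quotas.get(name)   # None if missing in your project
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    defaults = {"CPUS us-central1": 24, "IN_USE_ADDRESSES us-central1": 8}
    current = {"CPUS us-central1": 500, "IN_USE_ADDRESSES us-central1": 8}
    print(quota_diff(defaults, current))  # {'CPUS us-central1': (24, 500)}
```

The result shows which quotas differ from the defaults, but not when they were changed or by whom.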

You can also use the Quota Monitoring Solution to export these as a report.

“I’m getting the following error while loading a file from Google Storage to BigQuery, but I can’t find where I exceeded any of the limits.

“Quota exceeded: Your table exceeded quota for imports or query appends per table.”

I’m using the PHP sdk loadFromStorage method. In the quotas interface, it doesn’t say that I exceeded anything, and we couldn't find any value that justifies reaching any quota. Any advice on troubleshooting or seeing where things are going wrong?”

Most likely you’re hitting the “Maximum number of table operations per day,” which is set to 1,500 updates per day, because the message says “imports or query appends per table.”
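A quick calculation shows how easily a frequent load job can reach this limit. The sketch below uses the 1,500 figure quoted above, which may change over time, and assumes every load into the table counts as one table operation; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def min_seconds_between_loads(daily_limit=1500):
    """Smallest average interval between loads that stays within the daily limit."""
    return SECONDS_PER_DAY / daily_limit


def loads_per_day(interval_seconds):
    """Number of loads per day when one load runs every interval_seconds."""
    return SECONDS_PER_DAY // interval_seconds


if __name__ == "__main__":
    print(min_seconds_between_loads())  # 57.6
    print(loads_per_day(30))            # 2880
    print(loads_per_day(30) > 1500)     # True
```

Loading into the same table every 30 seconds would attempt 2,880 operations per day, almost double the limit, so batching files into fewer load jobs is one way to stay under it.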

I would check the BigQuery quota troubleshooting page - there’s often a query you can run to get more information on this issue, or you can check Cloud Logging.

“Which tool or service can be used to monitor quota?”

You can do this directly with Cloud Monitoring: monitor quota usage and set alerting policies so you are notified when usage approaches a threshold. You can also leverage the Quota Monitoring Solution to manage quota usage across multiple Google Cloud projects.

“Do you have any services or tools to optimize quota/capacity in Anthos on-premises VMware clusters?”

We don’t have quotas set for on-premises resources except for Connect Gateway. In this case, customers typically monitor the deployments or pods in their clusters themselves to evaluate how much of the provisioned resources is actually consumed. Anthos or Kubernetes quotas are often confused with resource consumption on the Anthos cluster itself. (Please note that this is an amended and updated response from the live session.)