System Design: Storage Best Practices

Lauren_vdv · ‎12-13-2021

In this article, you'll find recommendations and best practices focused on the topic of Storage, as part of the System Design Pillar of the Google Cloud Architecture Framework.

Throughout this article, we often refer to the "Select and implement a storage strategy" documentation. We suggest you review it to learn the basic concepts before evaluating the following assessment questions and recommendations.

Storage type

How much and what types of storage do you require?

Select Cloud Storage when you want to store data at scale for a low cost and access performance is not a primary concern.
Select Persistent Disk or Local SSD when your compute applications need high-performance storage.
Select Filestore for high-performance workloads that need read/write access to shared space.
For high-performance computing (HPC) or high-throughput computing (HTC) workloads, refer to our documentation on using clusters for large-scale technical computing.

Do you need active or archival storage?

For data that will be served at a high rate with high availability, use the Standard Storage class (the legacy Multi-Regional Storage and Regional Storage classes are equivalent to it). For data that will be infrequently accessed and can tolerate slightly lower availability, use the Nearline Storage or Coldline Storage class. Review data retrieval and early deletion pricing before you decide on a storage class.
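
To see why early deletion pricing matters, here is a minimal sketch of how minimum storage durations affect what you are billed for. It assumes the published minimums of 30 days for Nearline Storage and 90 days for Coldline Storage and none for Standard Storage; the function and dictionary names are hypothetical, and you should confirm current values on the pricing page.

```python
# Assumed minimum storage durations in days; verify against current pricing docs.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90}


def billable_storage_days(storage_class: str, days_stored: int) -> int:
    """Return the number of days of storage billed for an object.

    Deleting or replacing an object before its class minimum is charged
    as if the object had been stored for the full minimum duration.
    """
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


# An object deleted from Nearline after 10 days is still billed for 30 days.
print(billable_storage_days("NEARLINE", 10))
```

If objects are routinely deleted or overwritten sooner than the minimum, a colder class can end up costing more than a warmer one despite its lower per-GB rate.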

Are you looking to host static objects for web hosting? Are you using Cloud CDN (Content Delivery Network)?

Use Cloud CDN to improve static object delivery. Cloud CDN uses Google’s global external HTTP(S) load balancer to provide routing, health checking, and anycast IP support. Refer to our documentation on setting up Cloud CDN with Cloud Storage buckets to learn more.

What location(s) and type of data protection do you require?

Regional protection is available by default: data is stored in at least two zones within the selected region.
Protection across regions comes in two types: multi-region and dual-region. For multi-region, data is stored in two or more regions within the broader geographic area you choose (for example, United States). For dual-region, data is stored in two specifically selected regions (for example, Tokyo and Osaka). Currently only select combinations of regions are available, with more customization options planned for the near future.
Refer to the Cloud Storage bucket locations documentation to learn more.

Storage access patterns and types of workloads

How do you plan on accessing your data?

How you access your data strongly influences how you design for system performance. Cloud Storage provides scalable storage, but isn’t an ideal choice when you’re running heavy compute workloads that need access to large amounts of data. For high-performance storage access, use Persistent Disk.

What are the object lifecycle operations ramp-up mechanisms?

Each Cloud Storage bucket is provisioned with an initial I/O capacity (see the request rate and access distribution guidelines for details). Plan a gradual ramp-up of request rates, and use exponential backoff in your retry logic to handle 5XX, 408, and 429 errors. For details, see the Reliability Pillar.
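
The retry recommendation can be sketched as follows. This is a minimal illustration, not how any Google client library implements retries: `request` is a hypothetical zero-argument callable that returns an HTTP status code, and the defaults (5 retries, 1 second base delay, 32 second cap, full jitter) are assumptions chosen for the example.

```python
import random
import time

RETRYABLE_STATUS = {408, 429}


def is_retryable(status: int) -> bool:
    """Return True for 408, 429, and any 5xx status code."""
    return status in RETRYABLE_STATUS or 500 <= status <= 599


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=random.random):
    """Yield one wait time in seconds per retry.

    The upper bound doubles on each attempt up to `cap`; the actual wait is
    a random fraction of that bound (jitter) so clients do not retry in step.
    """
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(request, max_retries=5, sleep=time.sleep, rng=random.random):
    """Call request() and retry with backoff while the status is retryable."""
    status = request()
    for delay in backoff_delays(max_retries, rng=rng):
        if not is_retryable(status):
            break
        sleep(delay)
        status = request()
    return status
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised without real waiting. In practice, check whether the client library you use already retries these errors before adding your own loop.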

Storage management

Do you store and process sensitive data? How do you monitor and manage access?

Make every bucket name unique across the entire Cloud Storage namespace. Do not include sensitive information in a bucket name and choose bucket and object names that are difficult to guess. Having entropy and randomness in bucket names, if possible, decreases the chance of hotspotting.
Ensure that your Cloud Storage bucket is not anonymously or publicly accessible.
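
One way to get hard-to-guess bucket names is to append a random suffix to a non-sensitive prefix. The sketch below is an assumption-laden example: `random_bucket_name` and its defaults are hypothetical, and the validation covers only a simplified subset of the bucket naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit).

```python
import re
import secrets

# Simplified subset of the bucket naming rules; the full rules have more cases.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_.-]{1,61}[a-z0-9]")


def random_bucket_name(prefix: str = "data", random_bytes: int = 8) -> str:
    """Return '<prefix>-<random hex>' using a cryptographically strong suffix."""
    suffix = secrets.token_hex(random_bytes)  # two hex characters per byte
    name = f"{prefix}-{suffix}"
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid bucket name under the simplified rules: {name!r}")
    return name


print(random_bucket_name("logs"))
```

Keep the prefix generic; a project code name or customer name in the prefix would defeat the purpose.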

What are your object naming conventions?

Randomized object names spread requests evenly across the key range, which gives you the best performance and avoids hotspotting. Use a longer, randomized prefix for your objects wherever possible.
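
When object names are naturally sequential (timestamps, counters), a common way to randomize the prefix is to prepend a short hash of the name. The sketch below assumes SHA-256 and a 6-character prefix; both choices and the function name `hashed_object_name` are illustrative, not a documented requirement.

```python
import hashlib


def hashed_object_name(name: str, prefix_len: int = 6) -> str:
    """Prepend the first `prefix_len` hex characters of a hash of the name.

    The prefix is derived from the name itself, so it can be recomputed
    later to locate the object without storing a separate lookup table.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{name}"


print(hashed_object_name("2021-12-13-00-00-01.log"))
```

A deterministic hash is used here instead of a purely random value so that readers can rebuild the full object name from the original one.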

Do you need to prevent data from being accessible to the public?

Use public access prevention to block public access at the organization, folder, project, or bucket level.

Do you want the requesting project to pay the access costs?

You can use the Requester Pays feature for Cloud Storage, along with appropriately set up billing projects, to charge the requester for operation, network, and data retrieval costs. The owner still needs to pay for any storage or deletion charges.

Key Google Cloud services

Cloud Storage: Object storage that’s secure, durable, and scalable
Persistent Disks: Reliable, high-performance block storage for virtual machine instances
Regional Disks: Durable storage and replication of data between two zones in the same region
Local SSD: Ephemeral, locally-attached block storage for virtual machines and containers
Filestore: High-performance, fully managed file storage
Cloud Storage for Firebase: Object storage for storing and serving user-generated content
Actifio GO: Backup, disaster recovery, migration, and test data management software as a service solutions

Resources

Cloud Storage resources

Migration resources

Persistent Disk resources

Block storage performance

What's next?

We've just covered Storage as part of the System Design Pillar of the Google Cloud Architecture Framework. There are several other topics within the System Design Pillar that may be of interest to you: