
Manage GKE Cluster Security with Autopilot Mode

shijimolak
Staff

Managed Kubernetes services have changed the way organizations adopt microservices-based architectures in the cloud. They let teams focus on what matters most (the application itself) rather than the nitty-gritty of managing the underlying Kubernetes infrastructure, of which security is often the most important concern.

Google Kubernetes Engine (GKE) is Google’s managed Kubernetes service, which offers two modes of operation: GKE Autopilot and GKE Standard. With GKE Standard, you get all the features of a managed Kubernetes service while retaining the flexibility to fully control the nodes and fine-tune your cluster.

GKE Autopilot, on the other hand, provides a more hands-free approach, where the scaling, monitoring, and configuration of the Kubernetes cluster infrastructure are handled by the service. Most importantly, it has several security features baked in from the get-go.

What's different with GKE Autopilot?

Before getting into the details of how GKE Autopilot enables container security, here is a brief overview of how GKE Autopilot differs from GKE Standard (a short sketch of creating an Autopilot cluster through the API follows the list):

  • Hands-free managed Kubernetes service where the entire underlying infrastructure, including the control plane, nodes, and node pools, is managed by GKE

  • Designed for production workloads and optimized for resource utilization and cost 

  • Node auto-provisioning and cluster autoscaling built in by default and managed by the cluster (i.e., it provisions and scales resources as required based on the Pod specifications)

  • Built-in auto repair feature to monitor the nodes for failures and repair them automatically

  • Node auto-upgrade, in step with control plane upgrades, to keep the Kubernetes cluster up to date

  • Nodes run only Container-Optimized OS with ‘containerd’ as the container runtime

  • Out-of-the-box integration with Cloud Operations for GKE for visibility into your workloads

  • Pre-configured with HTTP load balancing as an add-on

  • Pay-per-pod pricing based on the CPU, memory, and ephemeral storage requested by your pods

  • Autopilot pods deployed across multiple zones are backed by a 99.9% SLA, in addition to the SLA for the cluster control plane
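
For illustration, here is a minimal, hypothetical sketch of creating an Autopilot cluster programmatically with the google-cloud-container Python client. The project, region, and cluster names are placeholders, and a real rollout would typically also configure networking and a release channel; the same result can be achieved with a single gcloud container clusters create-auto command.

```python
# Hypothetical sketch: create an Autopilot cluster with the google-cloud-container
# client (pip install google-cloud-container). All identifiers are placeholders.
from google.cloud import container_v1


def create_autopilot_cluster(project: str, region: str, cluster_name: str):
    client = container_v1.ClusterManagerClient()
    cluster = container_v1.Cluster(
        name=cluster_name,
        autopilot=container_v1.Autopilot(enabled=True),  # request Autopilot mode
    )
    # Autopilot clusters are regional, so the location must be a region.
    parent = f"projects/{project}/locations/{region}"
    return client.create_cluster(parent=parent, cluster=cluster)


if __name__ == "__main__":
    operation = create_autopilot_cluster("my-project", "us-central1", "autopilot-demo")
    print(operation.name, operation.status)
```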

Security features of GKE Autopilot

In addition to the features mentioned above, GKE Autopilot packs a punch in terms of the security features pre-built into the service.

Workload Identity

Workload Identity is a secure way for workloads hosted in your GKE cluster to access Google Cloud services. It is built on a few identity constructs:

A workload identity pool is created for the project where the GKE cluster is deployed. A Kubernetes service account in a namespace can then be configured to use Workload Identity. Because that service account is associated with the workload identity pool, its credentials are trusted by IAM. Pods using the service account authenticate as an IAM service account that has the IAM policy bindings required to access the target Google Cloud APIs. The workload identity pool also lets applications in multiple clusters use the same IAM service account to access Google Cloud APIs, which simplifies identity management for trusted clusters in a project.
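
To make this concrete, here is a minimal sketch of what application code looks like when Workload Identity is in place: the Pod mounts no key file, and the Google Cloud client library resolves credentials automatically through Application Default Credentials. The Cloud Storage call is only an illustration and assumes the bound IAM service account has the necessary permissions.

```python
# Minimal sketch of application code running in a Pod that uses Workload Identity.
# No service account key is mounted; google-auth obtains short-lived credentials
# for the bound IAM service account from the GKE metadata server.
from google.cloud import storage  # pip install google-cloud-storage


def list_buckets():
    client = storage.Client()  # Application Default Credentials via Workload Identity
    for bucket in client.list_buckets():
        print(bucket.name)


if __name__ == "__main__":
    list_buckets()
```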

Workload Identity is enabled by default for GKE Autopilot clusters. Two other options that enable your workloads to access Google Cloud APIs are to:

  1. Mount service account keys as Kubernetes secrets, or

  2. Use the default Compute Engine service account

With the first alternative, there is a risk of attackers stealing the keys and using them to access other Google Cloud resources. These kinds of lateral-movement attacks often go unnoticed for a long time and can cause significant damage before they are detected. With the second alternative, the Compute Engine service account is shared by all workloads on a node, which is not a security best practice. With these factors in mind, Workload Identity provides a more secure option for applications in GKE to make authorized API calls to other Google Cloud services.

Learn more about configuring Workload Identity for your GKE clusters here, and how to apply security best practices for identity and access management here.  

Shielded Nodes

Shielded VMs in Google Cloud provide protection from root- or kernel-level malware and rootkits and ensure the integrity of VMs with features like Secure Boot, virtual Trusted Platform Module (vTPM), and integrity monitoring. Shielded GKE nodes are built on top of Shielded VMs.

Shielded GKE nodes thwart a popular attack vector: exploiting pod vulnerabilities to obtain node bootstrap credentials and impersonate nodes. With Shielded GKE nodes, the control plane verifies the integrity of each cluster node and its kubelet, ensuring that the node is part of the Managed Instance Group (MIG) for the GKE cluster and is hosted in a Google data center. Shielded GKE nodes are enabled by default for all Autopilot clusters and cannot be disabled.
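
As a quick sanity check, the setting can be read back through the GKE API. The sketch below uses the google-cloud-container Python client; the project, location, and cluster names are placeholders.

```python
# Sketch: read back whether Shielded GKE Nodes are enabled on a cluster.
from google.cloud import container_v1


def shielded_nodes_enabled(project: str, location: str, cluster_name: str) -> bool:
    client = container_v1.ClusterManagerClient()
    name = f"projects/{project}/locations/{location}/clusters/{cluster_name}"
    cluster = client.get_cluster(name=name)
    return bool(cluster.shielded_nodes.enabled)


if __name__ == "__main__":
    print(shielded_nodes_enabled("my-project", "us-central1", "autopilot-demo"))
```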

Secure Boot

Secure Boot uses signature verification of boot components so that only trusted, signed software is executed when the system boots. The UEFI firmware verifies the keys used to sign the software, and unsigned components are not allowed to run.

Secure Boot is pre-configured in GKE Autopilot clusters. It prevents the execution of unsigned third-party modules while the nodes boot.
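
Similarly, the Secure Boot and integrity monitoring settings are visible on each node pool's Shielded Instance configuration. The hypothetical check below reuses the same client and placeholder identifiers; verify against your own cluster how these fields are surfaced for Autopilot-managed node pools.

```python
# Sketch: print Secure Boot and integrity monitoring settings for each node pool.
from google.cloud import container_v1


def print_secure_boot_settings(project: str, location: str, cluster_name: str):
    client = container_v1.ClusterManagerClient()
    parent = f"projects/{project}/locations/{location}/clusters/{cluster_name}"
    for pool in client.list_node_pools(parent=parent).node_pools:
        shielded = pool.config.shielded_instance_config
        print(pool.name,
              "secure_boot:", shielded.enable_secure_boot,
              "integrity_monitoring:", shielded.enable_integrity_monitoring)
```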

Container runtime

While GKE Standard clusters can use multiple node images, such as Ubuntu, Windows Server, or Container-Optimized OS, with either the ‘containerd’ or Docker runtime, GKE Autopilot is pre-configured to use Container-Optimized OS with ‘containerd’ as the runtime. Compared to the Docker runtime, the containerd runtime is considered more secure and supports advanced features like gVisor (used by GKE Sandbox) that help prevent exploitation of kernel-level vulnerabilities.
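
One way to confirm this on a running cluster is to inspect the node status reported by the Kubernetes API. The sketch below uses the official Kubernetes Python client and assumes a kubeconfig for the cluster is available.

```python
# Sketch: print each node's OS image and container runtime version.
from kubernetes import client, config  # pip install kubernetes


def print_node_runtimes():
    config.load_kube_config()  # use config.load_incluster_config() when run in a Pod
    for node in client.CoreV1Api().list_node().items:
        info = node.status.node_info
        print(node.metadata.name, "|", info.os_image, "|", info.container_runtime_version)


if __name__ == "__main__":
    print_node_runtimes()
```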

Automated node upgrades

The GKE team upgrades the GKE cluster control plane automatically to the latest stable version of Kubernetes. Nodes in GKE Autopilot clusters are also upgraded automatically, along with the control plane, to ensure that they run the same version of Kubernetes. Along with eliminating the overhead of manually upgrading your nodes, node auto-upgrades also strengthen security by ensuring that any security fixes are applied without delay.

Note that GKE Standard also has node auto-upgrades enabled by default; however, that setting is configurable and node auto-upgrades can be disabled. With GKE Autopilot, node auto-upgrade is always enabled and cannot be turned off.
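
To see this in practice, the control plane and node pool versions, along with the auto-upgrade setting, can be read back from the cluster. As before, the identifiers in this sketch are placeholders.

```python
# Sketch: compare node pool versions and auto-upgrade settings with the control plane.
from google.cloud import container_v1


def print_upgrade_status(project: str, location: str, cluster_name: str):
    client = container_v1.ClusterManagerClient()
    name = f"projects/{project}/locations/{location}/clusters/{cluster_name}"
    cluster = client.get_cluster(name=name)
    print("control plane:", cluster.current_master_version)
    for pool in cluster.node_pools:
        print(pool.name, pool.version, "auto_upgrade:", pool.management.auto_upgrade)
```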

For additional best practices, refer to the Google Cloud Architecture Framework documentation on implementing compute and container security.

Summary

Among the different tenets of the Google Cloud Architecture Framework, security, privacy, and compliance is perhaps the most important one for organizations. Security breaches are costly: the repercussions include not only financial penalties and data loss but, more importantly, the loss of customer trust.

The out-of-the-box security features of GKE Autopilot help you deploy a production-ready hosting environment for your containerized workloads. In addition to the features we discussed here, the security of GKE workloads can be further bolstered by optional features such as customer-managed encryption keys, role-based access control (RBAC) configuration, and application-layer secrets encryption.

If you have any questions about container security or the security features of GKE, please feel free to leave a comment below.
