You've encountered a common issue with GKE Autopilot clusters automatically created by Cloud Composer 2: overly restrictive Pod Disruption Budgets (PDBs). Let's break down the problem and what you can do.
Understanding the Issue
- Pod Disruption Budgets (PDBs):
- PDBs are Kubernetes objects that limit the number of pods that can be voluntarily disrupted at any one time.
- They're designed to ensure application availability during planned disruptions, such as node maintenance or updates.
- Composer 2 and GKE Autopilot:
- When you create a Composer 2 environment, it automatically provisions a GKE Autopilot cluster to run Airflow tasks.
- The default PDBs that Composer 2 creates can be restrictive enough to block the pod evictions that GKE maintenance depends on.
- GKE Maintenance:
- GKE needs to perform periodic maintenance, including node updates and security patches.
- If PDBs are too restrictive, GKE cannot evict pods to perform maintenance, leading to the warning you're seeing.
- The Problem:
- The core problem is that these PDBs allow zero voluntary disruptions (for example, maxUnavailable set to 0, or minAvailable equal to the number of replicas), so GKE cannot evict any of the pods they cover.
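To make the arithmetic behind a blocking PDB concrete, here is a minimal shell sketch of how the number of allowed evictions follows from the budget. The function name allowed_disruptions and its arguments are hypothetical. It handles only integer values (not percentages) and mirrors the general Kubernetes rule, not anything specific to Composer:

```shell
# Hypothetical helper: allowed_disruptions HEALTHY DESIRED FIELD VALUE
#   HEALTHY - number of pods currently healthy
#   DESIRED - number of replicas the controller wants
#   FIELD   - maxUnavailable or minAvailable
#   VALUE   - integer value of that field (percentages are not handled)
allowed_disruptions() {
  healthy=$1
  desired=$2
  field=$3
  value=$4

  # Work out how many pods must stay healthy under the budget.
  case "$field" in
    maxUnavailable) required=$((desired - value)) ;;
    minAvailable)   required=$value ;;
    *) echo "unknown field: $field" >&2; return 1 ;;
  esac

  # Evictions are allowed only while healthy pods exceed the requirement.
  allowed=$((healthy - required))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

With a single healthy replica, allowed_disruptions 1 1 maxUnavailable 0 prints 0 (nothing can be evicted), while allowed_disruptions 1 1 maxUnavailable 1 prints 1.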
Why Google Suggests Ignoring (Temporarily)
- Google's recommendation to "ignore these warnings until the issue is fixed" likely stems from the following:
- They are aware of this behavior in Composer 2's Autopilot setups.
- They are working on a more permanent solution to automatically adjust PDBs or improve the default configuration.
- Changing the PDBs manually could cause unexpected issues with the Composer environment.
How to Resolve (With Caution)
While Google's advice is to wait, you can manually adjust the PDBs. However, proceed with caution and understand the potential risks.
Identify the PDBs:
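- Use kubectl get pdb -n <your-composer-namespace> to list the PDBs in your Composer environment's namespace.
- The namespace name varies by environment and is typically derived from the Composer and Airflow versions, not just the environment name. If you are unsure, run kubectl get pdb --all-namespaces to see every PDB together with its namespace.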
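If you would rather not guess the namespace, a small sketch like the following can list only the PDBs that currently allow no evictions. The function name list_blocking_pdbs is hypothetical, and it assumes kubectl is already pointed at the environment's cluster and that awk is available:

```shell
# Hypothetical helper: print namespace/name for every PDB whose
# status.disruptionsAllowed is currently 0.
list_blocking_pdbs() {
  kubectl get pdb --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' |
    awk '$3 == 0 { print $1 "/" $2 }'
}
```

Running list_blocking_pdbs prints one namespace/name pair per line, which gives you both the namespace and the PDB names to use in the next step.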
Edit the PDBs:
- Use kubectl edit pdb -n <your-composer-namespace> <pdb-name> to edit each PDB.
- Change the maxUnavailable or minAvailable values to allow for at least one pod eviction.
- For example, change maxUnavailable: 0 to maxUnavailable: 1.
- Or, change minAvailable: 1 to minAvailable: 0.
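- A PDB can set only one of maxUnavailable or minAvailable, so edit whichever field the PDB already uses.
- You can also apply the change non-interactively with kubectl patch.
- Example (this assumes the PDB is named airflow-redis-pdb and already uses maxUnavailable):
- kubectl patch pdb airflow-redis-pdb -n <your-composer-namespace> -p '{"spec":{"maxUnavailable": 1}}'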
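Because a PDB uses either maxUnavailable or minAvailable, a patch has to target the field that is actually set. The sketch below checks first and then patches accordingly. The function name relax_pdb is hypothetical. It only handles integer values, and it assumes that raising maxUnavailable from 0 to 1, or lowering minAvailable by one, is acceptable for the component in question:

```shell
# Hypothetical helper: relax_pdb NAMESPACE PDB_NAME
# Patches whichever budget field the PDB already uses so that one more
# pod may be evicted. Refuses when it cannot tell what to change.
relax_pdb() {
  ns=$1
  name=$2

  # Read both fields; the one that is not set comes back empty.
  max=$(kubectl get pdb "$name" -n "$ns" -o jsonpath='{.spec.maxUnavailable}')
  min=$(kubectl get pdb "$name" -n "$ns" -o jsonpath='{.spec.minAvailable}')

  if [ "$max" = "0" ]; then
    patch='{"spec":{"maxUnavailable":1}}'
  else
    case "$min" in
      ''|*[!0-9]*|0)
        # Empty, a percentage, or already 0: leave it for manual review.
        echo "$ns/$name: nothing to relax automatically" >&2
        return 1 ;;
      *)
        patch="{\"spec\":{\"minAvailable\":$((min - 1))}}" ;;
    esac
  fi

  kubectl patch pdb "$name" -n "$ns" --type merge -p "$patch"
}
```

For example, relax_pdb <your-composer-namespace> airflow-redis-pdb would issue the same maxUnavailable patch as above if that PDB currently has maxUnavailable: 0.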
Monitor GKE Maintenance:
- After adjusting the PDBs, monitor your GKE cluster to ensure that maintenance operations can proceed without issues.
- Keep an eye on the GKE console for any new warnings or errors.
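Besides the console, you can confirm from the command line that a PDB now permits at least one eviction. This is a minimal sketch that reads the PDB's status.disruptionsAllowed field. The function name pdb_allows_eviction is hypothetical:

```shell
# Hypothetical helper: pdb_allows_eviction NAMESPACE PDB_NAME
# Prints the current disruptionsAllowed value and returns success only
# when at least one pod may be evicted.
pdb_allows_eviction() {
  ns=$1
  name=$2
  allowed=$(kubectl get pdb "$name" -n "$ns" -o jsonpath='{.status.disruptionsAllowed}')
  echo "$ns/$name disruptionsAllowed=${allowed:-unknown}"
  [ "${allowed:-0}" -gt 0 ]
}
```

A call such as pdb_allows_eviction <your-composer-namespace> airflow-redis-pdb || echo "still blocking" gives a quick check after each change.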
Important Considerations
- Composer Updates:
- Be aware that future Composer environment updates might overwrite your manual PDB changes.
- You might need to reapply the changes after updates.
- Application Availability:
- Adjusting PDBs can affect the availability of your Airflow components.
- Ensure that your workloads, including any running Airflow tasks, can tolerate the pod restarts that maintenance may now trigger.