Why should you move from self-hosted Airflow to Google Cloud Composer?


What is Apache Airflow?

Apache Airflow is an open-source workflow orchestration and scheduling platform. Airflow is extensible, pluggable and highly scalable. It is developed purely in Python and therefore can be as versatile as the language itself.

Airflow has several components: a web server, a scheduler, workers and a metadata database. To learn more about Apache Airflow, see the official documentation.

 

What is Google Cloud Composer?

Cloud Composer is Google’s fully managed workflow orchestration solution built on Apache Airflow. It offers everything Airflow does, with Google managing the underlying infrastructure for you.

Cloud Composer runs on Google Kubernetes Engine (GKE) underneath. This lets Airflow take advantage of the benefits of a containerised system and Kubernetes.

 

Why should you consider moving to Cloud Composer?

There are several reasons one might consider moving to a managed solution for Airflow. Let’s go through them one by one.

Ease of Use

Self-hosting Apache Airflow involves many installation and configuration steps. There are many parameters to set while setting up Airflow, which makes it easy to lose track of them.

Google Cloud Composer is a fully managed service, which means you don’t have to worry about installing, configuring, and maintaining the underlying infrastructure. This can save you a lot of time and effort.
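To give a feel for the configuration surface mentioned above, here is a minimal sketch that parses a small excerpt of an airflow.cfg for a self-hosted Celery-based setup. The option names are real Airflow 2.x settings, but the hostnames, credentials and values are placeholders I made up for illustration, and a real deployment has many more options than these.

```python
import configparser

# Hypothetical excerpt of an Airflow 2.x airflow.cfg (Celery executor).
# Option names are real Airflow settings; hosts and values are placeholders.
AIRFLOW_CFG = """
[core]
executor = CeleryExecutor
parallelism = 32

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:CHANGE_ME@db.example.internal/airflow

[celery]
broker_url = redis://redis.example.internal:6379/0
result_backend = db+postgresql://airflow:CHANGE_ME@db.example.internal/airflow
worker_concurrency = 16

[webserver]
web_server_port = 8080
"""


def summarize(cfg_text):
    """Return a mapping of section name to the sorted option names it contains."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {section: sorted(parser[section]) for section in parser.sections()}


settings_by_section = summarize(AIRFLOW_CFG)
for section, options in settings_by_section.items():
    print(f"[{section}] {', '.join(options)}")
```

Even this short excerpt touches four components (core, database, Celery workers, web server), each of which you would have to provision and keep in sync yourself when self-hosting.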

Scalability

Each component of Airflow needs to be scaled independently to work properly. With a self-hosted solution, all of that scaling has to be done by you or by a dedicated team in your organisation.

Google Cloud Composer can automatically scale up and down based on the workload demands. This is very helpful when you have varying demands. You, as a user, will only have to take care of your workloads.
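As a rough illustration of what "scaling with workload demands" means, the sketch below computes how many workers a given backlog of tasks would call for, clamped between a minimum and a maximum. This is a simplified model of my own, not Cloud Composer's actual autoscaling algorithm; the function name, parameters and numbers are all hypothetical.

```python
import math


def workers_needed(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Simplified model: enough workers to run all pending tasks at once,
    kept within the configured [min_workers, max_workers] range."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, wanted))


# Hypothetical environment: each worker runs 16 tasks, 1 to 6 workers allowed.
for backlog in (0, 40, 500):
    print(backlog, "pending tasks ->", workers_needed(backlog, 16, 1, 6), "workers")
```

In a self-hosted setup you would have to run this kind of calculation, and act on it, yourself; a managed service does the equivalent for you within the limits you configure.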

Integration with Google Cloud

Google Cloud Composer integrates well with other GCP services such as BigQuery, Cloud Functions and IAM (for Airflow role-based access control). This makes it easier to build and deploy complex workflows.

Cost

Running Apache Airflow on your own infrastructure can be expensive, especially if you have large and complex workloads. By moving to Google Cloud Composer, you can take advantage of Google’s economies of scale and pay only for what you use.

 

A few other things to know:

  • Since Cloud Composer is built on the open-source Apache Airflow, there’s no vendor lock-in and you can move your workloads elsewhere anytime.
  • It is easy to have a hybrid or multi-cloud setup.
  • If you are already considering moving from self-hosted Apache Airflow to a managed solution, Cloud Composer lets you lift and shift your workloads easily, with no developer retraining involved.

Overall, moving from self-hosted Airflow to Google Cloud Composer can be a good solution if you want to simplify the management of your workloads, reduce operational overhead and take advantage of the benefits of a fully managed service.

2 REPLIES

JAG
Bronze 1

Easy to say, but the management of permissions is totally out of control. There's nothing easy about this environment.

I agree with @devashishpatil, as I have worked on both Airflow and Composer, as well as on a migration from Airflow to Composer.

Here are some improvements I observed after migrating from Airflow to Google Cloud Composer.

1) Long-running DAGs and parallel DAG runs

When we were using Airflow, we had a lot of issues at peak times, such as monthly processing and quarterly loads, when DAGs used to get stuck or run for a long time. We observed an improvement after moving to Cloud Composer.

2) Webserver performance

For example, when around 50-100 DAGs were triggered at 12:30 PM, we used to have issues triggering multiple DAGs, or multiple instances of the same DAG for different file systems. This has also improved, thanks to autoscaling in the GKE cluster that hosts the Cloud Composer environment.