In this short article we are going to explore how the Apigee environment and environment group constructs can be used to implement staged rollouts of proxy and shared flow deployments.
For a long time, Apigee’s active-active multi-cluster design has been used to ensure high availability and resilience in the case of partial unavailability of the runtime environment. The assignment of environments to regions is generally an architectural concern that is outside of the control of an API developer. API developers themselves usually apply automation tooling and different SDLC stages to propagate their API proxies up to production. The centralized control plane in Apigee also means that API developers are not exposed to the multi-region or multi-cluster setup behind the environments and that deployments to an environment are automatically rolled out to all the instances of that environment on all clusters. On the consumption side, consumers address the APIs using a single hostname and path. The multi-region nature of an environment is again transparent to them and can be changed without requiring any change on the API consumer’s side.
Whilst this makes reasoning about environments very easy from an API developer’s perspective, it reduces their control and flexibility over how the rollout of their deployments to the different instances is performed. This matters for some customers whose internal processes require staged rollouts of deployments to production, even if their releases were promoted through different SDLC stages before reaching production. This rollout should still be transparent to the API consumer and provide consumers with a single entrypoint, as with a traditional active-active deployment topology. The goal of this article is to describe how both of these requirements can be met using the default Apigee constructs of environments and environment groups, together with simple Layer 7 (HTTP) rewrite logic.
Apigee environments (at the time of writing this article) allow a developer to deploy only a single revision of an API proxy at a time. Environments do, of course, gradually and seamlessly migrate traffic from one revision to the next, but as a developer you cannot use the same environment to manually control the rollout of your proxy revisions. If you want more control over your deployment rollouts, you can split your logical environment into multiple sub-environments. For the purpose of this article we will refer to these environments as “blue” and “green”. These blue and green environments can be used in the classical sense of a blue/green deployment or, more generically, to put the API developer in control of the rollout of API proxy deployments.
Because we want to be able to use an immutable revision of an API proxy in one or both of the blue and green environments, the base path of that proxy cannot change depending on the environment. To be able to address the two deployments of the proxy individually during a rollout, we have to put the blue and green environments in different environment groups. This is required because Apigee can only route to one environment for a specific base path within an environment group, and because environment groups cannot use hostnames that are already used by another environment group.
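For Apigee hybrid, this split could be reflected in the hybrid overrides.yaml roughly as follows. This is a minimal sketch, assuming environments and environment groups named blue and green, the hostnames blue.api.example.com and green.api.example.com attached to the respective groups in the Apigee organization, and placeholder certificate paths:
envs:
- name: blue
- name: green
virtualhosts:
# each virtualhost name must match the name of its environment group
- name: blue
  sslCertPath: ./certs/blue-api-example-com-cert.pem
  sslKeyPath: ./certs/blue-api-example-com-key.pem
- name: green
  sslCertPath: ./certs/green-api-example-com-cert.pem
  sslKeyPath: ./certs/green-api-example-com-key.pem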
If we stopped here, we would have achieved the first requirement of allowing an API developer to perform a staged rollout. However, the blue/green environment abstraction leaks to the consumers, who would have to decide which one to target.
Whilst in the above example an API developer can selectively address the blue or green environment, consumers would have to know about this implementation-level detail in order to call the APIs. This isn’t particularly nice and creates unnecessarily tight coupling between the exposure and consumption sides of the API. A better approach would be to provide a facade that picks the correct blue or green environment without exposing their existence to the API consumer.
In a previous blog post we took an in-depth look at the internal routing of Apigee hybrid and how Istio routing constructs are used to route requests to the right environment group (based on the host header) and environment (based on the base path). This understanding is important for building a facade that provides a unifying hostname for the underlying blue and green environments.
The host-based routing in particular explains why a DNS-based approach cannot be used to send traffic from api.example.com to blue.api.example.com. Whilst you could use a CNAME record to define api.example.com as an alias for blue.api.example.com, the Host header of the request would still contain api.example.com, which doesn’t match any of your environment groups’ hostnames, so Apigee would return a 404 Not Found response.
To solve this, the host of the incoming request needs to be rewritten to the blue or green environment group’s hostname before it reaches the Apigee routing path.
If you are running Apigee hybrid and your cluster also has Istio installed, you could also leverage the Istio constructs to rewrite the incoming request and send it to the correct environment. This way you do not have to maintain another routing component that sits in front of your runtime cluster.
We would create a new Gateway resource to accept traffic for api.example.com:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: toggle-gateway-apigee
  namespace: apigee
spec:
  selector:
    app: istio-ingressgateway
  servers:
  - hosts:
    - 'api.example.com'
    port:
      name: https-apigee-443
      number: 443
      protocol: HTTPS
    tls:
      credentialName: tls-hybrid-ingress
      mode: SIMPLE
And a VirtualService that is in charge of defining whether the incoming request should be rewritten to the blue or the green hostname:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: blue-green-toggle
  namespace: apigee
spec:
  gateways:
  - toggle-gateway-apigee
  hosts:
  - api.example.com
  http:
  - route:
    - destination:
        host: blue.api.example.com
    rewrite:
      authority: blue.api.example.com
Because the internal Apigee routing constructs are automatically configured by Apigee, and the operator of the Apigee platform only controls the hostnames for the environment groups, we create a dedicated ServiceEntry and DestinationRule for the blue hostname:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: blue-api
  namespace: apigee
spec:
  hosts:
  - blue.api.example.com
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
  location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: blue-dr
  namespace: apigee
spec:
  host: "blue.api.example.com"
  trafficPolicy:
    tls:
      mode: SIMPLE
      sni: "blue.api.example.com"
The snippets above can trivially be adapted to support the green environment on another cluster, or to use the VirtualService to switch or split traffic between blue and green within the same cluster, as sketched below.
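For illustration, a weighted split within a single cluster could look roughly like the following sketch. It assumes that a matching ServiceEntry and DestinationRule also exist for green.api.example.com, and it uses per-destination request header rewrites instead of a single rewrite block; verify that this behaves as expected in your Istio version before relying on it:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: blue-green-toggle
  namespace: apigee
spec:
  gateways:
  - toggle-gateway-apigee
  hosts:
  - api.example.com
  http:
  - route:
    - destination:
        host: blue.api.example.com
      weight: 90
      # rewrite the host per destination so the request matches the
      # hostname of the environment group it is routed to
      headers:
        request:
          set:
            Host: blue.api.example.com
    - destination:
        host: green.api.example.com
      weight: 10
      headers:
        request:
          set:
            Host: green.api.example.com
Shifting traffic then only means changing the weights; setting one of them to 100 effectively turns the VirtualService back into the hard blue/green toggle shown earlier.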
For a two cluster setup, the final architecture for Apigee hybrid looks as follows:
The solid blue and green borders represent routing components that are automatically managed by Apigee hybrid. The other components represent the additional re-write components as described above.
As the blue and green environments are completely separate entities, they do not share any environment-level resources such as KVMs, caches, or target servers. This is in line with the initial goal of performing staged rollouts and putting API developers in control of which proxy revision is deployed where, but it puts an additional operational burden on them if managed manually, for example ensuring that both environments are added to an API product. As general guidance, we definitely recommend automating the rollout to blue and green to reduce the risk of manual misconfiguration as much as possible.
From an analytics and monitoring point of view the blue and green environments will also be reported separately even though they correspond to the same logical environment. Depending on your reporting requirements the separate environment metrics would have to be merged to obtain a more complete picture.
Lastly, the additional blue and green environments are subject to the documented environment limits and will create additional compute resource requirements or require additional license entitlements.
The topic of staged deployment rollouts has been discussed in the context of Apigee for a long time.
Alternative approaches that allow more granular traffic routing or randomized traffic splitting are described in this article. There is also an often-cited reference for using an Apigee JavaScript policy to select backends based on a weighted split, which can be applied in this case as well. The main difference to the approach described here is that the routing is done within a special Apigee proxy, and that the blue and green proxies are logically different proxies that are required to have different base paths, which usually requires a change to the proxy bundle. The advantage of doing a blue/green deployment within the same environment is that this pattern does not require additional environment entitlements (in Apigee X) or separate runtime replicas (in Apigee hybrid).