Apigee hybrid and X - The sweet spot of multi-tenancy for large enterprises

ncardace · 02-25-2022

The aim of this article is to introduce the reader to some of the core principles behind the design of Apigee hybrid and Apigee X that allow their flexible adoption by large multinational enterprises. Our customers very often come forward with a common pattern of legal, procedural and compliance requirements that must be translated into architectural, security and topological decisions.

Enterprises and their executives planning an external API exposure (to the public internet or to business partners) or an internal one (to corporate functions) often consult our solutions architects with a blend of unique requirements and common patterns that are peculiar to their industry, to the regulations they are subject to and to the specific trade bodies they need to comply with.

The Apigee hybrid and Apigee X runtime plane deployment options have been designed around models that allow the final deployments to be generally compliant with these requirements, from both a physical and a logical point of view.

We often start new customer engagements by walking the solutions architects through the definition of these logical models; this allows us to plan the best target deployment and choose the models that match the required API exposure.

This article also guides readers not yet familiar with this particular abstraction approach through the official documentation and the definitions of the few concepts required. Read them in sequence, as each concept builds on the previous ones.

What is an Apigee organization?

An Apigee organization (not to be confused with a GCP organization, the root node in the Google Cloud resource hierarchy) is the entity with the highest level of abstraction in the Apigee API gateway. In this article, unless otherwise stated, "organization" or its short form "org" always refers to an Apigee organization. It contains all the API proxies and their related resources, the definitions of API products with their corresponding configurations, quotas and attributes, plus all the API monetization plans and rate cards, where applicable. The organization's users and roles, from organization administrators with the highest privileges down to less privileged accounts, are managed through dedicated Cloud IAM Apigee roles.

https://cloud.google.com/apigee/docs/api-platform/fundamentals/organization-structure

What is an Apigee instance?

With the term "instance" we refer to each single Kubernetes cluster that hosts the Apigee runtime components and is registered with the Apigee management plane in Google Cloud. The key concept is that a single Apigee organization (and hence all the APIs within it) can be designed to be multi-regional for business reasons, compliance requirements, operational needs or many other reasons. Apigee hybrid and X handle multi-regional deployments seamlessly through a single, unified logical control plane (always hosted in Google Cloud).

This topology does not need to be set in stone: it can grow over time to include additional cloud regions (a multi-regional hybrid or X deployment), additional cloud providers (hybrid) and even a mix of cloud providers and private data centers (hybrid, also leveraging hybrid connectivity). For the avoidance of doubt, you can't mix hybrid regions with X regions and manage them at the same time with the same set of control plane credentials. Once you select the fully managed (SaaS) deployment model of Apigee X, you won't be able to control the corresponding cluster configurations, as Google Cloud operates and monitors these clusters not in your projects but in segregated projects that your enterprise cannot access. You can create either multi-regional hybrid or multi-regional X deployments, but not a mix of the two. As an enterprise you can still have your APIs deployed across both Apigee products, but the two platforms will then be hosted in two distinct organizations and won't share any runtime data or API traffic flows.

One fundamental principle to remember is that Apigee X provides managed runtime planes that are fully transparent to operators in terms of software stack management, monitoring and capacity planning. Both Apigee X and Apigee hybrid allow a customer to start with a deployment in one region and expand later to multiple regions (X), or to multiple regions, cloud providers and data centers (hybrid). This is the first degree of freedom: where needed, the same APIs can be exposed to different networks, network segments and geographies, and even to different cloud providers and customer-owned physical infrastructure, while the API management lifecycle is still managed from the same control plane that oversees the growing collection of runtime planes.

What is an environment?

We have comprehensive documentation describing what an environment is and its overall architectural role, down to implementation details, together with a dedicated resource that explains how to work with one or multiple environments and why the product supports multiple environments. In short, an environment is a runtime execution context for the API proxies of an organization: a proxy must be deployed to an environment before it can serve traffic. Environments are one of the first key aspects of multi-tenancy.

You will find a first overview of the environments in this document; this additional article goes into more details and explains how to work with environments.

What is an environment group?

Now that you are familiar with the definition of environments, the next Apigee-specific construct, present in both hybrid and X, is the environment group. As the name implies, an environment group lets you group multiple environments (note that, when needed, one environment can belong to multiple environment groups at the same time). In addition, environment groups carry the association with the hostnames (virtual hosts) through which the organization is exposed, and therefore play the fundamental role of making APIs reachable from outside the Apigee instance (or instances) they are associated with.

Working with environment groups is described in detail here. For a technical deep dive into how an API call is handled once it reaches an Apigee instance, refer to this resource. Assigning multiple hostnames to a single environment group is allowed, and is another dimension to consider when designing multi-tenancy within an individual Apigee instance, whether hybrid or X.

What is a region? How does it differ from an Apigee instance?

A cloud region is, with some generalization, roughly equivalent to a data center; in reality, a zone (or availability zone) corresponds more closely to an actual data center. Regardless of the target infrastructure and provider, Kubernetes clusters can be either regional or zonal, depending on whether their resources are drawn from a whole region (hence multiple zones) or from a single zone; a zone is therefore the smallest failure domain in a given region. An Apigee hybrid or X runtime plane can span multiple cloud regions: for X these can only be Google Cloud regions, while hybrid has no such restriction. However, each individual Apigee instance is deployed in only one cloud region, because a Kubernetes cluster cannot stretch across resources in multiple regions. Growing from one region to several therefore means adding an Apigee instance in each new region.
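These topology rules can be captured as a small planning check. The sketch below is a hypothetical model written for illustration, not an Apigee API: the `Instance` and `Organization` classes, the `validate_topology` and `regions` helpers and the provider labels are all assumptions. It encodes only the constraints stated above: one region per instance, no mixing of hybrid and X instances in one organization, and X limited to Google Cloud regions.

```python
from dataclasses import dataclass, field


# Hypothetical planning model, not an Apigee API: names and provider labels are illustrative.
@dataclass(frozen=True)
class Instance:
    """One Apigee instance: a single runtime cluster in exactly one region."""
    name: str
    product: str   # "hybrid" or "x"
    provider: str  # illustrative labels such as "gcp", "aws", "on-prem"
    region: str


@dataclass
class Organization:
    name: str
    instances: list[Instance] = field(default_factory=list)


def validate_topology(org: Organization) -> list[str]:
    """Return rule violations for the organization's runtime topology (empty if valid)."""
    errors: list[str] = []
    # Hybrid and X instances cannot be mixed in the same organization.
    if len({inst.product for inst in org.instances}) > 1:
        errors.append(f"{org.name}: cannot mix hybrid and X instances")
    for inst in org.instances:
        # X runtimes are Google-managed and only run in Google Cloud regions.
        if inst.product == "x" and inst.provider != "gcp":
            errors.append(f"{inst.name}: Apigee X instances run only in Google Cloud regions")
    return errors


def regions(org: Organization) -> set[str]:
    """Regions covered by the organization; each instance lives in exactly one."""
    return {inst.region for inst in org.instances}


if __name__ == "__main__":
    org = Organization("acme-prod", [
        Instance("eu-1", "hybrid", "gcp", "europe-west1"),
        Instance("us-1", "hybrid", "aws", "us-east-1"),
    ])
    print(validate_topology(org))  # []
    print(sorted(regions(org)))    # ['europe-west1', 'us-east-1']
```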

A definition and key points of multi-tenancy

The following definitions help to clarify to what extent Apigee implements multi-tenancy.

Organizations:

Can contain multiple environment groups
Must have at least one environment group

Environments:

Must be in at least one environment group
Can be in more than one group
Share hostnames with all other environments in the same group

Environment Groups:

Can have multiple hostnames
Contain one or more environments
Hostnames assigned to a group must be unique to that group (they cannot be used by other groups)
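A minimal sketch of these rules as a validation function, assuming a simple in-memory description of an organization. The `OrgConfig` and `EnvGroup` classes, the helpers and the `example.com` hostnames are hypothetical and do not mirror the Apigee management API resources. The sketch also derives the hostnames on which an environment is reachable, which follows from environments sharing the hostnames of every group they belong to.

```python
from dataclasses import dataclass, field


@dataclass
class EnvGroup:
    name: str
    hostnames: set[str]
    environments: set[str]


@dataclass
class OrgConfig:
    name: str
    environments: set[str]
    groups: list[EnvGroup] = field(default_factory=list)


def check_rules(org: OrgConfig) -> list[str]:
    """Check the organization / environment / environment group rules listed above."""
    errors: list[str] = []
    if not org.groups:
        errors.append(f"{org.name}: at least one environment group is required")
    owner: dict[str, str] = {}  # hostname -> first group that claimed it
    grouped: set[str] = set()   # environments that appear in at least one group
    for group in org.groups:
        if not group.environments:
            errors.append(f"{group.name}: must contain one or more environments")
        unknown = group.environments - org.environments
        if unknown:
            errors.append(f"{group.name}: unknown environments {sorted(unknown)}")
        grouped |= group.environments
        for host in sorted(group.hostnames):
            # A hostname may belong to one group only.
            if host in owner:
                errors.append(f"{host}: already assigned to group {owner[host]}")
            else:
                owner[host] = group.name
    # Every environment must be in at least one group (it may be in several).
    for env in sorted(org.environments - grouped):
        errors.append(f"{env}: must belong to at least one environment group")
    return errors


def hostnames_for(org: OrgConfig, env: str) -> set[str]:
    """An environment is reachable on every hostname of every group it belongs to."""
    return set().union(*(g.hostnames for g in org.groups if env in g.environments))


if __name__ == "__main__":
    org = OrgConfig("acme-nonprod", {"dev", "test", "shared"}, [
        EnvGroup("internal", {"api.internal.example.com"}, {"dev", "test", "shared"}),
        EnvGroup("partners", {"partners.example.com", "b2b.example.com"}, {"shared"}),
    ])
    print(check_rules(org))                      # []
    print(sorted(hostnames_for(org, "shared")))  # ['api.internal.example.com', 'b2b.example.com', 'partners.example.com']
```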

Now that we are familiar with the key concepts, the following diagrams show how to achieve the desired level of multi-tenancy (within a set of overall limits) across multiple business units of the same enterprise.

The above logical partitioning shows that:

Production and non-production traffic can be segregated into separate clusters, belonging to different organizations and potentially spanning multiple regions. API traffic belonging to two different organizations is handled by completely separate clusters; this is often a compliance requirement in many industries.
Environments can be attached to runtime plane components deployed in a subset of the regions, or in all the regions, where a given organization has been created. For example, the diagram shows three sandbox environments in the non-production organization that are present in only one region, even though that organization is deployed to two regions.
Because each environment is served by dedicated runtime components in the corresponding Kubernetes cluster, API proxies can be effectively partitioned into separate environments, and each environment can be configured with its own pod sizing and autoscaling properties, both vertical and horizontal.

We document many common ways in which environments are structured within environment groups.

The above diagram highlights a few key points relevant to designing the right multi-tenancy approach:

An environment group holds one or multiple environments
At least one environment group is necessary to expose APIs from each runtime cluster; it is the environment group that one or more hostnames (virtual hosts) are associated with to provide this exposure.
An environment group can be exposed in a subset of regions or all the regions in which the organization has active runtimes.
Different organizations can have a completely different mapping logic and number of environments and environment groups, provided that they do not exceed the published limits.
Multi-tenancy can effectively leverage this mechanism, which is dynamic: new organizations can be added, and new environments can be added to existing organizations and then attached to existing or newly created environment groups.
It is a matter of choice how many separate production and non-production organizations are needed to map current and future enterprise needs. Once this is chosen, the number of environment groups generally corresponds to the different (network) exposure domains; one environment group can still carry multiple FQDNs.
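To make the role of environment groups concrete, the following sketch models how a request could be resolved to an environment: the Host header selects the environment group, the request path selects a proxy base path deployed in one of the group's environments, and that environment must be attached to the instance in the receiving region. This is a simplified, hypothetical model (all class and function names are illustrative); it assumes longest-prefix matching on base paths and that a given base path is deployed to only one environment per group, and it ignores details such as TLS and load balancing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    environment: str
    base_path: str  # e.g. "/orders"


@dataclass
class Group:
    hostnames: set[str]
    deployments: list[Deployment]


def matches(base_path: str, path: str) -> bool:
    """True if path equals base_path or lies underneath it."""
    prefix = base_path.rstrip("/") + "/"
    return path == base_path or path.startswith(prefix)


def route(groups: dict[str, Group], attached: dict[str, set[str]],
          region: str, host: str, path: str) -> Optional[str]:
    """Return the environment that would serve the request, or None.

    attached maps each region to the environments attached to the instance there.
    """
    # 1. The Host header selects the environment group (hostnames are unique per group).
    group = next((g for g in groups.values() if host in g.hostnames), None)
    if group is None:
        return None
    # 2. Within the group, the longest matching base path selects the deployment.
    candidates = [d for d in group.deployments if matches(d.base_path, path)]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: len(d.base_path))
    # 3. The environment must be attached to the instance in the receiving region.
    return best.environment if best.environment in attached.get(region, set()) else None


if __name__ == "__main__":
    groups = {"public": Group({"api.example.com"}, [Deployment("prod", "/orders")])}
    attached = {"europe-west1": {"prod"}, "us-east1": set()}
    print(route(groups, attached, "europe-west1", "api.example.com", "/orders/42"))  # prod
    print(route(groups, attached, "us-east1", "api.example.com", "/orders/42"))      # None
```

In this model, exposing an environment group in only a subset of regions simply means attaching its environments only to the instances in those regions.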

Decision dimensions and multi-tenancy boundaries

There are four dimensions to take into account when designing for multi-tenancy.

Product license limits
Product limits
Permissions, RBAC management at the desired granularity levels
Cost considerations

Product license limits

Each Apigee X license comes with a negotiated allowance on the total number of organizations and environments. Each Apigee Enterprise (or Enterprise Plus) license entitles the customer to create up to a maximum number of organizations (regardless of the number of regions and independent clusters; expansion packs are available if more are needed), and is bound by a set of published limits on how many environments and other resources can be created within them. The licenses also include an allowance on the total volume of API calls. These limits shape the deployment options. They are generally negotiated once the enterprise has an initial understanding of its API exposure footprint, and are usually extended over time as the API programme grows in reach, in the number of business units involved and in API traffic volume, a growth that usually reflects the extent of adoption and utilization. Since call volume limits are controlled from the Google Cloud control plane, they can be adjusted and updated at short notice.

Product limits

We document a set of product limits that sometimes drive the choice of API exposure patterns; they can be found in our public documentation here. We encourage you to treat all limits as general guidance, best practice and additional constraints, even those marked as "not currently enforced"; some limits are checked during operations and are strictly enforced. These limits are meant to guarantee fair usage and protect from abuse, and they are often relevant in guiding the best deployment choices.

The two most important sets of limits, which will guide you in deciding the deployment strategy and the total number of environments that will need to be licensed, are those in the following sections:

Section "Environment and Organization"
Section "API proxies"

Permissions

Both Apigee hybrid and X benefit from close integration with role-based access control in Google Cloud, i.e. Cloud Identity and Access Management (Cloud IAM), which has been designed to be granular. This is possible because the control plane of both products is embedded in Google Cloud. Detailed documentation can be accessed here; access can be managed both from the Cloud Console UI and via the management APIs, as described in the documentation. The finest-grained control is available at the level of the Apigee organization and its environments.

Note that because an environment is considered a resource underneath the Cloud project, permissions set on the Cloud project apply to all environments unless you specify more fine-grained permissions on the environment. The permission model governing a development team's access to its API proxies is the same whether the proxies are manipulated via the management APIs (apigee.googleapis.com) or the Apigee management UI (https://apigee.google.com); the two cannot be split so that permissions are enforced differently.

The two most widely adopted approaches are the following:

One organization per Business Unit (BU): each BU's developer team has control over, and access to, only its own organization(s).

One organization per SDLC phase: each API developer can view all API proxies but can deploy them only to the environments they have been granted access to (deployment requires additional permissions).
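The second approach relies on how permissions are inherited. Cloud IAM allow policies are additive: a role granted on the project applies to every environment, and a role granted on an environment adds to, but never removes, what the project already grants. The sketch below models that inheritance with plain dictionaries. The helper names are hypothetical, and the set of roles treated as deploy-capable (`DEPLOY_ROLES`) is an illustrative assumption, so check the Apigee roles reference before relying on it.

```python
# member -> roles granted to that member at one resource level
Bindings = dict[str, set[str]]

# Illustrative assumption: roles treated as able to deploy proxies in this sketch.
DEPLOY_ROLES = {"roles/apigee.environmentAdmin", "roles/apigee.apiAdminV2"}


def effective_roles(project: Bindings, env_policies: dict[str, Bindings],
                    member: str, environment: str) -> set[str]:
    """Project grants are inherited by every environment; environment grants only add."""
    inherited = project.get(member, set())
    local = env_policies.get(environment, {}).get(member, set())
    return inherited | local


def can_deploy(project: Bindings, env_policies: dict[str, Bindings],
               member: str, environment: str) -> bool:
    """True if any effective role for the member in this environment is deploy-capable."""
    return bool(effective_roles(project, env_policies, member, environment) & DEPLOY_ROLES)


if __name__ == "__main__":
    dev = "user:dev@example.com"
    project = {dev: {"roles/apigee.readOnlyAdmin"}}           # can view all proxies
    envs = {"test": {dev: {"roles/apigee.environmentAdmin"}}}  # can deploy to "test" only
    print(can_deploy(project, envs, dev, "test"))  # True
    print(can_deploy(project, envs, dev, "prod"))  # False
```

This is why, in the one-organization-per-SDLC-phase model, a read-only role is typically granted on the project and deploy-capable roles only on specific environments: a deploy-capable role granted at the project level would reach every environment.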

Monitoring considerations

While Apigee X runtimes don't belong to the customer's projects, as they are fully managed within Google's own projects, every other network component and additional GCP service, for example Cloud Logging, is still part of the customer's projects.

If a customer has deployed multiple Apigee hybrid and/or X organizations, a useful Cloud Monitoring feature (a metrics scope) makes the metrics of multiple projects accessible in one unified place from the same console. Metrics from multiple projects can be accessed by following the instructions provided in this document.

Cost considerations

Cost has two components: Apigee licensing costs, and everything else, i.e. costs not related to Apigee licensing but more broadly to platform utilization.

When assessing the full TCO (total cost of ownership) neither of the two can be ignored.

Apigee license costs are driven by the total number of organizations (note that for multi-regional organizations, each region adds one unit), the number of environments (similarly, an environment spanning more than one region is counted once per region) and the desired API runtime SLA, which effectively defines in how many regions the solution will run concurrently in an active-active fashion (higher SLAs require a minimum of two regions).
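The counting rule above can be expressed as a short sketch. It only mirrors the rule as described here (one unit per region for each organization and for each environment, plus a two-region minimum for higher SLAs); actual pricing depends on your license agreement, and the `OrgPlan` class, the helper names and the region labels are hypothetical. The example reproduces the non-production layout discussed earlier, with three sandbox environments attached in only one of two regions.

```python
from dataclasses import dataclass


@dataclass
class OrgPlan:
    name: str
    regions: set[str]                 # regions where the organization has a runtime
    env_regions: dict[str, set[str]]  # environment -> regions it is attached in


def count_units(plans: list[OrgPlan]) -> tuple[int, int]:
    """Count (organization units, environment units): one unit per region."""
    org_units = env_units = 0
    for plan in plans:
        org_units += len(plan.regions)
        for env, regs in plan.env_regions.items():
            # An environment can only be attached where its organization runs.
            if not regs <= plan.regions:
                raise ValueError(f"{plan.name}/{env}: attached outside the org's regions")
            env_units += len(regs)
    return org_units, env_units


def meets_sla(plan: OrgPlan, high_sla: bool) -> bool:
    """Higher SLAs need active-active runtimes in at least two regions."""
    return len(plan.regions) >= (2 if high_sla else 1)


if __name__ == "__main__":
    prod = OrgPlan("prod", {"eu", "us"}, {"prod": {"eu", "us"}})
    nonprod = OrgPlan("nonprod", {"eu", "us"},
                      {"dev": {"eu", "us"}, "sbx1": {"eu"}, "sbx2": {"eu"}, "sbx3": {"eu"}})
    print(count_units([prod, nonprod]))  # (4, 7)
```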

In addition to the Apigee licenses, cost management must account for the following contributions:

GCP egress traffic (where applicable)
GCP additional networking components
GCP additional security components (from Marketplace or with a BYOL model)
Additional ancillary GCP services priced separately (example: Cloud Logging)
Additional security services (example: Cloud Armor)
Other cloud providers' infrastructure, solutions, storage and networking costs (for hybrid and multi-cloud).

Therefore, in addition to the GCP pricing calculator, one should also explore the advanced configurations available for budgeting, the corresponding spending threshold alerts and the automatic cost control capabilities.

Migrating organizations

As part of the cloud migration journey or simply during network planning or servicing, it is sometimes necessary to relocate and redeploy enterprise services, backend systems and also API gateway instances.

We provide a handy utility tool that can perform a set of operations such as creating a report of all resources in a specific org, exporting all resources of a specific org, listing all deployments of an org and environment, and cleaning up all resources of an org. This utility is a wrapper for the Apigee Maven plugins that simplifies Apigee deployments by removing the need to create and maintain pom files; it supports proxy deployments to Apigee Edge, hybrid and X. This makes it a suitable tool for migrating from hybrid to X (or vice versa).

Developer portal considerations

The standard deployment models for the Apigee developer portal, across all deployment options (described here, here and here), assume that only one Apigee organization is linked to a developer portal instance (which can still be deployed in a highly available, multi-regional topology). Some customers have nonetheless customized the developer portal technology stack to manage multiple organizations from the same portal instance. Multi-tenancy of the developer portal can also be achieved within a single organization by leveraging multiple environments and the full RBAC model of the Drupal-based developer portal stack (described here and here); this capability is provided by additional Drupal modules available directly from the Drupal modules repository. The topic of multi-tenancy in the Apigee developer portals will be covered more extensively in a subsequent dedicated article.
