An error handling pattern for Apigee proxies

ozanseymen · ‎05-05-2016

I wanted to document a pattern that I've been using since Apigee introduced the concept of "DefaultFaultRule" (too long ago; apologies, I have been lazy). I have explained it to many Apigeeks, so it is used in many Apigee projects, but I don't believe it is properly documented anywhere.

The reasons why I use this pattern are:

centralised error handling
prevents code duplication
simple and leaner FaultRule definitions
easy "catch-all" error handling

The Problem

In a conventional error handling implementation, we tend to put all handler logic within the FaultRule element. This has the advantage of creating self-contained error handling logic for a specific error condition.

<FaultRule name="Expired Access Token">
   <Condition>(fault.name = "access_token_expired")</Condition>
   <Step>
      <Name>ServiceCallout.LogError</Name>
   </Step>
   <Step>
      <Name>RaiseFault.ExpiredAccessToken</Name>
   </Step>
</FaultRule>

The FaultRule above handles the access token expiry scenario by logging the error and returning a clean error response specific to that scenario. Let's assume the RaiseFault policy looks like this:

<RaiseFault name="RaiseFault.ExpiredAccessToken">
   <FaultResponse>
      <Set>
         <Headers>
            <Header name="Content-Type">application/json</Header>
         </Headers>
         <Payload contentType="application/json">{ 
  "code": "400", 
  "message": "Access token has expired", 
  "info": "https://developers.myapi.com/e400.01" 
}</Payload>
         <StatusCode>400</StatusCode>
         <ReasonPhrase>Bad Request</ReasonPhrase>
      </Set>
   </FaultResponse>
   <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>

However, as the number of FaultRules increases, we need to duplicate these policies and make minor modifications to them to handle different types of errors, e.g. an invalid access token error.

 <FaultRule name="Expired Access Token">
   <Condition>(fault.name = "access_token_expired")</Condition>
   <Step>
      <Name>ServiceCallout.LogError</Name>
   </Step>
   <Step>
      <Name>RaiseFault.ExpiredAccessToken</Name>
   </Step>
 </FaultRule>
 <FaultRule name="Invalid Access Token">
   <Condition>(fault.name = "invalid_access_token")</Condition>
   <Step>
      <Name>ServiceCallout.LogError</Name>
   </Step>
   <Step>
      <Name>RaiseFault.InvalidAccessToken</Name>
   </Step>
 </FaultRule>

The ServiceCallout.LogError policy is referenced once more for the "invalid access token" scenario so that we can log that error type. I also need to copy/paste the RaiseFault policy and change the message to say "Access token is invalid".

If in the future I want to change the structure of the error responses, I will need to change all RaiseFault policies one by one. I can get rid of this unnecessary duplication by reusing both policies for all error scenarios. I can do that by templating both policies so that they use variables to fill in the actual data. I also want to refer to those policies once in the proxy definition rather than referencing them in every FaultRule.

This is where DefaultFaultRule comes in. Its official definition is "A default fault rule acts an exception handler for any error that is not explicitly handled by another fault rule". I would also translate this as "A default fault rule acts as a common FaultRule which is executed if no other FaultRule has executed a RaiseFault policy". Read more about this here: http://docs.apigee.com/api-services/content/fault-handling#creatingfaultrules-definingthecustomerror...

The Solution

So here is the refactored error handling logic using DefaultFaultRule construct:

<FaultRules>
   <FaultRule name="Expired Access Token">
      <Condition>(fault.name = "access_token_expired")</Condition>
      <Step>
         <Name>AssignMessage.SetExpiredAccessTokenErrorVariables</Name>
      </Step>
   </FaultRule>
   <FaultRule name="Invalid Access Token">
      <Condition>(fault.name = "invalid_access_token")</Condition>
      <Step>
         <Name>AssignMessage.SetInvalidAccessTokenErrorVariables</Name>
      </Step>
   </FaultRule>
</FaultRules>


<DefaultFaultRule name="all">
   <AlwaysEnforce>true</AlwaysEnforce>

   <Step>
      <Condition>(flow.myapi.error.code = null)</Condition>
      <Name>AssignMessage.SetInternalServerErrorVariables</Name>
   </Step>
   <Step>
      <Name>ServiceCallout.LogError</Name>
   </Step>
   <Step>
      <Name>RaiseFault.Json</Name>
   </Step>
</DefaultFaultRule>

In the refactored snippet above, the FaultRule elements are very lean: they are only responsible for setting the data relevant to that particular error scenario. Those variables are then used by the common policies under DefaultFaultRule.

Here is an example of AssignMessage.SetExpiredAccessTokenErrorVariables policy:

<AssignMessage name="AssignMessage.SetExpiredAccessTokenErrorVariables">
   <AssignVariable>
      <Name>flow.myapi.error.code</Name>
      <Value>400</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.message</Name>
      <Value>access token has expired</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.info</Name>
      <Value>https://developers.myapi.com</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.status</Name>
      <Value>400</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.reason</Name>
      <Value>Bad Request</Value>
   </AssignVariable>
</AssignMessage>

These variables can then be used by the actual RaiseFault policy, which is common to all error types:

<RaiseFault name="RaiseFault.Json">
   <FaultResponse>
      <Set>
         <Headers>
            <Header name="Content-Type">application/json</Header>
         </Headers>
         <Payload contentType="application/json">{ 
  "code": "{flow.myapi.error.code}",
  "message": "{flow.myapi.error.message}",
  "info": "{flow.myapi.error.info}"
}</Payload>
         <StatusCode>{flow.myapi.error.status}</StatusCode>
         <ReasonPhrase>{flow.myapi.error.reason}</ReasonPhrase>
      </Set>
   </FaultResponse>
   <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>

RaiseFault.Json is responsible for defining the structure of the JSON error response, while the data comes from the individual FaultRule.

Catching Unhandled Errors

When an error is thrown from any Apigee policy or custom code that is not handled by any of the FaultRule conditions, Apigee will start executing the policies under DefaultFaultRule. In this situation we would like to return a stock 500 response to the consuming apps instead of the default Apigee response so that we are consistent in our error responses for all cases.

This is handled by the following Step definition in DefaultFaultRule:

<Step>
    <Condition>(flow.myapi.error.code = null)</Condition>
    <Name>AssignMessage.SetInternalServerErrorVariables</Name>
</Step>

Which basically says "if one of the variables that should have been set for this error is null, assume this error was not handled by any FaultRule and set the variables to a stock 500 response".

Here is a sample implementation of AssignMessage.SetInternalServerErrorVariables policy:

<AssignMessage name="AssignMessage.SetInternalServerErrorVariables">
   <AssignVariable>
      <Name>flow.myapi.error.code</Name>
      <Value>500</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.message</Name>
      <Value>internal server error</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.info</Name>
      <Value>https://developers.myapi.com</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.status</Name>
      <Value>500</Value>
   </AssignVariable>
   <AssignVariable>
      <Name>flow.myapi.error.reason</Name>
      <Value>Internal Server Error</Value>
   </AssignVariable>
</AssignMessage>

Don't forget that if multiple fault rules have a condition that evaluates to true, then the last of those fault rules executes.

Errors in Custom Code

How can we utilise the DefaultFaultRule logic when we are handling errors in custom code, e.g. JS? All we need to do is set the variables we need and force Apigee to halt flow processing and go straight to FaultRules:

if (...) {

    context.setVariable('flow.myapi.error.message', 'xyz parameter should be boo');
    // ... set other error variables similar to AssignMessage policies

    throw new Error(); //halt current execution flow
}

When this error is thrown in JS, Apigee will start executing FaultRules. None of the conditions will match, which is a good thing as we have already set all the variables we need. DefaultFaultRule will then be executed to perform the necessary logic.
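
The snippet above can be fleshed out roughly as follows. This is only a sketch: the helper raiseApiError, the function validateXyz and the use of the request.queryparam.xyz variable are assumptions made for illustration, and the values simply reuse the ones from the AssignMessage examples.

```javascript
// Hypothetical helper (not part of the original pattern): copies an error
// description into the flow.myapi.error.* variables read by RaiseFault.Json,
// then throws so that Apigee leaves the normal flow and evaluates FaultRules.
function raiseApiError(ctx, err) {
    ctx.setVariable('flow.myapi.error.code', err.code);
    ctx.setVariable('flow.myapi.error.message', err.message);
    ctx.setVariable('flow.myapi.error.info', err.info);
    ctx.setVariable('flow.myapi.error.status', err.status);
    ctx.setVariable('flow.myapi.error.reason', err.reason);
    throw new Error(err.message); // halt current execution flow
}

// Hypothetical check: the "xyz" query parameter must equal "boo".
function validateXyz(ctx) {
    var xyz = ctx.getVariable('request.queryparam.xyz');
    // Normalise to a JS string before comparing.
    if (String(xyz) !== 'boo') {
        raiseApiError(ctx, {
            code: '400',
            message: 'xyz parameter should be boo',
            info: 'https://developers.myapi.com',
            status: '400',
            reason: 'Bad Request'
        });
    }
}

// Inside an Apigee JavaScript policy the global `context` object exists;
// the guard lets the same file load outside Apigee (for example in a unit test).
if (typeof context !== 'undefined') {
    validateXyz(context);
}
```

Keeping the variable assignments in one helper means every custom check fills in the same five variables, so RaiseFault.Json never sees a half-populated error.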

If you find it inconsistent to define the error message in your custom code rather than in AssignMessage policies, you can set variables indicating the type of the error and other parameters in your custom code and allow another policy in FaultRules to define the format of the error message. Don't forget that the main point here is to let all errors flow through to FaultRules and to have a single policy within DefaultFaultRule package everything in an HTTP response.
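
One possible arrangement, sketched under assumptions: the custom code sets only an error type and a parameter, and a small JavaScript step placed before the catch-all step in DefaultFaultRule expands them into the message. The variable names flow.myapi.error.type and flow.myapi.error.param, the template catalogue and the function name are all invented for this example.

```javascript
// Hypothetical catalogue of message templates, keyed by error type.
var ERROR_TEMPLATES = {
    invalid_parameter: {
        status: '400',
        reason: 'Bad Request',
        message: '{param} parameter is invalid'
    },
    missing_parameter: {
        status: '400',
        reason: 'Bad Request',
        message: '{param} parameter is required'
    }
};

// Expands flow.myapi.error.type (set by custom code) into the variables
// used by RaiseFault.Json. Returns false when the type is unknown, leaving
// the variables unset so the catch-all step can apply the stock 500.
function expandErrorType(ctx) {
    var type = String(ctx.getVariable('flow.myapi.error.type'));
    if (!Object.prototype.hasOwnProperty.call(ERROR_TEMPLATES, type)) {
        return false;
    }
    var template = ERROR_TEMPLATES[type];
    var param = String(ctx.getVariable('flow.myapi.error.param'));
    ctx.setVariable('flow.myapi.error.code', template.status);
    ctx.setVariable('flow.myapi.error.status', template.status);
    ctx.setVariable('flow.myapi.error.reason', template.reason);
    ctx.setVariable('flow.myapi.error.info', 'https://developers.myapi.com');
    // split/join substitutes the parameter name literally.
    ctx.setVariable('flow.myapi.error.message',
        template.message.split('{param}').join(param));
    return true;
}

if (typeof context !== 'undefined') {
    expandErrorType(context);
}
```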

Possible Improvements:

An AssignMessage policy that overrides and sets a new HTTP response can also be used instead of the RaiseFault policy (as commented by @Dino below). You don't need to use RaiseFault in a FaultRule, because the flow is already in Fault.
If you are logging errors to an HTTP endpoint, consider using async JavaScript instead of the ServiceCallout policy. ServiceCallout makes a synchronous HTTP call to the log endpoint, which is not needed here. You will get better performance out of async HTTP calls to the log servers.
If you can use syslog to push log messages to the log servers or to a file (private cloud), consider using the MessageLogging policy within PostClientFlow.
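
The async JavaScript idea in the second item could look roughly like the sketch below, used in DefaultFaultRule in place of ServiceCallout.LogError. It is not a tested implementation: it assumes the Request constructor and httpClient.send() from Apigee's JavaScript object model, and the log endpoint URL, the function names and the fields of the log entry are made up for illustration.

```javascript
// Builds the JSON log entry from the error variables set by the fault rules.
function buildLogEntry(ctx) {
    return JSON.stringify({
        faultName: String(ctx.getVariable('fault.name')),
        code: String(ctx.getVariable('flow.myapi.error.code')),
        message: String(ctx.getVariable('flow.myapi.error.message'))
    });
}

// Sends the log entry without waiting for the response (fire and forget).
// The client and request type are passed in so the function can be unit tested.
function logErrorAsync(ctx, client, RequestType, url) {
    var headers = { 'Content-Type': 'application/json' };
    var request = new RequestType(url, 'POST', headers, buildLogEntry(ctx));
    client.send(request); // the response is intentionally never read
}

// Inside Apigee, `context`, `httpClient` and `Request` are provided globally.
// The URL below is a placeholder, not a real log endpoint.
if (typeof context !== 'undefined' && typeof httpClient !== 'undefined') {
    logErrorAsync(context, httpClient, Request, 'https://logs.example.com/errors');
}
```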

jonesfloyd · ‎05-06-2016

Ozan, this is awesome. I've linked to it from the Fault Handling topic in the docs. Thanks for putting this together!

ozanseymen · ‎05-10-2016

the honour is all mine. Thanks @Floyd Jones.

maivizhi_arunag · ‎05-22-2016

Awesome document @oseymen@apigee.com!

Well organised error handling! By the way, can we have a single JS policy to handle all errors (instead of multiple AssignMessage policies)? Which is the best/most effective approach?

Example:

<FaultRules>
   <FaultRule name="Expired Access Token">
      <Condition>(fault.name = "access_token_expired")</Condition>
      <Step>
         <Name>AssignMessage.SetExpiredAccessTokenErrorVariables</Name>
      </Step>
   </FaultRule>
   <FaultRule name="Invalid Access Token">
      <Condition>(fault.name = "invalid_access_token")</Condition>
      <Step>
         <Name>AssignMessage.SetInvalidAccessTokenErrorVariables</Name>
      </Step>
   </FaultRule>
</FaultRules>


<DefaultFaultRule name="all">
   <AlwaysEnforce>true</AlwaysEnforce>

   <Step>
      <Name>ServiceCallout.LogError</Name>
   </Step>
   <Step>
      <Name>RaiseFault.Json</Name>
   </Step>
</DefaultFaultRule>

instead can we go with

<DefaultFaultRule name="fault-rule">
        <Step>
            <Name>JS.setError</Name>
        </Step>
        <Step>
            <Name>AM.assignError</Name>
        </Step>
        <AlwaysEnforce>false</AlwaysEnforce>
 </DefaultFaultRule>

where JS.setError will be

// Normalise to a lower-case JS string so the comparison is case-insensitive.
var faultName = String(context.getVariable("fault.name")).toLowerCase();

if (faultName === "access_token_expired") {
    // <set Error>
} else if (faultName === "invalid_access_token") {
    // <set Error>
} else {
    // <set UnHandledError>
}

and AM.assignError will be my template to raise error.

Can you please suggest the best way of handling it?

ozanseymen · ‎05-23-2016

Sure @maivizhi - technically that would also work but you might end up with a big JS code there. So I guess it comes down to personal preference. But I really wanted to highlight the importance of DefaultFaultRule element to create a single point where we handle errors and you got that right!
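
One way to keep that JS from growing into a long if/else chain is a lookup table keyed on fault.name. A rough sketch, not a recommendation over the AssignMessage approach: the table, the function name and the values for invalid_access_token are assumptions for illustration.

```javascript
// Hypothetical table: fault.name -> values for the flow.myapi.error.* variables.
// The invalid_access_token values are assumed to mirror the expired-token case.
var FAULTS = {
    access_token_expired: {
        code: '400', message: 'access token has expired',
        status: '400', reason: 'Bad Request'
    },
    invalid_access_token: {
        code: '400', message: 'access token is invalid',
        status: '400', reason: 'Bad Request'
    }
};

// Stock response for any fault that is not listed above.
var UNHANDLED = {
    code: '500', message: 'internal server error',
    status: '500', reason: 'Internal Server Error'
};

// Sets the variables consumed by the common error template.
// Returns true when fault.name was found in the table.
function setErrorVariables(ctx) {
    var faultName = String(ctx.getVariable('fault.name')).toLowerCase();
    var known = Object.prototype.hasOwnProperty.call(FAULTS, faultName);
    var entry = known ? FAULTS[faultName] : UNHANDLED;
    ctx.setVariable('flow.myapi.error.code', entry.code);
    ctx.setVariable('flow.myapi.error.message', entry.message);
    ctx.setVariable('flow.myapi.error.info', 'https://developers.myapi.com');
    ctx.setVariable('flow.myapi.error.status', entry.status);
    ctx.setVariable('flow.myapi.error.reason', entry.reason);
    return known;
}

// Inside an Apigee JavaScript policy the global `context` object exists.
if (typeof context !== 'undefined') {
    setErrorVariables(context);
}
```

Adding an error type then means adding one table entry rather than another branch.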

maivizhi_arunag · ‎05-23-2016

Thanks @oseymen@apigee.com. Yes, I understand the use of DefaultFaultRule. I would prefer to go with the approach that gives better performance.

Which approach will reduce the processing time: multiple AssignMessage policies, or a single JS policy that handles multiple error cases?

Can you please suggest the best approach to reduce the processing time?

Thanks

Maivizhi A

ozanseymen · ‎05-23-2016

hi @maivizhi - Apigee out of the box policies are faster than custom JS code because Apigee interprets JS using Rhino. However I haven't compared the performance of setting a couple of variables in JS vs OOB policies.

DChiesa · ‎06-20-2016

Is raising a fault necessary in the DefaultFaultRule ?

I think the flow is already in Fault, therefore there is no need to use RaiseFault. One could use AssignMessage and set the payload, correct?

Report Inappropriate Content · ‎06-21-2016

I like the JS approach, performance aside. I still do not know whether JS is recommended, though it does avoid a number of AM policies.

I tried to implement your approach above using JS in the DefaultFaultRule and tested it with an OAuth2 policy. The result is not consistent: it succeeds only on alternate requests.

For example: I have my OAuth policy in the proxy endpoint PreFlow. When I access the proxy with an invalid or expired access token, the first time control does not enter the error flow and DefaultFaultRule is not reached. The second time it enters DefaultFaultRule, runs the JS and returns the custom fault message. The third time behaves like the first, and so on.

I tried this in a standalone proxy instead of a proxy that has other logic. Any thoughts on where the problem could be?

Here the first attempt does not enter the error flow, so I get the standard Apigee JSON format.

{"fault": {
   "faultstring": "Invalid Access Token",
   "detail": {"errorcode": "keymanagement.service.invalid_access_token"}
}}

Here the second attempt enters the error flow as expected and the error is returned in the custom format.

{"Error": {"Msg": {
   "Code": "401",
   "Text": "Invalid Access Token",
   "Type": "Error",
   "Severity": "High"
}}}

What could be the reason ? Thanks !!

ozanseymen · ‎06-22-2016

@Kumaresan Sithambaram - I am sure something else is going wrong there as I can't think of a logical reason for why you get a different response the first time for the same code:

Is it possible that you are running two message processors and deployment failed to update one of them? If this is the case, due to load balancing, 1 of 2 requests will hit the updated message processor and return a different response.

I'd check the deployment first and redeploy the bundle if necessary.

ozanseymen · ‎06-22-2016

"Need" is a strong word there. You can achieve the same result with AssignMessage or custom code.

I can think of two ways of looking at this problem:

You are handling the error caught during the flow execution and raising a new fault from scratch targeted to the client at that point. In this case it makes sense to use RaiseFault.
You are modifying the fault that is already raised by your policies or target connection. In this case, it makes sense to use AssignMessage.

In practice it doesn't really matter which policy you choose for this particular case.

Report Inappropriate Content · ‎06-22-2016

@oseymen@apigee.com, earlier I had deployed 2 different revisions into 2 different environments. I undeployed both, deployed the revision that has this use case and tried again. No luck; I have the same issue. Do you think there is some other logical issue? Earlier I thought it could be an issue with my proxy, which has other functionality too, so I created a separate proxy with no target. But even that proxy has the same problem. Thanks for your help!

ozanseymen · ‎06-22-2016

@Kumaresan Sithambaram - different environments within the same org might still use the same message processors. Are you using Apigee cloud or on-premises?

Can you let me know if 2,4,6... requests are ok but odd ones not?

Can you try adding a new RaiseFault policy as the first Step in PreFlow, deploy that and trace? Let's see what happens then for the same request.

Report Inappropriate Content · ‎06-22-2016

@oseymen@apigee.com

I am using on-premises Edge v4.16.01.03

Yes, odd (1, 3, ...) requests are causing this problem. Even (2, 4, ...) requests are good.

I have added a RaiseFault as the first policy in the PreFlow, and it always reaches my custom error flow. Trace screenshot below. Please let me know if any more information is required. Thank you.

ozanseymen · ‎06-22-2016

OK - odd requests failing, even requests successful is pointing to one message processor in a cluster of 2 not getting updated with the code you are deploying.

Can you see your default fault rules executing successfully on all requests?

If yes, then please remove that RaiseFault from the Preflow and try another deployment but pay attention to deployment output (unless you are using Enterprise UI to do deployment). If you are still seeing 1/2 requests failing, contact Apigee support so you can troubleshoot which MP is unable to fetch the code.

Report Inappropriate Content · ‎06-22-2016

@oseymen@apigee.com My default fault rule steps (<DefaultFaultRule name="Default-Fault-rule">) execute only for even-numbered requests; control does not switch to the error flow for odd-numbered requests. Additionally, I noticed this happens only with the OAuth2 policy for "Invalid access token". I have other policies, such as a Regular Expression policy for threat protection, which are absolutely fine: every time a threat is detected, the flow enters the error flow and calls my JS to return the custom error message.

gargi_talukdar · ‎08-19-2016

Thanks @oseymen@apigee.com for sharing a well-organized way to handle errors. I have one query about handling custom errors. If we have to handle a lot of custom errors, will using a JS policy instead of a RaiseFault policy reduce performance? Which is better: one JS policy with a lot of if..else blocks, a specific JS policy for each custom error (the number of JS policies will grow with the number of custom errors), or a RaiseFault policy?

Using a RaiseFault policy, I will not be able to set those variables, but I can set the error message and skip the execution of the RaiseFault policy in the DefaultFaultRule block.

ozanseymen · ‎08-19-2016

Hi @GargiTalukdar

I support the idea of a single RaiseFault policy in the proxy. This is the only place where you define the structure of the error response going back to client. This ensures two things:

Error response format is always consistent.
Easy maintenance, e.g. "CORS headers returned from error responses" feature can be implemented by modifying this file rather than 10.

This RaiseFault policy will contain variables to set message, info, response code, etc. Now for these, AssignMessage policy (one per error type), or a single JS policy with lots of if/else statements will do fine. I don't envisage too much performance hit with using JS in this scenario but I can see that file becoming a maintenance bottleneck very quickly if you have too many error conditions.

What I generally do in my projects is to catch each individual error type in FaultRules with a specific Condition and put a 4 line AssignMessage policy for each error type to set the variables to be then used by the RaiseFault policy. I get a lot of AssignMessage policies with this approach but it causes no harm to me and makes maintenance straightforward.

Hope this helps.

davidmehi · ‎03-14-2017

Thanks for posting this Ozan

I incorporated this technique into a "proxy template" that can be used as a starting point, with other best practices built in.

https://github.com/davidmehi/edge-proxy-template

I also created an example where the error handling logic is separated out into a shared flow. This way, the same common error handling logic can be used by multiple proxies with minimal effort. The example is here

https://github.com/davidmehi/edge-shared-errorhandling-flow-example

omidt · ‎05-16-2017

+1 you beat me to it!

shompek2 · ‎09-21-2017

I've written an article that seeks to build on Ozan's pattern. While there are elements of this approach I like, and I fully agree with Ozan's summary "main point", I recommend some modifications to the approach to fit better with Apigee's native fault handling, which will reduce code duplication and improve maintenance.

Report Inappropriate Content · ‎01-28-2018

@Ozan Seymen, a few years ago I used the same approach on Axway API Gateway, and I cannot see any better way to handle errors centrally than the one you describe here.

Now I'm working on Apigee and this post is exactly what I need.

Thanks for that...

mdunker · ‎10-10-2018

Can't believe I didn't see this before now, @ozanseymen. Great article -- I'm linking to this in the API Platform Learning Guide.

stevescheider · ‎06-11-2020

Item 2 - if you remove the Response element from the ServiceCallout, it will be a fire-and-forget call. See: https://community.apigee.com/questions/53829/service-callout-policy-for-logging.html

nikhildflow · ‎03-02-2021

Hi @ozanseymen/ @Dino,

Could you please help me with the below scenario:

I have a proxy with many service callouts (6-8, sequential), so there are many possibilities for custom errors such as an invalid payload field. That means many RF/AM/JS policies would have to be present.

AM - cannot be used as it won't switch the normal flow to the error flow (I assume; correct me if there is another way)

JS - as the callouts are sequential and numerous, creating a JS policy for each callout could cause additional delay (correct me if I'm wrong)

(I have seen the use of a single JS suggested in the default rule, which might also have this issue, I guess)

RF - I chose this. But when using RF to raise custom errors there is an issue: when an error occurs, the flow switches to the error flow and reaches "<DefaultFaultRule name="Default Always Runs"><AlwaysEnforce>true</AlwaysEnforce>" and so gives this error response:

"{ "fault": { "faultstring": "Raising fault. Fault name : Raise-Fault-1", "detail": { "errorcode": "steps.raisefault.RaiseFault" } } }"

So I have added one condition in DefaultFaultRule

<Step><Name>RaiseFault.Json</Name><Condition>custom.error_message != null and (fault.name != "RaiseFault")</Condition></Step>

Is this method okay? Is there a better way to implement it? Will this affect logging, given that the error content sent for logging will be that of the RF ("steps.raisefault.RaiseFault")?

dchiesa1 · ‎03-02-2021

Hi, is it possible for you to ask a new question?

davormilutinovi · ‎11-01-2022

Great article. Just one comment: using RFs in the error flow is marked as an antipattern in https://docs.apigee.com/api-platform/antipatterns/raise-fault-conditions . Is this still a valid recommendation?