I wanted to document a pattern that I've been using since Apigee introduced the concept of "DefaultFaultRule" (a long time ago - apologies, I have been lazy). I have explained it to many Apigeeks, so it is used in many Apigee projects, but I don't believe it is properly documented anywhere.
The reasons why I use this pattern are:
In a conventional error handling implementation, we tend to put all handler logic within the FaultRule element. This has the advantage of creating self-contained error handling logic for a specific error condition.
<FaultRule name="Expired Access Token">
    <Condition>(fault.name = "access_token_expired")</Condition>
    <Step>
        <Name>ServiceCallout.LogError</Name>
    </Step>
    <Step>
        <Name>RaiseFault.ExpiredAccessToken</Name>
    </Step>
</FaultRule>
The FaultRule above handles the access token expiry scenario by logging the error and returning a clean error response specific to that scenario. Let's assume the RaiseFault policy looks like this:
<RaiseFault name="RaiseFault.ExpiredAccessToken">
    <FaultResponse>
        <Set>
            <Headers>
                <Header name="Content-Type">application/json</Header>
            </Headers>
            <Payload contentType="application/json">{ "code": "400", "message": "Access token has expired", "info": "https://developers.myapi.com/e400.01" }</Payload>
            <StatusCode>400</StatusCode>
            <ReasonPhrase>Bad Request</ReasonPhrase>
        </Set>
    </FaultResponse>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>
However, as the number of FaultRules increases, we need to duplicate these policies and make minor modifications to them to handle different types of errors, e.g. an invalid access token error:
<FaultRule name="Expired Access Token">
    <Condition>(fault.name = "access_token_expired")</Condition>
    <Step>
        <Name>ServiceCallout.LogError</Name>
    </Step>
    <Step>
        <Name>RaiseFault.ExpiredAccessToken</Name>
    </Step>
</FaultRule>
<FaultRule name="Invalid Access Token">
    <Condition>(fault.name = "invalid_access_token")</Condition>
    <Step>
        <Name>ServiceCallout.LogError</Name>
    </Step>
    <Step>
        <Name>RaiseFault.InvalidAccessToken</Name>
    </Step>
</FaultRule>
The ServiceCallout.LogError policy has been referenced one more time for the "invalid access token" scenario so that we can log that error type. I also need to copy/paste the RaiseFault policy and change the message to say "Access token is invalid".
If in the future I want to change the structure of the error responses, I will need to change every RaiseFault policy one by one. I can get rid of this unnecessary duplication by reusing both policies for all error scenarios: I template both policies so that they use variables to fill in the actual data. I also want to reference those policies once in the proxy definition rather than in every FaultRule.
This is where DefaultFaultRule comes in. Its official definition is "A default fault rule acts as an exception handler for any error that is not explicitly handled by another fault rule". I would also translate this as "A default fault rule acts as a common FaultRule which is executed if no other FaultRule has executed a RaiseFault policy". Read more about this here: http://docs.apigee.com/api-services/content/fault-handling#creatingfaultrules-definingthecustomerror...
Here is the refactored error handling logic using the DefaultFaultRule construct:
<FaultRules>
    <FaultRule name="Expired Access Token">
        <Condition>(fault.name = "access_token_expired")</Condition>
        <Step>
            <Name>AssignMessage.SetExpiredAccessTokenErrorVariables</Name>
        </Step>
    </FaultRule>
    <FaultRule name="Invalid Access Token">
        <Condition>(fault.name = "invalid_access_token")</Condition>
        <Step>
            <Name>AssignMessage.SetInvalidAccessTokenErrorVariables</Name>
        </Step>
    </FaultRule>
</FaultRules>
<DefaultFaultRule name="all">
    <AlwaysEnforce>true</AlwaysEnforce>
    <Step>
        <Condition>(flow.myapi.error.code = null)</Condition>
        <Name>AssignMessage.SetInternalServerErrorVariables</Name>
    </Step>
    <Step>
        <Name>ServiceCallout.LogError</Name>
    </Step>
    <Step>
        <Name>RaiseFault.Json</Name>
    </Step>
</DefaultFaultRule>
In the refactored snippet above, the FaultRule elements are very lean: they are only responsible for setting the data relevant to their particular error scenario. Those variables are then used by the common policies under DefaultFaultRule.
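To make the execution order of that configuration concrete, here is a small JavaScript model of it. It is a hypothetical simulation, not Apigee's engine. It assumes that at most one FaultRule runs (the last match, per the rule noted later in this article), that DefaultFaultRule then runs because AlwaysEnforce is true, and that each AssignMessage policy only sets flow.myapi.error.code. The "400" for the invalid-token case and the fault name "some_unhandled_fault" are illustrative assumptions. The conditional first step mirrors the null-check fallback described under "Catching Unhandled Errors".

```javascript
// Hypothetical model of the <FaultRules>/<DefaultFaultRule> configuration.
// Flow variables live in a plain object; conditions and policies are functions.

var faultRules = [
  { name: "Expired Access Token",
    condition: function (v) { return v["fault.name"] === "access_token_expired"; },
    steps: ["AssignMessage.SetExpiredAccessTokenErrorVariables"] },
  { name: "Invalid Access Token",
    condition: function (v) { return v["fault.name"] === "invalid_access_token"; },
    steps: ["AssignMessage.SetInvalidAccessTokenErrorVariables"] }
];

var defaultFaultRule = {
  alwaysEnforce: true,
  steps: [
    // Mirrors <Condition>(flow.myapi.error.code = null)</Condition>
    { name: "AssignMessage.SetInternalServerErrorVariables",
      condition: function (v) { return v["flow.myapi.error.code"] == null; } },
    { name: "ServiceCallout.LogError" },
    { name: "RaiseFault.Json" }
  ]
};

// Simplified effect of each AssignMessage policy: only the error code is modelled.
var policyEffects = {
  "AssignMessage.SetExpiredAccessTokenErrorVariables": function (v) { v["flow.myapi.error.code"] = "400"; },
  "AssignMessage.SetInvalidAccessTokenErrorVariables": function (v) { v["flow.myapi.error.code"] = "400"; },
  "AssignMessage.SetInternalServerErrorVariables": function (v) { v["flow.myapi.error.code"] = "500"; }
};

function handleFault(faultName) {
  var vars = { "fault.name": faultName };
  var executed = [];
  function run(stepName) {
    executed.push(stepName);
    if (policyEffects[stepName]) policyEffects[stepName](vars);
  }

  // At most one FaultRule executes; if several match, the last one wins.
  var matches = faultRules.filter(function (r) { return r.condition(vars); });
  var chosen = matches.length > 0 ? matches[matches.length - 1] : null;
  if (chosen) chosen.steps.forEach(function (s) { run(s); });

  // DefaultFaultRule runs when no FaultRule matched, or always with AlwaysEnforce.
  if (!chosen || defaultFaultRule.alwaysEnforce) {
    defaultFaultRule.steps.forEach(function (step) {
      if (!step.condition || step.condition(vars)) run(step.name);
    });
  }
  return { executed: executed, code: vars["flow.myapi.error.code"] };
}

console.log(handleFault("access_token_expired").executed.join(" -> "));
// AssignMessage.SetExpiredAccessTokenErrorVariables -> ServiceCallout.LogError -> RaiseFault.Json
console.log(handleFault("some_unhandled_fault").code); // 500
```

The point the model makes is that the FaultRules only contribute data, while the logging and response steps are listed once.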
Here is an example of the AssignMessage.SetExpiredAccessTokenErrorVariables policy:
<AssignMessage name="AssignMessage.SetExpiredAccessTokenErrorVariables">
    <AssignVariable>
        <Name>flow.myapi.error.code</Name>
        <Value>400</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.message</Name>
        <Value>access token has expired</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.info</Name>
        <Value>https://developers.myapi.com</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.status</Name>
        <Value>400</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.reason</Name>
        <Value>Bad Request</Value>
    </AssignVariable>
</AssignMessage>
These variables are then used by the single RaiseFault policy that is common to all error types:
<RaiseFault name="RaiseFault.Json">
    <FaultResponse>
        <Set>
            <Headers>
                <Header name="Content-Type">application/json</Header>
            </Headers>
            <Payload contentType="application/json">{ "code": "{flow.myapi.error.code}", "message": "{flow.myapi.error.message}", "info": "{flow.myapi.error.info}" }</Payload>
            <StatusCode>{flow.myapi.error.status}</StatusCode>
            <ReasonPhrase>{flow.myapi.error.reason}</ReasonPhrase>
        </Set>
    </FaultResponse>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>
RaiseFault.Json is responsible for defining the structure of the JSON error response, while the data comes from the individual FaultRules.
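The split between "structure" and "data" can be sketched outside Apigee as a template plus a variable map. This is a hypothetical illustration that only loosely mimics Apigee's {variable} substitution; in particular, treating an unresolved variable as an empty string is this sketch's reading of IgnoreUnresolvedVariables=true, and the function names are made up for the example.

```javascript
// Hypothetical sketch of the RaiseFault.Json idea: one template owns the
// response structure, flow variables supply the data.

var PAYLOAD_TEMPLATE =
  '{ "code": "{flow.myapi.error.code}", "message": "{flow.myapi.error.message}", ' +
  '"info": "{flow.myapi.error.info}" }';

// Replaces {some.variable} references. With ignoreUnresolved = true a missing
// variable becomes an empty string; otherwise rendering fails.
function renderTemplate(template, vars, ignoreUnresolved) {
  return template.replace(/\{([A-Za-z0-9_.]+)\}/g, function (match, name) {
    if (vars[name] != null) return String(vars[name]);
    if (ignoreUnresolved) return "";
    throw new Error("Unresolved variable: " + name);
  });
}

function buildErrorResponse(vars) {
  return {
    status: Number(vars["flow.myapi.error.status"]),
    reason: vars["flow.myapi.error.reason"],
    headers: { "Content-Type": "application/json" },
    body: renderTemplate(PAYLOAD_TEMPLATE, vars, true)
  };
}

// Values taken from AssignMessage.SetExpiredAccessTokenErrorVariables.
var expiredTokenVars = {
  "flow.myapi.error.code": "400",
  "flow.myapi.error.message": "access token has expired",
  "flow.myapi.error.info": "https://developers.myapi.com",
  "flow.myapi.error.status": "400",
  "flow.myapi.error.reason": "Bad Request"
};

var response = buildErrorResponse(expiredTokenVars);
console.log(response.status, response.reason, JSON.parse(response.body).message);
// 400 Bad Request access token has expired
```

Changing the response shape now means editing PAYLOAD_TEMPLATE only. Note that this sketch does not JSON-escape values, so a message containing a double quote would produce invalid JSON.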
Catching Unhandled Errors
When an error that is not handled by any of the FaultRule conditions is thrown from an Apigee policy or from custom code, Apigee starts executing the policies under DefaultFaultRule. In this situation we want to return a stock 500 response to the consuming apps instead of the default Apigee response, so that our error responses are consistent in all cases.
This is handled by the following Step definition in DefaultFaultRule:
<Step>
    <Condition>(flow.myapi.error.code = null)</Condition>
    <Name>AssignMessage.SetInternalServerErrorVariables</Name>
</Step>
Which basically says "if one of the variables that should have been set for this error is null, assume that no FaultRule handled this error and set the variables to a stock 500 response".
Here is a sample implementation of the AssignMessage.SetInternalServerErrorVariables policy:
<AssignMessage name="AssignMessage.SetInternalServerErrorVariables">
    <AssignVariable>
        <Name>flow.myapi.error.code</Name>
        <Value>500</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.message</Name>
        <Value>internal server error</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.info</Name>
        <Value>https://developers.myapi.com</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.status</Name>
        <Value>500</Value>
    </AssignVariable>
    <AssignVariable>
        <Name>flow.myapi.error.reason</Name>
        <Value>Internal Server Error</Value>
    </AssignVariable>
</AssignMessage>
Don't forget that if multiple fault rules have a condition that evaluates to true, then the last of those fault rules executes.
Errors in Custom Code
How can we utilise the DefaultFaultRule logic when we handle errors in custom code, e.g. JavaScript? All we need to do is set the variables we need and force Apigee to halt flow processing and go straight to FaultRules:
if (...) {
    context.setVariable('flow.myapi.error.message', 'xyz parameter should be boo');
    // ... set the other error variables, similar to the AssignMessage policies
    throw new Error(); // halt the current execution flow
}
When this error is thrown in JS, Apigee starts executing FaultRules. None of the conditions will match, which is a good thing because we have already set all the variables we need. DefaultFaultRule is then executed to perform the necessary logic.
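Here is a runnable sketch of that round trip. Apigee supplies the `context` object to JavaScript policies; `createMockContext` is a hypothetical Map-backed stand-in (returning null for unset variables) so the snippet runs outside Apigee. `processRequest` and `handleErrorFlow` are simplified, made-up models of the request and error flows, and the check on the hypothetical `xyz` query parameter only reproduces the example message above.

```javascript
// Hypothetical stand-in for Apigee's `context`; only getVariable/setVariable are mimicked.
function createMockContext(initial) {
  var store = new Map(Object.entries(initial || {}));
  return {
    getVariable: function (name) { return store.has(name) ? store.get(name) : null; },
    setVariable: function (name, value) { store.set(name, value); }
  };
}

// Body of the JavaScript policy in the request flow (validation rule is illustrative).
function validateRequest(context) {
  if (context.getVariable("request.queryparam.xyz") !== "boo") {
    context.setVariable("flow.myapi.error.code", "400");
    context.setVariable("flow.myapi.error.message", "xyz parameter should be boo");
    context.setVariable("flow.myapi.error.info", "https://developers.myapi.com");
    context.setVariable("flow.myapi.error.status", "400");
    context.setVariable("flow.myapi.error.reason", "Bad Request");
    throw new Error(); // halt the flow; Apigee would now enter the error flow
  }
}

// Simplified error flow: no FaultRule condition matches, so only DefaultFaultRule
// runs, and its conditional 500 step is skipped because the code is already set.
function handleErrorFlow(context) {
  if (context.getVariable("flow.myapi.error.code") === null) {
    context.setVariable("flow.myapi.error.code", "500"); // SetInternalServerErrorVariables
  }
  return context.getVariable("flow.myapi.error.code") + " " +
    context.getVariable("flow.myapi.error.message");
}

function processRequest(context) {
  try {
    validateRequest(context);
    return "forwarded to target";
  } catch (e) {
    return handleErrorFlow(context);
  }
}

console.log(processRequest(createMockContext({ "request.queryparam.xyz": "foo" })));
// 400 xyz parameter should be boo
console.log(processRequest(createMockContext({ "request.queryparam.xyz": "boo" })));
// forwarded to target
```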
If you find it inconsistent to define the error message in your custom code rather than in AssignMessage policies, you can set variables indicating the type of the error (and any other parameters) in your custom code and let another policy in FaultRules define the format of the error message. The main point here is to let all errors flow through to FaultRules, with a single policy within DefaultFaultRule packaging everything into an HTTP response.
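A minimal sketch of that variant, assuming hypothetical variable names (`flow.myapi.error.type`, `flow.myapi.error.param`) and a made-up error catalog: the custom code only classifies the error, and a separate step (in Apigee it would sit in a FaultRule whose condition matches) turns the type into the variables that RaiseFault.Json reads. The `context` object is again replaced by a Map-backed mock so the snippet runs outside Apigee.

```javascript
// Hypothetical stand-in for Apigee's `context` object (Map-backed).
function createMockContext(initial) {
  var store = new Map(Object.entries(initial || {}));
  return {
    getVariable: function (name) { return store.has(name) ? store.get(name) : null; },
    setVariable: function (name, value) { store.set(name, value); }
  };
}

// Hypothetical catalog owned by the FaultRules step: error type -> response data.
var ERROR_CATALOG = {
  missing_parameter: { code: "400", status: "400", reason: "Bad Request",
    message: function (param) { return "missing parameter: " + param; } },
  invalid_parameter: { code: "400", status: "400", reason: "Bad Request",
    message: function (param) { return param + " parameter is invalid"; } }
};

// Custom code: records only what went wrong, not how to word it.
function customCode(context) {
  var xyz = context.getVariable("request.queryparam.xyz");
  var type = xyz === null ? "missing_parameter" : (xyz !== "boo" ? "invalid_parameter" : null);
  if (type !== null) {
    context.setVariable("flow.myapi.error.type", type);
    context.setVariable("flow.myapi.error.param", "xyz");
    throw new Error(); // switch to the error flow
  }
}

// FaultRules step: maps the error type to the variables RaiseFault.Json reads.
// Unknown types are left unset so DefaultFaultRule's 500 fallback still applies.
function mapErrorType(context) {
  var type = context.getVariable("flow.myapi.error.type");
  if (type === null || !Object.prototype.hasOwnProperty.call(ERROR_CATALOG, type)) return;
  var entry = ERROR_CATALOG[type];
  context.setVariable("flow.myapi.error.code", entry.code);
  context.setVariable("flow.myapi.error.status", entry.status);
  context.setVariable("flow.myapi.error.reason", entry.reason);
  context.setVariable("flow.myapi.error.message",
    entry.message(context.getVariable("flow.myapi.error.param")));
}

function run(initial) {
  var context = createMockContext(initial);
  try { customCode(context); } catch (e) { mapErrorType(context); }
  return context;
}

console.log(run({}).getVariable("flow.myapi.error.message")); // missing parameter: xyz
console.log(run({ "request.queryparam.xyz": "foo" }).getVariable("flow.myapi.error.message")); // xyz parameter is invalid
```

All wording lives in one place, while the custom code stays free of message text.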
Ozan, this is awesome. I've linked to it from the Fault Handling topic in the docs. Thanks for putting this together!
The honour is all mine. Thanks @Floyd Jones.
Awesome document @oseymen@apigee.com!
Well organised error handling! BTW, can we have a single JS policy to handle all errors (instead of multiple AssignMessage policies)? Which is the best/most effective approach?
Example:
<FaultRules>
    <FaultRule name="Expired Access Token">
        <Condition>(fault.name = "access_token_expired")</Condition>
        <Step>
            <Name>AssignMessage.SetExpiredAccessTokenErrorVariables</Name>
        </Step>
    </FaultRule>
    <FaultRule name="Invalid Access Token">
        <Condition>(fault.name = "invalid_access_token")</Condition>
        <Step>
            <Name>AssignMessage.SetInvalidAccessTokenErrorVariables</Name>
        </Step>
    </FaultRule>
</FaultRules>
<DefaultFaultRule name="all">
    <AlwaysEnforce>true</AlwaysEnforce>
    <Step>
        <Name>ServiceCallout.LogError</Name>
    </Step>
    <Step>
        <Name>RaiseFault.Json</Name>
    </Step>
</DefaultFaultRule>
Instead, can we go with:
<DefaultFaultRule name="fault-rule">
    <Step>
        <Name>JS.setError</Name>
    </Step>
    <Step>
        <Name>AM.assignError</Name>
    </Step>
    <AlwaysEnforce>false</AlwaysEnforce>
</DefaultFaultRule>
where JS.setError will be:
var faultName = String(context.getVariable("fault.name") || "").toLowerCase();
if (faultName === "access_token_expired") {
    // <set Error>
} else if (faultName === "invalid_access_token") {
    // <set Error>
} else {
    // <set UnHandledError>
}
and AM.assignError will be my template for raising the error.
Can you please suggest the best way of handling it?
Sure @maivizhi - technically that would also work, but you might end up with a big JS file there. So I guess it comes down to personal preference. But I really wanted to highlight the importance of the DefaultFaultRule element in creating a single point where we handle errors, and you got that right!
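One way to keep a single JS.setError from growing into a long if/else chain is a lookup table. This is a hypothetical sketch, not code from the thread: the "invalid_access_token" entry and the shared info URL are assumptions, the other values mirror the AssignMessage examples in the article, and `createMockContext` stands in for Apigee's `context` so it runs outside Apigee.

```javascript
// Hypothetical stand-in for Apigee's `context` (Map-backed).
function createMockContext(initial) {
  var store = new Map(Object.entries(initial || {}));
  return {
    getVariable: function (name) { return store.has(name) ? store.get(name) : null; },
    setVariable: function (name, value) { store.set(name, value); }
  };
}

// One table row per handled fault instead of one if/else branch per fault.
var ERRORS = {
  access_token_expired: { code: "400", status: "400", reason: "Bad Request", message: "access token has expired" },
  invalid_access_token: { code: "400", status: "400", reason: "Bad Request", message: "Access token is invalid" }
};
var UNHANDLED = { code: "500", status: "500", reason: "Internal Server Error", message: "internal server error" };

// Body of JS.setError: case-insensitive lookup on fault.name, falling back to UNHANDLED.
// hasOwnProperty avoids matching inherited keys such as "constructor".
function setError(context) {
  var faultName = String(context.getVariable("fault.name") || "").toLowerCase();
  var err = Object.prototype.hasOwnProperty.call(ERRORS, faultName) ? ERRORS[faultName] : UNHANDLED;
  context.setVariable("flow.myapi.error.code", err.code);
  context.setVariable("flow.myapi.error.status", err.status);
  context.setVariable("flow.myapi.error.reason", err.reason);
  context.setVariable("flow.myapi.error.message", err.message);
  context.setVariable("flow.myapi.error.info", "https://developers.myapi.com");
}

var ctx = createMockContext({ "fault.name": "Access_Token_Expired" });
setError(ctx);
console.log(ctx.getVariable("flow.myapi.error.message")); // access token has expired
```

Adding a new error then means adding a row rather than a branch, though the table itself can still become the maintenance bottleneck discussed further down the thread.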
Thanks @oseymen@apigee.com. Yeah, I got the use of DefaultFaultRule. I would prefer to go with the approach that improves performance, like
Can you please suggest the best approach to reduce the processing time?
Thanks
Maivizhi A
Hi @maivizhi - Apigee's out-of-the-box policies are faster than custom JS code because Apigee interprets JS using Rhino. However, I haven't compared the performance of setting a couple of variables in JS vs. OOB policies.
Is raising a fault necessary in the DefaultFaultRule ?
I think the flow is already in Fault, therefore there is no need to use RaiseFault. One could use AssignMessage and set the payload, correct?
I like the JS approach, regardless of the performance perspective. I still don't know whether JS is recommended, though it does avoid a large number of AM policies.
I tried to implement your approach above using JS in DefaultFaultRule, with an OAuth2 policy validating the token. The result is inconsistent: it alternates between success and failure.
For example: I have my OAuth policy in the ProxyEndpoint PreFlow. When I access the proxy with an invalid or expired access token, the 1st time the control does not enter the error flow and DefaultFaultRule is not executed. The 2nd time it does enter DefaultFaultRule, runs the JS and returns the custom fault message. The 3rd time is the same as the 1st, and so on.
I tried this method in a standalone proxy instead of the proxy that contains my other logic. Any thoughts on where the problem could be?
On the 1st attempt, the request does not enter the error flow, so I get the standard Apigee JSON format:
{"fault": { "faultstring": "Invalid Access Token", "detail": {"errorcode": "keymanagement.service.invalid_access_token"} }}
On the 2nd attempt, the request enters the error flow as expected and the error is returned in the custom format:
{"Error": {"Msg": { "Code": "401", "Text": "Invalid Access Token", "Type": "Error", "Severity": "High" }}}
What could be the reason? Thanks!!
@Kumaresan Sithambaram - I am sure something else is going wrong there as I can't think of a logical reason for why you get a different response the first time for the same code:
Is it possible that you are running two message processors and deployment failed to update one of them? If this is the case, due to load balancing, 1 of 2 requests will hit the updated message processor and return a different response.
I'd check the deployment first and redeploy the bundle if necessary.
"Need" is a strong word there. You can achieve the same result with AssignMessage or custom code.
I can think of two ways of looking at this problem:
In practice it doesn't really matter which policy you choose for this particular case.
@oseymen@apigee.com, earlier I had deployed 2 different revisions into 2 different environments. But I undeployed both, deployed the revision that has this use case and tried again. No luck, I have the same issue. Do you think there is any other logical issue? Earlier I thought it could be an issue with my proxy, which has other functionality too, so I created a separate proxy with no target. But even that proxy has the same problem. Thanks for your help!!
@Kumaresan Sithambaram - different environments within the same org might still use the same message processors. Are you using Apigee cloud or on-premises?
Can you let me know whether requests 2, 4, 6... are OK but the odd ones are not?
Can you try adding a new RaiseFault policy as the first Step in PreFlow, deploy that and trace? Let's see what happens then for the same request.
I am using on-premises Edge v4.16.01.03
Yes, odd requests (1, 3, ...) cause this problem. Even requests (2, 4, ...) are good.
I added a RaiseFault as the first policy in the PreFlow, and the request always reaches my custom error flow. Trace screenshot below. Please let me know if any more information is required. Thank you.
OK - odd requests failing and even requests succeeding points to one message processor in a cluster of 2 not being updated with the code you are deploying.
Can you see your default fault rules executing successfully on all requests?
If yes, then please remove that RaiseFault from the Preflow and try another deployment but pay attention to deployment output (unless you are using Enterprise UI to do deployment). If you are still seeing 1/2 requests failing, contact Apigee support so you can troubleshoot which MP is unable to fetch the code.
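The odd/even reasoning can be reproduced with a toy model: two message processors behind a load balancer that alternates between them, one of which still runs the old revision. This is a hypothetical simulation only; real Apigee routing is not guaranteed to be strict round-robin, and the response bodies are abbreviated versions of the ones quoted above.

```javascript
// Hypothetical model: mp1 missed the deployment, mp2 has the new revision.
function makeMessageProcessor(name, hasNewRevision) {
  return {
    name: name,
    handleInvalidToken: function () {
      return hasNewRevision
        ? { Error: { Msg: { Code: "401", Text: "Invalid Access Token" } } } // custom format
        : { fault: { faultstring: "Invalid Access Token" } };               // Apigee default
    }
  };
}

// Strict round-robin selector over the processors.
function roundRobin(processors) {
  var next = 0;
  return function () {
    var mp = processors[next];
    next = (next + 1) % processors.length;
    return mp;
  };
}

var pick = roundRobin([makeMessageProcessor("mp1", false), makeMessageProcessor("mp2", true)]);
var formats = [];
for (var i = 1; i <= 4; i++) {
  var body = pick().handleInvalidToken();
  formats.push(body.fault ? "default" : "custom");
}
console.log(formats.join(",")); // default,custom,default,custom
```

With one stale processor out of two, every other request gets the old behaviour, which is exactly the pattern reported above.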
@oseymen@apigee.com My default fault rule steps (<DefaultFaultRule name="Default-Fault-rule">) execute only for even-numbered requests; control does not switch to the error flow for odd-numbered requests. Additionally, I noticed this happens only with the OAuth2 policy for "Invalid access token". Other policies, such as the Regular Expression policy for threat protection, work absolutely fine: every time a threat is detected, the request enters the error flow and calls my JS to return the custom error message.
Thanks @oseymen@apigee.com for sharing a well-organized way to handle errors. I have one query regarding custom errors. If we have to handle a lot of custom errors, will using a JS policy instead of a RaiseFault policy reduce performance? Which is better: a single JS policy with a lot of if..else blocks, a specific JS policy for each custom error (so the number of JS policies grows with the number of custom errors), or a RaiseFault policy?
With a RaiseFault policy, I will not be able to set those variables, but I can set the error message and skip the execution of the RaiseFault policy in the DefaultFaultRule block.
I support the idea of a single RaiseFault policy in the proxy. This is the only place where you define the structure of the error response going back to client. This ensures two things:
This RaiseFault policy will contain variables for the message, info, response code, etc. To populate these, AssignMessage policies (one per error type) or a single JS policy with lots of if/else statements will do fine. I don't envisage much of a performance hit from using JS in this scenario, but I can see that file becoming a maintenance bottleneck very quickly if you have too many error conditions.
What I generally do in my projects is to catch each individual error type in FaultRules with a specific Condition and put a 4 line AssignMessage policy for each error type to set the variables to be then used by the RaiseFault policy. I get a lot of AssignMessage policies with this approach but it causes no harm to me and makes maintenance straightforward.
Hope this helps.
Thanks for posting this Ozan
I incorporated this technique into a "proxy template" that can be used as a starting point, with other best practices built in.
https://github.com/davidmehi/edge-proxy-template
I also created an example where the error handling logic is separated out into a shared flow. This way, the same common error handling logic can be used by multiple proxies with minimal effort. The example is here:
https://github.com/davidmehi/edge-shared-errorhandling-flow-example
+1 you beat me to it!
I've written an article that seeks to build on Ozan's pattern. While there are elements of this approach I like, and I fully agree with Ozan's summary "main point", I recommend some modifications to the approach to fit better with Apigee's native fault handling, which will reduce code duplication and improve maintenance.
@Ozan Seymen, a few years ago I used the same approach on Axway API Gateway, and I cannot see any better way to handle errors centrally than the one you describe here.
Now I'm working on Apigee and this post is exactly what I need.
Thanks for that...
Can't believe I didn't see this before now, @ozanseymen. Great article -- I'm linking to this in the API Platform Learning Guide.
Item 2 - if you remove the Response element from the ServiceCallout, it will be a fire-and-forget call. See: https://community.apigee.com/questions/53829/service-callout-policy-for-logging.html
Hi @ozanseymen/ @Dino,
Could you please help me with the scenario below:
I have a proxy with many service callouts (6-8, sequential), so there are many possible custom errors, such as an invalid payload field. That means many RF/AM/JS policies need to be present.
AM - cannot be used, as it won't switch the normal flow to the error flow (I assume; correct me if there are other options)
JS - as the callouts are sequential and numerous, creating a JS policy per callout can cause additional delay (correct me if I'm wrong)
(I have seen a single JS in the default rule suggested, which might also have this issue, I guess)
RF - I chose this. But using RF to raise custom errors has an issue: when an error occurs, the flow switches to the error flow and reaches "<DefaultFaultRule name="Default Always Runs"><AlwaysEnforce>true</AlwaysEnforce>", and so gives out this error response:
"{ "fault": { "faultstring": "Raising fault. Fault name : Raise-Fault-1", "detail": { "errorcode": "steps.raisefault.RaiseFault" } } }"
So I have added one condition in DefaultFaultRule:
<Step><Name>RaiseFault.Json</Name><Condition>custom.error_message != null and (fault.name != "RaiseFault")</Condition></Step>
Is this method okay? Is there a better way to implement it? Will this affect logging, since the error content sent for logging will be that of the RF ("steps.raisefault.RaiseFault")?
Great article. Just one comment: using RFs in the error flow is marked as an antipattern in https://docs.apigee.com/api-platform/antipatterns/raise-fault-conditions . Is this still a valid recommendation?