I have an Apigee ServiceCallout that runs inside a shared flow. I am receiving occasional (about 1 in 1000) EOF errors from that callout. My best guess is that the target server has stale connections, so a retry would likely succeed on the second call. I have added a second callout that runs if the first one fails, and it is working; however, I cannot find a way to log the reason why the first call fails. I am looking for advice.
Here is an example of the error, which I managed to capture in a trace.
"id": "Error",
"results": [
  {
    "actionResult": "DebugInfo",
    "accessList": [],
    "timestamp": "21-02-25 16:43:20:355",
    "properties": {
      "properties": [
        {
          "name": "error.cause",
          "value": "eof unexpected",
          "rowID": "___row188"
        },
        {
          "name": "error.class",
          "value": "com.apigee.kernel.exceptions.spi.UncheckedException",
          "rowID": "___row189"
        },
        {
          "name": "state",
          "value": "PROXY_REQ_FLOW",
          "rowID": "___row190"
        },
        {
          "name": "type",
          "value": "ErrorPoint",
          "rowID": "___row191"
        },
        {
          "name": "error",
          "value": "Execution of ServiceCallout scp_callExternalService failed. Reason: eof unexpected",
          "rowID": "___row192"
        }
...
To make the second call, I set "continueOnError" to true so that the error flow is not started; otherwise no retry could be done. However, the only flow variable that is then set is "servicecallout.xxx.failed"=true. The "fault" and "error" variables are only available in the error flow, so I am left with the equivalent of an ambiguous "Something has gone wrong".
Here are the simple prototype policies I am working with.
...
<Step>
    <Name>scp_callExternalService</Name>
</Step>
<Step>
    <Name>jsp-processServiceResponse1</Name>
</Step>
<Step>
    <Name>scp_callExternalService</Name>
    <Condition>servicecallout.scp_callExternalService.failed == true</Condition>
</Step>
<Step>
    <Name>jsp-processServiceResponse2</Name>
</Step>
...
Here is the policy config.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ServiceCallout async="false" continueOnError="true" enabled="true" name="scp_callExternalService">
    <DisplayName>scp_callExternalService</DisplayName>
    <Request variable="serviceRequest">
        <Set>
            <Verb>POST</Verb>
            <Path>/access-management/v1/authorization</Path>
        </Set>
        <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    </Request>
    <Response>serviceResponse</Response>
    <Timeout>30000</Timeout>
    <HTTPTargetConnection>
        <Properties>
            <Property name="success.codes">1xx, 2xx, 3xx, 4xx</Property>
        </Properties>
        <LoadBalancer>
            <Algorithm>RoundRobin</Algorithm>
            <Server name="rest-cluster-a"/>
            <Server name="rest-cluster-b"/>
            <MaxFailures>5</MaxFailures>
            <ServerUnhealthyResponse>
                <ResponseCode>500</ResponseCode>
                <ResponseCode>502</ResponseCode>
                <ResponseCode>503</ResponseCode>
            </ServerUnhealthyResponse>
            <RetryEnabled>true</RetryEnabled>
        </LoadBalancer>
        <HealthMonitor>
            <IsEnabled>true</IsEnabled>
            <TCPMonitor>
                <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
            </TCPMonitor>
            <IntervalInSec>60</IntervalInSec>
        </HealthMonitor>
    </HTTPTargetConnection>
</ServiceCallout>
I don't have a good answer for you here. As you can see, when there is a non-response from the backend, you must resort to somewhat unnatural acts, or at least unusual acts, within Apigee in order to retry it from there. What I would suggest is to push the responsibility for retry outside of your Apigee proxy.
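To illustrate what retrying outside the proxy could look like, here is a minimal sketch of a caller-side retry loop in Python. It is only an assumption about how a client might be structured: the helper name call_with_retry, the send callable, the set of retryable status codes, and the backoff values are all hypothetical and not part of Apigee. It also assumes the request is safe to repeat. Each failed attempt's reason is collected, so the cause of the first failure can be logged by the caller.

```python
import time
from typing import Callable, List, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.2,
    retry_statuses=frozenset({500, 502, 503, 504}),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, List[str]]:
    """Call send() until it returns a non-retryable status.

    send is a hypothetical callable that performs one request through the
    proxy and returns (status_code, body). Returns (status, body, failures),
    where failures lists the reason for every attempt that was retried.
    """
    failures: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except OSError as exc:
            # Connection-level failure (reset, timeout, dropped connection).
            failures.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
        else:
            if status not in retry_statuses:
                return status, body, failures
            failures.append(f"attempt {attempt}: HTTP {status}")
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("all attempts failed: " + "; ".join(failures))
```

The caller can log the returned failures list on success, or the RuntimeError message when every attempt fails, which gives a record of why the first call did not succeed.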
Separately, I would recommend: