
Enable Apigee Service Callout retries

I have an Apigee ServiceCallout that runs inside a shared flow, and I am receiving occasional (roughly 1 in 1000) EOF errors in that callout. My best guess is that the target server has stale connections and that a retry would likely succeed on the second call. I have created a second callout that runs if the first one fails, and it is working; however, I cannot find a way of logging the reason why the first call fails. I am looking for advice on the following:

  • Is there a way of making a ServiceCallout and capturing connection error conditions? Would JavaScript enable more error handling? (See the sketch after this list for what I had in mind.)
  • Can I change the TargetServer or connection properties to alleviate the root cause of the EOF?
  • Could an enhancement be made to Apigee so that "continueOnError=true" would capture the "fault" information the way the Error Flow does?
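
For the JavaScript option, this is roughly what I had in mind, as a sketch only. It assumes the standard httpClient/exchange object model; the flow variable names ("external.service.url", "external.service.error") are placeholders I made up, and the request body source would need to match the real flow.

// Sketch: call the backend from a JavaScript policy instead of a ServiceCallout
// so that connection-level failures can be caught and logged.
var url = context.getVariable("external.service.url");   // placeholder flow variable
var body = context.getVariable("request.content");        // placeholder body source
var req = new Request(url + "/access-management/v1/authorization", "POST",
                      {"Content-Type": "application/json"}, body);

var exchange = httpClient.send(req);
exchange.waitForComplete();

if (exchange.isSuccess()) {
  context.setVariable("serviceResponse.content", exchange.getResponse().content);
} else if (exchange.isError()) {
  // getError() should surface the connection-level reason (e.g. the EOF),
  // which could then be logged and used to decide whether to retry.
  context.setVariable("external.service.error", exchange.getError()); // placeholder variable
}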

Here is an example of the error that occurs, which I managed to capture in a trace:

 

id": "Error",
          "results": [
            {
              "actionResult": "DebugInfo",
              "accessList": [],
              "timestamp": "21-02-25 16:43:20:355",
              "properties": {
                "properties": [
                  {
                    "name": "error.cause",
                    "value": "eof unexpected",
                    "rowID": "___row188"
                  },
                  {
                    "name": "error.class",
                    "value": "com.apigee.kernel.exceptions.spi.UncheckedException",
                    "rowID": "___row189"
                  },
                  {
                    "name": "state",
                    "value": "PROXY_REQ_FLOW",
                    "rowID": "___row190"
                  },
                  {
                    "name": "type",
                    "value": "ErrorPoint",
                    "rowID": "___row191"
                  },
                  {
                    "name": "error",
                    "value": "Execution of ServiceCallout scp_callExternalService failed. Reason: eof unexpected",
                    "rowID": "___row192"
                  }
...

 

In order to make the second call, I set "continueOnError" to true so that the Error Flow is not triggered (once the Error Flow starts, no retry can be made). However, the only flow variable that gets set is "servicecallout.xxx.failed" = true; the "fault" and "error" variables are only available in the Error Flow. So all I have is the equivalent of the ambiguous "Something has gone wrong".
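
For reference, this is a simplified sketch of the only check available in the JavaScript step after the first callout; the retry-reason variable name is just a placeholder.

// With continueOnError=true, the .failed flag is the only signal that is set;
// the fault/error variables are not populated outside the Error Flow.
var failed = context.getVariable("servicecallout.scp_callExternalService.failed");
if (failed) {
  // This comes back empty here; error.message is only set inside the Error Flow.
  var reason = context.getVariable("error.message");
  context.setVariable("callout.retry.reason", reason || "unknown"); // placeholder variable
}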

Here are the simple prototype flow steps I am working with:

 

...
<Step>
  <Name>scp_callExternalService</Name>
</Step>
<Step>
  <Name>jsp-processServiceResponse1</Name>
</Step>
<Step>
  <Name>scp_callExternalService</Name>
  <Condition>servicecallout.scp_callExternalService.failed == true</Condition>
</Step>
<Step>
  <Name>jsp-processServiceResponse2</Name>
</Step>
...

 


Here is the policy config:

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ServiceCallout async="false" continueOnError="true" enabled="true" name="scp_callExternalService">
  <DisplayName>scp_callExternalService</DisplayName>
  <Request variable="serviceRequest">
    <Set>
      <Verb>POST</Verb>
      <Path>/access-management/v1/authorization</Path>
    </Set>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
  </Request>
  <Response>serviceResponse</Response>
  <Timeout>30000</Timeout>
  <HTTPTargetConnection>
    <Properties>
      <Property name="success.codes">1xx, 2xx, 3xx, 4xx</Property>
    </Properties>
    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="rest-cluster-a"/>
      <Server name="rest-cluster-b"/>
      <MaxFailures>5</MaxFailures>
      <ServerUnhealthyResponse>
        <ResponseCode>500</ResponseCode>
        <ResponseCode>502</ResponseCode>
        <ResponseCode>503</ResponseCode>
      </ServerUnhealthyResponse>
      <RetryEnabled>true</RetryEnabled>
    </LoadBalancer>
    <HealthMonitor>
      <IsEnabled>true</IsEnabled>
      <TCPMonitor>
        <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
      </TCPMonitor>
      <IntervalInSec>60</IntervalInSec>
    </HealthMonitor>
  </HTTPTargetConnection>
</ServiceCallout>

 

I have a health check in place, and I believe the default keep-alive is 60 seconds, which should keep any pooled connections alive.
The unhealthy-server response codes and success codes are not relevant for this error, since it happens on the connection to the target server rather than in a response from it. I have referenced https://cloud.google.com/apigee/docs/api-platform/deploy/load-balancing-across-backend-servers#setti... 
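
For the connection-properties question, these are the kinds of HTTPTargetConnection properties I was considering tuning. This is a sketch only: the values are illustrative, and whether they all take effect inside a ServiceCallout is part of what I am asking.

<HTTPTargetConnection>
  <Properties>
    <!-- Existing property, kept as-is -->
    <Property name="success.codes">1xx, 2xx, 3xx, 4xx</Property>
    <!-- Illustrative values only: drop idle pooled connections before the
         backend does, and connect/fail faster so the retry step has time to run -->
    <Property name="keepalive.timeout.millis">30000</Property>
    <Property name="connect.timeout.millis">3000</Property>
    <Property name="io.timeout.millis">10000</Property>
  </Properties>
  ...
</HTTPTargetConnection>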