Re: Circuit Breaker is not working as intended

MohsenM · 10-31-2024 03:12 AM

Hello everyone,

I'm trying to implement a circuit breaker for one of our proxies.

The problem is that whenever target 1 returns an unhealthy response, traffic immediately flows to target 2, even though I have set MaxFailures to 5.

Also, while traffic is flowing to target 2, target 1 is still being called and is never taken out of rotation, so two flows are being executed at the same time.

Is there any way to control when target 1 should be put back into rotation?

Code sample below:

<HTTPTargetConnection>
  <LoadBalancer>
    <Algorithm>RoundRobin</Algorithm>
    <Server name="Target1"/>
    <Server name="Target2">
      <IsFallback>true</IsFallback>
    </Server>
    <MaxFailures>5</MaxFailures>
    <ServerUnhealthyResponse>
      <ResponseCode>500</ResponseCode>
      <ResponseCode>502</ResponseCode>
      <ResponseCode>503</ResponseCode>
    </ServerUnhealthyResponse>
  </LoadBalancer>
  <HealthMonitor>
    <IsEnabled>true</IsEnabled>
    <IntervalInSec>5</IntervalInSec>
    <HTTPMonitor>
      <SuccessResponse>
        <!-- successful status codes -->
        <ResponseCode>200</ResponseCode>
        <ResponseCode>201</ResponseCode>
        <ResponseCode>202</ResponseCode>
      </SuccessResponse>
    </HTTPMonitor>
  </HealthMonitor>
  <Path>/test</Path>
</HTTPTargetConnection>

AlexET

Hi @MohsenM! I’ve noticed your question hasn’t received any replies yet, but we’ll keep an eye on this conversation to help ensure it gets the attention it needs.

In the meantime, feel free to register for our upcoming no-cost virtual event on CI/CD for API development, happening on November 14th. It's a great chance to dive deeper into Apigee topics and connect with other community members.

👉 Register here for the event

vmartucci

Hello, you have two ways of creating a circuit breaker:

Option 1 (MaxFailures + ServerUnhealthyResponse)

In the following example, the intent is for the target server to be removed from rotation after five failed requests, where 404 and some 5XX responses from the target server count as failures. Note, however, that with this pattern (MaxFailures set and no HealthMonitor configured), Apigee will automatically take the target server out of rotation when the first failure is detected. Apigee will then check the health of the target server every five minutes and return it to rotation when it responds normally. [Link]

<LoadBalancer>
  <Algorithm>RoundRobin</Algorithm>
  <Server name="target1" />
  <Server name="target2" />
  <MaxFailures>5</MaxFailures>
  <ServerUnhealthyResponse>
    <ResponseCode>404</ResponseCode>
    <ResponseCode>500</ResponseCode>
    <ResponseCode>502</ResponseCode>
    <ResponseCode>503</ResponseCode>
  </ServerUnhealthyResponse>
</LoadBalancer>
<Path>/test</Path>

Option 2 (MaxFailures and HealthMonitor)

With this option, you control the logic that puts a server back into rotation by using a HealthMonitor. A failed target server is automatically put back into rotation when the health monitor determines that the target server is active. In the following snippet, the server is put back into rotation once the health check receives a successful response. [Link]

    …
</LoadBalancer>
<Path>/test</Path>
<HealthMonitor>
  <IsEnabled>true</IsEnabled>
  <IntervalInSec>5</IntervalInSec>
  <HTTPMonitor>
    <Request>
      <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
      <SocketReadTimeoutInSec>30</SocketReadTimeoutInSec>
      <Verb>GET</Verb>
      <Path>/healthcheck</Path>
    </Request>
    <SuccessResponse>
      <ResponseCode>200</ResponseCode>
      <Header name="header1">OK</Header>
    </SuccessResponse>
  </HTTPMonitor>
</HealthMonitor>

I'd recommend choosing one of those patterns. I can also see that you have Target2 configured as a fallback. This means the load balancer will not use it until all the other target servers have been removed from rotation. When that happens, all traffic is routed to the fallback server until one of the other target servers reports as healthy again and is returned to rotation. [Link]
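
As a rough, untested sketch, here is how Option 2 might look when applied to the configuration from the original question, keeping Target2 as the fallback. The /healthcheck path and the timeout values are copied from the Option 2 snippet and are assumptions about your backend, so adjust them to an endpoint your target really exposes. Note that it adds an explicit Request block to the HTTPMonitor and moves HealthMonitor after Path, matching the order used in Option 2.

```xml
<!-- Sketch only: not tested against a live proxy. -->
<HTTPTargetConnection>
  <LoadBalancer>
    <Algorithm>RoundRobin</Algorithm>
    <!-- Primary server: takes the traffic while it is in rotation. -->
    <Server name="Target1"/>
    <!-- Fallback: used only once every non-fallback server is out of rotation. -->
    <Server name="Target2">
      <IsFallback>true</IsFallback>
    </Server>
    <MaxFailures>5</MaxFailures>
    <ServerUnhealthyResponse>
      <ResponseCode>500</ResponseCode>
      <ResponseCode>502</ResponseCode>
      <ResponseCode>503</ResponseCode>
    </ServerUnhealthyResponse>
  </LoadBalancer>
  <Path>/test</Path>
  <HealthMonitor>
    <IsEnabled>true</IsEnabled>
    <!-- Poll interval in seconds; this controls how soon Target1 can return. -->
    <IntervalInSec>5</IntervalInSec>
    <HTTPMonitor>
      <Request>
        <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
        <SocketReadTimeoutInSec>30</SocketReadTimeoutInSec>
        <Verb>GET</Verb>
        <!-- Assumption: the backend exposes a health endpoint at this path. -->
        <Path>/healthcheck</Path>
      </Request>
      <SuccessResponse>
        <ResponseCode>200</ResponseCode>
      </SuccessResponse>
    </HTTPMonitor>
  </HealthMonitor>
</HTTPTargetConnection>
```

With a layout like this, the health monitor's calls to Target1 continue while Target2 is serving traffic, which may be part of why Target1 still appears to be "called" during fallback.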

AlexET

Hey @MohsenM, if the response provided has addressed your question, please consider marking it as the accepted solution to help others in the community. We would very much appreciate it. A big thank you to @vmartucci for the comprehensive reply! 👍🏼