Load Balancing & Fault Tolerance
With the rapid proliferation and decreasing prices of broadband solutions, more and more small and medium enterprises are opting for the use of multiple WAN links from various ISPs. The benefits include:
- Single link failure does not result in a total loss of internet connectivity, thus WAN reliability increases.
- Traffic can be evenly distributed across multiple WAN links, resulting in more efficient use of bandwidth and better performance.
Using multiple WAN links for fault tolerance and load balancing has two advantages:
- Outbound traffic, i.e. traffic originating from the LAN and traveling outwards, can be load-balanced across multiple WAN links. This is Auto Routing.
- Inbound traffic, i.e. traffic originating from the WAN and traveling towards the LAN, can be load-balanced across multiple WAN links. This is Multihoming.
Load Balancing Algorithms
The load balancing algorithm is a key component of FortiWAN’s traffic load balancing services: Auto Routing, Multihoming, Tunnel Routing, Virtual Server and DNS Proxy. These services distribute inbound or outbound traffic over multiple resources (WAN links or internal servers) according to predefined policies, each of which consists of a load balancing algorithm and the participating resources. A load balancing algorithm dynamically evaluates the available participants against factors such as weight, connections or traffic, and picks an appropriate one for the service to assign traffic to. When traffic (sessions or packets) matches a filter rule or policy of a load balancing service, the algorithm specified in that policy selects one of the specified resources to handle the traffic. Each load balancing service detects and labels unavailable resources by its own mechanism, such as WAN link health detection (see WAN Link Health Detection). The algorithms ignore failed resources and work with the available ones.
The following table lists the algorithms that FortiWAN provides for Auto Routing, Multihoming, Tunnel Routing, Virtual Server and DNS Proxy.
| | Auto Routing | Multihoming | Tunnel Routing | Virtual Server | Proxy DNS |
|---|---|---|---|---|---|
| Round-Robin | O | O | O | O | O |
| By Connection | O | | | O | |
| By Upstream | O | O | O | | O |
| By Downstream | O | O | | | O |
| By Total Traffic | O | O | | | O |
See also
Outbound Load Balancing and Failover (Auto Routing)
Inbound Load Balancing and Failover (Multihoming)
Tunnel Routing
Virtual Server & Server Load Balancing
DNS Proxy
Round Robin (weighted)
Weighted Round Robin picks one of the participating resources in circular order according to the specified weights. Round Robin does not consider a resource’s capacity, such as the connections it is processing, its available bandwidth or its response time. In FortiWAN, Round Robin serves Auto Routing, Multihoming, Tunnel Routing, Virtual Server and DNS Proxy (where it is called By Weight). To create a load balancing policy with Round Robin, you need to specify the participants (WAN links or internal servers) and assign a weight to each of them.
For example, if three WAN links (WAN1, WAN2 and WAN3) are defined in an Auto Routing policy with weight 3:1:2, Round Robin returns one of the three WAN links to Auto Routing in the order WAN1, WAN1, WAN1, WAN2, WAN3, WAN3, so that Auto Routing distributes sessions to the WAN links in that order. If some of the participants fail, Round Robin ignores them and works with the remaining participants. For example, if WAN2 fails, Round Robin returns WAN links to Auto Routing in the order WAN1, WAN1, WAN1, WAN3, WAN3.
Round Robin works similarly for Multihoming, Tunnel Routing, Virtual Server and DNS Proxy. For details on configuring a policy for a service, see the section for that service.
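The ordering in the example above can be reproduced with a short Python sketch. It is only an illustration of weighted round robin as described here, not FortiWAN's implementation; the function name weighted_round_robin and its failed parameter are hypothetical.

```python
from itertools import cycle, islice


def weighted_round_robin(weights, failed=frozenset()):
    """Return an endless iterator over resource names in circular order.

    weights: dict mapping a resource name to a positive integer weight;
             the dict order is taken as the policy order (an assumption).
    failed:  names of resources to skip, as if health detection had
             marked them unavailable.
    """
    sequence = [
        name
        for name, weight in weights.items()
        if name not in failed
        for _ in range(weight)  # repeat each resource "weight" times
    ]
    return cycle(sequence)


policy = {"WAN1": 3, "WAN2": 1, "WAN3": 2}
print(list(islice(weighted_round_robin(policy), 6)))
print(list(islice(weighted_round_robin(policy, failed={"WAN2"}), 5)))
```

The first call lists one full cycle with all links available; the second lists one cycle with WAN2 marked as failed.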
By Connection
By Connection picks one of the participating resources (WAN links or internal servers) for Auto Routing and Virtual Server, but it works in completely different ways for the two services. For Auto Routing, By Connection builds on the idea of weighted Round Robin. Its goal is to keep the number of connections being processed by each participating WAN link in a fixed ratio given by the weights. By Connection counts the connections running on each participating WAN link and picks, for each new connection, the link that keeps the ratio of running connections as close as possible to the weights once the new connection is added. For example, three WAN links (WAN1, WAN2 and WAN3) are defined in an Auto Routing policy with weight 1:1:2. If all three WAN links are idle, By Connection returns WAN1, WAN2 and WAN3 to Auto Routing for the first three connections, which brings the connection counts on WAN1, WAN2 and WAN3 to 1:1:1. To match the specified weight 1:1:2, By Connection returns WAN3 for the fourth connection. Next, By Connection returns WAN1 and WAN2 for the fifth and sixth connections, and the counts become 2:2:2. By Connection then returns WAN3 for the seventh and eighth connections, so that the counts become 2:2:4, which is in the ratio 1:1:2. Suppose the two connections on WAN2 are then closed (the counts become 2:0:4); By Connection returns WAN2 for the next two connections to restore the ratio 1:1:2. If some of the participants fail, By Connection ignores them and works with the remaining participants. For example, if WAN2 fails, By Connection keeps the connection counts on WAN1 and WAN3 in the ratio 1:2.
| | WAN1 | WAN2 | WAN3 |
|---|---|---|---|
| Weight | 1 | 1 | 2 |
| Connection 1 | V | | |
| Connection 2 | | V | |
| Connection 3 | | | V |
| Connection 4 | | | V |
| Connection counts | 1 | 1 | 2 |
| Connection 5 | V | | |
| Connection 6 | | V | |
| Connection 7 | | | V |
| Connection 8 | | | V |
| Connection counts | 2 | 2 | 4 |
| The two connections on WAN2 are closed. | | | |
| Connection counts | 2 | 0 | 4 |
| Connection 9 | | V | |
| Connection 10 | | V | |
| Connection counts | 2 | 2 | 4 |
| Connection 11 | V | | |
| Connection counts | 3 | 2 | 4 |
| One of the connections on WAN2 and one of the connections on WAN3 are closed. | | | |
| Connection counts | 3 | 1 | 3 |
| Connections 12 to 16 | | V (2 connections) | V (3 connections) |
| Connection counts | 3 | 3 | 6 |
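One way to picture By Connection for Auto Routing is as a selection rule that always picks the link whose connection count is lowest relative to its weight. The Python sketch below rests on that assumption. The rule, the tie-breaking by policy order and the name pick_by_connection are illustrative choices, not FortiWAN's documented internals, although the rule does reproduce the sequence in the example.

```python
from fractions import Fraction


def pick_by_connection(counts, weights, failed=frozenset()):
    """Pick the link with the lowest connections-per-weight ratio.

    counts:  dict mapping a link name to its running connection count.
    weights: dict mapping a link name to a positive integer weight.
    Ties go to the link listed first in weights (an assumption).
    """
    candidates = [link for link in weights if link not in failed]
    # Fraction keeps the comparison exact, so ties are resolved by order.
    return min(candidates, key=lambda link: Fraction(counts[link], weights[link]))


weights = {"WAN1": 1, "WAN2": 1, "WAN3": 2}
counts = {link: 0 for link in weights}
picks = []
for _ in range(8):
    link = pick_by_connection(counts, weights)
    counts[link] += 1  # the new connection now runs on the picked link
    picks.append(link)
print(picks, counts)
```

Setting counts["WAN2"] back to 0 and picking twice more models the step in which both WAN2 connections are closed.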
As for Virtual Server, By Connection treats service requests coming from the same source IP address as the same connection. The algorithm selects an internal server from the server pool for the incoming requests of a connection by hashing the connection’s source IP address. The hash mechanism that By Connection uses is the same as that of the Hash algorithm (see the Hash section later), except that every internal server in the server pool has the same weight.
By Downstream Traffic
By Downstream Traffic picks one of the participating resources (WAN links) according to a weight that mainly reflects their available download capacity. Every three seconds, each participating WAN link is weighted by summing 80% of its available inbound bandwidth and 20% of its available outbound bandwidth. For example, consider an Auto Routing policy with participants WAN1, WAN2 and WAN3. If, at some time, the available inbound bandwidth on WAN1, WAN2 and WAN3 is 4Mbps, 10Mbps and 6Mbps, and the available outbound bandwidth is 8Mbps, 5Mbps and 20Mbps, the weight of each WAN link is calculated as follows (in this example each value is expressed relative to the largest among the participants, 10Mbps inbound and 20Mbps outbound):
WAN1: 0.8*(4/10) + 0.2*(8/20) = 0.4
WAN2: 0.8*(10/10) + 0.2*(5/20) = 0.85
WAN3: 0.8*(6/10) + 0.2*(20/20) = 0.68
Until the weights are next updated, By Downstream Traffic returns one of the three WAN links to the load balancing policy in circular order with weight 40:85:68. The weights are recalculated from the real-time available bandwidth every three seconds. By Downstream Traffic serves Auto Routing, Multihoming and DNS Proxy.
By Upstream Traffic
By Upstream Traffic serves Auto Routing, Multihoming, Tunnel Routing and DNS Proxy. However, it works differently for Tunnel Routing than for Auto Routing, Multihoming and DNS Proxy. For Auto Routing, Multihoming and DNS Proxy, By Upstream Traffic picks one of the participating resources (WAN links) according to a weight that mainly reflects their available upload capacity. Every three seconds, each participating WAN link is weighted by summing 80% of its available outbound bandwidth and 20% of its available inbound bandwidth. Taking the same example, consider an Auto Routing policy with participants WAN1, WAN2 and WAN3. If, at some time, the available inbound bandwidth on WAN1, WAN2 and WAN3 is 4Mbps, 10Mbps and 6Mbps, and the available outbound bandwidth is 8Mbps, 5Mbps and 20Mbps, the weight of each WAN link is calculated as:
WAN1: 0.8*(8/20) + 0.2*(4/10) = 0.4
WAN2: 0.8*(5/20) + 0.2*(10/10) = 0.4
WAN3: 0.8*(20/20) + 0.2*(6/10) = 0.92
Until the weights are next updated, By Upstream Traffic returns one of the three WAN links to the load balancing policy in circular order with weight 40:40:92. The weights are recalculated from the real-time available bandwidth every three seconds.
For Tunnel Routing, By Upstream Traffic divides the available uploading bandwidth of each participating WAN link by the number of GRE tunnels deployed on that WAN link, and picks the link with the most available uploading bandwidth per tunnel. For example, consider a Tunnel Routing Group consisting of three GRE tunnels deployed on WAN1, WAN2 and WAN3 respectively. Other Tunnel Routing Groups deploy 2 GRE tunnels on WAN1, 3 GRE tunnels on WAN2 and 1 GRE tunnel on WAN3. In total, there are 3 tunnels on WAN1, 4 tunnels on WAN2 and 2 tunnels on WAN3. If, at some time, the available uploading bandwidth of WAN1, WAN2 and WAN3 is 6Mbps, 20Mbps and 12Mbps, By Upstream Traffic picks WAN3 for transferring packets that match this Tunnel Routing Group because:
WAN1: 6Mbps/3 = 2Mbps
WAN2: 20Mbps/4 = 5Mbps
WAN3: 12Mbps/2 = 6Mbps
By Upstream Traffic for Tunnel Routing is not a Round Robin based algorithm; it always picks the resource with the most available uploading bandwidth per tunnel.
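The Tunnel Routing selection can be expressed as a per-tunnel division followed by a maximum. The Python sketch below illustrates that rule with the figures from the example; the function name pick_tunnel_link and the dict-based inputs are hypothetical, and how FortiWAN breaks ties is not stated in this section.

```python
def pick_tunnel_link(available_upstream_mbps, tunnel_counts):
    """Pick the WAN link with the most available upstream per GRE tunnel.

    available_upstream_mbps: dict of link name -> available uploading
                             bandwidth in Mbps.
    tunnel_counts:           dict of link name -> total number of GRE
                             tunnels deployed on that link (all groups).
    Links without tunnels are skipped to avoid dividing by zero.
    """
    per_tunnel = {
        link: available_upstream_mbps[link] / tunnel_counts[link]
        for link in available_upstream_mbps
        if tunnel_counts.get(link, 0) > 0
    }
    return max(per_tunnel, key=per_tunnel.get)


upstream = {"WAN1": 6, "WAN2": 20, "WAN3": 12}
tunnels = {"WAN1": 3, "WAN2": 4, "WAN3": 2}
print(pick_tunnel_link(upstream, tunnels))
```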
By Total Traffic
By Total Traffic serves Auto Routing, Multihoming and DNS Proxy. By Total Traffic picks one of the participating resources (WAN links) according to a weight that reflects their available download and upload capacity equally. Every three seconds, each participating WAN link is weighted by summing 50% of its available inbound bandwidth and 50% of its available outbound bandwidth. For example, consider an Auto Routing policy with participants WAN1, WAN2 and WAN3. If, at some time, the available inbound bandwidth on WAN1, WAN2 and WAN3 is 4Mbps, 10Mbps and 6Mbps, and the available outbound bandwidth is 8Mbps, 5Mbps and 20Mbps, the weight of each WAN link is calculated as:
WAN1: 0.5*(4/10) + 0.5*(8/20) = 0.4
WAN2: 0.5*(10/10) + 0.5*(5/20) = 0.625
WAN3: 0.5*(6/10) + 0.5*(20/20) = 0.8
Until the weights are next updated, By Total Traffic returns one of the three WAN links to the load balancing policy in circular order with weight 400:625:800. The weights are recalculated from the real-time available bandwidth every three seconds.
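The three bandwidth-based weightings differ only in the share given to inbound bandwidth: 80% for By Downstream Traffic, 20% for By Upstream Traffic and 50% for By Total Traffic. The Python sketch below recomputes the worked examples under one assumption inferred from those examples, namely that each link's available bandwidth is divided by the largest value among the participants. The function name bandwidth_weights and its inbound_share parameter are hypothetical.

```python
def bandwidth_weights(inbound, outbound, inbound_share):
    """Weight each link from its available inbound/outbound bandwidth.

    inbound, outbound: dicts of link name -> available bandwidth (Mbps).
    inbound_share:     fraction of the weight taken from inbound
                       bandwidth (0.8, 0.2 or 0.5 in the text).
    Values are normalized by the largest value among the participants,
    as the worked examples suggest (an inferred assumption).
    """
    max_in = max(inbound.values()) or 1    # guard against all-zero input
    max_out = max(outbound.values()) or 1
    return {
        link: inbound_share * inbound[link] / max_in
        + (1 - inbound_share) * outbound[link] / max_out
        for link in inbound
    }


inbound = {"WAN1": 4, "WAN2": 10, "WAN3": 6}
outbound = {"WAN1": 8, "WAN2": 5, "WAN3": 20}
for name, share in [("Downstream", 0.8), ("Upstream", 0.2), ("Total", 0.5)]:
    weights = bandwidth_weights(inbound, outbound, share)
    print(name, {link: round(w, 3) for link, w in weights.items()})
```

The resulting weights would then drive a weighted circular selection until the next three-second update.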
Notes on By Upstream Traffic, By Downstream Traffic and By Total Traffic
The available bandwidth that By Upstream Traffic, By Downstream Traffic and By Total Traffic use for Auto Routing and Multihoming depends on how Bandwidth Management (see Bandwidth Management) is configured. Suppose a Bandwidth Management class limits the maximum downloading and uploading bandwidth on a 20Mbps/10Mbps WAN link to 6Mbps and 3Mbps respectively. For traffic classified into this BM class, the available downloading and uploading bandwidth that these algorithms use to evaluate the WAN link will never exceed the limits of 6Mbps/3Mbps, even if the WAN link is completely idle.
Algorithms By Upstream, Downstream and Total Traffic measure the transmission ability of a WAN link only between the FortiWAN device and the gateway of its ISP network (last mile). The available bandwidth of a WAN link is measured on the network interface of the WAN link. Algorithms By Upstream, Downstream and Total Traffic do not guarantee transmission ability between the ISP network and destinations.
By Optimum Route
Unlike By Upstream Traffic, By Downstream Traffic and By Total Traffic, By Optimum Route evaluates a WAN link not only by its traffic load but also by the round-trip time (RTT) between FortiWAN and the destination. Combining the bandwidth usage of a WAN link with the RTT reflects actual network conditions more closely. For example, the WAN link with the most available bandwidth might not be the best choice for transferring data to a destination if it has the worst RTT. Conversely, a WAN link with less available bandwidth might be picked by By Optimum Route if its RTT is good. By Optimum Route works for Auto Routing and Multihoming, mainly to avoid peering issues between ISP networks. It relies on various detections and measurements, which must be configured first for it to work properly (see Optimum Route Detection).
By Response Time
By Response Time is used only by Virtual Server (see Virtual Server & Server Load Balancing) to distribute incoming service requests to internal servers for server load balancing. By Response Time measures the response time of each internal server by sending detection packets, and picks the server with the lowest response time for Virtual Server to route the matched requests to.
By Static
By Static is used only by Multihoming to answer DNS requests for an A/AAAA record with fixed IP addresses, without considering the traffic load or connectivity state of each WAN link. By Static deprives Multihoming of inbound load balancing and WAN link failover, reducing it to a general DNS service. Note that external clients will connect to the returned IP addresses, and their access might stall or fail if the WAN link is congested or unavailable.
By Fixed
By Fixed is used only by Auto Routing to route outbound traffic to a fixed WAN link without considering the traffic load on that link. Unlike Multihoming’s By Static, By Fixed will not return the WAN link to Auto Routing if it is unavailable. A fail-over policy (configured in a filter rule) is required to achieve WAN link failover when the fixed WAN link fails. By Fixed deprives Auto Routing of outbound load balancing.
Hash
Hash is used only by Virtual Server to distribute incoming service requests to weighted internal servers for server load balancing. The source IP address of a service request is first converted from dot-decimal notation to a decimal value. This value is then hashed by taking the remainder of its division by the sum of the weights (modulo operation), and the remainder indicates the internal server that the service request should be directed to. For example, if there are three servers (serv1, serv2 and serv3) weighted 1:2:3 in the server pool, requests whose source IP addresses are congruent modulo 6 (the sum of the servers’ weights: 1+2+3) are assigned to the same server according to the weights (remainder 0 indicates serv1, remainders 1 and 2 indicate serv2, and remainders 3, 4 and 5 indicate serv3). The following table shows examples of how the hash function works for Virtual Server:
| Source IP of request | Decimal value | Hash value (mod 6) | Assigned server |
|---|---|---|---|
| 172.16.254.1 | 2886794753 | 5 | serv3 |
| 172.16.254.2 | 2886794754 | 0 | serv1 |
| 172.16.254.3 | 2886794755 | 1 | serv2 |
| 172.16.254.4 | 2886794756 | 2 | serv2 |
| 172.16.254.5 | 2886794757 | 3 | serv3 |
| 172.16.254.6 | 2886794758 | 4 | serv3 |
| 125.227.251.80 | 2112093008 | 2 | serv2 |
| 125.227.251.88 | 2112093016 | 4 | serv3 |
| 125.227.251.96 | 2112093024 | 0 | serv1 |
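The mapping in the table can be recomputed with a few lines of Python. This is a sketch of the modulo scheme as described above, not FortiWAN code; the function name pick_by_hash is hypothetical, and the sketch assumes IPv4 source addresses and that remainders are assigned to servers in pool order.

```python
import ipaddress
from itertools import accumulate


def pick_by_hash(source_ip, server_weights):
    """Map a source IPv4 address to a server by weighted modulo hashing.

    server_weights: dict of server name -> positive integer weight, in
                    pool order (remainders are assigned in this order).
    """
    value = int(ipaddress.IPv4Address(source_ip))  # dot-decimal -> integer
    remainder = value % sum(server_weights.values())
    # Cumulative weights 1, 3, 6 give the ranges 0 | 1-2 | 3-5.
    for server, upper in zip(server_weights, accumulate(server_weights.values())):
        if remainder < upper:
            return server


pool = {"serv1": 1, "serv2": 2, "serv3": 3}
for ip in ("172.16.254.1", "172.16.254.2", "125.227.251.80"):
    print(ip, int(ipaddress.IPv4Address(ip)), pick_by_hash(ip, pool))
```

Because the result depends only on the source address and the weights, requests from the same client keep reaching the same server as long as the pool is unchanged.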