FortiGate 7000 Series High Availability

High Availability

FortiGate-7000 supports a variation of active-passive FortiGate Clustering Protocol (FGCP) high availability between two identical FortiGate-7000 chassis. With active-passive FortiGate-7000 HA, you create redundant network connections to two identical FortiGate-7000s and add redundant HA heartbeat connections. Then you configure the FIM interface modules for HA. A cluster forms and a primary chassis is selected.

Example FortiGate-7040

All traffic is processed by the primary (or master) chassis. The backup chassis operates in hot standby mode. The configuration, active sessions, routing information, and so on are synchronized to the backup chassis. If the primary chassis fails, traffic automatically fails over to the backup chassis.

The primary chassis is selected based on a number of criteria including the configured priority, the bandwidth, the number of FIM interface failures, and the number of FPM or FIM modules that have failed. As part of the HA configuration you assign each chassis a chassis ID and you can set the priority of each FIM interface module and configure module failure tolerances and the link failure thresholds.

Before you begin configuring HA

Before you begin, the chassis should be running the same FortiOS firmware version and interfaces should not be configured to get their addresses from DHCP or PPPoE. Register and apply licenses to each FortiGate-7000 before setting up the HA cluster. This includes licensing for FortiCare, IPS, AntiVirus, Web Filtering, Mobile Malware, FortiClient, FortiCloud, and additional virtual domains (VDOMs). Both FortiGate-7000s in the cluster must have the same level of licensing for FortiGuard, FortiCloud, FortiClient, and VDOMs. FortiToken licenses can be added at any time because they are synchronized to all cluster members.

If required, you should configure split ports on the FIMs on both chassis before configuring HA because the modules have to reboot after split ports is configured. For example, to split the C1, C2, and C4 interfaces of an FIM-7910E in slot 1, enter the following command:

config system global
    set split-port 1-C1 2-C1 2-C4
end

After configuring split ports, the chassis reboots and the configuration is synchronized.

On each chassis, make sure the configurations of the modules are synchronized before starting to configure HA. You can use the following command to verify that the configurations of all of the modules are synchronized:

diagnose sys confsync chsum | grep all
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e

If the modules are synchronized, the checksums displayed should all be the same.
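The comparison can also be scripted when the command output has been captured to a string. The following Python sketch is illustrative only: the function name checksums_match is hypothetical, and it assumes each relevant line starts with "all:" followed by the checksum, as in the sample output.

```python
def checksums_match(output: str) -> bool:
    """Return True if every 'all:' line in the captured output has the same checksum."""
    checksums = set()
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("all:"):
            # Keep only the checksum bytes that follow the 'all:' label.
            checksums.add(stripped.split(":", 1)[1].strip())
    # Exactly one distinct value means all modules reported the same checksum.
    return len(checksums) == 1


sample = "\n".join(["all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e"] * 8)
print(checksums_match(sample))
```

An empty capture returns False, so a failed command is not mistaken for a synchronized chassis.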

You can also use the following command to list the modules that are synchronized. The example output shows all four FIM modules have been configured for HA and added to the cluster.

diagnose sys confsync status | grep in_sync

Master, uptime=692224.19, priority=1, slot_id=1:1, idx=0, flag=0x0, in_sync=1
Slave, uptime=676789.70, priority=2, slot_id=1:2, idx=1, flag=0x0, in_sync=1
Slave, uptime=692222.01, priority=17, slot_id=1:4, idx=2, flag=0x64, in_sync=1
Slave, uptime=692271.30, priority=16, slot_id=1:3, idx=3, flag=0x64, in_sync=1

In this command output, in_sync=1 means the module is synchronized with the primary unit and in_sync=0 means the module is not synchronized.
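If you capture this output in a script, a short filter can list the modules that still report in_sync=0. This Python sketch is an illustration, not a Fortinet tool: the function name out_of_sync_slots is hypothetical, and it assumes each line carries a slot field followed later by an in_sync field, as in the sample output.

```python
import re


def out_of_sync_slots(output: str) -> list:
    """Return the slot values (for example '1:2') of lines reporting in_sync=0."""
    slots = []
    for line in output.splitlines():
        # Capture the slot value and the in_sync flag from one status line.
        match = re.search(r"slot_\w+=([\d:]+).*\bin_sync=(\d)", line)
        if match and match.group(2) == "0":
            slots.append(match.group(1))
    return slots


sample = (
    "Master, uptime=692224.19, priority=1, slot_id=1:1, idx=0, flag=0x0, in_sync=1\n"
    "Slave, uptime=676789.70, priority=2, slot_id=1:2, idx=1, flag=0x0, in_sync=0\n"
)
print(out_of_sync_slots(sample))
```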

Connect the M1 and M2 interfaces for HA heartbeat communication

HA heartbeat communication between chassis happens over the 10Gbit M1 and M2 interfaces of the FIM modules in each chassis. To set up HA heartbeat connections:

- Connect the M1 interfaces of all FIM modules together using a switch.
- Connect the M2 interfaces of all FIM modules together using another switch.

All of the M1 interfaces must be connected together with a switch and all of the M2 interfaces must be connected together with another switch. Connecting M1 interfaces or M2 interfaces directly is not supported as each FIM needs to communicate with all other FIMs.


Heartbeat packets are VLAN packets with VLAN ID 999 and ethertype 9890. The MTU value for the M1 and M2 interfaces is 1500. You can use the following commands to change the HA heartbeat packet VLAN ID and ethertype values if required for your switches. You must change these settings on each FIM interface module. By default the M1 and M2 interface heartbeat packets use the same VLAN IDs and ethertypes.

config system ha
    set hbdev-vlan-id <vlan>
    set hbdev-second-vlan-id <vlan>
    set ha-eth-type <eth-type>
end

Using separate switches for M1 and M2 is recommended for redundancy. It is also recommended that these switches be dedicated to HA heartbeat communication and not used for other traffic.

If you use the same switch for both M1 and M2, separate the M1 and M2 traffic on the switch and set the heartbeat traffic on the M1 and M2 interfaces to use different VLAN IDs. For example, use the following command to set the heartbeat traffic on M1 to use VLAN ID 777 and the heartbeat traffic on M2 to use VLAN ID 888:

config system ha
    set hbdev-vlan-id 777
    set hbdev-second-vlan-id 888
end

If you don’t set different VLAN IDs for the M1 and M2 heartbeat packets, q-in-q must be enabled on the switch.

The following is a sample configuration for a Cisco Catalyst switch. This configuration sets the interface speeds, configures the switch to allow VLAN 999, and enables trunk mode:

##interface config
interface TenGigabitEthernet1/0/5
 description Chassis1 FIM1 M1
 switchport trunk allowed vlan 999
 switchport mode trunk

If you are using one switch for both M1 and M2 connections, the configuration would be the same except you would add q-in-q support and two different VLANs, one for M1 traffic and one for M2 traffic.


For the M1 connections:

interface Ethernet1/5
 description QinQ Test
 switchport mode dot1q-tunnel
 switchport access vlan 888
 spanning-tree port type edge

For the M2 connections:

interface Ethernet1/5
 description QinQ Test
 switchport mode dot1q-tunnel
 switchport access vlan 880
 spanning-tree port type edge

HA packets must have the configured VLAN tag (default 999). If the switch removes or changes this tag, HA heartbeat communication will not work and the cluster will form a split brain configuration. In effect two clusters will form, one in each chassis, and network traffic will be disrupted.

HA configuration

Use the following steps to set up HA between two chassis (chassis 1 and chassis 2). These steps are written for a set of two FortiGate-7040Es or 7060Es. The steps are similar for the FortiGate-7030E except that each FortiGate-7030E only has one FIM interface module.

Each FIM interface module has to be configured for HA separately. The HA configuration is not synchronized among FIMs. You can begin by setting up chassis 1 and configuring HA on both of the FIM interface modules in it. Then do the same for chassis 2.

Each of the FortiGate-7000s is assigned a chassis ID (1 and 2). These numbers just allow you to identify the chassis and do not influence primary unit selection.

Setting up HA on the FIM interface modules in the first FortiGate-7000 (chassis 1)

  1. Log into the CLI of the FIM interface module in slot 1 (FIM01) and enter the following command:

config system ha
    set mode a-p
    set password <password>
    set group-id <id>
    set chassis-id 1
    set hbdev M1/M2
end

This adds basic HA settings to this FIM interface module.

  2. Repeat this configuration on the FIM interface module in slot 2 (FIM02).

config system ha
    set mode a-p
    set password <password>
    set group-id <id>
    set chassis-id 1
    set hbdev M1/M2
end

  3. From either FIM interface module, enter the following command to confirm that the FortiGate-7000 is in HA mode:

diagnose sys ha status

The password and group-id are unique for each HA cluster and must be the same on all FIM modules. If a cluster does not form, one of the first things to do is check the group-id and re-enter the password on both FIM interface modules.

Configure HA on the FIM interface modules in the second FortiGate-7000 (chassis 2)

  1. Repeat the same HA configuration settings on the FIM interface modules in the second chassis, except set the chassis ID to 2.

config system ha
    set mode a-p
    set password <password>
    set group-id <id>
    set chassis-id 2
    set hbdev M1/M2
end

  2. From any FIM interface module, enter the following command to confirm that the cluster has formed and all of the FIM modules have been added to it:

diagnose sys ha status

The cluster has now formed and you can add the configuration and connect network equipment and start operating the cluster. You can also modify the HA configuration depending on your requirements.

Verifying that the cluster is operating correctly

Enter the following CLI command to view the status of the cluster. You can enter this command from any module’s CLI. The HA members can be in a different order depending on the module CLI from which you enter the command.

If the cluster is operating properly, the following command output should indicate the primary and backup (master and slave) chassis as well as the primary and backup (master and slave) modules. For each module, the state portion of the output shows all the parameters used to select the primary FIM module. These parameters include the number of FPM modules that the FIM module is connecting to that have failed, the status of any link aggregation group (LAG) interfaces in the configuration, the state of the interfaces in the FIM module, the traffic bandwidth score for the FIM module (the higher the traffic bandwidth score, the more interfaces are connected to networks), and the status of the management links.

diagnose sys ha status

==========================================================================

Current slot: 1 Module SN: FIM04E3E16000085

Chassis HA mode: a-p

Chassis HA information:

[Debug_Zone HA information]

HA group member information: is_manage_master=1.

FG74E83E16000015: Slave, serialno_prio=1, usr_priority=128, hostname=CH15

FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16

HA member information:

CH16(FIM04E3E16000085), Master(priority=0), uptime=78379.78, slot=1, chassis=2(2)
    slot: 1, chassis_uptime=145358.97,
    more: cluster_id:0, flag:1, local_priority:0, usr_priority:127, usr_override:0
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(1:force-to-master)
           traffic-bandwidth-score=120, mgmt-link=1
    hbdevs: local_interface= 1-M1 best=yes
            local_interface= 1-M2 best=no
    ha_elbc_master: 3, local_elbc_master: 3

CH15(FIM04E3E16000074), Slave(priority=2), uptime=145363.64, slot=1, chassis=1(2)
    slot: 1, chassis_uptime=145363.64,
    more: cluster_id:0, flag:0, local_priority:2, usr_priority:128, usr_override:0
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave)
           traffic-bandwidth-score=120, mgmt-link=1
    hbdevs: local_interface= 1-M1 last_hb_time=145640.39 status=alive
            local_interface= 1-M2 last_hb_time=145640.39 status=alive

CH15(FIM10E3E16000040), Slave(priority=3), uptime=145411.85, slot=2, chassis=1(2)
    slot: 2, chassis_uptime=145638.51,
    more: cluster_id:0, flag:0, local_priority:3, usr_priority:128, usr_override:0
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave)
           traffic-bandwidth-score=100, mgmt-link=1
    hbdevs: local_interface= 1-M1 last_hb_time=145640.62 status=alive
            local_interface= 1-M2 last_hb_time=145640.62 status=alive

CH16(FIM10E3E16000062), Slave(priority=1), uptime=76507.11, slot=2, chassis=2(2)
    slot: 2, chassis_uptime=145641.75,
    more: cluster_id:0, flag:0, local_priority:1, usr_priority:127, usr_override:0
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave)
           traffic-bandwidth-score=100, mgmt-link=1
    hbdevs: local_interface= 1-M1 last_hb_time=145640.39 status=alive
            local_interface= 1-M2 last_hb_time=145640.39 status=alive

HA management configuration

In HA mode, you should connect the interfaces in the mgmt 802.3 static aggregate interfaces of both chassis to the same switch. You can create one aggregate interface on the switch and connect both chassis management interfaces to it.


When you browse to the system management IP address you connect to the primary FIM interface module in the primary chassis. Only the primary FIM interface module responds to management connections using the system management IP address. If a failover occurs you can connect to the new primary FIM interface module using the same system management IP address.

Managing individual modules in HA mode

In some cases you may want to connect to an individual FIM or FPM module in a specific chassis. For example, you may want to view the traffic being processed by the FPM module in slot 3 of chassis 2. You can connect to the GUI or CLI of individual modules in the chassis using the system management IP address with a special port number.

For example, if the system management IP address is 1.1.1.1 you can browse to https://1.1.1.1:44323 to connect to the FPM module in chassis 2 slot 3. The special port number (in this case 44323) is a combination of the service port, chassis ID, and slot number. The following table lists the special ports for common admin protocols:

FortiGate-7000 HA special administration port numbers

Chassis and Slot Number   Slot Address   HTTP (80)   HTTPS (443)   Telnet (23)   SSH (22)   SNMP (161)
Ch1 slot 5                FPM05          8005        44305         2305          2205       16105
Ch1 slot 3                FPM03          8003        44303         2303          2203       16103
Ch1 slot 1                FIM01          8001        44301         2301          2201       16101
Ch1 slot 2                FIM02          8002        44302         2302          2202       16102
Ch1 slot 4                FPM04          8004        44304         2304          2204       16104
Ch1 slot 6                FPM06          8006        44306         2306          2206       16106
Ch2 slot 5                FPM05          8025        44325         2325          2225       16125
Ch2 slot 3                FPM03          8023        44323         2323          2223       16123
Ch2 slot 1                FIM01          8021        44321         2321          2221       16121
Ch2 slot 2                FIM02          8022        44322         2322          2222       16122
Ch2 slot 4                FPM04          8024        44324         2324          2224       16124
Ch2 slot 6                FPM06          8026        44326         2326          2226       16126

For example:

- To connect to the GUI of the FPM module in chassis 1 slot 3 using HTTPS you would browse to https://1.1.1.1:44303.
- To send an SNMP query to the FPM module in chassis 2 slot 6 use the port number 16126.

The formula for calculating the special port number is based on Chassis ID. CH1 = Chassis ID1, CH2 = Chassis ID2. The formula is: service_port x 100 + (chassis_id – 1) x 20 + slot_id.
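As a quick check of the formula, the following Python sketch computes the special port for a given service, chassis, and slot. The function name special_port and the range checks (chassis 1 or 2, slots 1 to 6) are assumptions made for this illustration and are not part of FortiOS.

```python
def special_port(service_port: int, chassis_id: int, slot_id: int) -> int:
    """Apply service_port x 100 + (chassis_id - 1) x 20 + slot_id."""
    if chassis_id not in (1, 2):
        raise ValueError("chassis_id must be 1 or 2")
    if not 1 <= slot_id <= 6:
        raise ValueError("slot_id must be between 1 and 6")
    return service_port * 100 + (chassis_id - 1) * 20 + slot_id


# HTTPS (443) to the FPM module in chassis 2 slot 3
print(special_port(443, 2, 3))
# SNMP (161) to the FPM module in chassis 2 slot 6
print(special_port(161, 2, 6))
```

The two calls reproduce the port numbers 44323 and 16126 used in the examples in this section.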

Firmware upgrade

All of the modules in a FortiGate-7000 HA cluster run the same firmware image. You upgrade the firmware from the GUI or CLI by logging into the primary FIM interface module using the system management IP address and uploading the firmware image.

If uninterruptible-upgrade and session-pickup are enabled, firmware upgrades should only cause a minimal traffic interruption. Use the following command to enable these settings (uninterruptible-upgrade should be enabled by default; session-pickup is not). These settings are synchronized to all modules in the cluster.

config system ha
    set uninterruptible-upgrade enable
    set session-pickup enable
end

When uninterruptible upgrade is enabled, the primary FIM interface module uploads the firmware to all modules, but at first only the modules in the backup chassis install the new firmware. They reboot, rejoin the cluster, and resynchronize.

Then all traffic fails over to the backup chassis which becomes the new primary chassis. Then the modules in the new backup chassis upgrade their firmware and rejoin the cluster. Unless override is enabled, the new primary chassis continues to operate as the primary chassis.

Normally you would want to enable uninterruptible-upgrade to minimize traffic interruptions, but it does not have to be enabled. If a traffic interruption is not going to cause any problems, you can disable uninterruptible-upgrade so that the firmware upgrade process takes less time.

Session failover (session-pickup)

Session failover means that after a failover, communications sessions resume on the new primary FortiGate-7000 with minimal or no interruption. Two categories of sessions need to be resumed after a failover:

- Sessions passing through the cluster
- Sessions terminated by the cluster

Session failover (also called session-pickup) is not enabled by default for FortiGate-7000 HA. If session pickup is enabled, while the FortiGate-7000 HA cluster is operating the primary FortiGate-7000 informs the backup FortiGate-7000 of changes to the primary FortiGate-7000 connection and state tables for TCP and UDP sessions passing through the cluster. This keeps the backup FortiGate-7000 up to date with the traffic currently being processed by the cluster.


After a failover the new primary FortiGate-7000 recognizes open sessions that were being handled by the cluster. The sessions continue to be processed by the new primary FortiGate-7000 and are handled according to their last known state.

Session-pickup has some limitations. For example, session failover is not supported for sessions being scanned by proxy-based security profiles. Session failover is supported for sessions being scanned by flow-based security profiles; however, flow-based sessions that fail over are not inspected after they fail over.

Sessions terminated by the cluster include management sessions (such as HTTPS connections to the FortiGate GUI or SSH connections to the CLI, as well as SNMP, logging, and so on). Also included in this category are IPsec VPN, SSL VPN, and explicit proxy sessions. In general, whether or not session-pickup is enabled, these sessions do not fail over and have to be restarted.

Enabling session pickup for TCP and UDP

To enable session-pickup, from the CLI enter:

config system ha
    set session-pickup enable
end

When session-pickup is enabled, sessions in the primary FortiGate-7000 TCP and UDP session tables are synchronized to the backup FortiGate-7000. As soon as a new TCP or UDP session is added to the primary FortiGate-7000 session table, that session is synchronized to the backup FortiGate-7000. This synchronization happens as quickly as possible to keep the session tables synchronized.

If the primary FortiGate-7000 fails, the new primary FortiGate-7000 uses its synchronized session tables to resume all TCP and UDP sessions that were being processed by the former primary FortiGate-7000 with only minimal interruption. Under ideal conditions all TCP and UDP sessions should be resumed. This is not guaranteed though and under less than ideal conditions some sessions may need to be restarted.

If session pickup is disabled

If you disable session pickup, the FortiGate-7000 HA cluster does not keep track of sessions and after a failover, active sessions have to be restarted or resumed. Most sessions can be resumed as a normal result of how TCP and UDP resume communication after any routine network interruption.

If you do not require session failover protection, leaving session pickup disabled may reduce CPU usage and reduce HA heartbeat network bandwidth usage. Also if your FortiGate-7000 HA cluster is mainly being used for traffic that is not synchronized (for example, for proxy-based security profile processing) enabling session pickup is not recommended since most sessions will not be failed over anyway.

Primary unit selection and failover criteria


Once two FortiGate-7000s recognize that they can form a cluster, they negotiate to select a primary chassis. Primary selection occurs automatically based on the criteria shown below. After the cluster selects the primary, the other chassis becomes the backup.

Negotiation and primary chassis selection also takes place if the one of the criteria for selecting the primary chassis changes. For example, an interface can become disconnected or module can fail. After this happens, the cluster can renegotiate to select a new primary chassis also using the criteria shown below.


If there are no failures and if you haven’t configured any settings to influence primary chassis selection, the chassis with the highest serial number becomes the primary chassis.

Using the serial number is a convenient way to differentiate FortiGate-7000 chassis; so basing primary chassis selection on the serial number is predictable and easy to understand and interpret. Also the chassis with the highest serial number would usually be the newest chassis with the most recent hardware version. In many cases you may not need active control over primary chassis selection, so basic primary chassis selection based on serial number is sufficient.

In some situations you may want to control which chassis becomes the primary chassis. You can do this by setting the priority of one chassis to be higher than the priority of the other. If you change the priority of one of the chassis, during negotiation, the chassis with the highest priority becomes the primary chassis. As shown above, FortiGate-7000 FGCP selects the primary chassis based on priority before serial number. For more information about how to use priorities, see the Priority and primary chassis selection section.

Chassis uptime is also a factor. Normally when two chassis start up their uptimes are similar and do not affect primary chassis selection. However, during operation, if one of the chassis goes down the other will have a much higher uptime and will be selected as the primary chassis before priority and serial number are tested.

Verifying primary chassis selection

You can use the diagnose sys ha status command to verify which chassis has become the primary chassis as shown by the following command output example. This output also shows that the chassis with the highest serial number was selected to be the primary chassis.

diagnose sys ha status

==========================================================================

Current slot: 1 Module SN: FIM04E3E16000085

Chassis HA mode: a-p

Chassis HA information:

[Debug_Zone HA information]

HA group member information: is_manage_master=1.

FG74E83E16000015: Slave, serialno_prio=1, usr_priority=128, hostname=CH15

FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16

How link and module failures affect primary chassis selection

The total number of connected data interfaces in a chassis has a higher priority than the number of failed modules in determining which chassis in a FortiGate-7000 HA configuration is the primary chassis. For example, if one chassis has a failed FPM module and the other has a disconnected or failed data interface, the chassis with the failed processor module becomes the primary unit.

For another example, the following diagnose sys ha status command shows the HA status for a cluster where one chassis has a disconnected or failed data interface and the other chassis has a failed FPM module.

diagnose sys ha status

==========================================================================

Slot: 2 Module SN: FIM01E3E16000088
Chassis HA mode: a-p

Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E33E16000027: Master, serialno_prio=0, usr_priority=128, hostname=Chassis-K
FG74E13E16000072: Slave, serialno_prio=1, usr_priority=128, hostname=Chassis-J

HA member information:
Chassis-K(FIM01E3E16000088), Slave(priority=1), uptime=2237.46, slot=2, chassis=1(1)
    slot: 2, chassis_uptime=2399.58,
    state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 best=yes
            local_interface= 2-M2 best=no

Chassis-J(FIM01E3E16000031), Slave(priority=2), uptime=2151.75, slot=2, chassis=2(1)
    slot: 2, chassis_uptime=2151.75,
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

Chassis-J(FIM01E3E16000033), Slave(priority=3), uptime=2229.63, slot=1, chassis=2(1)
    slot: 1, chassis_uptime=2406.78,
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

Chassis-K(FIM01E3E16000086), Master(priority=0), uptime=2203.30, slot=1, chassis=1(1)
    slot: 1, chassis_uptime=2203.30,
    state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=1, force-state(0:none)
           traffic-bandwidth-score=30, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.74 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

This output shows that chassis 1 (hostname Chassis-K) is the primary or master chassis. The reason for this is that chassis 1 has a total traffic-bandwidth-score of 30 + 20 = 50, while the total traffic-bandwidth-score for chassis 2 (hostname Chassis-J) is 20 + 20 = 40.

The output also shows that both FIM modules in chassis 1 are detecting a worker failure (worker_failure=1/2) while both FIM modules in chassis 2 are not detecting a worker failure (worker_failure=0/2). The intf_state=(port up)=1 field shows that the FIM module in slot 1 of chassis 1 has one more interface connected than the FIM module in slot 1 of chassis 2. It is this extra connected interface that gives the FIM module in chassis 1 slot 1 a higher traffic bandwidth score than the FIM module in slot 1 of chassis 2.

One of the interfaces on the FIM module in slot 1 of chassis 2 must have failed. In a normal HA configuration the FIM modules in matching slots of each chassis should have redundant interface connections. So if one module has fewer connected interfaces this indicates a link failure.
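The per-chassis totals in this example can be reproduced with a few lines of Python. This is only a worked illustration of the arithmetic: the scores are copied by hand from the traffic-bandwidth-score fields in the output above, and the helper name chassis_totals is hypothetical.

```python
from collections import defaultdict


def chassis_totals(fim_scores):
    """Sum the traffic-bandwidth-score of every FIM module per chassis."""
    totals = defaultdict(int)
    for chassis, score in fim_scores:
        totals[chassis] += score
    return dict(totals)


# (hostname, traffic-bandwidth-score) for each FIM module in the sample output
fim_scores = [
    ("Chassis-K", 20),  # chassis 1, slot 2
    ("Chassis-K", 30),  # chassis 1, slot 1 (one extra connected interface)
    ("Chassis-J", 20),  # chassis 2, slot 2
    ("Chassis-J", 20),  # chassis 2, slot 1
]

totals = chassis_totals(fim_scores)
# The chassis with the higher total is the one selected in this example.
best = max(totals, key=totals.get)
print(totals, best)
```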


FIM module failures

If an FIM module fails, HA recognizes this as a module failure and also gives the chassis with the failed FIM module a much lower traffic bandwidth score. So an FIM module failure is more likely to cause an HA failover than an FPM module failure.

Also, the traffic bandwidth score for an FIM module with more connected interfaces would be higher than the score for an FIM module with fewer connected interfaces. So if a different FIM module failed in each chassis, the chassis with the functioning FIM module with the most connected data interfaces would have the highest traffic bandwidth score and would become the primary chassis.

Management link failures

Management connections to a chassis can affect primary chassis selection. If the management connection to one chassis becomes disconnected, a failover will occur and the chassis that still has management connections will become the primary chassis.

Link failure threshold and board failover tolerance

The default settings of the link failure threshold and the board failover tolerance result in the default link and module failure behavior. You can change these settings if you want to modify this behavior. For example, if you want a failover to occur if an FPM module fails, even if an interface has failed you can increase the board failover tolerance setting.

Link failure threshold

The link failure threshold determines how many interfaces in a link aggregation group (LAG) interface can be lost before the LAG interface is considered down. The chassis with the most connected LAGs becomes the primary chassis. If a LAG goes down, the cluster will negotiate and may select a new primary chassis. You can use the following command to change the link failure threshold:

config system ha
    set link-failure-threshold <threshold>
end

The threshold range is 0 to 80 and 0 is the default.

A threshold of 0 means that if a single interface in any LAG fails, the LAG is considered down. A higher failure threshold means that more interfaces in a LAG can fail before the LAG is considered down. For example, if the threshold is set to 1, at least two interfaces will have to fail before the LAG is considered down.

Board failover tolerance

You can use the following command to configure board failover tolerance.

config system ha
    set board-failover-tolerance <tolerance>
end

The tolerance range is 0 to 12, and 0 is the default.


A tolerance of 0 means that if a single module fails in the primary chassis a failover occurs and the chassis with the fewest failed modules becomes the new primary chassis. A higher failover tolerance means that more modules can fail before a failover occurs. For example, if the tolerance is set to 1, at least two modules in the primary chassis will have to fail before a failover occurs.
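The link failure threshold and the board failover tolerance follow the same pattern: the event is triggered once the number of failures exceeds the configured value. The Python sketch below models that reading of the descriptions above. The function names are hypothetical and the code is an illustration, not how FortiOS implements these settings.

```python
def lag_is_down(failed_interfaces: int, link_failure_threshold: int = 0) -> bool:
    """A LAG is considered down once failed interfaces exceed the threshold (0 to 80)."""
    if not 0 <= link_failure_threshold <= 80:
        raise ValueError("link-failure-threshold must be between 0 and 80")
    return failed_interfaces > link_failure_threshold


def board_failover_occurs(failed_modules: int, board_failover_tolerance: int = 0) -> bool:
    """A failover occurs once failed modules in the primary chassis exceed the tolerance (0 to 12)."""
    if not 0 <= board_failover_tolerance <= 12:
        raise ValueError("board-failover-tolerance must be between 0 and 12")
    return failed_modules > board_failover_tolerance


# Default settings: a single failure is enough.
print(lag_is_down(1), board_failover_occurs(1))
# With a value of 1, one failure is tolerated and the second triggers the event.
print(lag_is_down(1, 1), lag_is_down(2, 1))
```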

Priority and primary chassis selection

You can select a chassis to become the primary chassis by setting the HA priority of one or more of its FIM modules (for example, the FIM module in slot 1) higher than the priority of the other FIM modules. Enter the following command to set the HA priority:

config system ha
    set priority <number>
end

The default priority is 128.

The chassis with the highest total FIM module HA priority becomes the primary chassis.

Override and primary chassis selection

Enabling override changes the order of primary chassis selection. If override is enabled, primary chassis selection considers priority before chassis uptime and serial number. This means that if you set the device priority higher for one chassis, with override enabled this chassis becomes the primary chassis even if its uptime and serial number are lower than those of the other chassis.

Enter the following command to enable override.

config system ha
    set override enable
end

When override is enabled primary unit selection checks the traffic bandwidth score, aggregate interface state, management interface links and FPM module failures first. So any of these factors affect primary chassis selection, even if override is enabled.
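The selection order described in this section can be summarized as a sort key. The Python sketch below is a simplified model for illustration only, not the FGCP implementation. It assumes the health criteria are compared in the order they are listed above, that higher values win for every field except failed FPM modules, and that serial numbers compare as plain strings. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chassis:
    name: str
    serial: str
    traffic_score: int  # total traffic-bandwidth-score of the FIM modules
    good_lags: int      # number of connected LAGs
    mgmt_links: int     # number of management links that are up
    failed_fpms: int    # number of failed FPM modules
    priority: int       # total FIM module HA priority
    uptime: float       # chassis uptime in seconds


def selection_key(chassis: Chassis, override: bool) -> tuple:
    """Build a tuple that compares higher for the preferred chassis."""
    # Health criteria are checked first, whether or not override is enabled.
    health = (chassis.traffic_score, chassis.good_lags,
              chassis.mgmt_links, -chassis.failed_fpms)
    if override:
        # Override: priority is considered before uptime and serial number.
        tie_break = (chassis.priority, chassis.uptime, chassis.serial)
    else:
        # Default: uptime is considered before priority and serial number.
        tie_break = (chassis.uptime, chassis.priority, chassis.serial)
    return health + tie_break


def pick_primary(members, override: bool = False) -> Chassis:
    """Return the chassis this simplified model would select as primary."""
    return max(members, key=lambda c: selection_key(c, override))


a = Chassis("A", "FG74E83E16000015", 240, 5, 1, 0, 256, 1000.0)
b = Chassis("B", "FG74E83E16000016", 240, 5, 1, 0, 300, 100.0)
print(pick_primary([a, b]).name, pick_primary([a, b], override=True).name)
```

In this model chassis A wins on uptime when override is disabled, and chassis B wins on priority when override is enabled. A chassis with a lower traffic bandwidth score loses in both cases.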


