Example: Failover scenarios
This section describes basic FortiMail active-passive HA failover scenarios. For each scenario, refer to the HA group shown in Figure 134. To simplify the descriptions of these scenarios, the following abbreviations are used:
- P1 is the configured primary unit.
- S2 is the configured secondary unit.
Figure 134: Example active-passive HA group
This section contains the following HA failover scenarios:
- Failover scenario 1: Temporary failure of the primary unit
- Failover scenario 2: System reboot or reload of the primary unit
- Failover scenario 3: System reboot or reload of the secondary unit
- Failover scenario 4: System shutdown of the secondary unit
- Failover scenario 5: Primary heartbeat link fails
- Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)
Failover scenario 1: Temporary failure of the primary unit
In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.
When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.
Here is what happens during this process:
- The FortiMail HA group is operating normally.
- The power is accidentally disconnected from P1.
- S2’s primary heartbeat test detects that P1 has failed.
How soon this happens depends on the HA daemon configuration of S2.
- The effective HA operating mode of S2 changes to master.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘MASTER heartbeat disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
- S2 records event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
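The detection described in this list amounts to a timeout on the last heartbeat that S2 received from P1. The following Python sketch is a simplified, hypothetical model rather than FortiMail's HA daemon: the class and method names are invented for illustration, it assumes a single heartbeat link, and it borrows the 15-second Heartbeat lost threshold used in the configuration example later in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondaryUnit:
    """Hypothetical model of how S2 tracks heartbeats from P1."""
    lost_threshold: float = 15.0   # seconds; assumed "Heartbeat lost threshold"
    last_heartbeat: float = 0.0    # time at which P1's last heartbeat arrived
    effective_mode: str = "SLAVE"

    def on_heartbeat(self, now: float) -> None:
        # Every heartbeat received from P1 restarts the timeout.
        self.last_heartbeat = now

    def check(self, now: float) -> Optional[str]:
        """Take over as master if P1 has been silent too long; return alert text."""
        silent_for = now - self.last_heartbeat
        if self.effective_mode == "SLAVE" and silent_for > self.lost_threshold:
            self.effective_mode = "MASTER"
            return ("'MASTER heartbeat disappeared'\n"
                    "The state changed from 'SLAVE' to 'MASTER'")
        return None


s2 = SecondaryUnit()
s2.on_heartbeat(now=100.0)             # P1 is alive at t = 100 s
assert s2.check(now=110.0) is None     # 10 s of silence: below the threshold
alert = s2.check(now=116.0)            # 16 s of silence: S2 takes over
print(s2.effective_mode)               # MASTER
```

In a real deployment the threshold is whatever you configure under the advanced HA options; the model only shows that the takeover depends on how long P1 has been silent.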
Recovering from temporary failure of the primary unit
After P1 recovers from the hardware failure, what happens next to the HA group depends on P1’s HA On failure settings under System > High Availability > Configuration.
Figure 135: HA On Failure settings
- switch off
P1 will not process email or join the HA group until you manually select the effective HA operating mode (see “click HERE to restart the HA system” on page 317 and “click HERE to restore configured operating mode” on page 317).
- wait for recovery then restore original role
On recovery, P1’s effective HA operating mode resumes its configured master role. This also means that S2 needs to give back the master role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.
In this case, S2 sends another alert email similar to the following:
This is the HA machine at 172.16.5.11.
The following event has occurred
‘SLAVE asks us to switch roles (recovery after a restart)’
The state changed from ‘MASTER’ to ‘SLAVE’
After recovery, P1 also sends out an alert email similar to the following:
This is the HA machine at 172.16.5.10.
The following critical event was detected
The system was shutdown!
- wait for recovery then restore slave role
On recovery, P1’s effective HA operating mode becomes slave, and S2 continues to assume the master role. P1 then synchronizes the content of its MTA queue directories with the current master unit, S2. S2 can then deliver email that existed in P1’s MTA queue directory at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see “click HERE to restore configured operating mode” on page 317.
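The three On failure choices can be summarized as a mapping from the setting to the effective HA operating modes that P1 and S2 end up in after P1 recovers. This Python sketch is illustrative only: the enum and function names are hypothetical, it assumes that S2 took over as master while P1 was down, and it models the switch off case as an effective mode of OFF, the state name that appears in the alert email in failover scenario 5.

```python
from enum import Enum
from typing import Tuple


class OnFailure(Enum):
    SWITCH_OFF = "switch off"
    RESTORE_ORIGINAL = "wait for recovery then restore original role"
    RESTORE_SLAVE = "wait for recovery then restore slave role"


def modes_after_recovery(setting: OnFailure) -> Tuple[str, str]:
    """Return (P1 effective mode, S2 effective mode) once P1 has recovered.

    Assumes P1 is configured as master and S2 took over while P1 was down.
    """
    if setting is OnFailure.SWITCH_OFF:
        # P1 stays out of the group until an administrator restarts HA on it.
        return ("OFF", "MASTER")
    if setting is OnFailure.RESTORE_ORIGINAL:
        # P1 resumes its configured master role; S2 hands the role back.
        return ("MASTER", "SLAVE")
    # RESTORE_SLAVE: P1 rejoins as slave and synchronizes its MTA queue to S2.
    return ("SLAVE", "MASTER")


for setting in OnFailure:
    print(f"{setting.value}: {modes_after_recovery(setting)}")
```

Only the restore original role case moves the master role back to P1 automatically, which is why it can cause repeated role changes if the underlying fault persists.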
Failover scenario 2: System reboot or reload of the primary unit
If you reboot or reload (rather than shut down) P1 for any reason, such as a firmware upgrade or a process restart, either by using the CLI commands execute reboot or execute reload <httpd…> or by clicking the Restart button under Monitor > System Status > Status in the GUI, the following occurs:
- P1 will send a holdoff command to S2 so that S2 will not take over the master role during P1’s reboot.
- P1 will also send out an alert email similar to the following:
This is the HA machine at 172.16.5.10.
The following critical event was detected
The system is rebooting (or reloading)!
- S2 will hold off checking P1’s services and heartbeat, but only for about 5 minutes. If P1 does not boot up within that time, S2 will take over the master role.
- S2 will send out an alert email, indicating that S2 received the holdoff command from P1.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘peer rebooting (or reloading)’
The state changed from ‘SLAVE’ to ‘HOLD_OFF’
After P1 is up again:
- P1 will send another command to S2 and ask S2 to change its state from holdoff to slave and resume monitoring P1’s services and heartbeat.
- S2 will send out an alert email, indicating that S2 received instruction commands from P1.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘peer command appeared’
The state changed from ‘HOLD_OFF’ to ‘SLAVE’
- S2 logs the event in the HA logs.
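The holdoff exchange can be viewed as a small state machine on S2 with a timeout. The following Python sketch is a hypothetical model, not FortiMail code: the class and method names are invented, and the 300-second limit is only an approximation of the "about 5 minutes" stated above.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_OFF_LIMIT = 300.0  # seconds; approximates "about 5 minutes"


@dataclass
class HoldOffModel:
    """Hypothetical model of S2's state while P1 reboots or reloads."""
    state: str = "SLAVE"
    hold_off_since: Optional[float] = None

    def on_peer_rebooting(self, now: float) -> None:
        # P1's holdoff command: stop checking P1's services and heartbeat.
        self.state = "HOLD_OFF"
        self.hold_off_since = now

    def on_peer_command(self) -> None:
        # P1 is back up and asks S2 to resume monitoring as slave.
        if self.state == "HOLD_OFF":
            self.state = "SLAVE"
            self.hold_off_since = None

    def tick(self, now: float) -> None:
        # If P1 never comes back, S2 takes over the master role.
        if self.state == "HOLD_OFF" and now - self.hold_off_since > HOLD_OFF_LIMIT:
            self.state = "MASTER"


normal = HoldOffModel()
normal.on_peer_rebooting(now=0.0)
normal.tick(now=120.0)        # still rebooting: remains HOLD_OFF
normal.on_peer_command()      # P1 is up again
print(normal.state)           # SLAVE

stuck = HoldOffModel()
stuck.on_peer_rebooting(now=0.0)
stuck.tick(now=301.0)         # P1 did not return within the limit
print(stuck.state)            # MASTER
```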
Failover scenario 3: System reboot or reload of the secondary unit
If you reboot or reload (rather than shut down) S2 for any reason, such as a firmware upgrade or a process restart, either by using the CLI commands execute reboot or execute reload <httpd…> or by clicking the Restart button under Monitor > System Status > Status in the GUI, P1 and S2 behave as follows:
For FortiMail v4.1 and newer releases:
- P1 will send out an alert email similar to the following, informing the administrator of the heartbeat loss with S2.
This is the HA machine at 172.16.5.10.
The following event has occurred
‘ha: SLAVE heartbeat disappeared’
- S2 will send out an alert email similar to the following:
This is the HA machine at 172.16.5.11.
The following critical event was detected
The system is rebooting (or reloading)!
- P1 will also log this event in the HA logs.
For FortiMail v4.0 releases:
- P1 will not send out the alert email.
- P1 will log the event in the HA logs.
Failover scenario 4: System shutdown of the secondary unit
If you shut down S2:
- No alert email is sent out from either P1 or S2.
- P1 will log this event in the HA logs.
Failover scenario 5: Primary heartbeat link fails
If the primary heartbeat link fails, such as when its cable is accidentally disconnected, and you have not configured a secondary heartbeat link, the FortiMail units in the HA group can no longer verify that the other unit is operating, so each assumes that the other has failed. As a result, the secondary unit (S2) switches to operating as a primary unit, and both FortiMail units act as primary units.
Two primary units connected to the same network may cause address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.
Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the web-based manager of S2 to restore its effective HA operating mode to slave (secondary unit).
- The FortiMail HA group is operating normally.
- The heartbeat link Ethernet cable is accidentally disconnected.
- S2’s HA heartbeat test detects that the primary unit has failed.
How soon this happens depends on the HA daemon configuration of S2.
- The effective HA operating mode of S2 changes to master.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘MASTER heartbeat disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
- S2 records event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
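The following Python sketch illustrates why a secondary heartbeat link helps avoid this situation. It assumes, as a simplification and not as documented FortiMail behavior, that a unit treats its peer as failed only when no heartbeat has arrived on any heartbeat link within the threshold. The function name and the 15-second threshold are illustrative; the port5 and port6 link names are taken from the configuration example later in this section.

```python
from typing import Dict


def peer_failed(last_seen: Dict[str, float], now: float, threshold: float = 15.0) -> bool:
    """True if every heartbeat link has been silent for longer than the threshold."""
    return all(now - seen > threshold for seen in last_seen.values())


now = 131.0
# Only a primary heartbeat link (port6); its cable was pulled at t = 100 s.
single_link = {"port6": 100.0}
# Primary (port6) and secondary (port5) links; only the port6 cable was pulled.
dual_link = {"port6": 100.0, "port5": 130.0}

# With one link, each unit concludes that the other has failed, so S2 also
# becomes master and both units act as primary units.
print(peer_failed(single_link, now))  # True
# With a secondary link still carrying heartbeats, no failover occurs.
print(peer_failed(dual_link, now))    # False
```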
Recovering from a heartbeat link failure
Because the hardware failure is not permanent (that is, the failure of the heartbeat link was caused by a disconnected cable, not a failed port on one of the FortiMail units), you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.
To return to normal operation after the heartbeat link fails
- Reconnect the primary heartbeat interface by reconnecting the heartbeat link Ethernet cable.
Even though the effective HA operating mode of S2 is master, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1 is also operating as a primary unit. S2 therefore sends a heartbeat signal notifying P1 to stop operating as a primary unit, and the effective HA operating mode of P1 changes to off.
- P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.
This is the HA machine at 172.16.5.10
The following event has occurred
‘SLAVE asks us to switch roles (user requested takeover)’
The state changed from ‘MASTER’ to ‘OFF’
- P1 records event log messages (among others) indicating that P1 is switching to off.
The configured HA mode of operation of P1 is master and the effective HA operating mode of P1 is off.
The configured HA mode of operation of S2 is slave and the effective HA operating mode of S2 is master.
P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.
- Connect to the web-based manager of P1 and go to System > High Availability > Status.
- Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
- Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
The HA group should return to normal operation. P1 records the event log message (among others) indicating that S2 asked P1 to return to operating as the primary unit.
- P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.
Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)
Depending on your network configuration, the network connection between the primary and secondary units can fail for a number of reasons. In the network configuration shown in Figure 134 on page 330, the connection between port1 of the primary unit (P1) and port1 of the secondary unit (S2) can fail if a network cable is disconnected or if the switch between P1 and S2 fails.
A more complex network configuration could include a number of network devices between the primary and secondary units’ non-heartbeat network interfaces. In any configuration, remote service monitoring can only detect a communication failure; it cannot determine where the failure occurred or why.
In this scenario, remote service monitoring has been configured to make sure that S2 can connect to P1. The On failure setting located in the HA main configuration section is wait for recovery then restore slave role. For information on the On failure setting, see “On failure” on page 322. For information about remote service monitoring, see “Configuring service-based failover” on page 328.
The failure occurs when power to the switch that connects the P1 and S2 port1 interfaces is disconnected. Remote service monitoring detects the failure of the network connection between the primary and secondary units. Because of the On failure setting, P1 changes its effective HA operating mode to failed.
When the failure is corrected, P1 detects the correction because, while operating in failed mode, P1 keeps attempting to connect to S2 using its port1 interface. When P1 can connect to S2, the effective HA operating mode of P1 changes to slave and the mail data on P1 is synchronized to S2, which can then deliver this mail. The HA group continues to operate in this manner until an administrator resets the effective HA operating modes of the FortiMail units.
- The FortiMail HA group is operating normally.
- The power cable for the switch between P1 and S2 is accidentally disconnected.
- S2’s remote service monitoring cannot connect to the primary unit.
How soon this happens depends on the remote service monitoring configuration of S2.
- Through the HA heartbeat link, S2 signals P1 to stop operating as the primary unit.
- The effective HA operating mode of P1 changes to failed.
- The effective HA operating mode of S2 changes to master.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘MASTER remote service disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
- S2 logs the event (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
- P1 sends an alert email similar to the following, indicating that P1 has stopped operating in HA mode.
This is the HA machine at 172.16.5.10.
The following event has occurred
‘SLAVE asks us to switch roles (user requested takeover)’
The state changed from ‘MASTER’ to ‘FAILED’
- P1 records log messages (among others) indicating that P1 is switching to failed mode.
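The failure sequence above, together with the recovery described next, can be condensed into a transition function over the effective HA operating modes of P1 and S2. This Python sketch is a hypothetical model, not FortiMail code: it assumes the On failure setting is wait for recovery then restore slave role, as in this scenario, and the parameter link_up is an invented stand-in for whether remote service monitoring can reach the other unit through port1 and the switch.

```python
from typing import Tuple


def step(p1: str, s2: str, link_up: bool) -> Tuple[str, str]:
    """Return the next (P1, S2) effective modes after one monitoring check."""
    if p1 == "MASTER" and s2 == "SLAVE" and not link_up:
        # S2's remote service check fails; S2 signals P1 over the heartbeat
        # link, P1 changes to FAILED, and S2 changes to MASTER.
        return ("FAILED", "MASTER")
    if p1 == "FAILED" and link_up:
        # P1 keeps retrying through port1; once it reaches S2 it becomes
        # SLAVE and synchronizes its MTA queue directories to S2.
        return ("SLAVE", s2)
    return (p1, s2)


modes = ("MASTER", "SLAVE")
for link_up in (True, False, False, True, True):
    modes = step(*modes, link_up=link_up)
    print(link_up, modes)
```

The final state, with S2 as master and P1 as slave, persists until an administrator restores the configured operating modes.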
Recovering from a network connection failure
Because the network connection failure was not caused by failure of either FortiMail unit, you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.
To return to normal operation after the network connection fails
- Reconnect power to the switch.
Because the effective HA operating mode of P1 is failed, P1 is using remote service monitoring to attempt to connect to S2 through the switch.
- When the switch resumes operating, P1 successfully connects to S2.
P1 has determined that S2 can connect to the network and process email.
- The effective HA operating mode of P1 switches to slave.
- P1 logs the event.
- P1 sends an alert email similar to the following, indicating that P1 is switching its effective HA operating mode to slave.
This is the HA machine at 172.16.5.10.
The following event has occurred
‘SLAVE asks us to switch roles (user requested takeover)’
The state changed from ‘FAILED’ to ‘SLAVE’
- P1 synchronizes the content of its MTA queue directories to S2. S2 can now deliver all email in these directories.
The HA group can continue to operate with S2 as the primary unit and P1 as the secondary unit. However, you can use the following steps to restore each unit to its configured HA mode of operation.
- Connect to the web-based manager of P1 and go to System > High Availability > Status.
- Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
- Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
- Connect to the web-based manager of P1, go to System > High Availability > Status and select click HERE to restore configured operating mode.
P1 should return to operating as the primary unit and S2 should return to operating as the secondary unit.
- P1 and S2 synchronize their MTA queue directories again. P1 can now deliver all email in these directories.
Example: Active-passive HA group in gateway mode
In this example, two FortiMail-400 units are configured to operate in gateway mode as an active-passive HA group.
The procedures in this example describe HA configuration necessary to achieve this scenario. Before beginning, verify that both of the FortiMail units are already:
- physically connected according to Figure 136 on page 338
- operating in gateway mode (for details, see “Operation mode” on page 175)
- configured with the IP addresses for their port3 and port1 network interfaces according to Figure 136 on page 338, with the exception of the HA virtual IP address that will be configured in this example (for details, see “Editing network interfaces” on page 248)
- allowing HTTPS administrative access through their port1 network interfaces according to Figure 136 on page 338
Figure 136: Virtual IP address for HA failover (gateway mode; protected domain: @example.com)
The active-passive HA group is located on a private network with email users and the protected email server. All are behind a FortiGate unit which separates the private network from the Internet. The DNS server, remote email users, and external SMTP servers are located on the Internet.
For both FortiMail units:
Interface | Connection | Purpose |
port1 | Connected to a switch which is connected only to the computer that the FortiMail administrator uses to manage the HA group | Administrative access occurs through this port |
port3 | Connected to a switch which is connected to the private network and, indirectly, the Internet | Email connections occur through this port |
port6 | The port6 interfaces of both units are connected directly to each other using a crossover cable | Heartbeat and synchronization occur through this port |
The secondary unit will become the new primary unit when a failover occurs. To receive the connections formerly destined for the failed primary unit, the new primary unit must adopt the failed primary unit’s IP address. You will configure an HA virtual IP address on port3 for this purpose.
While the configured primary unit is functional, the HA virtual IP address is associated with its port3 network interface, which receives email connections. After a failover, the HA virtual IP address becomes associated with the new primary unit’s port3. As a result, after a failover, the new primary unit (originally the secondary unit) will then receive and process the email connections.
This example contains the following topics:
- About standalone versus HA deployment
- Configuring the DNS and firewall settings
- Configuring the primary unit for HA operation
- Configuring the secondary unit for HA operation
- Administering an HA group
About standalone versus HA deployment
If you plan to convert a standalone FortiMail unit to a member of an HA group, first review how the HA deployment shown in Figure 136 on page 338 is similar to, and differs from, a standalone deployment, so that you understand the changes you need to make.
Examine the network interface configuration of a standalone FortiMail-400 unit in Table 40.
Table 40: Example standalone network interface configuration
Network interface | IP address | Description |
port1 | 192.168.1.5 | Administrative connections to the FortiMail unit. |
port2, port4 | Default | Not connected. |
port3 | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS A records. (No administrative access.) |
port5 | Default | Not connected. |
port6 | Default | Not connected. |
Similarly, for the HA group, DNS A records should target the IP address of the port3 interface of the primary FortiMail-400 unit. Additionally, administrators should administer each FortiMail unit in the HA group by connecting to the IP address of each FortiMail unit’s port1.
If a failover occurs, the network must be able to direct traffic to port3 of the secondary unit without reconfiguring the DNS A record target. The secondary unit must cleanly and automatically substitute for the primary unit, as if they were a single, standalone unit.
Unlike the configuration of the standalone unit, for the HA group to accomplish that substitution, all email connections must use an IP address that transfers between the primary unit and the secondary unit according to which one’s effective HA operating mode is currently master. You can provide this transferable IP address by configuring the HA group to either:
- set the IP address of the current primary unit’s network interface
- add a virtual IP address to the current primary unit’s network interface
In this example, the HA group uses the method of adding a virtual IP address. Email connections will not use the actual IP address of port3. Instead, all email connections will use only the virtual IP address 172.16.1.2, which is used by port3 of whichever FortiMail unit’s effective HA operating mode is currently master. During normal HA group operation, this IP address resides on the primary unit. Conversely, after a failover occurs, this IP address resides on the former secondary unit (now the current primary unit).
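Conceptually, the rule in the preceding paragraph is that the virtual IP belongs to whichever unit is currently master. The following Python sketch expresses only that rule; the function name is hypothetical, and it does not model the mechanism by which FortiMail actually moves the address between units.

```python
from typing import Dict, Optional

VIRTUAL_IP = "172.16.1.2"  # HA virtual IP on port3 in this example


def vip_holder(effective_modes: Dict[str, str]) -> Optional[str]:
    """Return the unit whose port3 should carry the virtual IP, if unambiguous."""
    masters = [unit for unit, mode in effective_modes.items() if mode == "MASTER"]
    # No master, or two masters (both units would claim the address and
    # cause a conflict): there is no single correct holder.
    return masters[0] if len(masters) == 1 else None


print(vip_holder({"P1": "MASTER", "S2": "SLAVE"}))   # P1 (normal operation)
print(vip_holder({"P1": "FAILED", "S2": "MASTER"}))  # S2 (after a failover)
print(vip_holder({"P1": "MASTER", "S2": "MASTER"}))  # None (conflict)
```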
Also unlike the configuration of the standalone unit, both port5 and port6 are configured for each member of the HA group. The primary unit’s port5 is directly connected using a crossover cable to the secondary unit’s port5; the primary unit’s port6 is directly connected to the secondary unit’s port6. These links are used solely for heartbeat and synchronization traffic between members of the HA group.
For comparison with the standalone unit, examine the network configuration of the primary unit in Table 41.
Table 41: Example primary unit HA network interface configuration
Interface | IP/Netmask | Virtual IP setting | Virtual IP address | Description |
port1 | 192.168.1.5 | Ignore | | Administrative connections to this FortiMail unit. (Because the IP address does not follow the FortiMail unit whose effective mode is currently master, connections to this IP address are specific to this physical unit. Administrators can still connect to this FortiMail unit after failover, which may be useful for diagnostic purposes.) |
port2, port4 | Default | Ignore | | Not connected. |
port3 | 172.16.1.5 | Set | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS MX and A records. Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the FortiMail unit whose effective HA operating mode is master. No administrative access. |
port5 | 10.0.1.2 | Ignore | | Secondary heartbeat and synchronization interface. |
port6 | 10.0.0.2 | Ignore | | Primary heartbeat and synchronization interface. |
Because the Virtual IP action settings are synchronized between the primary and secondary units, you do not need to configure them separately on the secondary unit. However, you must configure the secondary unit with the other settings listed in Table 42.
Table 42: Example secondary unit HA network interface configuration
Interface | IP/Netmask | Virtual IP setting | Virtual IP address | Description |
port1 | 192.168.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Administrative connections to this FortiMail unit. (Because the IP address does not follow the FortiMail unit whose effective mode is currently master, connections to this IP address are specific to this physical unit. Administrators can connect to this FortiMail unit even when it is currently the secondary unit, which may be useful for HA configuration and log viewing.) |
port2, port4 | Default | (synchronized from primary unit) | (synchronized from primary unit) | Not connected. |
port3 | 172.16.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the FortiMail unit whose effective HA operating mode is master. As a result, no connections should be destined for this network interface until a failover occurs, causing the secondary unit to become the new primary unit. No administrative access. |
port5 | 10.0.1.4 | (synchronized from primary unit) | (synchronized from primary unit) | Secondary heartbeat and synchronization interface. |
port6 | 10.0.0.4 | (synchronized from primary unit) | (synchronized from primary unit) | Primary heartbeat and synchronization interface. |
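Before configuring either unit, you may want to sanity-check the addressing in Tables 41 and 42. The following Python sketch uses the standard ipaddress module to confirm that each interface pair shares a subnet, that no address is duplicated between the units, and that the virtual IP sits on the port3 subnet without colliding with either unit's actual port3 address. The helper names are hypothetical, this is not a FortiMail feature, and the 255.255.255.0 netmask is an assumption for port1 and port3 (the example states it only for port5, port6, and the virtual IP).

```python
import ipaddress
from typing import Dict, List

NETMASK = "255.255.255.0"  # assumed for every interface in this example
PRIMARY = {"port1": "192.168.1.5", "port3": "172.16.1.5",
           "port5": "10.0.1.2", "port6": "10.0.0.2"}
SECONDARY = {"port1": "192.168.1.6", "port3": "172.16.1.6",
             "port5": "10.0.1.4", "port6": "10.0.0.4"}


def iface(addr: str) -> ipaddress.IPv4Interface:
    # Combine an address with the assumed netmask, e.g. "10.0.0.2/255.255.255.0".
    return ipaddress.IPv4Interface(f"{addr}/{NETMASK}")


def check_plan(primary: Dict[str, str], secondary: Dict[str, str],
               vip: str = "172.16.1.2", vip_port: str = "port3") -> List[str]:
    """Return a list of addressing problems; an empty list means none found."""
    problems = []
    for port, addr in primary.items():
        a, b = iface(addr), iface(secondary[port])
        if a.network != b.network:
            problems.append(f"{port}: {a} and {b} are on different subnets")
        if a.ip == b.ip:
            problems.append(f"{port}: both units use {a.ip}")
    v = iface(vip)
    if v.network != iface(primary[vip_port]).network:
        problems.append(f"virtual IP {v.ip} is not on the {vip_port} subnet")
    if v.ip in (iface(primary[vip_port]).ip, iface(secondary[vip_port]).ip):
        problems.append(f"virtual IP {v.ip} duplicates an actual {vip_port} address")
    return problems


print(check_plan(PRIMARY, SECONDARY))                    # []
print(check_plan(PRIMARY, SECONDARY, vip="172.16.1.5"))  # one duplicate warning
```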
Configuring the DNS and firewall settings
In the example shown in Figure 136 on page 338, SMTP clients will connect to the virtual IP address of the primary unit. For SMTP clients on the Internet, this connection occurs through the public network virtual IP on the FortiGate unit, whose policies allow the connections and route them to the virtual IP on the current primary unit.
Because the FortiMail HA group is installed behind a firewall performing NAT, the DNS server hosting records for the domain example.com must be configured to reflect the public IP address of the FortiGate unit, rather than the private network IP address of the HA group.
The DNS server has been configured with:
- an MX record to indicate that the FortiMail unit is the email gateway for example.com
- an A record to resolve fortimail.example.com into the FortiGate unit’s public IP address
- a reverse DNS record to enable external email servers to resolve the public IP address of the FortiGate unit into the domain name of the FortiMail unit
Configuring the primary unit for HA operation
The following procedure describes how to prepare a FortiMail unit for HA operation as the primary unit according to Figure 136 on page 338.
Before beginning this procedure, verify that you have completed the required preparations described in “Example: Active-passive HA group in gateway mode” on page 337.
To configure the primary unit for HA operation
- Connect to the web-based manager of the primary unit at https://192.168.1.5/admin.
- Go to System > Network.
- Configure port6 with 10.0.0.2/255.255.255.0 and port5 with 10.0.1.2/255.255.255.0.
- Go to System > High Availability > Configuration.
- Configure the following:
HA Configuration section | |
Mode of operation | master |
On failure | wait for recovery then restore slave role |
Shared password | change_me |
Backup options section | See “Configuring the backup options”. |
Backup mail data directories | enabled |
Backup MTA queue directories | disabled |
Advanced options section | See “Configuring the advanced options”. |
HA base port | 2000 |
Heartbeat lost threshold | 15 seconds |
Remote services as heartbeat | disabled |
Interface section | See “Configuring interface monitoring”. |
Interface | port6 |
Enable port monitor | Enabled |
Heartbeat status | Primary |
Peer IP address | 10.0.0.4 |
Interface | port5 |
Enable port monitor | Enabled |
Heartbeat status | Secondary |
Peer IP address | 10.0.1.4 |
Virtual IP Address | |
port1 | Ignore |
port2 | Ignore |
port3 | Set 172.16.1.2/255.255.255.0 |
port4 | Ignore |
port5 | Ignore |
port6 | Ignore |
- Click Apply.
The FortiMail unit switches to active-passive HA mode, and, after determining that there is no other primary unit, sets its effective HA operating mode to master. The virtual IP 172.16.1.2 is added to port3; if not already complete, configure DNS records and firewalls to route email traffic to this virtual IP address, not the actual IP address of the port3 network interface.
- To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be master.
If the effective HA operating mode is not master, the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.
Figure 137: Primary unit status
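The Interface section settings in the procedure above pair each local heartbeat interface with a peer IP address, and a mistyped interface or peer address is easy to overlook. The following Python sketch checks the example's values with the standard ipaddress module. The helper, the data layout, and the rules it enforces (exactly one primary heartbeat interface, at most one secondary, and a peer on the same subnet as the local interface) are assumptions drawn from this example, not FortiMail validation rules.

```python
import ipaddress
from typing import Dict, List, Tuple

# Local heartbeat interface addresses on the primary unit (System > Network).
LOCAL = {"port6": "10.0.0.2/255.255.255.0", "port5": "10.0.1.2/255.255.255.0"}
# (interface, heartbeat status, peer IP address) from the Interface section.
HEARTBEAT: List[Tuple[str, str, str]] = [
    ("port6", "Primary", "10.0.0.4"),
    ("port5", "Secondary", "10.0.1.4"),
]


def check_heartbeat(local: Dict[str, str],
                    heartbeat: List[Tuple[str, str, str]]) -> List[str]:
    """Return a list of heartbeat configuration problems (empty if none)."""
    problems = []
    statuses = [status for _, status, _ in heartbeat]
    if statuses.count("Primary") != 1:
        problems.append("expected exactly one primary heartbeat interface")
    if statuses.count("Secondary") > 1:
        problems.append("expected at most one secondary heartbeat interface")
    for port, _, peer in heartbeat:
        local_if = ipaddress.IPv4Interface(local[port])
        peer_ip = ipaddress.IPv4Address(peer)
        if peer_ip not in local_if.network:
            problems.append(f"{port}: peer {peer_ip} is not on {local_if.network}")
        elif peer_ip == local_if.ip:
            problems.append(f"{port}: peer IP equals the local address")
    return problems


print(check_heartbeat(LOCAL, HEARTBEAT))  # []
```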