Replacing a failed cluster unit
This procedure describes how to remove a failed cluster unit from a cluster and add a new one to replace it. You can also use this procedure to remove a failed unit from a cluster, repair it, and add it back to the cluster. Replacing a failed unit does not interrupt the operation of the cluster unless you have to change how the cluster is connected to the network to accommodate the replacement unit.
You can use this procedure to replace more than one cluster unit.
To replace a failed cluster unit
1. Disconnect the failed unit from the cluster and the network.
If you maintain other connections between the network and the still functioning cluster unit or units, and between the remaining cluster units, network traffic will continue to be processed.
2. Repair the failed cluster unit, or obtain a replacement unit with the exact same hardware configuration as the failed cluster unit.
3. Install the same firmware build on the repaired or replacement unit as is running on the cluster.
4. Register the FortiGate unit and apply licenses to it. This includes FortiCloud activation, FortiClient licensing, FortiToken licensing, and entering a license key if you purchased more than 10 Virtual Domains (VDOMs).
5. You can also install any third-party certificates on the primary FortiGate before forming the cluster. Once the cluster is formed, third-party certificates are synchronized to the backup FortiGate.
6. Configure the repaired or replacement unit for HA operation with the same HA configuration as the cluster.
7. If the cluster is running in Transparent mode, change the operating mode of the repaired or replacement unit to Transparent mode.
8. Connect the repaired or replacement cluster unit to the cluster.
For an example, see How to set up FGCP clustering (recommended steps) on page 1354.
9. Power on the repaired or replacement cluster unit.
When the unit starts, it negotiates to join the cluster. After it joins the cluster, the cluster synchronizes the configuration of the repaired or replacement unit with the configuration of the primary unit.
You can add a repaired or replacement unit to a functioning cluster at any time. The repaired or replacement cluster unit must:
- Have the same hardware configuration as the cluster units, including the same hard disk configuration and the same AMC cards installed in the same slots.
- Have the same firmware build as the cluster.
- Be set to the same operating mode (NAT or Transparent) as the cluster.
- Be operating in single VDOM mode.
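The requirements above can be turned into a simple pre-check before you connect the unit. The following is only a sketch: the `UnitProfile` record, its field names, and the `join_blockers` function are hypothetical and are not part of FortiOS. It assumes you have filled in each value by hand from what the units report, and the build strings and disk descriptions are made-up placeholders.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class UnitProfile:
    """Hypothetical, hand-filled description of a unit (not a FortiOS structure)."""
    model: str
    firmware_build: str
    operating_mode: str           # "NAT" or "Transparent"
    disk_config: str              # free-form description of the hard disks
    amc_slots: Tuple[str, ...]    # AMC card in each slot, in slot order
    multi_vdom: bool              # True if multiple VDOM mode is enabled


def join_blockers(cluster: UnitProfile, candidate: UnitProfile) -> List[str]:
    """Return the listed requirements that the candidate unit does not meet."""
    problems: List[str] = []
    # Same hardware: model, hard disks, and AMC cards in the same slots.
    if (candidate.model, candidate.disk_config, candidate.amc_slots) != (
        cluster.model, cluster.disk_config, cluster.amc_slots
    ):
        problems.append("hardware configuration differs from the cluster units")
    if candidate.firmware_build != cluster.firmware_build:
        problems.append("firmware build differs from the cluster")
    if candidate.operating_mode != cluster.operating_mode:
        problems.append("operating mode differs from the cluster")
    if candidate.multi_vdom:
        problems.append("unit is not operating in single VDOM mode")
    return problems


# Placeholder values for illustration only.
cluster = UnitProfile("FortiGate-800C", "build-A", "NAT", "1x disk", ("none", "none"), False)
replacement = replace(cluster, firmware_build="build-B")

for problem in join_blockers(cluster, replacement):
    print("Fix before connecting:", problem)
```

An empty list means that none of the four listed requirements is violated. It does not check the HA configuration itself, which must still match the cluster as described in step 6.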
Hello Mike!
Great article, just one quick doubt: after the replacement (with a new unit), I presume that we have to reboot the primary unit in order to clear the error displayed in the “get sys ha status” output (the error references the old defective unit). Am I correct? The old unit is no longer in the cluster, but the error still appears:
INTERNALFW1 (global) # get sys ha status <-- This is the unit that was left alone in the cluster after the crash of INTERNALFW2, but this output was taken after a new FGT800C had successfully joined.
HA Health Status:
ERROR: FG800C2021801555 is lost @ 2020/12/31 11:47:46
Model: FortiGate-800C
We just replaced it and successfully performed the failover of all VDOMs by changing priorities, but we forgot to reboot the previously active unit. Now we have one VDOM active on each of the units using virtual clustering, so we will have to reserve another maintenance window (just in case) before failing over again and rebooting to check whether that clears the error.
Thanks!