
Link failover (port monitoring or interface monitoring)

Link failover means that if a monitored interface fails, the cluster reorganizes to reestablish a link to the network that the monitored interface was connected to and to continue operating with minimal or no disruption of network traffic.

You configure monitored interfaces (also called interface monitoring or port monitoring) by selecting the interfaces to monitor as part of the cluster HA configuration.

You can monitor up to 64 interfaces.

The interfaces that you can monitor appear on the port monitor list. You can monitor all FortiGate interfaces including redundant interfaces and 802.3ad aggregate interfaces.

You cannot monitor the following types of interfaces (you cannot select the interfaces on the port monitor list):

  • FortiGate interfaces that contain an internal switch.
  • VLAN subinterfaces.
  • IPsec VPN interfaces.
  • Individual physical interfaces that have been added to a redundant or 802.3ad aggregate interface.
  • FortiGate-5000 series backplane interfaces that have not been configured as network interfaces.

If you are configuring a virtual cluster, you can create a different port monitor configuration for each virtual cluster. Usually, for each virtual cluster you would monitor the interfaces that have been added to the virtual domains in that virtual cluster, as shown in the sketch below.
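The following is a minimal sketch of a per-virtual-cluster monitor configuration, assuming VDOMs are enabled; the VDOM names, port names, and the secondary-vcluster syntax shown here are illustrative and should be verified against your FortiOS 5.4 build:

config system ha
    set vdom "root"
    set monitor port1 port2
    set vcluster2 enable
    config secondary-vcluster
        set vdom "Eng_vdm"
        set monitor port3 port4
    end
end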

Wait until after the cluster is up and running to enable interface monitoring. You do not need to configure interface monitoring to get a cluster up and running, and enabling it too early can cause failovers if a monitored interface becomes disconnected during initial setup. Enable interface monitoring once you have verified that the cluster is connected and operating properly.

You should only monitor interfaces that are connected to networks, because a failover may occur if you monitor an unconnected interface.

 

To enable interface monitoring – web-based manager

Use the following steps to monitor the port1 and port2 interfaces of a cluster.

1. Connect to the cluster web-based manager.

2. Go to System > HA and edit the primary unit (Role is MASTER).

3. Select the Port Monitor check boxes for the port1 and port2 interfaces and select OK.

The configuration change is synchronized to all cluster units.

 

To enable interface monitoring – CLI

Use the following steps to monitor the port1 and port2 interfaces of a cluster.

1. Connect to the cluster CLI.

2. Enter the following command to enable interface monitoring for port1 and port2.

config system ha
    set monitor port1 port2
end

The following example shows how to enable monitoring for the external, internal, and DMZ interfaces.

 

config system ha
    set monitor external internal dmz
end

With interface monitoring enabled, during cluster operation the cluster monitors each cluster unit to determine if the monitored interfaces are operating and connected. Each cluster unit can detect a failure of its network interface hardware and can also detect if its network interfaces are disconnected from the switch they should be connected to.

Cluster units cannot determine if the switch that their interfaces are connected to is still connected to the network. However, you can use remote IP monitoring to make sure that the cluster unit can connect to downstream network devices. See Remote link failover on page 1534.

Because the primary unit receives all traffic processed by the cluster, a cluster can only process traffic from a network if the primary unit can connect to it. So, if the link between a network and the primary unit fails, to maintain communication with this network, the cluster must select a different primary unit; one that is still connected to the network. Unless another link failure has occurred, the new primary unit will have an active link to the network and will be able to maintain communication with it.

To support link failover, each cluster unit stores link state information for all monitored cluster units in a link state database. All cluster units keep this link state database up to date by sharing link state information with the other cluster units. If one of the monitored interfaces on one of the cluster units becomes disconnected or fails, this information is immediately shared with all cluster units.

 

If a monitored interface on the primary unit fails

If a monitored interface on the primary unit fails, the cluster renegotiates to select a new primary unit using the process described in An introduction to the FGCP on page 1310. Because the cluster unit with the failed monitored interface has the lowest monitor priority, a different cluster unit becomes the primary unit. The new primary unit should have fewer link failures.

After the failover, the cluster resumes and maintains communication sessions in the same way as for a device failure. See Device failover on page 1499.

 

If a monitored interface on a subordinate unit fails

If a monitored interface on a subordinate unit fails, this information is shared with all cluster units. The cluster does not renegotiate. The subordinate unit with the failed monitored interface continues to function in the cluster.

In an active-passive cluster after a subordinate unit link failover, the subordinate unit continues to function normally as a subordinate unit in the cluster.

In an active-active cluster after a subordinate unit link failure:

  • The subordinate unit with the failed monitored interface can continue processing connections between its functioning interfaces. However, the primary unit stops sending the subordinate unit any sessions that use its failed monitored interfaces.
  • If session pickup is enabled (see the example after this list), all sessions being processed on the failed interface of the subordinate unit that can be failed over are failed over to other cluster units. Sessions that cannot be failed over are lost and have to be restarted.
  • If session pickup is not enabled, all sessions being processed on the failed interface of the subordinate unit are lost.
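The following is a minimal sketch of enabling session pickup from the CLI so that subordinate unit sessions can be failed over; verify the option against your FortiOS 5.4 build:

config system ha
    set session-pickup enable
end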

 

How link failover maintains traffic flow

Monitoring an interface means that the interface is connected to a high priority network. As a high priority network, the cluster should maintain traffic flow to and from the network, even if a link failure occurs. Because the primary unit receives all traffic processed by the cluster, a cluster can only process traffic from a network if the primary unit can connect to it. So, if the link that the primary unit has to a high priority network fails, to maintain traffic flow to and from this network, the cluster must select a different primary unit. This new primary unit should have an active link to the high priority network.

A link failure causes a cluster to select a new primary unit

If a monitored interface on the primary unit fails, the cluster renegotiates and selects the cluster unit with the highest monitor priority to become the new primary unit. The cluster unit with the highest monitor priority is the cluster unit with the most monitored interfaces connected to networks.

After a link failover, the primary unit processes all traffic and all subordinate units, even the cluster unit with the link failure, share session and link status. In addition all configuration changes, routes, and IPsec SAs are synchronized to the cluster unit with the link failure.

In an active-active cluster, the primary unit load balances traffic to all the units in the cluster. The cluster unit with the link failure can process connections between its functioning interfaces (for example, if the cluster has connections to an internal, external, and DMZ network, the cluster unit with the link failure can still process connections between the external and DMZ networks).

Synchronizing IPsec VPN SAs

The FGCP synchronizes IPsec security associations (SAs) between cluster members so that if a failover occurs, the cluster can resume IPsec sessions without having to establish new SAs. The result is improved failover performance because IPsec sessions are not interrupted to establish new SAs. Also, establishing a large number of SAs can reduce cluster performance.

The FGCP implements slightly different synchronization mechanisms for IKEv1 and IKEv2.

 

Synchronizing SAs for IKEv1

When an SA is synchronized to the subordinate units, the sequence number is set to the maximum sequence number. After a failover, all inbound traffic that connects with the new primary unit and uses the SA is accepted without needing to re-key. However, the first outbound packet to use the SA causes the sequence number to overflow, which causes the new primary unit to re-key the SA.

 

Please note the following:

  • The cluster synchronizes all IPsec SAs.
  • IPsec SAs are not synchronized until the IKE process has finished synchronizing the ISAKMP SAs. This is required for dialup tunnels because it is the synchronizing of the ISAKMP SA that creates the dialup tunnel.
  • A dialup interface is created as soon as phase 1 is complete. This ensures that when HA synchronizes phase 1 information, the dialup name is included.
  • If the IKE process re-starts for any reason it deletes any dialup tunnels that exist. This forces the peer to re-key them.
  • IPsec SA deletion happens immediately. Routes associated with a dialup tunnel that is being deleted are cleaned up synchronously as part of the delete, rather than waiting for the SA hard-expiry.
  • The FGCP does not synchronize the IPsec tunnel MTU from the primary unit to the subordinate units. As a result, after an HA failover, if the first packet received by the FortiGate unit arrives after the HA route has been deleted and before the new route is added, and the packet is larger than the default MTU of 1024, the FortiGate unit sends back an ICMP fragmentation required message. As soon as routing is re-established, the MTU is corrected and traffic flows normally.

 

Synchronizing SAs for IKEv2

Due to the way the IKEv2 protocol is designed the FGCP cannot use exactly the same solution that is used for synchronizing IKEv1 SAs, though it is similar.

For IKEv2, like IKEv1, the FGCP synchronizes IKE and ISAKMP SAs from the primary unit to the subordinate units. However, for IKEv2 the FGCP cannot actually use this IKE SA to send/receive IKE traffic because IKEv2 includes a sequence number in every IKE message and thus it would require synchronizing every message to the subordinate units to keep the sequence numbers on the subordinate units up to date.

Instead, the FGCP synchronizes IKEv2 Message IDs. This Message ID Sync allows IKEv2 to re-negotiate send and receive message ID counters after a failover. By doing this, the established IKE SA can remain up, instead of re-negotiating.

The diagnose vpn ike stats command shows statistics for the number of HA messages sent and received for IKEv2. The output of this command includes a number of fields prefixed with ha that contain high availability-related data. For example:

ha.resync: 0
ha.vike.sync: 0
ha.conn.sync: 0
ha.sync.tx: 1
ha.sync.rx: 0
ha.sync.rx.len.bad: 0

 

Bidirectional Forwarding Detection (BFD) enabled BGP graceful restart

If you configure a BFD enabled BGP neighbor as a static BFD neighbor using the router bfd command, the FGCP supports graceful restart of BFD enabled BGP. The router bfd command is needed because the BGP auto-start timer is 5 seconds. After HA failover, BGP on the new primary unit has to wait 5 seconds to connect to its neighbors, and then register BFD requests after establishing the connections. With static BFD neighbors, BFD requests and sessions can be created as soon as possible after the failover. The new command get router info bfd requests shows the BFD peer requests.
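The following is a minimal sketch of defining a static BFD neighbor from the CLI and then listing the BFD peer requests; the neighbor address and interface name are placeholders, and the exact syntax should be checked against your FortiOS 5.4 build:

config router bfd
    config neighbor
        edit "10.10.10.2"
            set interface "port1"
        end
end

get router info bfd requests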

A BFD session created for a static BFD neighbor/peer request initializes its state as “INIT” instead of “DOWN” and its detection time as bfd-required-min-rx * bfd-detect-mult milliseconds.

When a BFD control packet with a nonzero your_discr is received and no session can be found to match the your_discr, then instead of discarding the packet, other fields in the packet, such as addressing information, are used to choose one session that was just initialized, with zero as its remote discriminator.

When a BFD session in the up state receives a control packet with zero as your_discr and down as the state, the session changes its state to down but does not notify BGP or other registered clients of this down event.

Synchronizing kernel routing tables

In a functioning cluster, the primary unit keeps all subordinate unit kernel routing tables (also called the forwarding information base, or FIB) up to date and synchronized with the primary unit. Because of these routing table updates, after a failover the new primary unit does not have to populate its kernel routing table before being able to route traffic. This gives the new primary unit time to rebuild its regular routing table after a failover.

Use the following command to view the regular routing table. This table contains all of the configured routes and routes acquired from dynamic routing protocols and so on. This routing table is not synchronized. On subordinate units this command will not produce the same output as on the primary unit.

get router info routing-table

Use the following command to view the kernel routing table (FIB). This is the list of resolved routes actually being used by the FortiOS kernel. The output of this command should be the same on the primary unit and the subordinate units.

get router info kernel

This section describes how clusters handle dynamic routing failover and also describes how to use CLI commands to control the timing of routing table updates of the subordinate unit routing tables from the primary unit.

 

Configuring graceful restart for dynamic routing failover

When an HA failover occurs, neighbor routers detect that the cluster has failed and remove it from the network until the routing topology stabilizes. During this time the routers may stop sending IP packets to the cluster, and communication sessions that would normally be processed by the cluster may time out or be dropped. Also, the new primary unit will not receive routing updates and so will not be able to build and maintain its routing database.

You can configure graceful restart (also called nonstop forwarding (NSF)) as described in RFC3623 (Graceful OSPF Restart) to solve the problem of dynamic routing failover. If graceful restart is enabled on neighbor routers, they will keep sending packets to the cluster following the HA failover instead of removing it from the network.

The neighboring routers assume that the cluster is experiencing a graceful restart.

After the failover, the new primary unit can continue to process communication sessions using the synchronized routing data received from the failed primary unit before the failover. This gives the new primary unit time to update its routing table after the failover.

 

You can use the following commands to enable graceful restart or NSF on Cisco routers:

 

router ospf 1

log-adjacency-changes

nsf ietf helper strict-lsa-checking

If the cluster is running BGP, use the following command to enable graceful restart for BGP:

 

config router bgp
    set graceful-restart enable
end

You can also add BGP neighbors and configure the cluster unit to notify these neighbors that it supports graceful restart.

 

config router bgp
    config neighbor
        edit <neighbor_address_IPv4>
            set capability-graceful-restart enable
        end
end

If the cluster is running OSPF, use the following command to enable graceful restart for OSPF:

 

config router ospf
    set restart-mode graceful-restart
end

To make sure the new primary unit keeps its synchronized routing data long enough to acquire new routing data, you should also increase the HA route time to live, route wait, and route hold values to 60 using the following CLI command:

config system ha
    set route-ttl 60
    set route-wait 60
    set route-hold 60
end

 

Controlling how the FGCP synchronizes kernel routing table updates

You can use the following commands to control some of the timing settings that the FGCP uses when synchronizing routing updates from the primary unit to subordinate units and maintaining routes on the primary unit after a failover.

config system ha
    set route-hold <hold_integer>
    set route-ttl <ttl_integer>
    set route-wait <wait_integer>
end

 

Change how long routes stay in a cluster unit routing table

Change the route-ttl time to control how long routes remain in a cluster unit routing table. The time to live range is 5 to 3600 seconds. The default time to live is 10 seconds.

The time to live controls how long routes remain active in a cluster unit routing table after the cluster unit becomes a primary unit. To maintain communication sessions after a cluster unit becomes a primary unit, routes remain active in the routing table for the route time to live while the new primary unit acquires new routes.

By default, route-ttl is set to 10, which may mean that only a few routes will remain in the routing table after a failover. Normally, keeping route-ttl at 10, or even reducing it to 5, is acceptable because acquiring new routes usually occurs very quickly, especially if graceful restart is enabled, so the delay caused by acquiring new routes is minor.

If the primary unit needs to acquire a very large number of routes, or if, for other reasons, there is a delay in acquiring all routes, the primary unit may not be able to maintain all communication sessions.

If you find that communication sessions are lost after a failover, you can increase the route time to live so that the primary unit can keep using the synchronized routes that are already in the routing table instead of waiting to acquire new routes.
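For example, the following minimal sketch increases the route time to live to 60 seconds, matching the value recommended earlier for graceful restart; adjust the value to suit your network:

config system ha
    set route-ttl 60
end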

 

Change the time between routing updates

Change the route-hold time to change the time that the primary unit waits between sending routing table updates to subordinate units. The route hold range is 0 to 3600 seconds. The default route hold time is 10 seconds.

To avoid flooding routing table updates to subordinate units, set route-hold to a relatively long time to prevent subsequent updates from occurring too quickly. Flooding routing table updates can affect cluster performance if a great deal of routing information is synchronized between cluster units. Increasing the time between updates means that this data exchange will not have to happen so often.

The route-hold time should be coordinated with the route-wait time.

 

Change the time the primary unit waits after receiving a routing update

Change the route-wait time to change how long the primary unit waits after receiving routing updates before sending the updates to the subordinate units. For quick routing table updates to occur, set route-wait to a relatively short time so that the primary unit does not hold routing table changes for too long before updating the subordinate units.

The route-wait range is 0 to 3600 seconds. The default route-wait is 0 seconds.

Normally, because the route-wait time is 0 seconds the primary unit sends routing table updates to the subordinate units every time its routing table changes.

Once a routing table update is sent, the primary unit waits the route-hold time before sending the next update.

Usually routing table updates are periodic and sporadic. Subordinate units should receive these changes as soon as possible so route-wait is set to 0 seconds. route-hold can be set to a relatively long time because normally the next route update would not occur for a while.

In some cases, routing table updates can occur in bursts. A large burst of routing table updates can occur if a router or a link on a network fails or changes. When a burst of routing table updates occurs, there is a potential that the primary unit could flood the subordinate units with routing table updates. Flooding routing table updates can affect cluster performance if a great deal of routing information is synchronized between cluster units. Setting route-wait to a longer time reduces the frequency of these additional updates and prevents flooding of routing table updates from occurring.
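As an illustrative sketch only (the values are examples, not recommendations), you could dampen bursts of updates by adding a short route-wait delay and a longer route-hold interval:

config system ha
    set route-wait 5
    set route-hold 30
end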

How to diagnose HA out of sync messages

This section describes how to use the commands diagnose sys ha showcsum and diagnose debug to diagnose the cause of HA out of sync messages.

If HA synchronization is not successful, use the following procedures on each cluster unit to find the cause.

 

To determine why HA synchronization does not occur

1. Connect to each cluster unit CLI by connecting to the console port.

2. Enter the following commands to enable debugging and display HA out of sync messages.

diagnose debug enable

diagnose debug console timestamp enable
diagnose debug application hatalk -1
diagnose debug application hasync -1

Collect the console output and compare the out of sync messages with the information on page 203.

3. Enter the following commands to turn off debugging.

diagnose debug disable
diagnose debug reset

 

 

To determine what part of the configuration is causing the problem

If the previous procedure displays messages that include sync object 0x03 (for example, HA_SYNC_SETTING_CONFIGURATION = 0x03), there is a synchronization problem with the configuration. Use the following steps to determine the part of the configuration that is causing the problem.

If your cluster consists of two cluster units, use this procedure to capture the configuration checksums for each unit. If your cluster consists of more than two cluster units, repeat this procedure for all cluster units that returned messages that include the 0x03 sync object.

1. Connect to each cluster unit CLI by connecting to the console port.

2. Enter the following command to turn on terminal capture.

diagnose debug enable

3. Enter the following command to stop HA synchronization.

execute ha sync stop

4. Enter the following command to display configuration checksums.

diagnose sys ha showcsum 1

5. Copy the output to a text file.

6. Repeat for all affected units.

7. Compare the text file from the primary unit with the text file from each cluster unit to find the checksums that do not match.

You can use a diff function to compare text files.

8. Repeat steps 4 to 7 for each checksum level:

diagnose sys ha showcsum 2
diagnose sys ha showcsum 3
diagnose sys ha showcsum 4
diagnose sys ha showcsum 5
diagnose sys ha showcsum 6
diagnose sys ha showcsum 7
diagnose sys ha showcsum 8

9. When the non-matching checksum is found, attempt to drill down further. This is possible for objects that have sub-components.

For example you can enter the following commands:

diagnose sys ha showcsum system.global
diagnose sys ha showcsum system.interface

Generally it is the first non-matching checksum in one of the levels that is the cause of the synchronization problem.

10. Attempt to remove or change the part of the configuration that is causing the problem. You can do this by making configuration changes from the primary unit or subordinate unit CLI.

11. Enter the following commands to restart HA synchronization and stop debugging:

execute ha sync start
diagnose debug disable
diagnose debug reset

 

Recalculating the checksums to resolve out of sync messages

Sometimes an error can occur when checksums are being calculated by the cluster. As a result of this calculation error the CLI console could display out of sync error messages even though the cluster is otherwise operating normally. You can also sometimes see checksum calculation errors in diagnose sys ha showcsum command output when the checksums listed in the debugzone output don’t match the checksums in the checksum part of the output.

One solution to this problem could be to re-calculate the checksums. The re-calculated checksums should match and the out of sync error messages should stop appearing.

You can use the following command to re-calculate HA checksums:

diagnose sys ha csum-recalculate [<vdom-name> | global]

Just entering the command without options recalculates all checksums. You can specify a VDOM name to just recalculate the checksums for that VDOM. You can also enter global to recalculate the global checksum.
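For example (root is just a sample VDOM name; substitute one of your own VDOMs):

diagnose sys ha csum-recalculate
diagnose sys ha csum-recalculate root
diagnose sys ha csum-recalculate global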

Synchronizing the configuration

The FGCP uses a combination of incremental and periodic synchronization to make sure that the configuration of all cluster units is synchronized to that of the primary unit.

 

The following settings are not synchronized between cluster units:

  • HA override.
  • HA device priority.
  • The virtual cluster priority.
  • The FortiGate unit host name.
  • The HA priority setting for a ping server (or dead gateway detection) configuration.
  • The system interface settings of the HA reserved management interface.
  • The HA default route for the reserved management interface, set using the ha-mgmt-interface-gateway option of the config system ha command.

 

The primary unit synchronizes all other configuration settings, including the other HA configuration settings.

 

Disabling automatic configuration synchronization

In some cases you may want to use the following command to disable automatic synchronization of the primary unit configuration to all cluster units.

config system ha
    set sync-config disable
end

When this option is disabled the cluster no longer synchronizes configuration changes. If a device failure occurs, the new primary unit may not have the same configuration as the failed primary unit. As a result, the new primary unit may process sessions differently or may not function on the network in the same way.

In most cases you should not disable automatic configuration synchronization. However, if you have disabled this feature you can use the execute ha synchronize command to manually synchronize a subordinate unit’s configuration to that of the primary unit.

You must enter execute ha synchronize commands from the subordinate unit that you want to synchronize with the primary unit. Use the execute ha manage command to access a subordinate unit CLI.

For example, to access the first subordinate unit and force a synchronization at any time, even if automatic synchronization is disabled, enter:

execute ha manage 0

execute ha synchronize start

You can use the following command to stop a synchronization that is in progress.

 

execute ha synchronize stop

 

Incremental synchronization

When you log into the cluster web-based manager or CLI to make configuration changes, you are actually logging into the primary unit. All of your configuration changes are first made to the primary unit. Incremental synchronization then immediately synchronizes these changes to all of the subordinate units.

When you log into a subordinate unit CLI (for example using execute ha manage) all of the configuration changes that you make to the subordinate unit are also immediately synchronized to all cluster units, including the primary unit, using the same process.

Incremental synchronization also synchronizes other dynamic configuration information such as the DHCP server address lease database, routing table updates, IPsec SAs, MAC address tables, and so on. See An introduction to the FGCP on page 1310 for more information about DHCP server address lease synchronization and Synchronizing kernel routing tables on page 1523 for information about routing table updates.

Whenever a change is made to a cluster unit configuration, incremental synchronization sends the same configuration change to all other cluster units over the HA heartbeat link. An HA synchronization process running on each cluster unit receives the configuration change and applies it to the cluster unit. The HA synchronization process makes the configuration change by entering a CLI command that appears to be entered by the administrator who made the configuration change in the first place.

Synchronization takes place silently, and no log messages are recorded about the synchronization activity. However, log messages can be recorded by the cluster units when the synchronization process enters CLI commands. You can see these log messages on the subordinate units if you enable event logging and set the minimum severity level to Information and then check the event log messages written by the cluster units when you make a configuration change.

You can also see these log messages on the primary unit if you make configuration changes from a subordinate unit.
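The following is a minimal sketch of enabling event logging to memory at the Information level so that these messages are recorded; the exact option names should be confirmed against your FortiOS 5.4 build:

config log memory filter
    set severity information
end
config log eventfilter
    set event enable
    set system enable
end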

 

Periodic synchronization

Incremental synchronization makes sure that as an administrator makes configuration changes, the configurations of all cluster units remain the same. However, a number of factors could cause one or more cluster units to go out of sync with the primary unit. For example, if you add a new unit to a functioning cluster, the configuration of this new unit will not match the configuration of the other cluster units. It is not practical to use incremental synchronization to change the configuration of the new unit.

Periodic synchronization is a mechanism that looks for synchronization problems and fixes them. Every minute the cluster compares the configuration file checksum of the primary unit with the configuration file checksums of each of the subordinate units. If all subordinate unit checksums are the same as the primary unit checksum, all cluster units are considered synchronized.

If one or more of the subordinate unit checksums is not the same as the primary unit checksum, the subordinate unit configuration is considered out of sync with the primary unit. The checksum of the out of sync subordinate unit is checked again every 15 seconds. This re-checking occurs in case the configurations are out of sync because an incremental configuration sequence has not completed. If the checksums do not match after 5 checks the subordinate unit that is out of sync retrieves the configuration from the primary unit. The subordinate unit then reloads its configuration and resumes operating as a subordinate unit with the same configuration as the primary unit.

The configuration of the subordinate unit is reset in this way because when a subordinate unit configuration gets out of sync with the primary unit configuration there is no efficient way to determine what the configuration differences are and to correct them. Resetting the subordinate unit configuration becomes the most efficient way to resynchronize the subordinate unit.

Synchronization requires that all cluster units run the same FortiOS firmware build. If some cluster units are running different firmware builds, then unstable cluster operation may occur and the cluster units may not be able to synchronize correctly.

Re-installing the firmware build running on the primary unit forces the primary unit to upgrade all cluster units to the same firmware build.

 

Console messages when configuration synchronization succeeds

When a cluster first forms, or when a new unit is added to a cluster as a subordinate unit, the following messages appear on the CLI console to indicate that the unit joined the cluster and had its configuration synchronized with the primary unit.

slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave succeeded to sync with master

 

 

Console messages when configuration synchronization fails

If you connect to the console of a subordinate unit that is out of synchronization with the primary unit, messages similar to the following are displayed.

slave is not in sync with master, sequence:0. (type 0x3)
slave is not in sync with master, sequence:1. (type 0x3)
slave is not in sync with master, sequence:2. (type 0x3)
slave is not in sync with master, sequence:3. (type 0x3)
slave is not in sync with master, sequence:4. (type 0x3)
global compared not matched

If synchronization problems occur the console message sequence may be repeated over and over again. The messages all include a type value (in the example type 0x3). The type value can help Fortinet Support diagnose the synchronization problem.

 

HA out of sync object messages and the configuration objects that they reference

Out of Sync Message                       Configuration Object

HA_SYNC_SETTING_CONFIGURATION = 0x03      /data/config
HA_SYNC_SETTING_AV = 0x10
HA_SYNC_SETTING_VIR_DB = 0x11             /etc/vir
HA_SYNC_SETTING_SHARED_LIB = 0x12         /data/lib/libav.so
HA_SYNC_SETTING_SCAN_UNIT = 0x13          /bin/scanunitd
HA_SYNC_SETTING_IMAP_PRXY = 0x14          /bin/imapd
HA_SYNC_SETTING_SMTP_PRXY = 0x15          /bin/smtp
HA_SYNC_SETTING_POP3_PRXY = 0x16          /bin/pop3
HA_SYNC_SETTING_HTTP_PRXY = 0x17          /bin/thttp
HA_SYNC_SETTING_FTP_PRXY = 0x18           /bin/ftpd
HA_SYNC_SETTING_FCNI = 0x19               /etc/fcni.dat
HA_SYNC_SETTING_FDNI = 0x1a               /etc/fdnservers.dat
HA_SYNC_SETTING_FSCI = 0x1b               /etc/sci.dat
HA_SYNC_SETTING_FSAE = 0x1c               /etc/fsae_adgrp.cache
HA_SYNC_SETTING_IDS = 0x20                /etc/ids.rules
HA_SYNC_SETTING_IDSUSER_RULES = 0x21      /etc/idsuser.rules
HA_SYNC_SETTING_IDSCUSTOM = 0x22
HA_SYNC_SETTING_IDS_MONITOR = 0x23        /bin/ipsmonitor
HA_SYNC_SETTING_IDS_SENSOR = 0x24         /bin/ipsengine
HA_SYNC_SETTING_NIDS_LIB = 0x25           /data/lib/libips.so
HA_SYNC_SETTING_WEBLISTS = 0x30
HA_SYNC_SETTING_CONTENTFILTER = 0x31      /data/cmdb/webfilter.bword
HA_SYNC_SETTING_URLFILTER = 0x32          /data/cmdb/webfilter.urlfilter
HA_SYNC_SETTING_FTGD_OVRD = 0x33          /data/cmdb/webfilter.fgtd-ovrd
HA_SYNC_SETTING_FTGD_LRATING = 0x34       /data/cmdb/webfilter.fgtd-ovrd
HA_SYNC_SETTING_EMAILLISTS = 0x40
HA_SYNC_SETTING_EMAILCONTENT = 0x41       /data/cmdb/spamfilter.bword
HA_SYNC_SETTING_EMAILBWLIST = 0x42        /data/cmdb/spamfilter.emailbwl
HA_SYNC_SETTING_IPBWL = 0x43              /data/cmdb/spamfilter.ipbwl
HA_SYNC_SETTING_MHEADER = 0x44            /data/cmdb/spamfilter.mheader
HA_SYNC_SETTING_RBL = 0x45                /data/cmdb/spamfilter.rbl
HA_SYNC_SETTING_CERT_CONF = 0x50          /etc/cert/cert.conf
HA_SYNC_SETTING_CERT_CA = 0x51            /etc/cert/ca
HA_SYNC_SETTING_CERT_LOCAL = 0x52         /etc/cert/local
HA_SYNC_SETTING_CERT_CRL = 0x53           /etc/cert/crl
HA_SYNC_SETTING_DB_VER = 0x55
HA_GET_DETAIL_CSUM = 0x71
HA_SYNC_CC_SIG = 0x75                     /etc/cc_sig.dat
HA_SYNC_CC_OP = 0x76                      /etc/cc_op
HA_SYNC_CC_MAIN = 0x77                    /etc/cc_main
HA_SYNC_FTGD_CAT_LIST = 0x7a              /migadmin/webfilter/ublock/ftgd/data/

 

Comparing checksums of cluster units

You can use the diagnose sys ha showcsum command to compare the configuration checksums of all cluster units. The output of this command shows checksums labelled global and all as well as checksums for each of the VDOMs including the root VDOM. The get system ha-nonsync-csum command can be used to display similar information; however, this command is intended to be used by FortiManager.

The primary unit and subordinate unit checksums should be the same. If they are not you can use the execute ha synchronize command to force a synchronization.

The following command output is for the primary unit of a cluster that does not have multiple VDOMs enabled:

diagnose sys ha showcsum
is_manage_master()=1, is_root_master()=1
debugzone
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

checksum
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

The following command output is for a subordinate unit of the same cluster:

 

diagnose sys ha showcsum
is_manage_master()=0, is_root_master()=0
debugzone
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

checksum
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

The following example shows using this command for the primary unit of a cluster with multiple VDOMs. Two VDOMs have been added, named test and Eng_vdm. From the primary unit:

config global
diagnose sys ha showcsum
is_manage_master()=1, is_root_master()=1
debugzone
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

checksum
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

From the subordinate unit:

 

config global
diagnose sys ha showcsum
is_manage_master()=0, is_root_master()=0
debugzone
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

checksum
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain

A network may experience packet loss when two FortiGate HA clusters have been deployed in the same broadcast domain. Deploying two HA clusters in the same broadcast domain can result in packet loss because of MAC address conflicts. The packet loss can be diagnosed by pinging from one cluster to the other or by pinging both of the clusters from a device within the broadcast domain. You can resolve the MAC address conflict by changing the HA Group ID configuration of the two clusters. The HA Group ID is sometimes also called the Cluster ID.

This section describes a topology that can result in packet loss, how to determine if packets are being lost, and how to correct the problem by changing the HA Group ID.

Packet loss on a network can also be caused by IP address conflicts. Finding and fixing IP address conflicts can be difficult. However, if you are experiencing packet loss and your network contains two FortiGate HA clusters, you can use the information in this section to eliminate one possible source of packet loss.

 

Changing the HA group ID to avoid MAC address conflicts

Change the Group ID to change the virtual MAC address of all cluster interfaces. You can change the Group ID from the FortiGate CLI using the following command:

config system ha

set group-id <id_integer>

end
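For example, the following minimal sketch changes the group ID of one of the two clusters; the value 10 is arbitrary, and any group ID that differs from the other cluster's avoids the virtual MAC address conflict:

config system ha
    set group-id 10
end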

 

Example topology

The topology below shows two clusters. The Cluster_1 internal interfaces and the Cluster_2 port 1 interfaces are both connected to the same broadcast domain. In this topology the broadcast domain could be an internal network. Both clusters could also be connected to the Internet or to different networks.

 

Example HA topology with possible MAC address conflicts

 

Ping testing for packet loss

If the network is experiencing packet loss, it is possible that you will not notice a problem unless you are constantly pinging both HA clusters. During normal operation of the network you also might not notice packet loss because the loss rate may not be severe enough to time out TCP sessions. Also, many common types of TCP traffic, such as web browsing, may not be greatly affected by packet loss. However, packet loss can have a significant effect on real-time protocols that deliver audio and video data.

To test for packet loss you can set up two constant ping sessions, one to each cluster. If packet loss is occurring the two ping sessions should show alternating replies and timeouts from each cluster.
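For example, from a Windows PC in the broadcast domain you could start two continuous ping sessions, one to each cluster; the addresses below are placeholders for each cluster's interface IP (on Linux, omit -t because ping runs continuously by default):

ping -t 10.11.101.100
ping -t 10.11.101.110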

 

Cluster_1          Cluster_2
reply              timeout
reply              timeout
reply              timeout
timeout            reply
timeout            reply
reply              timeout
reply              timeout
timeout            reply
timeout            reply
timeout            reply
timeout            reply

 

Viewing MAC address conflicts on attached switches

If two HA clusters with the same virtual MAC address are connected to the same broadcast domain (L2 switch or hub), the MAC address will conflict and bounce between the two clusters. This example Cisco switch MAC address table shows the MAC address flapping between different interfaces (1/0/1 and 1/0/4).

  • 0009.0f09.0002 DYNAMIC Gi1/0/1
  • 0009.0f09.0002 DYNAMIC Gi1/0/4

Displaying the virtual MAC address

Every FortiGate unit physical interface has two MAC addresses: the current hardware address and the permanent hardware address. The permanent hardware address cannot be changed; it is the actual MAC address of the interface hardware. The current hardware address can be changed, and it is the address seen by the network. For a FortiGate unit not operating in HA, you can use the following command to change the current hardware address of the port1 interface:

config system interface
    edit port1
        set macaddr <mac_address>
    end
end

For an operating cluster, the FGCP changes the current hardware address of each cluster unit interface to the HA virtual MAC address. The macaddr option is not available for a functioning cluster. You cannot change an interface MAC address, or view MAC addresses, from the config system interface CLI command.

You can use the get hardware nic <interface_name_str> command to display both MAC addresses for any FortiGate interface. This command displays hardware information for the specified interface. Depending on their hardware configuration, this command may display different information for different interfaces. You can use this command to display the current hardware address as Current_HWaddr and the permanent hardware address as Permanent_HWaddr. For some interfaces the current hardware address is displayed as MAC. The command displays a great deal of information about the interface so you may have to scroll the output to find the hardware addresses.

You can also use the diagnose hardware deviceinfo nic <interface_str> command to display both MAC addresses for any FortiGate interface.

Before HA configuration the current and permanent hardware addresses are the same. For example for one of the units in Cluster_1:

FGT60B3907503171 # get hardware nic internal

.

.

.

MAC: 02:09:0f:78:18:c9

Permanent_HWaddr: 02:09:0f:78:18:c9

.

.

.

During HA operation the current hardware address becomes the HA virtual MAC address, for example for the units in Cluster_1:

FGT60B3907503171 # get hardware nic internal

.

.

.

MAC: 00:09:0f:09:00:02

Permanent_HWaddr: 02:09:0f:78:18:c9

.

.

.

The following command output for Cluster_2 shows that port1 has the same current hardware address as the internal interface of Cluster_1, indicating a MAC address conflict.

FG300A2904500238 # get hardware nic port1

.

.

.

MAC: 00:09:0f:09:00:02

Permanent_HWaddr: 00:09:0F:85:40:FD

.

.

.