Use Link Health Monitor and e-mail alerts
Another tool available to you on FortiGate units is the Link Health Monitor, useful in dead gateway detection. This feature allows the FortiGate unit to ping a gateway at regular intervals to ensure it is online and working. When the gateway is not accessible, that interface is marked as down.
To detect possible routing loops with Link Health Monitor and e-mail alerts
Use the following command to configure dead gateway detection:
config system link-monitor
    edit "test"
        set srcintf "internal4"
        set server "8.8.8.8"
        set interval 5
        set failtime 1
end
Set the interval (how often to send a ping) and the failtime (how many lost pings are considered a failure). A smaller interval and a smaller number of lost pings result in faster detection, but create more traffic on your network.
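The trade-off between detection speed and probe traffic can be estimated with a little arithmetic. The following Python sketch is illustrative only: the function name detection_profile is hypothetical, and it assumes that a failure is declared after failtime consecutive lost pings sent one per interval, ignoring any per-probe timeout.

```python
def detection_profile(interval_s: int, failtime: int) -> dict:
    """Rough estimate of the link-monitor trade-off (illustrative only)."""
    if interval_s <= 0 or failtime <= 0:
        raise ValueError("interval and failtime must be positive")
    return {
        # Assumption: failtime consecutive pings must be lost,
        # and one ping is sent every interval_s seconds.
        "approx_detection_s": interval_s * failtime,
        # Number of pings sent to the monitored server per hour.
        "pings_per_hour": 3600 // interval_s,
    }


print(detection_profile(5, 1))   # the settings used in the example above
print(detection_profile(10, 5))  # slower detection, half the probe traffic
```

With the example settings (interval 5, failtime 1), this estimate gives detection in about 5 seconds at a cost of 720 pings per hour.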
To configure notification of failed gateways
1. Go to Log & Report > Report > Local and enable Email Generated Reports.
2. Enter your email details.
3. Select Apply.
You might also want to log CPU and memory usage, because a network outage will cause your CPU activity to spike.
If you have VDOMs configured, you will have to enter the basic SMTP server information in the Global section, and the rest of the configuration within the VDOM that includes this interface.
After this configuration, when this interface on the FortiGate unit cannot connect to the next router, the FortiGate unit will bring down the interface and alert you with an email about the outage.
Look at the packet flow
If you want to see what is happening on your network, look at the packets travelling on the network. This is the same idea as police pulling over a car and asking the driver where they have been, and what the conditions were like.
The methods described in the troubleshooting sections Debugging IPv6 on RIPng on page 316 and on debugging the packet flow apply here as well. In this situation, you are looking for routes that have metrics higher than 15 (that is, a metric of 16), as that indicates they are unreachable.
Ideally, if you debug the flow of the packets and record the routes that are unreachable, you can create an accurate picture of the network outage.
Action to take on discovering a routing loop
Once you have mapped the problem on your network and determined that it is in fact a routing loop, there are a number of steps to take in correcting it.
1. Get any offline routers back online. This may be a simple reboot, or you may have to replace hardware. Often this first step will restore your network to its normal operation, once the routing tables finish being updated.
2. Change your routing configuration on the edges of the outage. Even if step 1 brought your network back online, you should consider making changes to improve your network before the next outage occurs. These changes can include configuring features like holddowns and triggers for updates, split horizon, and poison reverse updates.
Holddowns and Triggers for updates
One of the potential problems with RIP is the frequent routing table updates that are sent every time there is a change to the routing table. If your network has many RIP routers, these updates can start to slow down your network. Also, if a particular route has faulty hardware, it might go up and down frequently (a flapping route), which will generate an overload of routing table updates.
One of the most common solutions to this problem is to use holddown timers and triggers for updates. These slow down the updates that are sent out, and help prevent a potential flood.
Holddown Timers
The holddown timer activates when a route is marked down. Until the timer expires, the router does not accept any new information about that route. This is very useful if you have a flapping route, because it prevents your router from sending out updates and contributing to the flood on the network. The potential downside is that if the route comes back up before the timer has expired, that route will remain unavailable for the rest of that period. This is only a problem if it is a major route used by the majority of your traffic. Otherwise, it is a minor problem, as traffic can be re-routed around the outage.
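The holddown behavior described above can be sketched in a few lines of Python. This is a simplified model, not FortiOS code: the class names, the 180-second holddown value, and the explicit now argument (used in place of a real clock so the example is repeatable) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

INFINITY = 16  # RIP metric meaning "unreachable"


@dataclass
class Route:
    metric: int
    holddown_until: float = 0.0  # updates are ignored before this time


@dataclass
class HolddownTable:
    holddown_s: float = 180.0  # hypothetical holddown period in seconds
    routes: dict = field(default_factory=dict)

    def mark_down(self, prefix: str, now: float) -> None:
        """Mark a route down and start its holddown timer."""
        self.routes[prefix] = Route(INFINITY, now + self.holddown_s)

    def receive_update(self, prefix: str, metric: int, now: float) -> bool:
        """Apply an update; return True if accepted, False if suppressed."""
        route = self.routes.get(prefix)
        if route is not None and now < route.holddown_until:
            return False  # still in holddown: ignore new information
        self.routes[prefix] = Route(metric)
        return True


table = HolddownTable(holddown_s=180.0)
print(table.receive_update("10.0.0.0/24", 2, now=0.0))    # accepted
table.mark_down("10.0.0.0/24", now=10.0)                  # holddown until t=190
print(table.receive_update("10.0.0.0/24", 2, now=60.0))   # suppressed (flap)
print(table.receive_update("10.0.0.0/24", 2, now=200.0))  # accepted again
```

The suppressed update at t=60 shows both sides of the trade-off: a flapping route generates no churn, but a route that has genuinely recovered stays unavailable until the timer expires.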
Triggers
Triggered RIP is an alternate update structure that is based around limiting updates to only specific circumstances. The most basic difference is that the routing table will only be updated when a specific request is sent to update, as opposed to every time the routing table changes. Updates are also triggered when a unit is ‘powered on’, which can include addition of new interfaces or devices to the routing structure, or devices returning to being available after being unreachable.
Split horizon and Poison reverse updates
Split horizon is best explained with an example. You have three routers linked serially, let’s call them A, B, and C. A is only linked to B, C is only linked to B, and B is linked to both A and C. To get to C, A must go through B. If the link to C goes down, it is possible that B will try to use A’s route to get to C. This route is A-B-C, so it will loop endlessly between A and B.
This situation is called a split horizon because from B’s point of view the horizon stretches out in each direction, but in reality it is only on one side.
Poison reverse is the method used to prevent routes from running into split horizon problems. Poison reverse “poisons” routes away from the destination that use the current router in their route to the destination. This “poisoned” route is marked as unreachable for routers that cannot use it. In RIP this means that route is marked with a distance of 16.
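To make the A, B, C example concrete, the following Python sketch builds the advertisement that router A would send to neighbor B. It is a simplified model under stated assumptions: the function name build_advertisement, the routing table layout, and the LAN-A destination are hypothetical, and details of real RIP, such as the receiver adding one hop to each metric, are left out.

```python
INFINITY = 16  # RIP metric meaning "unreachable"


def build_advertisement(routes, neighbor, poison_reverse=True):
    """Build the routes a router advertises to one neighbor.

    routes:   dict mapping destination -> (metric, next_hop);
              next_hop is None for directly connected networks.
    neighbor: name of the neighbor that receives the advertisement.
    """
    advert = {}
    for dest, (metric, next_hop) in routes.items():
        if next_hop == neighbor:
            if poison_reverse:
                # The route was learned through this neighbor, so it is
                # advertised back as unreachable (distance 16).
                advert[dest] = INFINITY
            # Without poison reverse, the route is simply left out.
            continue
        advert[dest] = min(metric, INFINITY)
    return advert


# Router A reaches C through B; LAN-A is directly connected to A.
routes_a = {"C": (2, "B"), "LAN-A": (1, None)}

print(build_advertisement(routes_a, "B"))                        # {'C': 16, 'LAN-A': 1}
print(build_advertisement(routes_a, "B", poison_reverse=False))  # {'LAN-A': 1}
```

Because A tells B that C is at distance 16, B never tries to reach C through A when its own link to C goes down, which prevents the loop between A and B.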
Debugging IPv6 on RIPng
The debug commands are very useful to see what is happening on the network at the packet level. There are a few changes to debugging the packet flow when debugging IPv6.
The following CLI commands specify both IPv6 and RIP, so only RIPng packets will be reported. The output from these commands will show you the RIPng traffic on your FortiGate unit including RECV, SEND, and UPDATE actions.
The addresses are in IPv6 format.
diagnose debug enable
diagnose ipv6 router rip level info
diagnose ipv6 router rip all enable
These three commands will:
- turn on debugging in general
- set the debug level to information, a verbose reporting level
- turn on all RIP router settings
Part of the information displayed from the debugging is the metric (hop count). If the metric is 16, then that destination is unreachable since the maximum hop count is 15.
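If you note down the destinations and metrics that appear in the debug output, a short script can pick out the unreachable ones. The Python sketch below assumes that you have already collected the metrics by hand into a dictionary; it does not parse real FortiGate debug output, and the prefixes shown are hypothetical documentation addresses.

```python
RIP_INFINITY = 16  # a metric of 16 means the destination is unreachable


def unreachable(observed):
    """Return the destinations whose recorded metric is 16 or more."""
    return sorted(dest for dest, metric in observed.items()
                  if metric >= RIP_INFINITY)


# Hypothetical metrics noted from the debug output.
observed = {
    "2001:db8:1::/64": 2,
    "2001:db8:2::/64": 16,
    "2001:db8:3::/64": 16,
}

print(unreachable(observed))  # ['2001:db8:2::/64', '2001:db8:3::/64']
```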
In general, you should see an update announcement, followed by the routing table being sent out, and a received reply in response.
For more information, see Troubleshooting RIP on page 312.
