How to check CPU and memory resources
System resources are shared, and a number of processes run simultaneously on the FortiGate unit. If one of these processes consumes nearly all the resources, the other processes cannot function properly.
A quick way to monitor CPU and memory usage is with the System Resources widgets on the System Dashboard. Both widgets display a visual gauge that shows the current usage.
To check the system resources on your FortiGate unit, run the following CLI command:
FGT# get system performance status
This command provides a quick and easy snapshot of the FortiGate.
The first line of output shows the CPU usage by category. A FortiGate that is doing nothing will look like:
CPU states: 0% user 0% system 0% nice 100% idle
However, if your network is running slow you might see something like:
CPU states: 1% user 98% system 0% nice 1% idle
This line shows that nearly all of the CPU is used by system processes. Normally this should not happen, as it shows the FortiGate is overloaded for some reason. If you see this overloading, investigate further. It is possible that a process, such as scanunitd, is using all the resources to scan traffic, in which case you need to reduce the amount of traffic being scanned by blocking unwanted protocols, configuring more security policies to limit scanning to certain protocols, or taking similar actions. It is also possible that a hacker has gained access to your network and is overloading it with malicious activity, such as running a spam server or using zombie PCs to attack other networks on the Internet.
You can get additional CPU-related information with the CLI command get system performance top. This command shows the top processes running on the FortiGate unit (names on the left) and their CPU usage. If a process is using most of the CPU cycles, investigate it to determine whether it is normal activity.
The second line of output from get system performance status shows the memory usage. Memory usage should not exceed 90 percent. If memory is too full, some processes will not be able to function properly. For example, if the system is running low on memory, antivirus scanning will go into failopen mode where it will start dropping connections or bypass the antivirus system.
The other lines of output, such as average network usage, average session setup rate, viruses caught, and IPS attacks blocked, can also help you determine why system resource usage is high. For example, high network usage results in high traffic processing on the FortiGate, and a session setup rate that is very low or zero may mean the proxy is overloaded and unable to do its job.
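As a rough illustration of reading the CPU states line described above, the following Python sketch parses that line into percentages and flags a nearly idle-free CPU. The function names and the 10 percent idle floor are assumptions made for this example, not Fortinet recommendations.

```python
import re

def parse_cpu_states(line):
    """Parse a 'CPU states:' line into a dict of category -> percent."""
    if not line.startswith("CPU states:"):
        raise ValueError("not a CPU states line")
    # Each category appears as '<number>% <name>', for example '98% system'.
    pairs = re.findall(r"(\d+)%\s+(\w+)", line)
    return {name: int(pct) for pct, name in pairs}

def is_overloaded(states, idle_floor=10):
    """Return True when idle CPU is below idle_floor (an assumed threshold)."""
    return states.get("idle", 0) < idle_floor

idle = parse_cpu_states("CPU states: 0% user 0% system 0% nice 100% idle")
busy = parse_cpu_states("CPU states: 1% user 98% system 0% nice 1% idle")
print(idle, is_overloaded(idle))
print(busy, is_overloaded(busy))
```

Only the CPU line is parsed here because it is the only line of this command's output whose format is shown in this article.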
How to troubleshoot high memory usage
As with any system, FortiOS has a finite set of hardware resources, such as memory, and all the running processes share that memory. Depending on its workload, each process uses more or less memory as needed, usually more in high traffic situations. If some processes use all the available memory, other processes will have no memory available and will not be able to function.
When memory usage is high, services may appear to freeze, existing connections may be lost, and new connections may be refused.
If you are seeing high memory usage in the System Resources widget, it could mean that the unit is dealing with a high traffic volume, or that connection pool limits are affecting a single proxy. If the unit is receiving large volumes of traffic on a specific proxy, it may exceed the connection pool limit. If the number of free connections within a proxy connection pool reaches zero, problems may occur.
Use the following CLI commands to configure the antivirus failopen feature, which determines the behavior of the FortiGate antivirus system if it becomes overloaded in high traffic. Setting av-failopen to idledrop drops connections based on the clients that have the most connections open.
config system global
set av-failopen idledrop
end
Use the following CLI command, which gives you information about current memory usage:
diagnose hardware sysinfo memory
Sample output:
total: used: free: shared: buffers: cached: shm:
Mem: 2074185728 756936704 1317249024 0 20701184 194555904 161046528
Swap: 0 0 0
MemTotal: 2025572 kB
MemFree: 1286376 kB
MemShared: 0 kB
Buffers: 20216 kB
Cached: 189996 kB
SwapCached: 0 kB
Active: 56644 kB
Inactive: 153648 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2025572 kB
LowFree: 1286376 kB
SwapTotal: 0 kB
SwapFree: 0 kB
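To relate this sample output to the 90 percent guideline mentioned earlier, the following Python sketch reads the Mem: line and computes the percentage of memory in use. It assumes the first three values on that line are total, used, and free bytes, as the column header suggests. The function names are made up for this example.

```python
def parse_mem_line(output):
    """Return total, used, and free bytes from the 'Mem:' line."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free = (int(v) for v in fields[1:4])
            return {"total": total, "used": used, "free": free}
    raise ValueError("no 'Mem:' line found")

def memory_used_percent(mem):
    """Percentage of total memory currently in use."""
    return 100.0 * mem["used"] / mem["total"]

# First three lines of the sample output shown above.
SAMPLE = """\
total: used: free: shared: buffers: cached: shm:
Mem: 2074185728 756936704 1317249024 0 20701184 194555904 161046528
Swap: 0 0 0
"""

mem = parse_mem_line(SAMPLE)
pct = memory_used_percent(mem)
print(f"{pct:.1f}% used, above the 90% guideline: {pct > 90}")
```

For the sample values, usage works out to roughly 36.5 percent, well under the guideline.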
How to troubleshoot high CPU usage
FortiOS has many features. If many of them are used at the same time, they can quickly use up all the CPU resources. When this happens, you will experience connection-related problems stemming from the FortiGate unit trying to manage its workload by refusing new connections, or by even more aggressive methods.
Some examples of CPU-intensive features are VPN high-level encryption, having all traffic undergo all possible scanning, logging all traffic and packets, and dashboard widgets that frequently update their data.
1. Determine how high the CPU usage is currently.
There are two main ways to do this. The easiest is to go to System > Dashboard > Status and look at the System Resources widget. This is a dial gauge that displays the percentage of CPU in use. If it’s at the red line, you should take action. The other method is to use the Dashboard CLI widget to enter diag sys top.
Sample output:
Run Time: 11 days, 23 hours and 36 minutes
0U, 0S, 98I; 1977T, 758F, 180KF
newcli 286 R 0.1 0.8
ipsengine 78 S < 0.0 3.1
ipsengine 64 S < 0.0 3.0
ipsengine 77 S < 0.0 3.0
ipsengine 68 S < 0.0 2.9
ipsengine 66 S < 0.0 2.9
ipsengine 79 S < 0.0 2.9
scanunitd 133 S < 0.0 1.8
pyfcgid 267 S 0.0 1.8
pyfcgid 269 S 0.0 1.7
pyfcgid 268 S 0.0 1.6
httpsd 139 S 0.0 1.6
pyfcgid 266 S 0.0 1.5
scanunitd 131 S < 0.0 1.4
scanunitd 132 S < 0.0 1.4
proxyworker 90 S 0.0 1.3
cmdbsvr 43 S 0.0 1.1
proxyworker 91 S 0.0 1.1
miglogd 55 S 0.0 1.1
httpsd 135 S 0.0 1.0
Where the codes displayed on the second output line mean the following:
- U is the percentage of CPU used by user space applications. In the example, 0U means user space applications are using 0% of the CPU.
- S is the percentage of CPU used by system processes (or kernel processes). In the example, 0S means system processes are using 0% of the CPU.
- I is the percentage of idle CPU. In the example, 98I means the CPU is 98% idle.
- T is the total FortiOS system memory in MB. In the example, 1977T means there are 1977 MB of system memory.
- F is free memory in MB. In the example, 758F means there are 758 MB of free memory.
- KF is the total shared memory pages used. In the example, 180KF means the system is using 180 shared memory pages.
Each additional line of the command output displays information for each of the processes running on the FortiGate unit. For example, the third line of the output is:
newcli 286 R 0.1 0.8
Where:
- newcli is the process name. Other process names can include ipsengine, sshd, cmdbsvr, httpsd, scanunitd, and miglogd.
- 286 is the process ID. The process ID can be any number.
- R is the current state of the process. The process state can be:
- R running
- S sleep
- Z zombie
- D disk sleep.
- 0.1 is the amount of CPU that the process is using. CPU usage can range from 0.0 for a process that is sleeping to higher values for a process that is taking a lot of CPU time.
- 0.8 is the amount of memory that the process is using. Memory usage can range from 0.1 to 5.5 and higher.
Enter the following single-key commands when diagnose sys top is running:
- Press q to quit and return to the normal CLI prompt.
- Press p to sort the processes by the amount of CPU that the processes are using.
- Press m to sort the processes by the amount of memory that the processes are using.
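If you capture the output of diagnose sys top as text, the fields described above can be pulled apart with a short script. The following Python sketch is one way to do it. It assumes each process row ends with the CPU and memory columns, and it skips the "<" marker that appears on some rows because this article does not explain it. The type and function names are made up for this example.

```python
import re
from typing import NamedTuple

class Proc(NamedTuple):
    name: str
    pid: int
    state: str
    cpu: float
    mem: float

def parse_summary(line):
    """Parse '0U, 0S, 98I; 1977T, 758F, 180KF' into a dict of code -> value."""
    return {code: int(num) for num, code in re.findall(r"(\d+)([A-Z]+)", line)}

def parse_process(line):
    """Parse one process row; CPU and memory are read from the end of the row."""
    fields = line.split()
    # Reading from the end skips the optional '<' marker after the state.
    return Proc(fields[0], int(fields[1]), fields[2],
                float(fields[-2]), float(fields[-1]))

# A few rows taken from the sample output above.
SAMPLE = """\
0U, 0S, 98I; 1977T, 758F, 180KF
newcli 286 R 0.1 0.8
ipsengine 78 S < 0.0 3.1
scanunitd 133 S < 0.0 1.8
httpsd 139 S 0.0 1.6
"""

lines = SAMPLE.splitlines()
summary = parse_summary(lines[0])
procs = [parse_process(row) for row in lines[1:] if row.strip()]

# Equivalent of pressing p (sort by CPU) and m (sort by memory).
by_cpu = sorted(procs, key=lambda p: p.cpu, reverse=True)
by_mem = sorted(procs, key=lambda p: p.mem, reverse=True)
print(summary)
print(by_cpu[0])
print(by_mem[0])
```

Sorting the parsed rows offline gives the same ordering as the p and m keys without keeping the interactive command running.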
2. Determine what features are using most of the CPU resources.
The CLI command get system performance top outputs a table of the top few running processes that use the most CPU resources. The column of interest is the second from the right, which shows CPU usage as a percentage. If the top few entries are using most of the CPU, note which processes they are and investigate those features to try to reduce their CPU load. Some examples of processes you will see include:
- ipsengine: the IPS engine that scans traffic for intrusions
- scanunitd: antivirus scanner
- httpsd: secure HTTP
- iked: internet key exchange (IKE) in use with IPsec VPN tunnels
- newcli: active whenever you are accessing the CLI
- sshd: active when there are secure shell (SSH) connections
- cmdbsvr: the command database server application
Go to the features that are at the top of the list and look for evidence of them overusing the CPU. Generally the monitor for a feature is a good place to start.
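Because one feature can run as several processes (the sample output earlier shows multiple ipsengine and scanunitd entries), it can help to total CPU usage per process name before deciding where to look. The following Python sketch does that and attaches a feature description from an abridged copy of the list above. The CPU values are hypothetical, and the function name is made up for this example.

```python
from collections import defaultdict

# Abridged mapping taken from the process list above.
FEATURES = {
    "ipsengine": "IPS engine that scans traffic for intrusions",
    "scanunitd": "antivirus scanner",
    "httpsd": "secure HTTP",
    "iked": "IKE in use with IPsec VPN tunnels",
    "newcli": "CLI access",
}

def cpu_by_feature(rows):
    """Total CPU per process name, highest first.

    rows is an iterable of (process_name, cpu_percent) pairs.
    """
    totals = defaultdict(float)
    for name, cpu in rows:
        totals[name] += cpu
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, cpu, FEATURES.get(name, "not in the list above"))
            for name, cpu in ranked]

# Hypothetical readings, not taken from a real unit.
rows = [("ipsengine", 31.5), ("scanunitd", 12.0),
        ("ipsengine", 28.5), ("httpsd", 1.0)]

for name, cpu, feature in cpu_by_feature(rows):
    print(f"{name:10s} {cpu:5.1f}%  {feature}")
```

With these made-up readings, the two ipsengine workers together account for most of the CPU, which would point to IPS as the first feature to review.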
3. Check for unnecessary CPU “wasters”.
These are some best practices that will reduce your CPU usage, even if you are not experiencing high CPU usage. If you require a feature that this section recommends turning off, ignore that recommendation.
- Use hardware acceleration wherever possible to offload tasks from the CPU. Offloading tasks such as encryption frees up the CPU for other tasks.
- Avoid the use of GUI widgets that require computing cycles, such as the Top Sessions widget. These widgets are constantly polling the system for their information, which uses CPU and other resources.
- Schedule antivirus, IPS, and firmware updates during off peak hours. Usually these don’t consume CPU resources but they can disrupt normal operation.
- Check the log levels and which events are being logged. The log level is the severity of the messages that are recorded. Consider going up one level to reduce the amount of logging. Also, if there are events you do not need to monitor, remove them from the list.
- Log to FortiCloud instead of memory or disk. Logging to memory quickly uses up resources. Logging to local disk will impact overall performance and reduce the lifetime of the unit. Fortinet recommends logging to FortiCloud, which doesn’t use much CPU.
- If the disk is almost full, transfer the logs or data off the disk to free up space. When a disk is almost full it consumes a lot of resources to find the free space and organize the files.
- If you have packet logging enabled, consider disabling it. When it’s enabled it records every packet that comes through that policy.
- Halt all sniffers and traces.
- Ensure you are not scanning traffic twice. If traffic enters the FortiGate unit on one interface, goes out another, and then comes back in again, that traffic does not need to be rescanned. Doing so is a waste of resources. However, ensure that traffic truly is being scanned once.
- Reduce the session timers to close unused sessions faster. To do this in the CLI, enter the following commands and values. These values are lower than the defaults. Note that the system adds 10 seconds to the tcp-timewait-timer value by default.
config system global
set tcp-halfclose-timer 30
set tcp-halfopen-timer 30
set tcp-timewait-timer 0
set udp-idle-timer 60
end
- Enable only features that you need under System > Config > Features.
4. When CPU usage is under control, use SNMP to monitor CPU usage. Alternatively, use logging to record CPU and memory usage every 5 minutes.
Once things are back to normal, you should set up a warning system to alert you to future CPU overuse. A common method is SNMP. SNMP monitors many values on the FortiGate unit and allows you to set high water marks that will generate events. You run an application on your computer to watch for and record these events. Go to System > Config > SNMP to enable and configure an SNMP community. If this method is too complicated, you can use the System Resources widget to record CPU usage. However, this method will not alert you to problems. It only records them as they happen.
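The high water mark idea can be illustrated without any SNMP tooling. The following Python sketch takes a series of recorded CPU readings and reports each time usage rises to or above a threshold after being below it. It does not use SNMP or any FortiOS interface. The 80 percent threshold, the sample readings, and the function name are assumptions made for this example.

```python
def high_water_events(samples, high=80.0):
    """Return (index, value) for each rise to or above the high water mark.

    Readings that stay above the mark do not raise repeated events.
    """
    events = []
    above = False
    for i, value in enumerate(samples):
        if value >= high and not above:
            events.append((i, value))
        above = value >= high
    return events

# Hypothetical CPU percentages recorded at 5 minute intervals.
samples = [12, 35, 82, 91, 40, 85]
print(high_water_events(samples))  # [(2, 82), (5, 85)]
```

Reporting only the crossing, rather than every reading above the mark, keeps a sustained spike from generating a flood of alerts.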
It would be nice to add the commands used to kill a process. I am experiencing high CPU usage in FortiManager and cannot find the exact command to kill the process using the CPU. Can someone help, please?
Get process_id…
#diag sys top
Kill process…
#diag sys kill 11 process_id
If the above does not kill, this will force it…
#fnsysctl kill -9 process_id
Indeed
Indeed, indeed. Watson, could you share which process it was and what you did to fix it?
Is there any way to “lsof” a process? The trace and dump stuff was not enough. I need to find out more about what a particular process is doing before just killing it. Any ideas?
CPU states: 1% user 98% system 0% nice 1% idle
This line shows that all the CPU is used up by system processes. Normally this should not happen as it shows the FortiGate is overloaded for some reason. If you see this overloading, you should investigate farther as it’s possible a process, such as scanunitid, is using all the resources to scan traffic,
==> This is not correct. If system space is busy, it is not related to a process but is most likely related to high CPS, session revalidation, and more.
If user space is busy, it is related to a daemon.