NP4 IPsec VPN offloading configuration example

Hardware accelerated IPsec processing, involving either partial or full offloading, can be achieved in either tunnel or interface mode IPsec configurations.

To achieve offloading for both encryption and decryption:

  • In Phase 1 configuration’s Advanced section, Local Gateway IP must be specified as an IP address of a network interface associated with a port attached to a network processor. (In other words, if Phase 1’s Local Gateway IP is Main Interface IP, or is specified as an IP address that is not associated with a network interface associated with a port attached to a network processor, IPsec network processing is not offloaded.)
  • In Phase 2 configuration’s P2 Proposal section, if the checkbox “Enable replay detection” is selected, enc-offload-antireplay and dec-offload-antireplay must be set to enable in the CLI.
  • offload-ipsec-host must be set to enable in the CLI.

This section contains example IPsec configurations whose IPsec encryption and decryption processing is hardware accelerated by an NP4 unit contained in a FortiGate-5001B at both ends of the VPN tunnel.

Hardware accelerated IPsec VPN does not require both tunnel endpoints to have the same network processor model. However, if hardware is not symmetrical, the packet forwarding rate is limited by the slower side.

NP4 IPsec VPN offloading

NP4 processors improve IPsec tunnel performance by offloading IPsec encryption and decryption. Requirements for hardware accelerated IPsec encryption or decryption are a modification of general offloading requirements. Differing characteristics are:

  • Origin can be local host (the FortiGate unit)
  • In Phase 1 configuration, Local Gateway IP must be specified as an IP address of a network interface for a port attached to a network processor
  • SA must have been received by the network processor
  • In Phase 2 configuration:
      • encryption algorithm must be DES, 3DES, AES-128, AES-192, AES-256, or null
      • authentication must be MD5, SHA1, or null
      • if encryption is null, authentication must not also be null
      • if replay detection is enabled, enc-offload-antireplay must also be set to enable in the CLI
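
The Phase 2 conditions above can be sketched as a small check. This is only an illustration of the rules as listed: the function and variable names are hypothetical, not FortiOS identifiers, and the Phase 1 and SA requirements are not modeled.

```python
# Illustrative only: these names are not FortiOS identifiers.
OFFLOADABLE_ENCRYPTION = {"DES", "3DES", "AES-128", "AES-192", "AES-256", "null"}
OFFLOADABLE_AUTHENTICATION = {"MD5", "SHA1", "null"}


def phase2_offload_problems(encryption, authentication,
                            replay_detection, enc_offload_antireplay):
    """Return the reasons a Phase 2 proposal fails the requirements above.

    An empty list means the proposal passes these Phase 2 checks only.
    """
    problems = []
    if encryption not in OFFLOADABLE_ENCRYPTION:
        problems.append(f"encryption {encryption} is not offloadable")
    if authentication not in OFFLOADABLE_AUTHENTICATION:
        problems.append(f"authentication {authentication} is not offloadable")
    if encryption == "null" and authentication == "null":
        problems.append("encryption and authentication cannot both be null")
    if replay_detection and not enc_offload_antireplay:
        problems.append("replay detection needs enc-offload-antireplay enabled")
    return problems


print(phase2_offload_problems("AES-256", "SHA1", True, True))
print(phase2_offload_problems("null", "null", False, False))
```

The first call returns an empty list; the second reports the null/null combination.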

If replay detection is enabled in the Phase 2 configuration, you can enable or disable IPsec encryption and decryption offloading from the CLI. Performance varies by those CLI options and the percentage of packets requiring encryption or decryption. For details, see NP4 IPsec VPN offloading.

To apply hardware accelerated encryption and decryption, the FortiGate unit’s main processing resources must first perform Phase 1 negotiations to establish the security association (SA). The SA includes cryptographic processing instructions required by the network processor, such as which encryption algorithms must be applied to the tunnel. After ISAKMP negotiations, the FortiGate unit’s main processing resources send the SA to the network processor, enabling the network processor to apply the negotiated hardware accelerated encryption or decryption to tunnel traffic.

 

Possible accelerated cryptographic paths are:

  • IPsec decryption offload
      • Ingress ESP packet > Offloaded decryption > Decrypted packet egress (fast path)
      • Ingress ESP packet > Offloaded decryption > Decrypted packet to FortiGate unit’s main processing resources
  • IPsec encryption offload
      • Ingress packet > Offloaded encryption > Encrypted (ESP) packet egress (fast path)
      • Packet from FortiGate unit’s main processing resources > Offloaded encryption > Encrypted (ESP) packet egress

Increasing NP4 offloading capacity using link aggregation groups (LAGs)

NP4 processors can offload sessions received by interfaces in link aggregation groups (LAGs) (IEEE 802.3ad). A LAG combines more than one physical interface into a group that functions like a single interface with a higher capacity than a single physical interface. For example, you could use a LAG if you want to offload sessions on a 3Gbps link by adding three 1Gbps interfaces to the same LAG.

All offloaded traffic types are supported by LAGs, including IPsec VPN traffic. Just like with normal interfaces, traffic accepted by a LAG is offloaded by the NP4 processor connected to the interfaces in the LAG that receive the traffic to be offloaded. If all interfaces in a LAG are connected to the same NP4 processor, traffic received by that LAG is offloaded by that NP4 processor. The amount of traffic that can be offloaded is limited by the capacity of the NP4 processor.

If a FortiGate has two or more NP4 processors connected by an integrated switch fabric (ISF), you can use LAGs to increase offloading by sharing the traffic load across multiple NP4 processors. You do this by adding physical interfaces connected to different NP4 processors to the same LAG.

Adding a second NP4 processor to a LAG can as much as double the offloading capacity of the LAG, and adding a third increases it further. In practice, capacity may not be exactly doubled by a second NP4 or tripled by a third, because traffic and load conditions and other factors limit the actual offloading result.

The increase in offloading capacity offered by LAGs and multiple NP4s is supported by the ISF that allows multiple NP4 processors to share session information. On models that have more than one NP4 and no ISF, if you attempt to add interfaces connected to different NP4 processors to a LAG the system displays an error message.

 

There are also a few limitations to LAG NP4 offloading support for IPsec VPN:

  • IPsec VPN anti-replay protection cannot be used if IPsec is configured on a LAG that has interfaces connected to multiple NP4 processors.
  • Using a LAG connected to multiple NP4 processors for decrypting incoming IPsec VPN traffic may cause some of the incoming traffic to be decrypted by the CPU. Because not all decryption is offloaded, this configuration is not recommended. (Using a LAG connected to multiple NP4 processors for encrypting outgoing IPsec VPN traffic is supported with no limitations.)
  • Because the encrypted traffic for one IPsec VPN tunnel has the same 5-tuple, the traffic from one tunnel can only be balanced to one interface in a LAG. This limits the maximum throughput for one IPsec VPN tunnel in an NP4 LAG to 1Gbps.
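
The last limitation follows from how hash-based balancing behaves in general. The sketch below is a toy model, not the FortiGate algorithm: CRC32, the function name, the member list, and the addresses are all assumptions chosen for illustration. It shows that when every packet of a tunnel carries the same 5-tuple, the hash always selects the same LAG member.

```python
import zlib


def lag_member_for(five_tuple, members):
    """Pick a LAG member from a hash of the flow's 5-tuple.

    CRC32 is a stand-in hash chosen for this sketch, not the algorithm
    a FortiGate uses.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return members[zlib.crc32(key) % len(members)]


members = ["port1", "port2", "port3"]  # hypothetical LAG of three 1Gbps ports

# (source IP, destination IP, protocol, source port, destination port).
# ESP is IP protocol 50 and has no ports, so every packet of the tunnel
# presents the same tuple to the hash.
tunnel = ("203.0.113.1", "198.51.100.1", 50, 0, 0)

chosen = {lag_member_for(tunnel, members) for _ in range(1000)}
print(chosen)  # a set containing exactly one member
```

Whatever hash is used, identical input gives identical output, so a single tunnel cannot spread across LAG members; only multiple tunnels with different endpoints can.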

 

NP4 traffic shaping offloading

Accelerated traffic shaping is supported with the following limitations:

  • NP4 processors support policy-based traffic shaping. However, fast path traffic and traffic handled by the FortiGate CPU (slow path) are controlled separately, which means the policy setting on fast path does not consider the traffic on the slow path.
  • Port-based traffic policing, as defined by the inbandwidth and outbandwidth CLI commands, is not supported.
  • DSCP configurations are supported.
  • Per-IP traffic shaping is supported.
  • QoS in general is not supported.

You can also use the traffic shaping features of the FortiGate unit’s main processing resources by disabling NP4 offloading. See Disabling NP offloading for firewall policies.

Configuring NP4 traffic offloading

Offloading traffic to a network processor requires that the FortiGate unit configuration and the traffic itself are suited to hardware acceleration. There are requirements for both the sessions and the individual packets.

NP4 session fast path requirements

Sessions must be fast path ready. Fast path ready session characteristics are:

  • Layer 2 type/length must be 0x0800 (IEEE 802.1q VLAN specification is supported)
  • Layer 3 protocol must be IPv4
  • Layer 4 protocol must be UDP, TCP or ICMP
  • Layer 3 / Layer 4 header or content modification must not require a session helper (for example, SNAT, DNAT, and TTL reduction are supported, but application layer content modification is not supported)
  • Firewall policies must not include proxy-based security features (proxy-based virus scanning, proxy-based web filtering, DNS filtering, DLP, Anti-Spam, VoIP, ICAP, Web Application Firewall, or Proxy options).
  • If the FortiGate supports NTurbo, firewall policies can include flow-based security features (IPS, Application Control CASI, flow-based antivirus, or flow-based web filtering).
  • Origin must not be local host (the FortiGate unit)
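
As a rough illustration, the session characteristics above can be expressed as a checklist. The Session class and its field names are hypothetical and do not correspond to any FortiOS data structure. The treatment of flow-based features without NTurbo is an assumption inferred from the list, not something the list states directly.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session summary; not a FortiOS data structure."""
    ethertype: int = 0x0800
    l3_protocol: str = "IPv4"
    l4_protocol: str = "TCP"
    needs_session_helper: bool = False
    proxy_based_features: bool = False
    flow_based_features: bool = False
    nturbo_supported: bool = False
    from_local_host: bool = False


def fast_path_blockers(s):
    """List the fast path requirements from the text that the session fails."""
    blockers = []
    if s.ethertype != 0x0800:
        blockers.append("Layer 2 type/length must be 0x0800")
    if s.l3_protocol != "IPv4":
        blockers.append("Layer 3 protocol must be IPv4")
    if s.l4_protocol not in {"UDP", "TCP", "ICMP"}:
        blockers.append("Layer 4 protocol must be UDP, TCP or ICMP")
    if s.needs_session_helper:
        blockers.append("session helper required")
    if s.proxy_based_features:
        blockers.append("policy includes proxy-based security features")
    # Assumption: without NTurbo, flow-based features also prevent offloading.
    if s.flow_based_features and not s.nturbo_supported:
        blockers.append("flow-based security features need NTurbo")
    if s.from_local_host:
        blockers.append("origin is the local host")
    return blockers


print(fast_path_blockers(Session()))
print(fast_path_blockers(Session(l4_protocol="SCTP", needs_session_helper=True)))
```

A session with an empty blocker list is fast path ready in this model; any entry means the session stays on the CPU.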

 

If you disable anomaly checks by Intrusion Prevention (IPS), you can still enable NP4 hardware accelerated anomaly checks using the fp-anomaly field of the config system interface CLI command. See Offloading NP4 anomaly detection.

If a session is not fast path ready, the FortiGate unit will not send the session key to the network processor(s). Without the session key, every session key lookup by a network processor for incoming packets of that session fails, so all packets of the session are sent to the FortiGate unit’s main processing resources and processed at normal speeds.

If a session is fast path ready, the FortiGate unit will send the session key to the network processor(s). Session key lookup then succeeds for subsequent packets from the known session.

 

Packet fast path requirements

Packets within the session must then also meet packet requirements.

  • Incoming packets must not be fragmented.
  • Outgoing packets must not require fragmentation to a size less than 385 bytes. Because of this requirement, the configured MTU (Maximum Transmission Unit) for network processors’ network interfaces must also meet or exceed the network processors’ supported minimum MTU of 385 bytes.
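
The two packet requirements can be sketched as a per-packet decision. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, and the fragment size is approximated by the egress MTU.

```python
MIN_FRAGMENT_BYTES = 385  # minimum size stated in the packet requirements


def packet_uses_fast_path(incoming_fragmented, packet_bytes, egress_mtu):
    """Return True if a packet of a fast path ready session can be offloaded.

    Simplification: a packet larger than the egress MTU is assumed to be
    fragmented into pieces of roughly egress_mtu bytes.
    """
    if incoming_fragmented:
        return False
    needs_fragmentation = packet_bytes > egress_mtu
    if needs_fragmentation and egress_mtu < MIN_FRAGMENT_BYTES:
        return False
    return True


print(packet_uses_fast_path(False, 1400, 1500))  # True
print(packet_uses_fast_path(True, 1400, 1500))   # False: arrived fragmented
print(packet_uses_fast_path(False, 1400, 300))   # False: fragments under 385 bytes
```

The decision is made per packet, which is why one session can mix offloaded and non-offloaded packets.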

If packet requirements are not met, an individual packet will use FortiGate unit main processing resources, regardless of whether other packets in the session are offloaded to the specialized network processor(s).

In some cases, due to these requirements, a protocol’s session(s) may receive a mixture of offloaded and non-offloaded processing.

For example, FTP uses two connections: a control connection and a data connection. The control connection requires a session helper, and cannot be offloaded, but the data connection does not require a session helper, and can be offloaded. Within the offloadable data session, fragmented packets will not be offloaded, but other packets will be offloaded.

Some traffic types differ from general offloading requirements, but still utilize some of the network processors’ encryption and other capabilities. Exceptions include IPsec traffic and active-active high availability (HA) load balanced traffic.

 

Mixing fast path and non-fast path traffic

If packet requirements are not met, an individual packet will be processed by the FortiGate CPU regardless of whether other packets in the session are offloaded to the NP4.

Also, in some cases, a protocol’s session(s) may receive a mixture of offloaded and non-offloaded processing.

For example, VoIP control packets may not be offloaded but VoIP data packets (voice packets) may be offloaded.

Viewing your FortiGate’s NP4 configuration

To list the NP4 network processors on your FortiGate unit, use the following CLI command.

get hardware npu np4 list

The output lists the interfaces that have NP4 processors. For example, for a FortiGate-5001C:

get hardware npu np4 list
ID   Model        Slot      Interface
0    On-board               port1 port2 port3 port4
                            fabric1 base1 npu0-vlink0 npu0-vlink1
1    On-board               port5 port6 port7 port8
                            fabric2 base2 npu1-vlink0 npu1-vlink1

 

NP4lite CLI commands (disabling NP4Lite offloading)

If your FortiGate unit includes an NP4Lite processor, the following commands are available.

Use the following command to disable or enable NP4Lite offloading. By default, NP4Lite offloading is enabled. To disable NP4Lite offloading to diagnose a problem, enter:

diagnose npu nplite fastpath disable

This command disables NP4Lite offloading until your FortiGate reboots. You can also re-enable offloading by entering the following command:

 

diagnose npu nplite fastpath enable

NP4lite debug command. Use the following command to debug NP4Lite operation:

diagnose npl npl_debug {<parameters>}

NP4 Acceleration

NP4 network processors provide fastpath acceleration by offloading communication sessions from the FortiGate CPU. When the first packet of a new session is received by an interface connected to an NP4 processor, just like any session connecting with any FortiGate interface, the session is forwarded to the FortiGate CPU where it is matched with a security policy. If the session is accepted by a security policy and if the session can be offloaded its session key is copied to the NP4 processor that received the packet. All of the rest of the packets in the session are intercepted by the NP4 processor and fast-pathed out of the FortiGate unit to their destination without ever passing through the FortiGate CPU. The result is enhanced network performance provided by the NP4 processor plus the network processing load is removed from the CPU. In addition, the NP4 processor can handle some CPU intensive tasks, like IPsec VPN encryption/decryption.

Session keys (and IPsec SA keys) are stored in the memory of the NP4 processor that is connected to the interface that received the packet that started the session. All sessions are fast-pathed and accelerated, even if they exit the FortiGate unit through an interface connected to another NP4. The key to making this possible is the Integrated Switch Fabric (ISF) that connects the NP4s and the FortiGate unit interfaces together. The ISF allows any port connectivity. All ports and NP4s can communicate with each other over the ISF.

There are no special ingress and egress fast path requirements because traffic enters and exits on interfaces connected to the same ISF. Most FortiGate models with multiple NP4 processors connect all interfaces and NP4 processors to the same ISF (except management interfaces) so this should not ever be a problem.

There is one limitation to keep in mind: the capacity of each NP4 processor. An individual NP4 processor has a capacity of 20 Gbps (10 Gbps ingress and 10 Gbps egress). Once an NP4 processor hits its limit, sessions that are over the limit are sent to the CPU. You can avoid this problem by distributing incoming sessions as evenly as possible among the NP4 processors. To do this, you need to know which interfaces connect to which NP4 processors and distribute incoming traffic accordingly.

Some FortiGate units contain one NP4 processor with all interfaces connected to it and to the ISF. As a result, offloading is supported for traffic between any pair of interfaces.

Some FortiGate units include NP4Lite processors. These network processors have the same functionality and limitations as NP4 processors but with about half the performance. NP4lite processors can be found in mid-range FortiGate models such as the FortiGate-200D and 240D.

FortiGate-3000D fast path architecture

The FortiGate-3000D features 16 front panel SFP+ 10Gb interfaces connected to two NP6 processors through an Integrated Switch Fabric (ISF). The FortiGate-3000D has the following fastpath architecture:

  • 8 SFP+ 10Gb interfaces, port1 through port8, share connections to the first NP6 processor (np6_0).
  • 8 SFP+ 10Gb interfaces, port9 through port16, share connections to the second NP6 processor (np6_1).

[Figure: FortiGate-3000D front panel (CONSOLE, MGMT 1, MGMT 2, USB, status LEDs, SFP+ ports 1 to 16) and block diagram showing the Integrated Switch Fabric, two FortiASIC NP6 processors, the system bus, the CPU, and CP8 processors.]

You can use the following get command to display the FortiGate-3000D NP6 configuration. The command output shows two NP6s named NP6_0 and NP6_1 and the interfaces (ports) connected to each NP6. You can also use the diagnose npu np6 port-list command to display this information.

 

get hardware npu np6 port-list
Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port1   10G   Yes
       0    port6   10G   Yes
       1    port2   10G   Yes
       1    port5   10G   Yes
       2    port3   10G   Yes
       2    port8   10G   Yes
       3    port4   10G   Yes
       3    port7   10G   Yes
------ ---- ------- ----- ----------
np6_1  0    port10  10G   Yes
       0    port13  10G   Yes
       1    port9   10G   Yes
       1    port14  10G   Yes
       2    port12  10G   Yes
       2    port15  10G   Yes
       3    port11  10G   Yes
       3    port16  10G   Yes
------ ---- ------- ----- ----------
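
When planning how to spread traffic, it can help to turn saved port-list output into a chip-to-ports map. The following is a minimal sketch, assuming the output has one port per row with the chip name on the first row of each group. The function name and the shortened sample are illustrative and not part of any Fortinet tooling.

```python
SAMPLE = """\
Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port1   10G   Yes
       0    port6   10G   Yes
       1    port2   10G   Yes
------ ---- ------- ----- ----------
np6_1  0    port10  10G   Yes
       0    port13  10G   Yes
------ ---- ------- ----- ----------
"""


def ports_by_chip(text):
    """Map each NP6 chip name to the list of ports shown beneath it."""
    chips = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("-"):
            continue  # skip blank lines and separator rows
        if fields[0].startswith("np6_"):
            current = fields[0]
            fields = fields[1:]
        # A data row is: XAUI index, port name, max speed, cross-chip flag.
        if current and len(fields) == 4 and fields[0].isdigit():
            chips.setdefault(current, []).append(fields[1])
    return chips


print(ports_by_chip(SAMPLE))
```

For the shortened sample, the result maps np6_0 to port1, port6, and port2, and np6_1 to port10 and port13.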

FortiGate-1500DT fast path architecture

The FortiGate-1500DT features two NP6 processors both connected to an integrated switch fabric. The FortiGate-1500DT has the same hardware configuration as the FortiGate-1500D, but with the addition of newer CPUs and DPDK technology that improves IPS performance.

The FortiGate-1500DT includes the following interfaces and NP6 processors:

  • Eight SFP 1Gb interfaces (port1-port8), eight RJ-45 Ethernet ports (port17-24) and four SFP+ 10Gb interfaces (port33-port36) share connections to the first NP6 processor.
  • Eight SFP 1Gb interfaces (port9-port16), eight RJ-45 Ethernet ports (port25-32) and four SFP+ 10Gb interfaces (port37-port40) share connections to the second NP6 processor.

 

 

[Figure: FortiGate-1500DT block diagram showing the Integrated Switch Fabric, two FortiASIC NP6 processors, the system bus, the CPU, and CP8 processors.]

You can use the following get command to display the FortiGate-1500DT NP6 configuration. The command output shows two NP6s named NP6_0 and NP6_1. The output also shows the interfaces (ports) connected to each NP6. You can also use the diagnose npu np6 port-list command to display this information.

get hardware npu np6 port-list
Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port1   1G    Yes
       0    port5   1G    Yes
       0    port17  1G    Yes
       0    port21  1G    Yes
       0    port33  10G   Yes
       1    port2   1G    Yes
       1    port6   1G    Yes
       1    port18  1G    Yes
       1    port22  1G    Yes
       1    port34  10G   Yes
       2    port3   1G    Yes
       2    port7   1G    Yes
       2    port19  1G    Yes
       2    port23  1G    Yes
       2    port35  10G   Yes
       3    port4   1G    Yes
       3    port8   1G    Yes
       3    port20  1G    Yes
       3    port24  1G    Yes
       3    port36  10G   Yes
------ ---- ------- ----- ----------
np6_1  0    port9   1G    Yes
       0    port13  1G    Yes
       0    port25  1G    Yes
       0    port29  1G    Yes
       0    port37  10G   Yes
       1    port10  1G    Yes
       1    port14  1G    Yes
       1    port26  1G    Yes
       1    port30  1G    Yes
       1    port38  10G   Yes
       2    port11  1G    Yes
       2    port15  1G    Yes
       2    port27  1G    Yes
       2    port31  1G    Yes
       2    port39  10G   Yes
       3    port12  1G    Yes
       3    port16  1G    Yes
       3    port28  1G    Yes
       3    port32  1G    Yes
       3    port40  10G   Yes
------ ---- ------- ----- ----------