Egress ECMP Load Distribution for VXLAN Traffic from the VTEP Switch

Equal-cost multi-path routing (ECMP) is a routing strategy in which next-hop packet forwarding to a single destination can occur over multiple best paths. Tunnel next hops are updated based on underlay route information: RIB/FIB information is used to program next hops for a tunnel remote endpoint. If multiple next hops exist for a tunnel remote endpoint, an ECMP group is created from the list of next hops and the tunnel is programmed accordingly.
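Conceptually, flow-based ECMP hashes a flow key onto the list of next hops so that every packet of a flow follows the same link while different flows spread across links. A minimal sketch (the hash function and field choice are illustrative; switch hardware uses its own hash):

```python
import zlib

def select_next_hop(flow, next_hops):
    """Pick one next hop for a flow by hashing its 5-tuple.

    Packets of the same flow always hash to the same next hop,
    so per-flow packet ordering is preserved while distinct flows
    distribute across the equal-cost links.
    """
    key = "|".join(str(field) for field in flow).encode()
    index = zlib.crc32(key) % len(next_hops)
    return next_hops[index]

# Two equal-cost next hops toward the tunnel remote endpoint
# (values from the tunnel-show example below).
ecmp_group = ["192.178.0.6", "192.178.0.2"]

# Hypothetical flows: (src-ip, dst-ip, proto, dst-port, src-port).
flow_a = ("10.0.0.1", "10.0.1.1", 17, 4789, 50000)
flow_b = ("10.0.0.2", "10.0.1.1", 17, 4789, 50001)

print(select_next_hop(flow_a, ecmp_group))
print(select_next_hop(flow_b, ecmp_group))
```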

For example, for a tunnel leaf1toleaf2 with the remote IP address 32.4.11.1, there are two next hops, 192.178.0.6 and 192.178.0.2. Traffic going over tunnel leaf1toleaf2 is hashed over these two next-hop links.

CLI (network-admin@leaf-pst-1) > tunnel-show

scope:                           cluster

name:                            leaf1toleaf2

type:                            vxlan

vrouter-name:                    leafpst1

peer-vrouter-name:               leafpst2

local-ip:                        22.3.11.1

remote-ip:                       32.4.11.1

router-if:                       eth12.11

next-hop:                        192.178.0.6

next-hop-mac:                    66:0e:94:8c:d4:0f

nexthop-vlan:                    4091

remote-switch:                   0

active:                          yes

state:                           ok

error:                           

route-info:                      32.4.11.0/24

scope:                           

name:                            

type:                            

vrouter-name:                    

peer-vrouter-name:               

local-ip:                        

remote-ip:                       

router-if:                       

next-hop:                        192.178.0.2

next-hop-mac:                    66:0e:94:5b:90:2b

nexthop-vlan:                    4092

remote-switch:                   0

active:                          yes

state:                           ok

error:                           

route-info:                      32.4.11.0/24

scope:                           cluster

name:                            leaf1toleaf2-2nd

type:                            vxlan

vrouter-name:                    leafpst1

peer-vrouter-name:               leafpst2

local-ip:                        22.3.12.1

remote-ip:                       32.4.12.1

router-if:                       eth9.12

next-hop:                        192.178.0.6

next-hop-mac:                    66:0e:94:8c:d4:0f

nexthop-vlan:                    4091

remote-switch:                   0

active:                          yes

state:                           ok

error:                           

route-info:                      32.4.12.0/24

scope:                           

name:                            

type:                            

vrouter-name:                    

peer-vrouter-name:               

local-ip:                        

remote-ip:                       

router-if:                       

next-hop:                        192.178.0.2

next-hop-mac:                    66:0e:94:5b:90:2b

nexthop-vlan:                    4092

remote-switch:                   0

active:                          yes

state:                           ok

error:                           

route-info:                      32.4.12.0/24

 

CLI (network-admin@leaf-pst-1) > vrouter-rib-routes-show ip 32.4.11.0

vrid ip        prelen number-of-nexthops nexthop     flags      vlan intf_ip     intf_id

---- --------- ------ ------------------ ----------- ---------- ---- ----------- -------

0    32.4.11.0 24     2                  192.178.0.6 ECMP,in-hw 4091 192.178.0.5 1

0    32.4.11.0 24     2                  192.178.0.2 ECMP,in-hw 4092 192.178.0.1 0

 

CLI (network-admin@leaf-pst-1) > vrouter-rib-routes-show ip 32.4.12.0

vrid ip        prelen number-of-nexthops nexthop     flags      vlan intf_ip     intf_id

---- --------- ------ ------------------ ----------- ---------- ---- ----------- -------

0    32.4.12.0 24     2                  192.178.0.6 ECMP,in-hw 4091 192.178.0.5 1

0    32.4.12.0 24     2                  192.178.0.2 ECMP,in-hw 4092 192.178.0.1 0
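The RIB output above can be checked programmatically, for example to confirm that both next hops carry the ECMP,in-hw flag. A minimal sketch that parses the fixed-column body (the parser is illustrative, not a Netvisor API):

```python
def parse_rib_routes(output):
    """Parse the column-aligned body of vrouter-rib-routes-show output
    into a list of dicts keyed by the header names."""
    lines = [l for l in output.strip().splitlines() if l.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[2:]:        # skip the header and dashed separator
        rows.append(dict(zip(header, line.split())))
    return rows

# Sample body taken from the show output above.
sample = """\
vrid ip        prelen number-of-nexthops nexthop     flags      vlan intf_ip     intf_id
---- --------- ------ ------------------ ----------- ---------- ---- ----------- -------
0    32.4.11.0 24     2                  192.178.0.6 ECMP,in-hw 4091 192.178.0.5 1
0    32.4.11.0 24     2                  192.178.0.2 ECMP,in-hw 4092 192.178.0.1 0
"""

routes = parse_rib_routes(sample)
ecmp_hops = [r["nexthop"] for r in routes if "ECMP" in r["flags"]]
print(ecmp_hops)  # both next hops are in the ECMP group and programmed in hardware
```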

 

 

VXLAN Routing In and Out of Tunnels


 

Informational Note:  The VXLAN tunnel loopback infrastructure, identified by the trunk object named "vxlan-loopback-trunk", is used for bridging multicast or broadcast traffic in the extended VLAN and for routing traffic before VXLAN encapsulation or after VXLAN decapsulation. Non-routed unicast traffic is bridged and encapsulated or decapsulated and bridged without using the VXLAN tunnel loopback.

This feature provides support for centralized routing for VXLAN VLANs. For hosts on different VXLAN VLANs to communicate with each other, SVIs on the VXLAN VLANs are configured on one cluster pair in the fabric. Any VXLAN VLAN packets that need to be routed between two hosts are sent to a centralized overlay vRouter and then VXLAN encapsulated or decapsulated, depending on the source or destination host location.

Because the E68-M and E28Q cannot perform VXLAN routing in and out of tunnels in a single pass, loopback support exists. Netvisor OS leverages the vxlan-loopback-trunk to recirculate these packets. Be sure to add ports to vxlan-loopback-trunk so that VXLAN routing in and out of tunnels works correctly. After VXLAN decapsulation, if a packet is routed, the inner destination MAC is either the vRouter MAC address or the VRRP MAC address. The packet must recirculate after decapsulation as part of the routing operation. To accomplish this, Layer 2 entries for the vRouter MAC address or VRRP MAC address on the VXLAN VLAN are programmed in hardware to point to the vxlan-loopback-trunk ports. The output of the l2-table-show command includes a vxlan-loopback flag to indicate this hardware state.
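The recirculation decision described above hinges on the inner destination MAC after decapsulation. A minimal sketch of the check (the MAC values are hypothetical examples):

```python
def needs_vxlan_loopback(inner_dmac, router_macs, vrrp_macs):
    """After VXLAN decapsulation, a frame must recirculate through the
    vxlan-loopback-trunk only if it is to be routed, i.e. its inner
    destination MAC is the vRouter MAC or a VRRP MAC.  Plain bridged
    unicast is decapsulated and forwarded in a single pass."""
    mac = inner_dmac.lower()
    routed_macs = {m.lower() for m in router_macs} | {m.lower() for m in vrrp_macs}
    return mac in routed_macs

# Hypothetical values: a vRouter MAC and a VRRP virtual MAC.
router_macs = ["00:0e:94:b9:ae:b0"]
vrrp_macs = ["00:00:5e:00:01:0a"]

print(needs_vxlan_loopback("00:0E:94:B9:AE:B0", router_macs, vrrp_macs))  # True: routed
print(needs_vxlan_loopback("66:0e:94:5b:90:2b", router_macs, vrrp_macs))  # False: bridged
```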

CLI network-admin@switch > l2-table-show vlan 200

mac:                       00:0e:94:b9:ae:b0

vlan:                      200

vxlan:                     10000

ip:                        2.2.2.2

ports:                     69

state:                     active,static,vxlan-loopback,router

hostname:                  Spine1

peer-intf:                 host-1

peer-state:                

peer-owner-state:          

status:                    

migrate:                   

mac:                       00:0e:94:b9:ae:b0

vlan:                      200

vxlan:                     10000

ip:                        2.2.2.2

ports:                     69

state:                     active,static,vxlan-loopback,router

hostname:                  Spine1

peer-intf:                 host-1

peer-state:                active,vrrp,vxlan-loopback

peer-owner-state:          

status:                    

migrate:                   

CLI network-admin@switch > l2-table-show vlan 100

mac:                       00:0e:94:b9:ae:b0

vlan:                      100

vxlan:                     20000

ip:                        1.1.1.1

ports:                     69

state:                     active,static,vxlan-loopback,router

hostname:                  Spine1

status:                    

migrate:                   

 

Also, for Layer 3 entries behind VXLAN tunnels, the routing and encapsulation operations require two passes. To reach the Layer 3 entry, the hardware points to vxlan-loopback-trunk. The output of the l3-table-show command displays the hardware state with a vxlan-loopback flag.

CLI (network-admin@Spine1) > l3-table-show ip 2.2.2.3 format all

mac:                  00:12:c0:88:07:75

ip:                   2.2.2.3

vlan:                 200

public-vlan:          200

vxlan:                10000

rt-if:                eth5.200

state:                active,vxlan-loopback

egress-id:            100030

create-time:          16:46:20

last-seen:            17:25:09

hit:                  22

tunnel:               Spine1_Spine4

 

VXLAN Port Termination

When overlay VLANs are configured on a port, Netvisor OS does not allow VXLAN termination on that port even if the VXLAN termination criteria are matched. This is mainly enforced for ports facing bare-metal servers or single-root I/O virtualization (SR-IOV) hosts. With underlay VLANs configured on a port, Netvisor OS allows VXLAN termination on the port, which could have a hardware VTEP or software VTEP configured.

Prior to version 2.5.3, when overlay VLANs were configured on a port, VXLAN encapsulated packets received on that port were not subjected to VXLAN tunnel termination. This restriction is now removed, while keeping the security constraint valid, by enhancing port-config-modify with the new parameter vxlan-termination.

One sample use case has both overlay and underlay VLANs on a port. In this case, Netvisor OS disables VXLAN termination on the port because the port has an overlay VLAN; therefore, any VXLAN encapsulated traffic received on this port is no longer terminated, even if the destination is a local hardware VTEP.

To support this sample use case, Netvisor OS provides a port-config-modify parameter to enable or disable VXLAN termination on the port.

CLI (network-admin@Spine1) > port-config-modify port 35 vxlan-termination

Enables tunnel termination of VXLAN encapsulated packets received on the port when the VXLAN tunnel termination criteria are met.

CLI (network-admin@Spine1) > port-config-modify port 35 no-vxlan-termination

Disables VXLAN termination of encapsulated packets received on the port. This enforces security by preventing a malicious host from generating VXLAN encapsulated packets that would otherwise be subject to VXLAN tunnel termination.

Managed ports added to a VNET with vlan-type private rely on VXLAN functionality and therefore always carry only overlay VLANs. Consequently, when a port is configured as a managed port, VXLAN termination is disabled by default.

 

Default Settings

1. VNETs with vlan-type private rely on VXLAN functionality; their private VLANs are VXLAN overlay VLANs. Hence, when a port is configured as a managed port with vlan-type private, vxlan-termination is disabled by default.

2. Shared/underlay ports have vxlan-termination enabled by default; use the port-config-modify command to enable or disable vxlan-termination as needed to enforce port-level security.

3. VXLAN termination is disabled on VXLAN loopback trunk ports.
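These defaults can be summarized as a small lookup table. A minimal sketch (the port-role labels are illustrative, not Netvisor identifiers):

```python
def default_vxlan_termination(port_role):
    """Default vxlan-termination state by port role, per the rules above:
    managed ports in a private-VLAN VNET carry only overlay VLANs, so
    termination is disabled; shared/underlay ports terminate by default;
    vxlan-loopback-trunk ports never terminate."""
    defaults = {
        "managed": False,         # overlay-only: termination disabled
        "shared": True,           # underlay: termination enabled
        "vxlan-loopback": False,  # loopback trunk ports: always disabled
    }
    return defaults[port_role]

for role in ("managed", "shared", "vxlan-loopback"):
    print(role, default_vxlan_termination(role))
```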

Virtual Link Extension with Cluster Configurations

Virtual Link Extension (VLE) on switches that are part of a cluster configuration is supported by creating dedicated VXLAN tunnel end points (VTEPs). The VTEPs are configured using one of the physical or primary IP addresses on the switch. This IP address can come from a new Layer 3 interface dedicated to the VLE configuration, or from reusing an existing physical or primary IP address used to build the cluster VIP and provide VXLAN tunnel redundancy in a cluster environment. These dedicated tunnels and VTEPs are stateless, with no dependency on each other.

Figure 3: Example Topology for Virtual Link Extension and Cluster Configuration

In the example topology, Host1 is connected to both cluster nodes, PN-SW1 and PN-SW2. There is no VLAG on PN-SW1 and PN-SW2 on the connection to Host1. Host2 has two links connected to PN-SW3, which is a standalone switch. PN-SW3 does not have trunking configured on the ports connected to Host2. Both Host1 and Host2 are configured with LACP on the links connecting to the switches for High Availability (HA) functionality.

The first step is to create a new VLAN Layer 3 interface on the local vRouter, which is used as the VTEP source IP. The VLAN is local only and dedicated to this usage.

In this example configuration, you need to configure one virtual link extension for each point-to-point connection.

1. Configure VLE VLANs as shown below for each virtual link extension and add the ports:

On PN-SW1 

CLI network-admin@switch > vlan-create id 400 vxlan 400 vxlan-mode transparent scope local

CLI network-admin@switch > vlan-port-add vlan-id 400 ports 11

On PN-SW2 

CLI network-admin@switch > vlan-create id 401 vxlan 401 vxlan-mode transparent scope local

CLI network-admin@switch > vlan-port-add vlan-id 401 ports 11

On PN-SW3

CLI network-admin@switch > vlan-create id 400 vxlan 400 vxlan-mode transparent scope local

CLI network-admin@switch > vlan-create id 401 vxlan 401 vxlan-mode transparent scope local

CLI network-admin@switch > vlan-port-add vlan-id 400 ports 11

CLI network-admin@switch > vlan-port-add vlan-id 401 ports 12

2. Create VXLAN tunnels using the primary IP addresses. Note that 10.10.10.1 and 10.10.10.2 are the primary IP addresses on PN-SW1 and PN-SW2, and 20.20.20.3 is the primary IP address on PN-SW3.

On PN-SW1

CLI network-admin@switch > tunnel-create scope local name VTEP1 vrouter-name vr-s1 local-ip 10.10.10.1 remote-ip 20.20.20.3

On PN-SW2

CLI network-admin@switch > tunnel-create scope local name VTEP2 vrouter-name vr-s2 local-ip 10.10.10.2 remote-ip 20.20.20.3

On PN-SW3

CLI network-admin@switch > tunnel-create scope local name VTEP3 vrouter-name vr-s3 local-ip 20.20.20.3 remote-ip 10.10.10.1

CLI network-admin@switch > tunnel-create scope local name VTEP4 vrouter-name vr-s3 local-ip 20.20.20.3 remote-ip 10.10.10.2

3. Add VLE VLANs and VXLANs to the VXLAN tunnels.

On PN-SW1 

CLI network-admin@switch > tunnel-vxlan-add name VTEP1 vxlan 400

On PN-SW2

CLI network-admin@switch > tunnel-vxlan-add name VTEP2 vxlan 401

On PN-SW3

CLI network-admin@switch > tunnel-vxlan-add name VTEP3 vxlan 400

CLI network-admin@switch > tunnel-vxlan-add name VTEP4 vxlan 401
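Each end of a virtual link extension follows the same four-command pattern (VLAN, port membership, tunnel, VXLAN binding), so the per-switch commands can be generated. A minimal sketch (the helper function is illustrative, not a Netvisor API):

```python
def vle_commands(vlan, vxlan, port, tunnel, vrouter, local_ip, remote_ip):
    """Emit the Netvisor CLI commands for one end of a virtual link
    extension: a transparent-mode VXLAN VLAN, the port membership, the
    dedicated stateless VTEP tunnel, and the VXLAN-to-tunnel binding."""
    return [
        f"vlan-create id {vlan} vxlan {vxlan} vxlan-mode transparent scope local",
        f"vlan-port-add vlan-id {vlan} ports {port}",
        f"tunnel-create scope local name {tunnel} vrouter-name {vrouter} "
        f"local-ip {local_ip} remote-ip {remote_ip}",
        f"tunnel-vxlan-add name {tunnel} vxlan {vxlan}",
    ]

# PN-SW1 side of the first VLE (values from the example topology).
for cmd in vle_commands(400, 400, 11, "VTEP1", "vr-s1",
                        "10.10.10.1", "20.20.20.3"):
    print(cmd)
```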