Safely Restoring Ports for Cluster Configurations



Note: This feature applies only during the initial startup of the network.


A cluster configuration requires sub-second traffic loss during failover events. Two types of ports provide redundant data paths: 1) Layer 3 ports over redundant ECMP routed paths, and 2) virtual LAGs (VLAGs) providing redundant Layer 2 paths. During failover and recovery port events, updating the hardware routing and MAC tables can take measurable time on larger networks, and this delay causes traffic loss on the network. To reduce the delay, this feature restores these ports incrementally at startup. Restoring the ports incrementally prevents the resulting hardware changes from contending with each other, and it shortens the time between a port coming up and the hardware being updated with the appropriate Layer 3 and Layer 2 information for that port. This process ensures sub-second failover.


All non-Layer 3 and non-VLAG ports are restored first. This allows the cluster links to activate and the cluster to synchronize information. Layer 3 and VLAG port restoration starts after the cluster synchronizes, which requires that the cluster is active, all Layer 2 and Layer 3 entries (such as status updates) have been exchanged, the cluster STP state is synchronized, and all router interfaces are initialized.


The maximum-sync-delay parameter sets the maximum time to wait for synchronization in case the cluster cannot synchronize information. After synchronization completes, or after maximum-sync-delay expires, Layer 3 ports are restored first, because Layer 3 traffic can traverse the cluster link to the peer VLAG port if needed. Currently, the reverse is typically not true.
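The synchronization gate described above can be sketched as a predicate over the four conditions plus a timeout. This is a minimal, hypothetical Python model, not Netvisor One code: the names ClusterSyncState, is_synchronized, and sync_wait_seconds are illustrative assumptions, and time is measured in seconds from the moment the cluster links come up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSyncState:
    """Hypothetical snapshot of the synchronization conditions (illustrative only)."""
    cluster_active: bool
    l2_l3_entries_exchanged: bool
    stp_synchronized: bool
    router_interfaces_initialized: bool


def is_synchronized(state: ClusterSyncState) -> bool:
    """All four conditions must hold before Layer 3/VLAG restoration can start."""
    return (
        state.cluster_active
        and state.l2_l3_entries_exchanged
        and state.stp_synchronized
        and state.router_interfaces_initialized
    )


def sync_wait_seconds(sync_done_at: Optional[float], maximum_sync_delay: float) -> float:
    """Return how long bring-up waits before restoring Layer 3/VLAG ports.

    sync_done_at is the time (seconds after the cluster links come up) at which
    is_synchronized() first becomes true, or None if the cluster never
    synchronizes. The wait never exceeds maximum_sync_delay.
    """
    if sync_done_at is None:
        return maximum_sync_delay
    return min(sync_done_at, maximum_sync_delay)
```

With the maximum-sync-delay of 1m shown in the sample output below (60 seconds), a cluster that synchronizes after 12 seconds waits 12 seconds, while one that never synchronizes waits the full 60 seconds.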


If VLAG ports were restored first, traffic might need a Layer 3 adjacency between the two cluster nodes, and some network configurations do not have one. After the Layer 3 ports are restored, Netvisor One waits for a configurable Layer 3-to-VLAG delay (l3-to-vlag-delay) to give the routing protocols time to converge and install routes. The delay defaults to 15 seconds.


After the delay, the VLAG ports are restored incrementally. Restoring ports incrementally allows enough time to move Layer 2 entries from the cluster link to each port. It also spreads the traffic loss into small increments of 200 to 300 ms per port rather than one long span. This is particularly important for server clusters, which tolerate brief losses but can fail or time out during a long, continuous traffic loss.

If the node coming up is the cluster master, Netvisor One applies neither staggering nor the Layer 3-to-VLAG delay. A node coming up as the cluster master means that the peer is down or still coming up and is not handling traffic, so Netvisor One can safely restore the ports as soon as possible to start traffic flowing between the nodes.
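The restoration order can be summarized as a timeline. The following is a hypothetical Python sketch, not the Netvisor One implementation: bringup_schedule and its parameters are illustrative names, time 0 is when the non-Layer 3, non-VLAG ports are restored, and simultaneous mode is modeled as an interval of 0. Two further points are assumptions the text does not settle: the sync wait still applies on the cluster master, and with no Layer 3 ports the Layer 3-to-VLAG delay is measured from the end of the sync wait.

```python
from typing import List, Tuple


def bringup_schedule(
    l3_ports: List[str],
    vlag_ports: List[str],
    sync_wait: float,
    l3_interval: float,
    vlag_interval: float,
    l3_to_vlag_delay: float,
    is_master: bool,
) -> List[Tuple[str, float]]:
    """Return (port, start_time) pairs in seconds (hypothetical model)."""
    if is_master:
        # Cluster master: no staggering and no Layer 3-to-VLAG delay.
        # Assumption: the sync wait still applies before any of these ports.
        return [(port, sync_wait) for port in l3_ports + vlag_ports]

    schedule: List[Tuple[str, float]] = []
    # Layer 3 ports first, one every l3_interval seconds (0 = simultaneous).
    for i, port in enumerate(l3_ports):
        schedule.append((port, sync_wait + i * l3_interval))

    # The Layer 3-to-VLAG delay starts at the last Layer 3 port
    # (assumption: at the end of the sync wait if there are no Layer 3 ports).
    last_l3 = schedule[-1][1] if schedule else sync_wait
    first_vlag = last_l3 + l3_to_vlag_delay

    # VLAG ports next, one every vlag_interval seconds (0 = simultaneous).
    for i, port in enumerate(vlag_ports):
        schedule.append((port, first_vlag + i * vlag_interval))
    return schedule
```

For example, with two Layer 3 ports, two VLAG ports, a 10-second sync wait, 3-second intervals, and the default 15-second delay, a non-master node brings up the Layer 3 ports at 10 s and 13 s and the VLAG ports at 28 s and 31 s, while a master node brings up all four ports at 10 s.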


To configure port restoration for a cluster, use the cluster-bringup-modify command:


cluster-bringup-modify

Modifies the cluster bring-up configuration.

Specify one or more of the following options:

l3-port-bringup-mode staggered|simultaneous

Specify the Layer 3 port bring-up mode during startup.

l3-port-staggered-interval duration: #d#h#m#s

Specify the interval between Layer 3 ports in Layer 3 staggered mode. This can be in days, hours, minutes, or seconds.

vlag-port-bringup-mode staggered|simultaneous

Specify the VLAG port bring-up mode during startup.

vlag-port-staggered-interval duration: #d#h#m#s

Specify the interval between VLAG ports in VLAG staggered mode. This can be in days, hours, minutes, or seconds.

maximum-sync-delay duration: #d#h#m#s

Specify the maximum time to wait for the cluster to synchronize before starting Layer 3 or VLAG port bring-up. This can be in days, hours, minutes, or seconds.

l3-to-vlag-delay duration: #d#h#m#s

Specify the delay between bringing up the last Layer 3 port and the first VLAG port. This can be in days, hours, minutes, or seconds. The default value is 15 seconds.
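The duration options above all take a #d#h#m#s value, such as the 3s, 1m, and 15s shown in the sample output below. The helper below is a hypothetical Python sketch, not part of Netvisor One, that converts such a string to seconds. It assumes each unit appears at most once and in d, h, m, s order; the CLI may accept other forms that this sketch rejects.

```python
import re

# Optional days, hours, minutes, seconds, in that order (assumed format).
_DURATION_RE = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Convert a '#d#h#m#s' string such as '1m' or '1h30m' to whole seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    # Reject strings that do not match, and the empty string (no units at all).
    if match is None or not any(match.groups()):
        raise ValueError(f"not a #d#h#m#s duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Applied to the sample output, parse_duration("3s") returns 3, parse_duration("1m") returns 60, and parse_duration("15s") returns 15.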

 

To display the cluster port restoration configuration, use the cluster-bringup-show command:


cluster-bringup-show

Displays the cluster bring-up configuration information.


CLI (network-admin@Leaf1) > cluster-bringup-show

switch:                       Leaf1
state:
l3-port-bringup-mode:         staggered
l3-port-staggered-interval:   3s
vlag-port-bringup-mode:       staggered
vlag-port-staggered-interval: 3s
maximum-sync-delay:           1m
l3-to-vlag-delay:             15s
l3-to-vlan-interface-delay:   0s