Safely Restoring Ports for Cluster Configurations


 

Informational Note:  This feature is only applied during the initial start up of the cluster slave switch.

During failover and recovery port events, it can take measurable time to change the hardware routing and MAC tables on larger networks. This delay causes traffic loss on the network. To reduce the delay, this feature allows you to restore these ports incrementally at start up. Restoring the ports incrementally prevents the hardware changes from contending with each other, and reduces the delay between a port coming up and the hardware being updated with the appropriate Layer 3 and Layer 2 information for that port. This process ensures sub-second failover.

All non-Layer 3 and non-VLAG ports are restored first. This allows the cluster links to activate and the cluster configuration to synchronize information. Layer 3 and VLAG port restoration starts after the cluster synchronizes. Synchronization is considered complete when the cluster is active, all Layer 2 and Layer 3 entries, such as status updates, have been exchanged, the cluster STP status is synchronized, and all router interfaces are initialized.

The maximum-sync-delay parameter controls the maximum time to wait for synchronization if the cluster cannot synchronize information. After synchronization is complete, Layer 3 ports are restored first, since Layer 3 traffic can traverse the cluster link to the peer VLAG port if needed. Currently, the reverse is typically not true.

If VLAG ports are restored first, a Layer 3 adjacency between the two cluster nodes may be needed but may not exist in some network configurations. After Layer 3 ports are restored, Netvisor OS waits for a configurable Layer 3 to VLAG delay (l3-to-vlag-delay) to allow time for the routing protocols to converge and insert the routes. The delay defaults to 15 seconds.

After the delay, the VLAG ports are restored incrementally. Restoring ports incrementally allows enough time to move Layer 2 entries from the cluster link to the port. It also allows the traffic loss to occur in small intervals of 200-300 ms per port, rather than in one large time span. This is particularly important for server clusters, which tolerate small temporary losses but fail or time out during a large continuous traffic loss.

If the node coming up is the cluster master, no staggering and no Layer 3 to VLAG wait is applied. A node that comes up as the cluster master implies that the peer is down or coming up, and is not handling traffic. Therefore, Netvisor OS safely restores the ports as soon as possible to start traffic flowing between the nodes.
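The restore ordering described above can be sketched as a small timeline calculation. This is only an illustration of the sequence, not Netvisor OS code: the function name, the BringupConfig class, the port names, and every default value other than the 15-second Layer 3 to VLAG delay are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BringupConfig:
    # Field names mirror the documented options; the values are
    # illustrative assumptions except l3_to_vlag_delay (documented default).
    l3_port_staggered_interval: float = 1.0    # seconds, assumed
    vlag_port_staggered_interval: float = 1.0  # seconds, assumed
    maximum_sync_delay: float = 60.0           # seconds, assumed
    l3_to_vlag_delay: float = 15.0             # seconds, documented default


def restore_schedule(other_ports, l3_ports, vlag_ports, sync_time, cfg,
                     is_master=False):
    """Return a list of (seconds_from_start, port) restore events."""
    if is_master:
        # Simplification: the master applies no staggering and no
        # Layer 3 to VLAG wait, so every port is restored immediately.
        return [(0.0, p) for p in [*other_ports, *l3_ports, *vlag_ports]]

    # Step 1: non-Layer 3, non-VLAG ports come up first.
    schedule = [(0.0, p) for p in other_ports]

    # Step 2: wait for cluster sync, but never longer than the maximum.
    t = min(sync_time, cfg.maximum_sync_delay)

    # Step 3: Layer 3 ports, one per staggered interval.
    for i, port in enumerate(l3_ports):
        schedule.append((t + i * cfg.l3_port_staggered_interval, port))
    if l3_ports:
        t += (len(l3_ports) - 1) * cfg.l3_port_staggered_interval

    # Step 4: wait for routing protocols to converge.
    t += cfg.l3_to_vlag_delay

    # Step 5: VLAG ports, one per staggered interval.
    for i, port in enumerate(vlag_ports):
        schedule.append((t + i * cfg.vlag_port_staggered_interval, port))
    return schedule
```

Setting either staggered interval to zero in this sketch models the simultaneous mode, where all ports of that type are restored at the same moment.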

To configure how a cluster restores Layer 3 and VLAG ports, use the following command:

cluster-bringup-modify

Modifies the cluster bring up configuration.

Specify one or more of the following options:

l3-port-bringup-mode staggered|simultaneous

Specify the Layer 3 port bring up mode during start up.

l3-port-staggered-interval duration: #d#h#m#s

Specify the interval between Layer 3 ports in Layer 3 staggered mode. This can be in days, hours, minutes, or seconds.

vlag-port-bringup-mode staggered|simultaneous

Specify the VLAG port bring up mode during start up.

vlag-port-staggered-interval duration: #d#h#m#s

Specify the interval between VLAG ports in VLAG staggered mode.

This can be in days, hours, minutes, or seconds.

maximum-sync-delay duration: #d#h#m#s

Specify the maximum time to wait for the cluster to synchronize before starting Layer 3 or VLAG port bring up.

This can be in days, hours, minutes, or seconds.

l3-to-vlag-delay duration: #d#h#m#s

Specify the delay between the last Layer 3 port and the first VLAG port bring up.

This can be in days, hours, minutes, or seconds. The default value is 15 seconds.
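The duration options above all share the #d#h#m#s notation. As a minimal sketch of how such a value breaks down into days, hours, minutes, and seconds, the following helper converts a string such as 15s or 1h30m into a time span. The function name is hypothetical, and the exact syntax accepted by the Netvisor OS CLI (for example, whether components may be omitted or reordered) is an assumption here.

```python
import re
from datetime import timedelta

# Assumed grammar: optional day, hour, minute, and second components,
# in that order, each written as an integer followed by its unit letter.
_DURATION_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a #d#h#m#s style duration string into a timedelta."""
    match = _DURATION_RE.match(text.strip())
    # Reject strings that do not match or that contain no component at all.
    if not match or not any(match.groups()):
        raise ValueError(f"not a #d#h#m#s duration: {text!r}")
    days, hours, minutes, seconds = (
        int(group) if group else 0 for group in match.groups()
    )
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```

For example, parse_duration("15s") yields a 15-second time span, which corresponds to the documented default for l3-to-vlag-delay.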

 

To display the cluster port restoration configuration, use the cluster-bringup-show command:

cluster-bringup-show

Displays the cluster bring up configuration information.

Configuring Layer 2 Multipathing for Virtual Chassis Link Aggregation (VLAG)

You can aggregate links between two switches by configuring Layer 2 multipathing and virtual chassis Link Aggregation.

A virtual chassis Link Aggregation Group (VLAG) allows links that are physically connected to two different switches to appear as a single Ethernet trunk to a third device. The third device can be a server, switch, or any other networking device. A VLAG can create Layer 2 multipathing, which provides redundancy by enabling multiple parallel paths between nodes.

A VLAG requires at least one cross connection between the two switches, also called peers, where the VLAG links terminate. The specific ports that connect the two switches do not require explicit configuration before you create a VLAG.

VLAGs can provide the following benefits:

Netvisor OS performs VLAG synchronization to coordinate active-standby and active-active configurations using the following rules:

Netvisor OS reports the state as up or down and synchronizes the state. For active-standby VLAGs, port up timestamps are exchanged to resolve any contest if both ports are up.

Netvisor OS performs synchronization from the primary node to the secondary node. If the secondary node requires synchronization, the secondary node sends a request to the primary node to perform the synchronization.

Synchronization messages are sent on a per-VLAG basis, and compare the local VLAG port state with the peer VLAG port state. The port state then determines any port enable or disable actions for active-standby VLAGs or port egress rule changes for active-active VLAGs.

VLAG synchronization occurs when a trigger happens on the configuration:

For any port in an active-standby VLAG, Netvisor OS records the time up of the port and sends it as part of the VLAG synchronization message. The time up values are compared on both nodes to determine the active port.
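The key property of this comparison is that both nodes reach the same answer from the same pair of timestamps. The sketch below illustrates that property only. The text above does not state which timestamp wins or how ties are broken, so the rules used here (the port that came up earlier stays active, and the primary node wins a tie) are assumptions, as is the function name.

```python
from typing import Optional


def select_active(local_up: Optional[float],
                  peer_up: Optional[float],
                  local_is_primary: bool) -> Optional[str]:
    """Return "local", "peer", or None for an active-standby VLAG.

    local_up and peer_up are port time up values, or None if the port
    is down. The winner and tie-break rules are assumptions made for
    this example, not documented Netvisor OS behavior.
    """
    if local_up is None and peer_up is None:
        return None            # neither port is up
    if peer_up is None:
        return "local"         # only the local port is up
    if local_up is None:
        return "peer"          # only the peer port is up
    if local_up != peer_up:
        # Assumed rule: the port that has been up longer stays active.
        return "local" if local_up < peer_up else "peer"
    # Assumed tie-break: the primary node's port is active.
    return "local" if local_is_primary else "peer"
```

Because each node evaluates the same rule with the roles of local and peer swapped, one node selecting "local" always corresponds to the other node selecting "peer", so exactly one port ends up active.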