Fast Failover for STP and Cluster
Previously, cluster STP operation did not support fast failover because NetVisor OS did not share STP state between the two nodes. As a result, when the master failed, and again when it came back online, the slave had to recompute the STP state from scratch. This caused two topology changes and traffic loss until STP converged.
NetVisor OS now supports fast failover by default. In cluster-STP mode, the node that has been up longer is elected as the master. Cluster syncs (keep-alives) detect whether the peer node is online and negotiate the initial cluster state. The nodes exchange uptime values in these syncs to determine which node has been up longer.
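The election rule above can be sketched in a few lines of Python. This is an illustrative model only, not NetVisor OS code: the `SyncInfo` type, the `elect_master` function, and the tie-break on node name are assumptions made for this example, since the text does not say what happens when both uptimes are equal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncInfo:
    """Hypothetical view of the data one node advertises in a cluster sync."""
    node_name: str
    uptime_seconds: int


def elect_master(local: SyncInfo, peer: Optional[SyncInfo]) -> SyncInfo:
    """Return the node that should act as STP master.

    The node that has been up longer wins. If no sync has been received
    from the peer (peer is None), the local node acts alone.
    """
    if peer is None:
        return local
    if local.uptime_seconds != peer.uptime_seconds:
        return max(local, peer, key=lambda n: n.uptime_seconds)
    # ASSUMPTION: the source does not describe a tie-break; the lower
    # node name is used here only to keep the sketch deterministic.
    return min(local, peer, key=lambda n: n.node_name)


leaf1 = SyncInfo("Leaf1", uptime_seconds=86_400)
leaf2 = SyncInfo("Leaf2", uptime_seconds=3_600)
print(elect_master(leaf1, leaf2).node_name)  # Leaf1 has been up longer
```

Because both nodes evaluate the same rule on the same exchanged values, they reach the same answer without further negotiation.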
The master runs the state machine for both nodes and sends the STP and port states to the slave. The slave, in turn, maintains the state as reported by the master. The slave generates its own BPDUs based on the synchronized state and forwards the BPDUs that it receives to the master. When the cluster goes offline, each node (whether master or slave) keeps using the same bridge ID and priority, uses consistent port IDs in BPDUs, and continues from the existing synchronized state. The STP state machine state is therefore never lost.
Together, internal state synchronization (using consistent bridge ID, priority, and port IDs whether the cluster is online or offline) and active-active vLAG handling ensure that an end node detects no topology change when the cluster nodes go offline or come back online.
When a cluster is created, the STP configuration of the two cluster nodes is checked and synchronized. The following guidelines hold regardless of whether the cluster is online or offline, and whether the peer node is online or offline:
- Both nodes use the same bridge ID and priority.
- When node1 sends a BPDU, the port ID inside the packet is in the range 1-256, except for active-active vLAGs.
- When node2 sends a BPDU, the port ID inside the packet is in the range 257-512, except for active-active vLAGs.
- When either node sends a BPDU on an active-active vLAG, the port ID inside the packet is node1's port number.
- Configuration changes (STP mode, MST instances, bridge ID, etc.) are mirrored on both nodes through cluster transactions.
Because of these guidelines, a BPDU sent on an active-active vLAG appears exactly the same to a third-party receiver regardless of whether that packet came from cluster node1 or node2.
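The port ID guidelines can be illustrated with a small Python sketch. It is a model under stated assumptions, not the NetVisor OS implementation: the function name `bpdu_port_id` and its parameters are hypothetical, and the fixed offset of 256 for node2 is inferred from the 1-256 and 257-512 ranges rather than stated in the text.

```python
from typing import Optional

# Each node owns a block of 256 port IDs (1-256 for node1, 257-512 for node2).
NODE_PORT_SPAN = 256


def bpdu_port_id(node: int, local_port: int,
                 vlag_node1_port: Optional[int] = None) -> int:
    """Return the port ID a cluster node would place in an outgoing BPDU.

    node            -- 1 or 2, the cluster node sending the BPDU
    local_port      -- port number on the sending node (1-256)
    vlag_node1_port -- node1's port number for the vLAG, if local_port
                       belongs to an active-active vLAG; otherwise None
    """
    if node not in (1, 2):
        raise ValueError("node must be 1 or 2")
    if not 1 <= local_port <= NODE_PORT_SPAN:
        raise ValueError("local_port must be in the range 1-256")
    if vlag_node1_port is not None:
        # Active-active vLAG: both nodes advertise node1's port number.
        return vlag_node1_port
    # ASSUMPTION: node2's ports are offset by exactly 256.
    return local_port + (node - 1) * NODE_PORT_SPAN


print(bpdu_port_id(1, 17))                      # 17
print(bpdu_port_id(2, 17))                      # 273
print(bpdu_port_id(2, 17, vlag_node1_port=20))  # 20
```

In the last call node2 advertises node1's port number, so a downstream switch sees the same port ID whichever cluster node sent the BPDU.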
NetVisor OS provides two show commands to view the details of this functionality: stp-state-show and stp-port-state-show.
For example:
CLI (network-admin@Leaf1) > stp-state-show
switch: Leaf-1
vlan: 1
ports: none
instance-id: 1
name: stg-default
bridge-id: 66:0e:94:d5:b0:cc
bridge-priority: 32769
root-id: 66:0e:94:35:c2:ce
root-priority: 32769
root-port: 128
hello-time: 2
forwarding-delay: 15
max-age: 20
disabled: none
learning: none
forwarding: none
discarding: none
edge: none
designated: none
alternate: none
backup: none
CLI (network-admin@Switch2) > stp-port-state-show port 17
switch: Switch2
vlan: 1
port: 17
stp-state: Forwarding
role: Designated
selected-role: Designated
state: agreed,learn,learning,forward,forwarding,selected,send-rstp,synced,online,requested-online
designated-priority: 32769-66:0e:94:38:39:80,100,32769-66:0e:94:b7:65:91,32785
port-priority: 32769-66:0e:94:38:39:80,100,32769-66:0e:94:b7:65:91,32785
message-priority: 0-00:00:00:00:00:00,0,0-00:00:00:00:00:00,0
info-is: mine
hello-timer: 2
root-guard-timer: 0
sm-table-bits: 0xfaedee
sm-table: prx=discard*,bdm=not-edge*,ptx=idle*,pim=current*,prt-disabled=disable*,prt-root=root*,prt-desg=designated*,prt-alt-bk=block*,pst=forwarding*,tcm=active*