Configuring Maintenance Mode
Netvisor ONE version 6.1.0 introduces Maintenance Mode, in which the software prevents user traffic from entering or leaving a switch and gracefully steers the traffic to network peers. This mode is useful during hardware or software maintenance of the switch, including Return Merchandise Authorization (RMA), correction of out-of-sync fabric transactions, cluster re-peer, switch power down or reboot, and software upgrade. This functionality is supported on cluster nodes, non-cluster spine nodes, and standalone switches. Before you enable maintenance mode on a standalone switch, ensure that an alternate path exists for traffic in case the switch goes down.
Note: You can enable maintenance mode only when both control-network and fabric-network are configured as mgmt. While the switch is in maintenance mode, you cannot change the fabric network, the control network, or the fabric VLAN.
A switch does not leave the fabric when it enters maintenance mode. When you issue the command to enable maintenance mode, Netvisor ONE follows the normal port bring-down sequence and disables all user ports. Any cluster bring-down configuration also applies. For a cluster switch, the software performs the following actions, in order, when entering maintenance mode:
- Bring down orphan ports
- Bring down vLAG ports
- Disable the VRRP service and perform BGP graceful shutdown
- Bring down Layer 3 ports
- Bring down ports with defer-bringdown configured
- Bring down cluster ports
While in maintenance mode, you can execute all CLI and REST API commands except port enable commands. The software defers all port enable actions and enables the ports only after the switch exits maintenance mode. If you reboot or power cycle the switch while it is in maintenance mode, the switch comes back up in maintenance mode and does not enable any user ports.
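The deferral behavior described above can be illustrated with a small state model. This Python sketch is hypothetical: `MaintenanceModel` and its methods are illustrative names only, not part of Netvisor ONE or its REST API. For simplicity it assumes that ports which were up before entering maintenance mode are restored on exit, and it does not model reboots or the staged bring-up order.

```python
class MaintenanceModel:
    """Hypothetical model of deferred port enables; not a Netvisor ONE API."""

    def __init__(self, ports: list[str]) -> None:
        self.maintenance = False
        self.enabled: set[str] = set(ports)  # ports currently forwarding
        self.deferred: set[str] = set()      # ports to enable on exit

    def enter_maintenance(self) -> None:
        self.maintenance = True
        # Assumption: ports that were up are remembered and restored on exit.
        self.deferred |= self.enabled
        self.enabled.clear()

    def port_enable(self, port: str) -> None:
        if self.maintenance:
            # The request is accepted but deferred until maintenance mode ends.
            self.deferred.add(port)
        else:
            self.enabled.add(port)

    def exit_maintenance(self) -> None:
        self.maintenance = False
        self.enabled |= self.deferred
        self.deferred.clear()


if __name__ == "__main__":
    switch = MaintenanceModel(["1", "2"])
    switch.enter_maintenance()
    switch.port_enable("3")
    print(sorted(switch.enabled))  # []
    switch.exit_maintenance()
    print(sorted(switch.enabled))  # ['1', '2', '3']
```

The key design point is that a port enable request issued during maintenance is not rejected; it is recorded and acted on only when the switch leaves maintenance mode.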
A cluster member switch performs the following actions, in order, when leaving maintenance mode:
- Bring up cluster ports
- Enable VRRP and BGP services
- Bring up Layer 3 ports
- Bring up vLAG ports
- Bring up orphan ports or ports with defer-bringup configured
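To make the entry and exit orders easy to compare, the following Python sketch encodes both documented sequences for a cluster member switch as lists and checks a few of their properties. The stage labels and the `position` helper are hypothetical shorthand for the list items above, not Netvisor ONE identifiers.

```python
# Hypothetical shorthand for the documented stages of a cluster member switch.
ENTER_MAINTENANCE = [
    "orphan ports down",
    "vLAG ports down",
    "VRRP disabled, BGP graceful shutdown",
    "Layer 3 ports down",
    "defer-bringdown ports down",
    "cluster ports down",
]

EXIT_MAINTENANCE = [
    "cluster ports up",
    "VRRP and BGP enabled",
    "Layer 3 ports up",
    "vLAG ports up",
    "orphan and defer-bringup ports up",
]


def position(sequence: list[str], keyword: str) -> int:
    """Return the index of the first stage whose label contains keyword."""
    for index, stage in enumerate(sequence):
        if keyword in stage:
            return index
    raise ValueError(f"no stage mentions {keyword!r}")


if __name__ == "__main__":
    # Cluster ports are the last to go down and the first to come up.
    print(position(ENTER_MAINTENANCE, "cluster") == len(ENTER_MAINTENANCE) - 1)
    print(position(EXIT_MAINTENANCE, "cluster") == 0)
    # In both directions, VRRP/BGP is handled before the Layer 3 ports.
    print(position(ENTER_MAINTENANCE, "VRRP") < position(ENTER_MAINTENANCE, "Layer 3"))
    print(position(EXIT_MAINTENANCE, "VRRP") < position(EXIT_MAINTENANCE, "Layer 3"))
```

Note that the exit order is not a strict reversal of the entry order: VRRP and BGP are handled before the Layer 3 ports in both directions, and orphan ports come up in the same final stage as ports with defer-bringup configured.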
The switch resumes data forwarding upon leaving maintenance mode. All preconfigured cluster bring-up settings apply during the exit, so Netvisor ONE enforces staggered or delayed port bring-up according to the existing configuration. For more information, see the 'Restoring Ports for Cluster Configurations' section of the 'Configuring High Availability' chapter.
Use the command system-state-modify to enable or disable maintenance mode. For example, to enter maintenance mode, use the command:
CLI (network-admin@Leaf1) > system-state-modify maintenance-enable
Warning: This configuration can have traffic impact. If required, collect system snapshot via save-diags prior to this command
Please confirm y/n (Default: n):y
CLI (network-admin@Leaf1) >
Note: Pluribus recommends running the save-diags command before entering maintenance mode to record the state of the switch.
To view the status of maintenance mode, use the command:
CLI (network-admin@Leaf1) > system-state-show
system-state: Maintenance mode, Ports disabled
To leave maintenance mode, use the command:
CLI (network-admin@Leaf1) > system-state-modify maintenance-disable
Note: You must ensure that transactions (fabric and cluster TIDs) are in sync with the rest of the fabric before executing the command to disable maintenance mode.
Use the system-state-show command to view the status:
CLI (network-admin@Leaf1) > system-state-show
system-state: Operational, Ports enabled
The system-state-show command also displays the progress of port bring-up or bring-down. For example, while a switch is leaving maintenance mode, the command can report states such as:
CLI (network-admin@Leaf1) > system-state-show
system-state: coming up, l3 to vlag wait
system-state: coming up, vlag ports being enabled
system-state: coming up, defer bringup ports wait
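If you script maintenance windows, it can help to parse the system-state-show output. The following Python sketch assumes the single-line `system-state: <state>, <detail>` format seen in the samples in this section; other releases may format the output differently, and how you capture the command output is left out. `SystemState`, `parse_system_state`, and `in_maintenance` are hypothetical helper names.

```python
from typing import NamedTuple


class SystemState(NamedTuple):
    state: str   # e.g. "Maintenance mode", "Operational", "coming up"
    detail: str  # e.g. "Ports disabled", "vlag ports being enabled"


def parse_system_state(line: str) -> SystemState:
    """Parse one 'system-state: <state>, <detail>' line (format assumed from samples)."""
    prefix = "system-state:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected line: {line!r}")
    # Split on the first comma only: state before it, detail after it.
    state, _, detail = line[len(prefix):].partition(",")
    return SystemState(state.strip(), detail.strip())


def in_maintenance(line: str) -> bool:
    """True if the parsed state reports maintenance mode."""
    return parse_system_state(line).state == "Maintenance mode"


if __name__ == "__main__":
    print(parse_system_state("system-state: coming up, l3 to vlag wait"))
    print(in_maintenance("system-state: Maintenance mode, Ports disabled"))
```

A wrapper script could, for example, poll until the parsed state is "Operational" before moving on to the next switch.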
When a switch enters or leaves maintenance mode, Netvisor ONE logs an event, as shown in the log-event-show output:
CLI (network-admin@Leaf1) > log-event-show
event maintenance_enabled(11529) : level=note event-type=system : : System is in Maintenance mode
event maintenance_disabled(11530) : level=note event-type=system : : System is in Operational mode
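Similarly, maintenance mode transitions can be picked out of captured log-event-show output. This Python sketch infers the field layout from the two sample lines above only, so other event types may not match; the regular expression and the helper names `parse_event` and `maintenance_transitions` are assumptions for illustration.

```python
import re
from typing import Optional

# Layout inferred from the sample lines:
# event <name>(<code>) : level=<level> event-type=<type> : : <message>
EVENT_RE = re.compile(
    r"^event\s+(?P<name>\w+)\((?P<code>\d+)\)\s*:\s*"
    r"level=(?P<level>\S+)\s+event-type=(?P<event_type>\S+)\s*:\s*:\s*"
    r"(?P<message>.*)$"
)

MAINTENANCE_EVENTS = {"maintenance_enabled", "maintenance_disabled"}


def parse_event(line: str) -> Optional[dict[str, str]]:
    """Return the named fields of one event line, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    return match.groupdict() if match else None


def maintenance_transitions(lines: list[str]) -> list[tuple[str, int]]:
    """Return (event name, event code) for each maintenance mode transition."""
    transitions = []
    for line in lines:
        event = parse_event(line)
        if event and event["name"] in MAINTENANCE_EVENTS:
            transitions.append((event["name"], int(event["code"])))
    return transitions
```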
To place a switch in maintenance mode as soon as the cluster re-peer process completes, add the maintenance-enable parameter to the fabric-join command. For example:
CLI (network-admin@Leaf1) > fabric-join repeer-to-cluster-node Leaf1 maintenance-enable
This is useful when you replace a cluster node and want the switch to remain in maintenance mode after the re-peer process.