Implementing a Fabric Upgrade
A switch that is part of a fabric can be upgraded locally with the software-upgrade process, or you can start a fabric-wide upgrade of all nodes in the fabric.
During a fabric-wide upgrade, the switch on which the fabric-upgrade-start command is issued acts as the controller node. You must copy the upgrade package to the /sftp/import/ directory of the controller node.
Netvisor ONE copies the upgrade package to the other nodes in the fabric as part of the fabric-wide upgrade. The controller node monitors the progress of the upgrade on each node, and you can view the status of the upgrade with the fabric-upgrade-status-show command. The controller node is identified by an asterisk (*) after the switch name in the status output.
Netvisor ONE enables you to implement a fabric-wide upgrade and reboot the switches at the same time or in a sequential order.
Upgrading the Fabric
Complete the following tasks to upgrade all switches in the fabric:
Upgrade Commands
The following commands control the software or fabric upgrade process:
- fabric-upgrade-start – begin the upgrade process, specifying the package name
- fabric-upgrade-status-show – monitor the progress of the upgrade for each node in the fabric
- fabric-upgrade-finish – if the auto-finish option was not used, begin the reboot process according to the options specified when the upgrade was started
- fabric-upgrade-abort – abort the entire upgrade process and return the switches to their prior state
The fabric-upgrade-start command defines all the future behavior of the upgrade process, meaning any optional settings need to be defined with the start command. In addition, the fabric-upgrade-start command acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.
Before you start the fabric-wide upgrade
- Copy the image to the /sftp/import/ directory of the controller node.
- Ensure that there is reliable in-band or out-of-band connectivity (or both) between fabric members. This connectivity is used to distribute the software for the upgrade and to monitor the progress of the upgrade process. The software is distributed to the nodes of the fabric in parallel, that is, each node receives the software at approximately the same time. An independent communications link is established over the fabric communications path to distribute the software to each node in the fabric.
- Console access to the switches is recommended.
- Switches do not accept any configuration commands once the upgrade starts, so plan accordingly.
Copying Image to the Switch
To copy the image:
- First, enable the Secure File Transfer Protocol (SFTP) service with the following CLI command and create the /sftp/import directory:
CLI (network-admin@switch1) > admin-sftp-modify enable
sftp password:
confirm sftp password:
CLI (network-admin@switch1) >
OR
Enable shell access so that you can copy the file to the directory, by using the command:
CLI (admin@netvisor) > role-modify name network-admin shell
Then access the shell:
CLI (admin@netvisor) > shell
network-admin@netvisor:~$ cd /sftp/import
network-admin@netvisor:/sftp/import$
- Copy the image to the /sftp/import directory:
root@server-os-9:~/# sftp sftp@switch1
The authenticity of host 'switch1 (10.0.0.02)' can't be established.
RSA key fingerprint is SHA256:SI8VQZgJCppbrF4sRcby36Fx7rz3Hh5EJllPPyScLZU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'switch1, 10.0.0.02' (RSA) to the list of known hosts.
* Welcome to Pluribus Networks Inc. Netvisor(R). This is a monitored system. *
* ACCESS RESTRICTED TO AUTHORIZED USERS ONLY *
* By using the Netvisor(R) CLI,you agree to the terms of the Pluribus Networks *
* End User License Agreement (EULA). The EULA can be accessed via *
* http://www.pluribusnetworks.com/eula or by using the command "eula-show" *
Password:
Connected to switch1
sftp> cd import
sftp> put nvOS-5.2.1-5020115690-onvl.pkg
Uploading nvOS-5.2.1-5020115690-onvl.pkg
nvOS-5.2.1-5020115690-onvl.pkg
nvOS-5.2.1-5020115690-onvl.pkg 100% 1870MB 7.5MB/s 04:00
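The interactive session above can also be scripted. The following Python sketch only builds the two commands used in that session (cd import, then put) so that they could be written to a batch file for OpenSSH's sftp -b option. The function name sftp_batch_commands is hypothetical and is not part of Netvisor ONE; it is a minimal illustration, assuming the package file is in the current local directory.

```python
def sftp_batch_commands(package: str) -> list[str]:
    """Return the sftp commands that mirror the interactive session:
    change to the import directory, then upload the package.

    Hypothetical helper for illustration; not a Netvisor ONE API.
    """
    if not package:
        raise ValueError("package name must not be empty")
    return ["cd import", f"put {package}"]


# Example: print the commands for the package used in this section.
for command in sftp_batch_commands("nvOS-5.2.1-5020115690-onvl.pkg"):
    print(command)
```

Writing these lines to a file and passing it to sftp with -b runs them non-interactively, provided that authentication does not require a typed password.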
Fabric upgrade with manual-reboot option
This option completes in three phases:
- Copy the upgrade package to the switches in the fabric and start the upgrade with the fabric-upgrade-start command.
- Finish or abort the fabric upgrade with the fabric-upgrade-finish or fabric-upgrade-abort command.
- Manually reboot the switches with the switch-reboot command.
Starting the Fabric Upgrade
Before starting the upgrade process, ensure that all the nodes of the fabric are online. Use the fabric-node-show command and check that the state is online for every node.
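The pre-check above can be sketched in Python as follows. The sketch assumes that you have already collected each node's state from fabric-node-show into a dictionary by some means; parsing the command output is deliberately not shown because its exact layout is not described here. The function name offline_nodes and the example state "offline" are hypothetical.

```python
def offline_nodes(node_states: dict[str, str]) -> list[str]:
    """Return the names of nodes whose state is not 'online', sorted.

    node_states maps a switch name to the state reported for it.
    Hypothetical helper for illustration; not a Netvisor ONE API.
    """
    return sorted(
        name
        for name, state in node_states.items()
        if state.strip().lower() != "online"
    )


# Example with made-up data: only start the upgrade when the list is empty.
states = {"switch1": "online", "switch2": "offline", "switch3": "online"}
blocked = offline_nodes(states)
if blocked:
    print("Do not start the upgrade; nodes not online:", ", ".join(blocked))
```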
Use the following command to copy the upgrade package from the controller switch to all other switches in the fabric and start the upgrade process. When the upgrade phase is complete, run the fabric-upgrade-finish command to finalize the upgrade. Because manual-reboot is specified, you then reboot the switches yourself:
CLI (network-admin@switch) > fabric-upgrade-start packages <image> manual-reboot
The fabric-upgrade-start command defines the behavior of the entire upgrade process, so any optional settings must be specified with the start command (see the optional settings below). In addition, the fabric-upgrade-start command acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.
The optional parameters for the fabric-upgrade-start command include:
- auto-finish – automatically reboot the entire fabric after the upgrade is complete. The default is no-auto-finish.
- abort-on-failure – stop the upgrade if a failure occurs during the process.
- manual-reboot – reboot individual switches manually after the upgrade process. If you specify no-manual-reboot, all switches reboot automatically after the upgrade is complete.
- prepare – perform setup steps before performing the upgrade. This step copies the offline software package, extracts it, and prepares for the final upgrade process. Once you begin the prepare process, you cannot add new switches to the fabric.
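Because every option must be given with the start command, it can help to assemble the command line in one place before issuing it. The following Python sketch builds the command string from the options listed above. The function name build_start_command is hypothetical, and the order in which the options are appended is an assumption rather than a documented requirement.

```python
def build_start_command(
    package: str,
    *,
    auto_finish: bool = False,
    abort_on_failure: bool = False,
    manual_reboot: bool = False,
    prepare: bool = False,
) -> str:
    """Assemble a fabric-upgrade-start command line.

    Only the keywords described in this section are used. The option
    order is an assumption made for this sketch.
    """
    parts = ["fabric-upgrade-start", "packages", package]
    if auto_finish:
        parts.append("auto-finish")
    if abort_on_failure:
        parts.append("abort-on-failure")
    if manual_reboot:
        parts.append("manual-reboot")
    if prepare:
        parts.append("prepare")
    return " ".join(parts)


# Example: the manual-reboot variant used in this section.
print(build_start_command("nvOS-5.2.1-5020115690-onvl.pkg", manual_reboot=True))
```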
A sample upgrade process is explained below:
Start the upgrade process by using the command:
CLI (network-admin@switch1) > fabric-upgrade-start packages nvOS-5.2.1-5020115690-onvl.pkg auto-finish
Warning: This will start software upgrade on your entire fabric.
Please confirm y/n (Default: n):y
Scheduled background update.
Monitoring the Upgrade Process
The controller node monitors the progress of the upgrade on each node, and you can view the status with the fabric-upgrade-status-show command. The upgrade process has many interim steps. To monitor it continuously, use the show-interval option (in seconds) with the fabric-upgrade-status-show command.
Use the following commands to monitor and complete the upgrade:
- To monitor the progress of the upgrade for each node in the fabric:
CLI (network-admin@switch1) > fabric-upgrade-status-show
For example, with a refresh interval of 5 seconds:
CLI (network-admin@tucana-colo-6) > fabric-upgrade-status-show show-interval 5
log switch state cluster
---------------------------------- --------------- ------------------ ----------------------
(0:00:36)Agent needs restart eq-colo-7 Agent restart wait aqr07-08(sec)
(0:00:34)Agent needs restart tucana-colo-7 Agent restart wait spine-cl(sec)
(0:03:57)Extracting signed bundle. aquarius-test-1 Running aquarius-test-1-2(sec)
(0:00:45)Agent needs restart dorado-test-3 Agent restart wait dorado-test-2-3(sec)
(0:03:57)Extracting signed bundle. aqr08 Running aqr07-08(pri)
(0:00:28)Agent needs restart tucana-colo-6* Agent restart wait spine-cl(pri)
(0:03:57)Extracting signed bundle. aquarius-test-2 Running aquarius-test-1-2(pri)
(0:00:38)Agent needs restart dorado-test-2 Agent restart wait dorado-test-2-3(pri)
(0:01:00)Agent needs restart scorpius10 Agent restart wait none
(0:00:47)Agent needs restart vnv-mini-1 Agent restart wait none
log switch state cluster
---------------------------------- --------------- ------------------ ----------------------
(0:00:36)Agent needs restart eq-colo-7 Agent restart wait aqr07-08(sec)
(0:00:34)Agent needs restart tucana-colo-7 Agent restart wait spine-cl(sec)
(0:04:02)Extracting packages. aquarius-test-1 Running aquarius-test-1-2(sec)
(0:00:45)Agent needs restart dorado-test-3 Agent restart wait dorado-test-2-3(sec)
(0:04:02)Extracting signed bundle. aqr08 Running aqr07-08(pri)
(0:00:28)Agent needs restart tucana-colo-6* Agent restart wait spine-cl(pri)
(0:04:02)Extracting packages. aquarius-test-2 Running aquarius-test-1-2(pri)
(0:00:38)Agent needs restart dorado-test-2 Agent restart wait dorado-test-2-3(pri)
(0:01:00)Agent needs restart scorpius10 Agent restart wait none
(0:00:47)Agent needs restart vnv-mini-1 Agent restart wait none
.
.
log switch state cluster
------------------------------------------------------------ --------------- ---------------- ----------------------
(0:01:53)Waiting for completion processing eq-colo-7 Upgrade complete aqr07-08(sec)
(0:01:25)Waiting for completion processing tucana-colo-7 Upgrade complete spine-cl(sec)
(0:06:24)Waiting for completion processing aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)
(0:02:29)Waiting for completion processing dorado-test-3 Upgrade complete dorado-test-2-3(sec)
(0:06:43)Waiting for completion processing aqr08 Upgrade complete aqr07-08(pri)
(0:01:23)Waiting to reboot tucana-colo-6* Upgrade complete spine-cl(pri)
(0:06:16)Waiting for completion processing aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)
(0:02:19)Waiting for completion processing dorado-test-2 Upgrade complete dorado-test-2-3(pri)
(0:06:09)Waiting for completion processing scorpius10 Upgrade complete none
(0:08:09)Upgrading nvOS 5.1.2-5010215446 -> 5.2.1-5020115690 vnv-mini-1 Running none
.
.
log switch state cluster
-------------------------------------------------- --------------- ---------------- ----------------------
(0:01:53)Current/Reboot BE: netvisor-16 eq-colo-7 Upgrade complete aqr07-08(sec)
(0:01:25)Waiting for completion processing tucana-colo-7 Upgrade complete spine-cl(sec)
(0:06:24)Waiting for completion processing aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)
(0:02:29)Destroy BE: netvisor-45 dorado-test-3 Upgrade complete dorado-test-2-3(sec)
(0:06:43)Waiting for completion processing aqr08 Upgrade complete aqr07-08(pri)
(0:01:23)Waiting to reboot tucana-colo-6* Upgrade complete spine-cl(pri)
(0:06:16)Current/Reboot BE: netvisor-10 aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)
(0:02:19)Software upgrade done. Waiting for reboot dorado-test-2 Upgrade complete dorado-test-2-3(pri)
(0:06:09)Waiting for completion processing scorpius10 Upgrade complete none
(0:13:17)Waiting for completion processing vnv-mini-1 Upgrade complete none
log switch state cluster
----------------------------------------------- --------------- ---------------- ----------------------
(0:01:53)Upgrade complete eq-colo-7 Reboot wait aqr07-08(sec)
(0:01:25)Upgrade complete tucana-colo-7 Reboot wait spine-cl(sec)
(0:06:24)Upgrade complete aquarius-test-1 Reboot wait aquarius-test-1-2(sec)
(0:02:29)Upgrade complete dorado-test-3 Reboot wait dorado-test-2-3(sec)
(0:06:43)Upgrade complete aqr08 Reboot wait aqr07-08(pri)
(0:01:23)Sending Reboot wait message to handler tucana-colo-6* Reboot wait spine-cl(pri)
(0:06:16)Upgrade complete aquarius-test-2 Reboot wait aquarius-test-1-2(pri)
(0:02:19)Upgrade complete dorado-test-2 Reboot wait dorado-test-2-3(pri)
(0:06:09)Upgrade complete scorpius10 Reboot wait none
(0:13:17)Waiting for completion processing vnv-mini-1 Upgrade complete none
Connection to tucana-colo-6 closed by remote host.
Connection to tucana-colo-6 closed.
The first entry in the log is the elapsed time of the upgrade process. It does not include waiting time. The switch with the asterisk (*) is the upgrade controller node where the fabric-upgrade-start command was issued.
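The structure of a status row (elapsed time, log message, switch name with an optional asterisk, state, and cluster) can be illustrated with a small parser. The Python sketch below assumes that the columns in the real CLI output are separated by two or more spaces; that separator, and the names StatusRow and parse_status_row, are assumptions made for this illustration and are not part of Netvisor ONE.

```python
import re
from dataclasses import dataclass

# Matches the "(H:MM:SS)" elapsed-time prefix of the log column.
LOG_PATTERN = re.compile(r"^\((\d+):(\d{2}):(\d{2})\)\s*(.*)$")


@dataclass
class StatusRow:
    elapsed_seconds: int
    message: str
    switch: str
    is_controller: bool
    state: str
    cluster: str


def parse_status_row(line: str) -> StatusRow:
    """Parse one status row, assuming columns are split by 2+ spaces."""
    columns = re.split(r"\s{2,}", line.strip())
    if len(columns) != 4:
        raise ValueError(f"expected 4 columns, got {len(columns)}")
    log, switch, state, cluster = columns
    match = LOG_PATTERN.match(log)
    if match is None:
        raise ValueError(f"log column has no elapsed time: {log!r}")
    hours, minutes, seconds, message = match.groups()
    elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return StatusRow(
        elapsed_seconds=elapsed,
        message=message,
        switch=switch.rstrip("*"),          # name without the marker
        is_controller=switch.endswith("*"),  # asterisk marks the controller
        state=state,
        cluster=cluster,
    )


row = parse_status_row(
    "(0:00:28)Agent needs restart  tucana-colo-6*  Agent restart wait  spine-cl(pri)"
)
print(row.switch, row.is_controller, row.elapsed_seconds)
```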
The following table describes the messages that the fabric-upgrade-status-show command displays during a fabric-wide upgrade, based on the current progress:
Message | Description
Downloading package bundle | The upgrade package is downloaded from the initial node to all the other nodes.
Extracting initial bundle | Once successfully downloaded, the offline bundle is extracted.
Extracting signed bundle | The signature of the package is verified.
Extracting packages | The packages are extracted and readied for installation.
Agent needs restart | The nodes wait for the package to be extracted on all nodes of the fabric.
Upgrading nvOS * | The switch upgrades Netvisor from the older version to the newer one.
Waiting for fabric-upgrade-finish/abort | Once the upgrade phase completes, the switches wait for you to run either the fabric-upgrade-finish or the fabric-upgrade-abort command.
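When scripting around the status output, the messages in the table can be matched by prefix, since the output may append a trailing period or version numbers (for example, "Upgrading nvOS 5.1.2-5010215446 -> 5.2.1-5020115690"). The Python sketch below ranks a log message by its position in the table. Treating the table order as the order of progress is an assumption of this sketch, and the name message_rank is hypothetical. Messages that are not in the table return None.

```python
# Message prefixes in the order they appear in the table above.
KNOWN_MESSAGES = (
    "Downloading package bundle",
    "Extracting initial bundle",
    "Extracting signed bundle",
    "Extracting packages",
    "Agent needs restart",
    "Upgrading nvOS",
    "Waiting for fabric-upgrade-finish/abort",
)


def message_rank(log_message: str):
    """Return the table position of a log message, or None if unknown.

    Prefix matching tolerates trailing punctuation and version strings.
    Hypothetical helper for illustration; not a Netvisor ONE API.
    """
    text = log_message.strip()
    for rank, prefix in enumerate(KNOWN_MESSAGES):
        if text.startswith(prefix):
            return rank
    return None


print(message_rank("Extracting signed bundle."))
print(message_rank("Waiting to reboot"))
```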
- Once the fabric upgrade process has copied the upgrade package to all switches and the upgrade is complete, run the fabric-upgrade-finish or fabric-upgrade-abort command to finish or abort the upgrade:
CLI (network-admin@switch1) > fabric-upgrade-finish
You can issue this command at any time during the fabric upgrade to reboot all nodes once the upgrade is complete. When the upgrade phase is complete, all switches display the Upgrade complete message in the log field, and you can then reboot the fabric. The following is an example:
CLI (network-admin@switch1) > fabric-upgrade-finish
log switch state cluster
------------------------------------------------- ------ --------------- ------------
(0:13:00)Waiting for fabric-upgrade-finish/abort sw2 Upgrade complete spine(sec)
(0:12:04)Waiting for fabric-upgrade-finish/abort sw1* Upgrade complete spine(pri)
(0:16:49)Waiting for fabric-upgrade-finish/abort sw1 Upgrade complete none
(0:15:27)Waiting for fabric-upgrade-finish/abort sw2 Upgrade complete none
Finalizing upgrade. Manual reboot of nodes required.
- Manual reboot: each switch in the fabric needs to be manually rebooted after the upgrade is completed. The fabric-upgrade-status-show command displays the status as Switch waiting to reboot. For example:
CLI (network-admin@switch1) > fabric-upgrade-status-show
fabric-upgrade-status-show: Switch waiting to reboot
At this point, the upgrade is complete on all switches. Reboot the switches one at a time by using the following command:
CLI (network-admin@switch1) > switch-reboot
Note: Reboot the controller switch last, after all other switches have been rebooted.
Note: All the nodes of the fabric should be running the same software version for the Netvisor ONE features to work correctly.
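The reboot-order rule in the notes above (one switch at a time, controller last) can be sketched as a small ordering helper. The function name reboot_order is hypothetical, and the sketch only computes the order; it does not issue switch-reboot or wait for a switch to come back online.

```python
def reboot_order(switches: list[str], controller: str) -> list[str]:
    """Return the switches in reboot order, with the controller last.

    The relative order of the other switches is preserved.
    Hypothetical helper for illustration; not a Netvisor ONE API.
    """
    if controller not in switches:
        raise ValueError(f"controller {controller!r} is not in the switch list")
    others = [name for name in switches if name != controller]
    return others + [controller]


# Example with made-up switch names; switch1 is the controller.
for name in reboot_order(["switch1", "switch2", "switch3"], "switch1"):
    print("reboot", name)
```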
- If any issue occurs during the installation, you can roll back the upgrade process with the fabric-upgrade-abort command. To abort the upgrade process and return the switches to their prior state (no reboot is needed):
CLI (network-admin@switch1) > fabric-upgrade-abort
This command aborts the fabric upgrade process. All changes to the switches are cleaned up and the server-switches do not reboot. The configuration lock on the fabric is also released. If you issue the fabric-upgrade-abort command during the upgrade process, it may take some time before the process stops, because the upgrade has to reach a logical completion point before the changes are rolled back on the fabric. This allows the changes to be cleaned up properly.
Warning: DO NOT use the switch-reboot command to reboot a switch while the upgrade is in progress.
Note: The fabric configuration is locked throughout the fabric upgrade process, and you cannot change any configuration until the process ends.
Related Commands
Other commands related to the fabric upgrade include:
- fabric-upgrade-prepare-cancel – cancels a fabric upgrade that was prepared earlier.
- fabric-upgrade-prepare-resume – resumes a fabric upgrade that was prepared earlier.
- fabric-upgrade-prepare-show – displays the status of prepared upgrades on the fabric nodes.
Review bootenv
A new boot environment is built during the upgrade process. Upon reboot, this new boot environment becomes active and the new software is up and running on the switch. Generally, you do not need to interact with the boot environments during the upgrade process. If a failure occurs during the upgrade process, it may be necessary to review the boot environments by using the bootenv-show command.