Implementing a Fabric Upgrade
A switch that is part of a fabric can be upgraded locally using the software-upgrade process, or you can start a fabric-wide upgrade of all nodes in the fabric.
While performing a fabric-wide upgrade, the switch on which the fabric-upgrade-start command is issued acts as the controller node. You must copy the upgrade package to the /sftp/import/ directory of the controller node.
Netvisor ONE copies the upgrade package to other nodes in the fabric as part of fabric-wide upgrade. The controller node monitors the progress of the upgrade on each node and you can view the status of the upgrade using the fabric-upgrade-status-show command. The controller node is identified by an “*” after the switch name in the status output.
Netvisor ONE enables you to implement a fabric-wide upgrade and reboot the switches at the same time or in a sequential order.
Upgrading the Fabric
Follow the tasks explained here to upgrade all switches in the fabric:
Upgrade Commands
Following are the commands that control the fabric upgrade process:
- fabric-upgrade-start – begin the upgrade process on entire fabric by specifying the package name
- fabric-upgrade-status-show – monitor the progress of the upgrade for each node in the fabric
- fabric-upgrade-finish – finalize when upgrade is complete
- fabric-upgrade-abort – abort the entire upgrade process and return switches to their prior state
The fabric-upgrade-start command defines all the future behavior of the upgrade process, that is, any optional settings need to be defined with the start command. In addition, the fabric-upgrade-start command acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.
The fabric-upgrade-start command includes the following options:
CLI (network-admin@switch) > fabric-upgrade-start
fabric-upgrade-start | Starts the software upgrade or prepare process on the entire fabric.
packages sftp-files name | Comma-separated list of software bundles.
Specify between 0 and 7 of the following options:
auto-finish|no-auto-finish | Automatically starts the software upgrade on the entire fabric. The default option is no-auto-finish.
abort-on-failure|no-abort-on-failure | Whether to abort the fabric upgrade if a node fails. The default option is no-abort-on-failure.
manual-reboot|no-manual-reboot | Whether to defer the reboot to the user after the upgrade.
download-count 1..5 | Number of concurrent downloads. The default value is 5 (maximum). This option was introduced in version 6.1.0.
prepare|no-prepare | Perform setup steps for the actual upgrade.
upload-server upload-server-string | Upload the config file to a server via SCP.
server-password | SCP host password.
During a fabric upgrade, all members of the fabric download the upgrade bundle from the controller node. By default, a maximum of five switches in the fabric can download the upgrade bundle from the controller at a given time.
However, this can cause problems if bandwidth is constrained, and it can overwhelm a controller node that has a lower hardware specification. To address this issue, starting with Netvisor ONE version 6.1.0, you can use the download-count parameter of the fabric-upgrade-start command to reduce the number of concurrent downloads to suit your network conditions and the hardware capabilities of the controller node.
For example, to set the download count to 2, use the command:
CLI (network-admin@switch) > fabric-upgrade-start packages nvOS-6.1.0-6010017911-onvl.pkg download-count 2
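Because every optional setting has to be supplied with the start command, it can help to assemble and check the command line before issuing it. The following Python sketch is illustrative only: the helper name build_fabric_upgrade_start and its keyword arguments are hypothetical, and it covers only the packages, auto-finish, abort-on-failure, manual-reboot, and download-count options described in this section.

```python
def build_fabric_upgrade_start(package, download_count=None, auto_finish=False,
                               abort_on_failure=False, manual_reboot=False):
    """Assemble a fabric-upgrade-start command line (hypothetical helper)."""
    if not package:
        raise ValueError("a package name is required")
    parts = ["fabric-upgrade-start", "packages", package]
    if auto_finish:
        parts.append("auto-finish")
    if abort_on_failure:
        parts.append("abort-on-failure")
    if manual_reboot:
        parts.append("manual-reboot")
    if download_count is not None:
        # The documented range for download-count is 1..5 (default 5).
        if not 1 <= download_count <= 5:
            raise ValueError("download-count must be between 1 and 5")
        parts += ["download-count", str(download_count)]
    return " ".join(parts)


# Build the command text; it still has to be entered at the CLI by an operator.
print(build_fabric_upgrade_start("nvOS-6.1.0-6010018118-onvl.pkg", download_count=2))
```

The helper only produces text. It does not contact a switch, so the range check on download-count is the one piece of validation it performs.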
Before you start the fabric-wide upgrade
- Copy the image to the /sftp/import/ directory of the controller node.
- Ensure there is reliable in-band or out-of-band connectivity (or both) between fabric members, which helps to distribute the software for the upgrade and monitor the progress of the upgrade process. The distribution of software to the nodes of the fabric is done in parallel, that is, each node receives the software at approximately the same time. An independent communications link is established over the fabric communications path to distribute the software to each node in the fabric.
- Console access to the switches is recommended.
- Switches do not accept any configuration commands once the upgrade starts, so plan accordingly.
Copying Image to the Switch
To copy the image:
- First, enable the Secure File Transfer Protocol (SFTP) service on all switches and create an /sftp/import directory by using the following command:
CLI (network-admin@switch)>switch* admin-sftp-modify enable
sftp password:
confirm sftp password:
CLI (network-admin@switch)>
OR
Enable shell access on all the switches to copy the file to the folder by using the command:
CLI(admin@netvisor) > switch* role-modify name network-admin shell
And access the shell:
CLI(admin@netvisor) > shell
network-admin@netvisor:~$ cd /sftp/import
network-admin@netvisor:/sftp/import$
- Copy the image to /sftp/import directory
root@server-os-9:~/# sftp sftp@switch
The authenticity of host 'switch (10.0.0.02)' can't be established.
RSA key fingerprint is SHA256:SI8VQZgJCppbrF4sRcby36Fx7rz3Hh5EJllPPyScLZU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'switch,10.0.0.02' (RSA) to the list of known hosts.
* Welcome to Pluribus Networks Inc. Netvisor(R). This is a monitored system. *
* ACCESS RESTRICTED TO AUTHORIZED USERS ONLY *
* By using the Netvisor(R) CLI,you agree to the terms of the Pluribus Networks *
* End User License Agreement (EULA). The EULA can be accessed via *
* http://www.pluribusnetworks.com/eula or by using the command "eula-show" *
Password:
Connected to switch
sftp> cd import
sftp> put nvOS-6.1.0-6010018118-onvl.pkg
Uploading nvOS-6.1.0-6010018118-onvl.pkg
nvOS-6.1.0-6010018118-onvl.pkg
nvOS-6.1.0-6010018118-onvl.pkg 100% 332MB 7.5MB/s 04:00
Fabric upgrade with manual-reboot option
This option completes in three phases:
- Copy the upgrade package to the switches in the fabric and start the upgrade with the fabric-upgrade-start command.
- Finish or abort the fabric upgrade with the fabric-upgrade-finish or fabric-upgrade-abort command.
- Manually reboot the switches with the switch-reboot command.
Starting the Fabric Upgrade
Before starting the upgrade process, ensure that all the nodes of the fabric are online. Use the fabric-node-show command and check that the state is online for every node.
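The pre-check can be expressed as a simple filter over node states. The Python sketch below is a minimal illustration: it assumes the node names and states have already been copied out of the fabric-node-show output into (name, state) pairs, and the function name offline_nodes and the sample data (including the "offline" state text) are hypothetical.

```python
def offline_nodes(nodes):
    """Return the names of nodes whose state is not 'online'.

    `nodes` is a list of (name, state) pairs, assumed to have been taken
    from the fabric-node-show output by hand or by a separate parser.
    """
    return [name for name, state in nodes if state.strip().lower() != "online"]


# Hypothetical sample data, not real CLI output.
nodes = [("sw1", "online"), ("sw2", "online"), ("sw3", "offline")]
blocked = offline_nodes(nodes)
if blocked:
    print("Do not start the upgrade; nodes not online:", ", ".join(blocked))
else:
    print("All nodes online.")
```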
Use the following command to copy the upgrade package from the controller switch to all other switches in the fabric and start the upgrade process:
CLI (network-admin@switch) > fabric-upgrade-start packages <image> manual-reboot
After the upgrade completes on all nodes, run the fabric-upgrade-finish command to finalize it. With the manual-reboot option, you then reboot each switch yourself.
The fabric-upgrade-start command defines all behavior of the upgrade process during the upgrade, that is, any optional settings need to be defined with the “start” command (see optional settings below). In addition, the fabric-upgrade-start command acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.
The optional parameters for the fabric-upgrade-start command include:
- auto-finish — specify to start software upgrade on the entire fabric. The default is no-auto-finish.
- abort-on-failure — specify if you want the upgrade to stop if there is a failure during the process.
- manual-reboot — specify if you want to manually reboot individual switches after the upgrade process. If you specify no-manual-reboot, all switches reboot automatically after the upgrade is complete.
- prepare — specify if you want to perform setup steps prior to performing the upgrade. This step copies the offline software package and then extracts and prepares for the final upgrade process. Once you begin the prepare process, you cannot add new switches to the fabric.
A sample upgrade process is explained below. Start the upgrade process by using the command:
CLI (network-admin@switch) > fabric-upgrade-start packages nvOS-6.1.0-6010018118-onvl.pkg auto-finish manual-reboot
Warning: This will start software upgrade on your entire fabric.
Please confirm y/n (Default: n):y
Scheduled background update.
Use:
* fabric-upgrade-status-show to check progress
* fabric-upgrade-finish to finalize when complete
* fabric-upgrade-abort to cancel cleanly
* switch-reboot on each switch in fabric to reboot manually when complete
Monitoring the Upgrade Process
The controller node monitors the progress of the upgrade on each node, and the fabric-upgrade-status-show command reports the status. The upgrade passes through many interim steps. To monitor it continuously, use the show-interval option (in seconds) with the fabric-upgrade-status-show command.
- To monitor the progress of the upgrade for each node in the fabric:
CLI (network-admin@switch) > fabric-upgrade-status-show
For example, to refresh the status every 5 seconds:
CLI (network-admin@switch) > fabric-upgrade-status-show show-interval 5
log switch state cluster
---------------------------------- --------------- ------------------ ----------------------
(0:00:36)Agent needs restart eq-colo-7 Agent restart wait aqr07-08(sec)
(0:00:34)Agent needs restart tucana-colo-7 Agent restart wait spine-cl(sec)
(0:03:57)Extracting signed bundle. aquarius-test-1 Running aquarius-test-1-2(sec)
(0:00:45)Agent needs restart dorado-test-3 Agent restart wait dorado-test-2-3(sec)
(0:03:57)Extracting signed bundle. aqr08 Running aqr07-08(pri)
(0:00:28)Agent needs restart switch* Agent restart wait spine-cl(pri)
(0:03:57)Extracting signed bundle. aquarius-test-2 Running aquarius-test-1-2(pri)
(0:00:38)Agent needs restart dorado-test-2 Agent restart wait dorado-test-2-3(pri)
(0:01:00)Agent needs restart scorpius10 Agent restart wait none
(0:00:47)Agent needs restart vnv-mini-1 Agent restart wait none
log switch state cluster
---------------------------------- --------------- ------------------ ----------------------
(0:00:36)Agent needs restart eq-colo-7 Agent restart wait aqr07-08(sec)
(0:00:34)Agent needs restart tucana-colo-7 Agent restart wait spine-cl(sec)
(0:04:02)Extracting packages. aquarius-test-1 Running aquarius-test-1-2(sec)
(0:00:45)Agent needs restart dorado-test-3 Agent restart wait dorado-test-2-3(sec)
(0:04:02)Extracting signed bundle. aqr08 Running aqr07-08(pri)
(0:00:28)Agent needs restart switch* Agent restart wait spine-cl(pri)
(0:04:02)Extracting packages. aquarius-test-2 Running aquarius-test-1-2(pri)
(0:00:38)Agent needs restart dorado-test-2 Agent restart wait dorado-test-2-3(pri)
(0:01:00)Agent needs restart scorpius10 Agent restart wait none
(0:00:47)Agent needs restart vnv-mini-1 Agent restart wait none
.
.
log switch state cluster
------------------------------------------------------------ --------------- ---------------- ----------------------
(0:01:53)Waiting for completion processing eq-colo-7 Upgrade complete aqr07-08(sec)
(0:01:25)Waiting for completion processing tucana-colo-7 Upgrade complete spine-cl(sec)
(0:06:24)Waiting for completion processing aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)
(0:02:29)Waiting for completion processing dorado-test-3 Upgrade complete dorado-test-2-3(sec)
(0:06:43)Waiting for completion processing aqr08 Upgrade complete aqr07-08(pri)
(0:01:23)Waiting to reboot tucana-colo-6* Upgrade complete spine-cl(pri)
(0:06:16)Waiting for completion processing aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)
(0:02:19)Waiting for completion processing dorado-test-2 Upgrade complete dorado-test-2-3(pri)
(0:06:09)Waiting for completion processing scorpius10 Upgrade complete none
(0:08:09)Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010017911 vnv-mini-1 Running none
.
.
log switch state cluster
---------------------------------------- --------------- ---------------- ----------------------
(0:01:53)Current/Reboot BE: netvisor-16 eq-colo-7 Upgrade complete aqr07-08(sec)
(0:01:25)Waiting for completion processing tucana-colo-7 Upgrade complete spine-cl(sec)
(0:06:24)Waiting for completion processing aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)
(0:02:29)Destroy BE: netvisor-45 dorado-test-3 Upgrade complete dorado-test-2-3(sec)
(0:06:43)Waiting for completion processing aqr08 Upgrade complete aqr07-08(pri)
(0:01:23)Waiting to reboot switch* Upgrade complete spine-cl(pri)
(0:06:16)Current/Reboot BE: netvisor-10 aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)
(0:02:19)Software upgrade done. Waiting for reboot dorado-test-2 Upgrade complete dorado-test-2-3(pri)
(0:06:09)Waiting for completion processing scorpius10 Upgrade complete none
(0:13:17)Waiting for completion processing vnv-mini-1 Upgrade complete none
------------------------------------------ ---------------- ---------------------- --------------
(0:01:53)Upgrade complete eq-colo-7 Reboot wait aqr07-08(sec)
(0:01:25)Upgrade complete tucana-colo-7 Reboot wait spine-cl(sec)
(0:06:24)Upgrade complete aquarius-test-1 Reboot wait aquarius-test-1-2(sec)
(0:02:29)Upgrade complete dorado-test-3 Reboot wait dorado-test-2-3(sec)
(0:06:43)Upgrade complete aqr08 Reboot wait aqr07-08(pri)
(0:01:23)Sending Reboot wait message to handler switch* Reboot wait spine-cl(pri)
(0:06:16)Upgrade complete aquarius-test-2 Reboot wait aquarius-test-1-2(pri)
(0:02:19)Upgrade complete dorado-test-2 Reboot wait dorado-test-2-3(pri)
(0:06:09)Upgrade complete scorpius10 Reboot wait none
(0:13:17)Waiting for completion processing vnv-mini-1 Upgrade complete none
Connection to switch closed by remote host.
Connection to switch closed.
The first entry in the log is the elapsed time of the upgrade process. It does not include waiting time. The switch with the asterisk (*) is the upgrade controller node where the fabric-upgrade-start command was issued.
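When the status output is captured to a file, each row can be split back into its elapsed time, log message, switch, state, and cluster. The Python sketch below is one possible approach, not a supported tool. It assumes that columns are separated by whitespace, that switch and cluster names contain no spaces, and that the state is one of the four values seen in the sample output above. The real CLI may report other states. The function name parse_status_row is hypothetical.

```python
import re

# States seen in the sample output above; the real CLI may report others.
KNOWN_STATES = ("Agent restart wait", "Upgrade complete", "Reboot wait", "Running")
ROW = re.compile(r"^\((\d+):(\d{2}):(\d{2})\)(.+)$")


def parse_status_row(line):
    """Split one status row into fields, or return None if it does not match."""
    match = ROW.match(line.strip())
    if not match:
        return None  # header, separator, or unrelated line
    hours, minutes, seconds, rest = match.groups()
    parts = rest.rsplit(None, 1)  # the cluster is the last column
    if len(parts) != 2:
        return None
    head, cluster = parts
    for state in KNOWN_STATES:
        if head.endswith(" " + state):
            pieces = head[: -len(state)].rstrip().rsplit(None, 1)
            if len(pieces) != 2:
                return None
            log, switch = pieces
            return {
                "elapsed_seconds": int(hours) * 3600 + int(minutes) * 60 + int(seconds),
                "log": log,
                "switch": switch.rstrip("*"),
                "controller": switch.endswith("*"),  # '*' marks the controller node
                "state": state,
                "cluster": cluster,
            }
    return None


row = parse_status_row(
    "(0:00:28)Agent needs restart switch* Agent restart wait spine-cl(pri)"
)
print(row)
```

Matching the state against a fixed list is what lets the log message and the switch name be separated, because the log message itself contains spaces.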
The following table describes the messages that the fabric-upgrade-status-show command displays, based on the current progress of a fabric-wide upgrade:
Table 2-1: Fabric Upgrade Status Description
Message | Description
Downloading package bundle | The upgrade package is downloaded from the controller node to all the other nodes.
Extracting initial bundle | Once successfully downloaded, the offline bundle is extracted.
Extracting signed bundle | The signature of the package is verified.
Extracting packages | The packages are extracted and readied to install.
Agent needs restart | The nodes wait for the package to be extracted on all nodes of the fabric.
Upgrading nvOS * | The switch upgrades Netvisor from the older version to the newer one.
Waiting for fabric-upgrade-finish/abort | The upgrade is complete on the switches, which wait for you to run either the fabric-upgrade-finish or the fabric-upgrade-abort command.
- Once the fabric upgrade process has copied the upgrade package to all switches and the upgrade is complete, run the fabric-upgrade-finish command to finish the upgrade or the fabric-upgrade-abort command to abort it:
CLI (network-admin@switch) > fabric-upgrade-finish
Once the upgrade phase is complete, all switches display the Upgrade complete message in the log field. You can then reboot the fabric. Following is an example:
CLI (network-admin@switch) > fabric-upgrade-finish
log switch state cluster
------------------------------------------------- ------ --------------- ------------
(0:13:00)Waiting for fabric-upgrade-finish/abort sw2 Upgrade complete spine(sec)
(0:12:04)Waiting for fabric-upgrade-finish/abort sw1* Upgrade complete spine(pri)
(0:16:49)Waiting for fabric-upgrade-finish/abort sw1 Upgrade complete none
(0:15:27)Waiting for fabric-upgrade-finish/abort sw2 Upgrade complete none
Finalizing upgrade. Manual reboot of nodes required.
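Before running fabric-upgrade-finish, it can be useful to wait until every node reports Upgrade complete in the state column, as in the sample output above. The Python sketch below shows one way to structure such a wait. It is an assumption-laden illustration: fetch_states stands for a caller-supplied function (not part of Netvisor ONE) that returns a mapping of switch name to state text, for example built from captured fabric-upgrade-status-show output.

```python
import time


def wait_for_upgrade_complete(fetch_states, interval=5.0, timeout=3600.0,
                              sleep=time.sleep):
    """Poll until every switch reports 'Upgrade complete'.

    fetch_states is a hypothetical caller-supplied function returning a dict
    of switch name -> state text. Returns True when all nodes are complete,
    or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        states = fetch_states()
        if states and all(s == "Upgrade complete" for s in states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Demonstration with canned snapshots and a no-op sleep, so nothing blocks.
snapshots = iter([
    {"sw1": "Running", "sw2": "Upgrade complete"},
    {"sw1": "Upgrade complete", "sw2": "Upgrade complete"},
])
done = wait_for_upgrade_complete(lambda: next(snapshots), sleep=lambda _: None)
print("ready for fabric-upgrade-finish" if done else "timed out")
```

Passing the sleep function as a parameter keeps the loop testable without real delays.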
- Manual reboot: each switch in the fabric needs to be manually rebooted after the upgrade is completed. The fabric-upgrade-status-show command displays the status as Switch waiting to reboot. For example:
CLI (network-admin@switch) > fabric-upgrade-status-show
fabric-upgrade-status-show: Switch waiting to reboot
At this point, the upgrade is complete on all switches. Reboot the switches one at a time by using the following command:
CLI (network-admin@switch) > switch-reboot
Note: Reboot the controller switch last.
Note: All the nodes of the fabric should be running the same software version for the Netvisor ONE features to work correctly.
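The reboot ordering in the notes above (one switch at a time, controller last) can be written down as a small helper. This Python sketch is illustrative only: the function name reboot_order is hypothetical, and it assumes switch names are given in the status-output convention, where the controller carries a trailing asterisk.

```python
def reboot_order(switches):
    """Order switch names for manual reboot, with the controller last.

    Names follow the status output convention: the controller carries a
    trailing '*'. Raises ValueError unless exactly one controller is marked.
    """
    controllers = [s for s in switches if s.endswith("*")]
    if len(controllers) != 1:
        raise ValueError("expected exactly one controller marked with '*'")
    others = [s for s in switches if not s.endswith("*")]
    return others + [controllers[0].rstrip("*")]


# Hypothetical switch names; run switch-reboot on each one in this order.
for name in reboot_order(["sw1*", "sw2", "sw3"]):
    print("reboot next:", name)
```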
- During the installation, if there is any issue, the upgrade process can be rolled back using the command fabric-upgrade-abort. To abort the upgrade process and return the switches to their prior state (no reboot needed):
CLI (network-admin@switch) > fabric-upgrade-abort
This command aborts the fabric upgrade process. All changes to the switches are cleaned up, the server-switches do not reboot, and the configuration lock on the fabric is released. If you issue the fabric-upgrade-abort command during the upgrade process, it may take some time before the process stops, because the upgrade has to reach a logical completion point before the changes are rolled back on the fabric. This allows the changes to be cleaned up properly.
Warning: DO NOT use the switch-reboot command to reboot the switch while upgrade is in progress.
Note: During the fabric-upgrade process, the fabric configuration is locked throughout the entire process and you cannot change any configurations during the process.
Related Commands:
Other commands related to the fabric upgrade include:
- fabric-upgrade-prepare-cancel – cancels a fabric upgrade that was prepared earlier.
- fabric-upgrade-prepare-resume – resumes a fabric upgrade that was prepared earlier.
- fabric-upgrade-prepare-show – displays the status of prepared upgrades on the fabric nodes.
Review bootenv
A new boot environment is built during the upgrade process. Upon reboot this new boot environment becomes active and the new software is up-and-running on the switch. Generally, it is not required to interact with the boot environments during the upgrade process. It may be necessary to review the boot environments using the command bootenv-show if there is some failure during the upgrade process.