Implementing a Fabric Upgrade


A switch that is part of a fabric can be upgraded locally using the software-upgrade process, or you can start a fabric-wide upgrade of all nodes in the fabric.


During a fabric-wide upgrade, the switch on which the fabric-upgrade-start command is issued acts as the controller node. You must copy the upgrade package to the /sftp/import/ directory of the controller node.


Netvisor ONE copies the upgrade package to the other nodes in the fabric as part of the fabric-wide upgrade. The controller node monitors the progress of the upgrade on each node, and you can view the status using the fabric-upgrade-status-show command. The controller node is identified by an asterisk (*) after the switch name in the status output.


Netvisor ONE enables you to implement a fabric-wide upgrade and reboot the switches either at the same time or sequentially.


Upgrading the Fabric

Complete the following tasks to upgrade all switches in the fabric:


Upgrade Commands


The following commands control the software or fabric upgrade process:


  • fabric-upgrade-start – begins the upgrade process for the specified package
  • fabric-upgrade-status-show – monitors the progress of the upgrade on each node in the fabric
  • fabric-upgrade-finish – if the auto-finish option was not used, begins the reboot process based on the options specified when the upgrade was started
  • fabric-upgrade-abort – aborts the entire upgrade process and returns the switches to their prior state


The fabric-upgrade-start command defines all future behavior of the upgrade process, so any optional settings must be specified with the start command. In addition, the fabric-upgrade-start command acquires a configuration lock from all members of the fabric. No configuration changes are permitted during the upgrade process.



Before you start the fabric-wide upgrade


  1. Copy the image to the /sftp/import/ directory of the controller node.
  2. Ensure there is reliable in-band and/or out-of-band connectivity between fabric members. This connectivity is used to distribute the software for the upgrade and to monitor the progress of the upgrade process. The software is distributed to the nodes of the fabric in parallel, that is, each node receives the software at approximately the same time. An independent communications link is established over the fabric communications path to distribute the software to each node in the fabric.
  3. Console access to the switches is recommended.
  4. Switches do not accept any configuration commands once the upgrade starts, so plan accordingly.


Copying Image to the Switch


To copy the image:


  • First, enable the Secure File Transfer Protocol (SFTP) service and create the /sftp/import directory by using the following CLI command:


CLI (network-admin@switch1) > admin-sftp-modify enable

sftp password:

confirm sftp password:

CLI (network-admin@switch1)>


OR


Enable shell access for the network-admin role so that you can copy the file to the directory:


 CLI (admin@netvisor) > role-modify name network-admin shell


Then access the shell:


 CLI (admin@netvisor) > shell

 network-admin@netvisor:~$ cd /sftp/import

 network-admin@netvisor:/sftp/import$


  • Copy the image to the /sftp/import directory:


root@server-os-9:~/# sftp sftp@switch1

The authenticity of host 'switch1 (10.0.0.02)' can't be established.

RSA key fingerprint is SHA256:SI8VQZgJCppbrF4sRcby36Fx7rz3Hh5EJllPPyScLZU.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'switch1, 10.0.0.02' (RSA) to the list of known hosts.

* Welcome to Pluribus Networks Inc. Netvisor(R). This is a monitored system. *

* ACCESS RESTRICTED TO AUTHORIZED USERS ONLY *

* By using the Netvisor(R) CLI,you agree to the terms of the Pluribus Networks *

* End User License Agreement (EULA). The EULA can be accessed via *

* http://www.pluribusnetworks.com/eula or by using the command "eula-show" *

Password:

Connected to switch1

sftp> cd import

sftp> put nvOS-5.2.1-5020115690-onvl.pkg

Uploading nvOS-5.2.1-5020115690-onvl.pkg

nvOS-5.2.1-5020115690-onvl.pkg

nvOS-5.2.1-5020115690-onvl.pkg 100% 1870MB 7.5MB/s 04:00



Fabric upgrade with manual-reboot option


This option completes in three phases:


  • Copy the upgrade package to the switches in the fabric and start the upgrade with the fabric-upgrade-start command.
  • Finish or abort the fabric upgrade with the fabric-upgrade-finish or fabric-upgrade-abort command.
  • Manually reboot the switches with the switch-reboot command.


Starting the Fabric Upgrade


Before starting the upgrade process, ensure that all nodes of the fabric are online. Use the fabric-node-show command and check that the state is online for every node.
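
The online check can be scripted once the fabric-node-show output has been collected. The sketch below is illustrative only: it assumes the output has already been parsed into dictionaries with hypothetical name and state keys, and the records shown are hand-written placeholders, not real command output.

```python
def offline_nodes(nodes):
    """Return the names of fabric nodes whose state is not 'online'."""
    return [n["name"] for n in nodes if n["state"].strip().lower() != "online"]


# Hypothetical records standing in for parsed fabric-node-show output.
nodes = [
    {"name": "switch1", "state": "online"},
    {"name": "switch2", "state": "offline"},
]

blocked = offline_nodes(nodes)
if blocked:
    print("Do not start the upgrade; nodes not online:", ", ".join(blocked))
else:
    print("All nodes online.")
```

An empty result means every node reported online and the upgrade can be started.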


Use the fabric-upgrade-start command to copy the upgrade package from the controller switch to all other switches in the fabric and start the upgrade process. Later, run the fabric-upgrade-finish command to reboot the fabric and complete the upgrade process. The start command has the following form:

CLI (network-admin@switch) > fabric-upgrade-start packages <image> manual-reboot


The fabric-upgrade-start command defines all behavior of the upgrade process, so any optional settings must be specified with the start command (see the optional settings below). In addition, the fabric-upgrade-start command acquires a configuration lock from all members of the fabric. No configuration changes are permitted during the upgrade process.


The optional parameters for the fabric-upgrade-start command include:

  • auto-finish – automatically reboots the entire fabric after the upgrade is complete. The default is no-auto-finish.
  • abort-on-failure – stops the upgrade if a failure occurs during the process.
  • manual-reboot – lets you manually reboot individual switches after the upgrade process. If you specify no-manual-reboot, all switches reboot automatically after the upgrade is complete.
  • prepare – performs setup steps before the upgrade. This step copies the offline software package, extracts it, and prepares for the final upgrade process. Once you begin the prepare process, you cannot add new switches to the fabric.
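
Because every option must be given on the start command, it can help to assemble the command line in one place before pasting it into the CLI. The following is a minimal sketch, not part of Netvisor ONE: the function name and its parameters are hypothetical, it emits only the keywords listed above, and the order in which several keywords are combined is an assumption.

```python
def build_start_command(package, auto_finish=False, abort_on_failure=False,
                        manual_reboot=False, prepare=False):
    """Assemble a fabric-upgrade-start command line from the listed options."""
    if not package or " " in package:
        raise ValueError("package must be a single file name")
    parts = ["fabric-upgrade-start", "packages", package]
    # Only keywords named in the option list are emitted; an option that is
    # left out keeps whatever default the switch applies.
    if auto_finish:
        parts.append("auto-finish")
    if abort_on_failure:
        parts.append("abort-on-failure")
    if manual_reboot:
        parts.append("manual-reboot")
    if prepare:
        parts.append("prepare")
    return " ".join(parts)


print(build_start_command("nvOS-5.2.1-5020115690-onvl.pkg", manual_reboot=True))
# prints: fabric-upgrade-start packages nvOS-5.2.1-5020115690-onvl.pkg manual-reboot
```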


A sample upgrade process is explained below:

Start the upgrade process by using the command:

CLI (network-admin@switch1) > fabric-upgrade-start packages nvOS-5.2.1-5020115690-onvl.pkg auto-finish

Warning: This will start software upgrade on your entire fabric.

Please confirm y/n (Default: n):y

Scheduled background update.


Monitoring the Upgrade Process


The controller node monitors the progress of the upgrade on each node and reports the status through the fabric-upgrade-status-show command. The upgrade process has many interim steps; to monitor it continuously, use the show-interval option (in seconds) with the fabric-upgrade-status-show command.

To monitor the progress of the upgrade for each node in the fabric, use the following command:

CLI (network-admin@switch1) > fabric-upgrade-status-show


For example, to refresh the status every 5 seconds:


CLI (network-admin@tucana-colo-6) > fabric-upgrade-status-show show-interval 5


log                                switch          state              cluster

---------------------------------- --------------- ------------------ ----------------------

(0:00:36)Agent needs restart       eq-colo-7       Agent restart wait aqr07-08(sec)

(0:00:34)Agent needs restart       tucana-colo-7   Agent restart wait spine-cl(sec)

(0:03:57)Extracting signed bundle. aquarius-test-1 Running            aquarius-test-1-2(sec)

(0:00:45)Agent needs restart       dorado-test-3   Agent restart wait dorado-test-2-3(sec)

(0:03:57)Extracting signed bundle. aqr08           Running            aqr07-08(pri)

(0:00:28)Agent needs restart       tucana-colo-6*  Agent restart wait spine-cl(pri)

(0:03:57)Extracting signed bundle. aquarius-test-2 Running            aquarius-test-1-2(pri)

(0:00:38)Agent needs restart       dorado-test-2   Agent restart wait dorado-test-2-3(pri)

(0:01:00)Agent needs restart       scorpius10      Agent restart wait none

(0:00:47)Agent needs restart       vnv-mini-1      Agent restart wait none

log                                switch          state              cluster

---------------------------------- --------------- ------------------ ----------------------

(0:00:36)Agent needs restart       eq-colo-7       Agent restart wait aqr07-08(sec)

(0:00:34)Agent needs restart       tucana-colo-7   Agent restart wait spine-cl(sec)

(0:04:02)Extracting packages.      aquarius-test-1 Running            aquarius-test-1-2(sec)

(0:00:45)Agent needs restart       dorado-test-3   Agent restart wait dorado-test-2-3(sec)

(0:04:02)Extracting signed bundle. aqr08           Running            aqr07-08(pri)

(0:00:28)Agent needs restart       tucana-colo-6*  Agent restart wait spine-cl(pri)

(0:04:02)Extracting packages.      aquarius-test-2 Running            aquarius-test-1-2(pri)

(0:00:38)Agent needs restart       dorado-test-2   Agent restart wait dorado-test-2-3(pri)

(0:01:00)Agent needs restart       scorpius10      Agent restart wait none

(0:00:47)Agent needs restart       vnv-mini-1      Agent restart wait none

.

.

log                                                          switch          state            cluster

------------------------------------------------------------ --------------- ---------------- ----------------------

(0:01:53)Waiting for completion processing                   eq-colo-7       Upgrade complete aqr07-08(sec)

(0:01:25)Waiting for completion processing                   tucana-colo-7   Upgrade complete spine-cl(sec)

(0:06:24)Waiting for completion processing                   aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)

(0:02:29)Waiting for completion processing                   dorado-test-3   Upgrade complete dorado-test-2-3(sec)

(0:06:43)Waiting for completion processing                   aqr08           Upgrade complete aqr07-08(pri)

(0:01:23)Waiting to reboot                                   tucana-colo-6*  Upgrade complete spine-cl(pri)

(0:06:16)Waiting for completion processing                   aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)

(0:02:19)Waiting for completion processing                   dorado-test-2   Upgrade complete dorado-test-2-3(pri)

(0:06:09)Waiting for completion processing                   scorpius10      Upgrade complete none

(0:08:09)Upgrading nvOS 5.1.2-5010215446 -> 5.2.1-5020115690 vnv-mini-1      Running          none

.

.

log                                                switch          state            cluster

-------------------------------------------------- --------------- ---------------- ----------------------

(0:01:53)Current/Reboot BE: netvisor-16            eq-colo-7       Upgrade complete aqr07-08(sec)

(0:01:25)Waiting for completion processing         tucana-colo-7   Upgrade complete spine-cl(sec)

(0:06:24)Waiting for completion processing         aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)

(0:02:29)Destroy BE: netvisor-45                   dorado-test-3   Upgrade complete dorado-test-2-3(sec)

(0:06:43)Waiting for completion processing         aqr08           Upgrade complete aqr07-08(pri)

(0:01:23)Waiting to reboot                         tucana-colo-6*  Upgrade complete spine-cl(pri)

(0:06:16)Current/Reboot BE: netvisor-10            aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)

(0:02:19)Software upgrade done. Waiting for reboot dorado-test-2   Upgrade complete dorado-test-2-3(pri)

(0:06:09)Waiting for completion processing         scorpius10      Upgrade complete none

(0:13:17)Waiting for completion processing         vnv-mini-1      Upgrade complete none

log                                             switch          state            cluster

----------------------------------------------- --------------- ---------------- ----------------------

(0:01:53)Upgrade complete                       eq-colo-7       Reboot wait      aqr07-08(sec)

(0:01:25)Upgrade complete                       tucana-colo-7   Reboot wait      spine-cl(sec)

(0:06:24)Upgrade complete                       aquarius-test-1 Reboot wait      aquarius-test-1-2(sec)

(0:02:29)Upgrade complete                       dorado-test-3   Reboot wait      dorado-test-2-3(sec)

(0:06:43)Upgrade complete                       aqr08           Reboot wait      aqr07-08(pri)

(0:01:23)Sending Reboot wait message to handler tucana-colo-6*  Reboot wait      spine-cl(pri)

(0:06:16)Upgrade complete                       aquarius-test-2 Reboot wait      aquarius-test-1-2(pri)

(0:02:19)Upgrade complete                       dorado-test-2   Reboot wait      dorado-test-2-3(pri)

(0:06:09)Upgrade complete                       scorpius10      Reboot wait      none

(0:13:17)Waiting for completion processing      vnv-mini-1      Upgrade complete none

Connection to tucana-colo-6 closed by remote host.

Connection to tucana-colo-6 closed.


The first entry in the log is the elapsed time of the upgrade process. It does not include waiting time. The switch with the asterisk (*) is the upgrade controller node where the fabric-upgrade-start command was issued.
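
If the status output is captured to a file or a script, the elapsed time and the controller marker can be pulled out programmatically. The sketch below is an assumption-laden illustration rather than a supported tool: it expects a single four-column table (log, switch, state, cluster) like the snapshots above, and it uses the dashed ruler under the header to find the column boundaries. The function name and dictionary keys are hypothetical.

```python
import re

# Rows taken from the sample output, laid out with the same column widths.
ROW = "{:<34} {:<15} {:<18} {}"
SAMPLE = "\n".join([
    ROW.format("log", "switch", "state", "cluster"),
    ROW.format("-" * 34, "-" * 15, "-" * 18, "-" * 22),
    ROW.format("(0:03:57)Extracting signed bundle.", "aqr08", "Running", "aqr07-08(pri)"),
    ROW.format("(0:00:28)Agent needs restart", "tucana-colo-6*", "Agent restart wait", "spine-cl(pri)"),
    ROW.format("(0:01:00)Agent needs restart", "scorpius10", "Agent restart wait", "none"),
])

ELAPSED = re.compile(r"^\((\d+):(\d{2}):(\d{2})\)\s*(.*)$")


def parse_status(text):
    """Parse one fabric-upgrade-status-show table into a list of dicts."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # The dashed ruler under the header gives the column boundaries.
    ruler_idx = next(i for i, ln in enumerate(lines) if set(ln) <= {"-", " "})
    spans = [m.span() for m in re.finditer(r"-+", lines[ruler_idx])]
    last = len(spans) - 1
    rows = []
    for ln in lines[ruler_idx + 1:]:
        # The final column runs to the end of the line.
        cells = [ln[start:(None if k == last else end)].strip()
                 for k, (start, end) in enumerate(spans)]
        log, switch, state, cluster = cells
        elapsed = None
        m = ELAPSED.match(log)
        if m:
            hours, minutes, seconds, log = m.groups()
            elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        rows.append({
            "switch": switch.rstrip("*"),
            "controller": switch.endswith("*"),  # asterisk marks the controller
            "elapsed_s": elapsed,
            "log": log,
            "state": state,
            "cluster": cluster,
        })
    return rows


for row in parse_status(SAMPLE):
    print(row["switch"], row["controller"], row["elapsed_s"], row["state"])
```

Slicing by the ruler rather than splitting on whitespace matters here because both the log and state fields contain spaces.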


During a fabric-wide upgrade, the fabric-upgrade-status-show command displays the following messages, depending on the current progress:


Message                                  Description
---------------------------------------  ---------------------------------------------------------------------------
Downloading package bundle               The upgrade package is downloaded from the initial node to all the other nodes.
Extracting initial bundle                Once successfully downloaded, the offline bundle is extracted.
Extracting signed bundle                 The signature of the package is verified.
Extracting packages                      The packages are extracted and readied to install.
Agent needs restart                      The nodes wait for the package to be extracted on all nodes of the fabric.
Upgrading nvOS *                         The switch upgrades Netvisor from the older version to the newer one.
Waiting for fabric-upgrade-finish/abort  Once the upgrade completes, the switches wait for you to run either fabric-upgrade-finish or fabric-upgrade-abort.
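
The messages above can be used to see which node is furthest behind. The following sketch assumes that the table lists the messages in the order in which they occur, which the sample output suggests but the table does not state outright. The function names are hypothetical, and messages that are not in the table are ignored.

```python
# Progress messages in the order the table lists them.
PHASES = [
    "Downloading package bundle",
    "Extracting initial bundle",
    "Extracting signed bundle",
    "Extracting packages",
    "Agent needs restart",
    "Upgrading nvOS",
    "Waiting for fabric-upgrade-finish/abort",
]


def phase_index(log_message):
    """Return the position of a log message in PHASES, or None if unrecognized."""
    for i, prefix in enumerate(PHASES):
        if log_message.startswith(prefix):
            return i
    return None


def least_advanced(messages_by_switch):
    """Return (switch, message) for the node earliest in the listed sequence."""
    known = {sw: msg for sw, msg in messages_by_switch.items()
             if phase_index(msg) is not None}
    if not known:
        return None
    switch = min(known, key=lambda sw: phase_index(known[sw]))
    return switch, known[switch]


# Log messages as they appear in the sample status output.
snapshot = {
    "eq-colo-7": "Agent needs restart",
    "aquarius-test-1": "Extracting packages.",
    "vnv-mini-1": "Upgrading nvOS 5.1.2-5010215446 -> 5.2.1-5020115690",
}
print(least_advanced(snapshot))
# prints: ('aquarius-test-1', 'Extracting packages.')
```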



  • Once the fabric upgrade process has copied the upgrade package to all switches and the upgrade is complete, run the fabric-upgrade-finish or fabric-upgrade-abort command to either finish or abort the upgrade.


CLI (network-admin@switch1) > fabric-upgrade-finish

You can issue this command at any time during the fabric upgrade to reboot all nodes when the upgrade is complete. Once the upgrade phase is complete, all switches display the Upgrade complete message in the log field. You can then reboot the fabric. The following is an example:


CLI (network-admin@switch1) > fabric-upgrade-finish


log                                              switch state            cluster

------------------------------------------------ ------ ---------------- ----------

(0:13:00)Waiting for fabric-upgrade-finish/abort sw2    Upgrade complete spine(sec)

(0:12:04)Waiting for fabric-upgrade-finish/abort sw1*   Upgrade complete spine(pri)

(0:16:49)Waiting for fabric-upgrade-finish/abort sw1    Upgrade complete none

(0:15:27)Waiting for fabric-upgrade-finish/abort sw2    Upgrade complete none


Finalizing upgrade. Manual reboot of nodes required.
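
Waiting for every switch to report Upgrade complete can be automated with a simple polling loop. The sketch below is a generic pattern under stated assumptions: fetch_states is a hypothetical caller-supplied function that returns a mapping of switch name to the state column, and how it obtains that data from the controller node is deliberately left open.

```python
import time


def wait_until_complete(fetch_states, interval_s=5.0, timeout_s=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until every switch reports 'Upgrade complete' or the timeout expires.

    fetch_states is a caller-supplied function returning {switch: state}.
    Returns True when all switches are complete, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = sorted(sw for sw, st in states.items()
                         if st != "Upgrade complete")
        if states and not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Canned snapshots stand in for real status queries in this example.
snapshots = iter([
    {"sw1": "Running", "sw2": "Upgrade complete"},
    {"sw1": "Upgrade complete", "sw2": "Upgrade complete"},
])
print(wait_until_complete(lambda: next(snapshots), sleep=lambda s: None))
# prints: True
```

The sleep and clock parameters exist only so that the loop can be exercised without real delays.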


  • Manual reboot: each switch in the fabric needs to be manually rebooted after the upgrade is completed. The fabric-upgrade-status-show command displays the status Switch waiting to reboot. For example:


CLI (network-admin@switch1) > fabric-upgrade-status-show

fabric-upgrade-status-show: Switch waiting to reboot


At this point, the upgrade is complete on all switches. Reboot the switches one at a time by using the following command:

CLI (network-admin@switch1) > switch-reboot


Note: You must reboot the controller switch last.


Note: All the nodes of the fabric should be running the same software version for the Netvisor ONE features to work correctly.
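
When planning the manual reboots, the controller-last rule can be applied mechanically to the switch names from the status output. This is a small illustrative helper with a hypothetical name; it only reorders the list and assumes the controller is the one name marked with an asterisk. The relative order of the other switches is left unchanged.

```python
def reboot_order(switches):
    """Return switch names with the controller (marked '*') moved to the end."""
    others = [s for s in switches if not s.endswith("*")]
    controller = [s.rstrip("*") for s in switches if s.endswith("*")]
    if len(controller) != 1:
        raise ValueError("expected exactly one controller marked with '*'")
    return others + controller


print(reboot_order(["eq-colo-7", "tucana-colo-6*", "aqr08"]))
# prints: ['eq-colo-7', 'aqr08', 'tucana-colo-6']
```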


  • If any issue occurs during the installation, you can roll back the upgrade process by using the fabric-upgrade-abort command. To abort the upgrade process and return the switches to their prior state (no reboot needed):


CLI (network-admin@switch1) > fabric-upgrade-abort


This command aborts the fabric upgrade process. All changes to the switches are cleaned up, and the server-switches do not reboot. The configuration lock on the fabric is also released. If you issue the fabric-upgrade-abort command during the upgrade process, it may take some time before the process stops, because the upgrade has to reach a logical completion point before the changes are rolled back on the fabric. This allows the changes to be cleaned up properly.



Warning: DO NOT use the switch-reboot command to reboot the switch while the upgrade is in progress.



Note: The fabric configuration is locked for the entire fabric upgrade process, and you cannot change any configuration during that time.


Related Commands


Other commands related to the fabric upgrade include:

  • fabric-upgrade-prepare-cancel – cancels a fabric upgrade that was prepared earlier.
  • fabric-upgrade-prepare-resume – resumes a fabric upgrade that was prepared earlier.
  • fabric-upgrade-prepare-show – displays the status of prepared upgrades on the fabric nodes.


Review bootenv


A new boot environment is built during the upgrade process. Upon reboot, this new boot environment becomes active and the new software is up and running on the switch. Generally, you do not need to interact with the boot environments during the upgrade process. If a failure occurs during the upgrade process, you may need to review the boot environments by using the bootenv-show command.