Implementing a Fabric Upgrade 


You can upgrade a switch that is part of a fabric locally by using the software-upgrade process, or you can start a fabric-wide upgrade of all nodes in the fabric.

During a fabric-wide upgrade, the switch on which the fabric-upgrade-start command is issued acts as the controller node. You must copy the upgrade package to the /sftp/import/ directory of the controller node.


NetVisor OS copies the upgrade package to the other nodes in the fabric as part of the fabric-wide upgrade. The controller node monitors the progress of the upgrade on each node, and you can view the status by using the fabric-upgrade-status-show command. In the status output, the controller node is identified by an asterisk (*) after the switch name.


NetVisor OS enables you to implement a fabric-wide upgrade and reboot the switches either at the same time or in sequential order.


Upgrading the Fabric

Follow the tasks explained here to upgrade all switches in the fabric:


Upgrade Commands


The following commands control the fabric upgrade process:


  • fabric-upgrade-start – begin the upgrade process on the entire fabric by specifying the package name
  • fabric-upgrade-status-show – monitor the progress of the upgrade for each node in the fabric
  • fabric-upgrade-finish – finalize the upgrade when it is complete
  • fabric-upgrade-abort – abort the entire upgrade process and return the switches to their prior state


The fabric-upgrade-start command defines the behavior of the entire upgrade process, so any optional settings must be specified with the start command. In addition, the fabric-upgrade-start command acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.


The fabric-upgrade-start command includes the following options:


CLI (network-admin@switch) > fabric-upgrade-start        


fabric-upgrade-start                  Starts the software upgrade or prepare process on the entire fabric.

packages sftp-files name              Comma-separated list of software bundles.

Specify between 0 and 7 of the following options:

auto-finish|no-auto-finish            Automatically starts the software upgrade on the entire fabric. The default option is no-auto-finish.

abort-on-failure|no-abort-on-failure  Whether to abort the fabric upgrade if a node fails. The default option is no-abort-on-failure.

manual-reboot|no-manual-reboot        Whether to defer to the user for reboot after the upgrade.

download-count 1..5                   Number of concurrent downloads. The default value is 5 (maximum). This option was introduced in version 6.1.0.

prepare|no-prepare                    Perform setup steps for the actual upgrade.

upload-server upload-server-string    Upload the configuration file to a server via SCP.

server-password                       SCP host password.


During a fabric upgrade, all members of the fabric download the upgrade bundle from the controller node. By default, a maximum of five switches can download the upgrade bundle from the controller at the same time.


However, this level of concurrency can cause issues if bandwidth is constrained, and it can overwhelm a controller node that has a lower hardware specification. Starting with NetVisor OS version 6.1.0, you can use the download-count parameter of the fabric-upgrade-start command to reduce the number of concurrent downloads to suit your network conditions and the hardware capabilities of the controller node. The default download-count is 5.


For example, to set the download count to 2, use the command:

       

CLI (network-admin@switch) > fabric-upgrade-start packages nvOS-6.0.1-6010017911-onvl.pkg download-count 2
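
If the upgrade is driven from a script, it can help to assemble the fabric-upgrade-start command line in one place and to reject an out-of-range download-count before anything is sent to the switch. The following Python sketch only builds the command string from the options documented above. The function name build_fabric_upgrade_start and its arguments are hypothetical and are not part of NetVisor OS. The order of the keywords and the comma-joined package list without spaces are assumptions, and delivering the string to the CLI is not shown.

```python
def build_fabric_upgrade_start(packages, download_count=None, auto_finish=False,
                               abort_on_failure=False, manual_reboot=False,
                               prepare=False):
    """Assemble a fabric-upgrade-start command line (hypothetical helper).

    Only keywords documented for fabric-upgrade-start are emitted; options
    left at their defaults are omitted from the command line.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    if download_count is not None and not 1 <= download_count <= 5:
        raise ValueError("download-count must be between 1 and 5")

    # Assumption: multiple bundles are joined with commas and no spaces.
    parts = ["fabric-upgrade-start", "packages", ",".join(packages)]
    flags = [
        (auto_finish, "auto-finish"),
        (abort_on_failure, "abort-on-failure"),
        (manual_reboot, "manual-reboot"),
        (prepare, "prepare"),
    ]
    parts += [keyword for enabled, keyword in flags if enabled]
    if download_count is not None:
        parts += ["download-count", str(download_count)]
    return " ".join(parts)


print(build_fabric_upgrade_start(["nvOS-6.1.0-6010018118-onvl.pkg"],
                                 download_count=2))
```

Validating the range locally keeps a typing mistake from reaching the controller node, where the upgrade would otherwise have to be restarted with corrected options.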


Before you start the fabric-wide upgrade


  1. Copy the image to the /sftp/import/ directory of the controller node.
  2. Ensure that there is reliable in-band or out-of-band connectivity (or both) between fabric members. This connectivity is used to distribute the software for the upgrade and to monitor the progress of the upgrade process. The software is distributed to the nodes of the fabric in parallel, so each node receives the software at approximately the same time (subject to the download-count limit). An independent communications link is established over the fabric communications path to distribute the software to each node in the fabric.
  3. Console access to the switches is recommended.
  4. Switches do not accept any configuration commands after the upgrade starts, so plan accordingly.


Copying Image to the Switch


To copy the image:


  • First, enable the Secure File Transfer Protocol (SFTP) service on all switches and create an /sftp/import directory by using the following command:


CLI (network-admin@switch)>switch* admin-sftp-modify enable

sftp password:

confirm sftp password:

CLI (network-admin@switch)>


OR


Enable shell access on all the switches by using the following command, so that you can copy the file to the directory from the shell:


 CLI(admin@netvisor) > switch* role-modify name network-admin shell


And access the shell:


 CLI(admin@netvisor) > shell

 network-admin@netvisor:~$ cd /sftp/import

 network-admin@netvisor:/sftp/import$


  • Copy the image to the /sftp/import directory:


root@server-os-9:~/# sftp sftp@switch

The authenticity of host 'switch (10.0.0.02)' can't be established.

RSA key fingerprint is SHA256:SI8VQZgJCppbrF4sRcby36Fx7rz3Hh5EJllPPyScLZU.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'switch,10.0.0.02' (RSA) to the list of known hosts.

* Welcome to Arista Networks Inc. Netvisor(R). This is a monitored system.     *

*                ACCESS RESTRICTED TO AUTHORIZED USERS ONLY                    *

* By using the Netvisor(R) CLI, you agree to the terms of the Arista Networks  *

* End User License Agreement (EULA). The EULA can be accessed via              *

* http://www.arista.com/eula or by using the command "eula-show"

Password:

Connected to switch

sftp> cd import

sftp> put nvOS-6.1.0-6010018118-onvl.pkg

Uploading nvOS-6.1.0-6010018118-onvl.pkg 

nvOS-6.1.0-6010018118-onvl.pkg

nvOS-6.1.0-6010018118-onvl.pkg 100% 332MB 7.5MB/s 04:00


Fabric upgrade with manual-reboot option


A fabric upgrade with the manual-reboot option completes in three phases:


  • Copy the upgrade package to the switches in the fabric and start the upgrade with the fabric-upgrade-start command.
  • Finish or abort the fabric upgrade with the fabric-upgrade-finish or fabric-upgrade-abort command.
  • Manually reboot the switches with the switch-reboot command.


Starting the Fabric Upgrade 


Before starting the upgrade process, ensure that all the nodes of the fabric are online. Use the fabric-node-show command and check that the state is online for all the nodes.
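
The online check can be automated as a simple gate before the upgrade is started. The following Python sketch assumes that the node names and states have already been collected from fabric-node-show into a dictionary. Parsing that command's output is not shown because its layout is not covered here, and the function name offline_nodes and the sample switch names are hypothetical.

```python
def offline_nodes(node_states):
    """Return the sorted names of nodes whose state is not 'online'."""
    return sorted(name for name, state in node_states.items()
                  if state.strip().lower() != "online")


# Hypothetical states, as they might be collected from fabric-node-show.
states = {"sw1": "online", "sw2": "online", "sw3": "offline"}
blocked = offline_nodes(states)
if blocked:
    print("Do not start the upgrade; nodes not online:", ", ".join(blocked))
```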


Use the following command to copy the upgrade package from the controller switch to all other switches in the fabric and start the upgrade process:

CLI (network-admin@switch) > fabric-upgrade-start packages <image> manual-reboot

After the upgrade completes on all nodes, run the fabric-upgrade-finish command to complete the upgrade process, and then reboot the switches manually.


As described earlier, the fabric-upgrade-start command defines the behavior of the entire upgrade process, so you must specify any optional settings with it (see the optional parameters below). The command also acquires a configuration lock from all the members of the fabric. No configuration changes are permitted during the upgrade process.


The optional parameters for the fabric-upgrade-start command include:

  • auto-finish – specify to start the software upgrade on the entire fabric. The default is no-auto-finish.
  • abort-on-failure – specify if you want the upgrade to stop if there is a failure during the process. The default is no-abort-on-failure.
  • manual-reboot – specify if you want to manually reboot individual switches after the upgrade process. If you specify no-manual-reboot, all switches reboot automatically after the upgrade is complete.
  • prepare – specify if you want to perform setup steps before the upgrade. This step copies the offline software package, extracts it, and prepares for the final upgrade process. After you begin the prepare process, you cannot add new switches to the fabric.


A sample upgrade process is explained below. Start the upgrade process by using the command:

CLI (network-admin@switch) > fabric-upgrade-start packages nvOS-6.1.0-6010018118-onvl.pkg auto-finish manual-reboot

Warning: This will start software upgrade on your entire fabric.

Please confirm y/n (Default: n):y

Scheduled background update.

Use:

* fabric-upgrade-status-show to check progress

* fabric-upgrade-finish to finalize when complete

* fabric-upgrade-abort to cancel cleanly

* switch-reboot on each switch in fabric to reboot manually when complete


Monitoring the Upgrade Process


The controller node monitors the progress of the upgrade on each node. Use the fabric-upgrade-status-show command to view the status of the upgrade for each node in the fabric:

CLI (network-admin@switch) > fabric-upgrade-status-show


The upgrade process has many interim steps. To monitor it continually, add the show-interval option (in seconds) to the fabric-upgrade-status-show command. For example:


CLI (network-admin@switch) > fabric-upgrade-status-show show-interval 5


log                                switch          state              cluster

---------------------------------- --------------- ------------------ ----------------------

(0:00:36)Agent needs restart       eq-colo-7       Agent restart wait aqr07-08(sec)

(0:00:34)Agent needs restart       tucana-colo-7   Agent restart wait spine-cl(sec)

(0:03:57)Extracting signed bundle. aquarius-test-1 Running            aquarius-test-1-2(sec)

(0:00:45)Agent needs restart       dorado-test-3   Agent restart wait dorado-test-2-3(sec)

(0:03:57)Extracting signed bundle. aqr08           Running            aqr07-08(pri)

(0:00:28)Agent needs restart       switch*         Agent restart wait spine-cl(pri)

(0:03:57)Extracting signed bundle. aquarius-test-2 Running            aquarius-test-1-2(pri)

(0:00:38)Agent needs restart       dorado-test-2   Agent restart wait dorado-test-2-3(pri)

(0:01:00)Agent needs restart       scorpius10      Agent restart wait none

(0:00:47)Agent needs restart       vnv-mini-1      Agent restart wait none

log                                switch          state              cluster

---------------------------------- --------------- ------------------ ----------------------

(0:00:36)Agent needs restart       eq-colo-7       Agent restart wait aqr07-08(sec)

(0:00:34)Agent needs restart       tucana-colo-7   Agent restart wait spine-cl(sec)

(0:04:02)Extracting packages.      aquarius-test-1 Running            aquarius-test-1-2(sec)

(0:00:45)Agent needs restart       dorado-test-3   Agent restart wait dorado-test-2-3(sec)

(0:04:02)Extracting signed bundle. aqr08           Running            aqr07-08(pri)

(0:00:28)Agent needs restart       switch*         Agent restart wait spine-cl(pri)

(0:04:02)Extracting packages.      aquarius-test-2 Running            aquarius-test-1-2(pri)

(0:00:38)Agent needs restart       dorado-test-2   Agent restart wait dorado-test-2-3(pri)

(0:01:00)Agent needs restart       scorpius10      Agent restart wait none

(0:00:47)Agent needs restart       vnv-mini-1      Agent restart wait none

.

.

log                                                          switch          state            cluster

------------------------------------------------------------ --------------- ---------------- ----------------------

(0:01:53)Waiting for completion processing                   eq-colo-7       Upgrade complete aqr07-08(sec)

(0:01:25)Waiting for completion processing                   tucana-colo-7   Upgrade complete spine-cl(sec)

(0:06:24)Waiting for completion processing                   aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)

(0:02:29)Waiting for completion processing                   dorado-test-3   Upgrade complete dorado-test-2-3(sec)

(0:06:43)Waiting for completion processing                   aqr08           Upgrade complete aqr07-08(pri)

(0:01:23)Waiting to reboot                                   switch*         Upgrade complete spine-cl(pri)

(0:06:16)Waiting for completion processing                   aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)

(0:02:19)Waiting for completion processing                   dorado-test-2   Upgrade complete dorado-test-2-3(pri)

(0:06:09)Waiting for completion processing                   scorpius10      Upgrade complete none

(0:08:09)Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010017911 vnv-mini-1      Running          none

.

.

log                                                switch          state            cluster

-------------------------------------------------- --------------- ---------------- ----------------------

(0:01:53)Current/Reboot BE: netvisor-16            eq-colo-7       Upgrade complete aqr07-08(sec)

(0:01:25)Waiting for completion processing         tucana-colo-7   Upgrade complete spine-cl(sec)

(0:06:24)Waiting for completion processing         aquarius-test-1 Upgrade complete aquarius-test-1-2(sec)

(0:02:29)Destroy BE: netvisor-45                   dorado-test-3   Upgrade complete dorado-test-2-3(sec)

(0:06:43)Waiting for completion processing         aqr08           Upgrade complete aqr07-08(pri)

(0:01:23)Waiting to reboot                         switch*         Upgrade complete spine-cl(pri)

(0:06:16)Current/Reboot BE: netvisor-10            aquarius-test-2 Upgrade complete aquarius-test-1-2(pri)

(0:02:19)Software upgrade done. Waiting for reboot dorado-test-2   Upgrade complete dorado-test-2-3(pri)

(0:06:09)Waiting for completion processing         scorpius10      Upgrade complete none

(0:13:17)Waiting for completion processing         vnv-mini-1      Upgrade complete none

log                                             switch          state            cluster

----------------------------------------------- --------------- ---------------- ----------------------

(0:01:53)Upgrade complete                       eq-colo-7       Reboot wait      aqr07-08(sec)

(0:01:25)Upgrade complete                       tucana-colo-7   Reboot wait      spine-cl(sec)

(0:06:24)Upgrade complete                       aquarius-test-1 Reboot wait      aquarius-test-1-2(sec)

(0:02:29)Upgrade complete                       dorado-test-3   Reboot wait      dorado-test-2-3(sec)

(0:06:43)Upgrade complete                       aqr08           Reboot wait      aqr07-08(pri)

(0:01:23)Sending Reboot wait message to handler switch*         Reboot wait      spine-cl(pri)

(0:06:16)Upgrade complete                       aquarius-test-2 Reboot wait      aquarius-test-1-2(pri)

(0:02:19)Upgrade complete                       dorado-test-2   Reboot wait      dorado-test-2-3(pri)

(0:06:09)Upgrade complete                       scorpius10      Reboot wait      none

(0:13:17)Waiting for completion processing      vnv-mini-1      Upgrade complete none

Connection to switch closed by remote host.

Connection to switch closed.


The first entry in the log is the elapsed time of the upgrade process. It does not include waiting time. The switch with the asterisk (*) is the upgrade controller node where the fabric-upgrade-start command was issued.
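
Because the log, state, and cluster columns can contain spaces, the status output is easier to process by column position than by splitting on whitespace. The following Python sketch is one possible way to do that, assuming the output keeps the fixed-width layout shown above, in which the dashed rule marks the column boundaries. The names parse_status, LOG_RE, ROW, and SAMPLE are hypothetical, and SAMPLE is rebuilt here with a format string that mirrors the column widths of the first table in the example.

```python
import re

# Elapsed time "(h:mm:ss)" followed by the log message.
LOG_RE = re.compile(r"^\((\d+):(\d{2}):(\d{2})\)\s*(.*)$")


def parse_status(text):
    """Parse fabric-upgrade-status-show output into a list of row dicts."""
    rows, spans = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) <= {"-", " "}:
            # Dashed rule: each run of dashes marks one column.
            spans = [m.span() for m in re.finditer(r"-+", line)]
            continue
        if spans is None or len(spans) != 4:
            continue
        cells = [line[start:end].strip() for start, end in spans[:3]]
        cells.append(line[spans[3][0]:].strip())  # last column runs to end of line
        match = LOG_RE.match(cells[0])
        if match is None:
            continue  # header row, "." separators, or other text
        hours, minutes, seconds, message = match.groups()
        rows.append({
            "elapsed_seconds": int(hours) * 3600 + int(minutes) * 60 + int(seconds),
            "message": message,
            "switch": cells[1].rstrip("*"),
            "controller": cells[1].endswith("*"),  # "*" marks the controller node
            "state": cells[2],
            "cluster": cells[3],
        })
    return rows


# Sample rows laid out with the same column widths as the output above.
ROW = "{:<34} {:<15} {:<18} {}"
SAMPLE = "\n".join([
    ROW.format("log", "switch", "state", "cluster"),
    ROW.format("-" * 34, "-" * 15, "-" * 18, "-" * 22),
    ROW.format("(0:00:36)Agent needs restart", "eq-colo-7",
               "Agent restart wait", "aqr07-08(sec)"),
    ROW.format("(0:03:57)Extracting signed bundle.", "aqr08",
               "Running", "aqr07-08(pri)"),
    ROW.format("(0:00:28)Agent needs restart", "switch*",
               "Agent restart wait", "spine-cl(pri)"),
])

for row in parse_status(SAMPLE):
    print(row["switch"], row["state"], row["elapsed_seconds"])
```

Rows that are not aligned with the dashed rule are not handled by this sketch, so output that has been reflowed by a terminal or an editor may need to be cleaned up first.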


The following table describes the messages that the fabric-upgrade-status-show command displays at each stage of a fabric-wide upgrade:

Table 2-1: Fabric Upgrade Status Description

Message                                 Description
--------------------------------------- ----------------------------------------------------------------------
Downloading package bundle              The upgrade package is downloaded from the controller node to all the other nodes.
Extracting initial bundle               After a successful download, the offline bundle is extracted.
Extracting signed bundle                The signature of the package is verified.
Extracting packages                     The packages are extracted and readied for installation.
Agent needs restart                     The nodes wait for the package to be extracted on all nodes of the fabric.
Upgrading nvOS *                        The switch upgrades NetVisor OS from the older version to the newer one.
Waiting for fabric-upgrade-finish/abort The upgrade is complete, and the switches wait for you to run fabric-upgrade-finish or fabric-upgrade-abort.


  • After the fabric upgrade process has copied the upgrade package to all switches and the upgrade is complete, run the fabric-upgrade-finish command to finish the upgrade or the fabric-upgrade-abort command to abort it.


CLI (network-admin@switch) > fabric-upgrade-finish 


Once the upgrade phase is complete, all switches display the Upgrade complete message in the state field. You can then reboot the fabric. Following is an example:


CLI (network-admin@switch) > fabric-upgrade-finish 


log                                              switch state            cluster

------------------------------------------------ ------ ---------------- ----------

(0:13:00)Waiting for fabric-upgrade-finish/abort sw2    Upgrade complete spine(sec)

(0:12:04)Waiting for fabric-upgrade-finish/abort sw1*   Upgrade complete spine(pri)

(0:16:49)Waiting for fabric-upgrade-finish/abort sw1    Upgrade complete none

(0:15:27)Waiting for fabric-upgrade-finish/abort sw2    Upgrade complete none


Finalizing upgrade. Manual reboot of nodes required.
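
Before fabric-upgrade-finish is run from a script, it can be useful to confirm that every node has reached the Upgrade complete state shown in the example above. The following Python sketch assumes that the status rows have already been extracted into dictionaries with switch and state keys. The function name ready_to_finish and the sample rows are hypothetical.

```python
def ready_to_finish(rows):
    """Return (ready, pending).

    ready is True only when there is at least one row and every row reports
    the state 'Upgrade complete'; pending lists the switches that do not.
    """
    pending = [row["switch"] for row in rows
               if row["state"] != "Upgrade complete"]
    return bool(rows) and not pending, pending


# Hypothetical rows taken from a fabric-upgrade-status-show snapshot.
snapshot = [
    {"switch": "eq-colo-7", "state": "Upgrade complete"},
    {"switch": "aqr08", "state": "Upgrade complete"},
    {"switch": "vnv-mini-1", "state": "Running"},
]
ready, pending = ready_to_finish(snapshot)
print("ready" if ready else "still waiting for: " + ", ".join(pending))
```

An empty snapshot is treated as not ready, so a failed or empty status query is not mistaken for a completed upgrade.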


  • Manual reboot: each switch in the fabric needs to be manually rebooted after the upgrade is completed. The fabric-upgrade-status-show command displays the status as Switch waiting to reboot. For example:


CLI (network-admin@switch) > fabric-upgrade-status-show

fabric-upgrade-status-show: Switch waiting to reboot


At this point, the upgrade is complete on all switches. Reboot the switches one at a time by using the following command:

CLI (network-admin@switch) > switch-reboot


Note: Reboot the controller switch last, after all the other switches have been rebooted.
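
When the reboots are scripted, the controller-last rule can be applied by ordering the switch list before any reboot is issued. The following Python sketch shows only the ordering. The function name reboot_order is hypothetical, and issuing switch-reboot on each switch and waiting for it to return is not shown.

```python
def reboot_order(switches, controller):
    """Return the switch names in reboot order, with the controller last."""
    if controller not in switches:
        raise ValueError("controller must be one of the fabric switches")
    others = [name for name in switches if name != controller]
    return others + [controller]


# Hypothetical fabric in which sw1 is the controller node.
for name in reboot_order(["sw1", "sw2", "sw3"], controller="sw1"):
    print("reboot", name)
```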


Note: All the nodes of the fabric should be running the same software version for the NetVisor OS features to work correctly.


  • If any issue occurs during the installation, you can roll back the upgrade process by using the fabric-upgrade-abort command. To abort the upgrade process and return the switches to their prior state (no reboot needed):


CLI (network-admin@switch) > fabric-upgrade-abort 


The fabric-upgrade-abort command aborts the fabric upgrade process. All changes to the switches are cleaned up and the server-switches do not reboot. The configuration lock on the fabric is also released. If you issue the fabric-upgrade-abort command during the upgrade process, it may take some time before the process stops, because the upgrade has to reach a logical completion point before the changes are rolled back on the fabric. This allows proper cleanup of the changes.



Warning: DO NOT use the switch-reboot command to reboot a switch while the upgrade is in progress.



Note: The fabric configuration is locked for the entire fabric-upgrade process, and you cannot change any configuration until the process ends.


Related Commands


Other related commands for fabric-upgrade include:

  • fabric-upgrade-prepare-cancel – cancels a fabric upgrade that was prepared earlier.
  • fabric-upgrade-prepare-resume – resumes a fabric upgrade that was prepared earlier.
  • fabric-upgrade-prepare-show – displays the status of prepared upgrades on the fabric nodes.


Review bootenv


A new boot environment is built during the upgrade process. Upon reboot, this new boot environment becomes active and the new software is up and running on the switch. Generally, you do not need to interact with the boot environments during the upgrade process. If a failure occurs during the upgrade process, you may need to review the boot environments by using the bootenv-show command.
