Managing RMAs for Switches

RMA Use Case

Informational Note:  This process applies to Version 2.5.4 and earlier.

A primary use case for an RMA is a failed switch in the network. The configuration can be restored to a replacement switch using the commands in the following procedure:

RMA Process

This procedure assumes that the failed switch is part of an HA pair (cluster). Nodes that are part of a cluster automatically back up each other's configuration.

For an RMA case, the host ID of the new switch differs from that of the old failed switch. Both cluster membership and service object locations are tied to the host ID.

1. Retrieve the host ID of the old node:

CLI> fabric-node-show name <old-hostname> format name,id

2. Evict the old node from the fabric. This allows fabric provisioning operations to proceed before the RMA is complete. Additionally, the presence of the old node ID interferes with subsequent steps.

CLI> fabric-node-evict name <old-hostname>

3. Set up the new switch with basic settings, such as hostname and IP address.

Perform this step at the console when the switch is booted for the first time. The settings can be modified using:

CLI> switch-setup-modify

4. Configure the new switch to rejoin the fabric. As it is part of a cluster, use the repeer-to-cluster-node option.

CLI> fabric-join name <fabric-name> repeer-to-cluster-node <existing-peer-name>

This downloads the entire backed-up configuration from the cluster peer and restarts Netvisor OS to apply it. This restores the local-, cluster-, and fabric-scoped configuration.

5. After the restart, any service objects that were present on the failed switch must be migrated to the new host. Use the value retrieved in Step 1 for the location parameter:

CLI> object-location-modify location <old-hostid> new-location <new-hostname>

The above command executes a bulk migration of all service objects (vRouters, VNET managers, OVSDB interfaces) and their sub-objects.
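The effect of the bulk migration in Step 5 can be pictured with a small Python model. This is only an illustrative sketch, not how Netvisor OS implements the command: the ServiceObject class, the migrate_location function, and the host ID strings are all hypothetical, and the sketch uses a host ID for the new location where the CLI takes the new hostname.

```python
from dataclasses import dataclass


@dataclass
class ServiceObject:
    name: str       # hypothetical object name
    kind: str       # for example "vrouter" or "vnet-manager"
    location: str   # host ID of the switch that owns the object


def migrate_location(objects, old_host_id, new_host_id):
    """Re-point every object located on old_host_id to new_host_id.

    Returns the number of objects that were moved.
    """
    moved = 0
    for obj in objects:
        if obj.location == old_host_id:
            obj.location = new_host_id
            moved += 1
    return moved


# Made-up host IDs, for illustration only.
OLD_ID, PEER_ID, NEW_ID = "host-old", "host-peer", "host-new"

objects = [
    ServiceObject("vrouter-a", "vrouter", OLD_ID),
    ServiceObject("vnet-mgr-a", "vnet-manager", OLD_ID),
    ServiceObject("vrouter-b", "vrouter", PEER_ID),
]

moved = migrate_location(objects, OLD_ID, NEW_ID)
print(moved, [o.location for o in objects])
```

Only the objects that referenced the old host ID change; objects owned by the cluster peer are left alone, which is why the value from Step 1 must be the host ID of the failed switch.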

RMA Process for Version 2.6.0 and Later

Netvisor OS fabric objects such as vRouters, VLAGs, clusters, and others are created on a switch in the fabric. Netvisor OS tracks the switch using a location field, which is currently the host ID of the switch where the fabric objects are configured.

This presents various issues when replacing a faulty switch with a new switch that has a new host ID. Fabric-wide configurations that reference the old host ID require updating to the new host ID. These updates require a few extra manual steps, which can be confusing, and it is not always clear which commands need to be executed.

The solution changes the location from a host ID to a fabric-specific location ID that is assigned to each switch as it joins the fabric. Netvisor OS keeps the same location ID during the RMA process, which reduces the RMA process to a single command.

The solution introduces a new parameter, location-id, which is unique among the fabric nodes. Each node is assigned a new location ID when it joins the fabric. All configurations that require a location are tied to the location ID instead of the host ID. When Netvisor OS executes the switch-config-import command, the location ID is inherited from the imported configuration. Therefore, no updates are required across the fabric because all configurations already refer to the correct location ID.
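The benefit of the extra level of indirection can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Netvisor OS implementation: the FabricModel class, its method names, the sequential numbering of location IDs, and the host ID strings are all hypothetical.

```python
class FabricModel:
    """Toy model: objects reference a location ID, not a host ID."""

    def __init__(self):
        self._next_location_id = 1
        self.location_of_host = {}  # host ID -> location ID
        self.object_location = {}   # object name -> location ID

    def join(self, host_id):
        """Assign a fresh location ID to a switch joining the fabric."""
        location_id = self._next_location_id
        self._next_location_id += 1
        self.location_of_host[host_id] = location_id
        return location_id

    def create_object(self, name, host_id):
        """Record an object against the location ID of its switch."""
        self.object_location[name] = self.location_of_host[host_id]

    def replace_switch(self, old_host_id, new_host_id):
        """The replacement switch inherits the old location ID."""
        location_id = self.location_of_host.pop(old_host_id)
        self.location_of_host[new_host_id] = location_id
        return location_id


fabric = FabricModel()
fabric.join("host-old")
fabric.join("host-peer")
fabric.create_object("vrouter-a", "host-old")

before = dict(fabric.object_location)
inherited = fabric.replace_switch("host-old", "host-new")
print(inherited, fabric.object_location == before)
```

Replacing the switch touches only the host ID to location ID table. The object records are never rewritten, which corresponds to the claim that no fabric-wide updates are needed.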

The following commands are no longer necessary to restore an imported configuration on a new switch:

A new parameter, location-id, is added to the output of the node-info and fabric-node-show commands. This parameter displays the location of the node.

A new command, fabric-node-location-mappings, displays the current host ID to location ID mappings for the fabric. This output is used as input for the switch-config-import command when importing configurations from earlier versions of software.

If you are importing a configuration from an earlier version of software, use the following syntax:

(CLI network-admin@Spine1)> switch-config-import upgrade-location-mappings

If the imported configuration already has location IDs, the parameter is ignored.
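The import behavior described above can be modeled as a simple translation step. The following Python sketch is an assumption-laden illustration only: the configuration layout, the uses_location_ids flag, and the upgrade_locations function are hypothetical and do not reflect the actual file format or internals of switch-config-import.

```python
def upgrade_locations(config, host_to_location):
    """Return a copy of config whose objects use location IDs.

    config: {"uses_location_ids": bool, "objects": {name: location}}
    host_to_location: {host ID: location ID}
    """
    if config["uses_location_ids"]:
        # Already uses location IDs: the supplied mappings are ignored.
        return {"uses_location_ids": True, "objects": dict(config["objects"])}

    # Older configuration: translate each host ID into its location ID.
    upgraded = {
        name: host_to_location[host_id]
        for name, host_id in config["objects"].items()
    }
    return {"uses_location_ids": True, "objects": upgraded}


# Made-up mappings and configurations, for illustration only.
mappings = {"host-old": 1, "host-peer": 2}
legacy = {"uses_location_ids": False, "objects": {"vrouter-a": "host-old"}}
current = {"uses_location_ids": True, "objects": {"vrouter-a": 1}}

upgraded_legacy = upgrade_locations(legacy, mappings)
upgraded_current = upgrade_locations(current, {})
print(upgraded_legacy["objects"], upgraded_current["objects"])
```

In this model, a configuration from an earlier version needs the mappings to resolve each host ID, while a configuration that already carries location IDs passes through unchanged.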