Rolling Back and Rolling Forward Transactions

NetVisor OS maintains a log file that lists the transactions together with their respective undo commands, so that it can revert to a previous state when necessary, that is, roll back one or more transactions starting from the latest one. Conversely, the list of executed commands can be used to redo certain transactions, in other words, to roll forward one or more transactions.
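The undo/redo idea can be illustrated with a minimal Python sketch. It is not NetVisor OS code: the class names, method names, and command strings are hypothetical, and the sketch assumes that each log entry stores one redo command and one undo command, and that a node rolling forward copies the missing entries from a peer log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    tid: int    # transaction ID
    redo: str   # command that applies the change (illustrative string)
    undo: str   # command that reverts the change (illustrative string)


@dataclass
class TransactionLog:
    entries: list = field(default_factory=list)  # ordered by ascending tid

    @property
    def last_tid(self):
        """TID of the latest logged transaction, or 0 for an empty log."""
        return self.entries[-1].tid if self.entries else 0

    def record(self, redo, undo):
        """Log an executed command together with its undo command."""
        txn = Transaction(self.last_tid + 1, redo, undo)
        self.entries.append(txn)
        return txn.tid

    def rollback_to(self, tid):
        """Remove every transaction newer than tid, latest first.

        Returns the undo commands in the order they would be run.
        """
        undone = []
        while self.entries and self.entries[-1].tid > tid:
            undone.append(self.entries.pop().undo)
        return undone

    def rollforward_to(self, tid, peer):
        """Copy the transactions missing locally (up to tid) from a peer log.

        Returns the redo commands in the order they would be run.
        """
        missing = [t for t in peer.entries if self.last_tid < t.tid <= tid]
        self.entries.extend(missing)
        return [t.redo for t in missing]
```

In this model a rolled-back transaction disappears from the local log and can only be restored from a peer that still holds it.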

However, a manual roll back or roll forward is desirable only under special circumstances, because by default the auto-recover feature ensures that all nodes are synchronized to the latest transaction.

For example, in the rare cases in which transactions diverge on different nodes (despite auto-recover), a roll back or roll forward action can be performed manually with the corresponding command.

In that case, the auto-recover function may need to be temporarily disabled on the affected node(s) to permit the desired action.

The transaction-rollback-to command is used to roll back to an earlier fabric transaction number. The transaction-rollforward-to command is instead used to roll forward to a subsequent fabric transaction number.

For instance, suppose the fabric-node-show command output shows that the fabric state has accidentally gone out of sync, with, say, an interface addition transaction missing on one node:

CLI (network-admin@pnswitch1) > fabric-node-show format name,fab-name,fab-tid,state,device-state, 

name      fab-name fab-tid state  device-state
--------- -------- ------- ------ ------------
pnswitch2 pnfabric 1       online ok
pnswitch1 pnfabric 2       online ok

The state can then be rolled back to a previously synchronized transaction ID to restore fabric-wide (scope fabric) consistency:

CLI (network-admin@pnswitch1) > transaction-rollback-to scope fabric tid 1

Warning: rolled back transactions are unrecoverable unless another fabric node has them. Proceed? [y/n] y 

After a successful roll back (that is, when no error message is printed on the console), the change is complete and the transaction is removed from the transaction log.

Alternatively, the state can be rolled forward to retry the previously failed fabric-wide interface addition:

CLI (network-admin@pnswitch1) > transaction-rollforward-to scope fabric tid 2

Added interface eth2.13

After a successful roll forward (that is, when no error message is printed on the console), the change is complete and the transaction log is updated.

If multiple nodes go out of sync, you must recover each node separately.
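To identify which nodes need recovery, the fab-tid values reported by fabric-node-show can be compared across nodes. The following Python sketch is not a NetVisor OS tool: the function names are hypothetical, and it assumes that the output has been captured as plain text with whitespace-separated columns and no spaces inside values.

```python
def parse_fabric_nodes(output):
    """Parse whitespace-separated fabric-node-show text into a list of row dicts."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= {"-", " "}:  # skip the dashed separator row
            continue
        rows.append(dict(zip(header, ln.split())))
    return rows


def lagging_nodes(rows):
    """Return (name, fab-tid) for every node whose fab-tid is below the highest one."""
    highest = max(int(r["fab-tid"]) for r in rows)
    return [(r["name"], int(r["fab-tid"])) for r in rows if int(r["fab-tid"]) < highest]


# Sample text copied from the fabric-node-show example in this section.
SAMPLE = """\
name      fab-name   fab-tid  state  device-state
--------- ---------- -------  ------ ------------
pnswitch2  pnfabric  1        online  ok
pnswitch1  pnfabric  2        online  ok
"""

print(lagging_nodes(parse_fabric_nodes(SAMPLE)))  # [('pnswitch2', 1)]
```

Each node in the returned list would then be recovered individually.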

Another approach (usually reserved for customer support in special cases) is to force a roll back or roll forward action with the ignore-error option when the configuration is in sync but the transaction ID fails to update:

CLI (network-admin@pnswitch1) > transaction-rollforward-to scope fabric tid 2 ignore-error

Added interface eth2.13

When a node is out of sync with the other nodes in the fabric, it can catch up by rolling forward all the missing transactions, which it can obtain from another fabric node. If auto-recovery is enabled, this is done without user intervention. If auto-recovery is disabled, the transaction-rollforward-to command can be invoked. 

Starting with NetVisor OS release 7.0.0, a round-robin enhancement to fabric synchronization helps deal with connection failures. The software first attempts to contact the best possible node, that is, the one with the highest transaction ID (TID) among all the peer nodes in the fabric (unless a remote-node is explicitly specified in the CLI). If that attempt fails, instead of retrying with the same node, NetVisor OS skips every node with which a previous synchronization attempt has failed.

In other words, when auto-recovery is enabled, failed nodes are skipped in every resynchronization attempt until all nodes have been tried (offline nodes are skipped as well). Similarly, when auto-recovery is disabled and a manual roll forward attempt fails with a node, any failed nodes are skipped in the next iteration. Once a synchronization attempt has failed with all the nodes in the fabric, the software starts over from the first node in round-robin fashion.
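The selection logic described above can be sketched in Python as follows. This is only an illustration of the described behavior, not the NetVisor OS implementation: the function names, the peers data structure, and the try_sync callback are hypothetical, and the sketch assumes that "restarting from the first node" means starting again with the highest-TID online peer.

```python
def pick_sync_peer(peers, failed, remote_node=None):
    """Choose the next peer to synchronize from, or None if no peer is online.

    peers:  dict mapping node name -> {"tid": int, "online": bool}
    failed: set of node names that already failed in the current round;
            it is cleared when every online peer has failed
    """
    if remote_node is not None:
        return remote_node  # an explicitly specified node overrides the selection
    online = [name for name, info in peers.items() if info["online"]]
    if not online:
        return None
    candidates = [name for name in online if name not in failed]
    if not candidates:
        failed.clear()       # all peers failed: start a new round
        candidates = online
    # Prefer the peer with the highest transaction ID.
    return max(candidates, key=lambda name: peers[name]["tid"])


def sync(peers, try_sync, max_attempts=10):
    """Try peers until one succeeds; return its name, or None on giving up.

    try_sync is a hypothetical callback that returns True on success.
    """
    failed = set()
    for _ in range(max_attempts):
        peer = pick_sync_peer(peers, failed)
        if peer is None:
            return None
        if try_sync(peer):
            return peer
        failed.add(peer)
    return None
```

With three peers of which the highest-TID one is offline, the sketch tries the remaining peers in descending TID order and only returns to an already failed peer after all online peers have been tried.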

Release 7.0.0 also adds a new column to the fabric-node-show command output that displays the roll-forward status of the fabric peer node(s):

CLI (network-admin@switch) > fabric-node-show format rollforward-failed,

In the rare cases in which you apply a configuration to the fabric and a node does not respond to it, you may want, as a last resort, to evict the node from the fabric in order to troubleshoot the problem on that specific device.

To evict a node, the node must be offline; otherwise, the eviction command fails. You can then use the fabric-node-evict command to evict the node, identifying it either by name or by ID:

CLI (network-admin@switch) > fabric-node-evict name pnswitch2

CLI (network-admin@switch) > fabric-node-evict id b000021:52a1b620