Understanding Fabric Transactions


The Adaptive Cloud Fabric uses transactions to synchronize configuration changes across the nodes of the fabric. Netvisor ONE records transactions as atomic operations that must either succeed and persist or fail and roll back across the entire fabric. Transactions cannot be partially completed.


The fabric and Netvisor ONE adhere to the four standard transaction requirements: atomicity, consistency, isolation, and durability (collectively known as ACID).


Netvisor ONE does not require a fixed master node to coordinate all transactions across the fabric. Transactions start from the node where a command is run. This node, called the originator node or simply the originator, coordinates the transaction with all the other fabric nodes.


Netvisor ONE commands originate from clients such as a CLI user, a RESTful API user or an external orchestration system. The commands are executed on a chosen switch, which becomes the originator node.


The originator first applies the configuration change specified by the command on the local node. If that fails, Netvisor ONE rolls back any partial changes and returns a failure message to the user. Only after the local change succeeds does the originator start the transaction. Netvisor ONE then atomically sends the configuration change (for example, a create, delete, modify, add, or remove command) to the other fabric nodes.
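
The ordering described above (local change first, then an all-or-nothing propagation to the other nodes) can be illustrated with a minimal Python sketch. It is not Netvisor ONE code: the Node class, its apply and rollback methods, and the run_transaction function are hypothetical names invented for this illustration, and the real fabric protocol is not modeled.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key was not set before the change"


@dataclass
class Node:
    """Hypothetical stand-in for a fabric node (not a Netvisor ONE API)."""
    name: str
    config: dict = field(default_factory=dict)
    fail_on: set = field(default_factory=set)  # keys this node rejects, for the demo

    def apply(self, key, value):
        """Apply a change and return the previous value so it can be undone."""
        if key in self.fail_on:
            raise RuntimeError(f"{self.name}: cannot apply {key}")
        previous = self.config.get(key, _MISSING)
        self.config[key] = value
        return previous

    def rollback(self, key, previous):
        """Restore the value that was in place before apply()."""
        if previous is _MISSING:
            self.config.pop(key, None)
        else:
            self.config[key] = previous


def run_transaction(originator, peers, key, value):
    """Apply a change on the originator, then on every peer, all or nothing."""
    # Step 1: local change. A failure here ends the command before
    # any transaction is started with the other nodes.
    try:
        local_previous = originator.apply(key, value)
    except RuntimeError as err:
        return f"failed locally: {err}"

    # Step 2: propagate to the peers, remembering how to undo each change.
    applied = [(originator, local_previous)]
    try:
        for peer in peers:
            applied.append((peer, peer.apply(key, value)))
    except RuntimeError as err:
        # Step 3: undo everything so that no node keeps a partial result.
        for node, previous in reversed(applied):
            node.rollback(key, previous)
        return f"rolled back: {err}"
    return "committed"
```

In this sketch a peer failure undoes the change on every node that already applied it, including the originator, which mirrors the rule that a transaction cannot be partially completed.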


Netvisor ONE transmits fabric transactions over a dedicated TCP socket. The socket is not retained: Netvisor ONE closes it after each phase of the transaction. Transactions are encrypted using the TLS protocol.


Netvisor ONE logs all transactions to a per-scope log file in this location: /var/nvOS/etc/<scope>, where <scope> is Local, Cluster, or Fabric.


Scope defines the set of nodes participating in a transaction:


  • Local — only the local node participates in the transaction.
  • Cluster — only two redundant nodes participate in the transaction.
  • Fabric — all nodes in the same fabric instance participate in the transaction.
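
The three scopes listed above can be summarized as a small Python sketch that returns the participants of a transaction. The Scope enum and the participants function are hypothetical names used only for illustration; they are not part of Netvisor ONE.

```python
from enum import Enum


class Scope(Enum):
    LOCAL = "local"
    CLUSTER = "cluster"
    FABRIC = "fabric"


def participants(scope, local_node, cluster_peer, fabric_nodes):
    """Return the names of the nodes taking part in a transaction of this scope."""
    if scope is Scope.LOCAL:
        # Only the node where the command runs.
        return [local_node]
    if scope is Scope.CLUSTER:
        # The local node plus its redundant cluster peer.
        if cluster_peer is None:
            raise ValueError("cluster scope requires a cluster peer")
        return [local_node, cluster_peer]
    if scope is Scope.FABRIC:
        # Every node in the same fabric instance, including the local one.
        return sorted(set(fabric_nodes) | {local_node})
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, with a local node pnswitch1 clustered with pnswitch2 in a three-node fabric, the cluster scope yields two participants and the fabric scope yields all three.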


For several commands, you can specify the scope of the intended action and therefore the scope of the ensuing transaction.


If a failure occurs on the fabric, transactions on certain nodes in the fabric can become out of sync. Once transactions are out of sync, no further transactions can be executed across the local, cluster, or fabric scope.


You can verify the fabric node states with the fabric-node-show command and check that the fab-tid values match.


CLI (network-admin@switch) > fabric-node-show format name,fab-name,fab-tid,state,device-state

name      fab-name   fab-tid state  device-state
--------- ---------- ------- ------ ------------
pnswitch2 pnfabric   2       online ok
pnswitch1 pnfabric   2       online ok


The state column represents the communication status between members of the fabric and the device-state column represents the overall health of each switch. Also note that the fabric transaction ID (2 in this example) is consistent across all members of the fabric.
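
If the output of fabric-node-show is captured as text, the fab-tid comparison can be automated. The Python sketch below is one possible approach, not a Netvisor ONE tool: parse_fabric_node_show and mismatched_tids are hypothetical helper names, and the parser assumes whitespace-separated columns whose values contain no spaces, as in the sample output above.

```python
SAMPLE = """
name      fab-name   fab-tid state  device-state
--------- ---------- ------- ------ ------------
pnswitch2 pnfabric   2       online ok
pnswitch1 pnfabric   2       online ok
"""


def parse_fabric_node_show(output):
    """Turn the tabular text into a list of dicts keyed by column name."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= {"-", " "}:
            continue  # skip the dashed separator row
        rows.append(dict(zip(header, ln.split())))
    return rows


def mismatched_tids(rows):
    """Return the fabrics whose nodes report more than one fab-tid value.

    The result maps fabric name -> {fab-tid: [node names]}; an empty dict
    means every fabric in the output is consistent.
    """
    by_fabric = {}
    for row in rows:
        tids = by_fabric.setdefault(row["fab-name"], {})
        tids.setdefault(row["fab-tid"], []).append(row["name"])
    return {fab: tids for fab, tids in by_fabric.items() if len(tids) > 1}
```

Calling mismatched_tids(parse_fabric_node_show(SAMPLE)) returns an empty dictionary for the sample, because both nodes report the same transaction ID.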


Mismatched fab-tid values for one or more nodes within the same fabric instance represent a corner case that the fabric control logic typically prevents. For more details, see the Rolling Back and Rolling Forward Transactions section.