Linux Disk (BTRFS) Mirroring on NRU02, NRU03 and NRU-S0301 Platforms 



Disk mirroring, also known as RAID1 (RAID stands for Redundant Array of Independent Disks), is the real-time replication of logical disk volumes onto separate disks to ensure continuous availability of data. A mirrored volume is a complete logical representation of separate volume copies.


In NetVisor OS, the disk mirroring functionality is available on the following platforms:


  • NRU03 and NRU-S0301 platforms (NetVisor OS version 6.1.0 onward)
  • NRU02 (NetVisor OS version 7.0.1 onward)


On the above platforms, the BTRFS partition is mirrored using BTRFS RAID1 support and the swap partition is mirrored using MDADM RAID support, thereby enabling high availability (HA).


When you upgrade the software from an earlier release to NetVisor OS version 6.1.0 on NRU03 and NRU-S0301 platforms, or to version 7.0.1 on NRU02 platforms, the disk mirroring capability is enabled automatically. Because mirroring is set up as part of the software upgrade, you do not have to schedule a separate maintenance window for it.


Note

  • Disk mirroring is enabled by default as part of the upgrade process on NRU03 and NRU-S0301 platforms (in version 6.1.0) and on NRU02 platforms (in version 7.0.1).
  • The upgrade takes longer on these platforms because of the mirror creation process. Subsequent upgrades do not require the additional time.


It is recommended that you disable the FS Monitor service (FS_MON) when the disk mirror capability is enabled on NRU02, NRU03, and NRU-S0301 nodes that were running the FS_MON service. FS_MON takes a recovery action that isolates the node during an SSD failure. With disk mirroring, isolating the node is no longer necessary, provided that the switch is replaced after the initial disk failure.

 

To get the current status of the disk, use the command:

 

root@nru03-sff-tm-1:~# /opt/nvOS/bin/pn-scripts/fs_mon_status.sh

Read/Write


To disable the FS Monitor service from SHELL, use the command:

 

root@nru03-sff-tm-1:~# /opt/nvOS/bin/pn-scripts/fs_mon_disable.sh

Removed symlink /etc/systemd/system/multi-user.target.wants/svc-fs_mon.service.


As mentioned earlier, disk mirroring is enabled by default during the upgrade process. However, you can also toggle (enable or disable) disk mirroring manually by using the switch-setup-modify command on the above platforms. In this case, you must schedule a separate maintenance window for disk mirroring.



Note: When you toggle the disk mirroring functionality, the switch automatically reboots if the mirroring action is successful. 


To toggle disk mirroring, use the switch-setup-modify command as shown in the following example:


CLI (network-admin@nru-sff) > switch-setup-modify enable-disk-mirror

OR

CLI (network-admin@nru-sff) > switch-setup-modify disable-disk-mirror


Below is a sample output displayed during the software upgrade to NetVisor OS version 6.1.0:


CLI (network-admin@nru-sff) > software-upgrade package nvOS-6.1.0-6010018137-onvl.pkg 

Scheduled background update. Use software-upgrade-status-show to check. Switch will reboot itself. DO NOT reboot manually.

--------------------------------------------------------------------

 [Apr11.07:48:12] Starting software upgrade ...

 [Apr11.07:48:13] Checking available disk space...

 [Apr11.07:48:13] Avbl free space: 200.29G, Required: 1.28G

 [Apr11.07:48:13] Unpacking local package bundle...

 [Apr11.07:48:13] Extracting initial bundle.

 [Apr11.07:48:29] Decrypting signed bundle.

 [Apr11.07:48:30] Extracting signed bundle.

 [Apr11.07:48:46] Extracting packages.

 [Apr11.07:49:03] Fetching repository metadata.

 [Apr11.07:49:04] Skipping dpkg update in current boot image

 [Apr11.07:49:05] Computing package update requirements.

 [Apr11.07:49:05] Upgrade agent version: 6.0.1-6000116966

 [Apr11.07:49:05] Upgrading software upgrade framework

 [Apr11.07:49:09] Fetching repository metadata.

 [Apr11.07:49:09] Skipping dpkg update in current boot image

 [Apr11.07:49:09] Computing package update requirements.

 [Apr11.07:49:10] Upgrade agent version: 6.1.0-6010018137

 [Apr11.07:49:10] Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010018137

 [Apr11.07:49:56] Mirroring ONVL disk.

 [Apr11.07:52:20] Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010018137

 [Apr11.07:52:20] Software upgrade completed. Rebooting.

 Shared connection to nru-sff closed.
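
The extra upgrade time attributable to mirroring can be estimated from the timestamps in the output above. The following is a minimal Python sketch, not a NetVisor OS tool: the function names are hypothetical, the year 2000 is a placeholder (the log lines carry no year), and the script assumes that all entries fall on the same day.

```python
from datetime import datetime

# Lines copied from the sample upgrade output above.
SAMPLE_LOG = """\
 [Apr11.07:49:10] Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010018137
 [Apr11.07:49:56] Mirroring ONVL disk.
 [Apr11.07:52:20] Upgrading nvOS 6.0.1-6000116966 -> 6.1.0-6010018137
 [Apr11.07:52:20] Software upgrade completed. Rebooting.
"""


def parse_line(line):
    """Return (timestamp, message) for a '[MonDD.HH:MM:SS] text' line, else None."""
    line = line.strip()
    if not line.startswith("[") or "]" not in line:
        return None
    stamp, message = line[1:].split("]", 1)
    try:
        # Placeholder year: only same-day differences are computed.
        when = datetime.strptime("2000 " + stamp, "%Y %b%d.%H:%M:%S")
    except ValueError:
        return None
    return when, message.strip()


def step_durations(log_text):
    """List (message, seconds until the next entry) for each entry but the last."""
    entries = [e for e in map(parse_line, log_text.splitlines()) if e]
    return [(msg, (nxt[0] - when).total_seconds())
            for (when, msg), nxt in zip(entries, entries[1:])]


def mirror_seconds(log_text):
    """Seconds between the 'Mirroring' entry and the entry that follows it."""
    for msg, seconds in step_durations(log_text):
        if msg.startswith("Mirroring"):
            return seconds
    return None


print(mirror_seconds(SAMPLE_LOG))  # 144.0
```

In this sample, the mirroring step accounts for 144 seconds (2 minutes 24 seconds) of the upgrade.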


The following is a sample of enabling and disabling disk mirroring on an NRU-S0301 platform:


CLI (network-admin@nru-sff) > switch-setup-modify enable-disk-mirror

Warning: This will change disk mirroring state and reboot automatically.

Please confirm y/n (Default: n):y

Successfully enabled mirroring. Rebooting.


To verify the switch setup details, use the command:


CLI (network-admin@nru-sff) > switch-setup-show

switch-name:               nru-sff

mgmt-ip:                   10.13.6.48/23

mgmt-ip-assignment:        static

mgmt-ip6:                  fe80::9ac5:dbff:fe43:e3fa/64

mgmt-ip6-assignment:       autoconf

mgmt-link-state:           up

mgmt-link-speed:           1g

in-band-ip:                192.168.6.48/24

in-band-ip6:               fe80::640e:94ff:feff:9e66/64

in-band-ip6-assign:        autoconf

gateway-ip:                10.13.6.1

dns-ip:                    10.135.2.13

dns-secondary-ip:          10.20.4.1

domain-name:               pluribusnetworks.com

ntp-secondary-server:      0.ubuntu.pool.ntp.org

ntp-tertiary-server:       1.ubuntu.pool.ntp.org

timezone:                  America/Los_Angeles

date:                      2021-11-04,18:01:44

hostid:                    150998527

location-id:               2

enable-host-ports:         yes

banner:                    * Welcome to Arista Networks Inc. Netvisor(R). This is a monitored system.   *

mgmt-lag:                  disable

mgmt-lacp-mode:            off

device-id:                 R1279-F0001-01XXXXXXXXXX

ntp:                       on

disk-mirror:               enabled

banner:                    *ACCESS RESTRICTED TO AUTHORIZED USERS ONLY

    *

banner:                    * By using the Netvisor(R) CLI,you agree to the terms of the Arista Networks *

banner:                    * End User License Agreement (EULA). The EULA can be accessed via*

banner:                    * http://www.arista.com/eula or by using the command "eula-show" *


To disable disk mirroring, use the command:


CLI (network-admin@nru-sff) > switch-setup-modify disable-disk-mirror

Warning: This will change disk mirroring state and reboot automatically.

Please confirm y/n (Default: n):y

Successfully disabled mirroring. Rebooting.


Note: After disk mirroring is disabled using the above CLI command, subsequent software upgrades do not re-enable disk mirroring on that switch.


To verify the change:


CLI (network-admin@nru-sff) > switch-setup-show

switch-name:               nru-sff

mgmt-ip:                   10.13.6.48/23

mgmt-ip-assignment:        static

mgmt-ip6:                  fe80::9ac5:dbff:fe43:e3fa/64

mgmt-ip6-assignment:       autoconf

mgmt-link-state:           up

mgmt-link-speed:           1g

in-band-ip:                192.168.6.48/24

in-band-ip6:               fe80::640e:94ff:feff:9e66/64

in-band-ip6-assign:        autoconf

gateway-ip:                10.13.6.1

dns-ip:                    10.135.2.13

dns-secondary-ip:          10.20.4.1

domain-name:               pluribusnetworks.com

ntp-secondary-server:      0.ubuntu.pool.ntp.org

ntp-tertiary-server:       1.ubuntu.pool.ntp.org

timezone:                  America/Los_Angeles

date:                      2021-11-04,18:40:03

hostid:                    150998527

location-id:               2

enable-host-ports:         yes

banner:                    * Welcome to Arista Networks Inc. Netvisor(R). This is a monitored system.   *

mgmt-lag:                  disable

mgmt-lacp-mode:            off

device-id:                 R1279-F0001-01XXXXXXXXXX

ntp:                       on

disk-mirror:               disabled

banner:                    *ACCESS RESTRICTED TO AUTHORIZED USERS ONLY

    *

banner:                    * By using the Netvisor(R) CLI,you agree to the terms of the Arista Networks *

banner:                    * End User License Agreement (EULA). The EULA can be accessed via*

banner:                    * http://www.arista.com/eula or by using the command "eula-show" *
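
When the switch-setup-show output is captured as text (for example, in an automation script), the disk-mirror field can be extracted with a few lines of Python. This is a minimal sketch with hypothetical function names. It assumes only the "key: value" layout shown in the sample output above and does not use any NetVisor OS API.

```python
def parse_switch_setup(output):
    """Collect 'key: value' pairs; repeated keys such as 'banner' accumulate in a list."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so IPv6 addresses and dates stay intact.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # prompt lines and wrapped banner fragments have no key
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def disk_mirror_state(output):
    """Return the value of the 'disk-mirror' field, or None if it is absent."""
    values = parse_switch_setup(output).get("disk-mirror")
    return values[-1] if values else None


# A few lines copied from the sample output above.
SAMPLE = """\
CLI (network-admin@nru-sff) > switch-setup-show
switch-name:               nru-sff
mgmt-ip6:                  fe80::9ac5:dbff:fe43:e3fa/64
date:                      2021-11-04,18:40:03
disk-mirror:               disabled
"""

print(disk_mirror_state(SAMPLE))  # disabled
```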


Disk mirroring configuration changes are logged to the disk mirror log file, which is available at: /var/nvOS/log/disk_mirror.log


Additionally, an alert that helps network administrators take the required action is generated and written to the console, the event log, and syslog when:


  • One of the disks goes missing in the BTRFS RAID1 configuration
  • BTRFS device statistics report non-zero I/O errors


For example, the following sample output appears in each location when a disk goes missing:


  • From the output of log-event-show command:


CLI (network-admin@nru-sff) > log-event-show

 

  • From the /var/log/syslog file:


Feb 22 20:52:19 nru-sff root: BTRFS error: missing device in BTRFS mirror


  • From the console:


nru-sff login: BTRFS error: missing device in BTRFS mirror

root@nru-sff:~#BTRFS error: missing device in BTRFS mirror
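
The missing-device alert can also be picked out of collected log text programmatically. The following is a minimal Python sketch with a hypothetical function name. It matches only the message string shown in the samples above. The text of the I/O error alert is not shown in this section, so it is not covered here.

```python
# Message text copied from the syslog and console samples above.
MISSING_DEVICE = "BTRFS error: missing device in BTRFS mirror"


def find_missing_device_alerts(lines):
    """Return every line that contains the missing-device message."""
    return [line.rstrip("\n") for line in lines if MISSING_DEVICE in line]


sample_lines = [
    "Feb 22 20:52:19 nru-sff root: BTRFS error: missing device in BTRFS mirror\n",
    "Feb 22 20:52:20 nru-sff root: unrelated message\n",  # made-up line for contrast
]

for alert in find_missing_device_alerts(sample_lines):
    print(alert)
```

To scan a live system, the same function can be given an open file handle for /var/log/syslog, since it accepts any iterable of lines.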


General Guidelines


  • On NRU02 platforms (where disk mirroring is available from NetVisor OS version 7.0.1 onward), the Solaris disk is overwritten and used for mirroring the BTRFS partition. As a result, you cannot boot to the Solaris BE after the mirroring operation. Note that disabling the mirror functionality using the CLI command does not restore the Solaris partitions.
  • On NRU02, NRU03, and NRU-S0301 platforms, to skip disk mirroring during the upgrade to NetVisor OS version 7.0.1, the file /etc/do_not_mirror_disk must be present. You can create it by entering "touch /etc/do_not_mirror_disk" at the Linux prompt.
  • Both disks on the switch should be identical in size for disk mirroring to be successful. If the disks are not identical, then the mirroring process aborts.
  • If one of the disks goes offline due to hardware issues, the disk is expected to stay offline. BTRFS may not work correctly if a disk goes offline intermittently, and it treats the offline disk as a corrupt disk.
  • If either disk fails while disk mirroring is enabled, a switch replacement through the RMA process is required because the disk cannot be replaced in the field.
  • If the second disk fails for any reason, then the switch changes to read-only mode and loses normal functioning capability. This scenario can be avoided by promptly replacing the switch if and when the first disk fails.
  • The MDADM package is installed on all BEs (boot environments) during the software upgrade to enable swapping of disks.
  • While upgrading to NetVisor OS version 6.1.0, all existing BEs (including BEs from older versions) on both NRU03 and NRU-S0301 platforms get the mirroring functionality enabled. Therefore, if you later roll back to a previous version on the same switch after upgrading to version 6.1.0, the mirroring functionality is preserved and the older BEs continue to be mirrored onto the second disk.
  • Only the latest version of OS/ONL field diagnostics released in November 2020 is retained during disk mirroring. All older versions or other partitions on disk 2 are removed during the disk mirroring process.
  • To install OS/ONL field diagnostics, first disable disk mirroring, install the field diagnostics, and then enable disk mirroring again. The expected mirror creation time is 10-15 minutes.
  • As part of the mirroring process, the swap size is reduced from 32 GB to 24 GB to accommodate the ONL field diagnostics partition. This is a one-time operation and cannot be rolled back, which means that disabling disk mirroring does not increase the swap size back to 32 GB.
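
Two of the guidelines above (the /etc/do_not_mirror_disk marker file and the identical-disk-size requirement) lend themselves to a simple pre-upgrade check. The following Python sketch is illustrative only. The function name is hypothetical, the disk sizes must be supplied by the caller (how to obtain them is not covered in this section), and the returned text summarizes the guidelines rather than actual upgrade output.

```python
from pathlib import Path

# Marker path taken from the guideline above.
MARKER = Path("/etc/do_not_mirror_disk")


def mirroring_expected(marker=MARKER, disk_sizes=()):
    """Return (expected, reason) based on the guidelines above.

    disk_sizes: sizes of the two disks in any consistent unit,
    supplied by the caller (hypothetical input).
    """
    if Path(marker).exists():
        return False, "marker file present: the upgrade is expected to skip mirroring"
    if len(disk_sizes) != 2:
        return False, "two disk sizes are needed to evaluate the size guideline"
    if disk_sizes[0] != disk_sizes[1]:
        return False, "disk sizes differ: the mirroring process is expected to abort"
    return True, "no marker file and identical disk sizes"


expected, reason = mirroring_expected(disk_sizes=(240, 240))
print(expected, "-", reason)
```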

