Resilvering multiple drives at once in the same mirror vdev

My NAS has a pool of six 6TB drives, set up as a stripe of three 2-drive mirror vdevs. Recently, one drive went Faulted and the second drive in that same mirror went Degraded.

I have two new 8TB drives ready to replace both drives in that mirror, and enough free slots in the drive cage to add both new drives to the system while keeping the originals in place. As I understand it, though, the Faulted drive is completely dead, so I can just pull it and hope the Degraded drive survives long enough to resilver a replacement. (Is that right?)

My question is: is it a good idea to attach both new drives to the mirror vdev and resilver them at the same time? Does that even work? Would ZFS read from the Degraded drive only once and resilver both new drives in parallel, or would it resilver one new drive from the original and then the second new drive from the original plus the already-resilvered new drive?
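
For concreteness, here's roughly what I mean on the CLI (pool and device names are made up; on TrueNAS I'd probably do this through the GUI, but the idea is the same):

```
# Hypothetical names -- substitute whatever zpool status actually shows.
# sdb is the Faulted disk, sdc the Degraded-but-readable one,
# sdd and sde are the new 8TB drives.

# Attach the first new drive as an extra member of the existing mirror:
zpool attach tank /dev/sdc /dev/sdd

# Attach the second new drive to the same mirror (possibly while the
# first resilver is still running -- this is the scenario I'm asking about):
zpool attach tank /dev/sdc /dev/sde

# Once both new drives have resilvered, drop the dead disk:
zpool detach tank /dev/sdb
```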

If I understand things correctly, resilvering both new drives at once would get me back to full redundancy faster, but would it also put a heavier load on the already Degraded drive, which is the only copy of that data left (in that NAS, anyway; all the mission-critical data is backed up elsewhere)?

Just for info, I’m running TrueNAS SCALE on an old desktop (i7-4790K, 16 GB of RAM, and an LSI SAS9200-8i), with the drives in a Frankensteined drive cage off an old 2U server. :) So while the setup is janky, I think (read: I guess and hope) the system is powerful enough to handle two resilvers at the same time.

I googled this for a while and everything I found was talking about resilvering multiple drives, but each in a different mirror vdev.

Also, how would this change if the original drives were both Online and I was just replacing them to gain capacity?
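
For reference, the capacity-upgrade procedure I have in mind for that case (same made-up names as above) is roughly:

```
# With both originals still Online, attach the new drives alongside them
# and let the resilvers finish before detaching the old 6TB disks:
zpool set autoexpand=on tank
zpool attach tank /dev/sdb /dev/sdd
zpool attach tank /dev/sdb /dev/sde
zpool detach tank /dev/sdb
zpool detach tank /dev/sdc

# If the vdev doesn't grow to 8TB on its own, nudge it:
zpool online -e tank /dev/sdd /dev/sde
```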

It’d be easier to tell if you pasted the zpool status output, but it sounds like there might not be anything wrong with the second drive, just that the mirror it’s part of is Degraded because one of its component disks is Faulted. As to the Faulted drive, that sounds more like a too-many-errors situation than a completely-dead situation. Does the underlying block device still exist? Does smartctl -x show a bunch of grown defects or uncorrected errors? Medium errors? You may just need to check the cabling and zpool clear it (and then proceed with attaching a third disk just in case).
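
Something like this, with the pool name and device as placeholders for whatever zpool status actually shows:

```
# Which device is Faulted, and what do the error counters say?
zpool status -v tank

# Is the block device still visible to the kernel, and what does SMART
# think of it?
lsblk
smartctl -x /dev/sdb

# If the errors look transient (cabling, backplane, power), clear them
# and let ZFS resilver whatever the disk missed while it was out:
zpool clear tank
```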

I’m not sure what happens if you zpool attach two devices to a mirror at once. You may be able to find out by simulating the process with a file-pool. It may also depend on whether you use zpool attach -s. Either way, the bottleneck seems like it’s going to be the single disk being read from rather than the other things you listed.
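
A minimal sketch of that simulation, assuming a few spare GB under /tmp (all names arbitrary):

```
# Create three sparse files to stand in for the surviving disk and the
# two new drives:
truncate -s 1G /tmp/old.img /tmp/new1.img /tmp/new2.img

# Build a throwaway single-disk pool and put some data on it
# (it mounts at /testpool by default):
zpool create testpool /tmp/old.img
dd if=/dev/urandom of=/testpool/junk bs=1M count=256

# Attach both "new drives" back to back and watch how the resilver behaves:
zpool attach testpool /tmp/old.img /tmp/new1.img
zpool attach testpool /tmp/old.img /tmp/new2.img
zpool status -v testpool

# Clean up afterwards:
zpool destroy testpool
rm /tmp/old.img /tmp/new1.img /tmp/new2.img
```

Adding -s to the attach commands would let you compare how the sequential-resilver variant behaves, too.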