TL;DR - No. The pool could not be saved in the face of both drives malfunctioning. However, the contents have been restored from recent (previous-day) backups. Details below for your viewing pleasure.
Not a rhetorical question, unfortunately. A couple of months ago one of the drives in the mirror started playing up. When I looked into the warranty, the drive was one day out of warranty. I contacted WD anyway and they provided an RMA number (props to them!). Before I sent it back, I put it in another host and ran diskroaster https://github.com/favoritelotus/diskroaster/ on it, and it performed without error. I put it back in, added it back to the mirror, and watched it resilver and scrub without any issue. I concluded I had a bad cable connection and didn't return the drive. Weeks later (and while I was out of town) it stopped responding to SATA commands. On my return I revived the RMA (which had by then expired by several days), but before I could pull the drive, the other drive in the mirror started developing reallocated/pending sectors at an alarming rate. The situation was:
hbarta@oak:~$ zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 06:24:10 with 0 errors on Fri Mar 13 02:27:30 2026
  scan: resilvered (mirror-0) 4.29T in 10:48:18 with 0 errors on Thu Mar 12 20:03:20 2026
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            wwn-0x5000cca278d16d38  FAULTED     71   167   538  too many errors
            wwn-0x5000cca291ea5db6  ONLINE       0     0     0

errors: No known data errors
hbarta@oak:~$
The first drive on the list is the one with reallocated sectors and the second one is the one that occasionally goes AWOL.
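For anyone following along at home, the `wwn-*` names that ZFS reports can be mapped back to kernel device nodes, and the sector counts watched with smartmontools. This is just a sketch; the device path below is an example, substitute the wwn of your own suspect drive.

```shell
# Map the wwn-* names from zpool status to kernel device nodes (sda, sdb, ...).
ls -l /dev/disk/by-id/ | grep 'wwn-'

# Pull the SMART attributes that matter for a drive developing bad sectors
# (path is an example; use the wwn reported in your pool's status output).
smartctl -A /dev/disk/by-id/wwn-0x5000cca278d16d38 \
    | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```

Watching Reallocated_Sector_Ct and Current_Pending_Sector climb over successive runs is the usual sign that a drive is on its way out.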
The situation progressed to:
root@oak:/home/hbarta/Programming/Ansible/Pi# zpool status tank -v
  pool: tank
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
  scan: resilvered 20.2G in 00:03:44 with 0 errors on Fri Apr 3 13:16:29 2026
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     6    40     0
          mirror-0                  DEGRADED     6    40     0
            wwn-0x5000cca278d16d38  FAULTED     65   144   193  too many errors
            wwn-0x5000cca291ea5db6  ONLINE       3    44     0

errors: List of errors unavailable: pool I/O is currently suspended
root@oak:/home/hbarta/Programming/Ansible/Pi#
After a couple of reboots, ZFS recovered beyond my expectations:
root@oak:~# zpool status tank -v
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 6.07G in 00:38:49 with 0 errors on Mon Apr 6 15:28:41 2026
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            wwn-0x5000cca278d16d38  DEGRADED     5     0     0  too many errors
            wwn-0x5000cca291ea5db6  ONLINE       0     0     0

errors: No known data errors
root@oak:~#
At present I have another pool on this host that is a copy of tank. I've stopped the processes that use tank (an unexpected advantage of dockerized services) and plan to perform one more backup of tank to drago_standin, export tank, rename drago_standin to tank, and proceed as if everything is normal. Once everything is confirmed working, I'll probably bring up a spare host with sufficient drive capacity and make another copy of tank.
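For the curious, that swap amounts to a handful of commands. This is only a sketch: the snapshot names are illustrative, and I've written it as plain zfs send/receive (a wrapper like syncoid would do the same job). The rename itself falls out of `zpool import`, which can import a pool under a new name.

```shell
# One last incremental sync from the ailing pool to the standby
# (snapshot names here are illustrative, not my actual ones).
zfs snapshot -r tank@final
zfs send -R -I tank@previous tank@final | zfs receive -F -u drago_standin

# Retire the old pool, then bring the standby back under the original name.
zpool export tank
zpool export drago_standin
zpool import drago_standin tank
```

After that, services can be restarted against "tank" with no config changes, since the datasets come back under the same pool name.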
This is more excitement than I really want on a Monday morning.
Note: The first drive that was playing up is the second one in the status.