Recovery of a very unhealthy ZFS pool: a recovery story

Through my own fault, I have a very unhealthy seven-year-old ZFS pool. Status as of now is this:

root@truenas[~]# zpool status -x
  pool: frankenpool                                                                                                                                                                                                                                     
 state: DEGRADED                                                                                                            
status: One or more devices has experienced an error resulting in data                                                                                                                                                                                  
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.                                                                                                                                                                                                                        
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A 
  scan: resilvered 183G in 1 days 05:27:19 with 9790 errors on Sun May 25 05:20:36 2025                                                                                                                                                                 
config:                                                                                                                     

        NAME                                            STATE     READ WRITE CKSUM
        frankenpool                                     DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/db0de007-cba8-11e5-84fb-d43d7ebc5ce0  DEGRADED     0     0 53.0K  too many errors
            gptid/5d3a0859-adc2-11e5-8401-d43d7ebc5ce0  DEGRADED     0     0 53.2K  too many errors
            gptid/269386ab-7d8f-11e6-9331-d43d7ebc5ce0  DEGRADED     0     0 54.6K  too many errors
            gptid/cdde06fd-f2d6-11e6-b1be-d43d7ebc5ce0  DEGRADED     0     0 27.1K  too many errors
            14042028026288025165                        UNAVAIL      0     0     0  was /dev/gptid/636876a9-adc2-11e5-8401-d43d7ebc5ce0
            gptid/b8e22b15-881e-11e6-9163-d43d7ebc5ce0  DEGRADED    45     0  104K  too many errors
            gptid/08054759-f2d6-11e6-b1be-d43d7ebc5ce0  ONLINE       0     0   290
            13049790948581386619                        UNAVAIL      0     0     0  was /dev/gptid/9135367f-7dca-11e6-9b67-d43d7ebc5ce0
            gptid/6c0bbc77-adc2-11e5-8401-d43d7ebc5ce0  DEGRADED     0     0 53.0K  too many errors

errors: 9803 data errors, use '-v' for a list

Of course no backup exists, so I’ll try to recover as much as possible. How, you ask? Stay tuned!
If this is of any interest I’ll post background and progress. Otherwise I’ll just delete this so the forum does not get polluted.
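
For anyone following along: the full list of damaged files mentioned in that status output can be pulled with zpool status -v, roughly like this (it prints one path per affected file, plus bare object IDs for files that no longer resolve to a path):

zpool status -v frankenpool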


Good luck! I have no idea how you are going to do that, but I sincerely wish you the best.


Thanks a lot! Can’t say I have a grand plan, more like hunches and a general direction.

So this is a pool of nine drives in RAIDZ3, created about eight years ago using FreeNAS 9, TechSNAP, way too much confidence and not enough money. As I remember it, it was in semi-production. Of course, only hard drives nearing end of life were used. And in order to make the build quieter, all the drives were set to spin down after 5-15 minutes of inactivity, further exacerbating the situation by shortening their life span.

It was kind of dying back then, but life got in the way of recovering it. Now I have a bit of free time, so it’s time to try to resuscitate it!


So first I had to find the correct hard drives. No idea why I did not mark them in any way. But after a few hours of trying different drives, the pool was in good enough shape to start resilvering. As seen above, it took a while. When it came to lost files, it was not too bad:

  • A few FreeNAS auto-backups of the operating system
  • About 45 JPEG images
  • One music album
  • Two movies

Only the images are of interest, and the errors look like this:

frankenpool/nafs@nafs-2025-05-22_18-00:<0xd027>
frankenpool/nafs@nafs-2025-05-22_18-00:<0xd029>
frankenpool/nafs@nafs-2025-05-22_18-00:/2016/09/17/IMG_1574_1.JPG
frankenpool/nafs@nafs-2025-05-22_18-00:/2016/09/17/IMG_1575_1.JPG

No idea if it is possible to restore the images by pulling them from an old snapshot (which may or may not exist, but probably not). If anybody knows, please let me know.


There are only two datasets of interest: nafs (which was used as a mount point for NFS) and stoneage (no idea). So I created two new snapshots in order to send them to another machine with fully functioning hard drives:

root@truenas[~]# zfs list -t snapshot
NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
frankenpool/nafs@nafs-2025-05-22_18-00                798M      -   914G  -
frankenpool/nafs@nafs-2025-05-25_11-52                  0B      -   914G  -
frankenpool/stoneage@stoneage-2025-05-26_10-16          0B      -   158G  -
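
For reference, creating those snapshots by hand amounts to roughly this (names as in the listing above; on TrueNAS a periodic snapshot task would normally handle it):

zfs snapshot frankenpool/nafs@nafs-2025-05-25_11-52
zfs snapshot frankenpool/stoneage@stoneage-2025-05-26_10-16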

Sending the stoneage dataset with zfs send frankenpool/stoneage@stoneage-2025-05-26_10-16 | pv -Wbraftp | ssh root@192.168.40.1 zfs receive -Fuv routepool/stoneage went OK, but a tad slow. I assume that is because of the health of the pool. I probably should have done it without the -p flag on pv, since the progress bar did not seem to work. I also probably should have used -s on zfs receive so the transfer could be resumed if interrupted.

131GiB 6:44:58 [0.00 B/s] [6.14MiB/s]
137GiB 7:03:14 [0.00 B/s] [2.93MiB/s]
183GiB 10:01:13 [5.20MiB/s] [5.20MiB/s]
received 183G stream in 36071.87 seconds (5.20M/sec)
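
For anyone repeating this: with -s on the receiving side, an interrupted transfer leaves a receive_resume_token property on the partially received dataset, and the stream can be restarted from that point instead of from zero. A rough sketch, using the same hosts and dataset names as above:

# start the transfer with a resumable receive on the target
zfs send frankenpool/stoneage@stoneage-2025-05-26_10-16 | pv -Wbraft | ssh root@192.168.40.1 zfs receive -s -Fuv routepool/stoneage

# if it is interrupted, fetch the resume token from the destination dataset...
token=$(ssh root@192.168.40.1 zfs get -H -o value receive_resume_token routepool/stoneage)

# ...and resume the send from where it stopped
zfs send -t "$token" | pv -Wbraft | ssh root@192.168.40.1 zfs receive -suv routepool/stoneage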

Sending the nafs dataset did not go as smoothly:

root@truenas[~]# zfs send frankenpool/nafs@nafs-2025-05-25_11-52 | pv -Wbraft | ssh root@192.168.40.1 zfs receive -Fuv routepool/nafs                                                                                                                  
(root@192.168.40.1) Password:
(root@192.168.40.1) Password:
receiving full stream of frankenpool/nafs@nafs-2025-05-25_11-52 into routepool/nafs@nafs-2025-05-25_11-52
warning: cannot send 'frankenpool/nafs@nafs-2025-05-25_11-52': insufficient replicas
 408GiB 1:02:17:58 [4.42MiB/s] [4.42MiB/s]
cannot receive new filesystem stream: incomplete stream

So I’m back to doing an old-school copy:
root@truenas[/mnt/frankenpool]# scp -r nafs root@192.168.40.1:/routepool/nafs

And that is progressing nicely, but very slowly.
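
(Side note, not what I’m actually running: rsync over ssh would be a drop-in alternative to scp for this kind of salvage copy. It can resume after an interruption, skips files that already made it across, and keeps going past files that fail with I/O errors because their blocks are unrecoverable. Paths as in the scp command above:)

# -a preserves ownership/permissions/timestamps, -P keeps partial files and shows per-file progress
rsync -aP /mnt/frankenpool/nafs/ root@192.168.40.1:/routepool/nafs/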

You can’t get them back out of a snapshot unless the old snapshot held a different version of the file. If the file has not changed since you took the snapshot, then the snapshot and the current copy actually share the same physical blocks (and the same physical sectors on disk).
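
If an older snapshot does turn out to hold its own, pre-damage copy of one of those images, it can be read straight out of the hidden .zfs directory on the dataset. Something along these lines, with <older-snap> standing in for whatever snapshot name shows up:

# list the snapshots that exist for the dataset
zfs list -t snapshot -o name frankenpool/nafs

# every snapshot is browsable read-only under .zfs/snapshot; if the copy
# there still reads cleanly, its blocks were different (and intact)
cp /mnt/frankenpool/nafs/.zfs/snapshot/<older-snap>/2016/09/17/IMG_1574_1.JPG /tmp/

But as above: if the file never changed, the snapshot copy points at the same damaged blocks and the read will fail the same way.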

Sounds like you had a pretty good outcome. The thing I would have recommended if I’d gotten in here before you got started is NOT physically removing ANY of the degraded drives until you’ve resilvered at least one new drive in.

That’s because (as you likely are aware, since you had 50K+ CKSUMs on multiple drives and yet still only lost a few files) the blocks with CKSUM errors on one drive aren’t necessarily the same blocks that have CKSUM errors on a different drive. So by not removing the DEGRADED drives until you’ve resilvered a new one in, you’re maximizing the chances that you’ll have enough parity available for any CKSUM’d block to be able to recover it.

By contrast, if you pull a drive immediately and add a new one, any CKSUM’d blocks for which that drive’s non-corrupt sector(s) could have provided the minimum parity necessary to reconstruct them become permanently corrupt and lost.
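
In practice that means using zpool replace while the old disk is still attached: ZFS builds the new disk into a temporary replacing vdev and only detaches the old one once the resilver completes, so its readable sectors stay available as extra parity the whole time. Roughly (the new device name here is just a placeholder; the TrueNAS GUI’s disk-replace flow does the same thing underneath):

# resilver a new disk in while the degraded one stays in the pool;
# the old device is detached automatically when the resilver finishes
zpool replace frankenpool gptid/b8e22b15-881e-11e6-9163-d43d7ebc5ce0 /dev/ada9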

I’m glad you had a reasonably good outcome from your horror story! This is my favorite war story right here:
