ZFS got unhappy and reported data corruption, but it gives no indication of which files are corrupted or whether a particular drive is causing problems.
I started a scrub and it appears stuck at 0B issued. Syncoid fails on one particular dataset, yet its files seem intact at first glance; there are no drive errors and the list of corrupted files is empty. I successfully rsync-ed all the files from the problematic dataset to another pool.
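For reference, this is roughly how I cross-checked the copy against the live dataset; the mountpoint /ssd/podman/nginx-proxy-manager and the destination /otherpool/backup/nginx-proxy-manager below are placeholders, not my real paths:

# --checksum compares full file contents instead of size/mtime;
# --dry-run --itemize-changes only reports differences, it changes nothing.
rsync -a --checksum --dry-run --itemize-changes \
    /ssd/podman/nginx-proxy-manager/ \
    /otherpool/backup/nginx-proxy-manager/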
What’s the proper way to recover from this situation? Should I try nuking just the dataset, or the entire pool?
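(To be concrete, by "nuking just the dataset" I mean something like the sketch below; the dataset name is taken from the logs further down, while the mountpoint and restore path are placeholders:)

# DANGER: destroys the dataset together with all of its snapshots.
zfs destroy -r ssd/podman/nginx-proxy-manager
# Recreate it empty and copy the rescued files back in;
# /otherpool/backup/... stands in for wherever the rsync copy lives.
zfs create ssd/podman/nginx-proxy-manager
rsync -a /otherpool/backup/nginx-proxy-manager/ /ssd/podman/nginx-proxy-manager/
# The backup box would then need a fresh full send, since the old
# snapshot chain is gone.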
The pool:
root@menator:~# zpool status -xv ssd
  pool: ssd
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Nov 30 00:57:13 2024
        342G / 362G scanned at 223M/s, 0B / 362G issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                  STATE     READ WRITE CKSUM
        ssd                                   ONLINE       0     0     0
          mirror-0                            ONLINE       0     0     0
            ata-SSDPR-CX400-01T-G2_G11122972  ONLINE       0     0     0
            ata-SSDPR-CX400-01T-G2_G32090121  ONLINE       0     0     0
          mirror-1                            ONLINE       0     0     0
            ata-SSDPR-CX400-02T-G2_G3G067163  ONLINE       0     0     0
            ata-SSDPR-CX400-02T-G2_G3G067170  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

root@menator:~#
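To tell whether the scrub is merely slow rather than stuck, I can sample the issued counter; a small sketch, nothing here modifies the pool:

# The "issued" figure should grow between samples if the scrub is alive.
zpool status ssd | grep scanned
sleep 60
zpool status ssd | grep scanned
# Or block until the scrub finishes (supported by the 2.2.x tools shown below):
zpool wait -t scrub ssd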
The system in question:
root@menator:~# zfs --version
zfs-2.2.6-1
zfs-kmod-2.2.4-1
root@menator:~# uname -a
Linux menator 6.8.5-301.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024 x86_64 GNU/Linux
My remote backup box reported this an hour ago. This is the dataset that can’t be sent ("Zły argument" in the log is Polish for "Invalid argument", i.e. EINVAL):
Nov 30 00:08:19 tao bash[679897]: Sending incremental ssd/podman/nginx-proxy-manager@autosnap_2024-11-29_20:00:02_hourly ... autosnap_2024-11-30_00:00:07_hourly (~ 3.1 MB):
Nov 30 00:08:19 tao bash[679902]: warning: cannot send 'ssd/podman/nginx-proxy-manager@autosnap_2024-11-29_21:00:03_hourly': Zły argument
Nov 30 00:08:19 tao bash[681163]: cannot receive incremental stream: most recent snapshot of tao/remote/menator-ssd/podman/nginx-proxy-manager does not
Nov 30 00:08:19 tao bash[681163]: match incremental source
Nov 30 00:08:19 tao bash[681167]: lzop: Broken pipe: <stdout>
Nov 30 00:08:19 tao bash[681166]: mbuffer: error: outputThread: error writing to <stdout> at offset 0x44000: Broken pipe
Nov 30 00:08:19 tao bash[681166]: mbuffer: warning: error during output to <stdout>: Broken pipe
Nov 30 00:08:19 tao bash[679897]: CRITICAL ERROR: ssh -S /tmp/syncoid-backup@menator.lan-1732921602-9985 backup@menator.lan ' zfs send -w -I '"'"'ssd/podman/nginx-proxy-manager'"'"'@'"'"'autosnap_2024-11-29_20:00:02_hourly'"'"' '"'"'ssd/podman/nginx-proxy-manager'"'"'@'"'"'autosnap_2024-11-30_00:00:07_hourly'"'"' | lzop | mbuffer -q -s 128k -m 16M' | mbuffer -q -s 128k -m 16M | lzop -dfc | pv -p -t -e -r -b -s 3203160 | zfs receive -s -F 'tao/remote/menator-ssd/podman/nginx-proxy-manager' 2>&1 failed: 256 at /usr/sbin/syncoid line 889.
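I assume one way to narrow this down is to reproduce the failing send locally on menator, without syncoid or the ssh/mbuffer pipeline, using the exact snapshot names from the log above:

# Re-run the incremental send on the sending host, discarding the stream.
zfs send -w -I \
    ssd/podman/nginx-proxy-manager@autosnap_2024-11-29_20:00:02_hourly \
    ssd/podman/nginx-proxy-manager@autosnap_2024-11-30_00:00:07_hourly \
    > /dev/null
# If that reproduces the EINVAL, send the snapshot named in the warning
# on its own to see whether it is the one holding the unreadable blocks:
zfs send -w \
    ssd/podman/nginx-proxy-manager@autosnap_2024-11-29_21:00:03_hourly \
    > /dev/null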