Recovering files in case of errors

mercenary_sysadmin · August 21, 2023, 9:45pm

root@box:~# dd if=corruptfile conv=sync,noerror bs=4K of=repairedfile

This is the way you pull data from a file with unrecoverable CKSUM errors in it. ZFS treats an irreparable CKSUM the same way the kernel treats hardware I/O errors, so you can use the same technique you’d use to recover data from a drive with failed sectors.

Now, to be clear, you aren’t going to get the actual contents of any bad blocks in that file. You’re going to get zero-filled blocks in their place: conv=noerror tells dd to keep going after a read error instead of aborting, and conv=sync pads each failed or short read out to a full block of zeroes, so everything after the bad block stays at its original offset.
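If you want to know where those zero-filled placeholders ended up, here is a rough sketch that prints the byte offset of every all-zero block in a file. The zero_blocks name and its arguments are made up for illustration, and it reads one block per dd call, so it is slow on big files. Keep in mind that a block that was legitimately all zeroes in the original will show up too, so treat the output as a list of candidates rather than a definitive damage report.

```shell
# Print the byte offset of every all-zero block in a file.
# Usage: zero_blocks FILE [BLOCKSIZE]   (BLOCKSIZE in bytes, default 4096)
# zero_blocks is a hypothetical helper, not part of dd or ZFS.
zero_blocks() {
    file=$1
    bs=${2:-4096}
    size=$(wc -c < "$file")
    nblocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Strip NUL bytes from one block; if nothing is left, it was all zeroes.
        nonzero=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | tr -d '\000' | wc -c)
        if [ "$nonzero" -eq 0 ]; then
            echo $(( i * bs ))
        fi
        i=$(( i + 1 ))
    done
}
```

Run it as zero_blocks repairedfile 4096, using the same block size you gave dd.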

Another way that you can recover corrupt data is with replication and the zfs_send_corrupt_data=1 module option:

root@box:~# echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data
root@box:~# zfs snapshot pool/corrupt@1
root@box:~# zfs send pool/corrupt@1 | zfs receive pool/repaired
root@box:~# echo 0 > /sys/module/zfs/parameters/zfs_send_corrupt_data

You’ve now got a “repaired” version of your corrupt dataset at pool/repaired. The data isn’t any less corrupted than it was before, of course… but much like our first fix, the corrupt blocks are now replaced with blocks ZFS sees as “correct.” In the case of dd, that meant all-zero blocks; in the case of zfs_send_corrupt_data=1, those blocks are filled with the repeating pattern 0x2f5baddb10c. In both cases, the contents of the entire block are gone, with a placeholder left where they used to be.
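One thing that is easy to get wrong with the replication method is leaving the parameter switched on if the send fails partway through. A minimal sketch of a wrapper that writes 1 to a parameter file, runs a command, and then puts the previous value back whatever the outcome is shown below. The with_param name is made up for illustration, and it assumes the parameter lives at /sys/module/zfs/parameters/zfs_send_corrupt_data on a Linux system with the zfs module loaded.

```shell
# Temporarily set a module parameter to 1 while running a command,
# then restore whatever value it had before.
# Usage: with_param PARAM_FILE COMMAND [ARGS...]
# with_param is a hypothetical helper, not a ZFS tool.
with_param() {
    param=$1
    shift
    old=$(cat "$param")      # remember the current setting
    echo 1 > "$param"
    rc=0
    "$@" || rc=$?            # run the command, keep its exit status
    echo "$old" > "$param"   # restore even if the command failed
    return "$rc"
}
```

As root, that would look something like: with_param /sys/module/zfs/parameters/zfs_send_corrupt_data sh -c 'zfs send pool/corrupt@1 | zfs receive pool/repaired'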