Verify integrity of snapshots

I need to temporarily move my dataset to another device. My SSD uses emulated 512-byte sectors instead of native 4K, and switching the sector size wipes all data on the drive.
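
For context, the sector-size change itself is done with nvme-cli; something like this (the device name and LBA format index are examples for my drive, and nvme format destroys everything on it):
nvme id-ns /dev/nvme0n1 -H       # list supported LBA formats, look for the 4096-byte entry
nvme format /dev/nvme0n1 --lbaf=1   # switch to the 4K format index reported above; wipes the drive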

I have several snapshots of my root dataset, including one from today. A scrub on the current pool shows 0 errors, and everything under smartctl looks good too.
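
For reference, the checks were basically these (pool and device names are examples):
zpool scrub rpool
zpool status -v rpool        # after the scrub finishes: 0 errors
smartctl -a /dev/nvme0n1     # no media errors, no error log entries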

After a send/recv to another device, I was immediately hit with several I/O errors when attempting to boot from it. zpool status shows several errors as hex values such as <0x1b121>, <0x344a>, etc. After a scrub on the new device, the errors disappear, but another attempt at booting from that device shows I/O errors again.

I set up a new pool on a third device, did a send/recv of the dataset, and attempted to boot. Again, I was hit with several I/O errors. A scrub on this device shows only checksum errors. The devices themselves appear to be fine.

I’m wondering if one of my snapshots might be bad? The pool isn’t encrypted, but the dataset is.

My send/recv command looks like this:
zfs send -RLwv pool/dataset@today | zfs recv newpool/dataset
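
A dry run on the receive side is a cheap way to sanity-check the stream before committing to it, something like:
zfs send -RLwv pool/dataset@today | zfs recv -n -v newpool/dataset   # -n: dry run, nothing is written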

I notice the -w (raw send) flag in your zfs send command, and you mention the dataset is encrypted. If so, it sounds like you’re hitting a known bug in raw sends from an encrypted source.
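
If that's the case, it's worth checking which OpenZFS version you're running (userland and kernel module), since fixes for raw-send issues went into newer releases:
zfs version        # prints both the userland and kernel module versions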

I might have something going on here other than (or on top of) the bug with raw S/R. Or I’m doing something wrong.

  • Pools are created on the devices they reside on, not S/R from another device.
  • Encryption keys are passphrases.

My goal was only to S/R my encrypted datasets to a new external pool, boot from the external pool, change the internal NVMe storage to native 4K (which wipes its data), then S/R the datasets back to the internal NVMe.

First attempt was a raw S/R to the external SSD: boot the root dataset from USB, modify the NVMe block size, set up a new pool, and raw S/R back. The pool imports fine and the encryption key loads fine, but as soon as the root dataset tries to mount, I/O errors.

Second attempt was an S/R to a new dataset (not raw), with the recv side setting up the encryption. Again, the pool imports fine and the key loads fine, but I/O errors upon attempting to mount.
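
For concreteness, the non-raw variant looked roughly like this (flags and property values are what I'd expect to use; the recv side becomes its own encryption root and prompts for a passphrase):
zfs send -Lv pool/dataset@today | zfs recv -o encryption=on -o keyformat=passphrase -o keylocation=prompt newpool/dataset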

Third attempt same as second, except no encryption on the recv side. I/O errors.

Fourth attempt: move the internal NVMe to an external enclosure and boot from that. Pool fine, key fine, I/O errors upon attempting to mount. The internal storage is not hardware-encrypted.

Errors usually show as something like <0x0>:<0x56656>. After a scrub, the errors disappear upon completion. No SMART errors.

Fifth attempt: with the NVMe reinstalled, boot an Ubuntu live ISO, import the pool, and raw S/R the root dataset to the external pool. Then install the new NVMe, boot the ISO again, and raw S/R from the external pool to the new NVMe. Success!
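
Roughly the sequence from the live environment (pool and dataset names are examples):
# from the Ubuntu live ISO, after installing zfsutils-linux
zpool import -N -R /mnt rpool       # -N: import without mounting anything
zpool import -N -R /mnt extpool
zfs send -RLwv rpool/dataset@today | zfs recv extpool/dataset   # raw send, no key load needed
zpool export extpool
zpool export rpool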

Unfortunately, that method isn’t working for my large (600 GB) dataset. It mounts fine from the old NVMe in an external enclosure, but after an S/R to the internal NVMe it errors when attempting to mount. That dataset also lives on my storage server, and a raw S/R from there also fails to mount. I might have to rsync it to a new dataset.
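
If it comes to that, the fallback is roughly this (dataset names, mountpoints, and rsync flags are just what I'd reach for, untested):
zfs create -o encryption=on -o keyformat=passphrase newpool/bigdata   # prompts for a passphrase
rsync -aHAX --info=progress2 /path/to/old/bigdata/ /newpool/bigdata/  # preserve hard links, ACLs, xattrs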

After some digging through bug reports, it looks like the bug I was hitting was fixed in ZFS 2.1.7. Since 2.1.13 is already available in Debian, I went straight from 2.1.2 to 2.1.13.
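
For anyone following along, confirming the versions on Debian was basically this (package names and repos will vary with your setup):
zfs version                  # was zfs-2.1.2 before
apt policy zfsutils-linux    # see what the configured repos offer
apt install zfsutils-linux zfs-dkms zfs-initramfs
zfs version                  # zfs-2.1.13 after a reboot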

I’ve tested sending several datasets to other pools and unlocking/mounting on other systems with no issues so far.
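
That check on the receiving systems is basically just loading the key and mounting (names are examples):
zfs load-key -r otherpool/dataset                      # prompts for the passphrase
zfs mount otherpool/dataset
zfs list -o name,keystatus,mounted otherpool/dataset   # keystatus: available, mounted: yes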

The root dataset issue was also a blunder on my part on top of the bug. I use a program called usbguard to prevent random USB devices from being used on the system, and I forgot to disable it before trying to boot my root dataset from USB. As soon as I unlocked the dataset, usbguard kicked in during the boot process and blocked the USB device, kicking it off the system.
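
In case anyone else trips over this: allow the enclosure in usbguard ahead of time, or just stop the service for the duration of the migration (the device ID here is an example):
usbguard list-devices            # find the enclosure's device id
usbguard allow-device -p 7       # -p makes the allow rule permanent
systemctl stop usbguard          # or simply stop it while migrating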