Checking datasets on new zpool match old zpool

aaron · February 6, 2025, 8:49pm

Hello,

I am moving all my datasets to a new zpool. I used syncoid to sync the data (a --recursive sync of the root dataset) and I’m pretty sure it’s fine, but as I quite like my data, I would like to check that the new disks have the same data as my old disks before I move over and format the old ones.

I have been doing a few checks, such as diffing a snapshot list with “name,refer” columns (after running it through sed so the dataset names match) to make sure no snapshots are missing.
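For reference, that snapshot-list diff might look roughly like this. It is a minimal sketch, assuming bash (for process substitution) and the OpenZFS `zfs list` flags -H (no header, tab-separated), -p (exact byte counts), -r, -t and -o; the function names and the pool names in the usage line are hypothetical. Note that refer can differ between pools for reasons unrelated to the data itself (for example a different ashift or compression setting), so a mismatch in that column is a prompt to look closer rather than proof of damage.

```shell
#!/usr/bin/env bash
# List every snapshot under a pool with its referenced size, rewrite the pool
# prefix to a common placeholder so both sides use the same names, and sort.
snapshot_listing() {
  local pool="$1" common="$2"
  zfs list -H -p -r -t snapshot -o name,refer "$pool" \
    | sed "s|^${pool}\([/@]\)|${common}\1|" \
    | sort
}

# Print the lines that differ between the two pools; no output means the
# listings match.
compare_listings() {
  diff <(snapshot_listing "$1" pool) <(snapshot_listing "$2" pool)
}

# Usage (hypothetical pool names): compare_listings oldpool newpool
```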

I was about to knock together a quick script to cycle through the snapshots of each dataset, running rsync -rvnc --delete source target or diff -qr source target to check the files are as expected, when I thought I should ask whether there is a more obvious approach (aside from running a scrub and trusting ZFS).
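The file-level approach described above could be sketched like this. It assumes both datasets are mounted and that their snapshots are reachable under the .zfs/snapshot directory (ZFS serves it read-only, and it can still be entered by path when snapdir=hidden); the function name and mountpoints are hypothetical. Be aware that it reads every byte of every snapshot, so on a large pool it will be slow, and diff compares contents and file presence, not ownership, permissions or timestamps.

```shell
#!/usr/bin/env bash
# Compare each snapshot of the old dataset with the same-named snapshot of the
# new dataset. Returns non-zero if any snapshot is missing or any file differs.
compare_snapshot_trees() {
  local old_mnt="$1" new_mnt="$2" status=0 snap name
  for snap in "$old_mnt"/.zfs/snapshot/*/; do
    [ -d "$snap" ] || continue          # the glob matched nothing
    name="$(basename "$snap")"
    if [ ! -d "$new_mnt/.zfs/snapshot/$name" ]; then
      echo "missing on target: $name"
      status=1
      continue
    fi
    # -q: only report that files differ; -r: recurse into subdirectories.
    diff -qr "$snap" "$new_mnt/.zfs/snapshot/$name" || status=1
  done
  return "$status"
}

# Usage (hypothetical mountpoints): compare_snapshot_trees /oldpool/data /newpool/data
```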

Thanks in advance!

allan · February 6, 2025, 9:05pm

If you confirm that the snapshot exists on both sides and has the same GUID, you can be sure that the whole snapshot is there and everything is bit-for-bit identical.

Run the following on both sides; the values should be identical:

zfs get guid pool/dataset@snapshot
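To script that check for a single snapshot, a minimal sketch could use `zfs get -H -o value guid`, which prints only the value with no header. The function name and snapshot names are hypothetical; it assumes both pools are imported on the same machine.

```shell
#!/usr/bin/env bash
# Compare the GUID of one snapshot on the old pool with its copy on the new
# pool. Returns 0 on match, 1 on mismatch, 2 if either lookup fails.
same_snapshot() {
  local old_guid new_guid
  old_guid="$(zfs get -H -o value guid "$1")" || return 2
  new_guid="$(zfs get -H -o value guid "$2")" || return 2
  if [ "$old_guid" = "$new_guid" ]; then
    echo "match: $old_guid"
  else
    echo "MISMATCH: $1=$old_guid $2=$new_guid"
    return 1
  fi
}

# Usage (hypothetical names): same_snapshot oldpool/data@snap1 newpool/data@snap1
```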

When you use syncoid to copy them, the snapshot will NOT exist until it has been completely copied (it is temporarily hidden and named %recv until the receive either completes or is cancelled).
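A related check is whether any receive was left half-finished. This sketch assumes the receive was resumable (zfs receive -s): a dataset holding an interrupted resumable receive has a receive_resume_token value other than "-". It says nothing about non-resumable receives, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print every filesystem or volume under a pool that still holds a partially
# received (resumable) stream. No output means nothing is pending.
pending_receives() {
  zfs get -H -o name,value -r -t filesystem,volume receive_resume_token "$1" \
    | awk -F '\t' '$2 != "-" { print $1 }'
}

# Usage (hypothetical pool name): pending_receives newpool
```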

mercenary_sysadmin · February 6, 2025, 10:28pm

Following up on Allan’s answer with an example: this is the production and hotspare side of a VM hosting setup of mine.

root@elided-hs0:/# zfs get guid -rt snap data/images/fileserver/D | awk '{print $3}' | tail -n5
6290248623997082506
13472248163790702356
994323836312740596
15223236497183565418
16510527134723411610

root@elided-hs0:/# ssh elided-prod0 zfs get guid -rt snap zroot/images/fileserver/D | awk '{print $3}' | tail -n5
6290248623997082506
13472248163790702356
994323836312740596
15223236497183565418
16510527134723411610
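The same comparison can be scripted for a whole tree instead of eyeballing the last few lines. A minimal sketch, assuming bash (for process substitution) and non-interactive ssh access to the other host; the function names are hypothetical. Sorting the values means this checks that both sides hold the same set of snapshots, not that they are listed in the same order.

```shell
#!/usr/bin/env bash
# Sorted list of every snapshot GUID under a local dataset.
# -H: no header, tab-separated; -o value: print only the GUID.
guid_list() {
  zfs get -H -o value -r -t snapshot guid "$1" | sort
}

# The same list, fetched from a remote host over ssh.
remote_guid_list() {
  ssh "$1" zfs get -H -o value -r -t snapshot guid "$2" | sort
}

# Lines prefixed "<" exist only locally, ">" only on the remote side.
compare_guids() {
  diff <(guid_list "$1") <(remote_guid_list "$2" "$3")
}

# Usage: compare_guids data/images/fileserver/D elided-prod0 zroot/images/fileserver/D
```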

Also adding onto Allan’s answer, for clarity for the newbies:

The point about the snapshot not existing until it is complete is true, but it’s not syncoid-specific; this is how zfs receive works regardless of how it’s invoked. Sorry about the pedantry, but I’ve discovered that a LOT of folks think syncoid is doing a hell of a lot more than it actually does. Syncoid is just an orchestrator for OpenZFS’ built-in replication, and literally executes a command in the shell for you.

If you run syncoid --debug [source] [target], you’ll even get to see the actual command line it builds for you, and you can copy and paste it yourself and it will work just fine. This is often handy when you’re trying to troubleshoot where something goes wrong; syncoid does its level best both to pass through raw errors and to add more human-friendly explanations where it can, but sometimes you just need to see the original, unmodified console output in full, and know for a fact that nothing is changing or suppressing any of it.
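To make the “orchestrator” point concrete, this sketch shows the general shape of the OpenZFS pipeline involved: a plain zfs send piped into zfs receive. It is NOT syncoid’s exact command line (syncoid --debug shows that), it assumes the destination already has the @from snapshot, and the function and dataset names are hypothetical.

```shell
#!/usr/bin/env bash
# Incrementally replicate every snapshot between @from and @to of a source
# dataset into an existing destination dataset.
replicate_incremental() {
  local src="$1" from="$2" to="$3" dst="$4"
  # send -I: include all intermediate snapshots between @from and @to.
  # receive -u: do not mount the received file system.
  zfs send -I "$src@$from" "$src@$to" | zfs receive -u "$dst"
}

# Usage (hypothetical names): replicate_incremental oldpool/data snap1 snap2 newpool/data
```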

aaron · February 7, 2025, 1:02pm

That’s great, thanks @allan and @mercenary_sysadmin! I appreciate the quick response.

Is that GUID derived from the actual data contained in the snapshot, or is it “just” metadata? What I am getting at is: if the GUID matches, can I be confident that the data has not changed in some way it should not have? I’m thinking of reported bugs like this and this, where data corruption was not picked up by a scrub.

Either way I have already added the “guid” column into my snapshot list diff and that should be a much quicker way to find any likely discrepancies. Thanks again!

mercenary_sysadmin · February 7, 2025, 4:51pm

GUID is an acronym for Globally Unique ID, which means it can’t be derived from content–content isn’t globally unique, so an identifier derived from it can’t be either.

The entire purpose of any GUID–ZFS or otherwise–is to uniquely identify a thing by assigning it a randomly generated string of data, and for that randomly generated string of data to exist in such a huge problem space that the odds of the same GUID being organically generated twice, even across every such thing in the world, are astronomically low.
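To put a rough number on “astronomically low”, assume GUIDs are 64-bit values drawn uniformly at random (OpenZFS stores them as 64-bit integers, which fits the values printed earlier in this thread). The usual birthday approximation for the probability that any two of n such GUIDs collide is then:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{64}}\right) \approx \frac{n^2}{2^{65}},
\qquad
p(10^6) \approx \frac{10^{12}}{3.69 \times 10^{19}} \approx 2.7 \times 10^{-8}
```

So under that uniform-randomness assumption, even a million snapshots give roughly a 1-in-37-million chance of any collision at all.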

One example of GUIDs is the MAC address on every network card. That’s the thing that looks like 62:5E:1A:12:34:56 when you do a full ip a or ifconfig. These are set in hardware by the vendor at the time of manufacturing, and if two network cards with the same MAC ever show up on the same local network, they cannot both function at the same time. So those HAVE to be unique!

In the case of MAC addresses, not all of those octets are actually pseudo-random. The first three identify the vendor; for example, 00:14:A4 is one of several vendor prefixes used by Hon Hai Precision Industry, better known as Foxconn.
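Splitting an address into those two parts is just a matter of taking the first three and last three octets. A minimal sketch, assuming the usual six colon-separated hex octets as printed by ip a; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Vendor prefix (OUI): the first three octets, normalised to upper case.
mac_oui()    { echo "$1" | cut -d: -f1-3 | tr 'a-f' 'A-F'; }

# Device-specific part: the last three octets, normalised to upper case.
mac_device() { echo "$1" | cut -d: -f4-6 | tr 'a-f' 'A-F'; }
```

For example, mac_oui 00:14:a4:12:34:56 prints 00:14:A4.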

I don’t know if there are any built-in identifiers in ZFS GUIDs that work similarly, or if it’s purely a unique identifier and not, to some degree, a class identifier as well. But what you do know is that you won’t have the same one twice, so if you see two snapshots that share a GUID, they’re the same snapshot, and if you see two snapshots that DON’T have the same GUID, they are not the same snapshot.

Make sense?