How does a subsequent raw send to an unmounted encrypted dataset know what to send?

Pyrroc · May 20, 2025, 9:23pm

Here’s my current hierarchy for this question:

Windows File Server
   ↓ local rsync from the NFS-mounted share
Unencrypted ZFS dataset (snapshotted after each daily rsync)
   ↓ syncoid --no-sync-snap
Encrypted ZFS dataset (key loaded, mounted locally)
   ↓ syncoid --no-sync-snap --sendoptions="Rw" --recvoptions="u"
Encrypted ZFS dataset (no key loaded, unmounted, offsite)

I’m using --no-sync-snap because the daily snaps from the original dataset are sufficient, and because I know that the first replication will nuke any existing sync snaps on the middle (encrypted, keyed, mounted, local) dataset.

When I do the final syncoid send to the offsite untrusted server, how does syncoid know what pieces are already there and to only send what’s new?

Do I need to flatten this out and do my rsync to an encrypted dataset, and sync from that directly to both the second local copy (on another server) and the remote untrusted server using sync-snaps?

I realize the extra local copy isn’t absolutely necessary, but storage is cheap and the data isn’t.

Thanks,
-Pyrroc

mercenary_sysadmin · May 20, 2025, 9:25pm

It compares the list of snapshots, bases the incremental send on the most recent common snapshot, and Bob’s your uncle.

It doesn’t need to be able to decrypt the blocks in that snapshot, on either end, in order to perform an incremental raw send. It just has to know what the most recent common snapshot between the two is.
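To make the "most recent common snapshot" idea concrete, here is a minimal Python sketch, assuming both snapshot lists are already fetched and ordered oldest to newest (as `zfs list -t snapshot -s creation` would list them). The dataset name `tank/enc` and the snapshot names are hypothetical. It only builds the argument list for a raw incremental send (`zfs send -w -I`) and does not run it; note that nothing in it touches the data or the encryption key.

```python
from typing import List, Optional


def newest_common_snapshot(source_snaps: List[str], target_snaps: List[str]) -> Optional[str]:
    """Return the newest snapshot name present on both sides, or None.

    Both lists are assumed to be ordered oldest -> newest. Only names are
    compared here, which is a simplification.
    """
    on_target = set(target_snaps)
    for snap in reversed(source_snaps):  # walk from newest to oldest
        if snap in on_target:
            return snap
    return None


def raw_incremental_send_argv(dataset: str, base: str, newest: str) -> List[str]:
    """Build argv for a raw (-w) incremental send from @base up to @newest.

    -I also includes any intermediate snapshots between base and newest.
    """
    return ["zfs", "send", "-w", "-I", f"{dataset}@{base}", f"{dataset}@{newest}"]


source = ["daily-2025-05-18", "daily-2025-05-19", "daily-2025-05-20"]
target = ["daily-2025-05-17", "daily-2025-05-18", "daily-2025-05-19"]

base = newest_common_snapshot(source, target)
print(base)  # daily-2025-05-19
print(" ".join(raw_incremental_send_argv("tank/enc", base, source[-1])))
# zfs send -w -I tank/enc@daily-2025-05-19 tank/enc@daily-2025-05-20
```

The target side only ever needs to report snapshot names, which it can do with its key unloaded.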

edited to add: remember, zfs send doesn’t know anything about the remote end, ever. It never needs to “know what to send” because you are the one telling it what to send. In syncoid’s case, syncoid just does the donkey work for you: first it SSHes into the remote server to pull a list of snaps, compares that to the local list of snaps, and from there it knows which of the following applies:

target does not yet exist: create it with a full replication based on the most recent snapshot on the source (or on a freshly created sync snapshot, if --no-sync-snap is not in play)
target exists, and there is a common snapshot: build a zfs send command using the most recent common snapshot as the base, and either the most recent snapshot on the source or a freshly created sync snapshot (again, depending on the use of --no-sync-snap) as the incremental to patch to
target exists, but shares no common snapshot with the source: syncoid informs you that it “cowardly refuses” to overwrite the target.
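A minimal Python sketch of that three-way decision, assuming the snapshot lists have already been pulled (for example by running `zfs list -H -t snapshot -o name` on each side). The function names are hypothetical and this is not syncoid's actual code: syncoid is written in Perl and handles many more edge cases, and matching on names alone is a simplification here. The `up_to_date` result is an extra case added in this sketch, since an incremental from a snapshot to itself has nothing to send.

```python
from typing import List, Optional, Tuple


def parse_snapshot_names(zfs_list_output: str) -> List[str]:
    """Pull short snapshot names out of `zfs list -H -t snapshot -o name` output.

    Each line looks like "pool/dataset@snapname". Snapshot names are not
    encrypted, so this works whether or not the dataset's key is loaded.
    """
    names = []
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if "@" in line:
            names.append(line.split("@", 1)[1])
    return names


def plan_replication(
    source_snaps: List[str], target_snaps: Optional[List[str]]
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, base, newest) for one source/target pair.

    source_snaps is ordered oldest -> newest; target_snaps is None when the
    target dataset does not exist yet.
    """
    if not source_snaps:
        raise ValueError("source has no snapshots to replicate")
    newest = source_snaps[-1]  # with --no-sync-snap: newest existing snapshot

    if target_snaps is None:
        return ("full", None, newest)  # case 1: full replication

    on_target = set(target_snaps)
    for snap in reversed(source_snaps):
        if snap in on_target:
            if snap == newest:
                return ("up_to_date", snap, newest)  # nothing new to send
            return ("incremental", snap, newest)  # case 2: incremental from base
    return ("refuse", None, None)  # case 3: no common snapshot, refuse


src = parse_snapshot_names("tank/enc@daily-1\ntank/enc@daily-2\ntank/enc@daily-3\n")
dst = parse_snapshot_names("backup/enc@daily-1\nbackup/enc@daily-2\n")
print(plan_replication(src, dst))  # ('incremental', 'daily-2', 'daily-3')
```

With the target dataset missing (`None`), the same call returns `('full', None, 'daily-3')`; with a target holding only unrelated snapshots, it returns `('refuse', None, None)`. This sketch treats an existing target with no snapshots at all the same as "no common snapshot"; the real tool's behavior there may differ.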

Pyrroc · May 20, 2025, 9:31pm

So even though the dataset is encrypted, the snapshots are visible? (not an issue at all from my standpoint)

ETA: remote dataset is encrypted, unkeyed, and unmounted

ETA2: Oh, yup, just saw that in the Ars primer that you wrote a few years ago. I had missed that.

mercenary_sysadmin · May 20, 2025, 10:14pm

Correct. The contents of individual blocks are encrypted, but the following are not:

the name of the pool
the names of the datasets and/or zvols
the sizes of the datasets and/or zvols
the names, number, and sizes of any snapshots

However, the names and sizes of files and directories within datasets (or zvols) are encrypted, so it’s safe to have poolname/datasetname/reallyembarrassingfilename.mp4, but not safe to have poolname/reallyembarrassingdatasetname/reallyembarrassingfilename.mp4.

HankB · May 21, 2025, 12:09am

It’s my understanding that ZFS send/receive backs up datasets as opposed to files. That files and directories are not visible in the encrypted datasets makes sense in this context. (Jim can correct me if I’m wrong.)

More conventional backup tools such as rsync back up files, and thus cannot operate on encrypted filesystems without the key.

I also like that syncoid can compare snapshots and “know” what needs to be sent rather than examining each file to see if it has changed.

These are some of the reasons I’m such a fan of ZFS, and tools like sanoid/syncoid make using it that much easier.

mercenary_sysadmin · May 21, 2025, 12:30am

Essentially, yes, although it might be a bit more accurate to say that replication works at the block level, not the file level, and that datasets and snapshots are the only groupings of blocks that are relevant to replication, or that it even can be aware of.

File- and folder-level metadata blocks are fully encrypted and do not need to be decrypted in order for replication to work, although those metadata blocks would need to be decrypted in order to mount the encrypted filesystem.
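Here is a deliberately simplified toy model in Python of why block-level replication never needs the key: if every block records the transaction group (txg) it was written in, an incremental send can pick out the blocks born after the base snapshot without ever looking inside them. The `Block` class, the txg numbers, and the placeholder ciphertext bytes are all hypothetical; real ZFS walks its block-pointer tree rather than a flat list, and this is not its send code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    birth_txg: int     # txg in which this block was written
    kind: str          # "file data" or "metadata"; informational only
    ciphertext: bytes  # opaque encrypted payload; never decrypted here


def incremental_blocks(blocks: List[Block], base_snapshot_txg: int) -> List[Block]:
    """Select blocks written after the base snapshot, using only birth_txg.

    The selection never inspects ciphertext, and file/directory metadata
    blocks are treated exactly like file data blocks.
    """
    return [b for b in blocks if b.birth_txg > base_snapshot_txg]


dataset = [
    Block(100, "file data", b"\x8f\x12\x00\x7e"),
    Block(105, "metadata", b"\x3a\xc4\x19\x02"),   # e.g. a directory's entries
    Block(212, "file data", b"\x91\x07\xee\x40"),  # written after @daily-1 (txg 200)
    Block(213, "metadata", b"\x5b\x6d\x21\x88"),   # the updated directory entry
]

to_send = incremental_blocks(dataset, base_snapshot_txg=200)
print([(b.birth_txg, b.kind) for b in to_send])
# [(212, 'file data'), (213, 'metadata')]
```

The receiving side just stores those encrypted blocks as-is; only mounting the filesystem, which requires reading the directory and file metadata, needs the key.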