I get that it can’t sync an entire dataset with subsets to a location that already has parts synced. I need to tell it to sync only new stuff.
If I understand your question correctly (you want new things from machineA:poolA/dataset to show up on machineB:poolB/dataset without undoing any local changes made on machineB:poolB/dataset), then no: it can’t be done with ZFS replication, which is what syncoid orchestrates.
ZFS replication is snapshot-based. So, let’s start out with your first full replication from mA:poolA/dataset to mB:poolB/dataset.
root@mB:~# syncoid -r mA:poolA/dataset poolB/dataset
The first thing that happens here is syncoid takes a snapshot of poolA/dataset. The snapshot it takes will be named with the hostname and the current date, but for simplicity, let’s just call it “@1.”
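If you’re curious what the real snapshot name looks like, here’s a rough sketch of how a name like that could be built. The exact format is an assumption on my part; check `zfs list -t snapshot` on your own pool to see what syncoid actually created.

```shell
# Hypothetical sketch: build a syncoid-style snapshot name from the
# hostname and the current UTC time (the exact format is an assumption).
snapname="syncoid_$(hostname -s)_$(date -u +%Y-%m-%d:%H:%M:%S)"
echo "$snapname"
# syncoid would then run something along the lines of:
#   zfs snapshot poolA/dataset@$snapname
```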
Next, syncoid replicates the @1 snapshot from mA to mB, with a command that looks like this:
root@mB:~# ssh root@mA "zfs send -r poolA/dataset@1" | zfs receive poolB/dataset
After this finishes, you have a copy of mA:poolA/dataset@1 in place and mounted on mB, as poolB/dataset.
Now, you do more stuff on both machines locally. Then, you run syncoid again. This time, it’s an incremental replication, because you have @1 on both machines. First, syncoid takes a new snapshot, which again for simplicity’s sake we’ll just call @2.
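This run can be incremental only because @1 already exists on both sides. Conceptually, syncoid looks for the newest snapshot common to source and target, something like this simplified sketch (the snapshot lists here are stand-ins for real `zfs list -H -t snapshot -o name` output, and the substring match is a deliberate oversimplification):

```shell
# Simplified sketch of picking an incremental base: the newest snapshot
# present on BOTH the source and the target.
src_snaps="@1
@2"
dst_snaps="@1"

base=""
for s in $src_snaps; do
  # Keep updating base as we walk the source list oldest-to-newest,
  # so we end up with the newest snapshot the target also has.
  case "$dst_snaps" in *"$s"*) base="$s";; esac
done
echo "common base: $base"   # -> common base: @1
```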
Then, it replicates. Again, the actual command it uses looks something like this:
root@mB:~# ssh root@mA "zfs send -rI poolA/dataset@1 poolA/dataset@2" | zfs receive -F poolB/dataset
Here’s the kicker, though: in order for mB:poolB/dataset to receive this incremental stream, it must first roll back to @1, wiping out any local changes made on mB, including snapshots taken locally. (That rollback is what the -F on zfs receive does, and syncoid passes it by default.) @1 proceeds directly to @2, and that’s that.
So you lose anything you did locally on machineB:poolB/dataset when you replicate in from machineA:poolA/dataset. There is no way around this; it’s a necessary part of how replication works.
Does this answer your question? Or am I misinterpreting it?