Can you delay deletions on the live dataset between two hosts via the retention policy with syncoid?

Hi there,

I am currently replicating a dataset from one host (storage-vm) to another host (live-vm) with syncoid on an hourly timer. The command my systemd service runs is /usr/sbin/syncoid --recursive tank/dataset root@live-vm:tank/target, and the snapshots it transfers come from sanoid, which snapshots the dataset hourly.

The retention policy on storage-vm is to keep 24 hourlies, 7 dailies, 3 monthlies. The retention policy on live-vm is only 24 hourlies. All of this works fine. Only _hourly snapshots make it to live-vm and they are pruned after 24h. So far, so good.
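(For what it’s worth, I can see what actually lands on live-vm with something like the following; the dataset names are the ones from above.)

    # list the snapshots that exist on live-vm, oldest first
    ssh root@live-vm zfs list -r -t snapshot -o name,creation -s creation tank/target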

Now, what I want is to NOT transfer the state of the LIVE dataset between the two. What I explicitly want is: when I delete files and folders inside “storage-vm:tank/dataset” with rm, those files and folders should still be present on the live dataset “live-vm:tank/target” until the last snapshot that contained them is pruned, i.e. 24h after I deleted them on storage-vm.

Is something like this even possible with ZFS/syncoid? ChatGPT gaslit me by saying that this was the standard behaviour of syncoid, that it “never transfers the state of the live dataset, only snapshots”, and that my “files will only stop showing up in the live dataset on live-vm when the last snapshot that contains them is pruned there”. Neither, of course, turned out to be true.

After I told it that my files & folders got deleted on live-vm exactly 1h after I deleted them on storage-vm, i.e. after the very next snapshot replication rather than 24h later, it advised using the option “--no-sync-snap”. But after reading the description of “--no-sync-snap”, I fail to see how this should help.
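For reference, that suggestion amounts to the command below; as far as I can tell, --no-sync-snap only stops syncoid from creating its own temporary sync snapshot and makes it send the existing sanoid snapshots instead, so I don’t see how it would change what ends up in the live dataset.

    # ChatGPT's suggested variant; only the flag is new
    /usr/sbin/syncoid --recursive --no-sync-snap tank/dataset root@live-vm:tank/target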

Is there actually an option with ZFS/syncoid to delay deletion of files & folders between two synced hosts by whatever time period you set in the retention policies?

Thanks in advance!

zfs send (via syncoid or otherwise) only replicates snapshots.

As long as one of your last 24 snapshots has the files that you deleted (on the live dataset), they are still present and accessible (via the snapshot folder) on your backup server.

There is no way to have a specific set of files be available on the backup server independent of the snapshot that contains it.

Essentially, on the backup server, don’t think about the “live dataset”. Think about (read-only) snapshots and how to get whatever files you want out of them.
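For example, something along these lines (the mountpoint and snapshot name are just placeholders):

    # deleted files are still readable under the hidden .zfs/snapshot directory
    ls /tank/target/.zfs/snapshot/
    # pick whichever snapshot still contains the file and copy it back out
    cp -a /tank/target/.zfs/snapshot/<snapshot_name>/path/to/file /tmp/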

If you’re interested in a tool to help you look through various snapshots, check out httm (https://github.com/kimono-koans/httm), an interactive, file-level Time Machine-like tool for ZFS/btrfs/nilfs2 (and even Time Machine and Restic backups).

I need clarity on which is the source and which is the target. It’s not at all clear from what you’ve written, and the answer VERY much depends on getting that right.

Do you want to:

  • Delete files on the replication source, without them being deleted on the replication target

OR:

  • Delete files on the replication target without incoming replication from the source reverting your file deletions?

Oh okay, so the data is still there, but it’s not directly in the folder where the target dataset is mounted; it’s in target/.zfs/snapshot/autosnap_2025_xx_yy_zz:aa:bb_hourly instead (with autosnap_2025_xx_yy_zz:aa:bb_hourly being the last snapshot where the data was not yet deleted).

Hmm, that’s not the way I need it to be, dang.

The setup is more of an “outgoing” type of situation than a backup:

There is the storage-vm, which is the main file server that hosts all the data and serves datasets to users via NFS. Sometimes users need a large amount of their data on external media at once, which is not feasible via NFS over the 1G lines in the offices.

So I planned to have this live-vm, which is connected to an external “terminal-style” setup (mouse, keyboard & monitor with a high-speed USB socket) where users can copy their data to external drives at USB 10Gbps speed. The live-vm is connected to the storage-vm over a 10G network, so transfers between the two are much faster than from storage-vm to the end user over the 1G office network.

The plan was that users request the data they need, an admin copies it (from storage-vm:/home/user.name) to the storage-vm:/outgoing dataset, it gets replicated via syncoid to live-vm:/outgoing when the next snapshot replication happens (over the 10G network), and then users can go to the terminal and copy the data to external media over 10G USB there.

Ideally, I wanted the admin to be able to delete the data in storage-vm:/outgoing as soon as the snapshot was replicated (so it doesn’t take up extra space on the storage-vm), and then the user would have 24 hours to grab their data before it is also deleted on the live-vm:/outgoing side (due to the 24h pruning). This way, we wouldn’t have to connect a potentially unsafe external HDD directly to the storage, but users would still get their data within ~1h instead of having to copy overnight over the 1G office network.

So, the way it actually works, I will have to wait until the user has copied their data to the external HDD and only then delete it in /outgoing on storage-vm; that deletion then replicates with the next snapshot to live-vm, which fully releases the space 24h later when the last snapshot containing the data is pruned.

Sorry, I was just typing my first reply while you posted this. :slight_smile:

I want to delete files on the source without them being deleted immediately on the target. They should be deleted on the target only 24h later, when the last snapshot containing them is pruned. If this is possible, of course. ChatGPT said: it is. But in my actual setup, the data on the target was deleted on the next snapshot replication after I deleted the data on the source.

Not possible. The live filesystem is itself a special case snapshot and therefore is ALWAYS destroyed by incoming replication. That’s not a syncoid thing, that’s literally how ZFS works.

HOWEVER, what you could do is clone the most recent snapshot, wrap a VM definition around the clone, then boot the VM (although you’re also going to have to figure out how to make your second VM not have the same IP address as the first… You could do this by using DHCP leases and separate virtual MAC addresses for the NIC on the source VM and the NIC on the cloned VM on the target. Although they would still have the same machine ID, and that can occasionally cause really weird intermittent issues…)

The clone won’t be affected by incoming replication, but to update it you will need to stop the VM, destroy the clone, and create a new clone from the newest snapshot.
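A rough sketch of that rotation on the target, assuming a clone named tank/outgoing-clone (the name is just an example):

    # stop whatever is using the clone first, then rotate it to the newest snapshot
    newest=$(zfs list -H -t snapshot -d 1 -o name -s creation tank/target | tail -n 1)
    zfs destroy tank/outgoing-clone
    zfs clone "$newest" tank/outgoing-clone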

If your VM has a virtual C: and a virtual D: in different datasets, you might consider creating an independent VM on the target, which you just shut down (or disconnect from D:) during incoming replication. After replication finishes, you once again clone the newest snapshot (of D: only, remember, not of the C: drive) and either boot that VM back up or reattach D:, depending on which way you went with that.

If you want the users to have simultaneous access to both newer files AND files deleted prior to those newer files’ creation, replication won’t work period and you’ll need to consider alternate solutions, like using Windows Volume Shadow Copy inside the VM to give your users access to “Previous Versions”. You wouldn’t want to rely on that as your “real” backup method–it is NOT more than maybe three nines reliable–but it’s very easy for normal users to self serve with, and giving them that tool will take an easy couple or three nines off the probability of you having to step in for any given minor single user “oops I accidentally my file” issue.

You could also consider putting something together with rsync, or if you’re feeling REALLY ambitious, it’s possible to tie Samba’s port of VSC directly to your actual snapshots themselves! (This tends to be easier on FreeBSD, IME. It’s been a few years since I last tried, but Debian and Ubuntu usually don’t compile the optional code when they build their Samba packages; plus, Linux ACLs aren’t directly compatible with Windows ACLs, adding yet more wrinkles to that mix.)

Hmm, I have to re-think my strategy, then. :sweat_smile:

Maybe I’ll just cron a script which rm -rf’s the contents of storage-vm:/outgoing every Friday evening or so, something like the sketch below. This way the admin can just “copy and forget” outgoing data during the week, and the space will have been fully reclaimed by the following Monday on both VMs due to the 24h pruning. :thinking: The office is closed over the weekend anyway, so there is no chance that a user would try to get their data at the terminal on a Saturday or Sunday.
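For instance, something like this in root’s crontab on storage-vm (the mountpoint is just my assumption):

    # wipe everything under /outgoing at 18:00 every Friday; the space is only
    # fully released once the last snapshot still referencing it has been pruned
    0 18 * * 5  find /tank/outgoing -mindepth 1 -delete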