Syncoid --no-rollback usage

I’m replicating tank to backup using syncoid. Sanoid manages snapshots on both pools. I want the backup pool to keep more snapshots than the live pool, tank.

sanoid.conf looks like:

[tank/ds1]
        use_template = keepShort

[backup/ds1]
        use_template = keepLong

[template_keepShort]
        frequently = 0
        hourly = 24
        daily = 3
        weekly = 3
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes

[template_keepLong]
        frequently = 0
        hourly = 24
        daily = 14
        weekly = 8
        monthly = 12
        yearly = 1
        autosnap = no
        autoprune = yes
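
For context, sanoid runs on a timer on each host; a manual run that takes the snapshots (where autosnap = yes) and prunes expired ones per these templates is roughly:

sanoid --cron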

Every day, this runs: syncoid -r --no-sync-snap tank/ds1 backup/ds1

--no-sync-snap is used because I have multiple backup targets, which I’m not showing here for simplicity.

When the above command runs, syncoid uses the following zfs receive options: -s -F

The -F will “destroy snapshots and file systems that do not exist on the sending side”, per OpenZFS docs.

In my use case, that -F flag removes from backup any snapshots I’ve already pruned from tank under its shorter sanoid policy.

To meet my goal of a longer snapshot retention in the backup pool, should I be using --no-rollback, like this?

syncoid -r --no-sync-snap --no-rollback tank/ds1 backup/ds1

Any potential issues to watch out for?

Some interesting references I found prior to posting:

You’re misunderstanding. Receive -F will destroy any snapshots on the target that are newer than the most recent commonly held snapshot. This is actually a hard requirement for replication; in the absence of -F, that replication attempt would simply fail.

Receive -F does not destroy older snapshots on the target, whether they still exist on the source or not.

Ah, ok!
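
(So, for the plain dataset-to-dataset case, I’d expect something like this hypothetical sequence, with made-up names:)

zfs create tank/demo
zfs snapshot tank/demo@old
syncoid --no-sync-snap tank/demo backup/demo   # initial full send of @old
zfs snapshot backup/demo@target-only           # newer than @old, exists only on the target
zfs snapshot tank/demo@new                     # exists only on the source
syncoid --no-sync-snap tank/demo backup/demo   # receive -F rolls the target back past @target-only, then applies @old -> @new
zfs list -t snapshot backup/demo               # @target-only is gone; without -F, this incremental would simply have failed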

However, the behavior seems to change when:

  • The dataset I’m replicating has child datasets
  • Sanoid takes snapshots using recursive = zfs.
  • And the datasets are using native zfs encryption (not sure whether this should matter).

Modifying my first example:

[tank/ds1]
+   recursive = zfs
    use_template = keepShort

[backup/ds1]
    use_template = keepShort

Assume a dataset structure like:

tank
  tank/ds1
    tank/ds1/childA
    tank/ds1/childB

When running:

syncoid --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1

Any snapshots, newer or older, on backup but not on tank are destroyed.

If I add --no-rollback:

syncoid --sendoptions="Rw" --no-sync-snap --no-rollback tank/ds1 backup/ds1

Those older snapshots are not destroyed on backup.
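
(One way to compare the two sides, for the record:)

# compare snapshot names on source vs. target, rewriting the pool name so the lists line up
diff <(zfs list -H -r -t snapshot -o name tank/ds1 | sed 's|^tank|backup|') \
     <(zfs list -H -r -t snapshot -o name backup/ds1)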

From OpenZFS:

When a snapshot replication package stream that is generated by using the zfs send -R command is received, any snapshots that do not exist on the sending location are destroyed by using the zfs destroy -d command.

And under the description of the -F flag:

If receiving an incremental replication stream (for example, one generated by zfs send -R [-i|-I]), destroy snapshots and file systems that do not exist on the sending side.

To ensure snapshots are not destroyed on backup, I’m adding only --no-rollback to the syncoid command. With --debug added, I can see that the zfs receive command now has only -s, not -F.
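
In other words, comparing the two (note --debug is not a dry run, the sync still happens; the receive flags below are paraphrased from the debug output, not verbatim):

syncoid --debug --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1 2>&1 | grep 'zfs receive'
#   ... zfs receive -s -F ...
syncoid --debug --no-rollback --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1 2>&1 | grep 'zfs receive'
#   ... zfs receive -s ...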

Is there some other magic happening when I add --no-rollback? Or is the -F flag behaving one way for my first example (single dataset-to-dataset replication) and a different way for my second example (zfs recursive snapshots, child datasets, encryption)?


UPDATE: Figured a test case might help, as well. Here’s what I’m doing:

Using this sanoid.conf (I kept backup out of the config for now, to ensure no pruning from sanoid was in the mix):

[tank/ds1]
        recursive = zfs
        use_template = keepShort

# Create the dataset
zfs create -o encryption=aes-256-gcm -o keylocation=file:///root/key.file -o keyformat=passphrase tank/ds1
zfs create tank/ds1/childA
zfs create tank/ds1/childB

# Create some initial snapshots
systemctl start sanoid.service
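
# List what was just created (roughly how the listings below were gathered)
zfs list -H -r -t all -o name tank/ds1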

Now I have…

tank/ds1
tank/ds1@autosnap_2025-01-08_02:12:06_monthly
tank/ds1@autosnap_2025-01-08_02:12:06_weekly
tank/ds1@autosnap_2025-01-08_02:12:06_daily
tank/ds1@autosnap_2025-01-08_02:12:06_hourly 
tank/ds1/childA
tank/ds1/childA@autosnap_2025-01-08_02:12:06_monthly
tank/ds1/childA@autosnap_2025-01-08_02:12:06_weekly
tank/ds1/childA@autosnap_2025-01-08_02:12:06_daily
tank/ds1/childA@autosnap_2025-01-08_02:12:06_hourly
tank/ds1/childB
tank/ds1/childB@autosnap_2025-01-08_02:12:06_monthly
tank/ds1/childB@autosnap_2025-01-08_02:12:06_weekly
tank/ds1/childB@autosnap_2025-01-08_02:12:06_daily
tank/ds1/childB@autosnap_2025-01-08_02:12:06_hourly

# Sync initially to backup
syncoid --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1

# Destroy something on the source dataset to simulate pruning
zfs destroy -r tank/ds1@autosnap_2025-01-08_02:12:06_monthly

# Create a new snapshot on source
zfs snapshot -r tank/ds1@test

Now to compare behaviors.

syncoid --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1

Results in:

backup/ds1
backup/ds1@autosnap_2025-01-08_02:12:06_weekly
backup/ds1@autosnap_2025-01-08_02:12:06_daily
backup/ds1@autosnap_2025-01-08_02:12:06_hourly
backup/ds1@test
backup/ds1/childA
backup/ds1/childA@autosnap_2025-01-08_02:12:06_weekly
backup/ds1/childA@autosnap_2025-01-08_02:12:06_daily
backup/ds1/childA@autosnap_2025-01-08_02:12:06_hourly
backup/ds1/childA@test
backup/ds1/childB
backup/ds1/childB@autosnap_2025-01-08_02:12:06_weekly
backup/ds1/childB@autosnap_2025-01-08_02:12:06_daily
backup/ds1/childB@autosnap_2025-01-08_02:12:06_hourly
backup/ds1/childB@test

I gained the @test snapshot, but I lost @autosnap_2025-01-08_02:12:06_monthly, the one I deleted on the source (tank).

# Add a source snapshot
zfs snapshot -r tank/ds1@test-again

# Delete a source snapshot
zfs destroy -r tank/ds1@autosnap_2025-01-08_02:12:06_weekly

# And sync again
syncoid --no-rollback --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1

Results in:

backup/ds1
backup/ds1@autosnap_2025-01-08_02:12:06_weekly
backup/ds1@autosnap_2025-01-08_02:12:06_daily
backup/ds1@autosnap_2025-01-08_02:12:06_hourly
backup/ds1@test
backup/ds1@test-again
backup/ds1/childA
backup/ds1/childA@autosnap_2025-01-08_02:12:06_weekly
backup/ds1/childA@autosnap_2025-01-08_02:12:06_daily
backup/ds1/childA@autosnap_2025-01-08_02:12:06_hourly
backup/ds1/childA@test
backup/ds1/childA@test-again
backup/ds1/childB
backup/ds1/childB@autosnap_2025-01-08_02:12:06_weekly
backup/ds1/childB@autosnap_2025-01-08_02:12:06_daily
backup/ds1/childB@autosnap_2025-01-08_02:12:06_hourly
backup/ds1/childB@test
backup/ds1/childB@test-again

The new @test-again snapshot appears, and @autosnap_2025-01-08_02:12:06_weekly, which was destroyed on the source, has not been destroyed on the target.

You just discovered why syncoid -r walks the tree of datasets manually, rather than using ZFS recursion. Because it gets REAL screwy when you add or remove datasets to a tree you’ve already been snapshotting and replicating using zfs-native recursion.

If you want to stop having weird issues when you add or remove datasets, stop using --sendoptions=R and start just using -r.

I would generally recommend recursive=yes rather than recursive=zfs in sanoid.conf, for the same reasons.
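
Concretely, that would mean something like (same dataset names as above):

syncoid -r --no-sync-snap tank/ds1 backup/ds1

and, in sanoid.conf:

[tank/ds1]
        use_template = keepShort
        recursive = yes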

Thanks!

Why does --no-rollback “solve” my issue here? Simply by removing the -F flag from the zfs receive command?

Just to follow up. I’ve switched to using syncoid -r in place of syncoid --sendoptions="R". All is working.

Digging back in my notes I figured out why I was using --sendoptions="R" in the first place. When setting up the remote dataset initially, I noticed that syncoid -r was not preserving the encryptionroot property on the initial send.

Eventually, I landed on this GitHub issue: syncoid with recursive raw send do not maintain encryptionroot · Issue #614 · jimsalterjrs/sanoid

you need to redo the initial replication to preserve the encryption root with the -R option

When I did the initial replication with --sendoptions="R" then the encryptionroot property was properly set on the target dataset. And then I left it in the command forevermore.
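
For reference, the property can be checked on the target with:

zfs get -r encryptionroot backup/ds1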

I am still using recursive=zfs in my sanoid.conf because, as I recall (but of course failed to document), I was getting syncoid errors when letting sanoid take snapshots its own way with recursive=yes. I’m not noticing any downside to recursive=zfs… so letting that one be.

I think that one is fairly safe. The thing you want to watch out for is what happens after you add a dataset as a child of a dataset you’ve already been snapshotting zfs-recursively, then after you’re sure of THAT find out what happens after you’ve DESTROYED a dataset which was part of a tree being snapshotted recursively.

I always have trouble remembering the exact sequence of events required to make things get screwy. But I never have much trouble FINDING that sequence again, if I actually go looking for it, if you follow my drift. :slight_smile:

what happens after you add a dataset as a child of a dataset you’ve already been snapshotting zfs-recursively

I have done this one.

  • Initial replication was done with --sendoptions="R". I didn’t bother testing syncoid -r with it to see whether the new child dataset would send properly with all its properties.
  • The snapshot policy of recursive=zfs means sanoid doesn’t “backfill” all of the snapshots to match the policy. So my sanoid --monitor-health scripts get mad for a while because they think I’m missing weekly/monthly/etc. snapshots per the policy in sanoid.conf.

Found this, which explains it is by design:

A terrible bodge was: zfs snapshot tank/ds1/newChild@autosnap_..._monthly. But then syncoid --sendoptions="R" threw an error because the naming of this snapshot didn’t match the parent and the other children. I suppose I could have picked the same name, but the date in the name of the snapshot would have been a lie.

find out what happens after you’ve DESTROYED a dataset which was part of a tree being snapshotted recursively

Hmm. Gotta test this one.
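
(Roughly, I’m thinking that test will look like this, reusing the datasets from above; the @after-destroy name is just for illustration:)

# destroy a child that has been part of the recursively-snapshotted tree
zfs destroy -r tank/ds1/childB

# take a new recursive snapshot, replicate again, then see what happened on the target
zfs snapshot -r tank/ds1@after-destroy
syncoid --sendoptions="Rw" --no-sync-snap tank/ds1 backup/ds1
zfs list -H -r -t all -o name backup/ds1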

Thanks for the advice!

Oh good, I DID document it somewhere at some point! :rofl: