Keeping a minimum number of snapshots

In my understanding, Sanoid’s intended use-case is to have Sanoid make snapshots and have Syncoid replicate these to a backup target machine. Sanoid also runs on the backup target, but with different policies around number of snapshots to keep (i.e. snapshots kept for a longer time on the target).

In this setup, if something happens to the host machine and no more snapshots are coming in, Sanoid on the backup target machine will keep autopruning the snapshots until the backup disk is empty.

Is there a way to prevent this? To tell Sanoid to keep a minimum number of snapshots? Or to disable autopruning if no snapshots are coming in, or if the master machine has not been heard from?

Sanoid won’t keep pruning snapshots. If you’ve configured it to keep say 7 dailies, it will keep 7 dailies regardless of their age.
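
In sanoid.conf terms, a minimal sketch of that policy might look like this (dataset and template names here are placeholders; any intervals you don’t set fall back to sanoid.defaults.conf):

[pool/dataset]
  use_template = example

[template_example]
        daily = 7
        autosnap = yes
        autoprune = yes

With that in place, sanoid --cron keeps the 7 most recent dailies and prunes the rest, however old or new they happen to be.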

2 Likes

I’ve never thought about it that way. sanoid makes snapshots to meet local needs, like rollback, restoring deleted files, and so on. syncoid will make snapshots as needed to mirror datasets to other destinations. I’m not even sure whether syncoid will use the snapshots that sanoid creates by default, though I’d be surprised if there wasn’t an option to do that.

And yes, sanoid prunes snapshots by count, not by age. If you’re concerned about how it works, I’d suggest implementing it (in parallel to whatever other backup strategy you employ) and seeing how it behaves.

1 Like

You’ve misunderstood.

Let’s say you’ve got a dataset which is configured in sanoid.conf with daily=30. When you run sanoid --cron and it begins looking for stale snapshots to prune, each daily snapshot must meet both of the following criteria before it is pruned:

  • the snapshot to be pruned is more than $daily days old
  • its parent dataset has at least $daily+1 daily snapshots present

So a backup server that stops receiving incoming replication will also stop pruning old snapshots. Even if sanoid --cron runs every minute of every day for ten years on a machine no longer receiving or taking new snapshots, it will never prune another snapshot once the number of dailies drops to 30 (which happens the first time sanoid runs after a successful syncoid run has put a 31st daily on the system).
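
If you want to see that for yourself, a quick check along these lines (assuming sanoid’s default autosnap naming, and substituting your own dataset) counts the dailies currently present:

# count daily autosnaps for pool/dataset and its children
zfs list -t snapshot -H -o name -r pool/dataset | grep -c '_daily$'

Run it before and after sanoid --cron on a box that has stopped receiving replication, and the number should never drop below 30.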

A side note, since you seem new, and this is an extremely common error: make sure backup targets do not use the production template. Target datasets should use either the backup template (if receiving incoming replication daily) or the hotspare template (if receiving incoming replication hourly or more frequently).

The difference is that the production template takes new snapshots locally, whereas the backup and hotspare templates do not: you only get new snapshots on datasets with the backup or hotspare template applied when they’re replicated in from a source.

It’s important not to get that wrong, because if both source and target take an hourly snapshot at the exact same time, syncoid will get confused and try to use the “common” snapshot–which isn’t actually common, it just has the exact same name on both sides–as a basis for replication, which will then fail.
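
In sanoid.conf, pointing a target dataset at the right template is just (dataset name is a placeholder; template_backup and template_hotspare are the ones defined in the shipped example config):

[backuppool/dataset1]
  use_template = backup
  # or: use_template = hotspare, if replication comes in hourly or more often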

4 Likes

Thank you all for your clarifications that Sanoid prunes by count and not by age. This clears up my misunderstanding.

The Sanoid docs do say “How many hourly/daily/monthly/yearly backups should be kept” which, interpreted literally, means exactly what you’ve all said.

I think my confusion came partly from the wording of the autoprune setting: “Should old snapshots be pruned” (emphasis mine). This made me wonder exactly how/when snapshots would become “old” and caused me to think in temporal terms. Further down the page, the examples describe the “production template … will take more frequent snapshots, but not hang on to them as long. My archive template takes fewer snapshots, but will hang onto them longer.” This also made me think in terms of dates and age.

I now understand that snapshots become “old” when the number of them exceeds the number in the hourly/daily/monthly/yearly setting. The template settings don’t cause Sanoid to hang on to snapshots “longer” but rather to keep more of them.

Thank you Jim for reminding me to use the backup or hotspare templates. I was already doing that, but especially with backups it’s good to have reminders! Measure twice, cut once…

I’m not even sure if syncoid will use the snapshots that sanoid creates by default

syncoid replicates all snapshots from source to target.

By default, it also creates a fresh snapshot immediately prior to replication. This serves as a final snapshot so you don’t miss any of the newest data this time around, and as the common snapshot to base incremental replication on next time around.

If you don’t want the replication snapshots, you can use the argument --no-sync-snap and syncoid will only use existing snapshots for replication, rather than creating its own in addition. But either way, you get the whole snapshot stream: syncoid snapshots, sanoid snapshots, manually created snapshots, all the snapshots.
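
For example (hostnames and dataset names here are just placeholders):

# default behaviour: take a sync snap on the source, then replicate everything
syncoid root@prod:tank/data backuppool/data

# rely purely on existing (e.g. sanoid-created) snapshots, no extra sync snap
syncoid --no-sync-snap root@prod:tank/data backuppool/data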

1 Like

Thanks for that clarification.

Am I correct that if I am using syncoid to back up to two different file servers such as:

A -> B
A -> C

I should use --no-sync-snap for one (or both) of these to manage snapshots on the destination? I’m thinking that A->B syncoid snapshots would get copied to C but that syncoid would not manage the A->B snapshots on C?

(My Pi 4B server had 27K snapshots and it was clear that I needed to address this. I’ve got that down to 21K :smiley: )

1 Like

If you’re doing replication to multiple targets from a single source, then yeah, you probably want --no-sync-snap, because otherwise you end up with sync snaps from foreign hosts and nothing in place to destroy them.

The alternative is to just create cron jobs or systemd tasks to destroy the foreign snapshots. If you’re doing pull replication from source A to targets B and C like this:

C<--A-->B

Then you’ll have sync snaps in two forms: @syncoid_B_datestamp and @syncoid_C_datestamp. They’ll wind up on all three hosts, as replication from C<--A picks up snapshots left on A by the B<--A replication, and vice versa.

This isn’t a problem on source host A, because target hosts B and C each destroy their own stale sync snapshots on host A. But it is a problem for B and C, because neither has a connection to the other, so neither will prune copies of its sync snaps on the other.

However, you can do something like this (this is a quick one-off, NOT a suggested full production script):

# list every snapshot, keep only the C sync snaps, and destroy each one
zfs list -t snap | grep syncoid_C | awk '{print $1}' | xargs -I% zfs destroy %

When run on target host B, the above command would find and destroy all of the C sync snaps that have landed on B, which is safe enough since C does not have a direct replication relationship with B. If you want to get fancier, you could have your cleanup scripts on B and C only look for foreign sync snaps older than [ period of time you’d like to keep them ].

Please remember, again, this is a simplified example NOT a complete recommended script. It MIGHT be sufficient as-is on SOME people’s systems, but there are holes in it such as “what happens if you’ve got a dataset with spaces in the name”, which that very simple example command would absolutely cough up a hairball and die on.
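
If you do want the fancier age-based variant, a rough sketch along these lines is one way to approach it (GNU userland assumed, and the same caveats apply, including the spaces-in-names hole):

# destroy foreign C sync snaps older than 7 days; -p makes the creation property an epoch timestamp
cutoff=$(date -d '7 days ago' +%s)
zfs list -t snapshot -H -p -o name,creation | grep 'syncoid_C_' | awk -v c="$cutoff" '$2 < c {print $1}' | xargs -r -n1 zfs destroy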

3 Likes

If I use the same Sanoid templates for both production and backup, but I override my backup dataset to autosnap=no - do I still have the same concern?

My use case is replication of snapshots from production (which snaps hourly, daily, weekly, etc) to backup (which never snaps anything at all, just receives incoming snapshots from production and keeps the same snapshots that production has).

An example config for me would be:

[production/dataset1]
  use_template = my_special_production_template

[backup/dataset1]
  use_template = my_special_production_template
  autosnap = no

[template_my_special_production_template]
        frequently = 0
        hourly = 6
        daily = 14
        weekly = 8
        monthly = 12
        yearly = 1
        autosnap = yes
        autoprune = yes

autosnap=no is the only really crucial difference between the production and backup/hotspare templates. Apart from that, the differences are in how long it takes before they start WARNing or CRITing on snapshot freshness: the backup template only expects to receive new snapshots once daily, and the hotspare template expects them hourly, arriving already up to an hour old.
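
As a rough illustration of that monitoring difference (the hourly_warn / hourly_crit / daily_warn / daily_crit knobs are the ones sanoid consults for its --monitor-snapshots checks; the values below are placeholders, so check sanoid.defaults.conf and the shipped templates for the real numbers and units):

[template_my_backup_like_template]
        autosnap = no
        autoprune = yes
        hourly = 30
        daily = 90
        # relax freshness monitoring to allow for replication lag (placeholder values)
        hourly_warn = 2880
        hourly_crit = 3600
        daily_warn = 48
        daily_crit = 60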

2 Likes

Thanks for the clarification WRT “foreign” syncoid snapshots. I wanted to be sure I wasn’t going to interfere with syncoid. I was cleaning some of them up with something similar to what you suggested, except that I wrote the list to /tmp/snapshot-list first and reviewed it before passing it on. Incidentally, I used zfs list -t snap -H -o name to eliminate the awk step and grep -v "syncoid_$(hostname)" to identify the foreign snapshots.
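
In case it helps anyone else, the combined steps look roughly like this (file name and patterns as described above; the positive grep for @syncoid_ keeps sanoid’s own autosnaps off the kill list):

# build a review list of foreign sync snaps: all sync snaps except this host's own
zfs list -t snap -H -o name | grep '@syncoid_' | grep -v "syncoid_$(hostname)" > /tmp/snapshot-list
# eyeball /tmp/snapshot-list by hand, then feed it to zfs destroy
xargs -n1 zfs destroy < /tmp/snapshot-list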

best,

1 Like