How do I prune syncoid snapshots? Sanoid doesn't prune them

raidz99 · April 18, 2024, 10:41pm

Hi,

I’m using syncoid for replication (from host pve1 to pve2 and from pve1 to bak1).
To prune old snapshots, I set up sanoid, but I must have made a mistake that I fail to see…

On pve2, I have many unexpected syncoid snaps for bak1, and vice versa.

How do I prune syncoid snapshots?

Detailed description of my setup.

Via cron I run on pve1:

syncoid --quiet --identifier=pve2 --no-privilege-elevation --recursive rpool/homes syncoid@pve2:dpool/pve1/homes

syncoid --quiet --identifier=bak1 --no-privilege-elevation --recursive rpool/homes syncoid@bak1:dpool/pve1/homes

Currently, I have thousands of snapshots on each replication destination that belong to the other destination, i.e. on pve2 I have many for bak1 and on bak1 many for pve2.

Example:

On pve2:
1 x syncoid_pve1-pve2_2024-04-17:20:05:14-GMT00:00
10,000 x syncoid_pve1-bak1_2024-04-17:20:05:16-GMT00:00

on bak1:
1 x syncoid_pve1-bak1_2024-04-17:20:05:16-GMT00:00
10,000 x syncoid_pve1-pve2_2024-04-17:20:05:14-GMT00:00

(of course, date and time differ for each)
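
To get an overview like the one above without counting by hand, something along these lines might help. It is only a sketch: count_snap_prefixes is a made-up helper name, and it assumes every snapshot name carries a trailing _YYYY-MM-DD… timestamp as in the examples above (so all autosnap types end up under one "autosnap" line).

```shell
#!/bin/bash
# Summarise snapshot names read from stdin by prefix: the part after "@"
# with the trailing "_YYYY-MM-DD..." timestamp removed.
# count_snap_prefixes is a hypothetical helper, not part of sanoid or syncoid.
count_snap_prefixes() {
    sed -E -e 's/^[^@]*@//' -e 's/_[0-9]{4}-[0-9]{2}-[0-9]{2}.*$//' \
        | sort | uniq -c | sort -rn
}

# Intended use on a replication target (dataset name taken from the cron lines above):
#   zfs list -H -t snapshot -o name -r dpool/pve1/homes | count_snap_prefixes
```

The function only reads names from stdin and never calls zfs itself, so it can be tried on a saved listing first.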

On target pve2, when I run sanoid --prune-snapshots --verbose --debug, I get only 63 snapshots in total, so the syncoid snapshots (to bak1) are apparently not considered:

Filesystem rpool/pve1/homes has:
     63 total snapshots (newest: 0.7 hours old)
          36 hourly
              desired: 36
              newest: 0.4 hours old, named autosnap_2024-04-18_20:00:12_hourly
          25 daily
              desired: 30
              newest: 21.3 hours old, named autosnap_2024-04-18_00:00:14_daily
          2 monthly
              desired: 3
              newest: 429.3 hours old, named autosnap_2024-04-01_00:00:13_monthly

I took a look at the source code and saw my ($snaptype) = ($snapname =~ m/.*_(\w*ly)/); which could mean that only snapshots whose type ends in “ly” are considered, and if ($snapname =~ /^autosnap/) seems to restrict processing to “autosnap” snapshots only. But so many sources say sanoid is to be used to manage syncoid snapshots that I think I must be missing something?!
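
To check what the two patterns quoted above would and would not match, they can be re-applied in bash to names from this post. This is a rough sketch only: matches_sanoid_patterns is a made-up helper, \w is approximated with [[:alnum:]_], and it tests nothing but these two regular expressions, not the rest of sanoid's logic.

```shell
#!/bin/bash
# Approximates the two Perl patterns quoted above in bash ERE:
#   /^autosnap/  and  /.*_(\w*ly)/
# matches_sanoid_patterns is a hypothetical helper, not sanoid code.
# Prints the captured snapshot type and returns 0 if both patterns match,
# returns 1 otherwise.
matches_sanoid_patterns() {
    local snapname=$1
    local prefix_re='^autosnap'
    local type_re='.*_([[:alnum:]_]*ly)'
    [[ $snapname =~ $prefix_re ]] || return 1
    [[ $snapname =~ $type_re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Example calls with names from this post:
#   matches_sanoid_patterns 'autosnap_2024-04-18_20:00:12_hourly'
#   matches_sanoid_patterns 'syncoid_pve1-bak1_2024-04-17:20:05:16-GMT00:00'
```

If my reading of those two lines is right, a syncoid_… name already fails the first test, which would fit the 63 snapshots reported above.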

How do I prune syncoid snapshots?

Any hint appreciated!

my sanoid.conf:

[rpool/pve1/homes]
  use_template = prune_prod
[template_prune_prod]
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        daily_hour = 23
        daily_min = 59
        autosnap = no
        autoprune = yes

HankB · April 19, 2024, 12:31am

Does the discussion in this thread and the previous thread that I linked help answer your question?

raidz99 · April 19, 2024, 3:34am

Thanks for your help!

I already read the other text, but I thought it was a different situation (A->C->D), and I don’t use --no-sync-snap, but mostly default values. If I understood correctly, --no-sync-snap causes the issue of accumulated snaps on D? I had assumed it would prevent it (given that sanoid runs correctly).

In the thread you linked, a sanoid configuration was posted at the end. Other sources also agree that sanoid must be used on each syncoid receiver; pruning is said to be the task of sanoid, not syncoid. And so I did. I think my sanoid configuration should be quite correct, as autosnaps are in fact pruned…

So I’m afraid unfortunately this does not fully answer my question, or I fail to get the answer.

HankB · April 19, 2024, 1:55pm

Here’s the situation as I understand it. Hopefully someone will correct me if I’ve got it wrong.

1) sanoid will prune snapshots created by sanoid only and will do so on the host on which they are created or on a host that receives a dataset with sanoid snapshots.
2) syncoid will manage snapshots it creates on the host it creates them on as well as the destination host (when --no-sync-snap is not used).
3) syncoid will transfer all snapshots from the initial snapshot to the final snapshot when it transfers a dataset.

I stopped using --no-sync-snap because if syncoid was not run for a while, sanoid could prune the previous ‘to’ snapshot on the source and then the entire dataset would need to be sent.

A consequence of 2) and 3) is that when I have an A → B → C backup configuration, syncoid snapshots resulting from the A → B transfer are copied to C and are not managed by syncoid or sanoid.

Following is the shell script I’m using to manage this. Use at your own risk, of course. It only manages syncoid snapshots. sanoid and ad hoc snapshots are left unmolested.

hbarta@olive:~/Programming/shell_scripts/syncoid$ cat purge-foreign-syncoid-snaps.sh
#!/bin/bash

# Purge foreign syncoid snapshots that result from pulling
# snapshots created on remote hosts by pulling snapshots there.
# AKA What a tangled web we weave!
#
# Only valid snapshots include the string "syncoid_$(hostname)"
# 

#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
#
# DANGER WILL ROBINSON - DO NOT USE THIS IF YOU ALSO USE --no-sync-snap
#
# With that option, syncoid will use foreign snaps instead of creating its own.
#
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!

Usage="Usage: purge-foreign-syncoid-snaps.sh pool [pool] ... [pool]"

# Print usage and stop when no pool was given
if [ $# -eq 0 ]
then
    echo "$Usage"
    exit 1
fi

echo "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ start  " "$(/bin/date +%Y-%m-%d-%H%M)"

for pool in "$@"
do
    echo "Checking $pool"
    for snap in $(/bin/zfs list -t snap -o name -H -r "$pool"|/bin/grep syncoid|/bin/grep -v "syncoid_$(hostname)")
    do
        echo "destroying $snap"
        /bin/zfs destroy "$snap"
    done
done

echo "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ finish " "$(/bin/date +%Y-%m-%d-%H%M)"
echo
hbarta@olive:~/Programming/shell_scripts/syncoid$

mercenary_sysadmin · April 19, 2024, 3:25pm

HankB’s answer is correct. The script he offered is “correct enough for most people,” but will malfunction on pools that have spaces in the names of datasets or zvols; it might have other potential name encapsulation issues but I haven’t checked that closely.

If you’re certain you won’t ever have weird shit in your dataset/zvol/snapshot names, HankB’s script is fine as-is, from what I can see at a quick look.

Topslakr · April 19, 2024, 4:17pm

I use --no-sync-snap where I can, but I do have situations where I prefer to create a syncoid snap at the moment of replication. To keep that tidy, I run the following one-liner to clean up old syncoid snaps that have outlived their usefulness.

zfs list -t snapshot | grep syncoid | cut -d " " -f 1 | xargs -L 1 zfs destroy

Curious how risky this one is

mercenary_sysadmin · April 19, 2024, 4:34pm

Neither more nor less dangerous, from what I can see. Example breakage:

root@elden:/tmp# zfs list -rt all demopool ; echo
NAME                                          USED  AVAIL     REFER  MOUNTPOINT
demopool                                      184K   832M       24K  /demopool
demopool/this dataset has spaces in            24K   832M       24K  /demopool/this dataset has spaces in
demopool/this dataset has spaces in@syncoid     0B      -       24K  -

root@elden:/tmp# zfs list -rt snapshot demopool | grep syncoid | cut -d " " -f 1 | xargs -L 1 zfs destroy
cannot open 'demopool/this': dataset does not exist

quartsize · April 19, 2024, 5:13pm

I think what you’re going for is closer to zfs list -t snapshot -o name | grep syncoid | xargs -d"\n" ..., but that really you’d want zfs list to be able to produce null-delimited output so you could use -0 instead.
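
Spelled out a bit more, the newline-delimited variant could look like the sketch below. It assumes GNU xargs (for -d and -r), and for_each_syncoid_snap is a made-up helper name. The grep is just as loose as in the one-liner above, so it also matches datasets that merely have "syncoid" somewhere in their name.

```shell
#!/bin/bash
# Read snapshot names (one per line) on stdin, keep those containing
# "syncoid", and pass each one to the given command as a single argument.
# -d '\n' splits on newlines only, so spaces and quotes stay literal;
# -r skips the command entirely when nothing matched (both are GNU xargs options).
# for_each_syncoid_snap is a hypothetical helper name.
for_each_syncoid_snap() {
    grep syncoid | xargs -r -d '\n' -n 1 "$@"
}

# Dry run against a live system (prints the commands instead of running them):
#   zfs list -H -t snapshot -o name | for_each_syncoid_snap echo zfs destroy
```

Dropping the echo turns the dry run into the real thing, so it is worth reading the printed list first.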

Topslakr · April 19, 2024, 5:47pm

This has been fun! I updated the one-liner to escape the spaces, though putting the snapshot name in quotes seems also to work.

zfs list -H -o name -t snapshot | grep 'syncoid' | sed 's/ /\\ /g' | xargs -n 1 zfs destroy -v

Its output:
will destroy Storage/demo pool@syncoid-test1

mercenary_sysadmin · April 19, 2024, 5:57pm

When scripting this kind of thing in Bash, the best practice generally is to use a safer loop structure such as while...read. I covered this for Ars Technica a few years ago; be sure to check out the final section of that article, because I specifically covered the challenge of iterating through OpenZFS snapshots!

Be warned, this is still an approach with arcane pitfalls (such as setting IFS properly). They can be navigated, but you still can’t be sloppy about it, or you’re going to leave holes. Bash just is not a great language for working with this kind of data; if you’ve got the tolerance for it, something like… ahem… Perl… is generally a lot easier to work with safely.
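
For illustration, a minimal while...read sketch of that pattern might look like this. It is not the code from the article; each_snapshot and somepool are made-up names, and it assumes snapshot names never contain newlines.

```shell
#!/bin/bash
# Run a command once per snapshot name read from stdin (one name per line).
# IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
# Only snapshots whose snapshot part starts with "syncoid_" are passed on.
# each_snapshot is a hypothetical helper name.
each_snapshot() {
    local snap
    while IFS= read -r snap; do
        case $snap in
            # </dev/null stops the command from eating the loop's input
            *@syncoid_*) "$@" "$snap" </dev/null ;;
        esac
    done
}

# Dry run ("somepool" is a placeholder):
#   zfs list -H -t snapshot -o name -r somepool | each_snapshot echo zfs destroy
```

Because the name is always passed as one quoted argument, spaces in dataset names arrive intact at the command.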

quartsize · April 19, 2024, 6:28pm

It’s hard to be sure you’re properly handling all the possibilities with programmatic string-escaping schemes. I don’t remember if ZFS names are allowed to contain backslashes, but that’s not something you have to concern yourself with if you instead use a method which splits the input into arguments more straightforwardly.

raidz99 · April 20, 2024, 12:45pm

Could it be that zfs list needs a -print0-like feature for safe use with xargs?

(However hopefully nobody ever uses line feeds or such in names, even if it would be allowed :))

mercenary_sysadmin · April 20, 2024, 3:35pm

Have you tested to see whether quotation marks themselves are valid characters in the openzfs namespace (IIRC, they are) and if so, what effect that has on your script?

You also have to watch out potentially for things like semicolons and other special characters. I don’t recall immediately which potentially problematic characters are or are not possible to embed in the namespace, and if you’re the only one taking snapshots on your system and you’re sure you simply won’t make dumb names, okay… But if there’s the potential that your script might ever get run on a system where you AREN’T the only one who might have named a dataset or zvol or snapshot, now you have to worry about the Bash equivalent of Little Bobby Tables.

inthebrilliantblue · April 27, 2024, 6:11pm

I’m a bit late to this party, but I would like to link to a script I use for removing old zfs snapshots. It is pretty simple to set up in a crontab.