How to install Sanoid (specifically cron)


Currently I am using ZFS snapshots and sending/receiving them to a backup server with a bash script I run by hand. I am interested in a more automatic solution and would like to use sanoid/syncoid for this role.

I have been on the lookout for a complete guide on how to do this; so far I am looking at the repository on GitHub (sanoid/ at master · jimsalterjrs/sanoid · GitHub) and the article “Avoiding data disasters with Sanoid”.

I am confused by what I need to put in my crontab. The article says:

* * * * * /usr/local/bin/sanoid --cron

and the repo says:

*/15 * * * * root flock -n /var/run/sanoid/cron-take.lock -c "TZ=UTC sanoid --take-snapshots"
*/15 * * * * root flock -n /var/run/sanoid/cron-prune.lock -c "sanoid --prune-snapshots"

I was just wondering which was correct?

Any help would be greatly appreciated,



The more complicated form on GitHub separates taking and removing snapshots. It’s a bit tidier.

All of this assumes you aren’t running a systemd distro, or that you really want cron… The systemd timer method (higher up the same page) works nicely.

Either works. The simpler form sanoid --cron is what I (I’m the founding developer) use on my own systems.

The more complicated approach you’re referring to splits the “taking fresh snapshots” and “destroying stale snapshots” phases into discrete cron jobs, but if you’re not doing something else that needs to happen attached to one of those phases but not the other, there’s little or no point I can see in doing so.

The flock calls in that example are also unnecessary. This is an attempt to keep multiple sanoid processes from running at the same time, but sanoid already does its own internal filesystem locking for the exact same reason.
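To make the flock behavior concrete, here is a standalone sketch (not part of sanoid itself; the lock path and timings are arbitrary demo values) showing what `flock -n` does: the second caller fails immediately instead of queuing behind the first.

```shell
#!/bin/sh
# Demo of flock -n semantics; the lock file path is arbitrary.
LOCK=/tmp/flock-demo.lock

# Hold the lock in the background for a couple of seconds.
flock -n "$LOCK" -c "sleep 2" &
sleep 0.5

# While the lock is held, a second non-blocking attempt exits nonzero
# instead of waiting -- this is what keeps cron jobs from stacking up.
if flock -n "$LOCK" -c "true"; then
    echo "lock acquired"
else
    echo "lock busy"
fi
wait
```

Sanoid’s internal locking achieves the same end without the wrapper, which is why the flock calls in the sample crontab are belt-and-suspenders.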

Fantastic, I used the native Ubuntu package and everything is going swimmingly, and I didn’t need to configure cron (yay).
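For anyone following along: whether you use the package’s timer or cron, sanoid still reads its snapshot policy from /etc/sanoid/sanoid.conf. A minimal sketch (the dataset name and retention counts here are hypothetical examples, not from this thread):

```
# /etc/sanoid/sanoid.conf -- minimal policy sketch.
# "tank/data" is a placeholder; use your own pool/dataset.
[tank/data]
        use_template = production
        recursive = yes

[template_production]
        frequently = 0
        hourly = 24
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes
```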


The flock calls in that example are also unnecessary

I tried without the flock calls and I got mails with warnings like

ControlSocket /tmp/syncoid-syncoid-recv@pve2-node-3-1711757401 already exists, disabling multiplexing
cannot resume send: incremental source 0xa30bc7ed66078c4a no longer exists
cannot receive: failed to read from stream

(My setup is slightly more complicated and I have an “A->B, A->C” scheme)

I think this happens when one job is still running when cron (scheduled every 5 minutes) starts a second “instance”. I think the same would happen in a simpler setup (the second instance would generate some kind of output).

Am I doing something wrong or is flock a good idea in such cases?

Well, for one thing, you’re talking about syncoid, not sanoid. Syncoid can’t easily just say “only run one instance of me at a time” because it’s extremely common to need parallel replication jobs running.

In your case, you’re using flock to prevent the exact same syncoid run from being instantiated twice. That’s fine; it does in fact stop the second instance from firing up, as you can see from the absence of errors in your logs. But the only real effect is suppressing the log messages, which were just telling you the same thing: you can’t actually run two of those processes at once, so the second attempt gets killed off before it can even begin.

Technically, the flock call is a bit more efficient than letting zfs itself figure out that the second receive process isn’t allowed. But I can’t imagine a scenario where the difference is significant.

The best argument for using flock in your case is that you’re aware that your scheduling is primitive and may result in unwanted parallelism, and since it’s a thing you know you did, it’s a thing you are also mitigating yourself instead of relying on upstream tooling to do it for you. This also means that even if the upstream mitigations fail for some reason, yours will still prevent the unwanted parallelism despite the upstream failures.
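Concretely, a flock-wrapped syncoid entry in /etc/cron.d might look like this (the lock path, dataset, and remote host here are made-up placeholders, not taken from the thread):

```
# /etc/cron.d/syncoid -- sketch only; names are hypothetical.
# flock -n: skip this run entirely if the previous one still holds
# the lock, rather than queuing a second replication behind it.
*/5 * * * * root flock -n /var/run/syncoid-data.lock -c "syncoid tank/data backup@backuphost:backup/data"
```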

The best argument against using flock in your case is that it’s not actually trapping any errors that weren’t already planned for and handled upstream, and therefore re-trapping them yourself makes your crontab slightly more difficult for humans to read and comprehend, without making it significantly better on the machine level.

Either argument is valid and sufficient, so pick whichever one you like better and pat yourself on the back for doing the right thing. Whichever thing that turns out to be. :cowboy_hat_face:


Ohh, thank you so much for your detailed clarification!
You explained it very well; I think I’ve got the details now.
Yes indeed, missing errors is always a potential issue, for example if my cron job script were to hang for whatever reason… and the zillion other things that could go wrong.