Setting up syncoid for offsite backup

Hello.

I’ve been lurking here for a while and haven’t quite been able to figure out what I’m doing. As a ZFS neophyte, I’m just trying to wrap my head around using sanoid/syncoid. But I think I’ve been using too many disparate sources and recommendations, and I find myself more confused than when I started.

I have several servers for both personal and business use and, after listening to 2.5 Admins for several years, finally made the leap to ZFS when rebuilding my primary server last year. However, I never got around to really taking advantage of all its benefits. I think that last night I was finally able to get sanoid running automatic snapshots on my test server.

The part I have confused myself on is getting syncoid set up to push (or pull??) those to/from my offsite backup.

I’ve found numerous blog posts of people doing it, as well as numerous partially related topics here; but I feel like I’ve walked away more uncertain about the process than anything. I’m comfortable with the Linux CLI, but know very little about ZFS.

Can anyone recommend a good guide for getting syncoid set up?

There are a few if you Google around.

However, here’s the bare bones of it.

Sanoid = snapshot management. This should be running on both sides, but you want to disable autosnap on the backup host.

Syncoid = replication tool. GENERALLY you want to run this as a cron job from the backup host, and do a pull style replication.

The documentation on GitHub is pretty good for getting it up and running: see INSTALL.md in the jimsalterjrs/sanoid repository.

The wiki also has some example templates you can use as a base:

Though, as I learned recently, be careful about trailing spaces in /etc/sanoid/sanoid.conf! :joy:

The basics are as follows, assuming you want to run syncoid on the target pulling from the source (pull backup) as opposed to running it on the source pushing to the target (push backup):

  • Generate an SSH key on the target. For now, just generate one for root. Later, when you’re feeling fancy, you can set up ZFS delegation and use an unprivileged user account, but right now it sounds like you need to see a W as soon as possible.

  • Copy the pubkey from the target to the source. Place that pubkey in the source’s /root/.ssh/authorized_keys file. If you’re utterly unfamiliar with this, this can be as simple as literally just putting the pubkey in /root/.ssh and naming it authorized_keys. Later, if you need more than one key to work, you just put extra keys in the same file after the first one, each on its own line. There is no additional syntax to worry about.

  • Now, make sure your SSH key sharing worked: you should be able to ssh root@source from your target machine and immediately get a prompt. If not, stop here; you need to get this part right before going any further.

  • Once you’ve got SSH working using keys, you’re ready to roll: syncoid -r root@source:sourcepool/dsname targetpool/dsname will replicate the dataset sourcepool/dsname and all children from the source machine to targetpool/dsname (and so forth) on the target system.
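For reference, the steps above might look something like this as a script run as root on the target. Treat it as a sketch only: the ed25519 key type, the key path, and the names source, sourcepool/dsname and targetpool/dsname are placeholders or assumptions on my part, and by default it just prints what it would do (set DRY_RUN=0 to actually run the commands).

```shell
#!/bin/sh
# Sketch of the pull-backup setup, run as root on the TARGET (backup) host.
# All names are placeholders; nothing is executed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
SOURCE_HOST="${SOURCE_HOST:-source}"
KEY="${KEY:-/root/.ssh/id_ed25519}"

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Generate a key pair for root on the target (skipped if one exists).
[ -f "$KEY" ] || run ssh-keygen -t ed25519 -N "" -f "$KEY"

# 2. Append the contents of $KEY.pub to /root/.ssh/authorized_keys on the
#    source by whatever means you have (copy/paste works fine).

# 3. Confirm key login works before going any further.
run ssh "root@$SOURCE_HOST" true || {
    echo "SSH key login failed; fix that first" >&2
    exit 1
}

# 4. Replicate the dataset and all of its children.
run syncoid -r "root@$SOURCE_HOST:sourcepool/dsname" targetpool/dsname
```

Step 2 is left as a comment on purpose, since how you get the pubkey across depends on what access you already have to the source.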

Essentially, syncoid makes ZFS replication look and feel like a much faster version of rsync. Note that in addition to the datasets themselves, you get all snapshots which were taken on the source present on the target as well.

ALSO note, you should be running Sanoid on the target as well as the source. On the target, though, the module referring to replicated-in datasets should use the backup or hotspare templates, not the production template. This is because you should not attempt to take snapshots locally of datasets you routinely replicate in from a different source. At best, incoming replication wipes out all the snapshots you took locally (it has to, this is part of how replication works); at worst, you wind up with same-name but different-GUID snapshots that don’t match and can’t be used as replication targets, and you have to figure out how to unsnarl a confusing mess. :slight_smile:
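To make that concrete, here is a rough sketch of what a backup-side stanza could look like. It writes the example to a scratch file (a made-up path under /tmp) rather than touching /etc/sanoid/sanoid.conf, targetpool/dsname is a placeholder, and the retention numbers are purely illustrative. The part that matters is autosnap = no.

```shell
#!/bin/sh
# Writes an example stanza to a scratch file for review; it does NOT touch
# /etc/sanoid/sanoid.conf. The path and dataset name are placeholders.
CONF_SNIPPET="${CONF_SNIPPET:-/tmp/sanoid-backup-example.conf}"

cat > "$CONF_SNIPPET" <<'EOF'
[targetpool/dsname]
    use_template = backup
    recursive = yes

[template_backup]
    autoprune = yes
    frequently = 0
    hourly = 30
    daily = 90
    monthly = 12
    yearly = 0
    ### snapshots arrive via replication, so never take them locally
    autosnap = no
EOF

cat "$CONF_SNIPPET"
```

With a stanza like this, Sanoid on the target still prunes old replicated snapshots but never creates its own.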

If you have further questions, ask: but try to get through the SSH key sharing bits and confirm that you can SSH from target to source successfully first, please.

Thanks @bladewdr and @mercenary_sysadmin.

Yes, I did that in combination with using the GitHub docs. But I think I found some poor quality results and when attempting to use those to better understand the docs, I screwed things up and confused myself slightly.

Apologies for the delay, I was away for the weekend.

Yes, you are correct. I want to run syncoid on the target and pull from the source and yes, ideally I would like to get it backed up sooner rather than later. I already have SSH keys set up on both boxes, and I’m also running Tailscale on both, with Tailscale SSH enabled and set up. My offsite backup is located at the house of a friend who runs Windows and has minimal security skills, which is why I wanted to use Tailscale and keep things as locked up as possible. I may eventually use encrypted backups, but for now I wanted to understand how to set things up and use it correctly before tackling that.

So, is this easier to run as root vs running as a user with sudo privileges? Or is it easiest to run as root until I set up ZFS delegation? At the moment, I have root login disabled on both boxes. I can just re-enable it in my Tailscale ACL, but wanted to make sure first.

Seems like it shouldn’t be too difficult. As I understand what you both have shared, in the simplest terms I need to do the following:

  • Set up sanoid and syncoid on target box with a backup template
  • Set up cron job to run syncoid on target box
  • Set up ZFS delegation

I can confirm that I am able to SSH from target to source both as user and root (I re-enabled it in the aforementioned ACL).

My recommendation? Get this set up and working in some test VMs first, before deploying it with critical data.

It’s incredibly simple to make a pool of sparse files, and you can use that to test doing send-receive with Syncoid. In fact, that’s exactly how I tested things.

for i in {0..3}; do truncate -s 5T /tmp/$i.raw; done

zpool create -o ashift=12 testpool raidz1 /tmp/*.raw

From there I usually copy in some junk data; cp-ing some log files from /var/log is usually my go-to.

There, now you have a pool to play with on both sides, without needing actual hardware.
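If you want to go one step further and try the replication itself on a single machine, something along these lines might do. It's a sketch under assumptions: the pool names testsrc and testdst, the /tmp/zfstest directory and the 1G file size are all made up, and it needs root plus ZFS and syncoid installed, so it's wrapped in functions you would call deliberately.

```shell
#!/bin/sh
# Throwaway replication test on a single machine. Must be run as root with
# ZFS and syncoid installed. Pool names and paths are made-up placeholders.
replication_smoke_test() {
    dir="${1:-/tmp/zfstest}"
    mkdir -p "$dir" || return 1

    # Two small sparse backing files, one per pool.
    truncate -s 1G "$dir/src.raw" "$dir/dst.raw" || return 1

    zpool create testsrc "$dir/src.raw" || return 1
    zpool create testdst "$dir/dst.raw" || return 1

    # Some junk data plus a snapshot to replicate.
    zfs create testsrc/data || return 1
    cp /var/log/*.log /testsrc/data/ 2>/dev/null || true
    zfs snapshot testsrc/data@test1 || return 1

    # Local-to-local replication; testdst/data must NOT exist beforehand.
    syncoid -r testsrc/data testdst/data || return 1

    # Show what arrived on the "backup" pool.
    zfs list -t snapshot -r testdst
}

cleanup_smoke_test() {
    dir="${1:-/tmp/zfstest}"
    zpool destroy testsrc
    zpool destroy testdst
    rm -f "$dir/src.raw" "$dir/dst.raw"
}
```

Call replication_smoke_test as root, poke around, then cleanup_smoke_test to throw both pools away again.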

Hmmmm, something didn’t go correctly here. I probably should have done what @bladewdr said in the previous reply.

I assumed (and should have asked) that this was supposed to be run from the target machine. In other words, the remote backup box. Is that correct?

When I first ran syncoid -r root@source:sourcepool/dsname targetpool/dsname (and yes, I ran it with the name of my source and pool names and target names and not the example given), I got the following:

CRITICAL ERROR: Target backups/server exists but has no snapshots matching with my-pool/vm!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.

          NOTE: Target backups/server dataset is < 64MB used - did you mistakenly run
                `zfs create backups/server` on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.

So, I deleted the pool I had created and ran syncoid again.

This time, I got the following error:

WARN: ZFS resume feature not available on target machine - sync will continue without resume support.
INFO: Sending oldest full snapshot my-pool/vm@autosnap_2024-06-27_12:57:01_monthly (~ 265.9 GB) to new target filesystem:
cannot open 'backups': dataset does not exist
cannot receive new filesystem stream: unable to restore to destination
64.0KiB 0:00:00 [ 189KiB/s] [>                                                                                                              ]   0%
mbuffer: error: outputThread: error writing to <stdout> at offset 0x17000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
^CCRITICAL ERROR: ssh      -S /tmp/syncoid-root@source-1719867443-5981 root@source ' zfs send  '"'"'my-pool/vm'"'"'@'"'"'autosnap_2024-06-27_12:57:01_monthly'"'"' | lzop  | mbuffer  -q -s 128k -m 16M' | mbuffer  -q -s 128k -m 16M | lzop -dfc | pv -p -t -e -r -b -s 285496691432 | sudo zfs receive   -F 'backups/server' failed: 2 at /usr/local/sbin/syncoid line 549.

What did I screw up and what do I need to change?

Did you delete the whole pool, or just the dataset named server that you’d created beneath the pool? Because the only thing you were supposed to destroy was the dataset, not the entire pool above it.

Yes, I deleted the whole pool. However, it was empty. So, is it safe to assume that my error came from deleting the whole pool and not just the dataset?

Newbie decision/mistake. Trying to learn as I go here.

Yes. The actual pool has to exist, because otherwise there’s no target to replicate to. But when it’s an initial replication, the target dataset cannot exist yet, because it has to have the same actual origin as the source dataset.

So, eg:

root@target:~# zpool create targetpool mirror wwn-disk1 wwn-disk2
root@target:~# syncoid -r root@source:sourcepool/stuff targetpool/stuff

NOT:

root@target:~# zpool create targetpool mirror wwn-disk1 wwn-disk2
root@target:~# zfs create targetpool/stuff
root@target:~# syncoid -r root@source:sourcepool/stuff targetpool/stuff

Because if you try to do the latter, well… just like the error message tried to tell you, targetpool/stuff already exists, and does not share any common snapshots with root@source:sourcepool/stuff.
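A quick pre-flight check along these lines can catch both mistakes before an initial replication. This is only a sketch: the function name check_initial_target is made up, and it applies to the very first run only (on later runs the target dataset is supposed to exist).

```shell
#!/bin/sh
# Pre-flight check for an INITIAL replication, run on the target.
# Usage: check_initial_target targetpool targetpool/stuff
check_initial_target() {
    pool="$1"
    dataset="$2"

    # The pool itself has to exist...
    if ! zfs list -H -o name "$pool" >/dev/null 2>&1; then
        echo "pool $pool does not exist: create it first" >&2
        return 1
    fi

    # ...but the dataset must not; the first receive creates it.
    if zfs list -H -o name "$dataset" >/dev/null 2>&1; then
        echo "dataset $dataset already exists: initial replication would refuse" >&2
        return 2
    fi

    echo "ok: $pool exists and $dataset does not"
}
```

For example, check_initial_target targetpool targetpool/stuff should print the ok line right after zpool create and before the first syncoid run.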

Got it. Thank you for the explanation. That makes sense.

@mercenary_sysadmin After your helpful explanation yesterday, I corrected my rookie mistakes and I now have 30 hourlies (is that even a word :thinking:) and 2 monthlies pulled from source onto target. Thank you!

Now I just need to learn how to restore from the zfs backup… :face_with_monocle:

Once I figure that out, I may consider reconfiguring my topology as discussed in the other topic. It also occurred to me that, with compression enabled on my current backup box, that disk will be usable with the layout you suggested for longer than my ignorant self had originally imagined.

Full disaster recovery where like the entire pool got destroyed, or your HBA fried all your disks? You stop the cron job or systemd timer that’s running your automatic backup, and then you zfs send in the other direction :slight_smile:

Just don’t forget to turn it back on when your production is back up.

You can also cherry-pick files out of snapshots if needed, which is what most restore scenarios look like, and that usually doesn’t even require you to touch the backup server. Oh no, Jerry from accounting accidentally overwrote this mission-critical spreadsheet! Just mount the snapshot, copy out the file you need, and done.
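For the full disaster recovery case, sending "in the other direction" could be as simple as running syncoid with the arguments swapped. A minimal sketch, assuming you run it as root on the backup host and that the rebuilt production pool exists but the dataset does not yet. The function name is made up, and the names in the example call are borrowed from earlier in the thread as placeholders.

```shell
#!/bin/sh
# Sketch of a full restore, run as root on the BACKUP host once the rebuilt
# production pool exists again. Host and dataset names are placeholders.
restore_to_production() {
    backup_ds="$1"      # e.g. backups/vm
    prod_host="$2"      # e.g. root@my-server
    prod_ds="$3"        # e.g. my-pool/vm (must not exist yet on production)

    # Same tool, opposite direction: local source, remote target (a push).
    syncoid -r "$backup_ds" "$prod_host:$prod_ds"
}

# Pause the scheduled pull (cron job or systemd timer) before doing this and
# re-enable it afterwards. Example call, commented out on purpose:
# restore_to_production backups/vm root@my-server my-pool/vm
```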

I’ve read others mentioning that as well, particularly on Reddit.

I’ll have to figure out how to do that. I can definitely see that being useful, especially in a scenario like you describe.

I’m sure it’s happened to all of us, besides just Jerry in accounting. :laughing:

Here are three ways of doing it:

  1. you can use the special snapshot directories:

zfs snapshot poolname/dsname@snapname
ls /poolname/dsname/.zfs/snapshot/snapname

  2. or you can mount the snapshot anywhere you’d like, read-only:

zfs snapshot poolname/dsname@snapname
mkdir -p /tmp/snapname
mount -t zfs poolname/dsname@snapname /tmp/snapname
ls /tmp/snapname

  3. or you can mount a writeable clone of the snapshot anywhere you’d like:

zfs snapshot poolname/dsname@snapname
zfs clone poolname/dsname@snapname poolname/tmpclone
ls /poolname/tmpclone
echo WRITEABLE >> /poolname/tmpclone/writeable.txt
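Building on the first option, pulling a single file back out might look roughly like this. It's a sketch: the function name restore_file and the .restored suffix are my own inventions, and it assumes the dataset is mounted at the path you pass in.

```shell
#!/bin/sh
# Copy one file back out of a snapshot via the hidden .zfs directory.
# Usage: restore_file /poolname/dsname snapname path/inside/dataset [dest]
restore_file() {
    mountpoint="$1"
    snap="$2"
    relpath="$3"
    # By default, restore next to the live file instead of overwriting it.
    dest="${4:-$mountpoint/$relpath.restored}"

    src="$mountpoint/.zfs/snapshot/$snap/$relpath"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi

    # -p preserves mode and timestamps; the live file is left untouched
    # unless you explicitly pass it as the destination.
    cp -p "$src" "$dest" && echo "restored $src -> $dest"
}
```

If you went the clone route instead, zfs destroy poolname/tmpclone gets rid of the clone once you're done with it.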

Thanks! That looks easy.

So, I have my user privileges updated and that seems to run fine. I can execute zfs and syncoid commands without needing sudo privileges. I followed the directions on GitHub here and that all seems fine (as far as I can tell).
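For anyone following along, the delegation step looks roughly like the commands printed below. Big caveat: the permission lists are my assumption of a workable set, not the project's official one, so check them against the sanoid/syncoid documentation. The user backupuser and the function name are made up, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Prints (does not run) example `zfs allow` delegations for a pull backup.
# The permission lists are an ASSUMPTION, not the project's official set;
# check them against the sanoid/syncoid docs before running as root.
print_delegation() {
    src_user="$1"; src_ds="$2"     # user and dataset on the source host
    dst_user="$3"; dst_ds="$4"     # user and parent dataset on the target

    echo "# on source:"
    echo "zfs allow -u $src_user send,snapshot,hold,destroy,mount $src_ds"
    echo "# on target:"
    echo "zfs allow -u $dst_user create,mount,receive,rollback,destroy,compression,mountpoint $dst_ds"
    echo "# verify on either side:"
    echo "zfs allow <dataset>"
}

print_delegation serveruser my-pool/vm backupuser backups
```

Running zfs allow with just a dataset name lists what has been delegated on it, which is a handy sanity check before involving syncoid at all.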

My issue is that my cron job is not running. I’ve checked syncoid and the last time it was run (based on the most recent hourlies) was when I did it manually, several days ago. Following is the cron job for my current user on my backup box.

30 0 * * * /usr/local/bin/syncoid --no-privilege-elevation -r serveruser@my-server:my-pool/vm backups/vm

What user is the cron job running as?

As current user on the box. In other words, just the admin user I created when I stood up the system.

I used crontab -e for that user. It also is what shows after crontab -l for the same user.

Should I have done the crontab under a different user?

If you can run the syncoid command manually as that user from the command line, then that’s not the issue. (Can you run the command in your crontab successfully, directly from the shell, as the same user?)

You might want to use --verbose and shell redirection to log its output, if things aren’t working from cron. syncoid --verbose source target >> /var/log/syncoid.log 2>&1 kinda deal (the 2>&1 has to come after the file redirection, or stderr won’t land in the log). Maybe get fancy and echo a timestamp line in first, also from your cron line.
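Since the order of the redirections is easy to get wrong, here is a small sketch of that logging idea as a shell function. With >> file 2>&1 both streams end up in the log, whereas 2>&1 >> file only captures stdout (stderr stays wherever it was pointing, which under cron usually means mail). The function name log_run is made up for illustration.

```shell
#!/bin/sh
# Run a command with a timestamp header, appending stdout AND stderr to a log.
# Usage: log_run /path/to/logfile command [args...]
log_run() {
    logfile="$1"
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') starting: $*"
        "$@"
        rc=$?
        echo "=== exit status: $rc"
    } >> "$logfile" 2>&1    # file first, THEN point stderr at it
    return "$rc"
}
```

From cron you would call a small script containing this, or simply put the two redirections in that same order directly on the cron line.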

Okay, so I’m still screwing something up here. I’ve edited the crontab to try and test it every 5 minutes and every minute. And two things are not happening. Nothing is being output to the log file and the command does not seem to be run. If it matters, I’ve tried setting the owner of the log file to both root and as the current user and still nothing is written there.

For reference, following is the testing cronjob.

* * * * * /usr/local/bin/syncoid --no-privilege-elevation --verbose -r source destination 2>&1 >> /var/log/syncoid.log

Another curious thing: when I check systemctl status cron, cron attempts to run the job for the current user and wants to send the output by email, but discards it because I do not have an MTA installed (and don’t care to install one at the moment). It’s odd that it doesn’t write the output to the log instead.

Edit: To make sure the log was being captured, I tested running syncoid directly with output sent to the log file. This worked correctly, as expected. It wrote to the log file indicating that the most recent hourly was synced. And running zfs list -rt all confirms this as true.
