(Author’s note: I have copied over this HOWTO I made in 2020 from ‘that other place’ while making minor edits to bring the links into 2023).
So you have decided you want a physically separate machine that has a local backup of the data on your main ZFS pool.
1. Choose hardware (but carefully)
First question you have to ask yourself is: what hardware do I want to run this on?
There are more options than you can shake a stick at: an Odroid HC4, a Raspberry Pi 4, an old laptop, or a low-power (i3/Celeron) system.
I had a Raspberry Pi 4 lying around, and considering it is a very low power system that will be always-on (and electricity cost here in Denmark is - ahem - high) this seems like a good solution for me.
The Raspberry Pi 4 is the first of the Pis that can effectively support this (Gb Ethernet, USB 3.0), although performance won’t be great - but that is an acceptable trade-off for me. Since the Pis don’t have SATA ports, you will need a USB3-SATA converter/dock (see section 1.4). For the Odroid HC4 and others that have SATA ports, you can plug the disks in directly.
Note that using the earlier Pis (Pi 2, Pi 3, Pi 3B or Pi Zero) is not recommended, as the performance will be appalling due to the 100Mbit/s Ethernet port and the USB2 connection.
1.1 Select Raspberry Pi 4 variant
The Raspberry Pi 4 comes in 3 memory size variants: 1GB, 2GB, and 4GB. The 4GB version is recommended, as ZFS sure likes memory. I recommend getting an enclosure as well as the official 5.25V/3.1A power supply.
1.2 ZFS pool layout
For “USB reasons” (more on that later in section 3) I recommend staying with a simple 2-disk mirror setup.
1.3. Disks.
Pick your favorite brand and disks (or the cheapest-du-jour) that can support your expected (and growing) amount of data.
1.4. USB-SATA docking station.
People’s luck in getting these to work reliably with ZFS appears to vary quite a bit. Mine is a StarTech UNIDOCKU33 - YMMV.
Update: the dock unceremoniously died on me after being powered on for 3 years. The usual failure mode for appliances that die like this is that the electrolytic caps in the power supply dry out (bulge) and fail - and sure enough, same problem here. Replacing the caps with fresh ones didn’t bring it back to life, so it was time for a new dock. @mercenary_sysadmin says that in his experience, SATA docks are not very reliable and will die on you even when only powered/used intermittently.
1.5. uSDXC card size and type for boot/root media.
My installation of the Ubuntu distro uses about 3.5GB of space. I would not recommend less than 16GB, as upgrading between releases can require more space than you realize. Also, having to migrate the system from one uSD card to a higher-capacity one is a PITA - and besides, uSD cards are cheap these days.
If you want good performance, make sure to get an “A1” class card, due to the higher IOPS.
Note that the SD-card bus on the Pi is limited to about 40MB/sec, so paying extra for a super-fast card is generally not worth it. If you want blistering performance, get a USB3 SSD and stick your root on that. I went for a 64GB Samsung Evo Plus card.
1.6. Boot-strapping items.
To install the system, you need to attach a keyboard and mouse and hook HDMI port 1 up to a monitor, so a micro-HDMI cable/adapter is required. Once you have set up the system, you can pack these away.
2. Install 64-bit Ubuntu
2.1. Download Ubuntu for Raspberry Pi
Go to Install Ubuntu on a Raspberry Pi | Ubuntu and download the 22.04.3 LTS desktop version. Note that there are no differences between the Pi 3 and the Pi 4 downloads.
Note: the reason for not choosing the official Raspbian distro is that it is a 32-bit distro, which ZFS doesn’t like very well. This creates complications later.
2.2. Flash the uSDXC card.
Download balena Etcher and flash the uSD card with the distro (this is what I did in 2020). Alternatively, follow the Installation Howto.
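If you prefer the command line over Etcher, writing the image with dd also works. A minimal sketch, assuming you downloaded the .img.xz image to the current directory; the image filename and /dev/sdX are placeholders, so triple-check the device name before running this:
xzcat ubuntu-22.04.3-preinstalled-desktop-arm64+raspi.img.xz | sudo dd of=/dev/sdX bs=4M status=progress conv=fsync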
2.3. Install!
Set up the system to your liking. Remember to update all packages: sudo apt update -y && sudo apt dist-upgrade -y.
Either allow the ‘backup’ user to log in or create a dedicated user for doing backups. I ended up installing the following extra packages:
apt install debhelper diffutils findutils grep gzip hostname libcapture-tiny-perl libconfig-inifiles-perl locate lzop mbuffer ncurses-base openssh-server openssh-sftp-server pv smartmontools ssh-import-id zfs-dkms
2.4. Setup sshd
I like to limit all ssh access to public-key authentication, and to only have the bare minimum list of users in AllowUsers (i.e. backup and my own user; do not add root here).
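For reference, the relevant /etc/ssh/sshd_config lines end up looking something like this (a sketch - ‘myuser’ stands in for your own username):
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
AllowUsers backup myuser
Remember to restart sshd afterwards (sudo systemctl restart ssh).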
3. Install ZFS, sanoid and create the pool
sudo apt install zfs-dkms
This builds the ZFS kernel module via DKMS, and takes a tremendous amount of time on the Pi. Reboot afterwards.
Download and install sanoid from the github repo.
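A rough sketch of what that looks like - check the repo’s install instructions for the authoritative steps, as the layout and the Perl dependencies may differ from what is shown here:
git clone https://github.com/jimsalterjrs/sanoid.git
cd sanoid
sudo cp sanoid syncoid findoid /usr/local/sbin/
sudo mkdir -p /etc/sanoid
sudo cp sanoid.defaults.conf /etc/sanoid/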
Most (all?) USB-SATA docks “hide” the true identity of the SATA disks from the kernel, so it cannot tell the difference between two identical disks. The way to get around this is to online one disk at a time. Online the first, and create a single-disk, non-redundant pool:
zpool create -o ashift=12 tank /dev/sda
Now online the next disk and attach it to the pool to make it a mirror:
zpool attach tank /dev/sda /dev/sdb
Proceed to set attributes etc as recommended by Jim Salter’s blog:
zfs set xattr=sa compression=lz4 atime=off tank
You might also want to set up autoexpand and autoreplace.
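These are pool-level properties, so they are set with zpool set:
zpool set autoexpand=on tank
zpool set autoreplace=on tank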
I like to set up udev so I get nice links in /dev/disk/by-*. This requires very specific incantations in /etc/udev/rules.d/60-disks.rules in the form of templates that allow udev to match the disks to user-friendly names. Mine looks like this (serials redacted):
# Backup mirror (2 x 12TB ST12000VN0008)
KERNEL=="sd?", SUBSYSTEM=="block", PROGRAM="/lib/udev/ata_id /dev/%k", RESULT=="ST12000VN0008-2JH101_SERIAL01", SYMLINK+="disk/by-label/tank0"
KERNEL=="sd?", SUBSYSTEM=="block", PROGRAM="/lib/udev/ata_id /dev/%k", RESULT=="ST12000VN0008-2JH101_SERIAL02", SYMLINK+="disk/by-label/tank1"
You might want to use smartctl -A -i to determine the serials. I have put a Dymo label with the two human-friendly names on each of the physical disks, so that when the day comes, there is no doubt about which disk is which. To populate the /dev/disk/by-* dirs, run udevadm trigger.
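A quick way to find the exact RESULT strings for the rules is to run the same helper the rule itself uses (device names below are examples):
sudo /lib/udev/ata_id /dev/sda             # prints the model_serial string the rule matches on
sudo smartctl -i /dev/sda | grep -i serial # alternative: read the serial via smartctl
sudo udevadm trigger                       # re-run the rules to (re)create the symlinks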
4. Setting up sanoid on the fileserver and rpi
4.1. non-privileged user setup at the fileserver side
If you want to use a non-privileged user on the fileserver side, you need to assign the user the correct permissions. You’ll need at least this on the fileserver:
zfs allow -u backup destroy,hold,mount,release,send,snapshot volume
On the rpi side, using a non-privileged user makes little sense, since the Linux kernel doesn’t allow non-root users to mount and unmount filesystems, which prevents us from automatically importing the backup pool, creating any new zvols*, and unmounting the pool after completion.
If you really want to use a non-privileged user on the rpi side, these are the permissions you need to delegate:
zfs allow -u backup create,destroy,mount,receive,refreservation,snapshot tank
*) From the zfs allow manual: Delegations are supported under Linux with the exception of mount, unmount, mountpoint, canmount, rename, and share. These permissions cannot be delegated because the Linux mount(8) command restricts modifications of the global namespace to the root user. Note that create requires the ‘mount’ permission.
4.2. Sanoid setup
Either use systemd to schedule your snapshots, or stick the following line into /etc/crontab (but do NOT run crontab crontab):
* * * * * root TZ=UTC /usr/local/sbin/sanoid --cron
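If you go the systemd route instead, a oneshot service plus a timer does the same job. A minimal sketch, assuming sanoid lives in /usr/local/sbin (the sanoid repo also ships units you can adapt):
# /etc/systemd/system/sanoid.service
[Unit]
Description=Run sanoid snapshot/prune pass
[Service]
Type=oneshot
Environment=TZ=UTC
ExecStart=/usr/local/sbin/sanoid --cron
# /etc/systemd/system/sanoid.timer
[Unit]
Description=Run sanoid every minute
[Timer]
OnCalendar=minutely
Persistent=true
[Install]
WantedBy=timers.target
Enable it with sudo systemctl daemon-reload && sudo systemctl enable --now sanoid.timer.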
Your preference for snapshots is entirely up to you. Here’s my /etc/sanoid/sanoid.conf (I’m probably overdoing things):
############################################################
# /etc/sanoid/sanoid.conf file.
############################################################
[tank/public]
use_template = production
[tank/git]
use_template = production
[tank/home]
use_template = production
recursive = yes
process_children_only = yes
#############################
# templates below this line #
#############################
[template_production]
frequently = 4
hourly = 24
daily = 31
weekly = 8
monthly = 12
yearly = 0
autosnap = yes
autoprune = yes
[template_backup]
frequently = 0
hourly = 0
daily = 90
weekly = 26
monthly = 24
yearly = 6
### replicate snapshots from source, don't generate locally
autosnap = no
autoprune = yes
hourly_warn = 2880
hourly_crit = 3600
daily_warn = 48
daily_crit = 60
[template_scripts]
### limit allowed execution time of scripts before continuing (<= 0: infinite)
script_timeout = 5
[template_ignore]
autoprune = no
autosnap = no
monitor = no
############################################################
# End of file
############################################################
On the backup server, replace use_template = production with use_template = backup as described in the documentation.
Edit1
I learned the hard way that if the production and backup pools have different names (e.g. because you are backing up from multiple sources), then the sanoid.conf file MUST reflect that. Assume you have a production pool tank that you sync to backup/tank; then your sanoid.conf templates would look like this:
[tank/home]
use_template = production
recursive = yes
process_children_only = yes
[backup/tank/home]
use_template = backup
recursive = yes
process_children_only = yes
In fact, this could be seen as an advantage, as you can now have identical sanoid.conf files on both the production and backup servers, and sanoid will automagically use the right template, based on which pool(s) are available.
/Edit1
At this point, you should see sanoid start making snapshots on the fileserver. Check it out like this:
# zfs list -t snapshot -d 1 tank/home
NAME USED AVAIL REFER MOUNTPOINT
tank/home@syncoid_rpi_2020-04-09:09:30:12 0B - 192K -
tank/home@syncoid_rpi_2020-04-09:09:37:46 0B - 192K -
5. Performance optimizations
Note: what you choose to do in this section will depend very much on what hardware and setup you have (AES primitives built-in? LAN or WAN connection?). So use this as inspiration for your own tests.
Wait, isn’t it premature to do performance optimizations before you have things going end-to-end? Actually not, because you want that initial pull to go as fast as possible, and a pull of a multi-TiB ZFS volume is going to take a very long time - especially with the wrong parameters. So you want your optimizations in place for that initial pull of a full backup.
TL;DR: On an RPi located on a (Gigabit) LAN, go for --compress=none --sshcipher=chacha20-poly1305@openssh.com. Fortunately, chacha20-poly1305 is the default cipher on the rpi.
5.1. Compress at your peril!
The Pi4 is (relatively speaking) not a very fast CPU, so syncoid’s default compression (lzop) was actually a performance loss for me. This is because the Pi4 is located on the same LAN as the fileserver, so bandwidth is more expendable than CPU cycles. If your rpi and servers are located on different ends of a WAN, go ahead and benchmark the heck out of the compression algorithms.
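One way to benchmark is to time a full pull of a smallish test dataset with each setting. A sketch only: tank/testset and the host name are placeholders, and the set of valid --compress values depends on your syncoid version:
for comp in none lzo gzip pigz-fast; do
    echo "== --compress=$comp =="
    time sudo syncoid --no-privilege-elevation --compress=$comp --sshkey=~backup/.ssh/ed25519 backup@fileserver:tank/testset backup/tank/testset
    sudo zfs destroy -r backup/tank/testset    # start the next run from scratch
done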
Edit1
Turns out that compression is a general lossage on a LAN, due to the inefficient gzip algorithm. I found that MobaXterm got throttled to ~13.2MB/sec regardless of cipher when using compression. Without compression, I saw up to 77MB/sec with MobaXterm.
I tried a couple of other clients: Cygwin openssh, the built-in Windows (open)ssh client, and Ubuntu 20.04 on “Windows Subsystem for Linux”. They all had similar problems, although they topped out in excess of 150MB/sec.
TL;DR: scp -o Compression=no is your friend.
/Edit1
5.2. Which cipher goest Thou to?
While many x64 CPUs have dedicated instructions for performing AES computations, the ARM cpu in the rpi does not. Fear not, for here is the cheat sheet you are looking for:
Cipher | time (sec) | BW (MB/sec) |
---|---|---|
3des-cbc | 49 | 10.9 |
aes128-cbc | 15 | 35.4 |
aes192-cbc | 17 | 31.6 |
aes256-cbc | 20 | 26.7 |
aes128-ctr | 20 | 26.7 |
aes192-ctr | 23 | 23.2 |
aes256-ctr | 26 | 20.5 |
aes128-gcm | 24 | 22.2 |
aes256-gcm | 30 | 17.8 |
chacha20-poly1305 | 12 | 44.4 |
Measurements were made with a 533MB file (558948672 bytes, nVidia v445.75 graphics driver), using scp -c $cipher server:testfile .
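To repeat the measurement on your own hardware, a loop along these lines will do (a sketch - testfile and fileserver are the placeholders from above, and the cipher list must match what your client and server actually offer, which you can check with ssh -Q cipher):
for cipher in aes128-cbc aes128-ctr aes128-gcm@openssh.com chacha20-poly1305@openssh.com; do
    echo "== $cipher =="
    time scp -o Compression=no -c $cipher fileserver:testfile /tmp/testfile
    rm -f /tmp/testfile
done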
With the rpi on the same LAN (GbE) as the fileserver, I was able to complete initial backups of 3.2TiB at an average speed of 36MB/sec. Not too shabby.
6. Testing the backup pull works and running the initial pull
Again, the initial creation of the zvols can only be done as root, so you either need to sudo your way out of it, or simply use root on the backup server for pulling backups with syncoid.
To use the non-privileged “backup” user do this:
sudo syncoid --no-privilege-elevation --compress=none --sshkey=~backup/.ssh/ed25519 -r backup@fileserver:tank/home backup/tank/home
In the beginning, the --debug flag will be most helpful to sort out why the heck things are breaking. I also fell back to using scp backup@fileserver:file . to ensure that ssh wasn’t playing games with me.
If you are getting ZFS permission errors, run zfs allow volume and study the zfs man page carefully (trust me, I’m experienced with this problem. Sigh).
7. Add syncoid to your crontab
How often do you want to sync your backups? daily? weekly? I do it weekly, with the following script:
zpool import backup
syncoid --quiet -r --no-privilege-elevation --compress=none --sshcipher=chacha20-poly1305@openssh.com --sshkey=~backup/.ssh/ed25519 backup@fileserver:tank backup
# Wait a couple of txg_sync periods to keep the zpool export command from failing because the pool is still active
sleep 15
zpool export backup
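I then fire that script from /etc/crontab once a week, something like this (the script path is just an example - put yours wherever you like):
# Weekly backup pull, Sundays at 03:30
30 3 * * 0 root /usr/local/sbin/pull-backup.sh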
8. Epilogue
Well, that’s it. Remember to test your backup (if it ain’t tested, it ain’t a backup).