ZFS / Sanoid in SSH Chroot

Hello,

I use syncoid to raw-send encrypted datasets to an off-site backup via a push. Like many others, I find this causes my source server to kernel panic and leaves corruption in the local snapshots.
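For reference, the push is roughly the following (pool, dataset, and host names here are made up, not my actual layout):

```bash
# push a raw (-w) replication stream of an encrypted dataset to the
# off-site box; names are placeholders
syncoid --recursive --sendoptions=w \
    tank/encrypted backup@offsite.example.com:backuppool/tank-encrypted
```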

I have observed that if I switch to a “pull” mode, i.e. the off-site target initiates the backup, my source server does not crash (though it still gets corrupted snapshots). However, I don’t like the idea of the off-site server being able to SSH into my source server. I played with setting up an SSH chroot environment and copying in the zfs and syncoid binaries and their shared libraries; I think this addresses my security concerns, but I was nervous about the implications of maintaining it.
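To give an idea of what I mean, I populated the jail roughly like this (the chroot path is just what I picked, and this only covers the ELF binaries):

```bash
# copy the zfs userland binaries plus every shared library ldd reports
# into the chroot; /srv/backup-chroot is an arbitrary example path
CHROOT=/srv/backup-chroot
for bin in /sbin/zfs /sbin/zpool; do
    install -D "$bin" "$CHROOT$bin"
    for lib in $(ldd "$bin" | grep -o '/[^ ]*'); do
        install -D "$lib" "$CHROOT$lib"
    done
done
# note: syncoid is a Perl script, so perl and its modules need the same
# treatment, and zfs needs /dev/zfs present inside the chroot as well
```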

If I were to do this, are there dangers in the host system updating a shared library while the SSH chroot environment still has an older version?
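At minimum I figure I could detect drift with something like this (paths follow the sketch above):

```bash
# flag any chroot library that no longer matches the host's copy
CHROOT=/srv/backup-chroot
find "$CHROOT" -name '*.so*' -type f | while read -r lib; do
    cmp -s "$lib" "${lib#$CHROOT}" || echo "stale: ${lib#$CHROOT}"
done
```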

This is not normal. In your shoes, I would be working on solving this problem. I’m also curious who the “many others” are, as I’ve not heard of this.

As far as security goes, I cannot comment on your chroot strategy. For remote backups I’ve switched to “pull” so that it is not possible to manipulate the backups from the host being backed up (as far as I’m aware).
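For concreteness, the pull is just syncoid run from the backup host in the other direction (hosts and datasets below are invented):

```bash
# run on the backup host: pull snapshots from the source over SSH
syncoid --recursive backup@source.example.com:tank/data backuppool/source/data
```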

Initially this used SSH with password authentication disabled and firewall openings restricted to the remote host’s IP address. I’m now using Tailscale between two hosts running Debian 12.
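The SSH side amounted to a couple of sshd_config lines on the host being backed up (account name and address are examples):

```
# /etc/ssh/sshd_config
PasswordAuthentication no
# only accept the backup account, and only from the remote's address
AllowUsers backup@203.0.113.10
```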

I did experience hangs a year or two ago that seemed to be related to ZFS, but upgrading to Bookworm resolved that. The hangs interfered with anything that required disk I/O but did not result in a full-blown kernel panic. Over a period of a year or so I experienced this hang several times.


I’d love to solve the root cause, but AFAIK this is a known issue with encrypted raw sends. Here are a few of the open issues:

I’ve fully replaced my disks with no change in the behavior. The corruption is only ever in snapshots, and I have both local and off-site backups, so I haven’t been too worried. I’m open to trying fixes, but these issues don’t seem to be limited to my system.

My main concern with “pull” is that if my remote backup system is compromised, the attacker has a tunnel into my home server. With “push” there would be no keys for the off-site system to reach back into my LAN.
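Even a locked-down key entry like the one below still leaves the remote with a login on my LAN, which is what bothers me (key and wrapper path are illustrative):

```
# ~/.ssh/authorized_keys on the source host: the backup box's key may only
# run one command, with forwarding, agent, and pty disabled via "restrict"
command="/usr/local/bin/syncoid-wrapper",restrict ssh-ed25519 AAAA... backup@offsite
```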

I was curious about the dangers (if any) of having the same ZFS pool used from both a chroot and the native environment, since a chroot SSH jail seemed like a good way to wall off the remote.
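The jail part itself seems simple enough on the sshd side; something like this (user and path are whatever you pick):

```
# /etc/ssh/sshd_config on the source: confine the backup account
# (sshd requires the chroot directory to be root-owned and not writable
# by any other user or group)
Match User backup
    ChrootDirectory /srv/backup-chroot
    AllowTcpForwarding no
    X11Forwarding no
```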

Thanks for the links. I actually commented on that last one.

I don’t use raw sends but I had seen the “permanent errors” in the snapshots. I never experienced a kernel panic but some of the dumps look familiar.

Full pool backups were the only operation that provoked the errors, so I disabled those. A couple of days ago there was a suggestion that 2.2.5 or 2.2.6 had the encryption-related issues resolved, so I’ve turned the full backups back on, and so far there are no permanent errors. Still too soon to declare victory, though.
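For anyone who wants to check their own pools, the permanent errors show up in zpool status (pool name is an example):

```bash
# "-v" lists any files or snapshots with permanent (unrecoverable) errors
zpool status -v tank
# a scrub re-checks everything; the error log clears after two clean scrubs
zpool scrub tank
```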


The only time I’ve experienced corruption (not related to a failing disk) is when using ZFS on a USB-attached disk. The other thing I experienced with these USB disks was that various ZFS operations would just hang forever. The machine was still online and functioning, though; just the ZFS commands were stuck. So it sounds a little different from your experience.

For doing backups across the internet, I’ve used ZeroTier with great success. It’s quite similar to Tailscale, as mentioned by @HankB. This way I can SSH between hosts without having their SSH port exposed to the public internet.
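Setup is minimal on each host (the network ID below is a placeholder):

```bash
# join both hosts to the same ZeroTier network, then authorize the members
# in the ZeroTier controller/web console; <network-id> is the 16-digit ID
sudo zerotier-cli join <network-id>
sudo zerotier-cli status    # reports ONLINE once connected
```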