Backup targets for ZFS

Hi,

I was wondering if anyone could recommend some ZFS targets to back up to for offsite backups, preferably ones that won’t break the bank?

Currently I’m using restic to back up to rclone-supported backends, but it feels clunky and the restore experience is not great.

I’ve got around 6 TB in my backup archive.

The ideal would be to find some family member and place a small system there, but alas that’s not really a possibility.

Anyone any ideas?

I’ve checked out zfs.rent and rsync.net - I’d love the latter but that is unfortunately quite expensive.

I was looking for the same thing and came across this article: My ZFS backup strategy | Rafael Kassner

tl;dr: they pipe zfs send to a file and then ship the file off to the cloud.

I haven’t tried it but in my head one could do:
zfs send to file → rclone file to cloud target

Then restores would be:
rclone file from cloud target → zfs receive from file
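As a sketch, assuming a pool named tank, a snapshot named @nightly, and an rclone remote named cloud (all illustrative names, not from the article):

```shell
# Backup: serialize a snapshot into a file, then upload it.
zfs snapshot tank/data@nightly
zfs send tank/data@nightly | gzip > /tmp/data-nightly.zfs.gz
rclone copy /tmp/data-nightly.zfs.gz cloud:zfs-backups/

# Restore: download the stream and feed it to zfs receive
# (receive, not import -- import is for attaching whole pools).
rclone copy cloud:zfs-backups/data-nightly.zfs.gz /tmp/
gunzip -c /tmp/data-nightly.zfs.gz | zfs receive tank/restored
```

One caveat: a send file is all-or-nothing, so a corrupted byte mid-file can sink the entire restore; that fragility is part of why direct replication gets recommended later in the thread.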

Just started using ZFS yesterday so mistakes in my thinking are quite possible.

You can absolutely pipe zfs send to a file, then treat that file the same way you would any tarball-style backup.

I’m tempted to try to put together a community “backup buddies” portal for folks who would like to trade spare capacity on their own network for backup capacity elsewhere and vice versa, complete with instructions on how to use ZFS encryption to remove trust issues (beyond “don’t delete my shit”) on the remote end. That really is, BY FAR, the most cost-effective way to back up your stuff, and direct ZFS replication beats the pants out of schlepping giant tarball-equivalents around!

That would actually be grand!

I agree, this would be awesome. A buddy and I have talked about this, but we run ZFS and LXCs don’t have access to the ZFS filesystem, so we would have to give access to the main Proxmox system, which we didn’t like.

If you could bake this into syncoid (hint hint) that would be awesome!

You should be able to grant access to the pool from inside an LXC container; that’s functionality Allan Jude and company provided for a client a year or two ago IIRC.

The ability to safely replicate to or from an untrusted system is already available in syncoid, really; it’s the setup work that isn’t, and that’s more of a “how to” than a “write a tool” kind of thing.

Essentially the target just needs to create a parent dataset for the source to replicate to, and provide the source a set of user credentials which have ZFS delegated privileges sufficient to replicate in. A quota can also be set on this parent dataset, to restrict the total space available to the source.
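As a sketch, the target-side setup described above might look like this (the pool, dataset, and user names are made up):

```shell
# On the backup target. "backuppool/buddy" and "backupuser" are
# illustrative names -- substitute your own.

# Parent dataset the source will replicate into, capped with a quota
# so the source can't eat the whole pool:
zfs create -o quota=2T backuppool/buddy

# Delegate just enough ZFS privileges to an unprivileged user
# to receive replication streams under that dataset:
zfs allow backupuser create,mount,receive backuppool/buddy
```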

Once that’s done, the source just pushes backups to the target, from an already-encrypted dataset, using raw send. The target can receive replication just fine, but cannot itself decrypt the data it receives.
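On the source side, that push could look like this (dataset, host, and user names are illustrative):

```shell
# One-off raw send of an already-encrypted dataset: blocks stay
# encrypted on the wire and on the target's disks, and the target
# never holds the key.
zfs send -w tank/secrets@2024-01-01 | \
    ssh backupuser@target zfs receive backuppool/buddy/secrets

# Or let syncoid manage snapshots and incrementals, passing the
# raw-send flag through:
syncoid --sendoptions=w tank/secrets \
    backupuser@target:backuppool/buddy/secrets
```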

Further reading: Improving Replication Security With OpenZFS Delegation | Klara Inc

Hmmm, okay, I will look into it. I looked around on the Proxmox forums and people were saying it couldn’t be done. I think even the official Proxmox user guide says it cannot be done. For me, inside an LXC, none of the zfs commands work, even in a privileged container.

It’s ENTIRELY possible that Allan’s work hasn’t made it into whatever repository Proxmox uses yet. Also entirely possible that the Proxmox community doesn’t know about it yet, even if it’s there. And there’s an outside chance that I’ve misinterpreted the thrust of the work I’m speaking of; but I don’t think so.

I haven’t used the new features personally, but it looks like they went out with OpenZFS 2.2.0, and that should be present in Proxmox 8.1 and up.
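If I’m reading the release notes right, the relevant mechanism is the zfs zone command added in OpenZFS 2.2, which attaches a dataset to a Linux user namespace. Roughly (names and paths illustrative, and I haven’t verified this on Proxmox):

```shell
# Illustrative only: delegate a dataset to a container's user
# namespace (OpenZFS 2.2+). $CT_PID stands in for the PID of the
# container's init process.
zfs create tank/delegated
zfs zone /proc/$CT_PID/ns/user tank/delegated

# Inside the container, zfs commands can now manage tank/delegated
# and its children.
```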

Ah, yeah Proxmox is just now getting it into their systems, then. I am still on Proxmox 7, that must be why. Thanks!

So potentially incorrect rambling from the slow guy in the crowd on direct ZFS replication:

  • Using the right flags, I can send my snapshots encrypted to a target that I don’t fully trust using the zfs send | recv commands
  • The target that I don’t fully trust would be able to read some metadata but not able to access the encrypted data outright (barring unknown vulns or me providing the keys)
  • This is preferred for a whole host of reasons, but mainly speed and general reliability/ease of automation

Correct. If you encrypt a dataset locally, then replicate it to a remote system using raw send (zfs send -w, or syncoid --sendoptions=w) then it will arrive on the remote system still encrypted, with a key that the remote system does not possess.

An administrator on the remote system can see the names of your datasets and zvols, how large they are, and a few other metrics such as what the recordsize/volblocksize of each is. The administrator of the remote system does not know how many files or folders you have, what their names are, or what their individual sizes are.

I wish there was a way to have a dataset unencrypted locally and encrypt it in the send process. I’m not really keen on encrypting data in my own house and making it that much harder to recover should something happen. I do however want all my off site copies to be encrypted.

As far as I understand the closest thing to “zero access” remote backups with unencrypted sources would be to have an encrypted parent dataset on the target that you unlock and lock before and after the receive step? That would be okay for my own remote host but I would hesitate to do that on a “match made on the internet” system.

I would probably trade a TB or two for that kind of buddy system regardless though and just eat the local encrypted copy :smile: A tip for the write up, all the ZFS and minimal user privileges related things must be in place of course, but in addition I would need the write-up to include some pretty watertight iptables lockdown. I don’t want the guest user to be able to jump-host into my network!
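The exact firewall rules depend on the network, but one lockdown I’d expect any such write-up to include is restricting the guest’s SSH key so it can only run the receive side. For example (user, dataset, and key are placeholders):

```shell
# Line for ~backupuser/.ssh/authorized_keys on the target.
# "restrict" (OpenSSH 7.2+) disables port forwarding, agent
# forwarding, X11, and PTY allocation; command= forces the key
# to a single command regardless of what the client requests.
restrict,command="zfs receive -F backuppool/buddy/secrets" ssh-ed25519 AAAA... buddy-backup-key
```

A forced command like this suits plain zfs send | ssh pipelines; syncoid needs to run a few different commands on the remote end, so it would need a looser wrapper script instead of a single hard-coded command.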

Probably the easiest way to set this up if you don’t want to have to deal with encryption on the source is to have the source unencrypted, a local backup that is encrypted, and then replicate the local, encrypted (but trusted) backup to the remote, encrypted (but untrusted) target.

So then you’ve got source (unencrypted), local backup (encrypted, accessible) and remote backup (encrypted, verifiable but inaccessible without the missing key).

I’ve tried googling this in the past (admittedly quickly and without in depth research), but how can you even send from an unencrypted dataset to an encrypted one? Since encryption is a read-only flag you can only turn it on when the dataset is created, unless I’m misremembering.

Create an encrypted dataset, then replicate your unencrypted dataset in as a child of the encrypted dataset. Assuming you’ve allowed inheritance, and not used raw send, then presto: you’ve now got an encrypted replica of your originally unencrypted dataset.
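A sketch of that conversion (pool and dataset names are made up; the passphrase prompt is interactive):

```shell
# Encrypted parent on the backup pool -- prompts for a passphrase:
zfs create -o encryption=on -o keyformat=passphrase backup/vault

# Plain (non-raw) send of the unencrypted dataset in as a child;
# the received data inherits the parent's encryption on write:
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | zfs receive backup/vault/data
```

If you send with -p or -R, you may need zfs receive -x encryption so the properties carried in the stream don’t override the inheritance.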

Huh. Somehow using inheritance like that never occurred to me.

Time to play with this in my test environment.

With inheritance you mean the zfs property inheritance, right?

Interesting way to convert from non-encrypted to encrypted - thanks!

Yeah, that’s what I suspected - thus “eat the local encrypted copy”. Of course, using that copy as the local backup is the storage-efficient way of doing it. Am I overly paranoid if I say I’d prefer not having encrypted local backup and encrypted remote backup be my recovery “chain”? I feel that a corrupted source dataset or zpool would quickly propagate, and all backups being encrypted would make recovery that much harder. I get it’s an unlikely scenario, but I feel it’s the reason you have the two-different-media rule in 3-2-1 backups.

Then do it the other way around: encrypted source, non-raw send to local unencrypted backup, raw send (preserving encryption) to offsite backup.

Now if something goes wrong in prod, you’ve got both the unencrypted local and the encrypted remote to work with for recovery.
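Sketched out (dataset, pool, and host names illustrative), that layout is:

```shell
zfs snapshot tank/secure@daily

# Non-raw send: you hold the key, so the stream is plaintext and
# the local backup lands unencrypted under an unencrypted parent,
# making it easy to recover from:
zfs send tank/secure@daily | zfs receive localbackup/secure

# Raw send offsite: blocks stay encrypted end to end, and the
# remote system never holds the key:
zfs send -w tank/secure@daily | ssh offsite zfs receive remotepool/secure
```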