ZFS Replication to S3-Based or OpenStack Object Storage?

Reference: Z3, a ZFS to Amazon S3 backup tool (GitHub: presslabs/z3, “Backup your ZFS snapshots to S3”)
Reference: Swift (OpenStack Object Storage)

Has anyone had any experience with using either of these remote storage backends to back up a ZFS dataset?

I’m assuming that with Swift I’d need to use rclone, and wouldn’t have the benefits of ZFS snapshots, but I’m not sure.

The Z3 tool seems like it would let me use an Amazon S3-compatible service as a replication target directly, but I’d love to hear from anyone who’s actually used it before I start tinkering, as I’d need to buy some S3-compatible storage to test with.

Either of these seems like an interesting option, as remote storage providers that directly support zfs send seem to be few and far between, and are priced quite high relative to storage that can’t receive a zfs send. (rsync.net offers zfs send-enabled storage, but requires a minimum order of 5 TB of space, which is more than I need, and costs at least 60 USD a month, which is cost-prohibitive.)

No direct experience.

The biggest thing I’d warn you about is that you’re not using S3 like a remote pool, you’re using S3 like a tape drive: you have to first save a full, then save incrementals. And if you restore, you similarly will need to already have a snapshot which matches the “early” snapshot on an incremental you stored in S3; if you don’t, you’ll need to begin with a full and then patch by downloading all the following incrementals one-by-one-by-one. (Z3 might automate a lot of that for you, but it’s still what you end up needing to do.)
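A minimal Python sketch of the chain logic described above. This is not Z3's actual implementation; `Stream`, `plan_restore`, and the snapshot names are hypothetical and only illustrate why every incremental is useless without the snapshot it is based on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stream:
    """One stored send stream. `base` is None for a full, else the "early" snapshot."""
    base: Optional[str]
    target: str


def plan_restore(streams, want, have=None):
    """Return the streams to download and apply, in order, to reach `want`.

    `have` is a snapshot already present on the restore target, or None.
    """
    by_target = {s.target: s for s in streams}
    chain = []
    current = want
    # Walk backwards from the wanted snapshot until we hit something we
    # already have locally, or a full that needs nothing before it.
    while current != have:
        stream = by_target.get(current)
        if stream is None:
            raise LookupError(f"no stored stream produces {current!r}")
        chain.append(stream)
        if stream.base is None:
            break
        current = stream.base
    return list(reversed(chain))


# Hypothetical bucket contents: one full, then two daily incrementals.
stored = [Stream(None, "mon"), Stream("mon", "tue"), Stream("tue", "wed")]
for s in plan_restore(stored, "wed"):
    print("full" if s.base is None else f"incremental from {s.base}", "->", s.target)
```

If the restore target still holds `tue`, `plan_restore(stored, "wed", have="tue")` only needs the last incremental. If the full has been lost from the bucket, the function raises, which is the situation the tape analogy warns about.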

This can make it pretty nerve-wracking doing a full restore, since… how often are you DOING a full restore? If I’m backing up a 1PiB pool to another pool with 1PiB of space, I only ever need to do a single full backup; after that, I can do incrementals for a decade and have no issues restoring, because I never “lose the full they’re based on.”

Doing tape-drive style storage like Z3 does, you need to do regular full backups, because restoring means beginning with a full and then going one-by-one-by-one through incrementals which are all utterly useless without the full they’re based on.

You probably won’t want to download a full from S3 and then download and apply 3,650 daily incrementals one-by-one in order to do a full restore, but that’s what you’ll have to do if it’s been ten years since your last full. How many incrementals is too many? There’s no one single answer, but it’s going to suck even running through ONE year’s worth of incrementals after a full. But it also sucks to grovel over that entire petabyte on your storage (producing a ton of storage load on your source) AND fling that entire petabyte across the internet to S3, even if you’re “only” doing it once a month… which could still mean needing a full plus thirty separate daily incrementals, depending on what part of the cycle you’re in when you have a major breakage at the source.

If you have a much smaller workload, a lot of this stuff becomes a lot easier to deal with, obviously. But I’m not real happy about having to do regular full backups of even a single TiB, personally, when the actual data is only changing by a few GiB per day.
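The tradeoff can be put into rough numbers with a short Python sketch. The model is deliberately simple and the inputs are assumptions: one stream per day, a full on day 0 and then every `days_between_fulls` days, a 1 TiB (1024 GiB) full, and 3 GiB/day standing in for “a few GiB per day”. The function name `tradeoff` is made up for illustration.

```python
def tradeoff(full_gib, daily_change_gib, days_between_fulls, days=365):
    """Return (GiB uploaded over `days`, worst-case incrementals to replay)."""
    # Ceiling division: a full on day 0 and one at every interval after it.
    fulls = -(-days // days_between_fulls)
    incrementals = days - fulls
    uploaded = fulls * full_gib + incrementals * daily_change_gib
    # Worst case: the source breaks on the last day before the next full.
    worst_case_replay = min(days_between_fulls, days) - 1
    return uploaded, worst_case_replay


for interval in (30, 365):
    uploaded, replay = tradeoff(1024, 3, interval)
    print(f"full every {interval} days: {uploaded} GiB/year uploaded, "
          f"up to {replay} incrementals to replay")
```

Under these assumptions, a full every 30 days uploads roughly seven times as much per year as a single yearly full, in exchange for a much shorter worst-case chain to replay.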

Thanks for putting all this together. I’ve got no experience with object storage (S3 or otherwise) at all, so I really had no idea what I’d be getting myself in for if I went this route. I was flailing a bit. This is a great summary; the comparison to tape storage is really useful. I don’t even have 2 TB of data yet, but I also haven’t enabled local snapshots, so planning for 3-5 TB of data seems reasonable, since I don’t do video or audio editing.

Honestly, S3-based ZFS replication sounds like way more trouble than it’s worth (for me) just to be able to use snapshots/replication. All the reasons you cite for not wanting to use it really resonate with me. That, and as a newbie, that many caveats and cautions equals an exciting new universe of ways to screw up when trying to back up my critical data. :wink:

Replication is still my ideal solution, but 60 USD a month to get a zfs send-enabled rsync.net account just isn’t feasible. I’m looking at Storj now; TrueNAS has a promotion with them for 5 TB at $150 a year. I know there are other object storage-based options, and I’m looking at them as well, but I’m strongly inclined to go for a solution that (appears to be) tightly integrated with TrueNAS itself. I need to do some more research.
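For a like-for-like comparison of the two prices quoted in this thread, a small Python sketch can normalize both to USD per TB per month. The helper name is hypothetical, the figures are only the ones mentioned above (the rsync.net number is a stated minimum), and nothing here accounts for egress or per-request fees.

```python
def usd_per_tb_month(price_usd, months, capacity_tb):
    """Normalize a plan's price to USD per TB per month."""
    return price_usd / months / capacity_tb


rsync_net = usd_per_tb_month(60, 1, 5)     # 5 TB minimum, at least 60 USD a month
storj_promo = usd_per_tb_month(150, 12, 5) # 5 TB for 150 USD a year

print(f"rsync.net: {rsync_net:.2f} USD/TB/month")
print(f"Storj promo: {storj_promo:.2f} USD/TB/month")
```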

My end goal is to wind up with a reliable, encrypted cloud backup that gets along really well with TrueNAS, so I can start decommissioning Code42 CrashPlan, which I’m deploying across various housemates’ laptops, desktops, and phones, and ideally end up saving some money. CrashPlan has been very reliable, but restoring from it is a complete pain, and it’s stupidly expensive for what it does.

It’ll be a while before I get to that point. I want to set up a cloud backup service on my TrueNAS box and risk some of my own stuff there for a few months to get really comfortable with it as central storage before I start encouraging family members to rely on it for their storage (and their backups).

If you have a friend with a bit of geographic separation, you can always consider becoming backup buddies: set up a nice cheap box with some drives in it in his house, and there’s your replication target, and you can offer the same service to your friend. If you’re worried about privacy, well, that’s what encrypted replication of encrypted datasets is for! :slight_smile:
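As a rough illustration of what that could look like, here is a Python sketch that only builds (and does not run) an incremental raw-send pipeline. It assumes the source dataset uses OpenZFS native encryption, so `zfs send --raw` ships the blocks still encrypted, and that SSH is the transport. The dataset, snapshot, and host names are placeholders, and the helper `raw_send_pipeline` is made up for this example.

```python
import shlex


def raw_send_pipeline(dataset, prev_snap, new_snap, ssh_host, target_dataset):
    """Build a shell pipeline string for an incremental raw send over SSH."""
    # --raw sends the encrypted blocks as-is, so the remote box never
    # needs the encryption key to receive and store the data.
    send = ["zfs", "send", "--raw", "-i",
            f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", ssh_host, "zfs", "receive", target_dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


# Placeholder names, for illustration only.
print(raw_send_pipeline("tank/home", "monday", "tuesday",
                        "backup@buddy.example", "backup/home"))
```

In practice a replication tool would normally manage the snapshots and the pipeline for you; the sketch is only meant to show the two layers of encryption mentioned above.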

I wonder how much cheap 1U hosting costs these days? Cram a few 22TB drives into a 1U chassis and call it good. Or smaller drives - Supermicro has a 1U that can take 12 drives, and Gigabyte has one that will take 16 drives.

Like, colocation for a server we build ourselves? Or, renting an actual dedicated 1U server that’s pre-assembled?

I’ve thought about those, but I have no idea how to shop for that sort of enterprise service.

Generally speaking, if you’re talking colocation, it’s not worth it. The rates for colocation are set (deliberately, IMO) high enough that it’s literally no cheaper than leasing the entire box from that provider… and if you lease the whole box, they’re on the hook for maintenance and upgrades, whereas if you colo a box any problem you have is a problem you’re hoping the datacenter techs are feeling cooperative about, because there’s always the “this is your box, your problem, not our fault that whatever happened happened or we’re ‘having trouble’ repairing it, etc, etc.”

So basically, if you want to go this route, what you’re looking to do is rent a box from an inexpensive bare metal provider like (for example) PhoenixNAP or Hetzner. Hetzner is about half the cost of anybody else in the biz, but their network arrangement is… bizarre, at best, to the point that I really don’t recommend it for providing public-facing services that might require multiple IP addresses. But it’s fine if you only need a single public-facing IP, and you use eg WireGuard to connect to that.

PhoenixNAP is still inexpensive compared to most of the industry, but somewhere between half again as expensive and twice as expensive as Hetzner. But they give you a perfectly normal subnet that you can feed directly to the machine and, eg, set up a perfectly normal bridge adapter that you use to put VMs on public-facing IPs. Again, this may not be something you need, but it’s something to be aware of if you do.

How much does this kinda thing cost? Well, this server is on PhoenixNAP, and it runs me about $150/mo. That roughly $150/mo gets me a /29 IPv4 subnet (8 IP addresses, meaning 5 usable-by-me IP addresses after you deduct the network, broadcast, and gateway addresses), 64 GiB RAM, a Xeon E-2276G (6 cores / 12 threads at 3.1GHz), and two 1TB SSDs (one Intel NVMe, one Samsung 860 Pro).
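The /29 arithmetic can be checked with Python's standard `ipaddress` module. The subnet below is from the 203.0.113.0/24 documentation range and is only a stand-in for the real one; the helper name and the assumption that the provider's gateway takes exactly one host address are illustrative.

```python
import ipaddress


def usable_addresses(cidr, gateway_addresses=1):
    """Return (total addresses, addresses left for your own hosts)."""
    net = ipaddress.ip_network(cidr)
    total = net.num_addresses
    # hosts() already excludes the network and broadcast addresses.
    hosts = len(list(net.hosts()))
    return total, hosts - gateway_addresses


total, usable = usable_addresses("203.0.113.0/29")
print(f"{total} addresses in the subnet, {usable} usable after the gateway")
```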

I’ve got another customer at Hetzner with a pretty similar hardware loadout, and they’re paying about $90/mo as I recall… but I’m not kidding about how obnoxious that network setup at Hetzner is; I needed to run VMs there (as I do here; we’re actually operating on a VM on the bare metal host I described) and I ended up just saying to hell with public facing AT ALL, because that client was OK with just accessing their stuff via WireGuard to the host only.

Hope this helps.

I do want to emphasize that while Hetzner is the only provider I’m aware of that’s as cheap as Hetzner is, there are quite a few options in roughly the same ballpark as PhoenixNAP, so don’t be afraid to shop around. Liquidweb (used to be iWeb) is a bare metal provider I used extensively before the rise of Linode and similar services, and they’re around the same price point as PhoenixNAP in general.

But the other thing is, if you’re bargain-hunting, you REALLY need to shop around, because all these providers will have older machines with less CPU firepower that they will let go for a LOT less than they charge for their more bread-and-butter brand-new systems. What inventory they have on special will be different from week to week, so… yeah, don’t be afraid to shop around, even if you think you already “know” what the best provider is.