ZFS Replication to S3-Based or OpenStack Object Storage?

Reference: z3, a ZFS-to-Amazon-S3 backup tool (GitHub: presslabs/z3, "Backup your ZFS snapshots to S3")
Reference: OpenStack Object Storage (Swift)

Has anyone had any experience with using either of these remote storage backends to back up a ZFS dataset?

I’m assuming that with Swift I’d need to use rclone, and wouldn’t have the benefits of ZFS snapshots, but I’m not sure.
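(For what it's worth, rclone does have a native Swift backend, so no S3 gateway layer is needed; a remote along these lines should work. The remote name, credentials, and auth URL below are hypothetical placeholders, and depending on your provider you may need additional Keystone settings.)

```
# ~/.config/rclone/rclone.conf -- hypothetical remote; fill in real credentials
[myswift]
type = swift
user = OS_USERNAME
key = OS_PASSWORD
auth = https://keystone.example.com/v3
```

With that in place, something like `rclone sync /mnt/tank/dataset myswift:backups` copies the files up. But you're right about the tradeoff: this is plain file-level copying, with no snapshot streams and no block-level incrementals.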

The Z3 tool seems like it would let me use an Amazon S3-compatible service as a replication target directly, but I’d love to hear from anyone who’s actually used it before I start tinkering, as I’d need to buy some S3-compatible storage to test with.

Either of these seems like an interesting option, as remote storage providers directly supporting zfs send seem to be few and far between, and are priced quite high relative to non-zfs-send-capable storage. (rsync.net offers zfs send-enabled storage, but requires a minimum order of 5 TB of space, which is more than I need, and also costs at least 60 USD a month, which is cost-prohibitive.)

1 Like

No direct experience.

The biggest thing I’d warn you about is that you’re not using S3 like a remote pool, you’re using S3 like a tape drive: you have to first save a full, then save incrementals. And if you restore, you similarly will need to already have a snapshot which matches the “early” snapshot on an incremental you stored in S3; if you don’t, you’ll need to begin with a full and then patch by downloading all the following incrementals one-by-one-by-one. (Z3 might automate a lot of that for you, but it’s still what you end up needing to do.)
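To make the tape-drive analogy concrete, here's a minimal sketch of what a restore planner has to do (plain Python with made-up snapshot names; this is NOT z3's actual logic or format, just the shape of the problem): walk back from the snapshot you want until you hit a full, then replay everything forward in order.

```python
# Tape-style restore planning, sketched. Each stored backup is either a
# full ("F") or an incremental ("I") based on the previous snapshot.
# Snapshot names and structure are illustrative only.

def restore_plan(stored, want):
    """Return the ordered list of backups to download and apply
    to reconstruct snapshot `want`."""
    # find the snapshot we want to restore
    idx = next(i for i, (name, kind) in enumerate(stored) if name == want)
    # walk backwards until we find the full this chain is based on
    start = idx
    while stored[start][1] != "F":
        start -= 1
    # everything from that full up to `want` must be applied, in order
    return [name for name, kind in stored[start:idx + 1]]

stored = [
    ("snap-2024-01", "F"),
    ("snap-2024-02", "I"),
    ("snap-2024-03", "I"),
    ("snap-2024-04", "F"),   # periodic full resets the chain
    ("snap-2024-05", "I"),
]
print(restore_plan(stored, "snap-2024-03"))  # full + two incrementals
print(restore_plan(stored, "snap-2024-05"))  # recent full + one incremental
```

The point being: every incremental in the chain is useless without everything before it, all the way back to a full.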

This can make it pretty nerve-wracking doing a full restore, since… how often are you DOING a full restore? If I’m backing up a 1PiB pool to another pool with 1PiB of space, I only ever need to do a single full backup; after that, I can do incrementals for a decade and have no issues restoring, because I never “lose the full they’re based on.”

Doing tape-drive style storage like z3 does, you need to do regular full backups, because restoring means beginning with a full and then going one-by-one-by-one through incrementals which are all utterly useless without the full they’re based on.

You probably won’t want to download a full from S3 and then download and apply 3,650 daily incrementals one-by-one in order to do a full restore, but that’s what you’ll have to do if it’s been ten years since your last full. How many incrementals is too many? There’s no one single answer, but it’s going to suck even running through ONE year’s worth of incrementals after a full. But it also sucks to grovel over that entire petabyte on your storage (producing a ton of storage load on your source) AND fling that entire petabyte across the internet to S3, even if you’re “only” doing it once a month… which could still mean needing a full plus thirty separate daily incrementals, depending on what part of the cycle you’re on when you have a major breakage at the source.
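Back-of-the-envelope, with purely illustrative numbers (a 1 PiB pool, 5 GiB of daily change, monthly fulls), the transfer volume is dominated almost entirely by the fulls:

```python
# Rough transfer arithmetic for tape-style S3 backups.
# All numbers are illustrative assumptions, not measurements.
GiB = 1
PiB = 1024 * 1024 * GiB   # 1 PiB expressed in GiB

pool_size = 1 * PiB        # size of each full backup
daily_change = 5 * GiB     # size of each daily incremental
fulls_per_year = 12        # monthly fulls

yearly_upload = fulls_per_year * pool_size + 365 * daily_change
print(f"uploaded per year: {yearly_upload / PiB:.2f} PiB")

# worst-case restore: one full plus up to ~30 daily incrementals
worst_restore = pool_size + 30 * daily_change
print(f"worst-case restore download: {worst_restore / PiB:.4f} PiB")
```

With numbers like these, the incrementals are a rounding error; it's the repeated fulls that eat your bandwidth and your storage budget.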

If you have a much smaller workload, a lot of this stuff becomes a lot easier to deal with, obviously. But I’m not real happy about having to do regular full backups of even a single TiB, personally, when the actual data is only changing by a few GiB per day.

3 Likes

Thanks for putting all this together. I’ve got no experience with object storage (S3 or otherwise), so I really had no idea what I’d be getting myself in for if I went this route. I was flailing a bit. This is a great summary; the comparison to tape storage is really useful. I don’t even have 2 TB of data yet, but I also haven’t enabled local snapshots, so planning for 3-5 TB of data seems reasonable, since I don’t do video or audio editing.

Honestly, S3-based ZFS replication sounds like way more trouble than it’s worth (for me) just to be able to use snapshots/replication. All the reasons you cite for not wanting to use it really resonate with me. That, and as a newbie, that many caveats and cautions equals an exciting new universe of ways to screw up when trying to back up my critical data. :wink:

Replication is still my ideal solution, but $60 USD/monthly to get a zfs send-enabled rsync.net account just isn’t feasible. I’m looking at Storj now; TrueNAS has a promotion with them for 5 TB/$150 yearly. I know there are other object storage-based options, and I’m looking at them as well, but I’m strongly inclined to go for a solution that (appears to be) tightly integrated with TrueNAS itself. I need to do some more research.

My end goal is to wind up with a reliable, encrypted cloud backup that gets along really well with TrueNAS, so I can start decommissioning Code42 Crashplan, which I currently deploy across various housemates’ laptops, desktops, and phones, and ideally end up saving some money. Crashplan has been very reliable, but restoring from it is a complete pain, and it’s stupidly expensive for what it does.

It’ll be a while before I get to that point. I want to set up a cloud backup service on my TrueNAS box and risk some of my own stuff there for a few months to get really comfortable with it as central storage, before I start encouraging family members to rely on it for their storage (and their backups).

1 Like

If you have a friend with a bit of geographic separation, you can always consider becoming backup buddies: set up a nice cheap box with some drives in it at their house, and there’s your replication target, and you can offer the same service to your friend. If you’re worried about privacy, well, that’s what encrypted replication of encrypted datasets is for! :slight_smile:

2 Likes

I wonder how much cheap 1U hosting is these days? Cram a few 22 TB drives into a 1U chassis and call it good. Or smaller drives: Supermicro has a 1U that can take 12 drives, and Gigabyte has one that will take 16 drives.

Like, colocation for a server we build ourselves? Or, renting an actual dedicated 1U server that’s pre-assembled?

I’ve thought about those, but I have no idea how to shop for that sort of enterprise service.

Generally speaking, if you’re talking colocation, it’s not worth it. The rates for colocation are set (deliberately, IMO) high enough that it’s literally no cheaper than leasing the entire box from that provider. And if you lease the whole box, they’re on the hook for maintenance and upgrades, whereas if you colo a box, any problem you have is a problem you’re hoping the datacenter techs are feeling cooperative about, because there’s always the “this is your box, your problem; it’s not our fault that whatever happened happened, or we’re ‘having trouble’ repairing it” routine.

So basically, if you want to go this route, what you’re looking to do is rent a box from an inexpensive bare metal provider like (for example) PhoenixNAP or Hetzner. Hetzner is about half the cost of anybody else in the biz, but their network arrangement is… bizarre, at best, to the point that I really don’t recommend it for providing public-facing services that might require multiple IP addresses. But it’s fine if you only need a single public-facing IP, and you use eg WireGuard to connect to that.

PhoenixNAP is still inexpensive compared to most of the industry, but somewhere between half again as expensive and twice as expensive as Hetzner. But they give you a perfectly normal subnet that you can feed directly to the machine and, eg, set up a perfectly normal bridge adapter that you use to put VMs on public-facing IPs. Again, this may not be something you need, but it’s something to be aware of if you do.

How much does this kinda thing cost? Well, this server is on PhoenixNAP, and it runs me about $150/mo. That roughly $150/mo gets me a /29 IPv4 subnet (8 IP addresses, meaning 5 usable-by-me IP addresses after you deduct the network, broadcast, and gateway addresses), 64 GiB RAM, a Xeon E-2276G (6 cores / 12 threads at 3.1GHz), and two 1TB SSDs (one Intel NVMe, one Samsung 860 Pro).
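(If you want to double-check the subnet math, Python's stdlib `ipaddress` module makes it trivial. The 203.0.113.0/29 network below is just a documentation-range example, not my actual subnet.)

```python
import ipaddress

# A /29 is 2**(32-29) = 8 addresses; the network, broadcast, and gateway
# addresses aren't usable for your own hosts, leaving 5.
net = ipaddress.ip_network("203.0.113.0/29")
print(net.num_addresses)        # 8
usable = net.num_addresses - 3  # minus network, broadcast, gateway
print(usable)                   # 5
```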

I’ve got another customer at Hetzner with a pretty similar hardware loadout, and they’re paying about $90/mo as I recall… but I’m not kidding about how obnoxious that network setup at Hetzner is; I needed to run VMs there (as I do here; we’re actually operating on a VM on the bare metal host I described) and I ended up just saying to hell with public facing AT ALL, because that client was OK with just accessing their stuff via WireGuard to the host only.

Hope this helps.

1 Like

I do want to emphasize that while Hetzner is the only provider I’m aware of that’s as cheap as Hetzner is, there are quite a few options in roughly the same ballpark as PhoenixNAP, so don’t be afraid to shop around. Liquidweb (used to be iWeb) is a bare metal provider I used extensively before the rise of Linode and similar services, and they’re around the same price point as PhoenixNAP in general.

But the other thing is, if you’re bargain-hunting, you REALLY need to shop around, because all these providers will have older machines with less CPU firepower that they will let go for a LOT less than they charge for their more bread-and-butter brand-new systems. What inventory they have on special will be different from week to week, so… yeah, don’t be afraid to shop around, even if you think you already “know” what the best provider is.

1 Like

S3 backup

I am using rclone to copy to AWS and then transition the objects to Glacier Deep Archive, and I’m also using IDrive (this changes at my whim; I have been with B2 and Storj). Glacier is my long-term forever plan. Yes, I test-restore the odd archive and record the results in a spreadsheet. I have a checksum file I test against (checksums of the archive itself and of the pictures within it), so I know all the pictures are unharmed. All have worked so far. I use 7z to create 5 GB split archives of my pictures by year, then upload each to S3, encrypted with rclone. As my pictures and videos increase with new phones, many more archives get produced, so I’m considering whether I should continue this. The main reason for the archives is easier restores from Glacier and less overhead per file, which may result in lower-cost storage. IDrive is a simple file-by-file backup. If I lose data from my local backups and my off-site hard drive, I will use IDrive first.
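For anyone wanting to replicate the checksum-manifest part of this workflow, here's a minimal sketch using Python's stdlib hashlib. The file layout and manifest name are my own assumptions; it isn't tied to 7z or rclone in any way, it just records and re-verifies SHA-256 hashes for every file in a directory.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Stream a file through SHA-256 so big archives needn't fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(directory, manifest="SHA256SUMS"):
    """Record one checksum line per file, to re-check after a test restore."""
    d = Path(directory)
    lines = [
        f"{sha256_of(p)}  {p.name}"
        for p in sorted(d.iterdir())
        if p.is_file() and p.name != manifest
    ]
    (d / manifest).write_text("\n".join(lines) + "\n")

def verify_manifest(directory, manifest="SHA256SUMS"):
    """Return the names of files whose current hash no longer matches."""
    d = Path(directory)
    bad = []
    for line in (d / manifest).read_text().splitlines():
        digest, name = line.split("  ", 1)
        if sha256_of(d / name) != digest:
            bad.append(name)
    return bad
```

Run `write_manifest()` over a directory of archives before uploading, keep the manifest with your backups, and run `verify_manifest()` against whatever you pull back down during a test restore.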

Drive

I keep a 2 TB external hard drive at my work as an off-site backup. You may like to consider this, as 2 TB is easy to manage.

My Backup

I keep additional copies of my pictures (approx 500 GB), which reduces the cost of backing up to the cloud. A little embarrassed to say, I have 6 copies (3 local; 1 off-site; 2 cloud). A little overkill, I suppose, but because I am not backing up non-important data to the cloud, such as films, Linux ISOs, software, etc., I can concentrate on storing what’s important.

Hetzner and OneProvider are the cheapest I’ve seen with flat rates.
Hetzner has the auction option, where you can buy two HDDs, for example, plus two SSDs if you wish.
OneProvider also has dedicated servers, but if you want a proper ZFS pool instead of whatever RAID they provide, search for the ones with IPMI, where you can install the OS yourself.

2 Likes

Yes, I’ve seen a few videos about this host on the Tube.

We need a Practical ZFS community dedicated server to send our encrypted datasets to.

@mercenary_sysadmin

I don’t think I’ve got even a faint shot at offering raw storage at a lower cost than rsync.net is now. Not saying it can’t BE done, but doing so would require economies of scale I don’t have access to.

2 Likes

@mercenary_sysadmin Any idea why rsync.net requires a minimum 5 TB order to enable ZFS send/receive (see: rsync.net Cloud Storage for Offsite Backups)? I assume it’s just to cover the overhead of supporting the additional features, but I wonder why the minimum is 5 TB, specifically.

I think rsync.net will be the best price for a while. There doesn’t seem to be enough competition for remote zfs replication targets for them to get in a price war with anyone. And really, 60 USD for 5 TB isn’t that bad compared to other options, but I need to cut out some other monthly expenses first.

The minimum isn’t really the 5TiB; the minimum is the $60. When the service first launched, it was the same $60 minimum but it only bought you 1TiB of space.

Essentially, $60/mo is the point at which they’re willing to take somebody on as a customer. Every customer costs you a non-zero portion of your available time and resources, and rsync.net is built on a principle of minimal automated nonsense, maximal real engineers dealing with things, so $60/mo is a pretty reasonable minimal cost for being willing to take on a new customer IMO.

2 Likes

Thanks for explaining this, especially the note about the cost for the old 1TB plan being the same. I know better than that–overhead and especially the cost to pay your employees has to come in somewhere.

But this summer has been too long and hot and my brain was just reading it as $12/TB, which isn’t horrible at all but is also frustrating because I’d rather be paying for rsync.net than nearly 100 USD for Hulu. :stuck_out_tongue:

I killed my Hulu sub earlier this year because after the Disney+ acquisition, they were offering Hulu as an add-on for only $2/mo… which I didn’t qualify for, since I was already a Hulu subscriber.

So I dropped it. It requires… I forget how long, a month? Three months? Before you then become eligible for the $2 add-on deal. But it’s still Byzantine as hell, and might wind up requiring me to drop the Disney+ and re-add it as a new customer as well.

Not a day goes by that I don’t wonder if it’s time to stop giving ANY of them my money and raise the black flag once more. I made the conscious decision to stop pirating to support reasonably-priced easy-to-deal-with streaming services. Now that they have my money, they seem to think that neither “reasonably-priced” nor “easy-to-deal-with” are still necessary, and, welp…

1 Like