Clustering with ZFS

tvcvt · March 3, 2024, 3:34am

Another thread here got me thinking about ZFS and clustering options and I recall @mercenary_sysadmin mentioning in a 2.5 Admins episode that one could use DRBD and ZFS together to share storage among multiple nodes.

I’m curious about the particulars of this setup and would love to hear some details about how that might work.

mercenary_sysadmin · March 3, 2024, 6:17am

DRBD is just a virtual block device (Distributed Replicated Block Device) that can be fed with any arbitrary real block devices on the cluster nodes. There’s no particular magic involved here; you just feed DRBD the devices exposed by ZFS zvols rather than feeding it individual drives or mdraid arrays.
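As a rough illustration of the zvol-under-DRBD layering described above, here is a minimal Python sketch that renders a two-node DRBD resource stanza. The pool name `tank`, the zvol name `drbd0`, the host names, the IP addresses, and the port are all made-up placeholders, and the stanza follows the classic DRBD 8.4-style layout, so treat it as a starting point to check against the documentation for your DRBD version rather than a tested configuration.

```python
def render_drbd_resource(name, zvol, nodes, minor=0, port=7789):
    """Render a DRBD resource stanza whose backing disk is a ZFS zvol.

    name  -- DRBD resource name (placeholder)
    zvol  -- "<pool>/<volume>" of the zvol that exists on every node
    nodes -- dict mapping hostname -> IP address (placeholders)
    """
    lines = [
        f"resource {name} {{",
        f"    device    /dev/drbd{minor};",
        # On Linux, ZFS exposes each zvol as /dev/zvol/<pool>/<volume>.
        f"    disk      /dev/zvol/{zvol};",
        "    meta-disk internal;",
    ]
    for host, ip in nodes.items():
        lines.append(f"    on {host} {{ address {ip}:{port}; }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_drbd_resource(
        "r0", "tank/drbd0", {"alpha": "10.0.0.1", "bravo": "10.0.0.2"}
    ))
```

The zvol itself would have to exist on each node first (for example via `zfs create -V 10G tank/drbd0`, with the size picked arbitrarily here), and DRBD expects the `on` host names to match what each node reports as its own host name.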

Now, there probably is a wealth of potential tips for performance tuning–but I can’t give you any, because I’ve never used DRBD in anger myself. I’d be very interested in hearing any war stories, though!

quartsize · March 6, 2024, 8:22pm

DRBD creates a block device that acts like a shared disk; in dual-primary mode, multiple nodes can access it at the same time. Since ZFS isn’t a “clustered” filesystem and cannot be mounted on multiple hosts simultaneously, if you put ZFS on top of DRBD, you would be limited to an active-passive configuration (per pool), like for backing an NFS server with a standby. If you put ZFS (zvols) below DRBD, then you need another filesystem to put on the created block device, possibly one that supports shared-disk clustering if you’re aiming to mount the filesystem on multiple nodes simultaneously.
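To make the two stacking orders concrete, here is a small Python sketch that just encodes the reasoning above as a lookup. The function name and the layer labels are invented for illustration; nothing here talks to ZFS or DRBD.

```python
def describe_stack(zfs_position, simultaneous_mount=False):
    """Return the storage layers, bottom to top, for a ZFS/DRBD layering.

    zfs_position       -- "above": the pool is built on the DRBD device
                          "below": zvols are used as DRBD backing disks
    simultaneous_mount -- True if several nodes must mount the top layer at once
    """
    if zfs_position == "above":
        if simultaneous_mount:
            # ZFS is not a clustered filesystem, so this combination is out.
            raise ValueError("a ZFS pool cannot be imported on several hosts at once")
        return ["disks", "DRBD", "ZFS pool (active-passive, one node at a time)"]
    if zfs_position == "below":
        top = ("shared-disk cluster filesystem" if simultaneous_mount
               else "ordinary filesystem")
        return ["disks", "ZFS zvol", "DRBD", top]
    raise ValueError(f"unknown position: {zfs_position!r}")


if __name__ == "__main__":
    print(" -> ".join(describe_stack("above")))
    print(" -> ".join(describe_stack("below", simultaneous_mount=True)))
```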

tvcvt · March 8, 2024, 10:08pm

Thanks, both of you, for the responses; it’s really helpful to have that picture of how the layers can be stacked. I’m finding myself curious about these clustered file systems lately. Do you two (or does anyone) have any real-world favorites in this space? I’ve glanced at DRBD and Ceph. Anyone have any stories about using these or others that they’d care to share?

mercenary_sysadmin · March 9, 2024, 1:17am

DRBD is pretty easy to set up, if your only real concern is “verified working,” and all you need is two systems.

Ceph is a multi-headed nightmare suitable for large numbers of individual systems. It distributes a lot further than DRBD does, but you’d better bring your A game, especially if you have individual-task performance concerns. And you’ll need more hardware before it’s worth it.

I set up a toy DRBD system once to play with it, but never went any further than that. I’ve seen Ceph in action, but have never set it up or maintained it myself.

zspec · May 18, 2024, 2:48am

Hi all, I’m trying to crack this same nut.
DRBD looks interesting, but I need to research it more before I can say it fits. It’s more like setting up two systems and having a RAID 1 across their zvols. Which isn’t horrible, per se.

I’ve been reading this article I found by EWWhite about using the Red Hat HA add-ons to create HA between two ZFS setups.

It’s pretty compelling the way he sets it up, but looking at the cost of the Red Hat HA add-on… not really worth it for home labbing to be honest. Especially if you want to scale this across many machines. I’m assuming you need multiple HA add-ons, but I could be wrong there.

What about Spectrum Scale? I’ve only used it at work when it was GPFS, and it was a fiddly nightmare. But it’s now open source and might be worth exploring?

I have my eye on Ceph but I’m building my tolerance to that level of clustering masochism.

quartsize · May 28, 2024, 1:13pm

“…looking at the cost of the Red Hat HA add-on… not really worth it for home labbing to be honest.”

The downstream builds Rocky and Alma have the necessary components in their HighAvailability repositories. You then also have the flexibility of not needing to concern yourself with whether you’re using one of Red Hat’s supported configurations.

zspec · May 28, 2024, 2:15pm

That’s awesome, thanks for pointing that out. I didn’t think to check those; I thought that fell into the Red Hat-exclusive category.

DRBD coupled with something like the Red Hat HA features could be a pretty solid system. Coupled with something like Tailscale to add in off-site hosts, I could implement my 3-2-1 storage practice pretty easily.

quartsize · May 28, 2024, 2:39pm

This may not be exactly what you meant, but since you mention 3-2-1, be sure that you consider DRBD as redundancy for availability rather than part of a backup scheme, as many of the losses against which backups protect you will instead be faithfully duplicated by DRBD onto the replica!
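A toy Python model of that distinction, with plain dictionaries standing in for disks (the class and file names are made up, and no real DRBD behaviour is being invoked):

```python
class MirroredStore:
    """Toy synchronous mirror: every change lands on both copies."""

    def __init__(self, files):
        self.primary = dict(files)
        self.replica = dict(files)

    def write(self, name, data):
        self.primary[name] = data
        self.replica[name] = data   # replicated immediately

    def delete(self, name):
        self.primary.pop(name, None)
        self.replica.pop(name, None)  # the mistake is replicated too


if __name__ == "__main__":
    store = MirroredStore({"report.txt": "v1"})
    backup = dict(store.primary)      # independent copy taken earlier

    store.delete("report.txt")        # accidental delete

    print("replica still has it:", "report.txt" in store.replica)
    print("backup still has it: ", "report.txt" in backup)
```

The replica survives a dead node, but only the separately held copy survives the accidental delete.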

zspec · May 28, 2024, 2:58pm

That’s a great point, though I guess the same issue would be present in any RAID 1 style mirrored system.
I’m new to ZFS, but I’m thinking this is where I would look to snapshots to get point-in-time backups, like from before a delete or corruption event occurred.
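For what it’s worth, here is a minimal Python sketch of taking that kind of snapshot before a risky change. It only builds the `zfs snapshot` command line; the dataset name `tank/data` and the label are placeholders, and the timestamped naming scheme is just one possible convention.

```python
from datetime import datetime, timezone


def snapshot_command(dataset, label, now=None):
    """Build the argv for `zfs snapshot <dataset>@<label>-<UTC timestamp>`."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return ["zfs", "snapshot", f"{dataset}@{label}-{stamp}"]


if __name__ == "__main__":
    cmd = snapshot_command("tank/data", "pre-cleanup")
    # On a host with ZFS installed, this list could be handed to
    # subprocess.run(cmd, check=True); here it is only printed.
    print(" ".join(cmd))
```

Snapshots live in the same pool as the data, so they cover the “oops, deleted it” case rather than loss of the pool itself.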

Before ZFS, I think folks would be SOL if something like this happened. I mean, they could go to the tape archive stored off-site and hopefully it has that file or some previous version of it, but there would be data loss involved.

This is a really good point I didn’t fully consider.