Is there an optimal or max level of vdevs per RAIDZ config?

I’m planning a semi-major layout change for a new build. Let’s say I have 20 hard drive bays, I want to fill all of them, and I have a ton of options to choose from.

We have options like:
4x5 Z1
5x4 Z2
2x8 Z2
2x10 Z1
2x10 Z2

I understand that the more I split it up, the more redundancy I get, plus easier pool expansion later on, at the cost of usable space. But are there other technical reasons NOT to make something crazy like a 2x10-wide Z2 pool? What kind of overhead starts to come into consideration at higher vdev widths?

Generally speaking, performance scales with vdev count, not with individual drive count. So, long story short, the more vdevs, the higher the performance.

If you want higher storage efficiency (more storage for the same number of drives), you want wider vdevs, preferably at optimal width, meaning the number of drives in each vdev is a power of two after deducting parity: so 3-wide Z1, and 4-, 6-, or 10-wide Z2. But expect performance to decrease along with it. Also, don’t necessarily expect to get as much storage out of wider vdevs as it looks like on paper: for example, a 4KiB file stored on a Z2 vdev occupies 12KiB, a horrifyingly poor 33% storage efficiency no matter how wide the vdev is, because undersize blocks go on undersize stripes.
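The 4KiB-on-Z2 arithmetic can be sketched as a rough allocation model. This is a simplification of the RAIDZ allocation rules (each stripe row carries its own parity sectors, and the total allocation is padded up to a multiple of parity-plus-one); the function name and the ashift=12 default are my assumptions, not anything from OpenZFS itself:

```python
def raidz_alloc_sectors(data_bytes, width, nparity, ashift=12):
    """Rough RAIDZ allocation model, in sectors of 2**ashift bytes.

    Simplified sketch: data sectors, plus nparity parity sectors
    per stripe row, padded up to a multiple of (nparity + 1).
    """
    sector = 2 ** ashift
    data = -(-data_bytes // sector)           # data sectors, rounded up
    rows = -(-data // (width - nparity))      # stripe rows needed
    total = data + rows * nparity             # add parity sectors
    pad = nparity + 1
    return -(-total // pad) * pad             # pad to multiple of p+1

# A 4KiB record on any Z2 vdev: 1 data + 2 parity = 3 sectors = 12KiB.
print(raidz_alloc_sectors(4096, 10, 2) * 4096)  # 12288
```

Note that the width drops out entirely for small blocks: a 4KiB record costs 12KiB on a 4-wide Z2 and on a 10-wide Z2 alike.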

With 20 bays to work with, the most commonly recommended/recommendable layouts are:

10x 2-wide mirrors
6x 3-wide Z1 (with two bays open for auxiliary vdevs, spares, whatever)
5x 4-wide Z2
3x 6-wide Z2 (with two bays open)
2x 10-wide Z2

The above are ranked in order of decreasing performance.
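To make the space side of that ranking concrete, here is a quick back-of-envelope comparison of raw storage efficiency for those layouts (mirrors lose one copy per vdev, Z1 one drive per vdev, Z2 two). This ignores the small-block padding caveat above, so real-world efficiency on RAIDZ can be lower:

```python
# (vdevs, width, parity-or-extra-copies) for each recommended layout
layouts = {
    "10x 2-wide mirrors": (10, 2, 1),
    "6x 3-wide Z1":       (6, 3, 1),
    "5x 4-wide Z2":       (5, 4, 2),
    "3x 6-wide Z2":       (3, 6, 2),
    "2x 10-wide Z2":      (2, 10, 2),
}

efficiency = {}
for name, (vdevs, width, parity) in layouts.items():
    bays = vdevs * width
    data_drives = vdevs * (width - parity)
    efficiency[name] = data_drives / bays
    print(f"{name:20} {bays:2} bays used, {data_drives:2} drives of data "
          f"({efficiency[name]:.0%} raw efficiency)")
```

So the fastest layout (mirrors) sits at 50% efficiency while the slowest (2x 10-wide Z2) reaches 80%, which is the trade-off in a nutshell.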

I do not recommend ever going wider than three disks on a Z1 vdev; that’s simply not sufficient redundancy for the number of points of failure present.



@mercenary_sysadmin Thanks for that. You summarized in a few paragraphs something that I still didn’t completely understand the first go around after watching about half a dozen YouTube videos. :slight_smile:

I went with the all-mirrors setup, despite the 50 percent storage efficiency. That’s still more than enough space for my needs.

One thing I didn’t consider when I first started is how long it takes, given the size of modern HDDs and their relatively slow speeds, to resilver a vdev when a disk needs to be replaced (that is, to sync everything up to the new disk so the vdev is back at the full level of redundancy that was chosen).

For example, on a traditional ext4 file system, I once had to replace a drive in a single mirror of 2x14 TB disks. It took 24 hours to resilver, and during that time the entire mirror was vulnerable to failure: I’d have lost everything if I lost the second disk before it was done resilvering.

In a weird way, I’m kind of glad that disk failed when it did. It’d have been a lot less pleasant to learn that lesson with an oversized Z1 or something.

Mirrors resilver faster than any other type of ZFS vdev, because they just need to copy all the data from one disk to the new disk, which shortens the window during which the data in the mirror is vulnerable. Resilvering itself is an intense operation, so it arguably puts disks at higher risk of failing, which you especially don’t want happening during a resilver. So, quicker is better.

(RAIDZ1/2/3 resilvers have to read from every surviving disk in the vdev and do a lot of parity calculations, so they can take much longer and are usually more compute intensive and harder on the disks.)

An advantage of ZFS is that, because it manages both the filesystem and the redundancy, it knows exactly which blocks need to be copied, so a resilver only touches allocated data; how much time that saves depends on how full the pool is. Additionally, part of what scrubs accomplish is to increase your confidence that nothing unreadable is secretly lurking on a disk, waiting to impede your ability to restore a pool’s redundancy. Not that you needed persuading! But hopefully your resilvering experience is better with ZFS.
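A back-of-envelope estimate lines up with the 24-hour anecdote above. This is my own rough model, assuming a mirror resilver runs at roughly the sequential throughput of a single large HDD (the 160 MB/s figure is an assumption, not a measurement):

```python
def resilver_hours(data_tb, mb_per_s=160):
    """Back-of-envelope mirror resilver estimate.

    Assumes the resilver copies only the allocated data, at an
    assumed average single-disk throughput (default 160 MB/s).
    """
    return data_tb * 1e12 / (mb_per_s * 1e6) / 3600

# A completely full 14 TB mirror disk: roughly a full day.
print(f"{resilver_hours(14):.0f} h")  # 24 h

# A half-full pool resilvers in about half the time under ZFS,
# since only allocated blocks are copied.
print(f"{resilver_hours(7):.0f} h")   # 12 h
```

The half-full case is where ZFS’s block-aware resilver pays off compared to a whole-device rebuild, which has to copy every sector regardless of what is actually in use.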

Anyway, when pondering pool geometry, if I can’t have a mirror pool, I tend to re-read ZFS RAIDZ stripe width, or: How I Learned to Stop Worrying and Love RAIDZ. I keep in mind that it’s from 2014, though, and like Jim, I would not use raidz1 with more than three disks of today’s sizes, let alone five (maybe not even yesterday’s sizes, remembering that Xserve RAID, shudder), even with 512-byte sectors, tiny recordsizes, and no compression.

While I’m remembering things, I’ll also say I personally avoid making widths a multiple of the raidz level plus one (so not, for example, a 6-wide Z2), having once spent too much time trying to figure out why I seemed to have a hot spot on every fourth disk. But if that were actually worth worrying about, I imagine I would have seen someone else mention it, and I never have.
