Well, mostly. But instead of “mechanical workload,” let’s go with “individual workload.”
Essentially, your pool performance is derived from the individual performance of each of your drives, combined with the characteristics of the pool topology you choose. Adding more total drives (“spindles”) CAN increase overall pool performance, but it doesn’t necessarily do so, and when it does, it’s usually to nowhere near the degree people expect.
> I arrived tentatively at the number 18 to figure in two backup drives. I’ve been pointed in the direction of Z3 (what I use currently)…
This is a bit ambiguous, but it sounds like you mean you’re only looking at a single vdev plus some SPAREs, which would be a mistake with this many drives. You can build a pool out of essentially any number of drives and/or vdevs, but since you aren’t locked into a specific set of hardware yet, it makes sense to buy your hardware to fit the best pool design, rather than designing the best pool you can around hardware you’re already locked into.
This means pick topology first, then worry about how many drives you need (and what capacities) later. The usual choices here are as follows:
- 2n mirrors – highest possible performance, single redundancy, 50% SE
- 3n Z1 – very high performance, single redundancy, 67% SE
- 4n Z2 – high performance, dual redundancy, 50% SE
- 6n Z2 – moderate performance, dual redundancy, 67% SE
- 10n Z2 – acceptable performance, dual redundancy, 80% SE
I don’t really recommend Z3 at this scale.
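If you want to double-check the SE column, maximum storage efficiency falls straight out of (width − parity) / width, counting a mirror’s redundant copy as “parity” for this purpose. A quick sanity check in Python:

```python
# (width, parity) pairs for the topologies listed above.
topologies = {
    "2n mirror": (2, 1),
    "3n Z1":     (3, 1),
    "4n Z2":     (4, 2),
    "6n Z2":     (6, 2),
    "10n Z2":    (10, 2),
}

for name, (width, parity) in topologies.items():
    se = (width - parity) / width
    print(f"{name:>9}: {se:.0%} maximum SE")
```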
The “SE” above refers to Storage Efficiency: the ratio of usable capacity to raw capacity. Worth noting: that’s a maximum storage efficiency, not a guarantee. If you store a lot of small blocks, you won’t get the efficiency shown above. For example, even with recordsize=1M set, a lone 4KiB file on a 10-wide Z2 gets stored as a single 4KiB block (recordsize is a ceiling, not a floor), and that block still needs its own parity: one 4KiB sector of data plus two 4KiB sectors of parity, for 12KiB on disk (assuming ashift=12). That’s 33% storage efficiency instead of the 80% maximum.
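If you want to see where that comes from, here’s a minimal sketch of the RAIDZ allocation math as I understand it, assuming 4KiB sectors (ashift=12): parity sectors are written per stripe row, and each allocation gets padded up to a multiple of parity+1 sectors. The function is mine, for illustration only:

```python
import math

def raidz_allocated(block_bytes, width, parity, sector=4096):
    """Rough on-disk allocation for one block on a RAIDZ vdev.

    Illustrative sketch, assuming 4KiB sectors (ashift=12): parity is
    written per stripe row, and allocations are padded to a multiple
    of (parity + 1) sectors.
    """
    data = math.ceil(block_bytes / sector)        # data sectors needed
    rows = math.ceil(data / (width - parity))     # stripe rows needed
    total = data + rows * parity                  # data + parity sectors
    total += -total % (parity + 1)                # padding sectors
    return total * sector

# A lone 4KiB file on a 10-wide Z2: 1 data + 2 parity sectors = 12KiB.
print(raidz_allocated(4096, width=10, parity=2))          # 12288
# A full 1MiB block on the same vdev stays close to the 80% maximum:
print(raidz_allocated(1024 * 1024, width=10, parity=2))   # 1314816, ~80% SE
```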
While you absolutely can do different widths than the ones above, you should be aware that when it comes to incompressible data, they won’t perform optimally. For example, a 10-wide Z2 needs to store each block on eight data disks and two parity disks. Recordsize is always a power of 2, which means recordsize/8 divides out into whole, equal shares per data disk–no padding needed. An 11-wide Z2 would try to store the same 1MiB of data divided among nine data disks, which doesn’t add up evenly, which means “padding” to make up the difference–wasted space on disk, and wasted performance coming back off the disk.
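You can see the mismatch in simple arithmetic (again assuming 4KiB sectors):

```python
# A 1MiB block is 256 sectors of 4KiB. A 10-wide Z2 has 8 data disks;
# an 11-wide Z2 has 9.
sectors = (1024 * 1024) // 4096
print(sectors / 8)   # 32.0    -- whole stripe rows, no padding needed
print(sectors / 9)   # 28.44.. -- partial final row, padding required
```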
So, now that you know all that, the question becomes: which topology appeals to you the most? Whichever one it is, you build your pool out of those building blocks: an array of 2n mirror, 3n Z1, 4n Z2, 6n Z2, or 10n Z2 vdevs. Use as many vdevs as you need to meet your capacity goals; each additional vdev adds performance capability as well.
Hold up, didn’t I just get done telling you that adding disks doesn’t often help? Yes, but that’s in terms of disk count, not vdev count. Four 2n mirrors outperform three 2n mirrors. Four 10n Z2 vdevs outperform three 10n Z2 vdevs. You get the idea. What doesn’t fly is thinking that 20 disks in two 10n Z2 vdevs will outperform 10 disks in 2n mirror vdevs. Even though the mirrored pool has only half the Z2 pool’s drive count, it has five vdevs to the Z2 pool’s two, and each of those vdevs is a far higher-performing topology–so the much smaller pool ends up the faster one.
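To put toy numbers on that: the usual rule of thumb is that a single vdev, no matter how wide, delivers roughly the small-block random IOPS of one member disk (mirrors do better on reads). The 250 IOPS figure below is made up for illustration, not a benchmark:

```python
# Toy model: small-block random write IOPS scales with vdev count,
# not disk count. Assumes one vdev ~= one disk's worth of IOPS.
DISK_IOPS = 250  # hypothetical figure for a single spinning disk

def pool_write_iops(vdevs, disk_iops=DISK_IOPS):
    return vdevs * disk_iops

print(pool_write_iops(5))  # 10 disks as five 2n mirrors  -> 1250
print(pool_write_iops(2))  # 20 disks as two 10n Z2 vdevs -> 500
```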