Many uses for SSDs in a zpool -- can they share the same drives?

Good morning folks; I have a pool made up of spinning hard drives, and I know there are several places I can put SSDs in that pool to improve performance:

  • SLOG: stage newly written (sync) blocks on the SSD so we can acknowledge the write to the client faster, then flush them to the HDDs at our leisure
  • L2ARC: its utility is debated, but it lets you cache frequently used blocks on faster storage automatically, based on cache misses
  • Special Metadata VDEV: store filesystem metadata on a separate SSD instead of alongside the files
  • bonus: a separate, faster pool for cache data (e.g. an application wants a dedicated “cache” directory for data it needs available quickly, so maybe that should live on an all-SSD pool?)
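
For reference, this is roughly how I understand each of those roles gets attached to an existing pool; the pool name tank and the device names below are placeholders, and each example assumes its own unused device(s):

    # CACHE (L2ARC): no redundancy needed, a single device is fine
    zpool add tank cache nvme0n1

    # LOG (SLOG): usually mirrored, since it briefly holds unflushed sync writes
    zpool add tank log mirror nvme1n1 nvme2n1

    # Special metadata vdev: mirror it; losing it means losing the pool
    zpool add tank special mirror nvme3n1 nvme4n1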

My problem is that if I bought a pair of SSDs to mirror for each one of those use cases, it would get very expensive and I’d start to run out of PCIe lanes. So I’m thinking about buying just two or three NVMe SSDs and partitioning them (e.g. partition 1 on each drive is the L2ARC, partition 2 is for metadata, partition 3 is the SLOG, and partition 4 is the separate, faster pool).
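
Concretely, the layout I have in mind would look something like this on each of the two drives (device names and sizes are placeholders I haven’t tested, just to illustrate the idea):

    # Carve one NVMe drive into the four roles (repeat for the second drive)
    sgdisk -n 1:0:+200G /dev/nvme0n1   # partition 1: L2ARC
    sgdisk -n 2:0:+100G /dev/nvme0n1   # partition 2: special metadata
    sgdisk -n 3:0:+16G  /dev/nvme0n1   # partition 3: SLOG
    sgdisk -n 4:0:0     /dev/nvme0n1   # partition 4: remainder for the fast pool

    # Then point each vdev at the matching partitions on both drives
    zpool add tank cache nvme0n1p1 nvme1n1p1
    zpool add tank special mirror nvme0n1p2 nvme1n1p2
    zpool add tank log mirror nvme0n1p3 nvme1n1p3
    zpool create fast mirror nvme0n1p4 nvme1n1p4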

Are there any performance/longevity pitfalls I should consider in that approach? Are there any of those cases that really need to be their own separate drives?

if I bought a pair of SSDs to mirror for each one of those use cases, it would get very expensive and I’d start to run out of PCIe lanes.

Well, you don’t need a mirror for all of those use cases. CACHE vdevs (“l2arc”) do not need any sort of redundancy. LOG vdevs (SLOG) are generally recommended to be redundant, but the only thing they store of value is dirty sync writes; if your system crashes and your LOG vdev dies along with it, you just lose the last few seconds of sync writes that were sitting on that drive. Essentially, it’s the same thing as the crash happening a few seconds before it really did.
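
That also means experimenting with these is fairly low-risk: cache and log vdevs can be removed from a live pool at any time. A rough sketch, with hypothetical pool and device names (the mirror name is whatever zpool status shows):

    # Dropping a cache device loses nothing but cached copies of data
    zpool remove tank nvme0n1p1

    # Dropping a log vdev just moves the ZIL back onto the main pool disks
    zpool remove tank mirror-1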

I am generally a big fan of using your SSDs for a separate, high-performance all-flash pool where the demanding workloads go. Using SSDs as LOG, CACHE, and SPECIAL saws off some of the roughest edges of a rust pool, but it does not transform it into a pool that behaves as though it were not a rust pool.


Thanks for the detailed answer! I’m not sure if I understand it correctly, so I hope I can hijack the thread with two questions.
I once ran a test with Proxmox defaults (just plain disks, nothing on SSD) on a machine with sufficient RAM, and although the fio numbers got better, IIRC I didn’t see any practical impact. Possibly I tested badly (or the system simply works well by default). So I didn’t put anything on SSD for the disk pools. (The things where it matters most are on SSD pools anyway, like you suggest.)
My questions:

If the machine has plenty of RAM, is there any sense in using CACHE vdevs (“l2arc”)?

Do you mean that LOG, CACHE, and SPECIAL on SSD for spinning rust pools help a lot (although they cannot do magic either) and should always be used if possible,
or
do you mean that they may help a little (but not much) and there is usually no need to worry?

I mean that you’re usually looking at something like a ten percent improvement from attaching a support vdev to a rust pool, when you’re deploying it correctly and have a suitable workload. If you’re a storage professional designing a pool to service tens, hundreds, or thousands of users, sometimes that extra ten percent is the difference between “this performs great” and “this is unusable trash.” But I find that the folks who aren’t storage professionals tend to think that support vdevs will elevate a rust pool to behave “nearly” as well as a fully solid-state pool… when SSDs offer anywhere from triple the throughput of rust drives under the best conditions to several orders of magnitude lower latency under more difficult ones.
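
As an aside on the L2ARC question: a rough way to judge whether an L2ARC would buy you anything on a RAM-rich machine is to look at how well the existing ARC is already doing; if the hit ratio is already in the high nineties, there’s very little left for an L2ARC to cache. On ZFS on Linux, for instance:

    # Summarized ARC statistics, including the overall hit ratio
    arc_summary | less

    # Or pull the raw hit/miss counters straight from the kernel
    awk '/^hits|^misses/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats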

So, back to that whole “you’re really only looking at maybe ten percent” thing. That’s a half-assed rule of thumb; the reality is far too complex to sum up in a sound bite. Probably the biggest potential exception is the LOG vdev. If you’re running rust drives on an HBA that doesn’t offer power-loss protection itself, adding a proper LOG vdev on a well-selected SSD (not just whatever trash is lying around) can improve throughput on sync-write workloads by several hundred percent… The catches there are that it needs to be a proper low-latency, high-endurance SSD with no other job than being a LOG, and even then it only affects sync writes, not async writes (the vast majority of most write workloads), and not reads.
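
If you want to see that effect (or its absence) on your own hardware, a small sync-write fio run before and after adding the LOG vdev makes it obvious; a minimal sketch, assuming a dataset mounted at /tank/fiotest:

    # 4K random writes with an fsync after every write: the worst case for rust,
    # and exactly the pattern a good SLOG accelerates
    fio --name=syncwrite --directory=/tank/fiotest --rw=randwrite \
        --bs=4k --size=1G --fsync=1 --numjobs=1 --runtime=60 --time_based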

If you’re trying to run a database engine on rust, you absolutely want a LOG vdev… The question is, why are you trying to run a DB engine on rust in the first place? It’s still going to perform like absolute garbage compared to the same DB running on even (most) cheap SSDs, so…

Hopefully you get the idea. Support vdevs, properly deployed, absolutely can and do deliver worthwhile improvements to the performance of a pool. But what they won’t ever do is change that performance to anything close to the degree that going all solid-state does.


Wow, thank you so much for your detailed, exhaustive, and helpful answer (again). Yes, I think I get the idea, and I really appreciate the knowledge I gain from reading your explanations.
