CPU recommendation for NVMe pool

Hey guys,

We're planning to build a central NVMe server for high-performance network-attached storage. The system should act as the central storage server for multiple (Linux) VM hosts, and we plan on moving all VM storage, including root (system) disks, to this machine.

It will be quite an expensive experiment, and one of the open questions is which CPU to use.

Planned layout:

  • all Gen4 PCIe
  • 12x NVMe, striped into 4x RAIDZ1
  • expandable to 24x NVMe

The main focus is on read latency / read IOPS.

The question now is:

More cores, or higher per-core throughput/frequency? The platform choice is between Epyc 7003 and Xeons on Socket LGA 4189.

best & thanks,
stefan

PCIe lanes are king if you expect to get the maximum benefit out of all that NVMe, which generally means you want to go Epyc. The 7003 series offers double the PCIe lanes of the LGA 4189 Xeons (128 vs 64 per socket), so that’s pretty much gonna be a no-brainer between those two general families.

multiple VM hosts

Moving on from the question of how to get the maximum throughput performance out of your NVMe storage, which is mostly predicated on PCIe lanes, it’s time to tackle the question of “fewer but faster CPU threads” or “more but slower CPU threads.”

If this box is to be a central repository for the storage belonging to multiple VM hosts, it’s going to be servicing a hell of a lot of parallel requests, which means more threads will beat faster threads.

Mind you, that’s mostly a consideration for which model you pick inside whichever family you go with, because Epyc 7003 isn’t slower per thread than LGA 4189 Xeons in the first place. So this would be a consideration between, e.g., the Xeon Gold 6312U (24 cores at 2.4/3.6GHz) and the Xeon Gold 6314U (32 cores at 2.3/3.4GHz). It would not be a consideration between the Xeon Gold 6314U and the Epyc 7513, each of which has 32 cores, but with higher Passmark scores on the AMD side for both multi-threaded AND single-threaded workloads.

Hi Jim,

Thank you so much for your fast reply.

The idea behind this is to completely separate storage from compute in a small cluster and to get everything ready for, e.g., VDI infrastructure and some server VMs (e.g. Exchange).

Supermicro offers two interesting 24 (22) NVMe systems, one based on AMD and one on Intel. However, the Intel box requires two CPUs by nature, due to PCIe lane count. Both systems guarantee 4x PCIe Gen4 lanes per drive.

Exactly - there will be no virtualization on this host.

The next big question is how to get the data in and out. Our idea is to go with 40Gb Ethernet, or maybe take on the 56Gb InfiniBand challenge.

Next is how to mount that stuff, but obviously only NFS and iSCSI come to mind.
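
On the ZFS side, my rough sketch for that would be one dataset for NFS and one zvol per VM disk for iSCSI; something like this (pool, dataset, and network names are just placeholders, and the iSCSI target configuration itself is left out):

```
# NFS: a dataset for VM images, exported to the VM host network
zfs create tank/vmstore
zfs set sharenfs="rw=@10.0.0.0/24,no_root_squash" tank/vmstore

# iSCSI: one zvol per VM disk, to be exported through a Linux iSCSI target
zfs create tank/vols
zfs create -V 100G -o volblocksize=16k tank/vols/vm01-root
```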

However, how do you think core count would relate to the number of drives in an example setup like the one above (3x4xRaidZ1 / 6x4xRaidZ1)?
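
To make the notation concrete: by 3x4xRaidZ1 I mean three 4-wide RAIDZ1 vdevs striped together (and 6x4 is the same idea with 24 drives). In zpool terms, roughly (device names are placeholders; in practice we’d use /dev/disk/by-id paths):

```
# 12 drives as three 4-wide RAIDZ1 vdevs
zpool create -o ashift=12 tank \
  raidz1 nvme0n1 nvme1n1 nvme2n1  nvme3n1 \
  raidz1 nvme4n1 nvme5n1 nvme6n1  nvme7n1 \
  raidz1 nvme8n1 nvme9n1 nvme10n1 nvme11n1
```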

I’m grateful for any input. And last but not least: thanks for sanoid/syncoid :wink:

(Which I’ve been using for years, and which will be the sync tool for the much slower secondary backup storage host.)


It’s not really the number of drives that you need to relate to core count, it’s the number of simultaneous TCP flows. Each TCP flow is limited to a single CPU thread, so the more cores, the better. Having to time-share a hardware thread between multiple TCP flows isn’t a very big deal at 1Gbps, but once you hit (much less exceed) 10Gbps, that VERY much matters.
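
One practical knob on the client side, if you end up on NFS: reasonably recent Linux kernels (5.3 or newer) let a single NFS mount spread its traffic across several TCP connections via the nconnect mount option, so one big mount isn’t pinned to a single flow, and therefore a single server thread. A sketch, with placeholder names:

```
# one NFS mount, up to 8 TCP connections to the server
mount -t nfs -o vers=4.2,nconnect=8 storage01:/tank/vmstore /mnt/vmstore
```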

You should also generally expect higher performance out of a single socket with the same number of PCIe lanes and hardware threads than you get out of a similar system with the same number of threads and lanes, but spread out over multiple sockets. NUMA issues are annoying as hell to troubleshoot, and the performance impact when workloads break wrong across the sockets is VERY significant.
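
If you do end up dual-socket anyway, at least check which socket each NIC and NVMe controller hangs off of, so you can keep the hot paths on one node as much as possible. For example (interface and controller names are placeholders):

```
# overall NUMA layout
numactl --hardware

# NUMA node a given NIC / NVMe controller is attached to (-1 = not reported)
cat /sys/class/net/ens1f0/device/numa_node
cat /sys/class/nvme/nvme0/device/numa_node
```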

One final note:

(3x4xRaidZ1 / 6x4xRaidZ1)

That’s going to be a definite bottleneck. People often think “my super-fast NVMe will compensate for my striped parity RAID,” but they’re almost always wrong about that.

Mirrors are going to be your best bet for performance, but even if you refuse to go with mirrors, you don’t want awkwardly-sized raidz vdevs. 3-wide Z1, 4-wide Z2, or 6-wide Z2 are the relatively sensible options here. Any of them will outperform either of the Z1 topologies you mentioned, and any of them will be at least slightly more resilient to failure as well.
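
For comparison, here’s roughly what 12 drives look like as mirrors versus as 3-wide Z1 (device names are placeholders; you’d obviously only create one of these):

```
# six 2-way mirrors: most vdevs, best read IOPS/latency, 50% space efficiency
zpool create -o ashift=12 tank \
  mirror nvme0n1 nvme1n1  mirror nvme2n1 nvme3n1  mirror nvme4n1  nvme5n1 \
  mirror nvme6n1 nvme7n1  mirror nvme8n1 nvme9n1  mirror nvme10n1 nvme11n1

# four 3-wide raidz1 vdevs: more usable space, fewer vdevs, lower IOPS
zpool create -o ashift=12 tank \
  raidz1 nvme0n1 nvme1n1 nvme2n1 \
  raidz1 nvme3n1 nvme4n1 nvme5n1 \
  raidz1 nvme6n1 nvme7n1 nvme8n1 \
  raidz1 nvme9n1 nvme10n1 nvme11n1
```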


Just wanted to drop a note to say thanks again. I’m currently doing lots of research and planning on the topic.

The RAIDZ topic you mentioned is especially surprising to me; I did not imagine it would have such a dramatic impact. Since SSD prices are rising these days, I was focused on Z1. Also, I’ve been running a 4x Z1 NVMe pool on a production server for six years now and never “felt” (I might be wrong!) that it had performance issues.

As soon as I get the hardware, I will do benchmarking on the empty pool first. If you are interested in some layout comparison benchmarks and/or have an interesting/related fio command, just let me know. I’ll be glad to test a variety of pool / dataset layouts.
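
For the read side, my starting point would be something along these lines (path and sizes are placeholders), repeated across the different pool layouts:

```
# 4k random reads across 16 jobs -- rough proxy for VM read IOPS / latency
# (run against a dataset with primarycache=metadata, or use a working set
#  larger than RAM, otherwise you mostly benchmark the ARC)
fio --name=randread-4k --directory=/tank/bench --size=8G \
    --rw=randread --bs=4k --ioengine=psync \
    --numjobs=16 --runtime=60 --time_based --group_reporting
```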

Have a nice Easter weekend,
Stefan