Model recommendations for NVMe mirror disks in late 2024 (New or Used Enterprise, VM and Database Storage, 1-2 TB)?

Hello,

My TrueNAS server has one PCIe 4.0x4 slot. I don’t think it supports bifurcation, though I need to confirm that.

I found this PCIe 3.0x4 card with a PCIe switch on board, which holds 2 NVMEs. https://www.amazon.com/gp/product/B0BCFZQZLR/ref=ox_sc_saved_image_2?smid=A36DOQ8QSJXCYP&psc=1

(I’m not sure what card(s) exist in a PCIe 4.0x4 form-factor, but I suspect they’d be substantially more expensive. Given that my NAS only has a 2x10 GbE connection, PCIe 3.0x2 per disk (2 GB/s) is likely fast enough and will run cooler. I think it’d at least be fast enough to set up the pool and get it configured and into production so I can test it and see if it’s worth it to hunt down a more expensive PCIe 4.0x4 solution.)
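
Once the card is in, I believe the negotiated link speed and width can be confirmed from a Linux shell with lspci; a rough sketch (the PCIe address below is just a placeholder, not my actual hardware):

    # Find the NVMe controllers and their PCIe addresses
    lspci | grep -i 'non-volatile memory'

    # Check what link a given drive actually negotiated (address is a placeholder)
    sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
    # LnkCap = what the device supports; LnkSta = what it actually negotiated,
    # e.g. "Speed 8GT/s, Width x2" would confirm PCIe 3.0x2 per disk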

I’d like to add an NVMe mirror in the 1-2 TB total storage range for VM/LXC/database storage. Right now, I have one Proxmox node, and all the VM data is stored on the same NVMe mirror where the OS lives. That’s fine for the VMs and LXC containers themselves, since I have backups, and I’d be willing to go on that way for a while, but it also means that, for the moment, I’m doing things like storing my MariaDB database inside a ZVOL-based virtual SCSI disk that belongs to the database VM.

I don’t love that. I’d much rather have persistent storage for critical data outside the VMs and LXCs themselves. I feel like I’m setting myself up for annoying things to happen.

Any suggestions for NVME disks in the 1-2 TB range that have endurance suitable for VM storage? I’ve had good luck with Sabrent so far (I haven’t murdered any of their NVME when using them for VM storage), but I’m curious what else is out there. I’d be willing to look at used enterprise NVME as well. (Firecuda/Ironwolf/etc.)

Hi @SinisterPisces :wave:, great discussion! For 1-2 TB NVMe options, have you considered brands like Samsung or Western Digital? They have solid endurance ratings for VMs. Also, checking out enterprise options could yield some good deals on used drives…

Anything turn up? I’m leaning toward used datacenter drives after reading so many stories about the relentless enshittification of consumer desktop SSDs over the past 3-4 years. Power-loss protection, support for namespaces, and clueful firmware that’s been thoroughly tested (in theory) by the server OEMs are the main reasons. I’d rather have a gently used enterprise-grade drive vs. a brand new consumer part personally. I just haven’t figured out which one yet…

I haven’t bought anything yet.

I’m definitely going to do used enterprise drives. I’ve always had good experience with used enterprise SATA HDDs and SAS SSDs from reputable enterprise liquidators on eBay, so I’m fine with the risk of buying used (especially when the seller offers a warranty).

The boot mirror in my server is actually a pair of used PCIe 3.0x4 Ironwolf Pro 480 GB NVMe.

I still haven’t decided on a used enterprise model yet for the pair I’m looking to buy, but based on the below, I’ve decided 2 TB is the minimum, and 4 TB would be better. That said, realistically, I can’t afford a 4 TB enterprise NVMe, let alone a pair of them.

Honestly, I’m rethinking my whole NVME strategy at this point. I’ve once more managed to end up with a server with too few PCIe slots and NVME slots, so I’ve decided to change things up a bit:

  1. The PCIe 3.0x4 dual-slot card will be used for my boot mirror (a pair of 480 GB PCIe 3.0 FireCudas). Running my OS at 3.0x2 speeds (assuming each NVMe gets half performance from the switch, which is probably oversimplifying the real-world results) should be fine.
  2. That leaves me with two open PCIe 4.0 NVME slots, one of which runs at PCIe 4.0x4 (8 GB/s) speeds, and the other of which is downgraded to PCIe 3.0x2 (2 GB/s).
    2.1. So, the best I can do on a mirror here is ~2 GB/s writes, and at minimum 2 GB/s reads (not sure if I’d read faster given that the drives will be running at different speeds. I doubt it.)
    2.2. And I probably won’t get close to 2 GB/s raw write performance, as without a slog drive, I’ll be avoiding async workloads, so the sync writes will impose a penalty.
    2.3. In practical terms, I don’t think that penalty will matter much: the bottleneck will be network throughput. I have 2x10 GbE NICs in an LACP bond, and would most likely be accessing VM/database storage via NFS (or iSCSI?). So realistically, I wouldn’t see a single connection to the NVMe pool try to go faster than 10 Gb/s (roughly 1 GB/s in practice). I’ve sketched the fio test I plan to run below.
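
To put numbers on that, here’s the kind of quick fio check I’m planning once the pool exists, run once locally and once over NFS; the paths are placeholders and the exact flags may need tweaking:

    # Sequential sync writes directly on the pool (dataset path is a placeholder)
    fio --name=syncwrite --directory=/mnt/nvme-mirror/test --rw=write \
        --bs=1M --size=4G --numjobs=1 --ioengine=libaio --direct=1 --fsync=1

    # Same test from a client against the NFS mount, to see the network ceiling.
    # PCIe 3.0x2 tops out around 2 GB/s; a single 10 GbE link is ~1.25 GB/s line
    # rate, so the network should be the bottleneck.
    fio --name=nfswrite --directory=/mnt/nfs/nvme-mirror/test --rw=write \
        --bs=1M --size=4G --numjobs=1 --ioengine=libaio --direct=1 --fsync=1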

EDIT 1: My assumption here, of course, is that I need to avoid async workloads since I don’t have a slog drive. I need to do a separate thread to confirm that, but I think an effective 3.0x2 NVME mirror pool is perfectly fine for home server use.

EDIT 2: From my brief googling, all the 4 TB enterprise NVMe on eBay appear to be U.2 form factor, and even new prosumer 4 TB NVMe is stupid expensive.

I think I might end up going with a pair of 2 TB Sabrent Rocket 4 Plus disks, if I can’t come up with a better used enterprise option: Rocket 4 Plus SSD - Sabrent. These have a 1.4 PB TBW endurance rating, which should be more than sufficient for a home server (right?).

Their disks are high-endurance and I’ve always had great luck with the Rocket 4 line in other uses.
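
Back-of-the-envelope on that 1,400 TBW figure, assuming a 5-year service life, plus the counter I’d watch to sanity-check it (the device name is a placeholder):

    # 1,400 TBW over 5 years:
    #   1,400,000 GB / (5 * 365 days) ≈ 767 GB written per day, sustained
    # A home VM/database workload shouldn't get anywhere near that.

    # Track actual lifetime writes; smartctl reports "Data Units Written"
    # for NVMe drives, with a TB figure in brackets
    sudo smartctl -A /dev/nvme0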

EDIT 3: I need to look up the Gen 4 FireCuda drives, but there’s a sale on the 2 TB ones here that makes them look very attractive ($130 each): https://www.bhphotovideo.com/c/product/1827097-REG/seagate_zp2000gm3a063_2tb_firecuda_530r_internal.html

If you are using SSDs as the primary storage, you likely won’t need a SLOG drive. A SLOG is mostly used for speeding up fsync operations on a slow disk, e.g.:

  • The DB writes out some pages without fsync.
  • As the DB closes a transaction, it does an fsync to ensure the data is reliably written.
  • ZFS can fsync to a spinning disk, taking a few ms, or to a SLOG SSD, taking a fraction of a ms.

Here you can see the SSD SLOG makes the fsync much faster, allowing the DB to go on to its next operation sooner. However, if your storage is SSDs anyway, a SLOG won’t gain you fsync speed and won’t give you a significant performance boost. The only time you’ll gain much with SSDs is if you have SATA SSDs with an NVMe SLOG drive, and even then it’s much less of a boost than what a SLOG plus spinning disks gives you.
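
For reference, this is roughly what the ZFS side of it looks like, in case it helps to see where a SLOG sits and how sync behaviour is controlled; the pool, dataset, and device names are made up:

    # Attach a dedicated SLOG device to an existing pool
    zpool add tank log /dev/disk/by-id/nvme-example-slog

    # Per-dataset control over sync behaviour:
    #   standard = honour the application's fsync calls (default)
    #   always   = treat every write as a sync write
    #   disabled = acknowledge sync writes immediately (risky on power loss)
    zfs set sync=standard tank/vmstore
    zfs get sync tank/vmstore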

Regarding endurance, it entirely depends on what the VMs are going to be doing. If you are running a file server that mostly reads big video files, you’ll need hardly any write endurance. If you are running a database VM it will need more, but it depends on how many writes go to the DB.

As some pointers: I run a Proxmox box that runs Jellyfin and also Nextcloud off of a mid-range consumer SSD without any write-endurance issues. The Nextcloud instance is the most write-heavy, as its database gets lots of little writes for file changes, modification dates, etc.

I run a second Proxmox host on a little fanless PC, using a no-name Chinese mSATA SSD, and that machine hosts an OPNsense router and a Home Assistant VM, as well as a few web server containers. The OPNsense router uses a ramdisk for logging, so it generates very few writes, but the Home Assistant VM is constantly doing small writes as it records things like temperature/humidity measurements to its SQLite database.


This box has been running for a couple of years without issues, and reports 145,000 GB written to the SSD. It shows 22 for the SSD_Life_Left SMART attribute, but I’m not sure whether that means 22% used or 22% left; I’m guessing 22% used.
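
For what it’s worth, this is roughly how I pull the attribute (the device name is a placeholder); on a SATA drive the normalized VALUE column usually counts down from 100, while RAW_VALUE is whatever the vendor decided to store, which is part of why I’m still not sure which way to read it:

    # Show the vendor SMART attributes and pick out the wear/life ones
    sudo smartctl -A /dev/sda | grep -i -E 'life|wear'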

Anyway, sorry about the ramble. It depends how much you expect to write to the disk, but if you are not expecting excessive writes I would say a consumer SSD is fine. Main benefit of enterprise gear is you get super consistent latency, but for my use cases this hasn’t been important.

@jay_tuckey , thanks for your lengthy comments. This is super-helpful stuff. I think this is another one of those cases where a lot of the documentation and online discussion that gets surfaced by google is old enough that it assumes SSDs-as-primary-storage are not a viable option, either because of price or because the endurance isn’t there to survive the intended use. That’s not really the case with modern NVMe SSDs. I think that’s where I started to get twisted up thinking I needed a slog with NVMe.

It sounds like, in general, sync writes aren’t going to impose a meaningful penalty for home server use cases when running on an NVMe pool, unless you’re doing something that needs every bit of performance you can get. Is that an accurate statement? If so, I really wish that was emphasized more in the most popular ZFS docs. (Then again, that’s more of a problem of Google tending to surface stuff from 2013 when you search for ZFS anything.)

Is it correct to think of the benefit of NVMe/U.2/U.3 storage (and to a lesser extent SATA/SAS SSDs) as being less about raw speed (though that’s important) and more about IOPS, random access, and lower latency?

I really appreciate the examples of the operations the slog is meant to speed up on spinning rust (that don’t really need to be sped up on NVMe), and your experience with write endurance in your own setup. You’re doing a lot of things I’d like to be doing eventually, so I’m a lot more optimistic that I’m actually going to end up with a working setup now and didn’t buy the wrong components (again). :slight_smile:

Since I’m wanting to use the NVMe mirror pool for VM/LXC storage and database storage (and perhaps a few other things that prove to perform poorly on spinning rust, if that’s an issue), from what you wrote I should be just fine with an NVMe mirror, even if it’s constrained to PCIe 3.0x2 speeds.
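
For my own notes, this is roughly the layout I have in mind, with a tuned recordsize for the database dataset; the pool and disk names are placeholders, and I’d want to double-check the tuning choices before committing:

    # Two-way NVMe mirror (disk IDs are placeholders)
    zpool create nvtank mirror \
        /dev/disk/by-id/nvme-example-disk1 \
        /dev/disk/by-id/nvme-example-disk2

    # One dataset for VM/LXC disks, one tuned for MariaDB
    # (16K recordsize to match InnoDB's default page size)
    zfs create nvtank/vmstore
    zfs create -o recordsize=16k nvtank/mariadb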

(I’m going to attempt to run a Minecraft server off my 4x HDD mirror (8 disks) just to see what happens. I can always move the storage to the NVME later.)

This box has been running for a couple of years without issues, and reports 145,000 GB written to the SSD. It shows 22 for the SSD_Life_Left SMART attribute, but I’m not sure whether that means 22% used or 22% left; I’m guessing 22% used.

Once I started messing with NVMe in Linux, I had to actually start learning how NVMe worked. One of the most aggravating things about that whole process was realizing how non-standard the SMART data can be across brands. HDDs aren’t horrible, but they can still be odd, though I think that’s a function of not needing to deep-dive into the attributes that much. The data I need to figure out whether there’s something wrong with an HDD is pretty standardized at this point.

But endurance is really important for solid-state storage, so it’s not great that every manufacturer seems to express it differently. I usually end up spending a lot of time hunting through documentation to try to figure out what their attributes mean. But I’m always happy to at least end up with drives that expose endurance data at all. I’ve run into a few that smartctl couldn’t pull anything meaningful off of, and I had to boot into Windows to use the manufacturer’s utility to check. Those were infuriating.
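
When smartctl does cooperate, the nvme-cli health log is usually my first stop, since fields like percentage_used and data_units_written come from the NVMe spec itself rather than from vendor-specific attributes (the device name is a placeholder):

    # Standardized NVMe health log fields, same meaning across vendors
    sudo nvme smart-log /dev/nvme0 | grep -E 'percentage_used|data_units_written|media_errors'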

Maybe you could try the manufacturer’s utility with the one you’re not sure about? That 22 percent number would have me wary.

Main benefit of enterprise gear is you get super consistent latency, but for my use cases this hasn’t been important.

Thanks for pointing that out. That’s not worth the price premium, IMHO. I’m pretty confident I can get good deals on SATA/SAS enterprise SSDs of ~2 TB or less at this point, but used enterprise NVMe aren’t there yet, and the bigger-capacity stuff on the used market is all U.2 and PCIe add-in-card form factors.

Yep, I find the same thing, but now all-SSD storage is just starting to become viable in cost.

In my experience, all-SSD storage setups have been plenty fast for sync writes in a home use case. Even old SATA SSDs have been plenty fast enough.

Yep, correct. Raw speed is also much higher on the PCIe-based SSDs, but if you are going over a network you will hit the network’s throughput limit before maxing out the SSDs’ performance. However, you will get the benefits of the higher IOPS and faster random access even over the network.
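
A quick way to see that effect is to run small random reads once against the local dataset and once against the NFS mount and compare IOPS and latency; something like this (the path is a placeholder):

    # 4k random reads at a moderate queue depth
    fio --name=randread --directory=/mnt/nvme-mirror/test --rw=randread \
        --bs=4k --size=2G --iodepth=16 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based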

Yeah, this can be super annoying, where every brand has different attributes, and even within a brand sometimes the attributes mean different things. It makes it very difficult to know if you are reading an attribute correctly.

This attribute is on my Chinese no-name SSD, so I can’t find any tooling to tell me more info. I have regular backups of the data going off to another machine, though, so I’m not too concerned about it. If it fails it won’t take long to rebuild that machine.

Yeah, I could see the latency guarantees being important if you were, say, running a 100-user Nextcloud server for an office business. With 100 users you’ll be getting lots of continuous small writes to the database and storage, as users update documents and the like. At the same time, you want consistent latency so when a user is browsing the web interface of Nextcloud it’s snappy and responsive, and doesn’t randomly hang occasionally.

I agree with the other commenter; I don’t think I’d bother putting a SLOG drive on a flash pool for general-purpose use. The exception would be something latency-sensitive enough to benefit from shaving off microseconds, in which case I’d use a small Optane drive in that role.

Personally, I’m using a SLOG with spinners in my general-purpose system. Invariably people will say ‘you don’t need a SLOG unless you have something that does sync I/O.’
My Brothers in Christ, Windows will push syncs to storage during everyday operations, working with files and whatnot. Just the other day I discovered that the built-in ‘Windows 7 legacy’ backup app fires syncs at storage like a PEZ candy dispenser. I know this because I’m a huge nerd running continuous iostat: when writes go to my SLOG, I see it…
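
If you want to watch for the same thing, ZFS shows the log vdev as its own line, so sync writes landing on the SLOG are easy to spot; roughly (the pool name is a placeholder):

    # Per-vdev I/O stats refreshed every second; the log device gets its own row
    zpool iostat -v tank 1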

Edit: forgot to mention the LACP thing and 10 Gb networking. AFAIK the best options for leveraging multiple paths to storage are either SMB Multichannel (something I have no experience with) or multipathing at the iSCSI level. I was dismayed to recently discover that the Windows client iSCSI initiator doesn’t support multipathing or multiple connected sessions (MCS). The buttons are there, but they don’t work. MS was too lazy to rewrite the UI.

Apparently there’s a way to hack a couple of Windows Server DLLs into a Windows client to restore iSCSI multipathing but I’ve yet to find directions for it.
