Mix of NVME LBA Sizes in a Single Mirror Pool (ashift=12)

SinisterPisces · March 1, 2025, 5:06pm

Hello,

I’ve got an HP Pro NVME and a Crucial NVME, both 4 TB. The reason I mixed brands is I was looking for great deals on eBay on new 4 TB NVME, which are not cheap. I did find fell-off-the-back-of-the-truck prices, so, excellent.

The Crucial supports 4K sector size (LBA), which can be selected via the nvme CLI tools or smartctl. The HP only has 512K as an option.

Normally, I select the largest/most performant LBA for an NVME before I put it in a ZFS pool with ashift=12, but in this case that’d mean mixing a 4K device with a 512K device. Is there a reason not to do this? My instinct is to do it, because I want both devices at their most performant standing alone before I squish them together into a vdev.

mercenary_sysadmin · March 1, 2025, 5:44pm

If you can put them both into 512B mode, that’s better than mixing them. If you can’t do that, then sure, mix them as is with ashift=12.

SinisterPisces · March 1, 2025, 6:34pm

Thanks! They both support 512 bite mode, so I’ll go with that.

Is using 512 instead of 4096 bytes likely to be noticeable in the real world at all?

mgerdts · March 1, 2025, 8:28pm

By using ashift=12 you should never be reading or writing smaller than 4k. It won’t matter if one is presented as 512n and the other as 4kn.

If you care about space efficiency, particularly with compression or raidz, ashift=9 with both drives formatted for 512n will be better. You are unlikely to see a performance difference between 512n and 4kn.

If you do a lot of small random writes (on the order of .2 drive writes per day on a client drive) you may want to stick to 4k to minimize write amplification.

SinisterPisces · March 1, 2025, 9:15pm

These are 4TB NVME, so 0.2 DWPD would be … 800 GB per day? I’ll keep an eye on it, but I’m not close to that interesting yet. I’m going to put a database server on it, but my use case is self-hosted things. I think I should be fine setting them both to 512K.

I’m putting them in a mirror and was going to use the default compression settings. Mostly because I don’t really know enough to mess with the default compression settings. Should I use ashift=9 in that case?

Honestly, part of my learning strategy at this point is to change as few defaults as possible so I can see why things go wrong. I fall in the over-optimization-before-use trap too easily.

My use cases involve iSCSI disk storage for things like a Windows 11 game drive, SQL database storage (web server, web apps, Minecraft server, etc.), TrueNAS’ system dataset, etc.

For large multi-GB/TB downloads and mass storage, I was going to download to an HDD pool, and probably store there as well. But for DB and iSCSI and anything else where HDD I/O latency would be noticeable, I was going to use the NVME.

I would’ve gotten a pair of 2 TB NVME, but I found these on clearance sale at the price of 2 TB disks, and figured more space meant higher endurance.

mercenary_sysadmin · March 2, 2025, 3:41am

In the real world, most NVMe drives that can support 512B sectors perform best with ashift=9.

This surprised me at first, but it’s been pretty much universal with every NVMe M.2 drive I’ve had on the bench, to the point that I don’t really question it much anymore.

If I had to guess, I’d guess that 512B works better on NVMe drives than on SATA drives, because NVMe supports orders of magnitude more command queues than SATA does and more sectors gives the onboard controller on the drive more chances to optimize for parallelism on the bare media.

But idk the why for sure, all I know for sure is the result.

SinisterPisces · March 2, 2025, 6:05am

That is good to know and … honestly sort of aggravating, from the POV of trying to teach myself ZFS. Inexplicable results aren’t fun.

So, since both of these NVME in 512B mode according to smartctl, I should leave them as-is and figure out how to build an ashift=9 pool in TrueNAS? (It’s supposed to auto-detect…)

I’ve internalized at this point from Ye Olde ZFS Documentation Scrolls that SSDs should use ashift=12 and 4K LBAs when possible, because otherwise I’d risk damaging/shortening the life of my SSDs due to write application (?). All the guides I’ve read are very uncompromising in recommending ashift=12.

It seems like that doesn’t apply with NVME?

It’s late and I hope this doesn’t sound like I’m trying to argue; I’m definitely not. But now I’m very confused about the whole ashift=12 recommendation that gets repeated everywhere.

Proxmox, for instance, defaults to ashift=12 when installing the OS on a ZFS pool on NVME and SATA/SAS SSDs. It doesn’t even ask. (You can override it.)

(Aside: My main storage array is a 4-way mirror (8x14TB spinning disks). All the disks are in 4K mode, and the pool reports as ashift=12, as expected. From what you said above, I think this is still the correct thing to be doing for 4K LBA HDDs?)

mercenary_sysadmin · March 2, 2025, 5:46pm

It doesn’t, but don’t get too worked up about this in either direction. Ashift=12 still works just fine on NVMe M.2 SSDs–it just isn’t necessarily the absolutely best performing configuration.

If you’re tuning a pool to the absolute maximum to get the most consistent low latency possible for a heavily used production database, the difference might matter. For a generic server, it really won’t.

SinisterPisces · March 3, 2025, 2:39am

Thanks for the explanation, and for lasso’ing me away from the over-optimization pit again.

Right now, both NVME are in 512B mode since they both support that, so I’ll just set them up as a mirror VDEV in a single mirror pool and let TrueNAS do what it wants with ashift.

I was much more worried about prematurely killing my drives than anything else, and from what you said that’s not a concern at all.

I’m actually not worried about performance tuning at all on this pool. These are PCIe 4.0x4 NVME, but only one slot is 4.0x4. The other one is like … 3.0x2. So, I’ll be getting throttled by the hardware well before ZFS pool optimizations could make a huge difference.

From what you’ve said, I should be fine as-is. I’ll be storing a database mostly for self-hosted web apps, running Minecraft server, and some other stuff, but I’m still a home/small office user. Min-maxing performance isn’t something I’m worried about yet.

SinisterPisces · March 10, 2025, 4:47pm

Update: After all that, I figured I should mention what I actually ended up doing.

I created the NVME pool via the TrueNAS web GUI, which automates (and doesn’t let you customize) the details of the ZFS pool creation.

# zpool get ashift
NAME         PROPERTY  VALUE   SOURCE
QuickDrawer  ashift    12      local
Tank         ashift    12      local
boot-pool    ashift    12      local

(Aside: I like the default color scheme on the pre-formatted text block. Orange on black was always my favorite terminal type. )

I’m not going to try to change it now, mostly because TrueNAS doesn’t want you running manual ZFS create commands and I don’t want to give the iX support forum any reasons not to help if something goes strange.

I ended up with about 3.51 TiB (3.85 TB in HDD marketing speak ) of usable space on the pool, which is more than enough. I need to do some testing, but I’m also pretty confident that the PCIe lane limitations will bottleneck the pool before the ashift=9 vs ashift=12 performance optimizations could have kicked in … not that I probably would have ever noticed anyway outside of benchmarks, from our previous discussion.

@mercenary_sysadmin Do you think it’s worth it to file a bug report/feature request on the fact that TrueNAS created the pool and set the ashift without querying the disks’ LBA size (which were both 512k)? That could be intended behavior on their part: uniform pool creation would, I suppose, be simpler to troubleshoot than optimized pool creation.

mercenary_sysadmin · March 11, 2025, 4:26pm

TrueNAS won’t use an ashift below 12. That’s a design decision on their part, not a bug.