Understanding the Relationship Between zVol Block Size and iSCSI Logical Block Size

Hello,

Finally, after 10,000 years, I’m getting ready to actually try experimenting with zVols and iSCSI shares.

I’m a bit confused about the zVol Block Size and the iSCSI Extent Block Size, and how they fit together.

I’m using TrueNAS for this. The defaults I’m talking about are the TrueNAS defaults. I don’t know if bare bones ZFS on Linux would have different defaults.

Here’s what I think is true:

  1. zVol Block Size in the TrueNAS zVol Creation Widget: This is volblocksize in ZFS.
  2. Extent Block Size in the iSCSI Share Wizard: This is the LBA size/logical sector size that the iSCSI block device will present itself as when an initiator connects to it.

Is that correct?

If so, I don’t quite understand when/why I’d want the zVol and the extent to have different block sizes. Isn’t ZFS at its best when I/O fits the volblocksize (or, I suppose, recordsize when working with datasets) to minimize write amplification?
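
For reference, here’s roughly how I understand those two settings map to the command line. The zfs commands are standard; the midclt call is only my guess at how TrueNAS exposes the extent config, and the pool/zvol names are made up:

```
# 1. the zVol's "Block Size" is the ZFS volblocksize property
zfs get volblocksize tank/testzvol

# 2. the extent's logical block size lives in the iSCSI extent config;
#    I *think* it shows up as "blocksize" via the TrueNAS middleware client
midclt call iscsi.extent.query
```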

I feel pretty confident with 1-3 in the example below, but things start going fuzzy around 4.

Example: An iSCSI block device configured for use as a storage disk in Windows 11.

  1. Windows 11 uses NTFS.
  2. NTFS has a default sector size/LBA of 4K.
  3. So, the extent block size should be 4K to match NTFS’s expectations. In the TrueNAS iSCSI Block Device share wizard, this is the “modern OS” preset’s extent block size.
  4. On a system using mirrors, volblocksize/“Block Size” in the TrueNAS zVol creation window defaults to 16k. I think this is just the default ZFS volblocksize for mirrors, and isn’t anything special TrueNAS is doing?
  5. So … how do I reconcile those?
    5.1. Is there some reason a zVol intended to be storage for an iSCSI extent with block size 4k shouldn’t also have a 4k volblocksize?
    5.2. If there is, there is a hole in my understanding. What am I missing?

Thanks!

NTFS’s default cluster size of 4K is for volumes smaller than 16TB, which may or may not apply here.
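
If you want to check what a given volume is actually using, run this on the Windows guest; it reports both “Bytes Per Sector” (what the virtual disk presents, i.e. the extent’s logical block size) and “Bytes Per Cluster” (the NTFS allocation unit). The drive letter is just an example:

```
fsutil fsinfo ntfsinfo C:
```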

But more importantly, 4KiB clusters eat a LOT of IOPS. You rarely want this in a VM, for the same reason you rarely (read: damn near never) want 4KiB clusters / blocks / volblocks for a VM even when it uses 4KiB native sectors: very few workloads actually fragment data that heavily, not even databases.

Also–and again, much like ext4–data under ntfs is stored primarily in extents, not just clusters. Extents are ranges of contiguous clusters which are read or written in a single IOP. These tend to average closer to 64KiB.

Even Microsoft SQL Server typically defaults to 64KiB extents: Pages and Extents Architecture Guide - SQL Server | Microsoft Learn

This means you generally want your block size, or volblocksize, to roughly match the typical extent size, not the cluster size. So, 64KiB.
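
In zfs terms that’s something like the following (pool and zvol names are placeholders, and note that volblocksize can only be set at creation time, not changed afterward):

```
# sparse 100 GiB zvol with 64 KiB blocks, intended to back the iSCSI extent
zfs create -s -V 100G -o volblocksize=64K tank/win11-disk0
zfs get volblocksize tank/win11-disk0
```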

This does mean that you’ll get a bit of read and write amplification on the occasional very small extent–or on EXTREMELY fragmented ntfs filesystems–which will in turn decrease performance in those cases. It also tells you that you still shouldn’t run virtualized filesystems extremely full, even if the host storage has plenty of room: if you do, the guest will be forced to allocate storage that would normally live in large extents as fragmented individual clusters!

You don’t always get the absolute best performance out of an exact match between guest level extent (or other IOP) size and host level blocksize. But that’s usually a very good starting point, and I typically wouldn’t even recommend bothering trying anything smaller than half the typical extent or IOP size, or larger than double.

Half the typical IOP size will prioritize latency at the expense of IOPS and throughput. Double the typical IOP size will prioritize throughput and IOPS efficiency at the expense of small operation latency. Pick your poison, and, hopefully… employ a royal taster before committing to a great big gulp in production. :cowboy_hat_face:
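
If you do want to taste-test before committing, a quick-and-dirty host-side comparison looks roughly like this on SCALE/Linux with fio installed (names are placeholders, and running something like diskspd inside the guest over the actual iSCSI path is the more faithful test):

```
# throwaway zvols at two candidate block sizes
zfs create -s -V 20G -o volblocksize=16K tank/taster16k
zfs create -s -V 20G -o volblocksize=64K tank/taster64k

# hammer one with the I/O size you expect from the guest, then repeat
# against the other and compare (use --ioengine=posixaio on CORE)
fio --name=taster --filename=/dev/zvol/tank/taster16k --direct=1 \
    --rw=randwrite --bs=64k --ioengine=libaio --iodepth=8 \
    --runtime=60 --time_based --group_reporting

# clean up afterwards
zfs destroy tank/taster16k
zfs destroy tank/taster64k
```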


(I’m almost certainly never going to have use for a zVol with a size >= 16 TiB … I hope. But that’s good to know.)

Thanks for such a detailed reply. :slight_smile:

I didn’t really understand what an extent was before I started looking into TrueNAS’ iSCSI tools. I’m not thrilled that I once more stared at ZFS for too many seconds and it decided, in response, to reveal more complexity to me than I thought was there already, but a single extent having a 1:1 relation to a single IOP makes sense now that you’ve explained how it works. That’s actually simpler than some of the other stuff I’ve had to learn about low-level storage operations … it just wasn’t clearly documented anywhere at all.

This means you generally want your block size, or volblocksize, to roughly match the typical extent size, not the cluster size. So, 64KiB. …

You don’t always get the absolute best performance out of an exact match between guest level extent (or other IOP) size and host level blocksize. But that’s usually a very good starting point, and I typically wouldn’t even recommend bothering trying anything smaller than half the typical extent or IOP size, or larger than double.

Half the typical IOP size will prioritize latency at the expense of IOPS and throughput. Double the typical IOP size will prioritize throughput and IOPS efficiency at the expense of small operation latency.

What a convenient default setup to remember. :slight_smile:

I’m definitely not deep enough in this to want to attempt to min-max my I/O performance; good general purpose defaults are what I’m after. I’m less interested in best-speed-all-the-time and more interested in avoiding a miserably slow experience brought on by an off-the-wall bad configuration that I didn’t know better than to avoid.

This does mean that you’ll get a bit of read and write amplification on the occasional very small extent–or on EXTREMELY fragmented ntfs filesystems–which will in turn decrease performance in those cases […]

Is manually defragging NTFS (or ext4, I suppose) even a thing we can do anymore? On SSDs, and especially those with garbage collection/wear leveling/discard support, I seem to recall that we’re not supposed to try manually defragging drives here in the future.

If that’s true, I’m guessing the solution to an extremely fragmented NTFS filesystem is to rsync it to a fresh block device, or something similar? Hopefully I won’t run into that.

Regarding TrueNAS in particular, it seems to once again be a bit more complicated than bare metal ZFS.
I’ve been looking at this in the context of an alpha release for a Proxmox plugin that connects to a TrueNAS server to provide ZFS over iSCSI, and opened an issue here re: specifying the extent block size alongside the ZFS block size: Default ISCSI Extent Block Size when Creatiing a TrueNAS ZFS over iSCSI Storage? · Issue #2 · boomshankerx/proxmox-truenas · GitHub

On TrueNAS, we can set the ZFS block size for a zVol to whatever we want when the zVol is created. But per the issue above, when creating an iSCSI share, the TrueNAS GUI limits the available extent block sizes, and the largest is 4K.

I’m not sure yet if the TrueNAS API–which is what’s used to create the extent–allows specifying an extent block size that’s not presented in the GUI. I think it should, but ¯\_(ツ)_/¯. If it doesn’t, I’ll attempt a feature request/bug report.
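
For what it’s worth, my untested guess at the shape of that call is below. The method and field names are only my assumptions about what the middleware expects, the names are placeholders, and whether “blocksize” accepts anything the GUI doesn’t offer is exactly the open question:

```
# entirely hypothetical: 4096 is a value the GUI already offers; the experiment
# would be swapping in something larger (e.g. 65536) and seeing whether the
# middleware's validation accepts it
midclt call iscsi.extent.create '{
  "name": "win11-disk0",
  "type": "DISK",
  "disk": "zvol/tank/win11-disk0",
  "blocksize": 4096
}'
```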

Assuming in the worst case that we can’t use a bigger logical block size than 4K, would we still want to set the volblocksize to 64k, or something else?