Troubleshooting poor performance of zvols over iSCSI

I’m having a weird problem with pool performance. I added another disk to my existing 2-way mirror, expecting zvol read performance to improve. I also removed the L2ARC and SLOG (both M.2 NVMe) for testing purposes, only to find that none of these changes made any difference to zvol performance (reads and writes were unaffected).

I’m using this zvol over iSCSI, and I use targetcli to manage the settings on the iSCSI side. I tried setting sync=disabled and sync=always on the tested zvol, but I observed absolutely no difference in write performance over iSCSI to that zvol. I also tried creating a ramdisk backstore through targetcli, and with that I get the full 10G of network bandwidth on both reads and writes, so the problem is not on the network side. Any ideas where to look next?
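
In case it matters, this is roughly how I toggled sync and set up the ramdisk test; the pool, zvol, and backstore names here are just placeholders, not my real config:

    # toggle sync on the zvol under test
    zfs set sync=disabled tank/iscsivol
    zfs set sync=always tank/iscsivol

    # ramdisk backstore used to rule out the network
    targetcli /backstores/ramdisk create name=rdtest size=4G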

What’s the volblocksize on that zvol, and what workload are you testing it with?

If you’ve got a small volblocksize (which most do), you’re probably hitting IOPS limits. And if you’re testing with a single-process read, you’re probably not able to get much in the way of prefetch to take advantage of additional vdevs (you would see the performance increase with higher parallelism in the storage workload).
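
If you want to double-check what you’ve got, zfs get will show it; the dataset name below is just an example:

    # show the block size and sync setting of the zvol
    zfs get volblocksize,sync tank/iscsivol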

Yes, I was using a single zvol over iSCSI for these tests. I did notice that most of this problem was related to the volblocksize of the zvol, which I subsequently increased to 32K (that meant recreating the zvol, since volblocksize is set at creation), and I formatted it with NTFS using a matching 32K allocation unit size. After these changes the performance almost doubled, and I started to see benefits from the SLOG.
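
Roughly what I ran, with placeholder names and sizes; the old data then got copied over and the iSCSI LUN re-pointed at the new volume:

    # create a replacement zvol with 32K blocks (volblocksize is fixed at creation)
    zfs create -V 500G -o volblocksize=32k tank/iscsivol32k
    # point a LIO block backstore at the new zvol device
    targetcli /backstores/block create name=iscsivol32k dev=/dev/zvol/tank/iscsivol32k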

P.S. Can you tell me whether it’s possible to disable the iSCSI LIO target’s write-through caching on the zvol backstore, so that the system relies on the ZFS ARC alone? I’m using targetcli for the configuration (as probably most do).
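
For reference, I’ve only gotten as far as listing the backstore attributes like this (backstore name is a placeholder), and I’m not sure which of them, if any, actually controls this:

    # list the LIO attributes on the block backstore
    targetcli /backstores/block/iscsivol32k get attribute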

Sorry; I don’t have extensive iSCSI experience, so I can’t help way down in the weeds on that side of things.

Increasing volblocksize to 32K was a good move, and would have been even without the NTFS tuning. How are you testing performance after your changes?

I’m running read and write tests with CrystalDiskMark on a physical Windows box with a 10G NIC. The current setup is meant to serve as a generic disk server and to run light virtualization workloads, including a Ubiquiti device controller in a container, a pfSense firewall, and so on. The virtualization workloads run from a separate 2TB 2-way NVMe mirror pool. The CPU is a 32-core EPYC 7551P @ 2GHz, and the machine has 128GB of RAM, of which I’ve allocated 110GB to the ZFS ARC. ARC hit rates are consistently in the high 80s, and I’m now relatively happy with the performance (at least for a single-host scenario).

I’m probably moving the iSCSI storage workload to a dedicated storage server soon(ish), with a CPU with better single-core performance, possibly 256GB of RAM, and an Intel P3700 SLOG. For the storage itself I’m planning on possibly two 3-way mirrors of 8TB SAS disks. I’m also using a Mellanox 40Gb NIC on the server side, both now and going forward, as it shows excellent performance and compatibility. The actual use for this storage includes larger VM disks and some virtual disks for physical Windows hosts, for mixed use.
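
For the ARC numbers I’m just watching arcstat and arc_summary, something like the below; the interval and the ARC cap value are examples (the latter being 110GiB expressed in bytes):

    # sample ARC hit rate every 5 seconds
    arcstat 5
    # one-shot summary report
    arc_summary | head -40
    # ARC size cap via module parameter (Linux)
    echo 118111600640 > /sys/module/zfs/parameters/zfs_arc_max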

So, IIRC CrystalDiskMark does a lot of single-process tests. You’re not going to see those improve much, if any, from adding or expanding vdevs. You see big improvements from topology changes with multi-process workloads, for the most part.

If you pump iodepth up high enough, you’ll start to see improvement on single-process workloads as well, but it comes at a cost: high iodepth increases latency as well as throughput.

You don’t generally get to control the iodepth of your real workloads, mind you; that’s usually set by the developers of the software you’re using, and it’s not often something you can easily monkey with yourself. It’s easy enough to manipulate in your test workloads if you’re using a proper storage benchmark like fio, though, in which case you’d generally want to set iodepth to a figure that roughly matches the way your normal workload behaves.
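
As a sketch, a fio run along these lines exercises a 32K random-read workload at a moderate iodepth; the filename, depth, and job count are placeholders you’d tune to resemble your real workload, and you could point it at the zvol device locally or at the iSCSI LUN from the initiator side:

    fio --name=ztest --filename=/dev/zvol/tank/iscsivol --ioengine=libaio \
        --direct=1 --rw=randread --bs=32k --iodepth=16 --numjobs=4 \
        --group_reporting --runtime=60 --time_based

Bumping numjobs is how you’d add the multi-process parallelism discussed above, while iodepth controls how many I/Os each job keeps in flight at once.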