I’m having a weird problem with pool performance. One day I decided to add another disk to my existing 2-way mirror and expected zvol read performance to improve. I also removed the L2ARC and SLOG (both m.2 nvme) for testing purposes, only to find that none of these changes made any difference to zvol performance (reads and writes were unaffected). I’m using this zvol over iSCSI and I manage the settings on the iSCSI side with targetcli. I tried setting sync=disabled and sync=always on the tested zvol only, but observed absolutely no difference in write performance over iSCSI to that zvol. I also tried creating a ramdisk backstore through targetcli, and using that I get full 10g network bandwidth on reads and writes, so the problem is not on the network side. Any ideas where to look next?
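For reference, the sync toggling and the ramdisk comparison looked roughly like this (pool/zvol names here are placeholders for my actual ones, and the LUN export steps are omitted):

# toggle sync on the zvol under test, re-running the iSCSI write test after each change
zfs set sync=disabled tank/iscsi-vol
zfs set sync=always tank/iscsi-vol

# ramdisk backstore in targetcli, used to rule out the network path
targetcli /backstores/ramdisk create name=rd0 size=8G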
What’s the volblocksize on that zvol, and what workload are you testing it with?
If you’ve got a small volblocksize (which most do), you’re probably hitting IOPS limits. And if you’re testing with a single-process read, you’re probably not able to get much in the way of prefetch to take advantage of additional vdevs (you would see the performance increase with higher parallelism in the storage workload).
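A quick way to check the volblocksize from the host, and to watch how the I/O spreads across vdevs while a test runs (pool/dataset names are just examples):

zfs get volblocksize tank/iscsi-vol    # volblocksize is fixed at zvol creation time
zpool iostat -v tank 5                 # per-vdev ops and bandwidth, refreshed every 5 seconds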
Yes, I was using a single zvol through iSCSI for these tests. I did notice that most of this problem was related to the volblocksize of the zvol, which I subsequently increased to 32k, and I formatted the zvol with NTFS using an allocation unit size of 32k to match. After these changes the performance almost doubled and I started to see benefits from the SLOG.
P.S. Do you know whether it is possible to disable the iSCSI LIO target write-through cache on the zvol backstore, so that the system would use just the ZFS ARC? I’m using targetcli for the configuration (as probably most do).
Sorry; I don’t have extensive iSCSI experience, so I can’t help way down in the weeds on that side of things.
Increasing volblocksize to 32K was a good move, and would have been even without the NTFS tuning. How are you testing performance after your changes?
I’m running read and write tests using CrystalDiskMark on a Windows box (a physical box with a 10g nic). The current setup is meant to serve as a generic disk server and to run light virtualization workloads, including a Ubiquiti device controller as a container, a pfSense firewall, etc. The virtualization workloads run from a separate 2tb 2-way nvme mirror pool. The CPU is a 32-core EPYC 7551P @ 2GHz. It currently has 128gb of RAM, of which I have allocated 110gb to the ZFS ARC. ARC hit rates are consistently in the high 80s and I’m now relatively happy with the performance (at least for a single-host scenario). I’m possibly moving the iSCSI storage workload to a dedicated storage server soon(ish), with a CPU with better single-core performance, possibly 256gb of RAM, and an Intel P3700 SLOG. For the storage itself I’m planning to run possibly two 3-way mirrors of 8tb SAS disks. I’m also using a Mellanox 40gb nic on the server side, currently and in the future, as it shows excellent performance and compatibility. The actual use for this storage includes running bigger vm disks and some virtual disks for physical Windows hosts for mixed use.
So, IIRC CrystalDiskMark does a lot of single-process tests. You’re not going to see those improve much, if any, from adding or expanding vdevs. You see big improvements from topology changes with multi-process workloads, for the most part.
If you pump iodepth up high enough, you’ll start to see improvement on single-process workloads as well - but it comes at a cost; high iodepth increases latency as well as throughput.

You don’t generally get to control the iodepth of your real workloads, mind you; that’s generally something set by the developers of the software you’re using and not often something you can monkey with easily yourself. It’s easy enough to manipulate for your test workloads if you’re using a proper storage benchmark like fio, though, in which case you’d generally want to set iodepth to a figure that roughly matches the way your normal workload behaves.
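For example, a rough fio invocation against a zvol, where you can vary iodepth (and numjobs) to see the effect - the device path, block size, and runtime here are just placeholders:

fio --name=qd-test --filename=/dev/zvol/tank/iscsi-vol \
    --ioengine=libaio --direct=1 --rw=randread --bs=32k \
    --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting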
Recently I’ve been testing zvol performance on a zfs SSD raid0 stripe.
- I compared performance of qcow2 / zvol & lvm + the various volblocksize options on the zvols: 16K / 32K / 64K / 128K
- I also tested virtio-blk versus virtio-scsi
- The zvol performance is so good I’m going to stop passing through my 2nd nvme to Windows & instead put the OS on a zvol on the nvme
- the configuration below showed basically bare metal read performance in the builtin winsat tool:
> Disk Random 16.0 Read 1155.78 MB/s 8.9
> Disk Sequential 64.0 Read 8193.54 MB/s 9.9
> Disk Sequential 64.0 Write 5185.20 MB/s 9.7
I am more interested in speed than redundancy. The 2 x SSDs are a striped pool for an SSD Steam Library, but are also partitioned to provide special devices for my main mirrored pool on spinning SATA + a new SATA stripe for Windows / Steam.

- I created the striped (raid0) SSD pool with:
zpool create -f -o ashift=12 -m /mnt/ssd1 ssd1 \
ata-INTEL_SSDSC2KG019T8_PHYG9170009C1P9DGN-part4 \
ata-SAMSUNG_MZ7KH1T9HAJR-00005_S47PNE0M508088-part4
- I created a -s (sparse) zvol with a 64K volblocksize:
zfs create -o volblocksize=64K -s -V 3.16TB ssd1/windows
On the zvol I created:

- a 16MB partition of type Microsoft reserved (type 10 in fdisk)
- the balance of the space as a partition of type Microsoft basic data (type 11 in fdisk) - an sgdisk equivalent is sketched below
- inside my Windows vm, in Disk Management, I formatted the zvol (which I think uses a 64K cluster size by default)
- tested performance with the built-in Windows tool winsat
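Something like the following sgdisk one-liner should give the same two-partition layout from the host (device path as above; sgdisk type codes: 0c01 = Microsoft reserved, 0700 = Microsoft basic data):

sgdisk -n 1:0:+16M -t 1:0c01 -n 2:0:0 -t 2:0700 /dev/zvol/ssd1/windows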
First round of testing:

- SSD => qcow2 / zvol / lvm
- existing SCSI whole-device passthrough of a Toshiba Enterprise SATA as a comparison (which is in the process of being migrated to a zvol on a SATA stripe, with a striped special device on the SSDs - I have approx 9.9 petabytes of writes left on the SSDs)
2 x 1.92TB Enterprise SSD tests
===============================
QCOW2:
========================
Z:\>winsat disk -drive Q
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive Q -ran -read'
> Run Time 00:00:00.41
> Running: Storage Assessment '-drive Q -seq -read'
> Run Time 00:00:01.16
> Running: Storage Assessment '-drive Q -seq -write'
> Run Time 00:00:00.73
> Running: Storage Assessment '-drive Q -flush -seq'
> Run Time 00:00:00.45
> Running: Storage Assessment '-drive Q -flush -ran'
> Run Time 00:00:00.42
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 59.18 MB/s 6.7
> Disk Sequential 64.0 Read 3156.85 MB/s 9.3
> Disk Sequential 64.0 Write 2859.21 MB/s 9.2
> Average Read Time with Sequential Writes 0.097 ms 8.8
> Latency: 95th Percentile 0.172 ms 8.9
> Latency: Maximum 2.063 ms 8.8
> Average Read Time with Random Writes 0.099 ms 8.9
> Total Run Time 00:00:03.33
-------------------------------------------------------------------------------------------
LVM STRIPE
========================
Z:\>winsat disk -drive R
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive R -ran -read'
> Run Time 00:00:00.13
> Running: Storage Assessment '-drive R -seq -read'
> Run Time 00:00:01.47
> Running: Storage Assessment '-drive R -seq -write'
> Run Time 00:00:01.33
> Running: Storage Assessment '-drive R -flush -seq'
> Run Time 00:00:00.66
> Running: Storage Assessment '-drive R -flush -ran'
> Run Time 00:00:00.66
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 433.61 MB/s 8.2
> Disk Sequential 64.0 Read 757.50 MB/s 8.3
> Disk Sequential 64.0 Write 672.60 MB/s 8.2
> Average Read Time with Sequential Writes 0.222 ms 8.6
> Latency: 95th Percentile 0.416 ms 8.7
> Latency: Maximum 3.967 ms 8.6
> Average Read Time with Random Writes 0.242 ms 8.8
> Total Run Time 00:00:04.34
ZVOL:
=========================
Z:\>winsat disk -drive V
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive V -ran -read'
> Run Time 00:00:00.11
> Running: Storage Assessment '-drive V -seq -read'
> Run Time 00:00:01.19
> Running: Storage Assessment '-drive V -seq -write'
> Run Time 00:00:00.75
> Running: Storage Assessment '-drive V -flush -seq'
> Run Time 00:00:00.70
> Running: Storage Assessment '-drive V -flush -ran'
> Run Time 00:00:00.47
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 844.00 MB/s 8.6
> Disk Sequential 64.0 Read 2533.42 MB/s 9.1
> Disk Sequential 64.0 Write 2757.42 MB/s 9.2
> Average Read Time with Sequential Writes 0.097 ms 8.8
> Latency: 95th Percentile 0.133 ms 8.9
> Latency: Maximum 0.421 ms 8.9
> Average Read Time with Random Writes 0.097 ms 8.9
> Total Run Time 00:00:03.31
SATA HARD DRIVE PASSTHROUGH
===========================
Z:\>winsat disk -drive E
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive E -ran -read'
> Run Time 00:00:07.44
> Running: Storage Assessment '-drive E -seq -read'
> Run Time 00:00:03.25
> Running: Storage Assessment '-drive E -seq -write'
> Run Time 00:00:03.92
> Running: Storage Assessment '-drive E -flush -seq'
> Run Time 00:00:06.39
> Running: Storage Assessment '-drive E -flush -ran'
> Run Time 00:00:07.41
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 2.13 MB/s 4.3
> Disk Sequential 64.0 Read 136.18 MB/s 7.0
> Disk Sequential 64.0 Write 155.47 MB/s 7.1
> Average Read Time with Sequential Writes 6.563 ms 5.5
> Latency: 95th Percentile 19.058 ms 4.7
> Latency: Maximum 90.300 ms 7.7
> Average Read Time with Random Writes 7.283 ms 5.2
> Total Run Time 00:00:28.52
With zvol the clear performance winner, I then experimented with various volblocksizes (16K / 32K / 64K / 128K):

- TLDR: 64K block sizes won, as I let Windows format the drive with the default NTFS cluster size of 64K
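The test volumes for this round were created along these lines (names and size here are illustrative; volblocksize is fixed at creation time, so each candidate gets its own zvol):

for bs in 16K 32K 64K 128K; do
    zfs create -o volblocksize=$bs -V 100G ssd1/test-$bs
done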
16K BLOCKSIZE
=============
PS C:\WINDOWS\system32> winsat disk -drive G
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive G -ran -read'
> Run Time 00:00:00.30
> Running: Storage Assessment '-drive G -seq -read'
> Run Time 00:00:08.41
> Running: Storage Assessment '-drive G -seq -write'
> Run Time 00:00:33.24
> Running: Storage Assessment '-drive G -flush -seq'
> Run Time 00:00:00.72
> Running: Storage Assessment '-drive G -flush -ran'
> Run Time 00:00:00.45
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 870.43 MB/s 8.7
> Disk Sequential 64.0 Read 4390.17 MB/s 9.5
> Disk Sequential 64.0 Write 2799.89 MB/s 9.2
> Average Read Time with Sequential Writes 0.084 ms 8.8
> Latency: 95th Percentile 0.123 ms 8.9
> Latency: Maximum 0.561 ms 8.9
> Average Read Time with Random Writes 0.090 ms 8.9
> Total Run Time 00:00:43.22
32K BLOCKSIZE
=============
PS C:\WINDOWS\system32> winsat disk -drive H
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive H -ran -read'
> Run Time 00:00:00.11
> Running: Storage Assessment '-drive H -seq -read'
> Run Time 00:00:01.16
> Running: Storage Assessment '-drive H -seq -write'
> Run Time 00:00:00.76
> Running: Storage Assessment '-drive H -flush -seq'
> Run Time 00:00:00.42
> Running: Storage Assessment '-drive H -flush -ran'
> Run Time 00:00:00.41
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 917.66 MB/s 8.7
> Disk Sequential 64.0 Read 4330.72 MB/s 9.5
> Disk Sequential 64.0 Write 2933.63 MB/s 9.2
> Average Read Time with Sequential Writes 0.084 ms 8.8
> Latency: 95th Percentile 0.144 ms 8.9
> Latency: Maximum 0.278 ms 8.9
> Average Read Time with Random Writes 0.087 ms 8.9
> Total Run Time 00:00:02.97
64K BLOCKSIZE
=============
PS C:\WINDOWS\system32> winsat disk -drive I
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive I -ran -read'
> Run Time 00:00:00.11
> Running: Storage Assessment '-drive I -seq -read'
> Run Time 00:00:01.14
> Running: Storage Assessment '-drive I -seq -write'
> Run Time 00:00:00.75
> Running: Storage Assessment '-drive I -flush -seq'
> Run Time 00:00:00.45
> Running: Storage Assessment '-drive I -flush -ran'
> Run Time 00:00:00.42
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 918.52 MB/s 8.7
> Disk Sequential 64.0 Read 4442.18 MB/s 9.5
> Disk Sequential 64.0 Write 2990.89 MB/s 9.2
> Average Read Time with Sequential Writes 0.086 ms 8.8
> Latency: 95th Percentile 0.115 ms 8.9
> Latency: Maximum 0.375 ms 8.9
> Average Read Time with Random Writes 0.081 ms 8.9
> Total Run Time 00:00:03.00
128K BLOCKSIZE
==============
PS C:\WINDOWS\system32> winsat disk -drive J
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive J -ran -read'
> Run Time 00:00:00.11
> Running: Storage Assessment '-drive J -seq -read'
> Run Time 00:00:01.22
> Running: Storage Assessment '-drive J -seq -write'
> Run Time 00:00:00.74
> Running: Storage Assessment '-drive J -flush -seq'
> Run Time 00:00:00.76
> Running: Storage Assessment '-drive J -flush -ran'
> Run Time 00:00:00.47
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 873.73 MB/s 8.7
> Disk Sequential 64.0 Read 2704.19 MB/s 9.2
> Disk Sequential 64.0 Write 2791.04 MB/s 9.2
> Average Read Time with Sequential Writes 0.099 ms 8.8
> Latency: 95th Percentile 0.137 ms 8.9
> Latency: Maximum 0.605 ms 8.9
> Average Read Time with Random Writes 0.096 ms 8.9
> Total Run Time 00:00:03.41
As a final test:

- compare zvol performance on virtio-blk versus virtio-scsi with a 64K volblocksize (NB: in my case the underlying NTFS filesystem is using the default 64K cluster size)
- TLDR: virtio-blk is still faster, by about 9-10%:
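For switching the test zvol between the two buses, something like virsh attach-disk can be used (domain and target names are placeholders, and the scsi variant assumes a virtio-scsi controller is already defined in the domain):

# attach the zvol as virtio-blk
virsh attach-disk win10 /dev/zvol/ssd1/test-64K vdb --sourcetype block --targetbus virtio --persistent

# detach, then re-attach the same zvol through the virtio-scsi controller
virsh detach-disk win10 vdb --persistent
virsh attach-disk win10 /dev/zvol/ssd1/test-64K sdb --sourcetype block --targetbus scsi --persistent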
VIRTIO BLK
==========
PS C:\WINDOWS\system32> winsat disk -drive G
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive G -ran -read'
> Run Time 00:00:00.26
> Running: Storage Assessment '-drive G -seq -read'
> Run Time 00:00:01.11
> Running: Storage Assessment '-drive G -seq -write'
> Run Time 00:00:00.66
> Running: Storage Assessment '-drive G -flush -seq'
> Run Time 00:00:00.28
> Running: Storage Assessment '-drive G -flush -ran'
> Run Time 00:00:00.28
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 1155.78 MB/s 8.9
> Disk Sequential 64.0 Read 8193.54 MB/s 9.9
> Disk Sequential 64.0 Write 5185.20 MB/s 9.7
> Average Read Time with Sequential Writes 0.048 ms 8.9
> Latency: 95th Percentile 0.116 ms 8.9
> Latency: Maximum 0.324 ms 8.9
> Average Read Time with Random Writes 0.046 ms 8.9
> Total Run Time 00:00:02.75
VIRTIO SCSI
===========
PS C:\WINDOWS\system32> winsat disk -drive H
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-drive H -ran -read'
> Run Time 00:00:00.11
> Running: Storage Assessment '-drive H -seq -read'
> Run Time 00:00:01.11
> Running: Storage Assessment '-drive H -seq -write'
> Run Time 00:00:00.74
> Running: Storage Assessment '-drive H -flush -seq'
> Run Time 00:00:00.41
> Running: Storage Assessment '-drive H -flush -ran'
> Run Time 00:00:00.39
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 1048.02 MB/s 8.9
> Disk Sequential 64.0 Read 4305.40 MB/s 9.5
> Disk Sequential 64.0 Write 2939.35 MB/s 9.2
> Average Read Time with Sequential Writes 0.078 ms 8.8
> Latency: 95th Percentile 0.132 ms 8.9
> Latency: Maximum 1.088 ms 8.9
> Average Read Time with Random Writes 0.083 ms 8.9
> Total Run Time 00:00:02.86
- Hopefully these results are useful to others; for details of the special devices see Level1
- NB: special devices cannot be removed from a RAIDZ pool - yet another reason to always use mirrored vdevs
- During my research I saw on the TrueNAS forum that an SLOG helps with iscsi performance (have hit my link limit here)
- I’m also going to test / move my SLOG from ssd => zram (my system runs on a UPS)
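The zram SLOG experiment would look roughly like this (size and pool name are placeholders - zram is volatile memory, so this only makes sense with a UPS and an acceptance of the risk):

modprobe zram
zramctl --find --size 5G        # allocates and prints the device, e.g. /dev/zram0
zpool add tank log /dev/zram0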
I love this kind of stuff. It’s exactly what I would be doing if only I had more motivation.
Unfortunately I could follow only about 10% because I can’t deduce your setup.
What’s running on the bare metal?
Windows is a VM on this metal?
Windows had an nvme device passed through, but now it connects to storage via ??? A passed-through KVM block device? Is this block device a zvol?
LVM is sitting underneath your ZFS? You tried a QCOW2 on this ZFS filesystem with LVM underneath it?
How is the nvme device wired-up now?
The longer I look at your post the more I don’t understand.
For about 4 years I’ve been running a Windows gaming vm under Arch Linux with a pci passed-through nvme + 3.5tb of a 4tb Toshiba Enterprise SATA whole disk passed through via virtio-scsi with 6 (now 8) queues configured. On the Hitman 3 Dubai benchmark I would see 200fps with an RTX 2070 Super & no stutters (since I upgraded from a Ryzen 5900x => Ryzen 5950x).
I’m running out of room on the 3.5tb SATA, so I bought 2 x 1.92tb SSDs (intel d3 s4610 / samsung sm883) + another 4tb Toshiba SATA - the plan was a 3tb SSD stripe + 7tb SATA stripe (with 1tb of SATA left for Linux) - & using 2 x 100g partitions from each SSD as a special device (not configured yet) for my main SATA mirror & the new SATA stripe.
I partitioned the SSDs to separately try:

- plain lvm as a partition
- qcow2 on a zfs dataset
- plain zvol (which seemed to be the fastest using the builtin winsat benchmark tool)
- the partitioning for testing:
Partitioning (LVM + ZFS testing):
---------------------------------
SSD = 1920 GB (type 148 = zfs / type 44 = LVM)
----------------------------------------------------
part1 = 20 GB => ZFS SLOG (Raid0)
part2 = 100 GB => ZFS SPECIAL Device ( 3 way mirror)
----------------------------------------------------
part3 = 270 GB => LVM cache (data & metadata Raid0)
part4 = 699 GB => LVM STRIPE (Raid0)
part5 = 699 GB => ZFS STRIPE (Raid0) / ZVOL & dataset
----------------------------------------------------
Last night I started copying my Steam library from SATA => SSD & saw terrible speeds - I suspect from the zvol being sparse? The Hitman 3 benchmark worsened from 200fps => 100fps with stutters.

So I did some more testing below (TLDR: don’t create a sparse zvol with -s if performance is important).
After a bit more testing I’ve got acceptable performance again (200 fps in Hitman 3).
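A quick way to tell whether an existing zvol was created sparse is its refreservation - a thin (-s) zvol shows none:

zfs get volsize,refreservation ssd1/windows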
Steam downloads

- virtio-blk = 60mb avg / 100mb max
- virtio-scsi (with 8 iothread queues) = the same as virtio-blk, except also after a while 125mb avg / 250mb max WRITE (from queues?) / 430mb READ (when Hitman 3 was finishing installing)
ZVOL configuration

- When Windows creates an NTFS filesystem it does not align the disk sectors in the same way Linux does (2048 by default):
fdisk /dev/zvol/ssd1/windows
Disk /dev/zvol/ssd1/windows: 2.98 TiB, 3276544671744 bytes, 6399501312 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 65536 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: gpt
Disk identifier: 95A45633-38E3-4236-A04D-41EE98435A40
Device Start End Sectors Size Type
/dev/zvol/ssd1/windows1 34 32767 32734 16M Microsoft reserved
/dev/zvol/ssd1/windows2 32768 6399498239 6399465472 3T Microsoft basic data
Partition 1 does not start on physical sector boundary.
- I created the zvol as a normal device (not sparse): zfs create -o volblocksize=64K -V 2.98TB ssd1/windows (the default option nowadays in zfs is compression=on, which gives you lz4 compression)
- I partitioned the zvol manually with a single Microsoft basic data partition (i.e. without a Microsoft reserved partition of 16mb, which I don’t need as I’m not creating dynamic volumes in Windows):
$ fdisk /dev/zvol/ssd1/windows
Disk /dev/zvol/ssd1/windows: 2.98 TiB, 3276544671744 bytes, 6399501312 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 65536 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: gpt
Disk identifier: 95A45633-38E3-4236-A04D-41EE98435A40
Device Start End Sectors Size Type
/dev/zvol/ssd1/windows1 2048 6399500287 6399498240 3T Microsoft basic data
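Roughly the same layout can be produced non-interactively with sgdisk (start the partition at sector 2048, type 0700 = Microsoft basic data):

sgdisk -n 1:2048:0 -t 1:0700 /dev/zvol/ssd1/windows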
virtio-scsi configuration

- attached the zvol to virtio-scsi (add a Controller VirtIO SCSI in virt-manager)
- add the disk as type scsi
- set the passed-through zvol as an SSD (rotation_rate='1'):
<target dev='sda' bus='scsi' rotation_rate='1'/>
- configure queues on the virtio-scsi controller:
<controller type="scsi" index="0" model="virtio-scsi">
<driver queues="8" iothread="1"/>
<address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
</controller>
iothread configuration

- my CPU configuration (as it relates to the iothread):
<vcpu placement="static">16</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu="0" cpuset="8"/>
<vcpupin vcpu="1" cpuset="24"/>
<vcpupin vcpu="2" cpuset="9"/>
<vcpupin vcpu="3" cpuset="25"/>
<vcpupin vcpu="4" cpuset="10"/>
<vcpupin vcpu="5" cpuset="26"/>
<vcpupin vcpu="6" cpuset="11"/>
<vcpupin vcpu="7" cpuset="27"/>
<vcpupin vcpu="8" cpuset="12"/>
<vcpupin vcpu="9" cpuset="28"/>
<vcpupin vcpu="10" cpuset="13"/>
<vcpupin vcpu="11" cpuset="29"/>
<vcpupin vcpu="12" cpuset="14"/>
<vcpupin vcpu="13" cpuset="30"/>
<vcpupin vcpu="14" cpuset="15"/>
<vcpupin vcpu="15" cpuset="31"/>
<emulatorpin cpuset="0-3"/>
<iothreadpin iothread="1" cpuset="4,20"/>
<vcpusched vcpus="0" scheduler="rr" priority="1"/>
<vcpusched vcpus="1" scheduler="rr" priority="1"/>
<vcpusched vcpus="2" scheduler="rr" priority="1"/>
<vcpusched vcpus="3" scheduler="rr" priority="1"/>
<vcpusched vcpus="4" scheduler="rr" priority="1"/>
<vcpusched vcpus="5" scheduler="rr" priority="1"/>
<vcpusched vcpus="6" scheduler="rr" priority="1"/>
<vcpusched vcpus="7" scheduler="rr" priority="1"/>
<vcpusched vcpus="8" scheduler="rr" priority="1"/>
<vcpusched vcpus="9" scheduler="rr" priority="1"/>
<vcpusched vcpus="10" scheduler="rr" priority="1"/>
<vcpusched vcpus="11" scheduler="rr" priority="1"/>
<vcpusched vcpus="12" scheduler="rr" priority="1"/>
<vcpusched vcpus="13" scheduler="rr" priority="1"/>
<vcpusched vcpus="14" scheduler="rr" priority="1"/>
<vcpusched vcpus="15" scheduler="rr" priority="1"/>
<iothreadsched iothreads="1" scheduler="fifo" priority="98"/>
</cputune>
- I experimented with a zram device of 5gb as a shared SLOG (I run on a UPS) - pools don’t import automatically after a reboot - so I am creating an SSD stripe for the SLOG (to be used for my main SATA pool & the new SATA stripe)
- performance is now good enough & I’ve had no problems moving everything to the new striped SSD pool:
VIRTIO-SCSI 8 queues 64K volblocksize
=====================================
PS C:\WINDOWS\system32> winsat disk -drive G
Windows System Assessment Tool
....
> Disk Random 16.0 Read 1187.85 MB/s 9.0
> Disk Sequential 64.0 Read 8900.57 MB/s 9.9
> Disk Sequential 64.0 Write 4473.51 MB/s 9.5
> Average Read Time with Sequential Writes 0.048 ms 8.9
> Latency: 95th Percentile 0.061 ms 8.9
> Latency: Maximum 12.453 ms 7.9
> Average Read Time with Random Writes 0.042 ms 8.9
> Total Run Time 00:00:02.69
- TODO:
  - Move the Windows vm from nvme passthrough => zvol. The zvol performance is good enough / I don’t login to bare metal Windows
  - SLOG / special devices on nvme then become possible for the new SSD pool
  - Running Linux as btrfs raid1 / a Windows vm as a mirrored zvol on nvme also becomes possible
In my tests I noticed that if you let Windows create the NTFS partition, the disk sectors are misaligned.

I also used a volblocksize of 64K (to match the default NTFS cluster size) - but I seem to be getting a stable 97% of bare metal speeds.
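From the host, the partition alignment on a zvol can be sanity-checked with parted (device path here is an example):

parted /dev/zvol/ssd1/windows align-check optimal 1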
In situations like this the only option is to remove layers & verify. If you can prove the zvol is getting 97% of bare metal speeds, you know the problem is in a higher layer.
- also interesting - nvme 512 versus 4096 sectors on zfs:
512b NVME block size: ~46k IOPS, ~1700MB/s bandwidth
4k NVME block size: ~75k IOPS, ~1800MB/s bandwidth
- setting 4k sectors seems like a free 5-10% performance improvement & gives better wear leveling (NB: this is a destructive operation & can be changed with nvme-cli from a live usb)
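For anyone wanting to try it, the reformat goes along these lines (destructive - check the supported LBA formats first and pick the 4K index, which is device-specific):

# list supported LBA formats and which one is currently in use
nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"

# reformat the namespace to the 4K LBA format (often index 1 - verify on your drive)
nvme format /dev/nvme0n1 --lbaf=1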
Years ago I had a dtrace script that would sniff out a high number of “torn” reads/writes across cluster boundaries. It made it easy to see what was truly happening across the various layers. I haven’t been able to find it again.
Also, when I was managing NetApps years ago I recall a command-line utility on the filers that would watch a volume for a few seconds then print a simple histogram of I/O sizes and whether or not they were aligned. I suspect something like that would be easy to implement for someone who knows dtrace well.