I know this has been discussed previously here, but I am trying to compare the performance of raw files for VM disks in Proxmox against ZVOLs, which are the default when using a ZFS storage type. I am under the impression from Jim’s Klara article and a recent 2.5 Admins episode that raw files should give better performance than ZVOLs, but I am trying and failing to replicate those testing results.
Here are the details of the test I ran:
PVE 8.4.1
ZFS pool: 4x SSDs in 2 mirrored vdevs
ZVol Setup
Created dataset ssdtank/vm-testing with the relevant properties:
root@proxmox:~# zfs get recordsize,compression,xattr,atime ssdtank/vm-testing
NAME                PROPERTY     VALUE  SOURCE
ssdtank/vm-testing  recordsize   64K    local
ssdtank/vm-testing  compression  on     default
ssdtank/vm-testing  xattr        sa     inherited from ssdtank
ssdtank/vm-testing  atime        off    inherited from ssdtank
Created Proxmox storage:
Storage Type: ZFS
Block Size: 64k
Thin Provision: No
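For reference, the CLI equivalent of the above is roughly this (a sketch, flags from memory; only recordsize is set locally since compression/xattr/atime come from ssdtank, and the storage ID is just what I named it):

zfs create -o recordsize=64K ssdtank/vm-testing
pvesm add zfspool vm-testing --pool ssdtank/vm-testing --blocksize 64k --sparse 0 --content images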
Raw File Setup
Created dataset ssdtank/vm-testing-dir with the exact same dataset properties:
root@proxmox:~# zfs get recordsize,compression,xattr,atime ssdtank/vm-testing-dir
NAME                    PROPERTY     VALUE  SOURCE
ssdtank/vm-testing-dir  recordsize   64K    local
ssdtank/vm-testing-dir  compression  on     default
ssdtank/vm-testing-dir  xattr        sa     inherited from ssdtank
ssdtank/vm-testing-dir  atime        off    inherited from ssdtank
Created Proxmox storage:
Storage Type: dir
Preallocation: Off
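Rough CLI equivalent again (same caveats; the path assumes the dataset is mounted at /ssdtank/vm-testing-dir, and I believe recent pvesm versions accept the preallocation option):

zfs create -o recordsize=64K ssdtank/vm-testing-dir
pvesm add dir vm-testing-dir --path /ssdtank/vm-testing-dir --content images --preallocation off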
VM Setup
I created 2 VMs with the exact same settings, but using the 2 different storage backends:
OS: Debian 13 (Trixie)
Qemu Agent: Yes
Disk Settings: all default, size: 32GB
CPU: 4x cores, type: host
Memory: 2G
All Debian installation defaults, all files in one partition
No desktop environment, just “standard system utilities”
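The equivalent qm create command would be roughly this (a sketch from memory; the VM ID, name, bridge, and ISO path are placeholders, and the second VM just swaps the storage name in the scsi0 line):

qm create 100 --name zvol-test --memory 2048 --cores 4 --cpu host \
  --agent 1 --ostype l26 --scsihw virtio-scsi-single \
  --scsi0 vm-testing:32 --net0 virtio,bridge=vmbr0 \
  --cdrom local:iso/debian-13-netinst.iso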
Benchmark
Since I’m optimizing for a basic “general” workload with these VMs, I used a mixed read/write workload with a 64k block size.
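For reference, the fio job was something along these lines (writing this out from memory, so the exact read/write mix and flags may be slightly off, and the test file path inside the guest is just an example):

fio --name=vm-disk-test --filename=/root/fio-test.file --size=8G \
  --rw=randrw --rwmixread=25 --bs=64k --ioengine=libaio --direct=1 \
  --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting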
Well, I don’t know; you aren’t directly showing what proxmox is actually doing with the VM that you created “directory storage” for.
I’ve been told that it’s not that easy to get proxmox to use raw files in the first place. I suspect you might actually be testing zvol vs zvol there.
Can you directly show me the VM storage back end? You should be able to get a literal directory listing of the dataset that supposed raw file backed VM lives in.
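Something along these lines ought to show it (using a made-up VM ID of 100 and assuming the dataset is mounted at /ssdtank/vm-testing-dir):

qm config 100
ls -lh /ssdtank/vm-testing-dir/images/100/

If it’s really file backed, you should see something like vm-100-disk-0.raw sitting in that images directory rather than a zvol under /dev/zvol.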
That does look correct. The performance advantage of raw vs zvol applies on both Linux and FreeBSD, but I don’t use virtio-scsi, I just use virtio. That might be the difference.
Where in the config file might it say “virtio”? Would you be able to post an example config file? That would be very helpful. I’m having a hard time finding info on QEMU storage drivers.
Whew, okay, I did some more digging and here’s what I found.
virtio-scsi vs virtio-blk
In Proxmox, the default SCSI controller in the UI is “VirtIO SCSI (Single)”, which translates to a KVM argument of -device virtio-scsi-pci,.... Though it is not offered as an option in the UI, another controller is available: if you create a VM but don’t start it, you can edit /etc/pve/qemu-server/ID.conf and change the value of scsihw to virtio-blk. Once the VM is started, the KVM argument will be -device virtio-blk-pci,....
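Since I asked earlier for an example config, here’s roughly what the relevant lines of /etc/pve/qemu-server/100.conf look like before any editing (a sketch; the VM ID, storage name, and disk name are placeholders, and scsihw is the value I changed):

agent: 1
cores: 4
cpu: host
memory: 2048
scsi0: vm-testing:vm-100-disk-0,size=32G
scsihw: virtio-scsi-single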
There’s not much info online about the difference between these two device types except for this QEMU article, so I decided to add the controller type as a testing dimension.
Setup
For this round of testing, I used the same ZFS datasets, VM/OS settings, and fio benchmark as I listed in the top post.
For “raw” storage, I used the “directory” Proxmox storage and chose a .raw file - not .qcow2 or .vmdk.
Results
test  storage  controller   Read Speed  Read IOPS  Write Speed  Write IOPS
1     ZFS      virtio-scsi  18.5 MB/s   295        58.0 MB/s    884
2     ZFS      virtio-blk   18.0 MB/s   274        53.9 MB/s    821
3     Raw      virtio-scsi  16.8 MB/s   256        50.3 MB/s    767
4     Raw      virtio-blk   16.1 MB/s   224        48.0 MB/s    731
Again, it seems like using ZFS storage (which backs the VM disk with a ZVol) gives the best performance. Both Jim and Allan agree that raw files are faster than ZVols, and I believe them, which is why it’s annoying that I cannot replicate those results in my own testing.
It’s been a minute since I spun up a proxmox VM. If you’re really curious and you want to try some more, consider maybe booting temporarily from a vanilla Ubuntu installer, importing the pool, and testing in the live environment?
At this point I’m curious also, because I’ve never seen zvols perform well in what is now decades of testing on multiple OSes and hypervisors.
Those results look very low across the board for a pool with two mirror vdevs of SSDs, though. There may simply be a nastier bottleneck hitting before the difference between raw files and zvols can make a difference.
Are you running proxmox on bare metal, or is there some other layer (e.g. TrueNAS) in play?
Numjobs=1, iodepth=1. This is not a very reasonable benchmark for testing a storage stack, because it doesn’t model what you actually do with it very well, and it removes any possibility of reordering operations for efficiency, which is a key part of getting the performance your full storage stack actually delivers in real-world use.
Try numjobs=8, iodepth=8 if you want to get closer to replicating my results. I’d have to go back to my notes from creating the article to find the exact settings I used, but that’s generally about the level of concurrency I test with.
If you think that’s a bit too busy for your system, I’d still advise at least numjobs=4, iodepth=4.
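In other words, something along these lines, reusing the same job parameters with just the concurrency raised (exact flags from memory):

fio --name=vm-disk-test --filename=/root/fio-test.file --size=8G \
  --rw=randrw --rwmixread=25 --bs=64k --ioengine=libaio --direct=1 \
  --numjobs=8 --iodepth=8 --runtime=60 --time_based --group_reporting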
Those Blues are holding you back horribly, btw. This is on a system with a single mirror vdev of Kingston DC600m, using the same fio parameters you used, including numjobs=1,iodepth=1:
Run status group 0 (all jobs):
   READ: bw=135MiB/s (141MB/s), 135MiB/s-135MiB/s (141MB/s-141MB/s), io=8095MiB (8488MB), run=60001-60001msec
  WRITE: bw=402MiB/s (421MB/s), 402MiB/s-402MiB/s (421MB/s-421MB/s), io=23.5GiB (25.3GB), run=60001-60001msec
You might be tempted to call foul because those Kingstons are enterprise grade SSDs, but… the DC600s are actually slower than fast consumer SATA SSDs on single-process workloads like these; the DC600’s focus is on hardware QoS that smooths latencies out when the device is under extremely heavy, massively parallel load, which it does at the expense of the raw single-process throughput a fio job like this represents.
In the longer term, I’ll definitely have to look at upgrading these drives; this pool was originally built as a quick “what can I make from cheap MicroCenter drives” experiment and never got properly rebuilt.
I re-ran the test with 4 and 8 for jobs and iodepth and am still seeing ZVol-backed VMs as the most performant in every measure, though the raw file configuration is now 2nd place.
@mercenary_sysadmin, perhaps this is a topic for another thread, but do you have any SSD recommendations with a bit lower price per TB? The DC600M looks like it goes for around $160/TB. I’d be looking for 2TB usable in a 2.5" SATA form factor.