I know this has been discussed here before, but I am trying to compare the performance of raw files vs. ZVOLs (the default when using a ZFS storage type) as VM disks in Proxmox. From Jim’s Klara article and a recent 2.5 Admins episode, I’m under the impression that raw files should give better performance than ZVOLs, but I am trying and failing to replicate those testing results.
Here are the details of the test I ran:
- PVE 8.4.1
- ZFS pool: 4x SSDs in 2 mirrored vdevs
ZVol Setup
- Created dataset ssdtank/vm-testing with the relevant properties:
root@proxmox:~# zfs get recordsize,compression,xattr,atime ssdtank/vm-testing
NAME                PROPERTY     VALUE  SOURCE
ssdtank/vm-testing  recordsize   64K    local
ssdtank/vm-testing  compression  on     default
ssdtank/vm-testing  xattr        sa     inherited from ssdtank
ssdtank/vm-testing  atime        off    inherited from ssdtank
- Created Proxmox storage:
- Storage Type: ZFS
- Block Size: 64k
- Thin Provision: No
Raw File Setup
- Created dataset ssdtank/vm-testing-dir with the exact same dataset properties:
root@proxmox:~# zfs get recordsize,compression,xattr,atime ssdtank/vm-testing-dir
NAME                    PROPERTY     VALUE  SOURCE
ssdtank/vm-testing-dir  recordsize   64K    local
ssdtank/vm-testing-dir  compression  on     default
ssdtank/vm-testing-dir  xattr        sa     inherited from ssdtank
ssdtank/vm-testing-dir  atime        off    inherited from ssdtank
- Created Proxmox storage:
- Storage Type: dir
- Preallocation: Off
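For anyone wanting to reproduce this, the two backends above can be created from the CLI roughly like this. This is a sketch based on the settings listed in this post; the zfspool storage ID (`vm-testing`) is my guess, while `vm-testing-dir` matches the VM config shown later:

```shell
# ZVOL-backed storage: dataset plus a Proxmox "zfspool" storage
# (64k block size, thin provisioning off)
zfs create -o recordsize=64K ssdtank/vm-testing
pvesm add zfspool vm-testing --pool ssdtank/vm-testing --blocksize 64k --sparse 0

# Raw-file-backed storage: identical dataset plus a Proxmox "dir" storage
# pointed at its mountpoint, preallocation off
zfs create -o recordsize=64K ssdtank/vm-testing-dir
pvesm add dir vm-testing-dir --path /ssdtank/vm-testing-dir --content images --preallocation off
```

Note that for the zfspool storage, the storage-level `blocksize` sets the volblocksize of the ZVOLs it creates; the dataset's `recordsize` only matters for the directory-backed raw files.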
VM Setup
I created 2 VMs with the exact same settings, but using the 2 different storage backends:
- OS: Debian 13 (Trixie)
- Qemu Agent: Yes
- Disk Settings: all default, size: 32GB
- CPU: 4x cores, type: host
- Memory: 2G
- All Debian installation defaults, all files in one partition
- No desktop environment, just “standard system utilities”
Benchmark
Since I’m optimizing for a basic “general” workload with these VMs, I used a mixed read/write workload with a block size of 64k:
fio --name=rand-rw-64k --ioengine=posixaio --rw=randrw --rwmixread=25 --bs=64k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based
Results
ZVol Performance
(fio read/write output screenshot)
Raw File Performance
(fio read/write output screenshot)
Is there something influencing these results that I’m not aware of? Is this a limitation of Linux vs FreeBSD?
Well, I don’t know; you aren’t directly showing what proxmox is actually doing with the VM that you created “directory storage” for.
I’ve been told that it’s not that easy to get proxmox to use raw files in the first place. I suspect you might actually be testing zvol vs zvol there.
Can you directly show me the VM storage back end? You should be able to get a literal directory listing of the dataset that supposed raw file backed VM lives in.
It’s very likely I messed something up.
VM Config
root@proxmox:/ssdtank/vm-testing-dir/images/108# cat /etc/pve/qemu-server/108.conf
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
cpu: host
ide2: local:iso/debian-13.0.0-amd64-netinst.iso,media=cdrom,size=754M
memory: 2048
meta: creation-qemu=9.2.0,ctime=1755217325
name: vmtest-raw
net0: virtio=BC:24:11:66:EF:D1,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: vm-testing-dir:108/vm-108-disk-0.raw,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=4b96ddc7-befe-417a-8d61-13660d7a1496
sockets: 1
vmgenid: c7035fb5-12bf-4a12-ad53-d6b13c8020b7
[PENDING]
boot: order=scsi0;net0
delete: ide2
Directory Storage
(if there’s a better name for this, I’m all ears. The Proxmox manual just refers to it as a “storage”)
Raw File
root@proxmox:~# ls -alh /ssdtank/vm-testing-dir/images/108/
total 4.9G
drwxr----- 2 root root 3 Aug 14 19:22 .
drwxr-xr-x 3 root root 3 Aug 14 19:22 ..
-rw-r----- 1 root root 32G Aug 14 22:12 vm-108-disk-0.raw
root@proxmox:~# qemu-img info /ssdtank/vm-testing-dir/images/108/vm-108-disk-0.raw
image: /ssdtank/vm-testing-dir/images/108/vm-108-disk-0.raw
file format: raw
virtual size: 32 GiB (34359738368 bytes)
disk size: 4.88 GiB
Child node '/file':
filename: /ssdtank/vm-testing-dir/images/108/vm-108-disk-0.raw
protocol type: file
file length: 32 GiB (34359738368 bytes)
disk size: 4.88 GiB
That does look correct. The performance advantage of raw vs zvol applies on both Linux and FreeBSD, but I don’t use virtio-scsi, I just use virtio. That might be the difference.
Hmm, I don’t see just “virtio” as an option in the web UI.
I’ll play around with the different options and see if one of them offers better performance.
You need to choose “VirtIO Block” as the bus/device type when adding a new hard disk.
Where in the config file might it say “virtio”? Would you be able to post an example config file? That would be very helpful. I’m having a hard time finding info on QEMU storage drivers.
Whew, okay, I did some more digging and here’s what I found.
virtio-scsi vs virtio-blk
In Proxmox, the default SCSI controller in the UI is “VirtIO SCSI (Single)”, which translates to a KVM argument of -device virtio-scsi-pci,.... Though it is not in the UI as an option, another controller is available. If you create a VM but don’t start it, then edit /etc/pve/qemu-server/ID.conf, you can change the value of scsihw to virtio-blk. Once the VM is started, the KVM argument will be -device virtio-blk-pci,....
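For reference, the more usual way to get virtio-blk in Proxmox is the one suggested above: attach the disk on the VirtIO Block bus, which shows up in the VM config as a virtio0 entry instead of a scsi0 entry (the storage/disk names below are illustrative, copied from my raw-file VM):

```
# disk attached via VirtIO Block (virtio-blk); no SCSI controller involved
virtio0: vm-testing-dir:108/vm-108-disk-0.raw,iothread=1,size=32G

# versus the default VirtIO SCSI attachment:
# scsihw: virtio-scsi-single
# scsi0: vm-testing-dir:108/vm-108-disk-0.raw,iothread=1,size=32G
```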
There’s not much info online about the difference between these two device types except for this QEMU article, so I decided to add it as a testing dimension.
Setup
For this round of testing, I used the same ZFS datasets, VM/OS settings, and fio benchmark as I listed in the top post.
For “raw” storage, I used the “directory” Proxmox storage and chose a .raw file, not .qcow2 or .vmdk.
Results
| test | storage | controller | Read Speed | Read IOPS | Write Speed | Write IOPS |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ZFS | virtio-scsi | 18.5 MB/s | 295 | 58.0 MB/s | 884 |
| 2 | ZFS | virtio-blk | 18.0 MB/s | 274 | 53.9 MB/s | 821 |
| 3 | Raw | virtio-scsi | 16.8 MB/s | 256 | 50.3 MB/s | 767 |
| 4 | Raw | virtio-blk | 16.1 MB/s | 224 | 48.0 MB/s | 731 |
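As a sanity check on these numbers: with a 64k block size, bandwidth should be approximately IOPS × 64 KiB. A quick check for test 1’s read figures:

```shell
# 295 read IOPS * 64 KiB per IO, expressed in decimal MB/s
awk 'BEGIN { printf "%.1f MB/s\n", 295 * 64 * 1024 / 1e6 }'
# prints: 19.3 MB/s
```

That lands close to the reported 18.5 MB/s (fio rounds the averaged IOPS), so the table is at least internally consistent.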
Again, it seems like using ZFS storage (which backs the VM disk with a ZVol) gives the best performance. Both Jim and Allan agree that raw files are faster than ZVols, and I believe them, which is why it’s annoying that I cannot replicate those results in my own testing.
It’s been a minute since I spun up a proxmox VM. If you’re really curious and you want to try some more, consider maybe booting temporarily from a vanilla Ubuntu installer, importing the pool, and testing in the live environment?
At this point I’m curious also, because I’ve never seen zvols perform well in what is, at this point, decades of testing on multiple OSes and hypervisors.
Those results look very low across the board for a pool with two mirror vdevs of SSDs, though. There may simply be a nastier bottleneck hitting before the difference between raw files and zvols can make a difference.
Are you running proxmox on bare metal, or is there some other layer (e.g. TrueNAS) in play?
I’m running Proxmox on bare metal, specifically:
- Mobo: MSI B450 GAMING PLUS MAX
- CPU: AMD Ryzen 5 1600
- RAM: Corsair Vengeance LPX DDR4 16GB (2x8GB)
- Storage: 4x 1TB WD Blue SATA SSD
- GPU: Sparkle Intel Arc A380 6GB
SSDs are connected directly to the motherboard. Build log is here and gives more details if useful.
Here’s the topology of my pool:
root@proxmox:~# zpool status ssdtank
pool: ssdtank
state: ONLINE
scan: scrub repaired 0B in 00:24:05 with 0 errors on Sun Aug 10 00:48:08 2025
config:
NAME                                    STATE     READ WRITE CKSUM
ssdtank                                 ONLINE       0     0     0
  mirror-0                              ONLINE       0     0     0
    ata-WDC_WDBNCE0010PNC_19416F800207  ONLINE       0     0     0
    ata-WDC_WDBNCE0010PNC_200559800082  ONLINE       0     0     0
  mirror-1                              ONLINE       0     0     0
    ata-WDC_WDBNCE0010PNC_200608800160  ONLINE       0     0     0
    ata-WDC_WDBNCE0010PNC_200608802113  ONLINE       0     0     0
OHHHHH, I just caught it: numjobs=1, iodepth=1. This is not a very reasonable benchmark for testing a storage stack, because it doesn’t model what you actually do with it very well, and it removes any possibility of reordering for more efficient operations, which is a key part of getting the performance your full storage stack actually delivers in real-world use.
Try numjobs=8, iodepth=8 if you want to get closer to replicating my results. I’d have to go back to my notes from creating the article to find the exact settings I used, but that’s generally about the level of concurrency I test with.
If you think that’s a bit too busy for your system, I’d still advise at least numjobs=4, iodepth=4.
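Concretely, the original job rerun with the higher concurrency suggested here would look something like this (untested sketch; --group_reporting is just added so fio aggregates the per-job numbers into one summary):

```shell
fio --name=rand-rw-64k --ioengine=posixaio --rw=randrw --rwmixread=25 \
    --bs=64k --size=4g --numjobs=8 --iodepth=8 --runtime=60 --time_based \
    --group_reporting
```

With numjobs=8 and iodepth=8 there are up to 64 I/Os in flight, which gives the scheduler and ZFS something to reorder and coalesce.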
Those Blues are holding you back horribly, btw. This is on a system with a single mirror vdev of Kingston DC600m, using the same fio parameters you used, including numjobs=1,iodepth=1:
Run status group 0 (all jobs):
READ: bw=135MiB/s (141MB/s), 135MiB/s-135MiB/s (141MB/s-141MB/s),
io=8095MiB (8488MB), run=60001-60001msec
WRITE: bw=402MiB/s (421MB/s), 402MiB/s-402MiB/s (421MB/s-421MB/s),
io=23.5GiB (25.3GB), run=60001-60001msec
You might be tempted to call foul because those Kingstons are enterprise-grade SSDs, but… DC600Ms are actually slower than fast consumer SATA SSDs on single-process workloads like these; the DC600M’s focus is on hardware QoS that smooths out latencies when the device is under extremely heavy, massively parallel load, which it does at the expense of the raw single-process throughput this fio job represents.