Apologies if this has already been asked. I could not find a clear answer in my search.
I am in the process of setting up some VMs and wanted to set these up using some of the best practices I’ve read about online.
I have read two excellent articles by the Mercenary Admin, which were very helpful, but I could not find the answer to my question in them (or did not understand enough to glean it).
I am not clear on when to use “.raw” files vs. “zvol” files, or on the distinction between them.
I know that a zvol provides a block device you can use for a VM:
zfs create -s -V 10G tank/vm_zvol
There is also the QEMU image file, which seems to be the raw file referenced in many places:
qemu-img create -f raw vmimage.img 10G
Is the “vmimage.img” image file above considered a raw file?
Is this the same file that can also be created with the “.raw” extension?
I see references to “.raw” files online in some places, particularly in connection with Proxmox.
In my case, I will not be using Proxmox, so I would like to know definitively whether the QEMU “.img” image file is the “raw” file being referenced, or whether I am missing something.
Actually, I’ve already reviewed the post you mention, but again, I’m trying to understand whether the .raw file is in fact the raw file referenced in many places, and whether it relates to QEMU rather than to zvols.
qemu-img create -f raw file.raw 20G is one way to create a raw file, but it’s not the most common. A raw file is exactly what it says on the tin: a file of a given size that KVM can then use much like a loopback device, performing random I/O inside it.
Most frequently, people create raw files with the truncate command, like so: truncate -s 20G file.raw. This specifically creates a sparse file: one that already has its size (20GiB, in this case) set, but isn’t actually taking up any of that space on disk until it’s been written to.
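To see the sparse behavior for yourself, here’s a minimal shell sketch. It assumes GNU coreutils (for stat -c and dd status=none); the file name and the scaled-down 64 MiB size are arbitrary. It compares the apparent size with the bytes actually allocated, before and after writing 1 MiB into the middle of the file:

```shell
#!/bin/sh
# Minimal demo: a sparse raw file has a large apparent size but only
# consumes disk blocks where data has actually been written.
# File name and sizes are arbitrary; 64 MiB keeps the demo small.
set -eu

IMG=sparse-demo.raw
rm -f "$IMG"

# Create a 64 MiB sparse file: instant, no data written.
truncate -s 64M "$IMG"

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block.
APPARENT=$(stat -c %s "$IMG")
ALLOC_BEFORE=$(( $(stat -c %b "$IMG") * $(stat -c %B "$IMG") ))
echo "after truncate: apparent=$APPARENT allocated=$ALLOC_BEFORE"

# Write 1 MiB of real (incompressible) data 8 MiB into the file, the way a
# guest writing to its virtual disk would. conv=notrunc keeps the size intact.
dd if=/dev/urandom of="$IMG" bs=1M count=1 seek=8 conv=notrunc status=none

ALLOC_AFTER=$(( $(stat -c %b "$IMG") * $(stat -c %B "$IMG") ))
echo "after 1 MiB write: apparent=$(stat -c %s "$IMG") allocated=$ALLOC_AFTER"
```

On most filesystems the allocated figure goes from roughly zero to roughly 1 MiB while the apparent size stays at 64 MiB. On ZFS the allocation count can lag for a few seconds until the pending transaction group is written out, so the second number may briefly look too small.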
Sparse files are created instantly without generating any storage load. And there’s no point in trying to actually “preallocate” sectors on ZFS: it’s copy-on-write, so you can’t actually do that. That means there’s no reason to reach for a more complex tool like fallocate, which can create the file and preallocate all its sectors without needing a brute-force write to those sectors.
Finally, you have the absolute stone axe approach: dd if=/dev/zero bs=1G count=20 of=file.raw. This literally just writes 20GiB worth of zeroes into as many sectors as that takes, then calls that a file.
Note that under ZFS, even the dd approach isn’t actually “preallocation”. Yes, those sectors have all been marked as used with those zeroes, but since it’s a copy-on-write filesystem, you won’t be overwriting those same sectors when you actually begin using the virtual drive. (And if that’s not complicated enough, ZFS with compression enabled will recognize a stream of pure zeroes and create a special “hole” file: essentially the same idea as a sparse file, except that only ZFS understands that a hole file is sparse-allocated; the rest of the operating system thinks it’s a perfectly normal file.)
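A quick way to see the difference between the truncate and dd approaches is to compare each file’s apparent size with the space actually allocated to it. This is a sketch assuming GNU du (for --apparent-size), with the 20GiB example scaled down to 32 MiB and arbitrary file names:

```shell
#!/bin/sh
# Sketch: compare a sparse raw file (truncate) with a fully written one (dd).
# Sizes are scaled down to 32 MiB; file names are arbitrary.
set -eu

rm -f sparse.raw zeroed.raw

truncate -s 32M sparse.raw
# The "stone axe" approach, scaled down: 32 x 1 MiB blocks of zeroes.
dd if=/dev/zero of=zeroed.raw bs=1M count=32 status=none

# du --apparent-size reports the file's length; plain du reports the
# space actually allocated. Both values are in KiB (-k).
for f in sparse.raw zeroed.raw; do
    printf '%s: apparent=%sK allocated=%sK\n' "$f" \
        "$(du -k --apparent-size "$f" | cut -f1)" \
        "$(du -k "$f" | cut -f1)"
done

# Byte for byte, both files contain exactly the same thing: zeroes.
cmp sparse.raw zeroed.raw && echo "same contents, different allocation"
```

cmp succeeds because both files read back as nothing but zeroes; only the allocation differs. On ext4 or XFS the dd-written file should report roughly 32768K allocated, while on a ZFS dataset with compression enabled it should report close to nothing, because the all-zero blocks are stored as holes.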
Anyway, all of that is the raw file format, which technically is no file format at all: just a sufficiently large empty file, which may be sparse, fast-preallocated, or already fully written.
A ZVOL isn’t a file format at all; it’s a special type of ZFS dataset that presents as a block device (on Linux; a character device on FreeBSD), which can be accessed directly using tools intended to work with raw disks or disk partitions. This sounds like the perfect fit for VMs, except that it grossly underperforms raw files and has a few other really annoying-ass quirks that are generally better avoided.
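For comparison with the raw-file examples, here’s roughly how a zvol like the tank/vm_zvol from the question gets handed to QEMU. This is a sketch assuming Linux (where udev exposes zvols under /dev/zvol/<pool>/<dataset>) and a virtio disk; it only checks for the device and prints the -drive option rather than launching a guest:

```shell
#!/bin/sh
# Sketch: attach a zvol to QEMU as a raw block device (Linux).
# ZVOL defaults to the hypothetical tank/vm_zvol used in the question.
set -eu

ZVOL=${ZVOL:-tank/vm_zvol}

# On Linux, udev exposes each zvol as /dev/zvol/<pool>/<dataset>,
# a symlink to the underlying /dev/zdN block device.
DEV=/dev/zvol/$ZVOL

# The zvol itself is the virtual disk, so QEMU treats it as raw:
# there is no image format layered on top of it.
DRIVE="file=$DEV,format=raw,if=virtio"

if [ -b "$DEV" ]; then
    echo "found $DEV; pass this to qemu-system-x86_64: -drive $DRIVE"
else
    echo "no block device at $DEV; create a sparse 10G zvol first with:"
    echo "  zfs create -s -V 10G $ZVOL"
fi
```

Note that QEMU is told format=raw here too: from the guest’s point of view a zvol and a raw file look the same, and the difference is entirely in how the host stores the blocks.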
Finally, .qcow2 is an actual file format, which must be created with the qemu-img create -f qcow2 command. It’s a file used for random-access I/O as a virtual drive, just like a raw file, a zvol, or an actual drive would be. But unlike the raw format or the zvol, it can’t be mounted directly with the host’s built-in mount tools; qcow2 files must be manipulated either with the qemu-img toolset or with the KVM hypervisor itself.
Qcow2 doesn’t perform as well as raw, initially at least; for some reason it speeds up dramatically the closer the storage gets to its maximum on-disk size. Some people instinctively avoid using it with ZFS because nested copy-on-write sounds like a bad idea, but as long as you match its cluster_size parameter to your ZFS recordsize, it performs quite well in my experience. And sparse files are a little dangerous compared to qcow2: we’ve had a few bugs over the years that led to corruption only if sparse files were present. (A qcow2 is “sparse” in the sense that it doesn’t occupy its maximum size at creation time, but it’s not sparse from the perspective of the OS. The KVM hypervisor and qemu-img toolset themselves expand or contract the file, the same way any other application would, rather than relying on the filesystem itself to manage sparse files properly.)
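Here’s a sketch of matching cluster_size to recordsize when creating the image. The dataset name tank/vms, the image name and the 20G size are hypothetical placeholders. It falls back to 64 KiB (qemu-img’s default qcow2 cluster size) when zfs or the dataset isn’t available, and clamps to 2 MiB, the largest cluster size qcow2 supports:

```shell
#!/bin/sh
# Sketch: create a qcow2 image whose cluster_size matches the recordsize of
# the ZFS dataset it will live on. DATASET, IMAGE and SIZE are hypothetical
# placeholders; override them via the environment.
set -eu

DATASET=${DATASET:-tank/vms}
IMAGE=${IMAGE:-vm.qcow2}
SIZE=${SIZE:-20G}

if command -v zfs >/dev/null 2>&1 && zfs list "$DATASET" >/dev/null 2>&1; then
    # -H: no header, -p: exact value in bytes (e.g. 131072 for 128K).
    RECORDSIZE=$(zfs get -H -p -o value recordsize "$DATASET")
else
    # Fallback assumption: 64 KiB, qemu-img's default qcow2 cluster size.
    RECORDSIZE=65536
fi

# qcow2 clusters must be a power of two between 512 bytes and 2 MiB, while
# ZFS recordsize can go higher (large_blocks), so clamp at 2 MiB.
if [ "$RECORDSIZE" -gt 2097152 ]; then
    RECORDSIZE=2097152
fi

echo "qemu-img create -f qcow2 -o cluster_size=$RECORDSIZE $IMAGE $SIZE"
if command -v qemu-img >/dev/null 2>&1; then
    qemu-img create -f qcow2 -o cluster_size="$RECORDSIZE" "$IMAGE" "$SIZE"
fi
```

The default ZFS recordsize is 128K, so on an untuned dataset this ends up as cluster_size=131072 rather than qemu-img’s default of 64K.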
Note that I am not saying you shouldn’t use sparse files: to the best of my knowledge, the ZFS community has gotten all the bugs worked out of sparse files, including replication of them (which was the big hitch in the giddyup on at least one of the sparse file bugs in earlier years).
The major reason to at least consider qcow2 files is that some KVM features can’t operate properly without them. Specifically, if you want hibernation of the guest at the host level (meaning the guest doesn’t even have to know it was ever suspended to disk in the first place; it just sees its RTC jump forward in time after it’s resumed) or KVM snapshots (not ZFS snapshots!), you’ll need qcow2 storage, because the ephemeral storage for system state (including vRAM and a bunch of “hardware” state data) is tied to that format.
With that said… I’ve been doing ZFS+KVM for longer than virtually anybody else out there; I was running it in prod way WAY back in the Ubuntu 10.04 days, and I’ve never actually used either of those KVM features (KVM snapshots including system state, or host-managed guest hibernation) in all that time.
I still usually reach for qcow2, though. I just don’t typically have performance problems on the systems I build, the way I build and tune them, so I default to qcow2 “just in case” and reach for raw files when I need the absolute maximum performance out of something really gnarly, like a big database or a VM that needs to saturate a 10Gbps network doing file sharing (or come as close as it can to doing that).