Today my article on the new OpenZFS Block Reference Table went live at Klara Systems. This new feature allows for file-level reuse of blocks, not just dataset-level! It can be a game-changer, but there are some caveats. Read on:
The fact that the deduplication is lost on send is enough of a bummer and potential foot-shooting device that I will probably steer clear of this. If your backup scheme is not ZFS-based, or you have “fuck-you” amounts of storage on the backup and fat pipes going there, then I can see great benefits. Especially, as you pointed out, for VM workloads and seeding gold images.
It’s cool that ZFS is getting a nice trickle of new, useful features. And it’s not a problem that not all of them are for everyone.
I couldn’t see it stated specifically in the article or the release notes, but this is pool-level, right, not dataset-level? As in, the deduplication works across datasets.
Also, stumbled upon the corrective receive in the release notes. Surely that would be a great article as well?
Until today I thought BRT was still not on by default (as of 2.2.0) in any of the forthcoming releases, since it’s still defined as not stable. Is that right?
I don’t believe it’s on by default anywhere yet, although it’s available via a tunable in FreeBSD 14.1 (which is where I tested it).
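If you want to poke at that knob programmatically, here’s a minimal sketch for FreeBSD. It only checks the current value; I’m assuming the sysctl is vfs.zfs.bclone_enabled (my understanding of the tunable 14.x uses), so treat the name as an assumption rather than a reference.

```c
/* Minimal sketch (FreeBSD): read the block-cloning tunable.
 * Assumption: the sysctl is named vfs.zfs.bclone_enabled and holds an int;
 * verify against your release before relying on it. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int enabled = 0;
    size_t len = sizeof(enabled);

    if (sysctlbyname("vfs.zfs.bclone_enabled", &enabled, &len, NULL, 0) != 0) {
        perror("sysctlbyname");   /* not FreeBSD, or the tunable isn't there */
        return 1;
    }

    printf("vfs.zfs.bclone_enabled = %d (%s)\n", enabled,
           enabled ? "cloning allowed" : "requests fall back to a real copy");
    return 0;
}
```

Flipping it on is the same call with the new value passed as the fourth and fifth arguments (plus root privileges), but again, check your own release before relying on the name.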
It was on by default in the private pkg release of OpenZFS I tested fast dedup on recently, but I’m not sure whether that reflects anything in the base system of either 14.1-RELEASE or FreeBSD CURRENT. Might, might not; I just don’t personally know.
Also, stumbled upon the corrective receive in the release notes. Surely that would be a great article as well?
Corrective receive had some pretty massive limitations and outright bugs in it last time I tested it. I’m not sure if those have been fixed yet.
It’s pool-level as far as ZFS itself is concerned, but there can be hiccups getting the host operating system to be willing to do it across datasets using cp. To the best of my understanding it works automatically under FreeBSD, but there are still problems getting modern Linux kernels to use it; the details are a little esoteric, and I am not certain of the current status on Linux as I haven’t tested the feature there yet.
See Implement copy_file_range as alternative to ioctl(FICLONE) · Issue #60 · cargo-bins/reflink-copy · GitHub for (a little) detail.
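To make that a bit more concrete, here’s a rough sketch (mine, not from the article) of what cp or any other userspace tool has to do to give ZFS the chance to clone rather than copy: issue copy_file_range(2) and let the kernel decide. Whether you actually get a BRT clone or a plain copy depends on the OS, the pool’s feature flags, and the tunables discussed above; the paths here are just placeholders.

```c
/* Sketch: copy (and, where the kernel and filesystem support it, block-clone)
 * src to dst via copy_file_range(2). This only shows how the request is made
 * from userspace; ZFS decides whether the blocks are cloned or copied. */
#define _GNU_SOURCE            /* for copy_file_range() with glibc on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return EXIT_FAILURE;
    }

    int in = open(argv[1], O_RDONLY);
    if (in < 0) { perror("open src"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(in, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) { perror("open dst"); return EXIT_FAILURE; }

    off_t remaining = st.st_size;
    while (remaining > 0) {
        /* NULL offsets: use and advance both file descriptors' positions.
         * The syscall may copy less than requested, so loop until done. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, (size_t)remaining, 0);
        if (n < 0) { perror("copy_file_range"); return EXIT_FAILURE; }
        if (n == 0) break;      /* unexpected EOF; bail out rather than spin */
        remaining -= n;
    }

    close(in);
    close(out);
    return EXIT_SUCCESS;
}
```

The point is that the cloning decision lives entirely on the kernel side of that one call, which is why the same cp can behave differently on FreeBSD and Linux even against the same pool.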
Ah, bummer then. I guess I’ll hope to never need it!
Right. I guess that’s another reason I’ll put a pin in it for a few years to see where it’s ended up. Realistically I’m better off staying conservative on new ZFS features (even by Ubuntu LTS standards - I’ve been bitten before) and let others get the bugs sorted out.
Will the BRT feature have an impact on VAAI XCOPY performance?
I’ve observed VMware ESXi leveraging OpenZFS block storage (via SCST iSCSI) with the following VAAI status:

VAAI Status: supported
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: supported
When migrating VMs across VMFS volumes (hosted on the aforementioned iSCSI zvols) I’ve seen XCOPY at work by observing essentially zero iSCSI traffic during the transfer – clearly storage was relocating blocks on behalf of the host.
But I was dismayed to also notice very high disk activity suggesting OpenZFS was in fact making a physical copy of these blocks. To this layperson it sounds like BRT provides a foundation to effect XCOPY primitives entirely within metadata. Thoughts?