Today my article on the new OpenZFS Block Reference Table went live at Klara Systems. This new feature allows for file-level reuse of blocks, not just dataset-level! It can be a game-changer, but there are some caveats. Read on:
The fact that the deduplication is lost on send is enough of a bummer and potential foot-shooting device that I will probably steer clear of this. If your backup scheme is not ZFS-based, or you have “fuck-you” amounts of storage on the backup and fat pipes going there, then I can see great benefits. Especially, as you pointed out, for VM workloads and seeding gold images.
It’s cool that ZFS is getting a nice trickle of new, useful features. And it’s not a problem that not all of them are for everyone.
I couldn’t see it stated specifically in the article or the release notes, but this is pool-level, right, not dataset-level? As in, the deduplication works across datasets.
Also, stumbled upon the corrective receive in the release notes. Surely that would be a great article as well?
Until today I thought BRT was still not on by default (as of 2.2.0) in any of the forthcoming releases, since it’s still defined as not stable. Is that right?
I don’t believe it’s on by default anywhere yet, although it’s available via a tunable in FreeBSD 14.1 (which is where I tested it).
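If you want to poke at that knob programmatically, here’s a minimal sketch for FreeBSD. It only checks the current value; I’m assuming the sysctl is vfs.zfs.bclone_enabled (my understanding of the tunable 14.x uses), so treat the name as an assumption rather than a reference.

```c
/* Minimal sketch (FreeBSD): read the block-cloning tunable.
 * Assumption: the sysctl is named vfs.zfs.bclone_enabled and holds an int;
 * verify against your release before relying on it. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int enabled = 0;
    size_t len = sizeof(enabled);

    if (sysctlbyname("vfs.zfs.bclone_enabled", &enabled, &len, NULL, 0) != 0) {
        perror("sysctlbyname");   /* not FreeBSD, or the tunable isn't there */
        return 1;
    }

    printf("vfs.zfs.bclone_enabled = %d (%s)\n", enabled,
           enabled ? "cloning allowed" : "requests fall back to a real copy");
    return 0;
}
```

Flipping it on is the same call with the new value passed as the fourth and fifth arguments (plus root privileges), but again, check your own release before relying on the name.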
It was on by default in the private pkg release of OpenZFS I tested fast dedup on recently, but I’m not sure whether that reflects anything in the base system of either 14.1-RELEASE or FreeBSD CURRENT. Might, might not; I just don’t personally know.
Also, stumbled upon the corrective receive in the release notes. Surely that would be a great article as well?
Corrective receive had some pretty massive limitations and outright bugs in it last time I tested it. I’m not sure if those have been fixed yet.
It’s pool-level as far as ZFS itself is concerned, but there can be hiccups getting the host operating system to be willing to do it across datasets using cp. To the best of my understanding it works automatically under FreeBSD, but there are still problems getting modern Linux kernels to use it; the details are a little esoteric, and I am not certain of the current status on Linux as I haven’t tested the feature there yet.
See Implement copy_file_range as alternative to ioctl(FICLONE) · Issue #60 · cargo-bins/reflink-copy · GitHub for (a little) detail.
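To make that a bit more concrete, here’s a rough sketch (mine, not from the article) of what cp or any other userspace tool has to do to give ZFS the chance to clone rather than copy: issue copy_file_range(2) and let the kernel decide. Whether you actually get a BRT clone or a plain copy depends on the OS, the pool’s feature flags, and the tunables discussed above; the paths here are just placeholders.

```c
/* Sketch: copy (and, where the kernel and filesystem support it, block-clone)
 * src to dst via copy_file_range(2). This only shows how the request is made
 * from userspace; ZFS decides whether the blocks are cloned or copied. */
#define _GNU_SOURCE            /* for copy_file_range() with glibc on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return EXIT_FAILURE;
    }

    int in = open(argv[1], O_RDONLY);
    if (in < 0) { perror("open src"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(in, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) { perror("open dst"); return EXIT_FAILURE; }

    off_t remaining = st.st_size;
    while (remaining > 0) {
        /* NULL offsets: use and advance both file descriptors' positions.
         * The syscall may copy less than requested, so loop until done. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, (size_t)remaining, 0);
        if (n < 0) { perror("copy_file_range"); return EXIT_FAILURE; }
        if (n == 0) break;      /* unexpected EOF; bail out rather than spin */
        remaining -= n;
    }

    close(in);
    close(out);
    return EXIT_SUCCESS;
}
```

The point is that the cloning decision lives entirely on the kernel side of that one call, which is why the same cp can behave differently on FreeBSD and Linux even against the same pool.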
Ah, bummer then. I guess I’ll hope to never need it!
Right. I guess that’s another reason I’ll put a pin in it for a few years to see where it’s ended up. Realistically I’m better off staying conservative on new ZFS features (even by Ubuntu LTS standards - I’ve been bitten before) and let others get the bugs sorted out.
Will the BRT feature have an impact on VAAI XCOPY performance?
I’ve observed VMware ESXi leveraging OpenZFS block storage (via SCST iSCSI) with the following VAAI status:

VAAI Status: supported
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: supported
When migrating VMs across VMFS volumes (hosted on the aforementioned iSCSI zvols) I’ve seen XCOPY at work by observing essentially zero iSCSI traffic during the transfer – clearly storage was relocating blocks on behalf of the host.
But I was dismayed to also notice very high disk activity suggesting OpenZFS was in fact making a physical copy of these blocks. To this layperson it sounds like BRT provides a foundation to effect XCOPY primitives entirely within metadata. Thoughts?