ZFS Block Cloning "Real" Space usage?

root@bigbertha[~]# zpool get bcloneused,bclonesaved,bcloneratio nvme
NAME  PROPERTY     VALUE         SOURCE
nvme  bcloneused   0             -
nvme  bclonesaved  0             -
nvme  bcloneratio  1.00x         -
root@bigbertha[~]# zpool get bcloneused,bclonesaved,bcloneratio nvme
NAME  PROPERTY     VALUE         SOURCE
nvme  bcloneused   96.6G         -
nvme  bclonesaved  96.6G         -
nvme  bcloneratio  2.00x         -

So I moved some files over to this dataset and then copied them through an SMB share, so two copies. Great! I can see that the bcloneused and bclonesaved values have updated.

Meanwhile, zfs list still reports values that don't account for block cloning:

root@bigbertha[~]# zfs list -o name,used,avail,refreserv,usedsnap,usedds,usedrefreserv,usedchild /mnt/nvme/cloner
NAME          USED  AVAIL  REFRESERV  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
nvme/cloner   193G  2.01T       none        0B     96K             0B       193G

Is the actual used space simply used - bcloneused?

Is there a property or something that reports how much space is actually used while considering savings from block cloning and maybe dedupe/both? I can imagine this getting pretty complicated to calculate as variables stack…
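As a back-of-the-envelope check (this is just arithmetic on the numbers above, not any ZFS property or API), the question being asked amounts to something like this sketch. The assumption here is that every cloned block lives in this one dataset, so the pool-wide bclonesaved can be subtracted from the dataset's used:

```python
# Hypothetical arithmetic only -- ZFS does not expose this calculation.
# Assumes all clone savings in the pool belong to this dataset.

def physical_estimate(used_bytes: int, bclonesaved_bytes: int) -> int:
    """Apparent dataset usage minus pool-wide clone savings."""
    return used_bytes - bclonesaved_bytes

GiB = 1024 ** 3
# Numbers from the output above: used = 193G, bclonesaved = 96.6G
print(round(physical_estimate(int(193 * GiB), int(96.6 * GiB)) / GiB, 1))  # 96.4
```

With the numbers above, that lands at roughly 96.4G of "real" usage, which matches the intuition that only one physical copy of the 96.6G exists.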

You’re mixing a couple of things here. bcloneused and bclonesaved are pool-wide stats. You want to compare them to allocated to get a sense of the total savings.

used, referenced, etc. are dataset properties, and they show the “apparent” usage charged to the dataset. This is what’s used for quota calculations and so on, which (I guess) is why each clone is counted in full: if someone makes multiple copies of a file, it doesn’t matter how they’re stored underneath; they don’t magically get more space just because a clone happened.

(I could make the argument in the other direction, especially when cloning is explicitly requested. I wonder if we’ll see requests for an option like that in the future.)

Anyway, the pool clone stats are more understandable once you clone a file a couple of times:

# dd if=/dev/random of=/tank/file bs=128k count=4
4+0 records in
4+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00164617 s, 318 MB/s

# zpool get allocated,bcloneused,bclonesaved,bcloneratio tank
NAME  PROPERTY     VALUE         SOURCE
tank  allocated    630K          -
tank  bcloneused   0             -
tank  bclonesaved  0             -
tank  bcloneratio  1.00x         -

# clonefile -c /tank/file /tank/clone1
using FICLONE
file offsets: src=0/524288; dst=0/524288
# clonefile -c /tank/file /tank/clone2
using FICLONE
file offsets: src=0/524288; dst=0/524288
# clonefile -c /tank/file /tank/clone3
using FICLONE
file offsets: src=0/524288; dst=0/524288

# zpool get allocated,bcloneused,bclonesaved,bcloneratio tank
NAME  PROPERTY     VALUE         SOURCE
tank  allocated    698K          -
tank  bcloneused   512K          -
tank  bclonesaved  1.50M         -
tank  bcloneratio  4.00x         -

So four “copies” of the file. Only one copy truly exists (bcloneused=512K); the other three are clones, so we saved 3 × 512K (bclonesaved=1.50M).
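The accounting in that demo can be sketched as plain arithmetic (inferred from the numbers shown above, not an official formula): one physical copy plus the per-clone savings, with the ratio being total apparent data over physical data.

```python
# Sketch of the clone accounting from the demo above: one real copy,
# the rest are clones. Inferred from the numbers shown, not a ZFS API.

def clone_stats(block_bytes: int, total_copies: int):
    bcloneused = block_bytes                         # one real copy on disk
    bclonesaved = (total_copies - 1) * block_bytes   # the clones cost nothing
    bcloneratio = (bcloneused + bclonesaved) / bcloneused
    return bcloneused, bclonesaved, bcloneratio

K = 1024
used, saved, ratio = clone_stats(512 * K, 4)
print(used // K, saved // K, ratio)  # 512 1536 4.0
```

That reproduces the output above: 512K used, 1536K (1.50M) saved, and a 4.00x ratio.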

zdb -b will show cloned counts alongside everything else too.

But mostly I think you don’t need to care too much. The existing numbers are still “right”: the actual amount of the pool used, and the apparent space used on each dataset.

I do understand you here. I would say, however, that from a user perspective, knowing how much space any individual dataset is “actually using” helps drive decision making.

From your explanation here, it seems that may not be possible, which is fine… It’s just disappointing. This is such a cool feature, and it’s unfortunate that you have to do a bunch of math and understand the inner workings of ZFS just to see and understand the benefits. Add refquotas and refreservations (which may not matter here; IIRC block cloning doesn’t work with zvols, so we need to wait for fast dedup?) and other things, and it gets even more complicated for folks to understand.

Thanks for your help :slight_smile:

Also, to be fair, this isn’t unique to block cloning. Same same, but different with dedup, which reports very similarly.

Yeah, I get it. It is hard, though, because “actually using” means different things depending on how you look at it, as the existing mix of used/referenced × logical/physical × usedby × quotas already suggests. Adding pool-level services complicates it further: how do you do per-dataset accounting for clones shared across datasets? What if the source is a snapshot? Or for dedup, as you say?

That said! None of this is necessarily hard to do; maybe it’s just that no one has thought of it yet (it certainly hadn’t occurred to me that per-dataset clone/dedup accounting might be useful). If there’s a way we could present this info, or even just a tedious bit of math that ZFS could do for you, it might be pretty straightforward to add. Please do open a feature request (or we can just play with it here).
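One candidate for that “tedious bit of math”, sketched here as a plain calculation (ZFS doesn’t expose this as a property today, so this is purely illustrative): answering “what would this pool use without cloning?” by adding bclonesaved back onto allocated.

```python
# Hypothetical helper, not a ZFS property: how much the pool would use
# if every clone were a full physical copy.

def without_cloning(allocated: int, bclonesaved: int) -> int:
    return allocated + bclonesaved

K = 1024
# tank demo numbers from above: allocated = 698K, bclonesaved = 1.50M (1536K)
print(without_cloning(698 * K, 1536 * K) // K)  # 2234
```

So the tank demo would need roughly 2.2M without cloning, against 698K actually allocated, which is the kind of before/after number a quality-of-life report could surface.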

(Both block cloning and dedup are front-of-mind for me at the moment, so it’s a good time to snipe me into small quality-of-life improvements :smiley: )