root@bigbertha[~]# zpool get bcloneused,bclonesaved,bcloneratio nvme
NAME PROPERTY VALUE SOURCE
nvme bcloneused 0 -
nvme bclonesaved 0 -
nvme bcloneratio 1.00x -
root@bigbertha[~]# zpool get bcloneused,bclonesaved,bcloneratio nvme
NAME PROPERTY VALUE SOURCE
nvme bcloneused 96.6G -
nvme bclonesaved 96.6G -
nvme bcloneratio 2.00x -
So I moved some files over to this dataset and copied them through an SMB share, so there are two copies. Great! I can see that the bcloneused and bclonesaved values have updated.
The dataset properties, on the other hand, report values that don’t take block cloning into account:
root@bigbertha[~]# zfs list -o name,used,avail,refreserv,usedsnap,usedds,usedrefreserv,usedchild /mnt/nvme/cloner
NAME USED AVAIL REFRESERV USEDSNAP USEDDS USEDREFRESERV USEDCHILD
nvme/cloner 193G 2.01T none 0B 96K 0B 193G
Is the actual used space calculated as used - bcloneused?
Is there a property (or anything else) that reports how much space is actually used once savings from block cloning, dedup, or both are taken into account? I can imagine this getting pretty complicated to calculate as the variables stack up…
You’re mixing a couple of things here. bcloneused and bclonesaved are pool-wide stats. You want to compare them to allocated to get a sense of the total savings.
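If it helps to see the raw relationship, here’s a minimal sketch (assuming your pool name; -Hp just makes the output script-friendly, with exact byte values):

zpool get -Hp -o property,value allocated,bcloneused,bclonesaved,bcloneratio nvme
# allocated   = what the pool has physically allocated right now
# bclonesaved = what the extra clone copies would otherwise have taken
# so without block cloning the pool would be using roughly allocated + bclonesaved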
used, referenced, and so on are dataset properties, and they show the “apparent” usage charged to the dataset. This is what’s used for quota calculations etc., which (I guess) is why each clone is counted: if someone makes multiple copies of a file, it doesn’t matter how they’re stored underneath; they don’t magically get more space just because a clone happened.
(I could make the argument in the other direction, especially if explicit cloning is requested. I wonder if we’ll see requests for an option like that in the future.)
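Your numbers above actually show this: nvme/cloner is charged for both copies (used = 193G), while the pool is physically holding only one (bcloneused = bclonesaved = 96.6G). A quick way to line the two views up, as a sketch:

# apparent usage charged to the dataset (what quotas see)
zfs get -Hp -o value used nvme/cloner
# physical space the cloned blocks occupy, and what cloning saved, pool-wide
zpool get -Hp -o value bcloneused,bclonesaved nvme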
Anyway, the pool clone stats are more understandable once you clone a file a couple of times:
# dd if=/dev/random of=/tank/file bs=128k count=4
4+0 records in
4+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00164617 s, 318 MB/s
# zpool get allocated,bcloneused,bclonesaved,bcloneratio tank
NAME PROPERTY VALUE SOURCE
tank allocated 630K -
tank bcloneused 0 -
tank bclonesaved 0 -
tank bcloneratio 1.00x -
# clonefile -c /tank/file /tank/clone1
using FICLONE
file offsets: src=0/524288; dst=0/524288
# clonefile -c /tank/file /tank/clone2
using FICLONE
file offsets: src=0/524288; dst=0/524288
# clonefile -c /tank/file /tank/clone3
using FICLONE
file offsets: src=0/524288; dst=0/524288
# zpool get allocated,bcloneused,bclonesaved,bcloneratio tank
NAME PROPERTY VALUE SOURCE
tank allocated 698K -
tank bcloneused 512K -
tank bclonesaved 1.50M -
tank bcloneratio 4.00x -
So, four “copies” of the file. Only one copy truly exists (bcloneused=512K); the other three are clones, so we saved 3*512K (bclonesaved=1.50M).
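If you want the ratio math spelled out, here’s a small sketch (same numbers, pulled as exact bytes; just arithmetic, not a real interface, and it assumes bcloneused is non-zero):

used=$(zpool get -Hp -o value bcloneused tank)
saved=$(zpool get -Hp -o value bclonesaved tank)
# bcloneratio == (bcloneused + bclonesaved) / bcloneused
# here: (512K + 1536K) / 512K = 4.00x
awk -v u="$used" -v s="$saved" 'BEGIN { printf "%.2fx\n", (u + s) / u }'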
zdb -b will show cloned counts alongside everything else too.
But mostly I think you don’t need to care too much. The existing numbers are still “right”: the actual amount of pool space used, and the apparent space used by each dataset.
I do understand you here. I would say, however, that from a user perspective, knowing the amount of space any individual dataset is “actually using” helps drive decision-making.
From your explanation here, it seems that may not be possible, which is fine… it’s just disappointing. This is such a cool feature, and it’s unfortunate that you have to do a bunch of math and understand the inner workings of ZFS just to see and understand the benefits. Add refquotas and refreservations into the mix (which may not matter here; IIRC block cloning doesn’t work with zvols, so we need to wait for fast dedup?) and other things, and it gets even more complicated for folks to understand.
Thanks for your help
Also, to be fair, this is not unique to block cloning. It’s same-same-but-different with dedup, which reports very similarly.
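For example, the analogous pool-wide view for dedup (illustrative only, with tank as the pool name):

zpool get -o property,value allocated,dedupratio tank
# dedupratio, like bcloneratio, is a pool-wide figure, not something charged per dataset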
Yeah, I get it. It is hard, though, because “actually using” means different things depending on how you look at it, which the mix of used/referenced x logical/physical x usedby x quotas kinda already points to. Adding pool-level services complicates it further: how do you do per-dataset accounting for clones across datasets? What if the source is a snapshot? Or for dedup, as you say?
That said! None of this is necessarily hard to do; maybe it’s just that no one has thought of it yet (it certainly hadn’t occurred to me that per-dataset clone/dedup accounting might be useful). If there’s a way we could present this info, or even just a tedious bit of math that ZFS could do for you, it might be pretty straightforward to add. Please do open a feature request (or we can just play with it here).
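For the pool-wide version, the tedious math is already small; a rough sketch of what such a helper might look like (hypothetical script, not an existing tool; the per-dataset split is the part that would need new plumbing):

#!/bin/sh
# clone-savings.sh (hypothetical): rough pool-wide block cloning savings report.
# Usage: sh clone-savings.sh <pool>
pool="$1"
alloc=$(zpool get -Hp -o value allocated "$pool")
saved=$(zpool get -Hp -o value bclonesaved "$pool")
# Without block cloning the pool would hold roughly alloc + saved bytes.
awk -v a="$alloc" -v s="$saved" 'BEGIN {
    printf "allocated now:            %d bytes\n", a
    printf "saved by block cloning:   %d bytes\n", s
    printf "would be without cloning: %d bytes (%.2f%% saved)\n", a + s, 100 * s / (a + s)
}'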
(Both block cloning and dedup are front of mind for me at the moment, so it’s a good time to snipe me into small quality-of-life improvements.)