How to get a count of the different record sizes on a zpool

Does anyone know how to get a count of how many records on a volume are 1M vs 128K vs 64K, etc.? It would be nice to know, to help calculate dedup efficiency.

Try zdb -DDD zpool

This is likely to take a long time depending on the size of your pool.

You can start with a single D and then go all the way to (I think) 5 Ds.

I too had this problem.

I received a data stream with a 1M record size, but due to the delegated permissions of the receiving user (I can't use root over SSH) the receive was unable to set the record size to 1M on the new dataset and fell back to the default. My guess was that the data transferred at the 1M size, but the property failed to apply because of the permissions issue. I went off to research how to find the record sizes actually in use, but with limited time it felt too difficult.
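As an aside: zfs get recordsize <dataset> will show what the property ended up as after the receive, though note that only reflects the setting for new writes, not the sizes of blocks already on disk, which is what I actually needed to find.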

zdb — display ZFS storage pool debugging and consistency information

And the D flags:

-DD
Display a histogram of deduplication statistics, showing the allocated (physically present on disk) and referenced (logically referenced in the pool) block counts and sizes by reference count.
-DDD
Display the statistics independently for each deduplication table.
-DDDD
Dump the contents of the deduplication tables describing duplicate blocks.
-DDDDD
Also dump the contents of the deduplication tables describing unique blocks.

Source

The zdb -DD <pool> command doesn’t have the info I’m after:

> zdb -DD poolb
DDT-sha256-zap-duplicate: 477751 entries, size 449 on disk, 145 in core
DDT-sha256-zap-unique: 1106910 entries, size 461 on disk, 148 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.06M    987G    932G    932G    1.06M    987G    932G    932G
     2     355K    316G    276G    276G     776K    689G    599G    599G
     4    95.8K   74.4G   66.0G   66.0G     468K    355G    313G    313G
     8    13.0K   9.33G   8.00G   8.01G     126K   88.9G   76.4G   76.5G
    16    2.35K    611M    472M    474M    45.5K   11.7G   8.91G   8.95G
    32      161   50.3M   36.4M   36.5M    6.62K   2.17G   1.55G   1.55G
    64       25   5.36M   2.73M   2.77M    1.97K    369M    191M    194M
   128        4    988K    178K    188K      725    145M   26.4M   28.3M
   256        6   2.02M   2.02M   2.03M    2.15K    663M    663M    666M
   512        1    512B    512B      4K      526    263K    263K   2.05M
    1K        2      1K      1K      8K    2.10K   1.05M   1.05M   8.38M
    2K        1    512B    512B      4K    2.89K   1.44M   1.44M   11.6M
 Total    1.51M   1.36T   1.25T   1.25T    2.45M   2.09T   1.89T   1.89T

dedup = 1.51, compress = 1.11, copies = 1.00, dedup * compress / copies = 1.67

It’s good info on how many blocks are dedup’d, but doesn’t tell me anything about the size of the blocks.
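(If I'm reading the Total row right, that's where the summary ratios come from: dedup = referenced / allocated PSIZE = 1.89T / 1.25T ≈ 1.51, compress = referenced LSIZE / PSIZE = 2.09T / 1.89T ≈ 1.11, copies = DSIZE / PSIZE = 1.89T / 1.89T = 1.00, and 1.51 × 1.11 / 1.00 ≈ 1.67.)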

OK, I wrote a little script to decode the lines that 4 D's or 5 D's output, lines like this:

index 102a4a57d9721 refcnt 2 single DVA[0]=<0:74ade79000:100000> [L0 deduplicated block] sha256 uncompressed unencrypted LE contiguous dedup single size=100000L/100000P birth=26733L/26733P fill=1 cksum=2a4a57d97213bec:fa87b2e2d221d020:e76669454a546032:8b9662771d79f00d

In this line I believe the size is represented as size=<before compression in hex>L/<after compression in hex>P (L for logical size, P for physical size), this example block being 0x100000 bytes (aka 1 MiB) in size, both before and after, as it's uncompressed.
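You can sanity-check that hex value in Python:

>>> int('100000', 16)
1048576

which is exactly 1 MiB.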

My understanding is that 4 D's prints one line for each duplicated block in the dedup table, and 5 D's adds a line for each unique block as well, but I'm not actually 100% sure of this, so don't take my word for it. However, the output for my mostly-deduped zpool with 5 D's is (trimmed for length):

b'a8200L'       8
b'df600L'       8
b'bb000L'       8
b'fda00L'       9
b'dbe00L'       9
b'a0800L'       9
...
b'1000L'        2609
b'e00L'         2719
b'c00L'         3543
b'800L'         4114
b'a00L'         4470
b'600L'         5158
b'400L'         6046
b'200L'         11254
b'100000L'      1404995

As you can see, 0x100000 (1MiB) blocks are by far the most common, with 1.4 million of them, which is what I would expect as this pool is mostly pictures and videos. Next most common are all the small block sizes, like 0x200 (512 bytes), 0x400 (1024 bytes), etc.

Script is:

#!/usr/bin/env python3
import subprocess, re, collections
# Example of the kind of line being matched:
# index 102ca703768b3 refcnt 3 single DVA[0]=<0:6002d00000:100000> [L0 deduplicated block] sha256 uncompressed unencrypted LE contiguous dedup single size=100000L/100000P birth=956L/956P fill=1 cksum=2ca703768b310af:bd95efb760577eb1:9fbd3ffae69d2b0e:d18504de45e66b20
# Stream the full DDT dump and tally blocks by their logical (L) size.
p = subprocess.Popen(['zdb', '-DDDDD', 'poolb'], stdout=subprocess.PIPE)
c = collections.Counter()
for i, line in enumerate(p.stdout):
    #if i > 1000: break  # uncomment to test against a small sample first
    # Capture the two halves of the size=<L>/<P> field.
    if m := re.match(rb'^index.*size=([^/]*)/(\S*).*', line):
        c[m[1]] += 1  # keys are raw bytes, e.g. b'100000L'
# Print sizes sorted by count, least common first.
for k, v in sorted(c.items(), key=(lambda i: i[1])):
    print(f'{str(k):8}\t{v}')
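A couple of notes: zdb will generally need to run as root (it reads the pool devices directly), and the counter keys are raw bytes, hence the b'…' prefixes in the output above. If you'd rather sort by block size than by count, swapping the key for lambda i: int(i[0][:-1], 16) should work (untested); it strips the trailing L and parses the remaining hex.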

Hmmm… my bad (I think). I suspect I got "side-tracked" by your mention of dedup and jumped straight into the dedup tables and their dump (-DD…).

Try zdb -Lbbbs zpool (I just ran it quite quickly against a pretty empty pool) and it does give you the block size histogram, which is probably exactly what you need.

With a little scripting you should be able to derive whatever you need from that dump; see the sketch below.
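Something like this might do as a starting point: a minimal sketch, assuming the section is headed "Block Size Histogram" (as in the output further down) and that your pool is called poolb:

#!/usr/bin/env python3
# Minimal sketch: pull the "Block Size Histogram" table out of
# `zdb -Lbbbs` output. The pool name and exact heading text are
# assumptions; adjust for your pool and zdb version.
import subprocess

out = subprocess.run(['zdb', '-Lbbbs', 'poolb'],
                     capture_output=True, text=True, check=True).stdout

in_section = False   # are we inside the histogram section yet?
seen_rows = False    # have we seen a data row (e.g. "  512: ...")?
for line in out.splitlines():
    if 'Block Size Histogram' in line:
        in_section = True
    if not in_section:
        continue
    if ':' in line:
        seen_rows = True
    elif seen_rows and not line.strip():
        break        # first blank line after the data rows ends the table
    print(line)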

Also, just as an aside, this will/should(?) also help you estimate/calculate the metadata usage in the pool in case you want to use a special vdev.

I hope this time it really helps you.


That has worked great, thanks!

The block size histogram has exactly the info I was after.

Block Size Histogram

  block   psize                lsize                asize
   size   Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:  44.2K  22.1M  22.1M  44.2K  22.1M  22.1M      0      0      0
     1K:  34.2K  41.3M  63.3M  34.2K  41.3M  63.3M      0      0      0
     2K:  34.1K  91.6M   155M  34.1K  91.6M   155M      0      0      0
     4K:   418K  1.66G  1.81G  98.7K   451M   606M   132K   529M   529M
     8K:  82.6K   840M  2.63G  46.0K   521M  1.10G   464K  4.01G  4.53G
    16K:  54.8K  1.17G  3.80G   127K  2.29G  3.39G  66.9K  1.33G  5.86G
    32K:  53.9K  2.40G  6.20G  42.0K  1.88G  5.27G  56.0K  2.43G  8.28G
    64K:  64.0K  5.75G  12.0G  42.5K  3.82G  9.09G  65.7K  5.85G  14.1G
   128K:  88.3K  16.2G  28.2G   302K  40.3G  49.4G  89.0K  16.3G  30.4G
   256K:   165K  61.4G  89.5G  56.7K  21.1G  70.5G   165K  61.5G  91.9G
   512K:   194K   133G   223G  49.1K  35.5G   106G   194K   133G   225G
     1M:  1.69M  1.69T  1.91T  2.04M  2.04T  2.14T  1.69M  1.69T  1.91T
     2M:      0      0  1.91T      0      0  2.14T      0      0  1.91T
     4M:      0      0  1.91T      0      0  2.14T      0      0  1.91T
     8M:      0      0  1.91T      0      0  2.14T      0      0  1.91T
    16M:      0      0  1.91T      0      0  2.14T      0      0  1.91T

As expected, most of my blocks (2.04M before dedup, 1.69M after) are 1M in size, with a scattering of other sizes.