I wrote a small script that groups the files by size. The left column is the first, full snapshot; the right column is the first incremental snapshot, which already caused problems.

| Size bucket (bytes) | autosnap_2024-06-01_00:00:10_monthly | autosnap_2024-07-03_21:41:30_monthly |
|---|---|---|
| ≤ 1 | 18,828 | 19,527 |
| ≤ 2 | 1,185 | 1,185 |
| ≤ 4 | 2,677 | 2,569 |
| ≤ 8 | 3,173 | 3,467 |
| ≤ 16 | 7,333 | 8,108 |
| ≤ 32 | 20,696 | 21,332 |
| ≤ 64 | 85,449 | 87,202 |
| ≤ 128 | 100,749 | 104,293 |
| ≤ 256 | 160,033 | 166,926 |
| ≤ 512 | 234,470 | 234,486 |
| ≤ 1024 | 307,388 | 307,700 |
| ≤ 2048 | 432,154 | 431,440 |
| ≤ 4096 | 345,148 | 336,496 |
| ≤ 8192 | 347,258 | 347,291 |
| ≤ 16384 | 217,849 | 213,929 |
| ≤ 32768 | 155,891 | 154,934 |
| ≤ 65536 | 111,177 | 111,188 |
| ≤ 131072 | 74,892 | 74,321 |
| ≤ 262144 | 45,665 | 45,394 |
| ≤ 524288 | 22,326 | 22,762 |
| ≤ 1048576 | 18,803 | 19,363 |
| ≤ 2097152 | 11,589 | 11,622 |
| ≤ 4194304 | 4,644 | 4,885 |
| ≤ 8388608 | 3,958 | 4,032 |
| ≤ 16777216 | 7,699 | 7,709 |
| ≤ 33554432 | 4,066 | 4,040 |
| ≤ 67108864 | 549 | 543 |
| ≤ 134217728 | 308 | 308 |
| ≤ 268435456 | 135 | 136 |
| ≤ 536870912 | 79 | 80 |
| ≤ 1073741824 | 53 | 49 |
| ≤ 2147483648 | 16 | 16 |
| ≤ 4294967296 | 11 | 11 |
| ≤ 8589934592 | 7 | 7 |
| ≤ 17179869184 | 2 | 2 |
| ≤ 34359738368 | 2 | 2 |
| ≤ 68719476736 | 1 | 1 |

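For anyone who wants to produce a similar breakdown, here is a minimal Python sketch of the idea. It is not the script used for the numbers above, just one way it could be done. It assumes each regular file is counted in the smallest power-of-two bucket that holds its size, that empty files land in the "≤ 1" bucket, and that symlinks and other special files are skipped. The names `bucket_upper_bound`, `size_histogram` and `format_histogram` are made up for this example.

```python
import os
import stat
import sys
from collections import Counter


def bucket_upper_bound(size: int) -> int:
    """Return the smallest power of two >= size (0 and 1 share the '<= 1' bucket)."""
    if size <= 1:
        return 1
    return 1 << (size - 1).bit_length()


def size_histogram(root: str) -> Counter:
    """Count regular files under root, keyed by their power-of-two size bucket."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry, skip it
            if stat.S_ISREG(st.st_mode):
                counts[bucket_upper_bound(st.st_size)] += 1
    return counts


def format_histogram(counts: Counter) -> list:
    """Render one '≤ N bytes: count' line per non-empty bucket, smallest first."""
    return [f"≤ {bound} bytes: {counts[bound]:,}" for bound in sorted(counts)]


if __name__ == "__main__":
    # Each argument is a directory to scan, e.g. a mounted snapshot.
    for root in sys.argv[1:]:
        print(root)
        print("\n".join(format_histogram(size_histogram(root))))
```

To compare two snapshots, it could be pointed at their directories under the dataset's hidden `.zfs/snapshot/` directory, for example a hypothetical `/tank/home/.zfs/snapshot/autosnap_2024-06-01_00:00:10_monthly`. Note that this counts logical file sizes (`st_size`), not the space the blocks actually occupy on disk.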
That seems most likely to be from small files, although if, for instance, you had been experimenting with tiny values of txg_commit_interval and related tunables after the first snapshot was taken, that might also account for it.
I did indeed change the txg_commit_interval on both source (60) and target (30), and knowing me, I also ran some experiments on the source, such as setting the interval to 1 s or maybe even lower. But I can’t find anything in my shell history. Also, if the value was ever very low, it was probably only for a few minutes, up to an hour at most. I’m also pretty sure those experiments were all done before June, which is when the first snapshot was taken.
However, from time to time I experiment with ZFS by creating a few files and building a test pool from them, and sometimes I create those files in my home directory instead of in a special dataset that is not snapshotted. I then do some things you are not supposed to do, just to see what happens, e.g. creating a volume in a test pool and adding that volume to the same pool.
So it is possible that some of those experiments made it into a snapshot. However, I’m pretty sure I did most of them before June, so they shouldn’t be in the snapshot. Also, each of those experiments only lasted about 30-60 minutes.