My questions:
- Does the error below suggest a duplicate entry in the space map, or a duplicate free range allocation? (I have only a basic knowledge of ZFS internals.)
- Did using zfs send/recv from one pool to another pool copy the duplicate allocation? (Is this because send/recv copies at the block level?)
- Now that I can import the pool after setting some ZFS tunables, will ZFS’ condensing operations ‘fix’ this duplication?
Background:
After a day of power outages (guess who has now set up a UPS), my server would panic whenever it tried to import one of my pools.
panic: Solaris(panic) zfs: adding existent segment to range tree (offset=2d8035c000 size=1000)
<and more of a backtrace>
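To check my own understanding of the message, here is a toy model. This is a simplification I wrote, not the OpenZFS range tree code (`ToyRangeTree` and its methods are hypothetical names). A range tree holds non-overlapping [start, end) segments, and the two messages come from it refusing to add a segment that overlaps one it already holds, or to remove a segment it does not hold. My assumption is that when a metaslab is loaded, its space map records are replayed into a free-space tree like this. If so, a range recorded as freed twice would hit the first check, and a range recorded as allocated while it is not free would hit the second.

```python
class ToyRangeTree:
    """Hypothetical toy model of a ZFS range tree: a sorted list of
    non-overlapping, maximal [start, end) segments. Not OpenZFS code."""

    def __init__(self):
        self.segs = []  # sorted (start, end) pairs

    def add(self, start, size):
        end = start + size
        # refuse any overlap with a segment already in the tree
        if any(s < end and start < e for s, e in self.segs):
            raise RuntimeError("adding existent segment to range tree "
                               f"(offset={start:x} size={size:x})")
        # merge with neighbours that touch the new segment
        for s, e in list(self.segs):
            if e == start or s == end:
                self.segs.remove((s, e))
                start, end = min(start, s), max(end, e)
        self.segs.append((start, end))
        self.segs.sort()

    def remove(self, start, size):
        end = start + size
        # the removed range must lie entirely inside one existing segment
        for s, e in self.segs:
            if s <= start and end <= e:
                self.segs.remove((s, e))
                self.segs += [p for p in ((s, start), (end, e)) if p[0] < p[1]]
                self.segs.sort()
                return
        raise RuntimeError("removing nonexistent segment from range tree "
                           f"(offset={start:x} size={size:x})")


free = ToyRangeTree()
free.add(0x2d80000000, 0x80000000)  # hypothetical 2 GiB fully free extent
free.remove(0x2d8035c000, 0x1000)   # allocate one 4 KiB block
free.add(0x2d8035c000, 0x1000)      # free it once: fine
try:
    free.add(0x2d8035c000, 0x1000)  # free it a second time: overlap
except RuntimeError as err:
    print(err)  # adding existent segment to range tree (offset=2d8035c000 size=1000)
```

In this model, the panic above corresponds to the second free. The zdb WARNING further down corresponds to `remove` on a range that is not in the tree.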
Six passes of Memtest showed that my memory was fine, so I assumed that the power outages had somehow caused a data error.
I was eventually able to import the pool with readonly=on and use zfs send/recv to re-create the pool on two new disks.
This new pool (new disks, connected to the motherboard SATA instead of the HBA) worked for a day and then started showing the same error.
I was able to import the pool after setting these sysctl options:
vfs.zfs.spa.load_verify_data=0
vfs.zfs.spa.load_verify_metadata=0
vfs.zfs.recover=1
vfs.zfs.zil.replay_disable=1
I could then run a scrub, which returned no errors, but I came across a similar error when I ran zdb:
sudo zdb -AAA -b FastPool
Password:
Traversing all blocks to verify nothing leaked ...
loading concrete vdev 0, metaslab 91 of 116 ...WARNING: zfs: removing nonexistent segment from range tree (offset=2d8035c000 size=1000)
loading concrete vdev 0, metaslab 115 of 116 ...
96.8G completed (6760MB/s) estimated time remaining: 0hr 00min 00sec leaked space: vdev 0, offset 0x2d8035e000, size 4096
No leaks (block sum matches space maps exactly)
bp count: 4010548
ganged count: 0
bp logical: 154767922688 avg: 38590
bp physical: 103859914752 avg: 25896 compression: 1.49
bp allocated: 108874661888 avg: 27147 compression: 1.42
bp deduped: 0 ref>1: 0 deduplication: 1.00
bp cloned: 0 count: 0
Normal class: 108874633216 used: 44.09%
Embedded log class 12288 used: 0.00%
additional, non-pointer bps of type 0: 327594
Dittoed blocks on same vdev: 390548
Dittoed blocks in same metaslab: 2
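The offsets in the panic and in the zdb output seem to be close together. Below is a quick calculation of where they fall. It assumes 2 GiB metaslabs (an `ms_shift` of 31), which I have not confirmed for this pool. The assumption is at least consistent with zdb printing the WARNING while it was loading metaslab 91.

```python
# Assumption: 2 GiB metaslabs (ms_shift = 31); not confirmed for this pool.
MS_SHIFT = 31

panic_offset = 0x2d8035c000  # offset in the panic and in the zdb WARNING
leak_offset = 0x2d8035e000   # offset in zdb's "leaked space" line

for name, off in [("duplicate segment", panic_offset), ("leaked segment", leak_offset)]:
    ms = off >> MS_SHIFT                     # metaslab index
    within = off & ((1 << MS_SHIFT) - 1)     # offset inside that metaslab
    print(f"{name}: metaslab {ms}, offset {within:#x} into it")

print("gap:", leak_offset - panic_offset, "bytes")
```

Under that assumption, both 4 KiB ranges sit in metaslab 91, 8192 bytes apart.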