Yesterday I got a message after a scrub, ZFS Message ID: ZFS-8000-8A, saying that a permanent error was found in one of my ZVOLs. The permanent error was shown as `<nagato/vms/lab-client1@0x1>` by `zpool status -v nagato`, and my question is: what does the "0x1" in the brackets signify? A cursory Google search tells me that's supposed to be a block address of the error (relative to the zvol? the zpool?), but after playing around I found that the actual I/O errors occurred ~20GiB worth of LBAs into the zvol, so I'm at a loss.
I say this because I went to `dd` it out to cold storage with `conv=noerror` before deleting it, and it definitely had no problem reading the early block(s) off the zvol. My kernel log shows this error from that `dd` session: `Buffer I/O error on dev zd16, logical block 4431390, async page read`, which is a great deal larger than `0x1`. Hence my confusion.
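For anyone attempting the same rescue later, the command I mean is roughly this shape (paths are illustrative; `noerror` keeps `dd` going past read failures, and adding `sync` pads unreadable input blocks with zeros so everything after them stays at the right offset):

```
# Illustrative only: image out a damaged zvol, continuing past read errors.
# A smaller bs limits how much good data gets zeroed around each bad sector.
dd if=/dev/zvol/nagato/vms/lab-client1 \
   of=/mnt/coldstorage/lab-client1.img \
   bs=64K conv=noerror,sync status=progress
```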
Apologies for being vague; the offending zvol has been destroyed and restored from backup, so I don't have the exact error on hand anymore. For those curious, this system, on battery backup, weathered a pretty unremarkable thunderstorm. Apparently that was enough for all 6 disks in this system to show a handful of checksum errors each after scrubs. Thankfully this seems to be the only permanent error in these pools. (FYI: the same storm also killed a drive in another server, a few miles away, so hard that HP's RAID controller wouldn't even POST! Of course, due to the lack of true checksumming, the controller trying to work around that URE happened to nuke the Windows registry with the "reconstructed" block. Fun stuff. I guess I underestimated that particular storm!)
I would have used zdb(8), although IIRC I've only ever used it for metadata, never for data.
See, for example, https://github.com/openzfs/zfs/discussions/10479#discussioncomment-232850.
So if I run this command on a similar zvol with `0x1`, it seems to show me a sort of summary of the whole volume:
[root@nagato ~]# zdb -dd nagato/vms/lab-dc1 0x1
Dataset nagato/vms/lab-dc1 [ZVOL], ID 7228, cr_txg 5334, 25.2G, 2 objects
Object lvl iblk dblk dsize dnsize lsize %full type
1 4 128K 8K 25.2G 512 128G 23.39 zvol object
What's interesting is even with "more -d's" I don't see any indirect blocks or other objects pointing me towards individual blocks belonging to the zvol; I guess I'll have to learn a little bit more about how zvols are stored on-disk. (My intuition was that it would allocate a `volblocksize` record at a time and point me to corruption in individual records; these datasets were all created with the `-s` sparse flag, so there is no reservation or anything.)
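For what it's worth, the sparse part is easy to confirm from the properties; a zvol created with `-s` shows `refreservation` as `none`, e.g.:

```
# A sparse (-s) zvol carries no refreservation, so only written blocks count against the pool
zfs get volsize,volblocksize,refreservation,used nagato/vms/lab-dc1
```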
P.S.: it turns out I had a DIMM fail after the storm! The checksum errors started coming back, so I started testing the hardware. I'm quite happy with the fact that ZFS managed to keep the large majority of my data safe from the failing hardware. Thankfully this dataset was non-critical, otherwise I would have spent more time digging into it. I kind of figured there had to be a common denominator, since getting checksum errors on three different interfaces (SAS HBA, on-board SATA, and NVMe) seemed awfully suspicious!
Mounted where?
(Is that a nonsensical question? I've never worked with a ZFS volume.)
They show up as block devices, so for instance in `lsblk`:
zd0 230:0 0 128G 0 disk
├─zd0p1 230:1 0 549M 0 part
└─zd0p2 230:2 0 127.5G 0 part
zd16 230:16 0 128G 0 disk
├─zd16p1 230:17 0 100M 0 part
├─zd16p2 230:18 0 16M 0 part
├─zd16p3 230:19 0 127.3G 0 part
└─zd16p4 230:20 0 604M 0 part
zd32 230:32 0 512G 0 disk
├─zd32p1 230:33 0 16M 0 part
└─zd32p2 230:34 0 381G 0 part
zd48 230:48 0 128G 0 disk
├─zd48p1 230:49 0 529M 0 part
├─zd48p2 230:50 0 99M 0 part
├─zd48p3 230:51 0 16M 0 part
└─zd48p4 230:52 0 127.4G 0 part
However, I usually access them by name. (On Linux there are symlinks; not sure about other platforms.) So this one would be accessible at `/dev/zvol/nagato/vms/lab-dc1-*`:
[root@nagato vms]# pwd
/dev/zvol/nagato/vms
[root@nagato vms]# ls | grep lab-dc1
lab-dc1
lab-dc1-part1
lab-dc1-part2
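Which also means ordinary block-device tools can be pointed at those paths; purely as an illustration:

```
# The /dev/zvol entries are just symlinks to the corresponding zdNN device nodes
ls -l /dev/zvol/nagato/vms/lab-dc1
# so normal tools work on them, e.g. identifying the filesystem on a guest partition:
blkid /dev/zvol/nagato/vms/lab-dc1-part2
```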
So the interesting thing, and part of my confusion about why the error didn't point me to a specific block, is that I don't believe all the blocks are allocated when the zvol is created. For example, my VM (KVM with virtio-scsi for storage) can pass through TRIM commands from the guest filesystem. Those deallocated blocks go back to the "available" space in the pool. (Unless they're referenced by snapshots, of course.)
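You can watch that happen from the host, roughly like this (assuming the virtual disk is exposed with discard/unmap enabled):

```
# Before: how much of the sparse zvol is actually allocated
zfs get volsize,used nagato/vms/lab-dc1

# Inside the guest: issue TRIM for all mounted filesystems
fstrim -av

# After: 'used' should shrink once the freed blocks are released
# (unless snapshots still reference them)
zfs get used nagato/vms/lab-dc1
```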
The output is basically `[objsetid]:[objectid]` (pre the updated error log format; I don't know what that looks like yet).
They both get pretty-printed if ZFS knows what they are.
Object 1 for a zvol is, well, the whole thing. (I mean, not quite, but you get the point.)
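When it can't pretty-print them you just get the raw hex IDs; in that case the objset half can be mapped back to a dataset name by dumping the pool's dataset summaries, along these lines (a sketch, using the ID from your earlier `zdb -dd` output):

```
# zpool status reports the objset ID in hex, while zdb's per-dataset summary
# lines show it in decimal (e.g. "ID 7228" above), so convert first and grep:
zdb -d nagato | grep "ID $((0x1c3c)),"
```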
Dataset monolith/sorted/backups/dropbox_is_dumb [ZVOL], ID 210252, cr_txg 714311371, 26.0G, 2 objects, rootbp DVA[0]=<5:4423dbb000:1000> DVA[1]=<5:9e748e4000:1000> [L0 DMU objset] skein uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=727750413L/727750413P fill=2 cksum=0f3bb1bb16a0edf7:179a4577a7d88a59:737f5ab74d6514ce:b750f0c895488b15
Object lvl iblk dblk dsize dnsize lsize %full type
1 4 128K 8K 26.0G 512 58.0G 48.72 zvol object (K=inherit) (Z=inherit=zstd-3)
dnode flags: USED_BYTES
dnode maxblkid: 7602176
Indirect blocks:
0 L3 DVA[0]=<5:4ac8d94000:1000> DVA[1]=<5:12f1d26000:1000> [L3 zvol object] skein lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=727750413L/727750413P fill=3704046 cksum=3caf5b472324ef40:46095f4c79678011:4dee96fd3dbdc7d5:5e281ebf5c02b6e7
0 L2 DVA[0]=<5:4ac8d87000:d000> DVA[1]=<5:12f1d19000:d000> [L2 zvol object] skein lz4 unencrypted LE contiguous unique double size=20000L/d000P birth=727750413L/727750413P fill=975375 cksum=8785c27702a7183c:38dbeab367cf4044:306cb26d746afd6e:1dc5e51d1370c777
0 L1 DVA[0]=<5:4a05124000:2000> DVA[1]=<5:1221e28000:2000> [L1 zvol object] skein lz4 unencrypted LE contiguous unique double size=20000L/2000P birth=727750413L/727750413P fill=538 cksum=cedcd928909397e3:ec08ef7cbeb0eced:cfa7577f076d0b41:d8a5900c5d5376d3
0 L0 DVA[0]=<0:3513818000:1000> [L0 zvol object] skein zstd unencrypted LE contiguous unique single size=2000L/1000P birth=727750413L/727750413P fill=1 cksum=476042498532bb85:65397f839566536d:33462e9bcb88e398:c9e7ea4d489edd74
2000 L0 DVA[0]=<4:ecf27737000:1000> [L0 zvol object] skein zstd unencrypted LE contiguous unique single size=2000L/1000P birth=727746852L/727746852P fill=1 cksum=da8ddef8e696296c:3adc17fea6459554:8aca8c79a3791b22:d5c39efe62117efe
4000 L0 DVA[0]=<0:fbf28825000:1000> [L0 zvol object] skein zstd unencrypted LE contiguous unique single size=2000L/1000P birth=727730394L/727730394P fill=1 cksum=502e8b3590fe68e8:331e0aaff6bee6d4:74844113b225f605:29d3f31178d8bd85
6000 L0 DVA[0]=<0:f38971a9000:1000> [L0 zvol object] skein zstd unencrypted LE contiguous unique single size=2000L/1000P birth=725205943L/725205943P fill=1 cksum=f4a7d9a66c956910:8cdc033680ceaf3e:74ff13e1d0f932b1:5aff330d9cdc5ca1
8000 L0 DVA[0]=<0:5f0b1256000:1000> [L0 zvol object] skein zstd unencrypted LE contiguous unique single size=2000L/1000P birth=714384624L/714384624P fill=1 cksum=b1ce0faa22a0ac90:3bf5c7036deefefb:44566856c2875826:962e44ab32f09028
a000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
c000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/4fP birth=714311418L
e000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
10000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
12000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
14000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
16000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
18000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/50P birth=714311418L
1a000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
1c000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
1e000 L0 EMBEDDED [L0 zvol object] et=0 zstd size=2000L/52P birth=714311418L
(etc).
Even with just enough `-d`, it should print some subset of that information for you, but that came from `zdb -dddddd -bbbbbb pool/zvol 1`.
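The left-hand column in that dump is the hex byte offset of each record within the object, stepping by the 8K volblocksize, so in principle you can turn a kernel-reported block number into the L0 entry to look for. A back-of-the-envelope sketch, assuming the kernel's "logical block" count is in 4K units:

```
# Assumption: the kernel's "logical block 4431390" is counted in 4K units
offset_bytes=$(( 4431390 * 4096 ))              # byte offset into the zvol
record_offset=$(( offset_bytes / 8192 * 8192 )) # round down to the 8K volblocksize
printf 'look for the L0 entry at offset %x\n' "$record_offset"
```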
(And for anyone curious about the name, it's from when Dropbox briefly tried to enforce ext4-only support for its Linux clients.)
I'm pretty sure with the new error log feature it can be more precise about where an error is in `zpool status`, but I don't actually remember what that looks like…