I have a disk that is used inside an incus (formerly LXD) VM (QEMU). I can no longer get the VM to boot properly; it drops me into an initramfs. I can run fsck -y /dev/sda2, and it completes and reports that it found things to fix, but then it errors out saying it cannot write:
11 ref count is 156, should be 135. Fix? yes
Inode 1584842 ref count is 69, should be 56. Fix? yes
Pass 5: Checking group summary information
Free blocks count wrong (11599040, counted=11602313).
Fix? yes
Free inodes count wrong (6998947, counted=7000323).
Fix? yes
Error writing file system info: Input/output error
rootfs: ***** FILE SYSTEM WAS MODIFIED *****
(initramfs)
Are there ways I can approach fixing this drive outside of incus?
I can see my pool, and it was in a degraded state, but I ran zpool scrub and then zpool clear. Now zpool list indicates things are “good”:
xrd@biggpu:~$ zpool list
NAME            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
default2        186G  64.0G   122G        -      372G    62%    34%  1.00x  ONLINE  -
incus-default  29.5G   904K  29.5G        -         -     0%     0%  1.00x  ONLINE  -
However, the disk itself isn’t fixed, which isn’t surprising.
Are there steps I can use to get at this disk outside of incus/lxc?
Could I copy the disk onto another disk and then try to repair that copy? I saw some other discussions where this error indicated a full disk, but I don’t see that here, and I don’t know how to enlarge that specific disk inside the zpool anyway.
xrd@biggpu:~$ sudo zpool status default2 -v
  pool: default2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:56:15 with 133 errors on Sun Dec 1 00:13:09 2024
config:

        NAME                                                                  STATE  READ WRITE CKSUM
        default2                                                              ONLINE    0     0     0
          /media/xrd/734204/737204/var/snap/lxd/common/lxd/disks/default.img  ONLINE    0     0 2.40K

errors: Permanent errors have been detected in the following files:

        <0xfc0d>:<0x1>
fsck is only for ext filesystems, not for ZFS. ZFS isn’t vulnerable to the kind of inconsistency that fsck can detect and fix. I don’t know anything about your distro, so I can’t help further there. You could maybe try booting from a thumb drive and running fsck against your OS from there?
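Another way to get at the guest filesystem without booting the VM at all, and without a thumb drive: attach a copy of its disk to the host. This is a sketch, not tested against your setup — the paths are placeholders, it assumes the VM disk is a raw or qcow2 file (true for directory-backed incus pools), and it requires root plus the nbd kernel module:

```shell
# Hypothetical paths -- adjust to wherever your VM's backing image actually lives.
SRC=/path/to/your/vm-disk.img
WORK=/mnt/scratch/vm-disk-copy.img

# Work on a copy so a failed repair can't make the original worse.
cp --sparse=always "$SRC" "$WORK"

# Expose the copy as a block device (requires root and the nbd module).
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 "$WORK"

# The guest's partitions appear as /dev/nbd0p1, /dev/nbd0p2, ...
sudo fsck -fy /dev/nbd0p2

# Detach when done.
sudo qemu-nbd --disconnect /dev/nbd0
```

If the incus pool is ZFS-backed instead, the VM disk is a zvol rather than a file; in that case the partitions should already show up on the host under /dev/zvol/&lt;pool&gt;/... and you can run fsck on those directly (still safest against a copy).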
Permanent corruption detected in zpool status means exactly what it says. If you don’t have a backup to restore from, you’re stuck with whatever you can copy off of the pool as-is.
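For the “copy off as-is” step, one cautious approach is to re-import the pool read-only first, so nothing else gets written while you salvage. A sketch — the command names are standard OpenZFS, but the mountpoint and backup destination are assumptions:

```shell
# Assumes the pool's datasets mount under /default2 -- check with
# zfs get mountpoint before relying on this path.
sudo zpool export default2
sudo zpool import -o readonly=on default2

# Copy what you can reach. Reads that hit the corrupt blocks will fail,
# so log the failures and keep going instead of aborting on first error.
sudo cp -a /default2/. /mnt/backup/ 2>copy-errors.log || true
```

Anything listed in copy-errors.log afterwards is data the pool could not read back cleanly.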
Yeah, a hex address rather than a file name means it’s either a metadata block or possibly a block in a zvol. I’m not sure about the latter; I don’t use zvols in production, so I haven’t had much chance to observe their behavior in rare corruption events.
Not sure what I did, but I was able to run fsck again and the drive came up. Thanks for your help. I backed up my data, but it seems to be running fine now. Very strange.