Zpool throws kernel panics on import, any way to fix it?

I’m running ZFS on Ubuntu 22.04 on several systems, which means:

$ zfs --version
zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.2.2-0ubuntu9.4

I recently started getting kernel panics when importing a zpool. Googling around, I found that I’m not completely alone in this, and I also found, for example, this comment. It’s an old one, but it indicates that the basic operational concept of ZFS is to fail loudly and completely when it detects errors, and to tell the user to recreate the zpool from backups.

Is that still the case (i.e., that zpool re-creation from scratch is the only way), or is there some standard way to try to repair a zpool that apparently has metadata problems (depending on what type of metadata problem, obviously)? (Bad data that is found by a scrub has never caused me problems, but ZFS seems to be lousy at handling bad metadata.)

I also have a specific question: how do I mount an encrypted filesystem from a read-only zpool at a temporary mountpoint? When I import a zpool as readonly, all unencrypted filesystems appear instantly, and I can load encryption keys on the readonly zpool. But when I try to mount the encrypted filesystems, I get “cannot mount '/.../.../...': failed to create mountpoint: Read-only file system”, even when using “zfs mount -o readonly=on -o mountpoint=/tmp/.....”.

Some more details:
One of the systems is a single-board computer with a couple of SATA interfaces: one native, the others on an M.2 PCIe-to-SATA expander board. The zpools on that one recently stopped working completely, first by suspending the pool due to a lot of I/O errors, and then by not booting up at all after a reboot (getting stuck on an “import zpools based on cache” step). I don’t know exactly what went wrong when, but I did replace the SATA controller board because the old one broke. However, that should not have affected the zpool connected directly to the mainboard. According to the kernel log, the problem on import is this:

VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 5)
PANIC at dmu.c:1144:dmu_write()
Showing stack for process 9595
CPU: 0 PID: 9595 Comm: txg_sync Tainted: P O 6.8.0-90-generic #91~22.04.1-Ubuntu
… (and something relating to “space_map_write” → “dmu_write”, which I haven’t looked into)

I can trace that error to here:

First I thought it was the SBC and/or the SATA expander board, but I moved the disk to my main desktop system, and it gave me the same error.

I can, however, import the zpool in readonly mode (the zpool itself must be readonly, not just the filesystems on it; that’s an easy syntax error to make), so the data on it is probably not completely gone.
(But I’m not going to, since this is a backup disk with almost no “original” data on it, and all primary sources for the backups are still up and running. And repairing a zpool, even if possible, seems more risky than recreating it from scratch.)

I also found this comment in a bug report with the exact same error message, but that one was marked as completed for release 0.8.0. I don’t know if it ever was fixed, though; I did not see a commit.

You’ve got mismatched versions of the kernel module and the userspace utilities, and that can cause unpredictable problems. I happen to be running 22.04 on the system I’m replying to you from, and this is what you should see:

me@elden:~$ zfs --version
zfs-2.2.2-0ubuntu9.4
zfs-kmod-2.2.2-0ubuntu9.4

This should be a simple apt update ; apt dist-upgrade fix, as long as the issue isn’t preventing you from getting to your root filesystem itself.

If you can’t get the system to boot, or otherwise get access to the root filesystem, you’ll need to boot into a USB installer’s live desktop, mount your system’s root filesystem and chroot into it, then apt update and apt upgrade to get the newer version of the userspace utilities installed.
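For reference, the rescue path from a live desktop usually amounts to something like this (a dry-run sketch that only prints the commands rather than executing them; /dev/sda2 as the root partition is a placeholder for your actual layout):

```shell
# Dry-run sketch: the commands are printed, not executed.
# /dev/sda2 is a hypothetical root partition -- substitute your own.
rescue_steps() {
cat <<'EOF'
mount /dev/sda2 /mnt
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash
apt update && apt dist-upgrade
EOF
}
rescue_steps
```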

Once you have the userspace utilities and the kernel module at the same version, if you still cannot import the pool, you need to diagnose what is wrong with it.

  1. Use lsblk to check that the disks making up the pool can be seen and have the right layouts.
  2. Use zdb on the disk or zfs partition (as appropriate) to check what the state is for the zfs labels. Check that all disks / partitions in the pool have consistent labels.
  3. Use zdb to see what the TXG number is for each disk / partition in the pool and check whether they are the same.
  4. Try doing a zpool import with read-only set and with the -n and -f/-F flags. The -n flag says NOT to import but rather to try to tell you whether a real import would work or not.
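The steps above, as concrete commands (a dry-run sketch that only prints them; “tank” and /dev/sdb1 are placeholders for your pool and device):

```shell
# Dry-run sketch: diagnostic commands are printed, not executed.
# "tank" and /dev/sdb1 are hypothetical names -- substitute your own.
diag_steps() {
cat <<'EOF'
lsblk -o NAME,SIZE,TYPE,FSTYPE        # step 1: are the disks visible with the right layout?
zdb -l /dev/sdb1                      # steps 2-3: dump the ZFS labels; compare the pool
                                      #   guid and txg across all member devices
zpool import -o readonly=on -fFn tank # step 4: -n reports what -F recovery would do,
                                      #   without actually importing
EOF
}
diag_steps
```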

You don’t want to try to import read-write because you don’t want the TXGs to change.

With this information it should be possible to see whether:

A. The pool can potentially be imported read/write.
B. If not, whether the pool can be imported read-only, so that you can copy the data elsewhere and then recreate the pool.
C. Even if -f/-F doesn’t work, you may still be able to import with a significant rollback to an older TXG. You will lose some of the most recent changes, but you may get integrity back. This can take a while, as it needs to read a lot of data to work out what to do.

This should be a simple apt update ; apt dist-upgrade fix, as long as the issue isn’t preventing you from getting to your root filesystem itself.

I had noticed the version mismatch, and I would have preferred them to be identical; that’s why I posted the output. Googling around, it seems many other people discover they have the same mismatch.

But I have no idea how you managed to solve that part by apt dist-upgrade-ing; I could not make it work. apt show zfsutils-linux -a showed 2.1.2-1ubuntu3 in jammy/main, and 2.1.5-1ubuntu6~22.04.4 in both jammy-updates/main and jammy-security/main. I think I would have had the 2.1.5 version in both kernelspace and userspace if I were using the old GA (general availability) kernel that shipped with 22.04, but I’m using the HWE (hardware enablement) kernel.

On this page, I can see that with some apt pinning/policy settings (things I do so rarely that I never remember the syntax), I could probably pull in the Noble (24.04 LTS) version of zfsutils-linux. Or just download the binary package using (in my case) this link. Still, it feels a bit too fiddly for comfort; I’d prefer not having to worry about the kernel module version getting updated “behind my back”, so to speak.
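For the record, the pinning variant usually amounts to a preferences file along these lines (a sketch: it assumes a noble entry has already been added to the apt sources, and the exact package list for ZFS 2.2 is my guess and may need adjusting):

```
# /etc/apt/preferences.d/zfs-from-noble (sketch)
Package: zfsutils-linux libzfs4linux libzpool5linux libnvpair3linux libuutil3linux
Pin: release n=noble
Pin-Priority: 990
```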

Luckily, my Ubuntu is booting on ext4, not ZFS, so booting it is not a problem.

With this information it should be possible to see whether:

A. The pool can potentially be imported read/write.
B. If not whether the pool can be imported read-only so that you can copy the data elsewhere and reformatted.
C. Even if -f/-F doesn’t work - you may still be able to import with a significant rollback to an older TXG - you will lose some of the most recent changes but you may get integrity back again. This can take a while to work as well as it needs to read a lot of data to work out what to do.

Thanks for the hints. As I wrote in my original post, the pool could be imported in read-only mode (but I could not mount the encrypted datasets, not even when supplying a temporary mountpoint). I could, however, send/receive raw snapshots of those datasets, with a few exceptions (namely the quick-and-dirty ones that sanoid wasn’t taking regular snapshots of).
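For anyone landing here with the same problem, the raw-send trick looks like this (a dry-run sketch that only prints the commands; dataset and pool names are hypothetical). zfs send -w sends the blocks still encrypted, so no key needs to be loaded, and it works from a readonly import as long as the snapshots already exist:

```shell
# Dry-run sketch: commands are printed, not executed.
# "tank/enc" and "backuppool" are hypothetical names.
raw_copy_steps() {
cat <<'EOF'
zfs list -t snapshot tank/enc     # pick an existing snapshot; you cannot
                                  #   create new ones on a readonly pool
zfs send -w tank/enc@autosnap | zfs receive backuppool/enc
EOF
}
raw_copy_steps
```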

It is kind of moot now, anyway. I wiped the zpools and am right now in the process of filling the disks with garbage from /dev/urandom and scrubbing. No errors yet, so the hardware looks OK to me. I only lost some backup data, not any significant primary data, so while my fault tolerance is now one level lower than usual, it will only stay that way until I re-enable the nightly backups.
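What I’m doing is roughly this (a dry-run sketch that only prints the commands; /dev/sdb and the pool name are placeholders, and this procedure destroys everything on the disk):

```shell
# Dry-run sketch: commands are printed, not executed.
# /dev/sdb and "testpool" are hypothetical; this destroys the disk's contents.
burn_in_steps() {
cat <<'EOF'
zpool create -f testpool /dev/sdb
dd if=/dev/urandom of=/testpool/garbage bs=1M   # fill the pool with random data
zpool scrub testpool
zpool status -v testpool                        # look for READ/WRITE/CKSUM errors
EOF
}
burn_in_steps
```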

I would however like to feel more confident that the problems don’t reappear, so I will try to get the userspace and kernelspace versions to match, even if I have to compile it from source myself.

A kind-of different question: Is it possible to mirror/replicate zpool metadata on another, much smaller disk/vdev, if I don’t have disks large enough to run the whole zpool as a mirror or RAIDZ? And, obviously, to use that kind of metadata backup to salvage zpools that behave like the ones in my original post?

Sure; that’s called the special vdev. It does exactly what you’re asking for: directly stores all metadata on the (presumably very fast, low latency, high write endurance) drives you designate to that vdev.

Note that the special vdev, if used, is a single point of failure for the entire pool, and therefore must be at least as redundant as the storage vdevs. This means that if you’re running RAIDz1 or 2-way mirrors for storage, you need a 2-way mirror for your special, and if you’re running RAIDz2 storage, you need a 3-way mirror for your special.
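Adding one looks like this (a dry-run sketch that only prints the commands; the pool and device names are placeholders):

```shell
# Dry-run sketch: commands are printed, not executed.
# "tank" and the NVMe device names are hypothetical.
special_steps() {
cat <<'EOF'
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
zpool status tank    # the special shows up as its own vdev in the pool layout
EOF
}
special_steps
```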

I’m not entirely sure what you mean, here. If you lose one side of a special’s two-way mirror, you don’t lose any metadata. If you lose all sides of the special, you lose all the metadata, and the pool with it.

If the question is whether a mirrored special would have avoided whatever issue you’re having now, that’s not possible to answer with no more information than I have. It seems an unlikely thing to promise: if you lose either any storage vdev or the special vdev, the pool is lost, period.

So if you’ve got a single-disk rust storage vdev and a single-disk special, you have two SPOFs (Single Points of Failure): the rust disk, and the SSD you used for the special. If you’ve got a single-disk rust storage vdev and a mirrored special, you’ve still got one SPOF: the single-disk storage vdev itself.

The better answer here is two-fold: since you’re at the edge of your budget, you don’t want redundancy (which extends uptime) as your first priority; you want backup (which prevents disasters). This still means buying another drive the same size as the one you’ve got, but instead of attaching it as a mirror, you should install it in a machine (ideally a separate one; this can be a cheap used $100 box off Amazon or $50 on eBay, if you don’t have one lying around) and replicate from your source system to your backup system regularly.
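Regular replication to such a backup box can be as simple as (a dry-run sketch that only prints the commands; host, pool, and snapshot names are hypothetical):

```shell
# Dry-run sketch: commands are printed, not executed.
# Host, pool, and snapshot names are hypothetical.
replicate_steps() {
cat <<'EOF'
# plain incremental send over ssh, between two existing snapshots:
zfs send -I tank/data@yesterday tank/data@today | ssh backupbox zfs receive -F backup/data
# or let syncoid work out the incremental base for you:
syncoid tank/data root@backupbox:backup/data
EOF
}
replicate_steps
```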

Please note: I’m not pooh-pooh-ing your budget concerns; I’ve been brokity-broke-broke personally, so I don’t get uppity about that kind of thing. I’m just trying to give you advice on how best to allocate the budget you have as you go forward, so you don’t waste your money along the way. <3

(And on that note: if the case here is that you’ve got, say, a 20TB drive you can’t afford to buy a second one of, but most of that 20TB is “viewable Linux ISOs” that you could restore by simply downloading them again… you might also consider making that its own single-disk pool, and only using that pool to store things you can afford to lose / re-download again. But you still need to properly back up the stuff you can’t just download again.)

Thanks for the info (crucial to know that a special is part of the pool, not just an add-on for redundancy), and for your concern, even if you misunderstood my situation a bit.

My zpools appear to have been corrupted by bad metadata, so a single-disk vdev for the data plus mirrored specials would likely have helped me.

As for my use case, I’m using this single-board computer and HDD as a backup for basically all the other ZFS-using devices I have, using sanoid/syncoid/nagios to handle my very messy bidirectional data flows. But the most important data (photos, important codes, …) is stored in multiple copies at multiple locations, so having this system break means I’m only down to a single point of failure for the stuff that isn’t important.

By the way, just trying to download deb packages for Noble started a cascading tree of broken dependencies, even ones unrelated to ZFS, so it would definitely be less of a bother to just upgrade to 24.04 LTS than to use Canonical packages from the wrong release. Either that, or compile 2.2.2 using this tag, or something even newer using the DKMS version of the kernel module.

It would have in this very specific case, sure, assuming enough of the actual data on the drive is still intact to make importing the pool worthwhile, which is possible but a pretty far-out thing to assume, if you follow me.

But if you lose, say, 10% of the blocks on your storage vdev, that’s probably going to take out 50% of your actual storage: the blocks that are corrupt aren’t going to confine themselves neatly to individual files, so you’ll get quite a lot more files corrupt than individual blocks corrupt, none of which can be restored.

Another thing worth considering, that you might not have been aware of: the metadata blocks of a pool are already stored copies=2 by default. So the odds that most of the data blocks on the drive would be fine if only you still had the metadata blocks available don’t strike me as good. Possible, sure. Plausible, even… but not likely. :)

Point taken, and no way to know for sure now that I’ve wiped the pools in order to feel confident that the hardware is still working as intended. Thanks for the input.


To summarize: I had broken pools. I cannot say for sure why, since they have since been wiped (which in turn showed that the HDDs are physically OK). But one possible reason could have been a version mismatch between the kernel module and the userspace utilities.

I solved the version mismatch by doing what it says here, except that I used noble instead of mantic, according to this page.
Note that the update process will throw up an interactive dialog when configuring libc6, so don’t script it blindly. I guess this will work for now, until Canonical bumps the kernel module version.

So now, I get this:

$ uname -a
Linux SYSTEMNAME 6.8.0-90-generic #91~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 20 15:20:45 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy

$ zfs version
zfs-2.2.2-0ubuntu9
zfs-kmod-2.2.2-0ubuntu9.4

Close enough…