PSA: The Ubuntu 22.04 HWE upgrade to the 6.5 kernel is a mess for ZFS

If you are on Ubuntu 22.04 and have been running the HWE stack to get newer kernels, you have probably been bumped, or prompted to bump, from the 6.2 to the 6.5 kernel recently. If you are running ZFS - especially on root - I would strongly advise against moving.

It mostly comes down to two bugs: one is annoying, the other breaks the system completely.

Here’s the bug report for the boot-bricking one.
https://bugs.launchpad.net/ubuntu/+source/grub2-unsigned/+bug/2051999
It seems that snapshotting the boot pool breaks compatibility with grub completely - and rolling back the snapshot does not help. The pool has to be recreated from scratch from a recovery system, and in the meantime you will not be able to boot. I’m not exactly sure why this seems to trigger during the update from the 6.2 HWE kernel stack to the 6.5 one.

I’ve now had two systems rendered unbootable by this bug. Feel free to blame me for performing the update once more on another system. I have excuses - although not very good ones :sweat_smile:

I wrote a bit about my debugging of the original failure and how I ended up working around it in a blog post if anyone is interested.
https://devblog.yvn.no/posts/notes-from-non-booting-ubuntu-server/

The other bug comes from the 6.5 kernel shipping the ZFS 2.2.0 kernel module while the userspace tools in zfsutils-linux remain on version 2.1.5. If the OpenZFS devs are to be believed, this mismatch can manifest in all kinds of subtle bugs. It doesn’t seem to be anywhere near as bad as the grub bug above, but I personally ran into it when doing Syncoid replication of datasets. The easiest solution for me was to cherry-pick the zfsutils-linux package from the 23.10 repos to get matching versions. I wrote about that here.
https://devblog.yvn.no/posts/zfsutils-linux-and-hwe-kernels/
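
For anyone who wants to check whether their own system has this mismatch, here is a minimal sketch. It assumes `zfs version` prints two lines, one starting with `zfs-` for the userspace tools and one starting with `zfs-kmod-` for the kernel module. The helper name `zfs_version_check` is made up for illustration, and it only compares major.minor.

```shell
#!/bin/sh
# Hypothetical helper: takes the text printed by `zfs version` and reports
# whether the userland and kernel-module major.minor versions agree.
# Assumed input format (package suffixes are illustrative):
#   zfs-2.1.5-1ubuntu6
#   zfs-kmod-2.2.0-0ubuntu1
zfs_version_check() {
    # The first pattern needs a digit right after "zfs-", so it skips the kmod line.
    user=$(printf '%s\n' "$1" | sed -n 's/^zfs-\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
    kmod=$(printf '%s\n' "$1" | sed -n 's/^zfs-kmod-\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
    if [ -n "$user" ] && [ "$user" = "$kmod" ]; then
        echo "match ($user)"
    else
        echo "mismatch (userland ${user:-?}, kmod ${kmod:-?})"
    fi
}

# Usage on a real system (needs the zfs tools installed):
#   zfs_version_check "$(zfs version)"
```

With the versions mentioned above (2.1.5 userland, 2.2.0 module) this should report a mismatch.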

In conclusion, I would strongly advise against moving a working 22.04 install to the HWE stack if one doesn’t absolutely have to.

Hopefully the 24.04 install turns out good and stable and will be an attractive base for running ZFS sometime this summer.


I’d pretty strongly advise not using Ubuntu’s built-in ZFS on root support. ZFS itself is fine, but that boot loader never was really ready for primetime, and seems unlikely to get there at this point IMO.

Instead, use ZFSBootMenu: https://zfsbootmenu.org/

It’s a bit more of a pain in the ass doing the OS installation itself this way, but you’re left with a MUCH more robust and complete ZFS boot environment when you’re done.


The Ubuntu bug report links a Debian bug report and I will confirm that I was bitten by this on a system running Debian Bookworm and with ZFS 2.2.3 from backports. I’m told that the bug manifests with Grub 2.06 and that 2.12 fixes it. Unfortunately 2.12 is not in backports and I was unsuccessful installing 2.12 or even building it from source on that host. I wound up restoring everything from backups and made sure that nothing is creating snapshots in the boot pool.
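
A rough sketch of how one could check that the boot pool really has no snapshots. It assumes the pool is named `bpool` as in the default split layout, and that snapshot names are fed in one per line, as printed by `zfs list -H -t snapshot -o name`. The helper name `count_pool_snapshots` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads snapshot names (one per line) on stdin and
# prints how many of them belong to the pool given as the first argument.
count_pool_snapshots() {
    pool=$1
    # Snapshot names look like pool@snap or pool/dataset@snap.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -E "^${pool}(/[^@]*)?@" || true
}

# Usage on a real system (pool name "bpool" is an assumption):
#   zfs list -H -t snapshot -o name -r bpool | count_pool_snapshots bpool
```

Anything other than 0 means something is still creating snapshots there.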

I spent more time than I care to admit trying to install ZFSBootMenu but this host is too old to use UEFI and there were no detailed instructions for migrating a system that was installed with the bpool/rpool split. Even with the help of the kind folk at the ZFSBootMenu IRC channel I could not get this working (and that’s on me, not them.) But I did get to help in a very small way with the migration instructions for a system using UEFI. :smiley:

I do plan to give this another go, but I have some other things I want to do first. I’m in the process of mirroring the tank pool in that server to my Pi 4B server and that has also been an adventure.


Yep, that’s also what I ended up with to salvage the system where I first encountered the grub bug. I was already not a fan of the chewing-gum-and-coat-hangers “feel” of the default grub-zfs-bpool setup, so I had already dabbled with ZFSBootMenu when setting up another system a few weeks prior. ZFSBootMenu enables the super power that is snapshot rollback for your system in an actually usable package. It’s such a shame that it’s quite the pain to install compared to the guided experience of the default installer.

I would absolutely love ZBM as a supported mode for Ubuntu. At this point I’d even be pretty happy with a sort-of-official community script that makes setting up systems on bare metal a bit more ergonomic. Fewer typos when reading instructions from another laptop would be neat!

I’m a bit conflicted about the whole thing. Sane defaults and the out-of-the-box experience are a big part of why Ubuntu appeals to me and why I’ve never bothered to mess around with e.g. Arch outside a very few short play sessions, but ZFS - and ZFS on root - is such a value add that I can’t really say no. But then it’s that much easier to get bitten by corner case bugs.

Hey there, it seems it has entered backports between when you checked and now:

ik5pvx@penny:~$ date
Thu 21 Mar 2024 18:26:48 CET
ik5pvx@penny:~$ apt -a show grub-efi
Package: grub-efi
Version: 2.12-1~bpo12+1
Priority: optional
Section: admin
Source: grub2
Maintainer: GRUB Maintainers <pkg-grub-devel@alioth-lists.debian.net>
Installed-Size: 12.3 kB
Depends: grub-common (= 2.12-1~bpo12+1), grub-efi-amd64 (= 2.12-1~bpo12+1)
Homepage: https://www.gnu.org/software/grub/
Download-Size: 2,396 B
APT-Sources: http://deb.debian.org/debian bookworm-backports/main amd64 Packages
Description: GRand Unified Bootloader, version 2 (dummy package)
 This is a dummy package that depends on the grub-efi-$ARCH package most likely
 to be appropriate for each architecture.


I have a nice setup that will build a root-on-zfs system with 22.04, using zfsbootmenu. It’s running live on several systems at home, including my daily laptop and main media/nas box.

Not exactly polished, but it gets the job done the way I like it. There are instructions for building it via Packer and booting the resulting image in a QEMU VM for testing. Or just boot a live-server CD image in a VirtualBox VM and try it.


That looks brilliant. I’ll have a try next time I’m reinstalling!

And I needed it this morning. Thanks for the heads up. When I rebooted my (Debian Bookworm) file server this morning it manifested the grub error again, even though bpool had no snapshots. I used the rescue procedure (Debian Bookworm Root on ZFS — OpenZFS documentation) to install grub 2.12 from backports and my server is back up. :smiley:

Incidentally, I use Ventoy to boot a Live Debian environment with persistence so it comes up ready to roll (ZFS already installed.) I think Ventoy is my favorite thing next to Linux and ZFS.

best,
