Really weird issue when booting from zfsbootmenu

Hello folks.

Currently running OpenZFS 2.3 on Arch Linux with the LTS kernel. Native encryption is in use.

Starting this morning, I’m getting a ‘null’ pool issue: zfsbootmenu can see the pool and reports it as healthy, but when I try to boot, it states that the pool needs to be recreated.

No kernel updates in the last 5 days, but an automatic scrub ran yesterday (with no errors).

Is there anything else I could check? Bootfs on the dataset is set correctly, from the recovery shell I can export and import the pool with no issues…

Some extra information as well:

Report:

Edit: Solved.

  • Booted using an archiso-zfs.
  • Imported the pool at /mnt and loaded keys with zfs load-key -a -L prompt to force a passphrase prompt instead of file keys for ZFS encryption.
  • Regenerated initrd by force reinstalling linux-lts.
  • After that, the boot crashed, stating that the pool was previously imported and used by “archiso”.
  • Exported the pool from the zfsbootmenu recovery shell and re-imported it with the force flag (-f).
  • System booted.
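
For anyone hitting the same thing, the steps above can be sketched roughly as the script below. It is a dry-run sketch, not a tested recipe: the pool name zroot, the root dataset zroot/ROOT/arch, and the use of arch-chroot with pacman -S linux-lts to force the reinstall are assumptions for illustration, not details given in this thread. By default it only prints the commands; set APPLY=1 to actually run them.

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps. Prints commands unless APPLY=1.
POOL="${POOL:-zroot}"                  # hypothetical pool name
ROOT_DS="${ROOT_DS:-$POOL/ROOT/arch}"  # hypothetical root dataset

# Print the command, or execute it when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Part 1: from the archiso-zfs live environment.
iso_steps() {
    run zpool import -N -R /mnt "$POOL"  # import with altroot /mnt, mount nothing yet
    run zfs load-key -a -L prompt        # ask for passphrases instead of reading key files
    run zfs mount "$ROOT_DS"             # mount the root dataset under /mnt
    # Assumed method: reinstalling the kernel package rebuilds the initramfs.
    run arch-chroot /mnt pacman -S --noconfirm linux-lts
}

# Part 2: from the zfsbootmenu recovery shell.
zbm_steps() {
    run zpool export "$POOL"
    run zpool import -f "$POOL"          # force past the "previously imported" check
}

case "${1:-}" in
    iso) iso_steps ;;
    zbm) zbm_steps ;;
esac
```

Run it with iso as the first argument in the live environment and zbm in the recovery shell. Exporting the pool from the live ISO before rebooting would probably avoid the “previously imported” complaint and make the forced import unnecessary.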

Really weird, but it worked. It must have been something else, since the kernel had not been updated in a while and I even tried booting a snapshot from 6 days ago. The last maintenance on this pool was the automatic weekly scrub, which still indicates 0B repaired…


I’m glad you got it sorted. I had a similar problem with an Ubuntu installation that just suddenly refused to boot, despite the pool being fine, and I never could get it running again.

Personally, I stopped using zfsbootmenu after that. It’s very nice to have a proper ZFS boot environment, but it’s easier for me to manage boot-and-root off of mdraid1, and leave ZFS for the after-the-system-is-booted data!

I suspect there is a bug involving the cachefile here, but I have no idea how to reproduce it so that I could report it to OpenZFS.

A scrub was running and the laptop was under some load (nothing exhausting, just gaming while the scrub ran, so some extra CPU and GPU load). The weekly scrub finished flawlessly, and the next day, when I tried to boot, I was greeted with this null pool.

I’m looking into not relying on the cachefile anymore for this specific laptop setup by setting cachefile=none, disabling zfs-import-cache.service and enabling zfs-import-scan.service, so that pools are found by scanning devices at boot.
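
A minimal sketch of that switch, assuming a pool named zroot (hypothetical) and the systemd units shipped with OpenZFS. It prints the commands by default; set APPLY=1 and run as root to apply them.

```shell
#!/bin/sh
# Dry-run sketch: switch pool import at boot from cachefile to device scan.
POOL="${POOL:-zroot}"  # hypothetical pool name

# Print the command, or execute it when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

switch_to_scan_import() {
    run zpool set cachefile=none "$POOL"            # stop recording this pool in a cachefile
    run systemctl disable zfs-import-cache.service  # no more cachefile-based import
    run systemctl enable zfs-import-scan.service    # import by scanning devices instead
}

switch_to_scan_import
```

One thing worth checking, as an assumption rather than something verified on this laptop: the scan unit may be skipped while /etc/zfs/zpool.cache still exists and is non-empty, so a leftover cachefile could keep it from running.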