FYI FWIW - Building zfsbootmenu with remote SSH access and ZFS version 2.2.6

Thanks to @Halfwalker for sharing his excellent Ubuntu build script, I was finally able to figure out a local build of zfsbootmenu that gives me the latest Debian-packaged ZFS and remote SSH access during boot. I thought I would share my notes in case they help anyone else working on this type of thing.

I did all of this in an Incus VM on my workstation, and would recommend NOT doing this directly on a production machine or personal machine without using a VM - I made a physical Debian test system unbootable while working on this by installing the wrong dracut package (on Debian, the full dracut package replaces initramfs-tools on the host; the dracut-core package used below does not), and there are a few other footguns in here too.

Also, the resulting images are not signed for Secure Boot, so Secure Boot will need to be disabled in the BIOS of any machine you use them on. I also ran into a problem where the build VM couldn't modprobe the zfs module built by zfs-dkms because of Secure Boot, and had to disable Secure Boot for the build VM before the build below would run.
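
If you're not sure whether Secure Boot is active, mokutil (from the mokutil package) can tell you:

    mokutil --sb-state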

Firstly, get the latest ZFS (2.2.6) from bookworm-backports, and build a ZBM EFI image with it, just to check we can get that far:

sudo su

Add bookworm-backports to /etc/apt/sources.list for main and contrib.
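
For example, something like this line (using the default Debian mirror - adjust to match yours):

    deb http://deb.debian.org/debian bookworm-backports main contrib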

apt update

apt install -y linux-headers-$(uname -r) 

apt install -t bookworm-backports -y curl zfs-dkms zfsutils-linux
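
To sanity-check that the backports ZFS actually landed - this should report 2.2.6 for both the userland tools and the kernel module (and if the modprobe fails here, the Secure Boot issue mentioned above is a likely culprit):

    modprobe zfs && zfs version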

rm -rf /tmp/zfsbootmenu && mkdir -p /tmp/zfsbootmenu

curl -L https://get.zfsbootmenu.org/source | tar xz --strip=1 --directory /tmp/zfsbootmenu

cd /tmp/zfsbootmenu

make install
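
make install should put generate-zbm on the PATH; a quick way to confirm before continuing:

    command -v generate-zbm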

apt-get -qq --yes --no-install-recommends install libyaml-pp-perl libsort-versions-perl libboolean-perl mbuffer

apt install -y dracut-core kexec-tools fzf systemd-boot

Add the following to /etc/zfsbootmenu/config.yaml - with Components disabled and EFI enabled, generate-zbm will build a single EFI bundle into /root instead of separate kernel/initramfs pairs:

    Components:
        Enabled: false
        ImageDir: /root
        Versions: 3
    EFI:
        Enabled: true
        ImageDir: /root
        Versions: false
    Global:
        BootMountPoint: /boot/efi
        DracutConfDir: /etc/zfsbootmenu/dracut.conf.d
        InitCPIOConfig: /etc/zfsbootmenu/mkinitcpio.conf
        ManageImages: true
        PostHooksDir: /etc/zfsbootmenu/generate-zbm.post.d
        PreHooksDir: /etc/zfsbootmenu/generate-zbm.pre.d
    Kernel:
        CommandLine: ro quiet loglevel=7
    

then enable image generation and build:

generate-zbm --enable

generate-zbm --no-initcpio --debug

The resulting image, /root/vmlinuz.EFI, will have the latest backports ZFS version, but no SSH capability.
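
As a quick check that the build produced a proper EFI bundle (ZBM builds these on the systemd-boot EFI stub, which is why systemd-boot was installed above), file should identify it as a PE32+ EFI application:

    file /root/vmlinuz.EFI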

Then add dropbear SSH support and build the image again. This uses a workaround from here to fix a recent dependency problem with dracut-network:

apt install dropbear-bin dracut-network openssh-server 

rm -rf /tmp/dracut-crypt-ssh && mkdir -p /tmp/dracut-crypt-ssh

cd /tmp/dracut-crypt-ssh && curl -L https://github.com/dracut-crypt-ssh/dracut-crypt-ssh/tarball/master | tar xz --strip=1

sed -i '/inst \"\$moddir/s/^\(.*\)$/#&/' /tmp/dracut-crypt-ssh/modules/60crypt-ssh/module-setup.sh

(The sed comments out the inst "$moddir/..." lines in module-setup.sh - they install the module's console-unlock helpers, which ZBM doesn't need.)

cp -r /tmp/dracut-crypt-ssh/modules/60crypt-ssh /usr/lib/dracut/modules.d

echo 'install_items+=" /etc/cmdline.d/dracut-network.conf "' >  /etc/zfsbootmenu/dracut.conf.d/dropbear.conf

ssh-keygen -t ed25519 -f /root/.ssh/remote-zbm

echo 'add_dracutmodules+=" network-legacy "' >> /etc/zfsbootmenu/dracut.conf.d/dropbear.conf

echo 'dropbear_acl=/root/.ssh/remote-zbm.pub' >> /etc/zfsbootmenu/dracut.conf.d/dropbear.conf
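
After those three echoes, /etc/zfsbootmenu/dracut.conf.d/dropbear.conf should contain:

    install_items+=" /etc/cmdline.d/dracut-network.conf "
    add_dracutmodules+=" network-legacy "
    dropbear_acl=/root/.ssh/remote-zbm.pub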

mkdir -p /etc/cmdline.d

echo 'ip=dhcp rd.neednet=1' > /etc/cmdline.d/dracut-network.conf
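
As an aside, if you'd rather not chase DHCP leases, dracut's ip= syntax also accepts a static configuration - the addresses and interface name here are placeholders for your own network:

    echo 'ip=192.168.1.50::192.168.1.1:255.255.255.0:zbm:eth0:none rd.neednet=1' > /etc/cmdline.d/dracut-network.conf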

generate-zbm --no-initcpio --debug

It generates the same EFI file as last time. This image works as a normal console-based ZBM, but also starts an SSH server on a DHCP address (which you will either need to create a static lease for, or look up on your router), and can be accessed as root on port 222 using the SSH key that was generated:

ssh -i /root/.ssh/remote-zbm -p 222 root@10.4.6.108

You then get a shell prompt

zfsbootmenu ~ >

and you have to type “zfsbootmenu” and press enter to get the actual ZBM

zfsbootmenu ~ > zfsbootmenu
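
One side note: if a rebuilt image ends up with a different dropbear host key, ssh will refuse to reconnect until the stale known_hosts entry is cleared (substitute your own address):

    ssh-keygen -R '[10.4.6.108]:222'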

From there it's a regular ZBM, except that when you select an environment and boot it, the SSH session drops out and hangs when kexec hands off to the new kernel. This means that if the new kernel has any problems importing the pool (e.g. because it's an old snapshot whose ZFS version doesn't support the subsequently-upgraded pool), you won't see the failure messages that you would normally see on the console, the new kernel can't log them because it can't import the pool, and I haven't been able to find a way to see them from the ZBM logs. I'd love to know how to fix this if anyone has some ideas.