I have a Proxmox 6 server. I have a mirror zpool for Proxmox and the VMs. I then have a RAIDZ pool for all my data. I use sanoid and syncoid for snapshots, and send the snapshots from the mirror (VMs) to the data array for backup.
I also have a 4TB USB drive attached for more backup (trying to follow @mercenary_sysadmin's 3-2-1 advice; I also have an offsite backup).
Since I want to use syncoid to sync the snapshots from the mirror (and some from the RAIDZ as well) to my USB drive, I made the external USB drive a ZFS zpool too.
Things were working great for about a year. Proxmox (or maybe OpenZFS?) is set up to run a scrub on the pools every two weeks. The external USB drive has been hanging on the scrub. I cannot run any zpool commands; they all hang, including zpool status.
Following from here, I looked at /proc/spl/kstat/zfs/dbgmsg, and I have lots of deadman entries that look like: `1688505404 zio.c:1967:zio_deadman(): zio_wait waiting for hung I/O to pool 'usb_backup'`. In /var/log/syslog, I have logs from zed with class deadman for this pool.
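For anyone hitting the same thing, these are the sorts of checks that still work while zpool commands hang (the smartctl `-d sat` passthrough flag depends on the USB bridge, and /dev/sdX is a placeholder for the USB disk):

```shell
# Kernel ring buffer: look for USB resets / UAS link drops around the scrub
dmesg -T | grep -iE 'usb|uas|reset' | tail -n 20

# OpenZFS internal debug log: deadman entries name the hung pool
grep deadman /proc/spl/kstat/zfs/dbgmsg | tail -n 5

# zed deadman events recorded in syslog for the same pool
grep -i deadman /var/log/syslog | tail -n 5

# SMART health of the USB disk; many USB bridges need SAT passthrough
smartctl -a -d sat /dev/sdX
```

If dmesg shows the USB device resetting or dropping off the bus right when the scrub's heavy I/O starts, that points at the bridge or cable rather than ZFS itself.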
Sanoid and syncoid are no longer running; I think they hang as well. The only way I have found to recover is to shut down and restart the whole node. In Proxmox I cannot even view the disks.
How can I figure out what is causing this I/O hang? It happens every time a scrub happens.
USB is unfortunately notoriously unreliable for long-term storage connectivity.
I would not recommend leaving a USB drive with a pool on it connected and imported for long periods of time. USB connectivity typically works well enough for import, do what you need to do, then immediately export and detach. But anything beyond that is asking for trouble.
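Sketching that attach → import → replicate → export cycle (pool and dataset names here are illustrative, matching the thread):

```shell
# Import the backup pool only for as long as it's needed
zpool import usb_backup

# Replicate snapshots onto it (syncoid from the sanoid package;
# -r recurses through child datasets)
syncoid -r rpool/data usb_backup/data

# Flush everything and cleanly detach before unplugging the drive
zpool export usb_backup
```

Keeping the export step in the same script as the replication means the pool is never left imported by accident.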
For a cheap, relatively reliable local backup, I’d recommend setting up another system on your LAN. Use whatever bottom-dollar old gear you’ve got lying around or can acquire cheaply and easily (a J1900 Celeron with 4GiB RAM is plenty) or, if you prefer, a new cheap dev board with proper SATA connectivity such as an Odroid HC4.
I’d just run Ubuntu. You’re not going to be able to run x86_64 virtual machines on that ARM hardware regardless, so it seems a bit pointless IMO to fight proxmox about it. I’d just stand up an Ubuntu server install on it and use sanoid+syncoid to manage replication on both sides.
No problem. Don’t forget to back up the VM hardware definitions as well… You can usually just wrap a “new” VM around a backed up drive image without too much trouble, but it’s definitely a big win not to have to deal with things like your “hardware” MAC address changing (and the guest therefore deciding not to use your existing network configs on the “new” “hardware”)!
I only have LXCs, since I'm only using Linux for everything. I have backed up a Windows VM on another machine, and it is not pretty, so I didn’t want to get into it here.
The only backups I am doing are of all the underlying ZFS filesystems, including the root one on my ZFS mirror. I assumed this included everything in Proxmox, including the “hardware” definitions of my LXCs, etc. Is that not correct?
If you’re backing up the entire proxmox pool, you’ll definitely get the hardware definitions as well. But you might want to learn where they are and what they look like (I don’t know the answer!) so you know for certain how to restore everything neatly and cleanly when the time comes.
For KVM, that’s an XML file per VM that lives in /etc/libvirt/qemu. But I don’t know how proxmox manages its definitions; could be simple XML files, or it could be keeping them as records in a database. Look that up, do a test backup AND test restore of something trivial, and Bob’s your uncle.
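One belt-and-suspenders approach that works regardless of how the definitions are stored internally is to archive the node's /etc/pve mount, which is where a stock Proxmox node exposes its configuration as plain files (verify the layout on your own node before relying on this):

```shell
# Archive the Proxmox config mount alongside the ZFS backups.
# /etc/pve is a FUSE view (pmxcfs), so tar captures the files it exposes.
tar -czf "/root/pve-config-$(hostname)-$(date +%F).tar.gz" -C /etc pve

# Sanity check: list guest definition files captured in the archive
tar -tzf "/root/pve-config-$(hostname)-$(date +%F).tar.gz" | grep -E 'lxc|qemu-server'
```

A test restore onto a scratch node is still the only way to be sure the archive is actually sufficient.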
@mercenary_sysadmin, have you put ZFS on one of these? I got an HC4 and installed the standard Ubuntu 22.04 Minimal image that they supply. But it ships a “custom” 4.9.x kernel without the headers, and to install zfsutils-linux I need the kernel headers. The repository for 22.04 only has 5.1x and 6.2 headers. The only thing I have found is this guy, who supplies updated kernels for Odroids. But he does everything in Debian rather than Ubuntu, so I am not sure if I should add his whole repo.
Have you been able to get ZFS running on one of these? Where did you get the linux-headers?
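For reference, the generic route on a Debian/Ubuntu system looks like the below; it only works if headers matching the *running* kernel are actually packaged, which is exactly the sticking point with the hardkernel 4.9 image (on stock Ubuntu kernels the ZFS module ships prebuilt and no headers are needed):

```shell
# The DKMS build needs headers that match the running kernel exactly
uname -r
apt install "linux-headers-$(uname -r)"

# With matching headers present, the ZFS module can be built and the tools installed
apt install zfs-dkms zfsutils-linux
```

If `linux-headers-$(uname -r)` has no candidate package, you're stuck until you move to a kernel the distro actually packages headers for.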
Okay, for those interested: the hardkernel image of Ubuntu 22.04 Minimal with Linux kernel 4.9 does not work well with ZFS. Following the advice from here, I instead installed another community image of Ubuntu 22.04 from here. See the discussion here for more info on this image. It is not official, but it is stable. This image comes with Linux kernel 5.15, one of the latest kernels supported by ZFS 2.1.5, and ZFS 2.1.5 is what 22.04 installs on this image, so everything seems to work.
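A quick way to confirm the kernel/ZFS pairing after install (version numbers will of course differ by image):

```shell
# Kernel the image actually booted, e.g. a 5.15.x build
uname -r

# Userland and kernel-module ZFS versions; both should print without error
zfs version

# Confirm the zfs kernel module is present and loadable
modinfo zfs | head -n 3
```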