_OLD_ ZFS randomly unmounts

A friend discovered an OLD system when it started to fail and roped me in… This thing is RHEL 7.4 with ZFS 0.7.13. It hasn’t had a scrub since 2019. It had been online for over a year. There’s so much about this that makes me :scream:

Anyway, it works for a while, then the zpool unmounts for no discernible reason. It isn’t tied to a consistent time or workload; the timing seems completely random. The SAS controller reports nothing wrong, dmesg shows nothing, and /var/log/messages shows nothing. zpool events -v shows nothing beyond the pool being mounted and the one reboot we performed while troubleshooting. As of right now, none of the disks report problems and zpool status shows 0 errors. Everywhere we look, there’s no sign of why it simply vanishes.
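When the pool "vanishes", one quick thing to pin down is whether the pool is no longer imported at all, or whether it is still imported and only its filesystems have been unmounted; those point at different causes. The sketch below is a hypothetical helper, not something from this thread. It shells out to zpool list -H -o name and zfs list -H -o name,mounted, and it assumes Python 3.6+ and a placeholder pool name. Note that datasets with canmount=off or mountpoint=none legitimately report "no".

```python
import subprocess
from typing import List


def unmounted_datasets(zfs_list_output: str) -> List[str]:
    """Parse `zfs list -H -o name,mounted` output; return datasets not currently mounted."""
    missing = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        # -H prints one dataset per line with fields separated by a single tab
        name, mounted = line.split("\t")
        if mounted != "yes":
            missing.append(name)
    return missing


def run(cmd: List[str]) -> str:
    # Python 3.6-compatible form (capture_output/text only arrived in 3.7)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def diagnose(pool: str) -> str:
    """Distinguish 'pool not imported' from 'pool imported but filesystems unmounted'."""
    imported = run(["zpool", "list", "-H", "-o", "name"]).split()
    if pool not in imported:
        return f"pool {pool} is not imported"
    out = run(["zfs", "list", "-H", "-o", "name,mounted",
               "-t", "filesystem", "-r", pool])
    missing = unmounted_datasets(out)
    if missing:
        return f"pool {pool} is imported; unmounted: {', '.join(missing)}"
    return f"pool {pool} is imported; all filesystems mounted"
```

Running print(diagnose("tank")) right after the next disappearance (with "tank" replaced by the real pool name) gives a one-line answer to which of the two situations you are in.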

Any thoughts/suggestions?

I don’t know if it was wise or not, but I started a scrub just to see whether any of the disks throw errors we can troubleshoot. It’s 3 vdevs of 10 drives each in RAIDZ. The scrub is going to take a while to complete. :man_shrugging:

Our next thought (after the scrub) was to try updating to a newer ZFS (and/or kernel). He’s also checking whether, and when, it was last backed up (another thought that scares both of us…).

Thanks!

Giving an update: a scrub reported a single read error. Further investigation of the hardware turned up a faulty ECC DIMM that wasn’t flagging or logging errors. Replacing the DIMM has kept the system stable for nearly a week now. We’re actively backing up the data to a newer system rather than trying to maintain this one in its existing form.
Thank you for the suggestions and help @mercenary_sysadmin!
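For anyone hitting something similar: on Linux, corrected and uncorrected ECC error counts are normally exposed through the EDAC sysfs tree, assuming the EDAC driver for the platform's memory controller is loaded. The sketch below is a hypothetical helper that reads ce_count and ue_count for each memory controller. As this thread shows, a bad DIMM may never increment these counters, so all-zero output is not proof that the memory is healthy.

```python
import glob
import os
from typing import Dict

# Standard EDAC sysfs location; absent if no EDAC driver is loaded
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def edac_counts(root: str = EDAC_ROOT) -> Dict[str, Dict[str, int]]:
    """Map each memory controller (mc0, mc1, ...) to its corrected/uncorrected error counts."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        name = os.path.basename(mc_dir)
        counts[name] = {
            "ce": read_int(os.path.join(mc_dir, "ce_count")),  # corrected errors
            "ue": read_int(os.path.join(mc_dir, "ue_count")),  # uncorrected errors
        }
    return counts
```

An empty result means no EDAC memory controllers were found, which is itself worth knowing on a box that is supposed to have ECC.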

Are you sure it’s unmounting from a running system, as opposed to failing to mount while booting?
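One way to answer that is to log, on the running system, the exact moment ZFS mount points disappear from /proc/mounts. If the log shows them vanishing long after boot (compare with who -b), it is a runtime unmount rather than a boot-time mount failure. This is a minimal polling sketch with hypothetical function names and an assumed 30-second interval, for Python 3.6+.

```python
import time
from datetime import datetime
from typing import Set


def zfs_mountpoints(mounts_text: str) -> Set[str]:
    """Return mount points of ZFS filesystems from /proc/mounts-format text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "zfs":
            points.add(fields[1])
    return points


def read_zfs_mountpoints(path: str = "/proc/mounts") -> Set[str]:
    with open(path) as f:
        return zfs_mountpoints(f.read())


def watch(interval_s: float = 30.0) -> None:
    """Poll forever, printing a timestamped line whenever a ZFS mount appears or disappears."""
    previous = read_zfs_mountpoints()
    print(f"{datetime.now().isoformat()} watching {len(previous)} ZFS mounts", flush=True)
    while True:
        time.sleep(interval_s)
        current = read_zfs_mountpoints()
        stamp = datetime.now().isoformat()
        for mp in sorted(previous - current):
            print(f"{stamp} UNMOUNTED {mp}", flush=True)
        for mp in sorted(current - previous):
            print(f"{stamp} MOUNTED {mp}", flush=True)
        previous = current
```

Calling watch() from a session that survives logout, with output redirected to a file, leaves a timestamp you can line up against cron jobs, workload spikes, or hardware events.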

IDK; I’m very familiar with extremely old ZFS on Linux, but not on the RHEL family (it was always Debian derivatives for me personally). At this point I think the ZFS problem is almost a red herring; the real issue is that you’ve got an out-of-support system that badly needs upgrading. RHEL 7.4 itself is still supported, but ZFS on Linux 0.7 is very definitely not supported by Red Hat. So, yeah, at this point the goal is to upgrade OpenZFS at a minimum.

Assuming this is an x86-64 system, RHEL 7.4 itself will still be supported until August 2027. So if your friend is worried about bespoke software that might not work properly on newer versions of RHEL, you can likely put that off for several more years. But that ZFS module… oof. That’s gotta get upgraded, and if you can’t do that without upgrading RHEL along with it, well, you’re gonna need to deal with that too.