ZFS Pool Failure - Any Way to Recover Without Backup?

Hey everyone, I’m in a bit of a panic. I manage a small cluster with multiple ZFS pools, and we recently had a failure that left one of our pools unmountable. To make things worse, we didn’t have a proper backup strategy in place; we relied only on snapshots, which are now inaccessible.

Here’s the situation:

  • The pool consisted of several mirrored vdevs.
  • Two drives failed in the same vdev, and now the pool won’t import.
  • zpool import shows the pool in a degraded state but refuses to mount.
  • We tried zpool import -f, but no luck.

Is there any way to recover from this, or is our data gone for good? Also, if we somehow manage to recover, what’s the best backup approach to prevent this in the future? Would appreciate any advice from those who have dealt with similar failures!

It’s gone for good, absent forensic recovery (meaning shipping all the drives off to an outfit like Gillware, which will cost you a minimum of $800 per drive).

To avoid this problem in the future, you need to back up the system. The best way to do this is ZFS replication, which you can make much MUCH easier with tools like my own sanoid and syncoid.

You also want to start actively monitoring your systems, ideally with a platform like Nagios, but you can cobble a poor person’s solution together with shell scripts and e.g. healthchecks.io.

You should know the same day if and when a drive fails.
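As a rough illustration of the "poor person’s" monitoring idea, here is a minimal sketch in Python rather than shell. It assumes `zpool status -x` prints "all pools are healthy" when nothing is wrong, and that healthchecks.io treats a request to the ping URL as success and a request to the same URL with `/fail` appended as failure. `PING_URL` and the function names are placeholders of mine, not anything from this thread.

```python
import subprocess
import urllib.request

# Hypothetical placeholder: substitute the ping URL of your own healthchecks.io check.
PING_URL = "https://hc-ping.com/your-check-uuid"

# What `zpool status -x` prints when no pool has a problem.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_healthy(status_output: str) -> bool:
    """Return True only if `zpool status -x` reported no problems."""
    return status_output.strip() == HEALTHY_MESSAGE


def ping_url(base_url: str, healthy: bool) -> str:
    """Success pings the base URL; failure pings the same URL plus /fail."""
    return base_url if healthy else base_url.rstrip("/") + "/fail"


def main() -> None:
    # -x limits the output to pools that have errors or are otherwise unavailable.
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True
    )
    healthy = result.returncode == 0 and pools_healthy(result.stdout)
    # Send the zpool output as the request body so it is attached to the ping.
    body = (result.stdout + result.stderr).encode()
    urllib.request.urlopen(ping_url(PING_URL, healthy), data=body, timeout=10)
```

Calling `main()` from cron or a systemd timer once an hour would be one way to use it. If the machine dies and the pings stop entirely, the check goes late on the healthchecks.io side, so you hear about that as well.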

If you want some professional assistance with this, you can DM me here, or email jim@jrs-s.net.


Missed this bullet point, so my original post is moot.

Yeah, that makes sense. Definitely a painful lesson learned here. I was actually wondering - does ZFS replication cover everything in cases like this, or do people usually combine it with other backup tools?

I’ve come across setups where people use something like Bacula or Proxmox Backup Server alongside ZFS replication, but I’m not sure how common that is. Would love to hear what others are doing to avoid total data loss if things go sideways.

From a home user point of view, I have a ZFS send script that sends incremental data streams. It uses bookmarks as the incremental source, so replication keeps working even if I delete snapshots on the source, on purpose or by accident.

I like to know what’s happening step-by-step, but Sanoid will do the trick too; the Syncoid section of its documentation covers snapshot replication.