Import of existing ZFS pools fails

My first post here.
I also posted the same on the Gentoo forum, but then I felt it was not the proper place.
So:

I had 4 SSDs installed in a PCIe-to-NVMe expansion card (MSI XPANDER-AERO) that came with the motherboard of a rather old PC (2020).
The card failed after six years of usage, but the SSDs are OK.

Two of them have btrfs partitions.
The other two have zfs pools, each of them single disk, whole disk.
For these last two I used zfs so I could share data between Gentoo and FreeBSD in a simple way that is native to each OS (FreeBSD cannot read/write btrfs partitions, and I don’t like ext3/4, which it can handle).

The expansion card failure was sudden, like all failures, so there was no time to do proper zpool export or anything else. So, I just removed the SSDs and put them in a Terramaster D4 and also tried each of them in a few USB-C and TB4 external enclosures (I have no other PC with internal M.2 slots).

The two SSDs with btrfs filesystems can be used with no issues to access the data.
Those two with zfs cannot.

When I try to import the pools with

zpool import -d /dev/nvme1n1

or

zpool import -m -n -F -d /dev/nvme1n1

I get

no pools available to import

The SSDs have the typical zfs partitioning:

parted /dev/nvme1n1 p

Model: Sabrent (nvme)
Disk /dev/nvme1n1: 4097GB
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      8389kB  4097GB  4097GB               zfs-5e2d56f0da6e2044
 9      4097GB  4097GB  67.1MB

So, the partitions are still there, certainly with data on them from when they were used inside the PC. Is there a way to recover the data from those zfs pools?

Those ZFS pools are technically on the first partition, and you’re very narrowly requiring ZFS to only look on the whole disk.

Try just plain zpool import with no further arguments and see if it offers you a pool. Or if you absolutely must specify the device name (eg due to possible pool name conflicts) try using /dev/nvme1n1p1 to direct ZFS the rest of the way to the pool you’re specifically trying to import.
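
To make the whole-disk versus partition distinction concrete, here is a minimal sketch. The first_partition helper is made up for illustration and is not part of ZFS; it only encodes the Linux naming rule that a disk name ending in a digit gets "p1" appended while other names get "1". The zpool lines are left as comments because they need root and a real pool.

```shell
# Hypothetical helper (not part of ZFS): map a whole-disk device path
# to the path of its first partition, following Linux naming rules.
#   /dev/nvme1n1 -> /dev/nvme1n1p1   (name ends in a digit, so "p" is added)
#   /dev/sda     -> /dev/sda1
first_partition() {
    case "$1" in
        *[0-9]) printf '%s\n' "${1}p1" ;;
        *)      printf '%s\n' "${1}1" ;;
    esac
}

# List importable pools found in the default device locations:
#   zpool import
# Or point the scan at the partition that actually holds the pool:
#   zpool import -d "$(first_partition /dev/nvme1n1)"
# Without a pool name these commands only list what was found;
# append the pool name (or -a) to actually import.
```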

Thanks a lot!

This works: zpool import -d /dev/nvme1n1p1

Just for the record: when you run a plain zpool import without specifying anything else, it shows you which pools are available and ready to be imported. In most cases you do not need to specify -d at all.

The -d option is typically used to tell zfs to look in a different device directory. For example, you could do:

zpool import -d /dev/disk/by-id
zpool import -d /dev/disk/by-uuid
etc.

In the end the difference is the name of the devices in the imported pool. If you do
zpool import -d /dev/disk/by-id
the devices in the pool will be named like nvme-eui.0026b72826a62585.
With zpool import -d /dev/disk/by-uuid
the device names in the pool would be the UUID of the partitions like 642c43f3-cfb6-455d-a62a-8d4b6654e1fa.
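
If you want to see ahead of time which names a given directory would give your device, a small sketch like the one below can help. The names_for_device function is a made-up helper, not a ZFS or udev tool; it assumes a Linux system where readlink -f is available and simply compares what each symlink resolves to.

```shell
# Hypothetical helper: print the name of every symlink in directory $1
# that resolves to the same target as device path $2.
names_for_device() {
    dir=$1
    target=$(readlink -f "$2")
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Example (device path taken from this thread):
#   names_for_device /dev/disk/by-id /dev/nvme1n1p1
```

Whatever it prints for a directory is what the pool's devices should be called after importing with -d pointed at that directory.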

You can also define your own device names in /etc/zfs/vdev_id.conf like for example:

alias WD10EARS-00Y5B1-WCAV5N16xxxx      /dev/disk/by-partlabel/zfs-b7e96621311a2297
alias WD5001AALS-00L3B2-WCASYC27xxxx    /dev/disk/by-partlabel/zfs-ece4b6dff15532e7
alias WD4000AAKS-00TMA0-WCAPW005xxxx    /dev/disk/by-partlabel/zfs-ccfd6dc3a55bdbbd
alias WD20EZAZ-00L9GB0-WXH2AB0Hxxxx     /dev/disk/by-partlabel/zfs-9e3e10f7ff34ac52

This will create the devices with the alias names in /dev/disk/by-vdev/ to be used with the -d option. In my case I add the serial number to the device name so that I can clearly identify the devices in the JBOD case. That helps when a drive has to be replaced. A vdev name like nvme-eui.0026b72826a62585 is not helpful when you need to know which drive in your 8-bay JBOD needs to be replaced.
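
As a rough sketch of how such lines could be generated rather than typed by hand: the vdev_alias function below is a made-up helper, and the model-serial naming scheme is just the convention described above, not something ZFS requires. The commented commands assume a Linux system with udev.

```shell
# Hypothetical helper: emit one alias line for /etc/zfs/vdev_id.conf.
#   $1 = drive model, $2 = serial number, $3 = GPT partition label (zfs-...)
vdev_alias() {
    printf 'alias %s-%s\t/dev/disk/by-partlabel/%s\n' "$1" "$2" "$3"
}

# Example, run as root (values taken from the list above):
#   vdev_alias WD20EZAZ-00L9GB0 WXH2AB0Hxxxx zfs-9e3e10f7ff34ac52 >> /etc/zfs/vdev_id.conf
#   udevadm trigger                    # re-run udev rules so /dev/disk/by-vdev/ is populated
#   zpool import -d /dev/disk/by-vdev  # list pools using the alias names
```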

Lots of good info.
The important thing is that ZFS did not let me down and my data is safe.

Actually, one of the pools was recovered successfully, one was not.

This last one had a log on another disk that of course is not available when you put the SSD in an enclosure and do the stuff on another machine.
Not even the -m option helped, although the error mentioned something about a corrupted partition, which may or may not have been the case.
Fortunately, I had a recent backup for the data in that pool.

A LOG or CACHE vdev dying does not kill the pool. (Losing a SPECIAL will, though.) The most that losing a LOG can do is lose up to one transaction group interval's worth of writes (zfs_txg_timeout, 5s by default), but that just leaves the pool “time traveled” backwards by the amount of writes lost; it does not render the pool unmountable or corrupt.
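
If you want to check that interval on your own box, here is a minimal sketch. It assumes OpenZFS on Linux, where the transaction group timeout is exposed as the zfs_txg_timeout module parameter under /sys/module/zfs/parameters; the txg_timeout function itself is a made-up helper. The pool name in the commented import line is a placeholder.

```shell
# Hypothetical helper: print the txg timeout in seconds, or "unknown"
# if the parameter file is not readable (e.g. zfs module not loaded).
# $1 optionally overrides the parameter path.
txg_timeout() {
    param=${1:-/sys/module/zfs/parameters/zfs_txg_timeout}
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo unknown
    fi
}

# Importing a pool whose separate log device is missing:
#   zpool import -m <poolname>
```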

That corrupted partition message wasn’t kidding.