Can't import pool due to missing devices. Can I recover?

I did something really dumb and might have destroyed my pool.

I was adding drives to my pool in TrueNAS and the UI got inconsistent on me. I had added 2 mirrors (4 drives total) to my existing pool, and the UI was still saying I had available drives. It was like the drives were both added and not added at the same time.

At this point I figured the 2 smaller drives might be bad (they’re the oldest), so I tried to remove them from the pool again.

Then TrueNAS kinda froze and I was really super dumb and rebooted.

Yes, it’s all my fault.

But is there a way to save the situation? Essentially no data has been written to the new drives; it’s all on the first mirror. Here’s the relevant output:

zpool import -n -F RUST (a dry run of the recovery-mode import) returns nothing.

# zpool import   
  pool: RUST
    id: 17815952320616229325
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:

	RUST                                      UNAVAIL  insufficient replicas
	  mirror-0                                ONLINE
	    f2de719d-c1a6-4360-b401-cffebf690fc5  ONLINE
	    5909a77e-dce8-42bd-afeb-85336a058f95  ONLINE
	  mirror-1                                UNAVAIL  insufficient replicas
	    145420f3-6afe-42c7-9def-1f84fb603fea  UNAVAIL
	    569c9201-e00d-4286-8577-438144c4e65d  UNAVAIL
	  mirror-2                                ONLINE
	    186dc447-782f-4563-a669-df14c148adc4  ONLINE
	    7dceccfe-209d-4811-87f5-954f09cdbc7c  ONLINE
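
In case this is just a device-scanning problem rather than actual damage, a read-only import pointed at the partition UUIDs is probably the next thing worth trying (just a sketch; nothing here writes to the pool):

# zpool import -d /dev/disk/by-partuuid
# zpool import -o readonly=on -d /dev/disk/by-partuuid RUST

The first command only lists what ZFS can find in that directory; the second actually imports the pool, but read-only.
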
# lsblk -f       
NAME        FSTYPE     FSVER LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                        
└─sda1      zfs_member 5000  RUST      17815952320616229325                                
sdb                                                                                        
└─sdb1      zfs_member 5000  RUST      17815952320616229325                                
sdc         btrfs                      6f1be2a5-520c-48d6-a3c8-3201a02e81c0                
└─sdc1                                                                                     
sdd                                                                                        
├─sdd1                                                                                     
├─sdd2      vfat       FAT32 EFI       D3BD-1E94                                           
└─sdd3      zfs_member 5000  boot-pool 13580889465207706321                                
sde                                                                                        
└─sde1                                                                                     
sdf                                                                                        
└─sdf1      zfs_member 5000  RUST      17815952320616229325                                
sdg                                                                                        
└─sdg1      zfs_member 5000  RUST      17815952320616229325                                
nvme0n1                                                                                    
└─nvme0n1p1 zfs_member 5000  NVME      17699913917308282213                                
nvme1n1                                                                                    
├─nvme1n1p1                                                                                
├─nvme1n1p2                                                                                
└─nvme1n1p3
# blkid
/dev/sdf1: LABEL="RUST" UUID="17815952320616229325" UUID_SUB="9858992425185324950" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="data" PARTUUID="7dceccfe-209d-4811-87f5-954f09cdbc7c"
/dev/nvme0n1p1: LABEL="NVME" UUID="17699913917308282213" UUID_SUB="2592288722561352011" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="data" PARTUUID="7389f252-2d65-47ec-a61a-e8dff2a8ca40"
/dev/sdd2: LABEL_FATBOOT="EFI" LABEL="EFI" UUID="D3BD-1E94" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="fa34533b-94fa-4ef4-bf47-58b17d2ddf0c"
/dev/sdd3: LABEL="boot-pool" UUID="13580889465207706321" UUID_SUB="12677788076775159089" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="f3372f69-68a9-4d37-aca2-9052d7411451"
/dev/sdb1: LABEL="RUST" UUID="17815952320616229325" UUID_SUB="3660728189304804708" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="data" PARTUUID="f2de719d-c1a6-4360-b401-cffebf690fc5"
/dev/sdg1: LABEL="RUST" UUID="17815952320616229325" UUID_SUB="4346503238552677176" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="data" PARTUUID="5909a77e-dce8-42bd-afeb-85336a058f95"
/dev/sda1: LABEL="RUST" UUID="17815952320616229325" UUID_SUB="7560449754649386217" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="data" PARTUUID="186dc447-782f-4563-a669-df14c148adc4"
/dev/sdd1: PARTUUID="02a1bedf-81e4-423b-a3a7-bcb87f04cd76"
/dev/nvme1n1p2: PARTUUID="29abc7c7-02"
/dev/nvme1n1p3: PARTUUID="29abc7c7-03"
/dev/nvme1n1p1: PARTUUID="29abc7c7-01"

So the missing drives are sdc and sde.

Here’s the output from parted:

(parted) select /dev/sdc                                                  
Using /dev/sdc
(parted) print                                                            
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB  zfs          data

(parted) select /dev/sde                                                  
Using /dev/sde
(parted) print                                                            
Model: ATA WDC WD30EFRX-68A (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB  zfs          data

Is it possible that the partition table got messed up?
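
Since parted still shows the partitions with their zfs type and “data” name, maybe it’s the ZFS labels inside the partitions that got clobbered rather than the partition table itself. Would dumping the labels tell me anything? Something like (read-only; device names taken from the lsblk output above):

# zdb -l /dev/sdc1
# zdb -l /dev/sde1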

Any advice would be most appreciated! Thankfully, I have backups of the important data.

Backups rule!

What happened to the drives in mirror-1? Did you disconnect and physically remove them? I don’t see them in your blkid output.

That’s part of why I think they might be bad, or that the partition table is corrupted or something, since they do show up in fdisk -l and parted.
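
I should probably also force blkid to do a low-level probe instead of reading its cache, just to rule that out:

# blkid -p /dev/sdc1 /dev/sde1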

I see that now. The lsblk output shows one with a btrfs filesystem and does not list a filesystem for the other. It would seem that the information used to ID those drives/partitions as ZFS members has been altered. I don’t know if there is any recovery possible. I’d probably be recreating the pool and restoring from backup at this point.
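
If you want to see exactly which signatures are sitting on those disks now, wipefs in its default mode only lists them (it doesn’t erase anything unless you pass -a or -o):

# wipefs /dev/sdc
# wipefs /dev/sdc1
# wipefs /dev/sde1

That should show where the btrfs signature is and whether any ZFS labels are still being detected.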

Unless you’ve got a zpool checkpoint to rewind to, as far as I know you’re dead in the water.

(Which is an excellent argument for setting a pool checkpoint before modifying the pool layout in the future. Not so helpful now, I know.)
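
(For anyone who finds this later, the workflow is roughly this, using the pool name from this thread:

# zpool checkpoint RUST
... make the layout changes here ...
# zpool export RUST
# zpool import --rewind-to-checkpoint RUST

and zpool checkpoint -d RUST discards the checkpoint once you’re happy with the new layout.)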

@allan might know more about this than I do.

Adding a checkpoint sounds like very sound practice; too bad TrueNAS doesn’t seem to do that by itself. Makes me wonder if I wouldn’t be better off running stock Debian with ZFSBootMenu or something :thinking:

Decided to just get on with restoring from backups.

I’ve installed a fresh Debian with ZFSBootMenu instead.
