ZFS pool hotspare replace hung

A disk in my raidz2 pool has failed or is failing. ZFS tried to replace it with a hot spare (ata-ST18000NT001-3NF101_ZVTDW28R), but the replacement seems to have failed. I would like to remove the hot spare and manually run the replace command. I can’t remove the disk because ZFS says:

cannot remove ata-ST18000NT001-3NF101_ZVTDW28R: Pool busy; removal may already be in progress

Thanks for the help.

zpool status output:

      NAME                                          STATE     READ WRITE CKSUM
      storage                                       DEGRADED     0     0     0
        raidz2-0                                    ONLINE       0     0     0
          ata-WDC_WD60EFRX-68L0BN1_WD-WX52D30AVEY4  ONLINE       0     0     0
          ata-WDC_WD60EFRX-68L0BN1_WD-WX52D30AVNF8  ONLINE       0     0     0
          ata-ST6000VN0033-2EE110_ZADAG8SB          ONLINE       0     0     0
          ata-WDC_WD60EFRX-68L0BN1_WD-WX52D30AVA82  ONLINE       0     0     0
          ata-ST6000VN0033-2EE110_ZADAGFXS          ONLINE       0     0     0
          ata-ST6000VN0033-2EE110_ZADAKVHQ          ONLINE       0     0     0
        raidz2-1                                    DEGRADED     0     0     0
          ata-ST16000NM001G-2KK103_ZL21FTG0         ONLINE       0     0     0
          ata-ST16000NM001G-2KK103_ZL27RLMD         ONLINE       0     0     0
          spare-2                                   UNAVAIL     68   109    78  insufficient replicas
            ata-ST16000NM001G-2KK103_ZL28BWBD       FAULTED     25     0     0  too many errors
            ata-ST18000NT001-3NF101_ZVTDW28R        REMOVED      0     0     0
          ata-ST16000NM001G-2KK103_ZL28CETE         ONLINE       0     0     0
          ata-ST16000NE000-3UN101_ZVTEF7N0          ONLINE       0     0     0
          ata-ST16000NM001G-2KK103_ZL28JJWZ         ONLINE       0     0     0
        raidz2-2                                    ONLINE       0     0     0
          ata-ST18000NT001-3NF101_ZVTDVCF3          ONLINE       0     0     0
          ata-ST18000NT001-3NF101_ZVTDVDNR          ONLINE       0     0     0
          ata-ST18000NT001-3NF101_ZVTDVE2F          ONLINE       0     0     0
          ata-ST18000NT001-3NF101_ZVTDW28Q          ONLINE       0     0     0
          ata-ST20000NM002C-3X6103_ZXA0H7AZ         ONLINE       0     0     0
          ata-ST18000NT001-3NF101_ZVTDW2A9          ONLINE       0     0     0
        raidz2-3                                    ONLINE       0     0     0
          ata-ST8000NM0055-1RM112_ZA15VQEE          ONLINE       0     0     0
          ata-ST8000NM0055-1RM112_ZA161H3H          ONLINE       0     0     0
          ata-ST8000NM0055-1RM112_ZA161SGY          ONLINE       0     0     0
          ata-ST8000NM0055-1RM112_ZA1629FS          ONLINE       0     0     0
          ata-ST8000NM0055-1RM112_ZA162R80          ONLINE       0     0     0
          ata-ST8000VN004-2M2101_WSD80GKX           ONLINE       0     0     0
      spares
        ata-ST18000NT001-3NF101_ZVTDW28R            INUSE     currently in use

Can you just physically pull the spare? I’m seeing that you’ve got sufficient parity to operate without it.

Safest way to do this is power off the server, pull the drive, power on the server. That way if the pool cannot import without that drive present, you aren’t risking anything permanent–you can just power it off again, put the spare back in its bay, and figure out your next steps from there.

This is probably a lot more caution than you will actually need, but that’s what makes it “cautious” rather than “reckless” amirite? :slight_smile:

Edited to add: with all those hardware read failures, you should be looking at likely problems in cabling, backplane, and controller–in that order. It’s possible for a bad drive to cause the issues you’re seeing, but I’d bet on bad cabling, personally.
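
One way to gather evidence for the cabling theory is the drive's SMART interface CRC counter. The helper below is a minimal sketch, assuming smartmontools is installed and that the drive reports attribute 199 (UDMA_CRC_Error_Count) in the usual ten-column `smartctl -A` table. The function name `crc_errors` is made up for illustration. A non-zero value that keeps climbing usually points at the link (cable, backplane, controller) rather than the disk itself, though it is not conclusive on its own.

```shell
#!/bin/sh
# Print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from `smartctl -A` output supplied on stdin.
# Assumes the standard attribute table, where RAW_VALUE is column 10.
crc_errors() {
    awk '$1 == 199 { print $10 }'
}

# Typical use (needs root; device name taken from the status output above):
#   smartctl -A /dev/disk/by-id/ata-ST16000NM001G-2KK103_ZL28BWBD | crc_errors
```

Running it against each disk on the same backplane or cable and comparing the counts can help tell a single bad drive from a shared link problem.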

Thanks Jim, I’m a fan of your 2.5 Admins podcast. I ended up rebooting the server for updates, and the resilver restarted. I didn’t unplug anything. I did order a replacement drive in case I need it.

This is the first time I have tried using ZFS’s hot spare feature. I usually just have a spare sitting in the pool and manually replace it when I see the failure.
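
For anyone landing here with the same question, the manual route can be sketched as follows. As far as I know, the documented way to cancel an in-use hot spare is `zpool detach` rather than `zpool remove`, which refuses to touch a spare that is currently in use. This is a sketch under assumptions, not a tested recipe for this exact pool state: the pool and device names are copied from the status output above, `ata-EXAMPLE_NEW_DISK` is a hypothetical placeholder for the replacement drive, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of running them.
# Set DRY_RUN=0 to execute for real (run as root on the pool host).
DRY_RUN="${DRY_RUN:-1}"

POOL="storage"
FAILED="ata-ST16000NM001G-2KK103_ZL28BWBD"   # faulted disk from the status output
SPARE="ata-ST18000NT001-3NF101_ZVTDW28R"     # hot spare that is stuck INUSE
NEW_DISK="ata-EXAMPLE_NEW_DISK"              # hypothetical placeholder name

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

manual_replace() {
    # Detaching the spare cancels the spare replacement and returns it
    # to the pool's spare list.
    run zpool detach "$POOL" "$SPARE"
    # Replace the faulted disk with the new one; this starts a resilver.
    run zpool replace "$POOL" "$FAILED" "$NEW_DISK"
    # Check resilver progress.
    run zpool status "$POOL"
}

manual_replace
```

Once the spare is back to AVAIL, `zpool remove storage ata-ST18000NT001-3NF101_ZVTDW28R` should be able to drop it from the pool entirely if that is what you want.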
