TrueNAS emailed me overnight about one of my disks
New alerts:
* Device: /dev/da5 [SAT], 8 Currently unreadable (pending) sectors.
Current alerts:
* Device: /dev/da5 [SAT], 8 Currently unreadable (pending) sectors.
I kicked off a long SMART test about 6 hours ago; it has an ETA of 16 hours from now. Since then, the number of bad sectors has apparently risen to 48, according to the dmesg logs.
This is a manufacturer-recertified Seagate 16TB Exos X16 ST16000NM001G from serverpartdeals.com, and I am within the RMA period. The disk is part of a mirror vdev, so if it completely dies I should be okay unless I lose another drive before I replace it.
Am I reading the SMART output correctly that there are > 46k reallocated sectors? I’m used to running brand new drives so I’m not sure what the expected threshold of bad sectors is, especially on a drive this large.
The read error rate and seek error rate are also concerning; those should ideally both be at 0.
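For anyone following along, here’s roughly how I’m pulling those numbers. The smartctl flags are standard smartmontools, but the attribute line below is a made-up example for illustration, not my actual output:

```shell
# Standard smartctl invocations:
#   smartctl -A /dev/da5          # attribute table: IDs 5, 197, 198 are the sector counters
#   smartctl -l selftest /dev/da5 # progress/result of the running long test
# Pulling the raw value (last column) out of an attribute line.
# The line below is a made-up example, not my real output:
line="  5 Reallocated_Sector_Ct   0x0033   081   081   010    Pre-fail  Always       -       46328"
raw=$(echo "$line" | awk '{print $NF}')
echo "reallocated_raw=$raw"   # prints: reallocated_raw=46328
```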
Unfortunately this happens; I’ve recently gotten unlucky with a few of my refurbs as well, but I’ve generally had a good experience with ServerPartDeals.
I think you know what I’m going to ask next… do you have a backup?
Pretty sure that drive is toast, fam. Double-check that your backups are good immediately, replace that drive ASAP.
Be warned that SMART data isn’t always what it appears to be. I would not necessarily take the read and seek errors as dealbreakers; those can and do occur in perfectly healthy drives. Different drive firmware also tends to store SMART data in different ways with different meanings, even when the attribute name is the same, so it’s difficult to know exactly what the values mean without access to technical documentation for your exact model of drive (which isn’t always consumer-available). That said, I definitely don’t like the looks of those raw values.
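As a concrete example of why the raw numbers can mislead: Seagate’s Raw_Read_Error_Rate and Seek_Error_Rate raw values are widely reported (though not, as far as I know, officially documented) to pack two counters into one 48-bit number: the upper 16 bits are the actual error count, the lower 32 bits the total operation count. A quick sketch with a made-up raw value:

```shell
# Made-up raw value for illustration; on this reading, a huge number
# here can still mean zero actual errors.
raw=200009938
errors=$(( raw >> 32 ))             # upper 16 bits: actual error count
operations=$(( raw & 0xFFFFFFFF ))  # lower 32 bits: total seeks/reads
echo "errors=$errors operations=$operations"
```

On that interpretation, a six- or nine-digit raw value with zero in the upper bits is a busy drive, not a dying one.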
Nope, that’ll do her. Although I would probably recommend waiting to remove the old drive until after you have the new one; you never know when you’ll lose your “healthy” drive completely while your ailing drive is still limping along on its last legs… and if you’ve still got the ailing drive, you get to keep your pool, but if you’ve already removed it, welp.
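For the keep-both-drives case, the usual move is a zpool replace, which resilvers onto the new disk while the old one stays in the mirror, then detaches the old one automatically. Pool and device names here are hypothetical:

```shell
# Attach new disk da9 alongside the failing da5; resilver begins immediately
zpool replace tank da5 da9
# Watch the resilver; da5 is detached automatically once it completes
zpool status tank
```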
Normally yes I would wait to remove the bad one until I have a replacement, but this will be an RMA with serverpartdeals so I’ll have to ship back the bad one and wait.
I did verify that the pool is replicated to my “remote” TrueNAS box, which is unfortunately still located in my house. I’ll just have to be extra careful when using the stove until the new drive comes so I don’t burn down both copies of the data.
Sorry to hear you’re having a not-great time, but thanks for posting this thread. I just learned some new stuff about reading SMART reports. … And I’ve also learned that apparently Seagates are overeager in their SMART reporting and you need the secret decoder ring @karl mentioned.
Glad to hear you’ve got a healthy replica. The built-in replication feature is one of the things that made me determined to get my head around ZFS when my head really didn’t want to be a team player and learn ZFS.
I’ve had excellent luck with used enterprise drives sold by reputable eBay dealers (server parts/corporate installation liquidators with tens of thousands of positive feedbacks plus a warranty policy provided by the seller). More often than not, it seems (from the SMART stats, at least) that a lot of the used enterprise drives on the market sat in a server for years, barely used for writes and moderately used for reads, spending the bulk of their time idling.
At the sizes and specs I want, used enterprise has been the only real economical way to go for the number of drives I have. It’s definitely not something to be scared of, but you should certainly purchase from the most reputable reseller that has the disk(s) you want.
Just as a point of comparison, the only disk I’ve had fail since building my first storage machine (or rather, buying my first QNAP) was a brand new WD Gold 14 TB that I managed to get sealed in box at “someone died under mysterious circumstances while in the room with this” pricing. And my HDD NAS lives in my bedroom where I sleep, so it’s got the best environment I can manage. So, brand new drives can go sideways, too.
And as a bonus, I now have my Ph.D. in The Western Digital 10,000 Point RMA Process.
I would like to use ServerPartDeals, but I haven’t found a similar site in the UK, and shipping from the US was pricey. I too buy from eBay; I have bought 3 enterprise drives from eBay sellers, applying a similar strategy: IT recycler, strong feedback, and lots of items on sale (bonus if they have the Offer button). So far, all have been good. My two wins were 2020 drives with warranty remaining and 0 SMART errors. The drives I have show long run times and low start/stop counts (even just 1 or 2); I believe it’s better to keep drives running than to power cycle them frequently.
Prior to this, I used to buy new drives (2TB), but drive prices have increased (thanks, AI) and my need is now for 4TB drives (got to be careful with SMR); that means I’m looking at £140 new. So I figure £37 for an enterprise drive, mirrored in ZFS, with backup (ZFS replication) and a copy in the cloud, is more than adequate.
My strategy with drive expansion/replacement is to add the third drive to the mirror, then (if the old drive is still working) split it off and keep it as an archive; otherwise just remove the dead disk afterwards.
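Roughly, with made-up pool/device names:

```shell
# da3 joins the existing mirror as a third side; resilver starts
zpool attach tank da1 da3
# Wait until the resilver completes
zpool status tank
# Peel da3 off into its own single-disk pool to shelve as the archive
zpool split tank archive da3
```

The split pool is left unimported by default, so the disk can just be pulled and put on the shelf.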
This is my usual experience: tens of thousands of hours of runtime and fewer than a dozen start/stops. I have indeed read from multiple sources that frequent power cycling is to be avoided if possible.
I’m still unclear on whether messing with the spin up/spin down settings for the drives is worthwhile, but I think that’s probably a topic for another thread.