What's your test / burn-in process for new disks?

I recently picked up a few drives and figured I ought to run through a few tests before putting data on them. Recommendations seem to span everything from just checking SMART data to torturing drives for weeks at a time to see if they fail in warranty. What is your burn-in procedure, and do you use a different method for new vs. used disks?

When I buy refurbished HDDs I run a complete destructive badblocks scan on them. New drives just get put into service. If they're going into a lightly populated pool I might do a simple test first: create a single-disk pool, load it up with files, and then scrub it, just to exercise it a bit.

I've gotten some SSDs that turned out to be pulls (128 GB NVMe SSDs for cheap on Amazon) and had a couple of hours on them. For those I performed a "light" destructive badblocks scan.

NB: Between ZFS and backups, a single drive failure will not cost me any data.


Before trusting a new or repurposed drive with my data, or secure erasing used disks, I perform the following steps:

  1. smartctl to create a baseline
  2. shred one-pass with random data
  3. badblocks alternating four write passes with four read verification passes
  4. blkdiscard for full disk trim on SSD
  5. smartctl to compare with baseline

NOTE: badblocks defaults to a 1024-byte block size and stores block numbers as 32-bit values, which limits it to 4 TB disks. Increasing the block size from 1024 bytes to 1048576 bytes (1 MiB) allows very large disks. Using the default block size with an 8 TB disk throws the following error:
badblocks: Value too large for defined data type invalid end block (7814026584): must be 32-bit value
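The 4 TB ceiling follows directly from that 32-bit block counter; a quick shell-arithmetic sketch (the 8 TB figure is decimal bytes, as disks are marketed) shows why a 1 MiB block size fixes it:

```shell
# Why badblocks' default 1024-byte block size fails on an 8 TB disk:
# badblocks stores block numbers as 32-bit values, so the highest block
# number must fit in 2^32 - 1.
disk_bytes=$((8 * 1000 * 1000 * 1000 * 1000))   # 8 TB, decimal
limit=$((2**32 - 1))

blocks_1k=$((disk_bytes / 1024))       # default -b 1024: too many blocks
blocks_1m=$((disk_bytes / 1048576))    # -b 1048576: comfortably in range

echo "1 KiB blocks: $blocks_1k (limit $limit)"
echo "1 MiB blocks: $blocks_1m"
```

With 1 KiB blocks an 8 TB disk needs about 7.8 billion block numbers, well past the 32-bit limit; with 1 MiB blocks it needs under 8 million.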

sudo smartctl -x /dev/sdX ; date -Iseconds
sudo bash -c 'device=sdX && time shred -n1 /dev/$device && time badblocks -b 1048576 -wvs /dev/$device && time blkdiscard -v /dev/$device ; date -Iseconds'
sudo smartctl -x /dev/sdX ; date -Iseconds
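The command lines above can be wrapped into one small script. This is only a sketch (the DRY_RUN guard and positional device argument are my additions, not part of the original one-liner); it defaults to printing the commands rather than running them, so a stray copy-paste can't wipe a disk:

```shell
#!/bin/sh
# Sketch: the five burn-in steps as one script. DRY_RUN=1 (the default
# here) only prints each command instead of executing it.
device="${1:-sdX}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run smartctl -x "/dev/$device"                  # 1. SMART baseline
run shred -n1 "/dev/$device"                    # 2. one pass of random data
run badblocks -b 1048576 -wvs "/dev/$device"    # 3. 4 write + 4 read passes
run blkdiscard -v "/dev/$device"                # 4. full-disk TRIM (SSD only)
run smartctl -x "/dev/$device"                  # 5. compare with baseline
```

Set DRY_RUN=0 and pass the real device name only once you're certain the disk holds nothing you need.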

I use the recommendations of Alex at Perfect Media Server, which leverages this script by spearfoot on GitHub.

That script runs badblocks plus a SMART short test and long test. You should absolutely run it in tmux!


A lot of important and great tips have already been given; I just want to add that I never trust SMART, unless it says something bad. Then that is probably true.

I have seen so many disks with obvious defects, such as "hanging" from time to time, "clicking", being partly slow, or even causing ATA timeouts, that still claimed via SMART that everything was fine and all self-tests passed. So I don't trust SMART at all. This is bad and sometimes makes it difficult to find out why a pool is slow. Maybe it is done to avoid warranty claims ("the disk itself reports an error, please give me another one!"), but the same can be observed even on enterprise disks with SLAs.

(That's why ZFS is so useful: it does NOT trust the disk's data and always maintains its own checksums, so even if the device is lying, as with SMART, ZFS detects this and protects your data on the other disks!)

So my test is (after badblocks and all that): LOAD! Put as much load on the disks as you can for hours, monitor I/O and bandwidth the whole time (they should be the same on each disk and constant), and discard any disk that does not hold up perfectly. Or use RAID-Z3 or higher and live with a little slowness before each future failure, which might be an option for smaller budgets.
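A minimal sketch of such a load test, run here against temporary files standing in for the disks (on real hardware you would point dd read-only at each /dev/sdX and watch iostat or zpool iostat alongside):

```shell
# Crude parallel sequential-read load test. Two scratch files stand in
# for disks; healthy disks should all show similar, steady throughput.
tmpdir=$(mktemp -d)
for disk in a b; do
    dd if=/dev/zero of="$tmpdir/disk_$disk" bs=1M count=64 status=none
done

# Hammer all "disks" at once; dd reports its throughput on stderr.
for disk in a b; do
    dd if="$tmpdir/disk_$disk" of=/dev/null bs=1M &
done
wait
rm -r "$tmpdir"
```

One disk lagging far behind the others under identical load is exactly the kind of defect that SMART, as described above, often fails to report.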

I use a script to dd zeros to the entire drive, followed by a SMART long test.
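A sketch of that zero-fill approach, demonstrated on a scratch file rather than a real drive (for an actual disk you would write to /dev/sdX, then start the long self-test with smartctl -t long and read the result later with smartctl -a):

```shell
# Zero-fill a scratch file, then verify the zeros actually landed by
# comparing the file byte-for-byte against /dev/zero.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=16 status=none

cmp -n $((16 * 1024 * 1024)) "$img" /dev/zero && echo "zero-fill verified"
rm "$img"
```

The read-back comparison is the part dd alone doesn't give you; without it, a write that silently failed would go unnoticed.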

First SMART, then I ran a badblocks script on 9 drives at once, then SMART again. On 14 TB drives badblocks took about a week.

Hindsight has shown this was not necessary: 6 months in, the 8 drives in my pool are fine (the 9th was a cold spare). But I did not know that going in, and "torturing them for a week" gave me confidence that I was not going to encounter "infant mortality" on my new drives; see the bathtub curve from Backblaze.
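That week-long runtime is plausible from back-of-the-envelope arithmetic, assuming (my assumption, not the poster's numbers) a sustained ~200 MB/s: a full badblocks -w run makes four write passes plus four read-verify passes, eight full passes over the disk in total:

```shell
# Rough runtime estimate for badblocks -w on a 14 TB drive.
disk_bytes=$((14 * 1000 * 1000 * 1000 * 1000))  # 14 TB, decimal
mbps=200                                         # assumed sustained MB/s
passes=8                                         # 4 write + 4 read-verify

seconds_per_pass=$((disk_bytes / (mbps * 1000 * 1000)))
total_hours=$((seconds_per_pass * passes / 3600))
echo "~${total_hours} hours total"
```

That comes out to roughly 155 hours, about six and a half days, which matches "about a week".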

New to the forum and have been lurking. I felt I would like to give my input on this post.

I have recently purchased two refurbished enterprise hard drives to increase my pool size. I did not want to run badblocks for 4 passes, so I chose one pass in destructive mode with random data and checked SMART pre- and post-scan.

badblocks -b 4096 -wsvt "random" /dev/sda

I feel this is "good enough", and my data is stored in the best file system (a ZFS mirror vdev), so I do not worry about the drive sending bad data back. All data is backed up elsewhere, so I am happy with one pass. I only buy refurbished enterprise drives with less than 6 years of use.