Scrub repaired data but no errors in pool

ytk · July 18, 2023, 6:56am

I recently did a scrub on my root pool after I realized that I hadn’t done one for… uh… two years… Anyway, the results of the scrub look somewhat concerning:

  pool: zroot
 state: ONLINE
  scan: scrub repaired 8.09M in 0 days 00:09:43 with 0 errors on Mon Jul 17 18:04:06 2023
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    nvd0p4  ONLINE       0     0     0
	    nvd1p4  ONLINE       0     0     0
	    nvd2p4  ONLINE       0     0     0
	    nvd3p4  ONLINE       0     0     0

errors: No known data errors

The drives are Samsung 970 EVO NVMe SSDs, attached to the PCIe slots via adapters. If I were seeing actual errors, the adapters would be the first thing I’d suspect, but that doesn’t seem to be the case. The server has ECC RAM, so I don’t think it’s a bit flip issue. And of course no drive errors are being reported.

SMART tests on the drives show between 7 and 80 media errors on each of the drives, though I’ve read somewhere that on the EVO series those could have been caused by unsafe shutdowns, of which apparently there have been 9. So maybe that’s what’s causing the data to need to be repaired? I don’t think the drives are going bad, but I can’t say for sure.
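For anyone wanting to pull the same two counters off each drive, here is a rough sketch. It assumes smartmontools is installed and that the NVMe report uses the labels "Media and Data Integrity Errors" and "Unsafe Shutdowns"; the helper name `nvme_health` and the device path in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` text for an NVMe drive on stdin
# and prints the media error and unsafe shutdown counters on one line.
nvme_health() {
    awk -F: '
        # Keep only the digits after the colon (drops padding and any commas)
        /^Media and Data Integrity Errors/ { gsub(/[^0-9]/, "", $2); media = $2 }
        /^Unsafe Shutdowns/               { gsub(/[^0-9]/, "", $2); unsafe = $2 }
        END { printf "media_errors=%s unsafe_shutdowns=%s\n", media, unsafe }
    '
}
```

Usage would be something like `smartctl -a /dev/nvme0 | nvme_health`, where `/dev/nvme0` is an assumed device name. Running it on each drive before and after a scrub shows whether the media error count is still climbing.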

So are the media errors likely to be the source of the corrupted data? 8MB in two years isn’t much I suppose, so should I just not worry about it?

mercenary_sysadmin · July 19, 2023, 5:33pm

Don’t worry about it. Just keep an eye on zpool status, whether manually or by running sanoid --monitor-health (if you’re a sanoid user), to make sure you catch it when one of those EVOs really poops the bed on you.
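If you'd rather script the "keep an eye on zpool status" part by hand, a minimal sketch might look like the following. It only understands the plain-text layout shown in the post above (a `scan: scrub repaired ...` line and five-column device rows); the function name `zpool_summary` is hypothetical, and this is not how sanoid does its health check.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` text on stdin and prints a
# one-line summary. Exit status 1 means something needs attention.
zpool_summary() {
    awk '
        # Scan line, e.g. "scan: scrub repaired 8.09M in ... with 0 errors on ..."
        $1 == "scan:" && $2 == "scrub" && $3 == "repaired" {
            repaired = $4
            for (i = 5; i < NF; i++)
                if ($i == "with") scrub_errors = $(i + 1)
        }
        # Device rows: NAME STATE READ WRITE CKSUM, with counters in columns 3-5
        NF == 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ {
            if ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0")
                bad = bad " " $1
        }
        END {
            printf "repaired=%s scrub_errors=%s bad_devices=%s\n",
                repaired, scrub_errors, (bad == "" ? "none" : substr(bad, 2))
            exit (bad != "" || scrub_errors + 0 > 0)
        }
    '
}
```

Run it as `zpool status zroot | zpool_summary`. A nonzero repaired amount with zero scrub errors and no flagged devices is the situation in the original post: the scrub found damaged blocks and rebuilt them from parity.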

And make sure you’ve got a cron job scheduled to scrub bi-weekly; modern OSes tend to install those for you, but if you’ve had this one for a while, you may have started out with an older OpenZFS that didn’t install one for you. If this is a Linux system, try grep -ir scrub /etc/cron.d to look for an existing cron job and, uh… whatever the heck you do to look for systemd scheduled recurring tasks; I never can remember that off the top of my head.
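A rough sketch of that search, covering both cron and systemd timers. `systemctl list-timers --all` is the standard way to list systemd timers; the default cron paths below are common Linux locations and may not all exist on a given system, and the function name `find_scrub_jobs` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list anything that looks like a scheduled scrub.
# Arguments are cron locations to search; with none, common Linux paths
# are used.
find_scrub_jobs() {
    [ "$#" -gt 0 ] || set -- /etc/cron.d /etc/crontab /etc/cron.daily \
        /etc/cron.weekly /etc/cron.monthly
    # -s keeps grep quiet about paths that do not exist
    grep -irs scrub "$@"
    # systemd timers, only if systemd is present
    if command -v systemctl >/dev/null 2>&1; then
        systemctl list-timers --all 2>/dev/null | grep -i scrub
    fi
    return 0
}
```

Call it with no arguments to search the default locations. No output means nothing obvious is scheduled. The `nvd` device names in the original post suggest FreeBSD rather than Linux; if so, scrubs are normally scheduled through periodic(8) instead, so `grep scrub /etc/periodic.conf /etc/defaults/periodic.conf` is the place to look.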

If you can’t find an existing cron job or systemd task to run scrub bi-weekly, create one yourself. It’s important!
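As a starting point, a sketch that prints an `/etc/cron.d`-style entry. Two assumptions: "bi-weekly" is read as roughly every two weeks (here the 1st and 15th at 03:00), and `zpool` lives at `/sbin/zpool`, which you should confirm with `command -v zpool`. The function name `scrub_cron_entry` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a system crontab entry (with a user field)
# that scrubs the given pool at 03:00 on the 1st and 15th of each month.
# Assumption: zpool is installed at /sbin/zpool.
scrub_cron_entry() {
    pool=${1:-zroot}
    printf '# Scrub %s twice a month\n' "$pool"
    printf '0 3 1,15 * * root /sbin/zpool scrub %s\n' "$pool"
}
```

As root, something like `scrub_cron_entry zroot > /etc/cron.d/zfs-scrub` would install it. Adjust the schedule to taste, and check `zpool status` after the first scheduled run to confirm it actually fired.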

In the meantime, you’re fine: there are very few errors and you’ve got dual parity. As long as you fix your “forgot to scrub” issue and keep an eye on disk health moving forward, you’re in good shape.