I recently did a scrub on my root pool after I realized that I hadn’t done one for… uh… two years… Anyway, the results of the scrub look somewhat concerning:
  pool: zroot
 state: ONLINE
  scan: scrub repaired 8.09M in 0 days 00:09:43 with 0 errors on Mon Jul 17 18:04:06 2023
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            nvd0p4  ONLINE       0     0     0
            nvd1p4  ONLINE       0     0     0
            nvd2p4  ONLINE       0     0     0
            nvd3p4  ONLINE       0     0     0

errors: No known data errors
The drives are Samsung 970 EVO NVMe SSDs, attached to the PCIe slots via adapters. If I were seeing actual errors, the adapters would be the first thing I’d suspect, but that doesn’t seem to be the case. The server has ECC RAM, so I don’t think it’s a bit flip issue. And of course ZFS isn’t reporting any read, write, or checksum errors on the drives.
SMART tests on the drives show between 7 and 80 media errors on each of the drives, though I’ve read somewhere that on the EVO series those could have been caused by unsafe shutdowns, of which apparently there have been 9. So maybe that’s what’s causing the data to need to be repaired? I don’t think the drives are going bad, but I can’t say for sure.
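To compare those two counters side by side across the drives, something like the following could work. It is only a sketch: it assumes smartmontools 7.0 or later (for `smartctl -j` JSON output), that the NVMe health log shows up under the `nvme_smart_health_information_log` key with `media_errors` and `unsafe_shutdowns` fields, and that the controllers are at `/dev/nvme0` through `/dev/nvme3` (those device paths are a guess, adjust as needed).

```python
import json
import subprocess


def nvme_counters(report: dict) -> dict:
    """Pull the media-error and unsafe-shutdown counters out of a parsed
    `smartctl -j -a` report. Missing fields come back as None."""
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "media_errors": log.get("media_errors"),
        "unsafe_shutdowns": log.get("unsafe_shutdowns"),
    }


def read_report(device: str) -> dict:
    """Run smartctl on one device and parse its JSON output.
    smartctl's exit status is a bitmask, so a nonzero status is not
    treated as a failure here."""
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Assumed device paths; not taken from the pool output above.
    for device in ["/dev/nvme0", "/dev/nvme1", "/dev/nvme2", "/dev/nvme3"]:
        print(device, nvme_counters(read_report(device)))
```

If the media error counts roughly track the unsafe shutdown counts and stop growing between scrubs, that would fit the unsafe-shutdown explanation better than failing flash.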
So are the media errors likely to be the source of the corrupted data? 8 MB in two years isn’t much, I suppose, so should I just not worry about it?