A single-disk pool used as a remote backup began reporting errors in zpool status:
  pool: rent
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 11:12:15 with 0 errors on Sat Mar 29 21:40:37 2025
config:

        NAME                          STATE     READ WRITE CKSUM
        rent                          ONLINE       0     0     0
          ata-QEMU_HARDDISK_ZVT6TCH0  ONLINE     160     6     0

errors: No known data errors
I ran a scrub, which reported that it repaired nothing (0B, 0 errors).
This is the SECOND time this has happened. The first was about two weeks ago: I ran a scrub then too, it didn’t repair any errors, and I ran zpool clear on the device.
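Since this has now happened twice, one way to see whether the READ/WRITE counters are still climbing is to save the zpool status output after each scrub or zpool clear and compare the device rows. Below is a minimal Python sketch of that parsing step, assuming the text layout shown above; parse_zpool_errors and SAMPLE are hypothetical names, not part of any ZFS tool. It skips rows whose counters zpool has abbreviated (such as "1.2K"); zpool status -p prints exact values instead.

```python
import re
from typing import Dict, Tuple

# A device row in the "config:" table: NAME STATE READ WRITE CKSUM.
# Assumption: counters are plain integers; abbreviated counts such as
# "1.2K" do not match and are silently skipped by this sketch.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_zpool_errors(status_text: str) -> Dict[str, Tuple[str, int, int, int]]:
    """Return {name: (state, read, write, cksum)} for each row of the config table."""
    devices: Dict[str, Tuple[str, int, int, int]] = {}
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
        elif stripped.startswith("errors:"):
            break  # the config table ends at the "errors:" line
        elif in_config:
            m = ROW.match(line)  # the NAME/STATE header row does not match
            if m:
                name, state, r, w, c = m.groups()
                devices[name] = (state, int(r), int(w), int(c))
    return devices


# The config section from the zpool status output in this post.
SAMPLE = """\
config:

        NAME                          STATE     READ WRITE CKSUM
        rent                          ONLINE       0     0     0
          ata-QEMU_HARDDISK_ZVT6TCH0  ONLINE     160     6     0

errors: No known data errors
"""

# Report only rows with a nonzero counter.
for name, (state, r, w, c) in parse_zpool_errors(SAMPLE).items():
    if r or w or c:
        print(f"{name}: {state} read={r} write={w} cksum={c}")
```

Run against the sample, this prints a single line, `ata-QEMU_HARDDISK_ZVT6TCH0: ONLINE read=160 write=6 cksum=0`, because the pool-level row is all zeros. Diffing two such snapshots taken a few days apart would show whether new errors are accumulating after a clear.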
Output of smartctl:
$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: QEMU HARDDISK
Serial Number: ZVT6TCH0
Firmware Version: 2.5+
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Sector Size: 512 bytes logical/physical
TRIM Command: Available, deterministic
Device is: Not in smartctl database 7.3/5319
ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is: Sun Mar 30 04:45:11 2025 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82)  Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)  The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                 ( 288)  seconds.
Offline data collection
capabilities:                    (0x19)  SMART execute Offline immediate.
                                         No Auto Offline data collection support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         No Conveyance Self-test supported.
                                         No Selective Self-test supported.
SMART capabilities:            (0x0003)  Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01)  Error logging supported.
                                         No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2)  minutes.
Extended self-test routine
recommended polling time:        (  54)  minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       16
  4 Start_Stop_Count        0x0002   100   100   020    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0003   100   100   000    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0003   069   069   050    Pre-fail  Always       -       31 (Min/Max 31/31)
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported
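The smartctl dump above reports Device Model "QEMU HARDDISK", firmware 2.5+ and Power_On_Hours of 1. That suggests these values may describe a virtual disk presented by QEMU rather than the physical 18 TB drive behind it, in which case they would say little about the drive's health. The Python sketch below parses the attribute table, flags any attribute whose normalized VALUE has fallen to or below a nonzero THRESH (how smartctl decides WHEN_FAILED), and applies a simple "QEMU in the model string" check. The function names and the QEMU heuristic are illustrative assumptions, not smartmontools features.

```python
import re
from typing import List, NamedTuple, Optional


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


# One row of smartctl's ATA attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.*)$"
)


def parse_attributes(text: str) -> List[Attr]:
    """Extract every attribute row; other lines of the dump are ignored."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            i, name, v, w, t, raw = m.groups()
            attrs.append(Attr(int(i), name, int(v), int(w), int(t), raw.strip()))
    return attrs


def failing(attrs: List[Attr]) -> List[Attr]:
    # Failed means the normalized VALUE is at or below a nonzero THRESH;
    # a threshold of 000 never trips.
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


def device_model(text: str) -> Optional[str]:
    m = re.search(r"^Device Model:\s*(.+)$", text, re.MULTILINE)
    return m.group(1).strip() if m else None


# Excerpt of the smartctl output in this post.
SAMPLE = """\
Device Model: QEMU HARDDISK
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       16
  4 Start_Stop_Count        0x0002   100   100   020    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0003   100   100   000    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0003   069   069   050    Pre-fail  Always       -       31 (Min/Max 31/31)
"""

model = device_model(SAMPLE)
if model and "QEMU" in model:
    # Heuristic (assumption): an emulated disk's SMART data describes the
    # virtual device, not the physical drive behind it.
    print(f"warning: model is {model!r}; SMART data may be emulated")
for a in failing(parse_attributes(SAMPLE)):
    print(f"FAILING: {a.id} {a.name} value={a.value} thresh={a.thresh}")
```

On this sample no attribute is at or below its threshold, so only the emulation warning prints. Running the same check against smartctl output collected on the host that owns the physical disk would be the more meaningful test.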
What next steps could I take to solve this? Could this be a cabling issue? The errors are in the READ and WRITE columns but not CKSUM; what is that telling me? Are read/write failures logged even when a retry then succeeds?
Or is this disk, which I bought as a decertified drive from ServerPartsDeals last fall, maybe on its way out?
Thanks!