Today I ran a SMART test and almost immediately afterwards my pool reported data corruption.
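For context, the SMART test was a long self-test started with something like this (the device paths here are placeholders for the two SSDs in the mirror, not my exact ones):

smartctl -t long /dev/sda   # extended offline self-test on the first SSD
smartctl -t long /dev/sdb   # and the second
smartctl -a /dev/sda        # check the results once the test completes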
⚡ ~ zpool status -xv
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Oct 21 20:52:07 2023
        172G scanned at 0B/s, 484M issued at 2.55M/s, 172G total
        0B repaired, 0.27% done, no estimated completion time
config:

        NAME                                       STATE     READ WRITE CKSUM
        rpool                                      ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            ata-CT240BX500SSD1_1916E17EC2EE-part3  ONLINE       0     0     0
            ata-ADATA_SU750_2J2520052599-part3     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
It seems the metadata itself was corrupted. I had 44 CKSUM errors on both drives before running a zpool clear.
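For reference, what I ran was roughly this (reconstructed from memory, so the exact order may differ):

zpool status -v rpool    # showed the 44 CKSUM errors on each mirror member
zpool clear rpool        # reset the error counters
zpool scrub rpool        # the scrub whose progress is shown above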
Thankfully I run Syncoid daily to copy rpool (SSD mirror) over to tank (RAIDZ-2); the resulting snapshots are listed below, and the job itself is sketched after the listing:
⚡ ~ zfs list -t snapshot -o name | grep syncoid_rpool-to-tank
rpool@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:02-GMT-04:00
rpool/ROOT@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:25-GMT-04:00
rpool/ROOT/pve-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:48-GMT-04:00
rpool/data@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:36:19-GMT-04:00
rpool/data/subvol-101-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:36:41-GMT-04:00
rpool/data/subvol-104-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:43:17-GMT-04:00
rpool/data/subvol-105-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:51:07-GMT-04:00
rpool/data/subvol-114-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:53:31-GMT-04:00
rpool/data/subvol-116-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:04:04:46-GMT-04:00
rpool/data/vm-100-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:04:08:04-GMT-04:00
rpool/data/vm-102-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:24:09-GMT-04:00
rpool/data/vm-103-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:33:22-GMT-04:00
rpool/images@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:33:47-GMT-04:00
tank/backup/zfs/local/rpool@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:02-GMT-04:00
tank/backup/zfs/local/rpool/ROOT@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:25-GMT-04:00
tank/backup/zfs/local/rpool/ROOT/pve-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:25:48-GMT-04:00
tank/backup/zfs/local/rpool/data@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:36:19-GMT-04:00
tank/backup/zfs/local/rpool/data/subvol-101-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:00:36:41-GMT-04:00
tank/backup/zfs/local/rpool/data/subvol-104-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:43:17-GMT-04:00
tank/backup/zfs/local/rpool/data/subvol-105-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:51:07-GMT-04:00
tank/backup/zfs/local/rpool/data/subvol-114-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:03:53:31-GMT-04:00
tank/backup/zfs/local/rpool/data/subvol-116-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:04:04:46-GMT-04:00
tank/backup/zfs/local/rpool/data/vm-100-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:04:08:04-GMT-04:00
tank/backup/zfs/local/rpool/data/vm-102-disk-1@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:24:09-GMT-04:00
tank/backup/zfs/local/rpool/data/vm-103-disk-0@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:33:22-GMT-04:00
tank/backup/zfs/local/rpool/images@syncoid_rpool-to-tank_cleteServer_2023-10-21:05:33:47-GMT-04:00
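The nightly job is roughly the following. The invocation is reconstructed from the snapshot names, and the crontab path and schedule are illustrative, so the exact flags may differ:

# root crontab entry, runs every night at 00:25
25 0 * * * /usr/sbin/syncoid --recursive --identifier=rpool-to-tank rpool tank/backup/zfs/local/rpool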
My questions are:
1. Is there any way to fix metadata corruption? I assume not.
2. What is the best way for me to restore the pool to a healthy state?
2a. Do I create a brand-new pool, totally destroying rpool, and then sync over the datasets I pasted above?
2b. How do I ensure all the properties are copied from the backup too? (A rough sketch of what I have in mind follows.)
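What I have in mind for 2a/2b, after booting a live USB with ZFS support, is roughly the following. This is only a sketch, not a tested procedure: the ashift value, the @restore snapshot name, and the receive flags are assumptions, and since this is the Proxmox boot pool the bootloader would presumably also need to be reinstalled (e.g. with proxmox-boot-tool) afterwards.

# import the backup pool from the live environment
zpool import -f tank

# destroy and recreate the boot pool on the same mirror partitions
zpool destroy rpool
zpool create -f -o ashift=12 rpool mirror \
    /dev/disk/by-id/ata-CT240BX500SSD1_1916E17EC2EE-part3 \
    /dev/disk/by-id/ata-ADATA_SU750_2J2520052599-part3

# take one recursive snapshot of the backup tree so every dataset
# shares a common snapshot name for the replication stream
zfs snapshot -r tank/backup/zfs/local/rpool@restore

# -R replicates descendant datasets, their snapshots, and properties;
# -u keeps the received datasets unmounted inside the live system
# (this may need per-dataset receives if receiving into the pool root is refused)
zfs send -R tank/backup/zfs/local/rpool@restore | zfs receive -F -u rpool

Is something like that sane, or would running Syncoid in the reverse direction be the better approach?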
PS: This is my boot pool, so I will have to boot from a USB drive to do the restoration.