Hello everyone,
since setting up a new server I've been experiencing some strange behavior.
zpool status shows permanent errors in my snapshots, but no read/write/checksum errors are reported for the drives:
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:06:10 with 0 errors on Sun Oct 1 00:51:49 2023
config:

        NAME                                                         STATE     READ WRITE CKSUM
        rpool                                                        ONLINE       0     0     0
          mirror-0                                                   ONLINE       0     0     0
            nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0W800973M-part4  ONLINE       0     0     0
            nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0W820198A-part4  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/vm/vzsmb01@autosnap_2023-10-01_05:00:14_hourly:<0x0>
        rpool/ROOT/ubuntu/opt/unifi-controller@autosnap_2023-10-01_19:00:04_hourly:<0x0>
        rpool/ROOT/ubuntu/opt/nginx@autosnap_2023-10-01_05:00:15_hourly:<0x0>
        rpool/ROOT/ubuntu@autosnap_2023-10-01_05:00:12_hourly:<0x0>
        rpool/ROOT/ubuntu/opt/thelounge@autosnap_2023-10-01_11:00:12_hourly:<0x0>
        rpool/ROOT/ubuntu/opt/nginx@autosnap_2023-10-01_04:00:04_hourly:<0x0>
The snapshots do indeed have some kind of error, as replication via syncoid fails with either a broken pipe or an I/O error.
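For context, the replication job is a plain syncoid run along these lines (host and target dataset below are placeholders, not my real ones):

# hypothetical example; actual host/dataset names differ
syncoid --recursive rpool/ROOT/ubuntu root@backuphost:backup/rpool/ROOT/ubuntu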
This is the third time this has happened. The first two times I fixed it by destroying the offending snapshots and scrubbing the pool twice, because the errors remained after the first scrub. zpool clear did nothing, even after deleting the offending snapshots and scrubbing once. Both times the scrub repaired 0B of data.
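For reference, the recovery both times looked roughly like this (one snapshot shown as an example; I destroyed all of the listed ones):

zfs destroy rpool/ROOT/ubuntu/opt/nginx@autosnap_2023-10-01_05:00:15_hourly
zpool clear rpool    # no effect, errors still listed
zpool scrub rpool    # first scrub: errors remain, 0B repaired
zpool scrub rpool    # second scrub: error list finally clean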
I want to find out what is going on, but my google-fu is failing me.
Here are the specs and logs for the machine in question:
System: Lenovo ThinkStation P330 Tiny (latest firmware)
CPU: Intel Core i7-9700T
RAM: G.SKILL 2x 32GB DDR4-2666 CL19 | F4-2666C19D-64GRS (tested extensively with MemTest86 & MemTest86+ before deploying at the start of September)
Disks: 2x Samsung 970 EVO Plus 2TB | MZ-V7S2T0BW (latest firmware)
OS: Ubuntu 22.04.3 LTS
Kernel: Ubuntu HWE 6.2.0-33-generic
zfs: 2.1.9-2ubuntu1.1
sanoid: 2.2.0 (Getopt::Long::GetOptions version 2.52; Perl version 5.34.0 | grabbed the .deb from mantic to get the new version)
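The hourly autosnaps in the error list come from sanoid; the policy is along these lines (paraphrased from memory, the exact retention counts may differ):

# sketch of the sanoid.conf in use; retention values are illustrative
[rpool]
        use_template = production
        recursive = yes

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes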
The system is a Docker and KVM host using root-on-ZFS with ZFSBootMenu and native encryption.
Exact command to create the pool:
zpool create -o ashift=12 -o autotrim=on -o autoexpand=on \
    -o compatibility=openzfs-2.1-linux \
    -O encryption=on \
    -O keylocation=file:///dev/disk/by-partlabel/rpool.key \
    -O keyformat=passphrase \
    -O xattr=sa -O acltype=posixacl -O compression=lz4 \
    -O dnodesize=auto -O normalization=formD -O atime=off \
    -O canmount=off -O mountpoint=/ \
    -R /mnt rpool mirror $DISK0-part4 $DISK1-part4
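For orientation, the dataset layout looks roughly like this (reconstructed from the affected snapshots above; not a complete zfs list):

rpool
rpool/ROOT/ubuntu                        # root filesystem
rpool/ROOT/ubuntu/opt/nginx              # per-service datasets under /opt
rpool/ROOT/ubuntu/opt/thelounge
rpool/ROOT/ubuntu/opt/unifi-controller
rpool/vm/vzsmb01                         # storage for the KVM guest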
My disk layout:
GPT fdisk (gdisk) version 1.0.8
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 3907029168 sectors, 1.8 TiB
Model: Samsung SSD 970 EVO Plus 2TB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): B5F94C51-45D6-4C9B-8D26-B325ABAEBB34
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 4061 sectors (2.0 MiB)
Number  Start (sector)    End (sector)  Size        Code  Name
   1              2048            2048  512 bytes   8300  rpool.key
   2              4096         4198399  2.0 GiB     EF00
   3           4198400        71307263  32.0 GiB    FD00
   4          71307264      3907029134  1.8 TiB     BF00
NAME          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
nvme0n1       259:0    0  1.8T  0 disk
|-nvme0n1p1   259:2    0  512B  0 part
|-nvme0n1p2   259:3    0    2G  0 part  /boot/efi1
|-nvme0n1p3   259:4    0   32G  0 part
| `-md0         9:0    0 63.9G  0 raid0 [SWAP]
`-nvme0n1p4   259:5    0  1.8T  0 part
nvme1n1       259:1    0  1.8T  0 disk
|-nvme1n1p1   259:6    0  512B  0 part
|-nvme1n1p2   259:7    0    2G  0 part  /boot/efi
|-nvme1n1p3   259:8    0   32G  0 part
| `-md0         9:0    0 63.9G  0 raid0 [SWAP]
`-nvme1n1p4   259:9    0  1.8T  0 part
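For quick reference, the partition roles from the layout above:

# p1  512 B    raw key partition (rpool.key, read via keylocation=file://...)
# p2  2 GiB    EFI system partitions (/boot/efi and /boot/efi1)
# p3  32 GiB   swap, striped across both disks as md0 (raid0)
# p4  1.8 TiB  ZFS partition, one mirror member of rpool per disk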
Any input is highly appreciated. Thank you in advance for your time.