First post here, to share a recent problem I encountered (and solved).
I bought some used Exos 16TB HDDs to add to an existing pool.
My routine before adding disks to a pool is to test them:
- long SMART test
- create a test pool and copy some data to it
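For anyone who wants to reproduce that routine, here is a rough sketch. It only prints the commands so they can be reviewed before being run by hand. The pool name testpool, the plain stripe layout, the scrub step and the device names are assumptions for illustration, not necessarily what I typed.

```shell
# Print (not run) a burn-in plan for the given disks.
# Assumptions: pool name "testpool", plain stripe layout, scrub after the copy.
burn_in_plan() {
    for d in "$@"; do
        echo "smartctl -t long $d"       # start the long SMART self-test
    done
    for d in "$@"; do
        echo "smartctl -l selftest $d"   # read the result once the test has finished
    done
    echo "zpool create testpool $*"      # throwaway pool on the new disks
    echo "# copy some data to /testpool here"
    echo "zpool scrub testpool"          # re-read everything that was written
    echo "zpool status -v testpool"      # look for read/write/checksum errors
    echo "zpool destroy testpool"        # clean up before the real use
}

# Hypothetical device names:
burn_in_plan /dev/sdX /dev/sdY
```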
The SMART test was OK for all 12 disks, but when I created the test pool and copied data to it, I got errors.
My setup is the following:
- Dell T7910 server (running Proxmox 8.4 with ZFS 2.2.7)
- SAS2308 HBA
- IBM DCS3700 storage shelf (60 LFF SATA/SAS2)
As all other disks in the same storage shelf were ok, I suspected the new disks themselves.
smartctl showed me that they were formatted with “Type 2 protection”, which stores a few additional bytes with each sector: for example, 512-byte sectors become 520-byte ones.
I did not find much about how this could cause problems with ZFS, except this blog post: https://www.junni.fi/disabling-type-2-protection-in-sas-hdds
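A small sketch of how one could check a batch of disks for this. It assumes smartctl -i prints a line of the form "Formatted with type N protection" on protected SAS disks (that is the wording I would expect, but check your own output). The helper name pi_type is made up for this example. It reads from stdin, so it can be tried without touching a disk.

```shell
# Print the T10 protection type found in `smartctl -i` output, or 0 if none.
# Assumption: the line looks like "Formatted with type 2 protection".
pi_type() {
    t=$(sed -n 's/.*[Ff]ormatted with type \([0-9]\) protection.*/\1/p' | head -n 1)
    echo "${t:-0}"
}

# Self-contained demo with a canned line instead of a real disk:
printf 'Formatted with type 2 protection\n' | pi_type   # prints 2

# Real usage (hypothetical device name):
#   smartctl -i /dev/sdX | pi_type
```

Protection information is 8 bytes per logical block, which is where 512 + 8 = 520 comes from.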
Sometimes old enterprise SAS disks come with Type 1 or 2 Protection enabled. This is based on T10 Data Protection Information specification, which aims to ensure end-to-end-data integrity for data storage by using different (520 byte) block size where the disk controller can use the extra bytes for parity checking etc. In some cases this configuration can result problems with software RAID setups, like ZFS.
So I reformatted my disks without this feature, as plain 4096-byte sectors, with sg_format -F -f 0 -v /dev/sdYOURDISK
I then recreated the test pool and copied data to it… without any errors.
I don’t know whether it is the combination of HBA/shelf/disks that causes the problem with ZFS, but I wanted to share this in case someone gets hit by the same problem.
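With a dozen disks it helps to generate the commands first and read them over, since a low-level format wipes the disk and can take many hours. The sketch below only prints them. It uses the long spellings of the same sg_format options (--format is -F, --fmtpinfo=0 is -f 0), plus sg_readcap --long as a possible follow-up check. The function name and device names are placeholders.

```shell
# Print (not run) the reformat and verify commands for each disk.
# WARNING: sg_format --format destroys all data on the target disk.
format_cmds() {
    for d in "$@"; do
        # --fmtpinfo=0 asks for a format without protection information
        echo "sg_format --format --fmtpinfo=0 --verbose $d"
        # afterwards, the long READ CAPACITY output should report protection as disabled
        echo "sg_readcap --long $d"
    done
}

# Hypothetical device names:
format_cmds /dev/sdX /dev/sdY
```

As far as I understand, sg_format keeps the current logical block size unless --size is given, so this does not by itself change 512 to 4096 or the reverse.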