Lower read speed than usual

havard · December 11, 2024, 8:43pm

Hi There

We have four 1TB disks running in a pool. The disks have been in service for a few years. This weekend's scrub took longer than usual, and the replication speed is very low.

root@prod1:~# zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0B in 1 days 21:29:25 with 0 errors on Mon Dec  9 21:53:27 2024
config:

        NAME        STATE     READ WRITE CKSUM
        srv         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

This graph was for unlimited replication speed. The replication speed stays stable at 1M and lower.

This is exciting. I have never experienced this before. How would you start debugging? Can we tell which disk is bad?

mercenary_sysadmin · December 12, 2024, 5:21pm

Start up a nice heavy workload that’s running much slower than you think it should, then pop a new terminal and do a continuously updated iostat:

https://jrs-s.net/2019/06/04/continuously-updated-iostat/

Watch for one drive to have massively higher %util than the rest. That’s your flaky drive. (You can also look for discrepancies in wait time, but honestly my eye usually picks out the %util outlier the most easily, so that’s what I tend to recommend.)
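If picking out the outlier by eye gets tedious, the same check can be roughly automated. The sketch below is only an illustration: `flag_util_outlier` is a hypothetical helper name, the default factor of 2 is an arbitrary assumption, and it expects plain `iostat -x` output (no `--human`, C locale) on stdin. It finds the `%util` column by its header name and compares the busiest device against the average of the others.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and report whether
# one device has a much higher %util than the rest.
# $1 = how many times busier than the others counts as an outlier
#      (assumed default: 2; tune to taste).
flag_util_outlier() {
    awk -v factor="${1:-2}" '
        # Header row: find the %util column by name, because its position
        # differs between sysstat versions.
        { for (i = 1; i <= NF; i++) if ($i == "%util") { col = i; next } }
        # A blank line ends the device table.
        NF == 0 { col = 0; next }
        # Device rows: track the total and the busiest device.
        col && NF >= col {
            util = $col + 0
            sum += util; n++
            if (n == 1 || util > max) { max = util; dev = $1 }
        }
        END {
            if (n < 2) { print "need at least two devices"; exit 1 }
            rest = (sum - max) / (n - 1)   # average %util of the other devices
            if (max > 0 && max > factor * rest)
                printf "outlier: %s %%util=%.1f (others average %.1f)\n", dev, max, rest
            else
                printf "no clear outlier (max %s %%util=%.1f, others average %.1f)\n", dev, max, rest
        }'
}
```

With the heavy workload running, something like `LC_ALL=C iostat -xy sdb sdc sdd sde 5 1 | flag_util_outlier` would check one five-second sample of the four pool disks. Naming the devices keeps unrelated disks out of the comparison. A single sample can mislead, so it is worth repeating a few times before blaming a drive.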

havard · December 12, 2024, 8:33pm

Yes! I believe you are right.

Thanks Mr. Salter

apt install sysstat
watch -n 1 iostat -xy --human 1 1