Not sure if this is the right place to ask; if it isn't, can someone point me elsewhere?
I just built a pool on FreeBSD 12.4-RELEASE-p4 and did a "zfs send -R" of an existing pool over the network, using netcat, to a "zfs recv -Fdv" on the new server. That all went perfectly, but when I scrubbed the new pool I noticed something odd: somehow the scrub is causing steady writing to the disks as it works.
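For reference, this is roughly the pair of commands I used; the snapshot name, hostname, and port below are placeholders rather than the exact values:

On the new server (receive side):
nc -l 8023 | zfs recv -Fdv sas15k

On the old server (send side):
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | nc newserver 8023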
There is absolutely no outside activity to the pool yet; the zfs filesystems aren't even mounted. When not scrubbing, gstat shows absolutely no disk activity. When I start the scrub I see a steady write of roughly 30-35MB/sec. It starts during the initial scanning phase and continues the entire time, through the issuing phase and even after scanning is done and it's only issuing. This does NOT happen on the server the pool was sent from, which is running an admittedly older FreeBSD 12.3-RELEASE-p5, and I've never seen it on any other ZFS implementation, although admittedly those pools are usually in service, so you do see writes; they're just not steady like this.
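For concreteness, this is roughly how I'm kicking it off and watching it (standard commands, nothing exotic):

zpool scrub sas15k     # the steady ~30-35MB/sec of writes starts almost immediately
gstat -p               # physical providers only; completely idle until the scrub starts
zpool status sas15k    # the writes persist through both the scanning and issuing phases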
Stop the scrub and this write activity stops as well; I'm 100% positive the scrub is causing it. The scrub has finished perfectly fine several times. There are absolutely no disk errors in dmesg, and SMART shows nothing of concern: not a single UCE, grown bad sector, or logged sector rewrite/reassignment, and not even any substantial number of error-correction invocations. These disks are all completely healthy; I cherry-picked them specifically for this array based on SMART stats and very thorough testing.
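To be concrete about the health checks, this is the sort of thing I ran against every member disk (the device node here is just an example):

zpool scrub -s sas15k    # cancel the scrub; the write activity stops immediately
smartctl -x /dev/da2     # full SMART/defect/error-log output for one of the SAS disks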
gstat shows similar numbers, but here is a sample of what "zpool iostat -v 5" shows for basically the whole time the scrub is running.
                            capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
------------------------  -----  -----  -----  -----  -----  -----
rpool                      28.5G  98.5G      0      0      0      0
  mirror                   28.5G  98.5G      0      0      0      0
    gpt/os0                    -      -      0      0      0      0
    gpt/os1                    -      -      0      0      0      0
------------------------  -----  -----  -----  -----  -----  -----
sas15k                     2.64T  3.33T  6.86K  1.09K   598M  37.5M
  mirror                    245G   311G    676     89  55.7M  3.28M
    diskid/DISK-0XV339ZJ       -      -     60     46  55.2M  3.28M
    diskid/DISK-0XGXWS2P       -      -     59     46  55.8M  3.28M
  mirror                    246G   310G    666     82  55.1M  3.60M
    diskid/DISK-0XV0Z03H       -      -     57     43  54.7M  3.60M
    diskid/DISK-0XH9GVYP       -      -     61     42  55.5M  3.60M
  mirror                    246G   310G    663     86  57.2M  3.87M
    diskid/DISK-0XV12PRL       -      -     61     38  57.1M  3.87M
    diskid/DISK-0XV14YJH       -      -     61     39  57.7M  3.87M
  mirror                    245G   311G    514    117  52.6M  4.40M
    diskid/DISK-0XV0NP3H       -      -     55     46  52.6M  4.40M
    diskid/DISK-0XH9GV2P       -      -     64     47  52.2M  4.40M
  mirror                    245G   311G    607     82  51.1M  3.31M
    diskid/DISK-0XV0NMLH       -      -     54     38  51.0M  3.31M
    diskid/DISK-0XV0R2SH       -      -     57     38  52.0M  3.31M
  mirror                    245G   311G    614     66  55.1M  2.68M
    diskid/DISK-0XV3U9GJ       -      -     59     32  56.1M  2.68M
    diskid/DISK-0XV0VYLL       -      -     58     31  55.1M  2.68M
  mirror                    247G   309G    792     76  53.7M  2.68M
    diskid/DISK-0XV0PHHH       -      -     57     39  53.6M  2.68M
    diskid/DISK-0XV13K0H       -      -     60     39  55.3M  2.68M
  mirror                    246G   310G    647    139  54.5M  3.32M
    diskid/DISK-0XV0W4TH       -      -     58     60  54.5M  3.32M
    diskid/DISK-0XH2VLXV       -      -     58     59  54.7M  3.32M
  mirror                    245G   311G    708    142  55.1M  3.12M
    diskid/DISK-0XV450MJ       -      -     58     59  54.9M  3.12M
    diskid/DISK-0XV182KH       -      -     58     58  55.1M  3.12M
  mirror                    245G   311G    558    146  53.5M  3.83M
    diskid/DISK-0XV0YJXJ       -      -     57     61  53.9M  3.83M
    diskid/DISK-0XV0P03H       -      -     57     61  53.5M  3.83M
  mirror                    245G   311G    571     90  54.3M  3.46M
    diskid/DISK-0XV3KN8L       -      -     68     46  54.3M  3.46M
    diskid/DISK-0XV0NWDH       -      -     58     45  54.4M  3.46M
------------------------  -----  -----  -----  -----  -----  -----
If it matters, the topology of both the sending and receiving pools is a set of mirror vdevs: 8 vdevs in the source pool and 11 in the destination. The pool is used for NFS VMware datastores and fibre channel zvol targets.
This is the only thing I've been able to find that talks about anything similar, and there are no real answers in it: Why does 'zfs scrub' write to disk every 5 seconds? - Unix & Linux Stack Exchange