Hi, I have two systems where I observed the same behavior. One is a simple 2-disk mirror, the other is a 60-disk server configured as several raidz groups. In a sequential buffered write workload, the write runs for several seconds at RAM speed, then stalls completely for some tens of seconds, then starts again with the speed climbing back up, then stalls, repeating this pattern. It seems the system buffers too many writes before it starts flushing to disk, and once the buffer is full it has to stop the writer, but in an inconvenient way. Which parameters should I check?
What you’re seeing is expected behavior: the tunable txg_timeout determines how long your system will buffer writes before committing them to the metal. By default, this is 5 seconds (down from a default of thirty seconds in earlier ZFS versions).
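If you want to confirm what your system is actually using, here is a minimal sketch, assuming OpenZFS on Linux, where the value is exposed as a module parameter under /sys; other platforms expose it differently (e.g. as a sysctl on FreeBSD):

```python
# Minimal sketch: read the current txg timeout on OpenZFS/Linux.
# The sysfs path below is the standard module-parameter location;
# adjust it if your platform exposes ZFS tunables elsewhere.
from pathlib import Path

param = Path("/sys/module/zfs/parameters/zfs_txg_timeout")
print(f"zfs_txg_timeout = {param.read_text().strip()} seconds")
```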
For the vast majority of workloads, this is a good thing. It keeps a single large sequential write from bringing the entire system to a halt by exhausting all available IOPS.
You can experiment with different values of txg_timeout on your system if you like, but be warned that there be dragons down that path. I do not advise messing with it light-heartedly: pay attention to the actual results, document what you did, and make sure you can remember later whether a problem you run into might have been caused by your change.
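As a sketch of how you might keep such an experiment documented and reversible, assuming OpenZFS on Linux (where the parameter is writable at runtime through sysfs, requires root, and does not survive a reboot) and a hypothetical log file name of your choosing:

```python
# Sketch of a documented, reversible zfs_txg_timeout experiment.
# Assumes OpenZFS on Linux; run as root. The change is runtime-only
# and reverts to the module default on reboot.
import datetime
from pathlib import Path

PARAM = Path("/sys/module/zfs/parameters/zfs_txg_timeout")
LOG = Path("txg_timeout_experiments.log")  # hypothetical log file

def set_txg_timeout(seconds: int) -> None:
    """Set the timeout and append a timestamped record of the change."""
    old = PARAM.read_text().strip()
    PARAM.write_text(str(seconds))
    with LOG.open("a") as log:
        log.write(
            f"{datetime.datetime.now().isoformat()} "
            f"zfs_txg_timeout: {old} -> {seconds}\n"
        )

# Example: try a longer interval for a throughput test, then restore it.
set_txg_timeout(10)
# ... run your benchmark here ...
set_txg_timeout(5)  # back to the default
```

If you do settle on a non-default value and want it to persist across reboots, the usual route on Linux is a module option (e.g. an `options zfs zfs_txg_timeout=...` line in a file under /etc/modprobe.d/), but I'd only do that after you're satisfied with the runtime results.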
In general, higher values of txg_timeout lead to higher maximum throughput at the expense of higher latency; lower values of txg_timeout lead to much lower latency at the expense of lower maximum throughput.
Thanks, I thought maybe there’s an upper limit to buffered bytes…