I’m relatively new to ZFS; I’m familiar with the basics but haven’t used it for much more than my home lab. From what I’ve seen, recordsize is one of the more important tunables on a dataset, especially for workloads such as databases or VMs. The general wisdom seems to be to match it to the typical size of your IO operations, which makes sense.
The problem I’m facing is finding that typical size for an arbitrary workload. There are resources suggesting good recordsizes for various popular workloads, but there are plenty of workloads for which no such information exists. Generally, it looks like you need intimate knowledge of a workload’s implementation to determine the optimal recordsize, which isn’t always possible.
In my specific use case, I’m looking to store InfluxDB data on ZFS, but I can’t find much information on what typical IO looks like for this workload. I could (and probably still will) talk to InfluxDB experts and devs and read the source code to see if I can determine a typical write size. That’s a significant amount of effort, and I might still get it wrong; not to mention some workloads are opaque and closed-source enough that investigating their write characteristics isn’t an option at all.
I’m wondering whether there’s a more generic, empirical way to determine the typical IO size for a workload, given that I already have the ability to deploy it. Is there a program or filesystem utility (ZFS or otherwise) that logs and profiles the sizes of IO operations on a dataset/filesystem, so I can figure out the recordsize from a real usage scenario?
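To make it concrete, the kind of thing I have in mind is roughly this: trace the workload’s write syscalls for a while and bucket the sizes into powers of two. Below is a rough sketch of how I imagine doing that by running the workload under strace and post-processing the log with a small Python script. The file name writes.log, the syscall list, and the power-of-two bucketing are just my guesses, and I’m not even sure this is the right layer to measure, since what the application writes isn’t necessarily what ends up hitting the dataset.

```python
#!/usr/bin/env python3
"""Rough sketch: histogram write sizes from an strace log.

Assumes the workload was traced with something like
    strace -f -e trace=write,pwrite64,pwritev -o writes.log <workload command>
and that writes.log is the resulting trace file (the name is just a placeholder).
"""
import re
import sys
from collections import Counter

# Match the byte count returned by successful write-family syscalls, e.g.
#   pwrite64(5, "..."..., 4096, 131072) = 4096
WRITE_RE = re.compile(r"\b(?:write|pwrite64|pwritev)\(.*\)\s*=\s*(\d+)")


def bucket(size):
    """Round a write size up to the next power of two, recordsize-style."""
    b = 512
    while b < size:
        b *= 2
    return b


def main(path):
    buckets = Counter()
    with open(path) as f:
        for line in f:
            m = WRITE_RE.search(line)
            if m and int(m.group(1)) > 0:
                buckets[bucket(int(m.group(1)))] += 1

    total = sum(buckets.values()) or 1
    for size in sorted(buckets):
        count = buckets[size]
        print(f"{size:>9} B  {count:>8}  ({100 * count / total:5.1f}%)")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "writes.log")
```

If something along these lines is a sane approach, great, but if there’s an existing tool that already does this properly at the filesystem or block layer, I’d much rather use that.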
I can see value in deploying a workload on an untuned ZFS dataset, or on a non-ZFS storage solution, until I’ve determined the right recordsize, and then migrating it to ZFS with decent confidence that the recordsize is tuned correctly.