1MiB random I/O. If you’re really only concerned about rsync in one direction, then 100% read or 100% write. If that’s the only major workload you expect to be happening when you run it, just run the full-speed test and check throughput, and don’t worry about latency.
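Something like this is all it takes (paths and sizes below are placeholders; point it at the dataset you actually care about, and swap to `randwrite` if you're simulating the receiving side):

```
# Full-speed 1MiB random read test -- check the reported bandwidth, ignore latency.
# Use --rw=randwrite instead for the push/receive direction.
fio --name=rsync-sim --directory=/tank/media --size=8G \
    --rw=randread --bs=1M --runtime=60 --time_based
```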
If you’re worried about the impact on normal operation while one of those rsync runs is going… run the 1MiB random read or write (as appropriate) and, alongside it, a second job doing, for instance, the 25/75 R/W random 64K I/O limited to about 25% of your normal max throughput for that 64K workload. Then look at latency on the 64K job, ignoring the throughput of the 1MiB job simulating the rsync run; that one is just there to put pressure on the system, so you can see how bad it makes the latency of your desktop-type load.
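A rough sketch of how that might look as a single fio invocation with two concurrent jobs (directory, sizes, and the rate caps are all placeholders you’d tune to your own pool and to whatever the 64K mix does unthrottled):

```
# Job 1 pressures the pool like an incoming rsync (swap to randread for the
# sending side). Job 2 is the "desktop" load: 64K random, 25% reads / 75%
# writes, rate-capped (example numbers) to roughly a quarter of the
# unthrottled 64K throughput. Read job 2's latency output; ignore job 1.
fio --directory=/tank/media --runtime=120 --time_based \
    --name=rsync-pressure --rw=randwrite --bs=1M --size=8G \
    --name=desktop-load --rw=randrw --rwmixread=25 --bs=64k --size=2G \
        --rate=16m,48m
```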
It’s probably worth noting that rsync is actually going to require reads AND writes, on the target side, since it’s got to grovel over every block of the target before deciding which blocks it needs to pull from the source. So you’ll have, essentially, large reads of the same blocks twice on the source, vs large reads first and large writes later on the target. Again, this is all assuming we’re talking about your “large media files” as basically a solo load.
There’s also some 4K random I/O from needing to stat all the files first, but you don’t really need to worry about that if what you’re rsyncing is all large media files; there won’t be enough of them for that phase to make much of an impact. If you ever need to rsync tens of thousands of small files, though… oof. Lotta 4K. Best to avoid needing to do that at all, if at all possible!
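If you do want a feel for how that phase behaves on your pool, a crude stand-in (placeholder path and size again) is just a small-block random read job:

```
# Rough stand-in for the stat/small-file phase: lots of tiny scattered reads.
fio --name=smallfile-sim --directory=/tank/media --size=1G \
    --rw=randread --bs=4k --runtime=60 --time_based
```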
Definitely, it just needs to be properly tuned. If you’re working with large media files, you need `recordsize=1M` at minimum; ideally larger than that if you’re working with RAIDz (which splits blocks up into pieces distributed across the drives in the vdev). What you really want is 1M of random I/O per data disk, which would mean e.g. `recordsize=4M` if you’re rocking six-wide Z2 vdevs.
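For reference, the tuning itself is just a dataset property (pool/dataset names here are made up); keep in mind it only applies to newly written data, and going past 1M depends on your OpenZFS version and module settings:

```
# Only affects blocks written after the change -- existing files keep their
# old record size until rewritten.
zfs set recordsize=1M tank/media

# Going bigger (e.g. 4M for a six-wide Z2) needs the large_blocks pool
# feature and, on Linux builds where the cap defaults to 1M, raising
# zfs_max_recordsize first:
echo $((4 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_max_recordsize
zfs set recordsize=4M tank/media
```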
On mirrors, there’s not usually much point in bumping recordsize up past 1M. You’ll see some gains, on workloads with really massive files, but not usually enough to matter.