Any adjustments for read-ahead strategy?

adaptive_chance · October 15, 2024, 2:26am

I have a corner case where what should be a steady, long, sequential big-block file read is actually a schizophrenic chopfest of mixed, smallish block sizes. The devs have no interest in fixing this because reasons.

After examining a Procmon capture I’m fairly confident that if the filesystem’s read-ahead could be wildly extended to something like 13-14MB, it would bridge most of the gaps, the read would become mostly sequential in nature, and these file loads would be dramatically faster. By dramatically I mean a 60-70% reduction in total load time.

Does ZFS have such a knob? The OpenZFS defaults don’t appear to be adaptive enough to pick up on this.

zeefizz · October 15, 2024, 3:15am

Since this is a read, the ARC should take care of things, provided there is enough RAM. Putting the file(s) in question in a dataset with a large record size (1MB) should also help.

adaptive_chance · October 15, 2024, 3:41am

ARC works great until the blocks expire. I’m at 96GB RAM and can’t go any higher. L2ARC is helpful for the handful of blocks that find their way in. l2arc_headroom is zero, l2arc_write_max is cranked to low earth orbit, and l2arc_write_boost is cranked to the moon.

I did just discover that I’ve already set zfetch_max_distance to 13MB a while ago. I’ve just bumped it a little higher still…
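For anyone following along, here is a minimal sketch of checking the current value from a script. It assumes a Linux host, where OpenZFS module parameters appear as files under `/sys/module/zfs/parameters`; the thread does not say which platform is in use, so treat the path as an assumption. The helper names `read_tunable` and `mib_to_bytes` are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: Linux OpenZFS, where module parameters are exposed as files here.
LINUX_PARAM_DIR = Path("/sys/module/zfs/parameters")

MIB = 1024 * 1024


def mib_to_bytes(mib: int) -> int:
    """Convert MiB to bytes; zfetch_max_distance is expressed in bytes."""
    return mib * MIB


def read_tunable(name: str, param_dir: Path = LINUX_PARAM_DIR) -> Optional[int]:
    """Return the integer value of a module parameter, or None if unreadable."""
    try:
        return int((param_dir / name).read_text().strip())
    except (OSError, ValueError):
        # File missing (other OS, module not loaded) or not a plain integer.
        return None


if __name__ == "__main__":
    current = read_tunable("zfetch_max_distance")
    if current is None:
        print("zfetch_max_distance is not readable on this system")
    else:
        print(f"zfetch_max_distance = {current} bytes ({current / MIB:.1f} MiB)")
    print(f"13 MiB target = {mib_to_bytes(13)} bytes")
```

On Linux, writing the byte count to the same file as root changes the value at runtime. Other platforms expose the tunable differently, so check the local documentation before relying on this path.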

Recordsize is 1MB. It was 256k earlier but it doesn’t seem to have made any difference here.
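One possible reading of why the recordsize change made no visible difference: the prefetch distance is a byte budget, so the same 13MB window simply covers more or fewer records. A quick sketch of the arithmetic follows (the function name is hypothetical, and this says nothing about how the prefetcher actually schedules its reads):

```python
MIB = 1024 * 1024
KIB = 1024


def records_in_window(distance_bytes: int, recordsize_bytes: int) -> int:
    """Number of records needed to cover the window (ceiling division)."""
    return -(-distance_bytes // recordsize_bytes)


window = 13 * MIB  # the prefetch distance mentioned above
for label, recordsize in (("1M", MIB), ("256K", 256 * KIB)):
    count = records_in_window(window, recordsize)
    print(f"recordsize={label}: {count} records per 13 MiB window")
```

Either way the same span of the file is covered; only the number of records inside it changes.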

zeefizz · October 15, 2024, 4:31am

I don’t know what version of ZFS you’re running but here are a few more pointers (esp. since you’re already in L2ARC territory).

github.com/openzfs/zfs

L2ARC with metadata and MFU only

opened 11:53AM - 12 Jul 24 UTC

closed 08:34PM - 16 Aug 24 UTC

tkittich

Type: Feature Component: Memory Management

### Describe the feature would like to see added to OpenZFS

Option to store only metadata (all metadata) and MFU (both metadata and data) in L2ARC

### How will this feature improve OpenZFS?

This feature should be the default and would make a perfect L2ARC for home users. Most metadata and most frequently used data would eventually get stored in the L2ARC!! This would be the perfect tiered storage system. Of course, this is currently possible by using metadata special vdev together with l2arc_mfuonly L2ARC. But a special vdev introduces another risk if damaged without enough redundancy. For RAIDZ2 of home users, it's not easy to have 3 nvme's as special vdev because most mainboards only have 2 nvme slots.

### Additional context

Currently `primarycache` and `secondarycache` ZFS properties only allow `all`, `none`, and `metadata`. Perhaps this feature could be added as a new setting, i.e. `metaandmfu`, for `primarycache` and `secondarycache` .

(the l2arc_mfuonly tunable is worth looking into for your use case)

Also, if the blocks are expiring, then there is “pressure” on the ARC: something else is displacing those blocks, which shouldn’t happen if the file is “hot”.

I guess there’s a lot more specific stuff that is relevant to your use case that you need to look into (on the other hand, too much “tuning” is also quite fragile - so beware).
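A rough way to check for that kind of pressure is to watch the ARC hit ratio while the file loads. The sketch below parses text in the three-column `name type data` layout that Linux OpenZFS uses for `/proc/spl/kstat/zfs/arcstats`. The sample numbers are made up for illustration, and the helper names are hypothetical.

```python
# Made-up sample in the kstat layout; real files contain many more rows.
SAMPLE = """\
13 1 0x01 147 39984 3954647730 86726566128
name                            type data
hits                            4    900
misses                          4    100
"""


def parse_kstat(text: str) -> dict:
    """Collect rows of the form 'name type value' with an integer value."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skips the numeric kstat header line
        name, _type, value = fields
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # skips the 'name type data' column header
    return stats


def hit_ratio(stats: dict) -> float:
    """Fraction of ARC lookups that were hits; 0.0 if nothing was counted."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


if __name__ == "__main__":
    print(f"ARC hit ratio: {hit_ratio(parse_kstat(SAMPLE)):.1%}")
```

To use it against a live system, read the arcstats file instead of `SAMPLE` and compare two snapshots taken before and after the load, since the counters are cumulative since boot.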

mercenary_sysadmin · October 15, 2024, 7:29am

Several. I think you were just getting tripped up because you were using the term “read-ahead” and the zfs devs use the term “prefetch” to describe the same basic concept.

adaptive_chance · October 16, 2024, 6:25pm

Thanks for that. Neat article! I enjoy reading stuff from back in the days when Solaris mattered and wasn’t just a fancy bootloader for Oracle databases.

The prefetch is working a little better than I thought. I’ve discovered and added some zfetch columns to my arcstat: