To l2arc_mfuonly or not to l2arc_mfuonly?

My ZFS is exclusively for block storage (excepting the local OS, which runs nothing significant).
All ZVOLs + iSCSI over a 10Gb network. 4x slow rust plus a small Optane LOG + CACHE.

The storage is used thusly:

  1. VMware + Microsoft lab: general-purpose VMs with no substantial data ingestion/egress. Not fully static, but not very dynamic either – mostly the same VMs fired up over and over.

  2. Block storage for a Windows desktop w/ NTFS-formatted volumes. I have an AI lab of sorts hosted on iSCSI to leverage ZFS snapshotting, data integrity, etc.

The VMware setup loves l2arc_mfuonly. Repeatedly accessed blocks rapidly become mfu and thus CACHE-resident. L2ARC persistence is a game-changer.
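(For anyone playing along at home: on Linux these live as OpenZFS module parameters under /sys/module/zfs/parameters/. A minimal sketch for flipping the knob, assuming that standard path – run as root:)

```python
# Minimal sketch: toggle l2arc_mfuonly on Linux/OpenZFS.
# Assumes the standard sysfs location for ZFS module parameters.
from pathlib import Path

PARAMS = Path("/sys/module/zfs/parameters")

# 1 = only MFU buffers are eligible for L2ARC; 0 = MRU buffers too.
(PARAMS / "l2arc_mfuonly").write_text("1")

# Persistent L2ARC (the rebuild across reboots) has its own switch.
print("l2arc_rebuild_enabled =",
      (PARAMS / "l2arc_rebuild_enabled").read_text().strip())
```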

IMHO those who interact with VMs on a small ZFS rust pool (and therefore feel the nasty storage latency) ought to have a modicum of L2ARC to cover booting their VMs. I can boot a VM and watch a nice string of 100s in the l2hit% column. ❤️
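(That l2hit% column comes from arcstat. If you want the same ratio without it, here’s a rough sketch that diffs the kernel’s cumulative counters – it assumes the Linux arcstats location and the l2_hits/l2_misses field names:)

```python
# Rough sketch: an arcstat-style l2hit% from /proc/spl/kstat/zfs/arcstats.
# l2_hits and l2_misses are cumulative, so sample twice and take the delta.
import time

def read_arcstats() -> dict:
    stats = {}
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f.readlines()[2:]:  # skip the two kstat header lines
            name, _type, value = line.split()
            stats[name] = int(value)
    return stats

before = read_arcstats()
time.sleep(1)
after = read_arcstats()

hits = after["l2_hits"] - before["l2_hits"]
misses = after["l2_misses"] - before["l2_misses"]
total = hits + misses
print(f"l2hit%: {100 * hits / total:.0f}" if total else "no L2ARC reads")
```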

The desktop access patterns are very different: probably 95% read, and 95% of those reads are long sequential (big-block) file access. Each file is 6-8GB and gets read perhaps 10-15x per workday; maybe 10-15 distinct files see action on any given day.

My CACHE can readily saturate the 10Gb network. My rusty shitpool couldn’t saturate two tin cans and a string.

l2arc_mfuonly appears to do exactly what it says on the tin – it prevents the desktop’s I/O from purging CACHE. My problem is that I want to let this stuff into cache without it blowing out all of my VM blocks, which is exactly what happens with l2arc_mfuonly turned off.

tl;dr:
l2arc_mfuonly=1 gives CACHE to my VMs at the desktop’s expense.
l2arc_mfuonly=0 (Miley-mode) turns my desktop into an mru wrecking ball.

I don’t see any helpful-looking knobs for this. Is the prescription more cowbell (L2ARC)?
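(The knobs that do exist are all module-wide. For reference, a quick sketch that dumps the ones governing what the feed thread will admit – names from the zfs(4) man page, Linux path assumed. l2arc_noprefetch is the one that decides whether prefetched sequential reads are ever eligible:)

```python
# Sketch: print the L2ARC feed/admission tunables (names per zfs(4)).
from pathlib import Path

PARAMS = Path("/sys/module/zfs/parameters")
for knob in (
    "l2arc_mfuonly",      # restrict L2ARC feeding to MFU buffers
    "l2arc_noprefetch",   # 1 = skip prefetched (sequential) buffers
    "l2arc_write_max",    # max bytes written per feed interval
    "l2arc_write_boost",  # extra headroom until ARC has warmed up
    "l2arc_headroom",     # how far past the ARC tail the feed scans
):
    print(knob, "=", (PARAMS / knob).read_text().strip())
```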

L2ARC gets an undeservedly bad rap. Now that it’s persistent, and if you have repetitive read workloads like operating system files, it really earns its keep. IMHO you can never have too much cowbell (L2ARC).

(I have 5TB of L2ARC on a pool that handles both root-on-ZFS and fileserving photos for editing over 40GbE. The only time I notice that it’s not locally attached NVMe is when I upload a big batch of files from local NVMe, which I almost never do.)

Out of curiosity, can you have two l2arc devices with different settings? Maybe you could split the cache in half?

Not that I know of – the l2arc_* tunables are module-wide parameters, not per-device properties, so two CACHE devices on the same host would feed with the same settings.

Since posting the above, I’ve tripled my L2ARC space and left l2arc_mfuonly=1. I’ve watched some of those sequential desktop blocks meander their way into L2ARC, so I probably just need to give it more time.

Buried in the noise from fruitloops who insist nobody needs L2ARC and that it doesn’t work anyway is the advice that L2ARC ought not exceed 10x max ARC size. I’m somewhere around 3.5x my 96GB of RAM (≈336GB of L2ARC against a ≈960GB ceiling), so I think this will be fine.


Here’s a question for the gang: if arcsz reports 1.6G immediately after a system restart (and before any significant data access has occurred), would that number approximate my L2ARC overhead, given that ARC space is used for tracking L2ARC blocks? With persistent L2ARC, the headers get rebuilt into ARC at pool import, so the post-boot baseline seems like it should be mostly that overhead.

I’ve heard how to approximate L2ARC overhead – something like 70 bytes of ARC per cached block, i.e. L2ARC size divided by blocksize, times 70 – but I’ve changed the pool’s recordsize several times, so I don’t think a single-blocksize rule of thumb works cleanly here.
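(For what it’s worth, here’s the back-of-envelope version of that rule, assuming roughly 70 bytes of ARC header per L2ARC-resident block – a figure that varies by OpenZFS version. And since my pool is all ZVOLs, volblocksize rather than recordsize is probably the denominator that matters:)

```python
# Back-of-envelope L2ARC header overhead: ~70 bytes of ARC per block
# resident in L2ARC (the exact figure varies by OpenZFS version).
HEADER_BYTES = 70

def l2arc_overhead(l2arc_bytes: int, block_bytes: int) -> int:
    return (l2arc_bytes // block_bytes) * HEADER_BYTES

GiB = 1024**3
# ~336 GiB of L2ARC (3.5 x 96 GiB of RAM) at two plausible block sizes:
for bs in (8 * 1024, 128 * 1024):  # 8K volblocksize vs. 128K records
    print(f"{bs // 1024}K blocks: "
          f"{l2arc_overhead(336 * GiB, bs) / GiB:.2f} GiB of ARC")
# -> ~2.87 GiB at 8K, ~0.18 GiB at 128K; a 1.6G post-boot arcsz
#    lands between the two, so it's at least the right order.
```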