Audio/video stutter during I/O load after switching Linux root filesystem to zfs

Hello, all! Thanks, mercenary_sysadmin, for setting this forum up! I’ll start things off with one of the first technical questions. 🙂

I recently switched my (Linux) root filesystem to zfs, and since then I/O load causes audio/video playback to stutter heavily. It seems to happen mostly during writes, whether it’s a Steam game install sequentially writing one huge file or a build process plus ccache writing tons of small object files.

I’m fairly certain the problem lies with zfs or my configuration of it, because it doesn’t seem to happen with btrfs (with zstd:3 compression) or ext4. My test procedure was:

  1. blkdiscard the entire disk
  2. Partition it with an EFI system partition plus a single zfs, btrfs, or ext4 partition
  3. Load my Fedora 38 OS image
  4. Boot up the system
  5. Start a YouTube video in Firefox
  6. Run fio (or anything else that causes I/O)
  7. Audio/video starts stuttering with zfs, but not with btrfs or ext4
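To put a number on the stutter instead of judging it by ear, one option is a crude wakeup-latency probe run alongside step 6. The Python sketch below is an assumption, not part of the original procedure: it is a rough stand-in for a dedicated tool such as cyclictest, sleeping for a short interval in a loop and recording how late each wakeup arrives. The function name and default sample count are hypothetical choices.

```python
import statistics
import time


def measure_wakeup_latency(samples: int = 1000, interval_s: float = 0.001) -> dict:
    """Sleep for interval_s repeatedly and record how late each wakeup is (ms).

    Large or spiky overshoots while an I/O load runs suggest that ordinary
    tasks are not getting back onto a CPU promptly, which is the kind of
    delay that can show up as audio/video stutter.
    """
    overshoots_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Clamp at zero so clock granularity cannot produce negative lateness.
        overshoots_ms.append(max(0.0, (elapsed - interval_s) * 1000.0))
    overshoots_ms.sort()
    return {
        "samples": samples,
        "median_ms": statistics.median(overshoots_ms),
        "p99_ms": overshoots_ms[int(0.99 * (samples - 1))],
        "max_ms": overshoots_ms[-1],
    }


if __name__ == "__main__":
    # Run once on an idle system, then again while fio (or a Steam install,
    # or a ccache-heavy build) is writing, and compare the numbers.
    print(measure_wakeup_latency())
```

Comparing the p99 and max figures from an idle run against a run during fio, on each filesystem, would give a rough but repeatable comparison. A Python loop adds its own jitter, so only large differences are meaningful.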

To rule out hardware, I repeated the same steps (with the same OS image) on all of my computers:

  • Laptop: Intel i9-10885H + Toshiba KXG6AZNV1T02 NVME SSD
  • Desktop: Intel i9-9900KS + Samsung 970 Pro NVME SSD (also tried with 980 Pro)
  • Server: Intel i5-13600K + Lenovo PX04PMC NVME enterprise SSD

Without fail, the same thing happens on every system.

ZFS configuration:

  • Fedora 38 + kernel-6.3.8-200.fc38.x86_64 + zfs-2.1.12-1.fc38.x86_64
  • pool ashift=12
  • dataset recordsize=128K (default, not explicitly set)
  • dataset compression=lz4
  • dataset xattr=sa
  • dataset acltype=posix
  • dataset relatime=on
  • dataset encryption=aes-256-gcm
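The dataset settings above can be dumped in one go with zfs get in scripted mode (-H gives tab-separated output with no header), where the source column also shows whether a value such as recordsize is local, inherited, or left at its default. A minimal Python sketch, assuming the zfs CLI is on PATH; the dataset name rpool/ROOT/fedora is hypothetical. ashift is a pool property and would come from zpool get ashift instead.

```python
import subprocess

# Dataset properties listed in the post. ashift is a pool property, so it is
# not included here (query it with `zpool get ashift <pool>`).
PROPS = ["recordsize", "compression", "xattr", "acltype", "relatime", "encryption"]


def parse_zfs_get(output: str) -> dict:
    """Parse `zfs get -H -o property,value,source` output into
    {property: (value, source)}. -H makes each line tab-separated."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        prop, value, source = line.split("\t")
        result[prop] = (value, source)
    return result


def dataset_properties(dataset: str) -> dict:
    out = subprocess.run(
        ["zfs", "get", "-H", "-o", "property,value,source", ",".join(PROPS), dataset],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_zfs_get(out)


if __name__ == "__main__":
    dataset = "rpool/ROOT/fedora"  # hypothetical dataset name; use your own
    try:
        for prop, (value, source) in dataset_properties(dataset).items():
            print(f"{prop}={value} ({source})")
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not query {dataset}: {exc}")
```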

Has anyone else run into this sort of issue? If so, did you manage to fix or work around it?

I came across the GitHub discussion “ZFS kernel threads priority” (openzfs/zfs Discussion #14258), which seems relevant. It would make sense if the ZFS kernel threads are being given the highest-priority nice value. I’m only using lz4, though, which shouldn’t be very CPU-intensive. The discussion links to openzfs/zfs commit 1229323 (“Align thread priority with Linux defaults”), which first introduced those thread priorities. Maybe I’ll try modifying the source to change the priority; I’d happily trade some I/O throughput for lower CPU scheduling latency.
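To see which nice values the ZFS kernel threads actually end up with on a given system, one approach is to scan /proc. A minimal, Linux-only Python sketch; the thread-name prefixes are an assumption (a heuristic covering common OpenZFS/SPL thread names such as z_wr_iss, txg_sync, spl_system_taskq and arc_prune) and may not be complete for every version.

```python
import os

# Heuristic name prefixes for OpenZFS/SPL kernel threads. This list is an
# assumption and may miss threads or vary between OpenZFS versions.
ZFS_PREFIXES = ("z_", "txg_", "spl_", "arc_")


def parse_stat(line: str) -> tuple:
    """Return (comm, ppid, nice) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is stat field 3 (state); ppid is field 4, nice is field 19.
    return comm, int(fields[1]), int(fields[16])


def zfs_kernel_threads() -> list:
    """Return sorted (pid, comm, nice) tuples for likely ZFS kernel threads."""
    threads = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ppid, nice = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        # Kernel threads are children of kthreadd (PID 2).
        if ppid == 2 and comm.startswith(ZFS_PREFIXES):
            threads.append((int(entry), comm, nice))
    return sorted(threads)


if __name__ == "__main__":
    try:
        for pid, comm, nice in zfs_kernel_threads():
            print(f"{pid:>7} nice={nice:>3} {comm}")
    except OSError as exc:
        print(f"could not scan /proc: {exc}")
```

From a shell, ps -eo pid,ni,comm shows the same nice column for every process.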


By the way, I recently came across these two comments on the zfs discussion thread for this issue:

I tried the preempt=full kernel boot option (the current preemption mode can be read from /sys/kernel/debug/sched/preempt), and that seems to have resolved the issue for me. No more stuttering! This is a better solution than what I was doing before (turning off thread priorities for zfs’ kernel threads). Hopefully zfs will add more preemption points in the future so that this workaround won’t be necessary anymore.
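On kernels built with dynamic preemption (CONFIG_PREEMPT_DYNAMIC), that file lists the available modes with the active one in parentheses, for example none voluntary (full). A minimal Python sketch for reading the active mode; reading the file needs root and a mounted debugfs, and the helper name is hypothetical.

```python
from pathlib import Path

PREEMPT_PATH = Path("/sys/kernel/debug/sched/preempt")


def active_preempt_mode(text: str) -> str:
    """Return the mode marked with parentheses, e.g. 'full' for
    'none voluntary (full)'."""
    for token in text.split():
        if token.startswith("(") and token.endswith(")"):
            return token[1:-1]
    raise ValueError(f"no active preemption mode marked in {text!r}")


if __name__ == "__main__":
    try:
        print("active preemption mode:", active_preempt_mode(PREEMPT_PATH.read_text()))
    except OSError as exc:
        # Typically: not root, debugfs not mounted, or no dynamic preemption.
        print(f"could not read {PREEMPT_PATH}: {exc}")
```

Checking this after rebooting with preempt=full on the kernel command line confirms that the option actually took effect.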

Have you split up the root filesystem into separate datasets at all, or is it just one big dataset with the whole OS in it? Also, do you have any swap enabled, and if so, is it a dedicated swap partition, or a swapfile or swap zvol sitting on ZFS? (Neither of the latter performs well at all.)
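One quick way to answer the swap question is to look at /proc/swaps, whose Type column distinguishes partitions from swapfiles. A minimal Python sketch; treating /dev/zdN devices as zvols and /dev/zramN as zram is an assumption based on the usual Linux device names, not something /proc/swaps reports directly.

```python
def classify_swaps(text: str) -> list:
    """Return (device, kind) for each active swap area in /proc/swaps text."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) < 5:
            continue
        name, swap_type = fields[0], fields[1]
        if swap_type == "file":
            kind = "swapfile"
        elif name.startswith("/dev/zd"):
            kind = "zvol"  # assumption: ZFS zvols show up as /dev/zdN
        elif name.startswith("/dev/zram"):
            kind = "zram"  # compressed RAM swap, no disk I/O involved
        else:
            kind = "partition"
        entries.append((name, kind))
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            swaps = classify_swaps(f.read())
    except OSError as exc:
        print(f"could not read /proc/swaps: {exc}")
    else:
        for name, kind in swaps or [("(none)", "no swap enabled")]:
            print(f"{name}: {kind}")
```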

You can likely get significant performance gains by tuning recordsize appropriately, which means creating datasets suited to different workloads: databases (including flat-file “databases” like Sleepycat, SQLite, etc.) need a much smaller recordsize, whereas media files and documents that don’t need random I/O inside the file want a much larger recordsize (ideally 1M).
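The reasoning behind matching recordsize to the workload can be made concrete with a simplified model: ZFS is copy-on-write, so changing part of a record means writing out a whole new record. A minimal Python sketch of that arithmetic; it deliberately ignores compression, metadata, the ZIL, and the fact that a file smaller than recordsize is stored as a single smaller block, so the numbers are a rough illustration rather than measurements.

```python
KIB = 1024


def records_touched(offset: int, length: int, recordsize: int) -> int:
    """Number of recordsize-aligned records overlapped by a write."""
    first = offset // recordsize
    last = (offset + length - 1) // recordsize
    return last - first + 1


def write_amplification(offset: int, length: int, recordsize: int) -> float:
    """Bytes rewritten / bytes the application changed, assuming every
    touched record is rewritten in full (simplified copy-on-write model)."""
    rewritten = records_touched(offset, length, recordsize) * recordsize
    return rewritten / length


if __name__ == "__main__":
    # A 4 KiB page update (e.g. a database page) at an aligned offset,
    # inside a file much larger than the recordsize.
    for rs_kib in (4, 16, 128, 1024):
        amp = write_amplification(8 * KIB, 4 * KIB, rs_kib * KIB)
        print(f"recordsize={rs_kib}K: {amp:.0f}x")
```

With those inputs the model gives 1x, 4x, 32x and 256x respectively. That is why small random-write workloads want a small recordsize, while large sequentially written media files lose nothing with 1M, since whole records get written anyway.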

That said, I’m posting this from a Core i7 running ZFS-on-root with Ubuntu 22.04 (by way of ZFSBootMenu), and I’m not seeing the same kind of issues. One obvious difference, though, is that I’m not using encryption. Since you seem up for careful testing and experimenting, you might want to try a ZFS-on-root install without encryption on one of those systems and see whether that affects your stuttering.

I’m running ZFS on Debian (Sid). The whole OS is on a single encrypted dataset with LZ4 compression and the default recordsize. My personal files/media are on another dataset with the same configuration. My swap is also an encrypted zvol with zram (LZ4). /tmp is tmpfs. I have no issues playing media; 4k60 and 8k30 video play fine. The CPU is a Ryzen 5850U and storage is a WD (SanDisk) NVMe drive. I’m also running preempt=full.