Hi,
I recently migrated CI machines to ZFS (ZFS on root; ZFS is the only filesystem on them) and noticed that CI pipelines now take longer to finish.
I tracked the performance degradation down to Docker tasks:
- the Docker daemon uses the `overlay2` storage driver
- Docker's `data-root` is set to the default path, `/var/lib/docker`
- Docker's `data-root` path is in the same ZFS dataset as the root filesystem of the OS (Ubuntu 24.04):

```
root@test-vm:~# zfs list zroot/ROOT/ubuntu
NAME                USED  AVAIL  REFER  MOUNTPOINT
zroot/ROOT/ubuntu  15.7G   157G  14.2G  /
```
With this setup, a particular CI pipeline takes ~300s to complete, but if I create a new dataset, `zroot/docker-data`, and configure Docker to use it as its `data-root` (in `/etc/docker/daemon.json`), the same pipeline completes in ~50s. I run `docker system prune -af --volumes` between tests, so Docker's caching shouldn't influence the results, which are indeed consistent.
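For reference, here is a minimal sketch of the switch described above. It assumes Docker runs under systemd and that the new dataset is mounted at `/docker-data`; the helper names are hypothetical, and setting `storage-driver` explicitly is an assumption (overlay2 is what the daemon was already using). It overwrites any existing `/etc/docker/daemon.json`.

```shell
#!/bin/sh
# Minimal sketch. Assumptions: Docker runs under systemd, and the caller
# chooses the dataset name and mountpoint. Helper names are hypothetical.

# Print a daemon.json pointing Docker's data-root at "$1", keeping overlay2.
make_daemon_json() {
  printf '{\n  "data-root": "%s",\n  "storage-driver": "overlay2"\n}\n' "$1"
}

# Create a dataset mounted at "$2", point Docker at it and restart the daemon.
# Must run as root. WARNING: overwrites any existing /etc/docker/daemon.json.
switch_docker_data_root() {
  dataset="$1"
  mountpoint="$2"
  zfs create -o mountpoint="$mountpoint" "$dataset" || return 1
  make_daemon_json "$mountpoint" > /etc/docker/daemon.json || return 1
  systemctl restart docker
}
```

Usage, as root: `switch_docker_data_root zroot/docker-data /docker-data`. Afterwards, `docker info --format '{{ .DockerRootDir }}'` should print `/docker-data`. Images from the old data-root are not migrated, which doesn't matter here since everything is pruned between runs anyway.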
The new dataset `zroot/docker-data` should have the same properties as `zroot/ROOT/ubuntu`, but please let me know if there is something worth checking. Here are some of the inherited ones, which I set when creating the pool:
```
root@test-vm:~# zfs get compression,acltype,xattr,relatime,recordsize zroot/docker-data
NAME               PROPERTY     VALUE  SOURCE
zroot/docker-data  compression  lz4    inherited from zroot
zroot/docker-data  acltype      posix  inherited from zroot
zroot/docker-data  xattr        sa     inherited from zroot
zroot/docker-data  relatime     on     inherited from zroot
zroot/docker-data  recordsize   128K   default
root@test-vm:~# zfs get compression,acltype,xattr,relatime,recordsize zroot/ROOT/ubuntu
NAME               PROPERTY     VALUE  SOURCE
zroot/ROOT/ubuntu  compression  lz4    inherited from zroot
zroot/ROOT/ubuntu  acltype      posix  inherited from zroot
zroot/ROOT/ubuntu  xattr        sa     inherited from zroot
zroot/ROOT/ubuntu  relatime     on     inherited from zroot
zroot/ROOT/ubuntu  recordsize   128K   local
root@test-vm:~# zpool get ashift zroot
NAME   PROPERTY  VALUE  SOURCE
zroot  ashift    12     local
root@test-vm:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   228G  63.5G   165G        -         -    13%    27%  1.00x  ONLINE  -
```
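The five properties above are only a subset. A minimal sketch for diffing the complete property lists of both datasets, assuming the scripted output of `zfs get -H -o property,value all` (one tab-separated property/value pair per line); the function names are hypothetical. Some properties, such as `used`, `creation`, `guid` or `mountpoint`, will naturally differ between any two datasets.

```shell
#!/bin/sh
# Sketch: print the properties whose values differ between two listings in the
# "property<TAB>value" format of `zfs get -H -o property,value all <dataset>`.
diff_prop_files() {
  tab=$(printf '\t')
  sorted_a=$(mktemp)
  sorted_b=$(mktemp)
  # join(1) requires both inputs sorted on the join field with the same collation.
  LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$sorted_a"
  LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$sorted_b"
  # Joined lines are "property<TAB>value_a<TAB>value_b"; keep only mismatches.
  LC_ALL=C join -t "$tab" "$sorted_a" "$sorted_b" |
    awk -F "$tab" '$2 != $3 { print $1 ": " $2 " vs " $3 }'
  rm -f "$sorted_a" "$sorted_b"
}

# Compare two live datasets (requires the zfs CLI). Properties present on only
# one of the datasets are skipped, because join prints paired lines only.
diff_dataset_props() {
  props_a=$(mktemp)
  props_b=$(mktemp)
  zfs get -H -o property,value all "$1" > "$props_a"
  zfs get -H -o property,value all "$2" > "$props_b"
  diff_prop_files "$props_a" "$props_b"
  rm -f "$props_a" "$props_b"
}
```

Usage: `diff_dataset_props zroot/ROOT/ubuntu zroot/docker-data`. Comparing values only (not the `SOURCE` column) is deliberate: `recordsize`, for example, is `local` on one dataset and `default` on the other but has the same value.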
`zroot/ROOT/ubuntu` usually has ~50 ZFS snapshots, but I don't think that could cause such a large performance degradation, could it?
A bit more info about the system:
- Instance type: VM at Hetzner Cloud
- OS: Ubuntu 24.04 (server edition)
- ZFS: zfs-2.2.2-0ubuntu9.2, zfs-kmod-2.2.2-0ubuntu9.2
- Docker: Docker version 28.2.2, build e6534b4
- Docker storage driver: overlay2
- ZFS on root, using ZFSBootMenu
To summarize:

| Docker `data-root` value | ZFS dataset | CI pipeline duration |
|---|---|---|
| `/var/lib/docker` | `zroot/ROOT/ubuntu` | ~300s |
| `/docker-data` | `zroot/docker-data` | ~50s |
Any ideas why the new dataset performs so much better while still being in the same ZFS pool?
By the way, I already tried Docker's `zfs` storage driver; it performed even worse (the same pipeline took ~90s). In general, I would like to understand why the new dataset gives such a performance boost. I'm not intending to move to Docker's `zfs` driver, a zvol with ext4 (~50s), or anything else.
Thank you!