LXC container cannot see inside bind mounts containing ZFS datasets

tmo · February 13, 2024, 1:03pm

Hey all! I have a workaround for the issue in the title, but I would like some help understanding why it is necessary. I’m trying to better understand the inner workings. All my research points to file permissions and (un)privileged containers, both of which I have ruled out.

Issue:
If I bind mount a top-level directory into an LXC container, no process inside the container can see into the ZFS datasets nested below it, even though ordinary nested directories are visible.

The workaround is to create a separate bind mount for each subdirectory and dataset, but I am not sure why this works.

Example:
Host:

admin in ~ at lab-dl360 % zfs list          
NAME                                USED     AVAIL  REFER      MOUNTPOINT
test-pool                           11.8T    848G   96K        /test-pool 
test-pool/vm                        11.8T    848G   104K       /test-pool/vm
test-pool/vm/test-zfs               7.49T    848G   7.49T      /test-pool/vm/test-zfs

admin in ~ at lab-dl360 % mkdir /test-pool/vm/test.d

admin in ~ at lab-dl360 % touch /test-pool/vm/test.d/example.file

admin in ~ at lab-dl360 % touch /test-pool/vm/test-zfs/example.file

Create any LXC container. If I set the mount point like this:

mp0=/test-pool/vm,/host

then no process in the container can see example.file in /host/test-zfs, but /host/test.d works without issues.

If I instead set the mounts like this:

mp0=/test-pool/vm/test.d,/host/test.d
mp1=/test-pool/vm/test-zfs,/host/test-zfs

then every process can access both /host/test-zfs and /host/test.d.
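A tidier version of the same workaround is to mount the parent dataset once (plain directories like test.d come along with it) and add one extra mount point per child dataset. Below is a minimal sketch that generates those lines. Assumptions: the helper name gen_mp_lines is hypothetical, the mountpoints come from zfs list -H -o mountpoint -r, and the output uses the Proxmox config-file form mpN: <host path>,mp=<container path>.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox): read dataset mountpoints on stdin
# and print one Proxmox "mpN:" line per dataset at or below HOST_ROOT.
gen_mp_lines() {
  local host_root="${1%/}" ct_root="${2%/}"
  local n=0 path
  # Sorting puts each parent path before its children, so parents mount first.
  LC_ALL=C sort | while IFS= read -r path; do
    # Skip "none"/"legacy"/"-" mountpoints and anything outside HOST_ROOT.
    case "$path" in
      "$host_root" | "$host_root"/*) ;;
      *) continue ;;
    esac
    # Swap the host prefix for the container prefix.
    printf 'mp%d: %s,mp=%s\n' "$n" "$path" "$ct_root${path#"$host_root"}"
    n=$((n + 1))
  done
}
```

On the host this would be run as zfs list -H -o mountpoint -r test-pool/vm | gen_mp_lines /test-pool/vm /host, and for the pool above it should print an mp0 line for /test-pool/vm and an mp1 line for /test-pool/vm/test-zfs.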

Can someone help point me in the right direction to understand why this works this way?

If I wasn’t clear, please let me know. This is my first time using a Discourse-based forum.

Thanks!

tvcvt · February 13, 2024, 3:43pm

I think your workaround is as good as it gets. Each dataset is its own filesystem with its own mount point, so nested datasets aren’t treated the same as plain directories. I’ve seen the same behavior with NFS exports on ZFS, where I’ve had to mount multiple levels of datasets.
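One way to see which nested filesystems a plain bind mount would leave behind is to list every mount point below the directory. This is a minimal sketch under stated assumptions: the function name submounts_under is hypothetical, it reads field 5 (the mount point) of /proc/self/mountinfo as documented in proc(5), and it assumes paths without spaces (mountinfo escapes those as \040). util-linux findmnt -R on the same path should give a similar tree view.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list mount points strictly below a directory, read from
# /proc/self/mountinfo (field 5 is the mount point). Each result is a separate
# filesystem that a plain (non-recursive) bind mount of the directory omits.
# Assumes paths contain no spaces (mountinfo escapes them as \040).
submounts_under() {
  local root="${1%/}"
  awk -v prefix="$root/" \
    'index($5, prefix) == 1 && $5 != prefix { print $5 }' \
    /proc/self/mountinfo | LC_ALL=C sort -u
}
```

On the host, submounts_under /test-pool/vm would be expected to print /test-pool/vm/test-zfs, which is exactly the directory that looked empty through the single mp0 mount.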

quartsize · February 13, 2024, 7:56pm

With NFS you can export with the crossmnt option. I’m not sure if there’s something similar for bind mounts.

tvcvt · February 13, 2024, 8:50pm

Ooh, very good to know. Thanks for the heads-up on that one.

derkades · February 14, 2024, 1:59pm

Not familiar with LXC, but with manual mount -t bind mounts you can use one of the shared/slave/rshared/rslave options to propagate child mountpoints. See man mount.

Docker has a bind-propagation option for volumes, LXC probably has a similar option.

Hopefully this gives you some terminology to search for.
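For context: a plain bind mount copies only the top filesystem, --rbind also copies the submounts that exist at that moment, and the propagation modes above decide whether mounts made later (for example, a dataset mounted after the container starts) show up on the other side. To check which mode a host mount currently has, here is a minimal sketch that reads the optional fields of /proc/self/mountinfo (no tags means private). The function name propagation_of is hypothetical; findmnt’s PROPAGATION column should report the same information.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the propagation tags of a mount point, taken from the
# optional fields (field 7 up to the "-" separator) of /proc/self/mountinfo.
# No tags means the mount is private.
propagation_of() {
  awk -v target="$1" '
    $5 == target {
      tags = ""
      for (i = 7; i <= NF && $i != "-"; i++)
        tags = tags (tags == "" ? "" : " ") $i
      # If the path is mounted more than once, keep the last (usually topmost) entry.
      result = (tags == "" ? "private" : tags)
    }
    END {
      if (result == "") exit 1   # target is not a mount point
      print result
    }
  ' /proc/self/mountinfo
}
```

Typical output looks like shared:1, master:5, or private; running propagation_of /test-pool/vm on the host would show how that dataset’s mount is set up.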

tmo · February 17, 2024, 3:15pm

Thanks all! I was able to figure it out based on the suggestions here.

The solution is to change the mount point line in /etc/pve/lxc/<vmid>.conf from:
mp0=/test-pool/vm,/host
to:
lxc.mount.entry: /test-pool/vm host none rbind,create=dir,optional 0 0

Per man mount, the --rbind option is needed to recursively bind-mount nested filesystems; a plain --bind does not include the submounts below the source directory.

LXC natively supports this: lxc.mount.entry takes a line in fstab format, so rbind can be given as a mount option (see Linux Containers - LXC - Manpages - lxc.container.conf.5).

Unfortunately, Proxmox’s pct tool does not seem to support this with the mp0=xyz syntax in the conf file. It does accept standard LXC option entries, though, which is what I used above.
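If there are several mount points to convert, the rewrite can be scripted. This is a minimal sketch with stated assumptions: the helper name mp_to_mount_entry is hypothetical, it expects the Proxmox config form mpN: <host path>,mp=<container path>, it assumes paths without spaces, and it silently drops any other mpN options (such as backup or ro). The container path is made relative to the container rootfs, matching the host target in the entry above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite one Proxmox "mpN: <host>,mp=<ct>,..." line as an
# lxc.mount.entry that uses rbind. Options other than mp= are ignored.
mp_to_mount_entry() {
  local spec="${1#*: }" host ct="" opt
  local -a opts
  host="${spec%%,*}"
  # Split the remaining comma-separated options and pick out mp=<container path>.
  IFS=, read -r -a opts <<< "${spec#*,}"
  for opt in "${opts[@]}"; do
    case "$opt" in
      mp=*) ct="${opt#mp=}" ;;
    esac
  done
  [ -n "$ct" ] || return 1   # no container path found
  # lxc.mount.entry targets are relative to the container rootfs: /host -> host.
  printf 'lxc.mount.entry: %s %s none rbind,create=dir,optional 0 0\n' \
    "$host" "${ct#/}"
}
```

For example, mp_to_mount_entry 'mp0: /test-pool/vm,mp=/host' prints the same lxc.mount.entry line shown above; the original mpN line should then be removed so the directory is not mounted twice.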

Thanks for the help, and I hope this helps someone else avoid this headache in the future.

tmo · February 17, 2024, 3:20pm

Thanks! I was able to get it to work with --rbind instead of --bind as the mount option.

tmo · February 17, 2024, 3:21pm

Thanks, this was what I was missing, combined with the needed flag not being exposed by the default Proxmox syntax.