Ceph Node Resilience / Or Other Datastores

zfslover · August 24, 2024, 7:20am

I would like to know how to secure Ceph nodes. What can an attacker do with access to a node? I am looking for a solution where the compromise of a node (full access) cannot affect the data structure. Is this possible with Ceph? Is there a better file distribution tool for resilient archival datastores? I am basically looking for something with BitTorrent-style distribution resilience (it does not matter if a peer/node is malicious or fails), but private (not full-blown P2P), convenient to manage, and with an S3 API. Any ideas?
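
To make the BitTorrent comparison concrete, here is a minimal Python sketch of the piece-hash idea: the client keeps a trusted list of hashes and rejects any piece that does not match, so a malicious or broken node can withhold data but cannot silently alter it. This is not how Ceph works internally; the names `build_manifest`, `verify_piece` and `fetch_verified`, the tiny piece size, and the "node as a callable" model are all assumptions made for illustration.

```python
import hashlib

PIECE_SIZE = 4  # hypothetical, tiny on purpose; real systems use much larger pieces


def split_pieces(data: bytes, size: int = PIECE_SIZE) -> list[bytes]:
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_manifest(data: bytes, size: int = PIECE_SIZE) -> list[str]:
    """Hash every piece. The manifest must be stored somewhere trusted,
    i.e. not only on the storage nodes themselves."""
    return [hashlib.sha256(p).hexdigest() for p in split_pieces(data, size)]


def verify_piece(index: int, piece: bytes, manifest: list[str]) -> bool:
    """Accept a piece only if its hash matches the trusted manifest entry."""
    if not 0 <= index < len(manifest):
        return False
    return hashlib.sha256(piece).hexdigest() == manifest[index]


def fetch_verified(index, manifest, nodes):
    """Ask each node in turn (modelled here as a callable returning bytes
    or None) and return the first piece that verifies, else None."""
    for node in nodes:
        piece = node(index)
        if piece is not None and verify_piece(index, piece, manifest):
            return piece
    return None
```

With at least one honest replica per piece, a tampering node only costs a retry. If every replica of a piece is bad, the read fails loudly instead of returning corrupted data.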

itoffshore · June 9, 2025, 8:46pm

I played around with Ceph on Ubuntu (pain in the ass) & I have previously managed Proxmox clusters with Ceph - it works great until one node stops. It can also use a lot of CPU.

Shortly I will be using Longhorn for storage - this provides multiple redundant copies of your data (ideally run on NVMe & a 10G network - the same as you would with Ceph) - & is probably more resilient.

If you want something bulletproof from a security perspective - take a look at Talos Linux - it runs, I think, with fewer than 50 binaries & configuration is done via an API over mutual TLS (no SSH).

At the moment I run 2 clusters on MicroOS (also very secure) - a rolling release that is always up to date, with an immutable root filesystem & the ability to run transactional-update rollback to return to a previous btrfs root snapshot if upgrades fail. There is just a small learning curve for SELinux. I will be putting my main cluster on Talos soon - their documentation is quite good nowadays. RKE2 in FIPS mode is also worth looking at.

You could also run your services under rootless podman. An attacker would have to break out of the container (tricky if you set NoNewPrivileges=true in the quadlet file) - see also podlet on GitHub. Even if they break out of the rootless container they are still just an ordinary user on bare metal. By then you should have noticed unusual behavior. You can also use podman as a stepping stone to a real Kubernetes.

You can for instance convert from:

container => pod => kubernetes yaml (with a single command each)

MicroOS has RKE2 / K3s / Podman in its official repos - openSUSE takes security seriously. I left my servers alone for 6 months, came back & had practically zero sysadmin work to do. A great choice for developers.

For a firewall, a good choice is nftables (thanks to the built-in IP set support) with fail2ban, blocking attackers with IP sets at the ingress level (i.e. before packets hit conntrack) - this will block 5,000-6,000 IPs with practically zero overhead.
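
As a rough illustration of the ingress-level set blocking described above, here is a small Python sketch that renders an nftables ruleset with a named set and a drop rule on the `netdev` ingress hook. It is a hand-rolled example, not what fail2ban generates: the function `render_blocklist`, the table and set names, the device `eth0` and the priority `-500` are all assumptions, and the output should be checked with `nft -c -f <file>` before it goes anywhere near a real host.

```python
import ipaddress


def render_blocklist(addresses, device="eth0", table="blocklist",
                     set_name="banned_v4"):
    """Return nftables ruleset text that drops listed IPv4 sources at ingress.

    All names and defaults are hypothetical. Invalid addresses raise
    ValueError (via ipaddress) instead of ending up in the ruleset.
    """
    # Validate, de-duplicate and sort so the output is stable.
    unique = sorted({ipaddress.IPv4Address(a) for a in addresses})

    lines = [
        f"table netdev {table} {{",
        f"    set {set_name} {{",
        "        type ipv4_addr",
    ]
    if unique:
        # Only emit the elements line when there is something to list.
        elements = ", ".join(str(a) for a in unique)
        lines.append(f"        elements = {{ {elements} }}")
    lines += [
        "    }",
        "    chain ingress {",
        # The ingress hook runs before conntrack sees the packet.
        f'        type filter hook ingress device "{device}" priority -500; policy accept;',
        f"        ip saddr @{set_name} drop",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, used purely as placeholders.
    print(render_blocklist(["198.51.100.7", "192.0.2.1"]))
```

The point of the set is that the rule count stays at one no matter how many addresses are banned; the lookup is a single set membership test per packet.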

Also take a look at NetBird (WireGuard VPN) - I run some essential internal services in rootless podman over it (things should be more resilient with these outside of the cluster).

Hopefully I’ve given you a few ideas about improved security ;o)