Sanity check: ZFS shared storage for a small production Proxmox cluster

I'm about to deploy my first shared storage solution underpinned by ZFS and wanted to outline it for folks here to see if I’m missing anything.

Problem Statement:

  • Optometrist office, 12 employees, 4 exam lanes, roughly 20 shared Windows workstations, 6 or 7 network / storage enabled instruments (imaging devices)
  • Active Directory used because of the shared workstations
  • Main LoB app is client-server, Windows based, using MySQL as its database
  • App developers require backups of the MySQL DB to be a full copy
  • DB size is roughly 200 GB, with a slow growth curve

Hardware Stack

  • 2x small EPYC servers used as Proxmox nodes
  • 1x NUC-style device used to achieve quorum for Proxmox cluster HA of the above nodes
  • 1x storage server running TrueNAS SCALE
    • shared storage for Proxmox cluster (NVMe mirror pool)
    • storage for MySQL database (NVMe mirror pool)
    • storage for general purpose file shares, image storage for instruments (SATA SSD pool)
  • 1x storage server running TrueNAS SCALE as backup target (SATA rust pool)
  • 10G networking for inter-node and storage communications

Storage Setup for Shared Storage

  • VM disks stored as qcow2, dataset presented via NFSv4.2
  • MySQL DB stored inside a qcow2 disk image, dataset presented via NFSv4.2
    • Disk image is attached to the LoB VM and formatted as NTFS
  • Datasets on the SATA SSD pool are presented via SMB to various client machines

Storage Setup for Backup Storage

  • 6x 16 TB drives configured as three 2-way mirror vdevs.
  • Mirrors instead of RAIDZ because I was willing to give up some usable storage for easier upgrades when/if the office adds new equipment and the backup growth curve changes quickly.
  • Snapshots and file-level backups from workstations are stored on datasets here
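
To put a rough number on the capacity given up by choosing mirrors, here is a back-of-the-envelope sketch. The comparison against a single 6-wide RAIDZ2 vdev is an assumption made for illustration (the post does not say which RAIDZ layout was considered), and the figures ignore ZFS metadata overhead, padding, and the TB versus TiB difference.

```python
# Rough usable-capacity comparison for 6 x 16 TB drives.
# Assumption: the RAIDZ alternative is one 6-wide raidz2 vdev (hypothetical).
# Ignores ZFS overhead, so real-world numbers will be somewhat lower.

def mirror_usable_tb(drives: int, size_tb: int, way: int = 2) -> int:
    """Usable space when drives are grouped into `way`-way mirror vdevs."""
    if drives % way != 0:
        raise ValueError("drive count must be a multiple of the mirror width")
    # Each mirror vdev contributes the capacity of a single drive.
    return (drives // way) * size_tb


def raidz_usable_tb(drives: int, size_tb: int, parity: int) -> int:
    """Usable space for one raidz vdev with `parity` drives' worth of parity."""
    if drives <= parity:
        raise ValueError("need more drives than parity")
    return (drives - parity) * size_tb


if __name__ == "__main__":
    print("three 2-way mirrors:", mirror_usable_tb(6, 16), "TB")
    print("6-wide raidz2:      ", raidz_usable_tb(6, 16, parity=2), "TB")
```

The mirror layout trades raw capacity for the ability to grow the pool two drives at a time, which is the upgrade flexibility mentioned above.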

I tried writing out the above in a more narrative format with more detail, but was afraid my first post here would be my last after I got banned. My testing of the above setup has been OK thus far. The new pieces for me here are using Proxmox to host Windows domain controllers (no big deal so far), and the ZFS underpinnings for the shared storage.

My one outstanding question is about backing up the MySQL DB. If I’m taking regular snapshots of the dataset holding that qcow2 file, should I be able to maintain consistency of the DB, or do I need to worry about it becoming corrupt? Right now, we back it up nightly during a dead period. I have a script that shuts down the MySQL service, copies the files to backup storage, and restarts the MySQL service. I had a thought that I could continue to do this and, rather than replicating the DB itself, snapshot and replicate the dataset holding the nightly copy. I can be certain the data there is static. Does this make sense, or am I overthinking it? Does anything else about the setup raise eyebrows or bring up questions?
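
For reference, the nightly stop / copy / restart routine described above could look roughly like the sketch below. This is not the actual script: the service name `MySQL`, the paths in the usage note, and the `cold_backup` and `run_command` helpers are all hypothetical, and it assumes the script runs on the Windows LoB VM where `net stop` and `net start` are available.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def cold_backup(
    data_dir: Path,
    dest_dir: Path,
    service: str = "MySQL",  # assumed service name; check the real one first
    run: Callable[[Sequence[str]], None] = run_command,
) -> Path:
    """Stop the service, copy the data directory, then restart the service."""
    target = dest_dir / data_dir.name
    run(["net", "stop", service])
    try:
        # Replace the previous night's copy so the target holds one full copy.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(data_dir, target)
    finally:
        # Restart the service even if the copy failed.
        run(["net", "start", service])
    return target
```

A call might look like `cold_backup(Path(r"D:\MySQL\data"), Path(r"\\backup\mysql-nightly"))`, with both paths being placeholders. Because the service is stopped for the whole copy, the files in the target directory are static by the time the dataset holding them is snapshotted and replicated.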

I appreciate any insights.

Everything looks fine to me. I don’t see any issues with how you’re doing the MySQL backups. I assume you have automatic replication of the production server to the backup, and monitoring thereof.

My only concern would be what happens if that QDevice gets hosed. I tend NOT to want automatic migrations and the like, just because of what an absolute nightmare fixing split-brain scenarios is.
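
To make the quorum question concrete, here is a minimal sketch of simple majority voting. It assumes each of the two Proxmox nodes and the quorum device carries exactly one vote, which may not match the actual cluster configuration, and the function names are hypothetical.

```python
# Simple majority-vote model: a partition is quorate when it holds
# more than half of the total expected votes.
# Assumption: 2 nodes + 1 quorum device, one vote each (3 votes total).

def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    """True if the votes still reachable are enough to keep quorum."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    total = 3
    print("quorum device lost, both nodes up:", is_quorate(2, total))
    print("one node lost, quorum device up:  ", is_quorate(2, total))
    print("quorum device and one node lost:  ", is_quorate(1, total))
```

Under this model, losing only the quorum device leaves the two nodes quorate, but the cluster can no longer tolerate a node failure until the device is back.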

Good catch. I actually woke up thinking about this today. I have several “Pi-like” boxes that I can use for this role. I will throw 3 in here so that the odds of a single failure causing problems are minimal.

I am also debating how badly I need automatic failover. I went that way out of habit, but really looking at the setup here, I am not sure it is necessary. The AD DCs are already living on separate hosts, and I will have a third, non-virtualized DC as well. The biggest single point of failure I have is the storage, since I didn’t have the budget for a dual-controller type solution.

Thank you for the thoughts.

Seconded. HA is for a well-funded org willing to throw a skilled team at babying it along. It is not for a lean org that needs to get as much bang for the buck out of IT payroll as possible.