Hi everyone!
Three friends are building mirrored homelabs and backing up to each other. I’d love a sanity check and your best practices. I’m a tinkerer, not an expert, so please be candid (flame me if you must, but be kind).
Priorities & workloads
- Priority: Resilience > Performance > Capacity (target ≥80 TB usable on the main site after ~20% free space is kept).
- Workloads (Proxmox 9):
- Virtualized UTM (been virtualizing pfSense for ~10 years; OPNsense for ~3 years).
- Proxmox Backup Server (PBS) on the same host (for VMs/LXCs/services; media/data handled separately via ZFS send/recv).
- Media & general file storage, family photos.
- Windows 11 “work PC” VM; SPICE (and noVNC) are the only remote consoles that keep working while the guest is on the corporate VPN (I believe the Proxmox console proxying helps).
- Light databases for self-hosted services; some AI tinkering.
- Compression: leaning lz4 unless you convince me otherwise (open to zstd guidance).
Hardware (my site; the other two are similar but smaller)
- Chassis: Rosewill 4500 (15×3.5″), strong airflow.
- CPU/RAM: AMD EPYC 7532, 512 GB ECC (8×64 GB—unlikely to grow; larger DIMMs are too pricey).
- Motherboard: ASRock Rack ROMED8-2T/BCM; all PCIe slots support x4/x4/x4/x4 bifurcation.
- HBA: Broadcom/LSI 9400-16i → SATA Exos HDDs.
- NIC: Mellanox ConnectX-4 (MCX4121A-ACAT) 25 GbE dual SFP28.
- Spinning disks: 15 × 16 TB Seagate Exos (mix of batches, some recerts; all are 16 TB despite “Exos 16/18” naming).
- NVMe (consumer): 4 × 4 TB FireCuda 530 + 2 × 512 GB FireCuda 530. No PLP; endurance: 4 TB = 5100 TBW, 512 GB = 640 TBW.
- Other SSDs in the group: one site has Samsung 983 DCT M.2 960 GB (as I understand, with PLP); another site has unknown consumer Samsung “Pros.”
- Power: my site = hours of UPS; the others ≈5–15 min.
- LAN/WAN: my LAN is mostly 1 GbE (UniFi Switch Pro 24 PoE with 2×10 GbE); WAN is 1 Gbps symmetrical. Other sites ~50–150 Mbps down / ~25 Mbps up. For “fast restore,” the plan is a direct 25 GbE DAC between servers when physically co-located.
Pool design I’m considering (please critique!)
- Main HDD pool: 7 × 2-way mirrors + 1 hot spare (14 drives in mirrors, 1 spare); rough command sketch at the end of this list.
- Aiming to clear ≥80 TB usable with ~20% free kept.
- Mirrors for IOPS and fast rebuilds. I might later convert some to 3-way mirrors as larger disks arrive—but not now.
- Special vdev (metadata/small files + DDT for “fast dedupe”): 4 × 4 TB FireCuda 530 arranged as two mirrored special vdevs (i.e., special class = stripe of two mirrors).
- Tentative special_small_blocks = 32K or 64K to accelerate filesystems/VMs/service configs and small media metadata.
- I know that losing the special vdev means losing the pool; hence mirrored special vdevs, cooling, and frequent replication.
- Boot: 2 × 512 GB FireCuda 530 mirrored for Proxmox boot (may also hold ISOs/container images in a small dataset).
- SLOG / L2ARC: No SLOG (I don’t expect sync-heavy exports) and no L2ARC (512 GB RAM + special vdev should suffice).
- Datasets: Prefer many datasets to tune per use case (recordsize/volblocksize, atime, special_small_blocks, dedup).
- Dedup: Not pool-wide. Considering it only where it truly pays (shared game libraries, VM/template libraries, maybe photo duplicates across project datasets). I’d love rules-of-thumb that fit 512 GB RAM and consumer NVMe special vdevs.
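To make the pool layout above concrete, here’s a rough command sketch of what I’m picturing. Device names are placeholders (I’d use /dev/disk/by-id paths in reality), and the capacity math in the comments is back-of-the-envelope, so please check it.

```bash
# Rough sketch only - device names are placeholders.
# Back-of-the-envelope capacity (raw, decimal TB):
#   7 x 2-way mirrors of 16 TB = 112 TB (~102 TiB)
#   3 x 5-wide RAIDZ2 of 16 TB = 144 TB (~131 TiB), at the cost of rebuild behavior
# Keeping ~20% free, the mirror layout should still clear the ~80 TB usable target.
zpool create -o ashift=12 tank \
  mirror hdd01 hdd02  mirror hdd03 hdd04  mirror hdd05 hdd06 \
  mirror hdd07 hdd08  mirror hdd09 hdd10  mirror hdd11 hdd12 \
  mirror hdd13 hdd14 \
  special mirror nvme0 nvme1 mirror nvme2 nvme3 \
  spare hdd15

# Baseline properties on the root dataset so children inherit them.
zfs set compression=lz4 atime=off xattr=sa dnodesize=auto tank

# special_small_blocks is per-dataset, so it can differ for VMs vs. media.
zfs create -o special_small_blocks=64K tank/vmstore
zfs create -o recordsize=1M -o special_small_blocks=32K tank/media
```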
Replication, encryption & PBS
- Backups: local PBS first (for VMs/LXCs/services), then off-site. Media/data handled via ZFS send/recv.
- Mesh: full mesh (A↔B, A↔C, B↔C).
- Transport: first full transfers over LAN; ongoing over WireGuard/Tailscale/SSH.
- Plain-English targets: keep data loss small; I can live with rebuild < ~1 week in a disaster.
- Concern: my site will have a special vdev; the others likely won’t. Any caveats sending from special→non-special pools (e.g., where special-class blocks land on the target; performance side-effects)?
- Encryption: I want encrypted off-site backups so each person’s data remains private. Views on dataset-level encryption + raw sends (zfs send -w) vs alternatives?
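For context, this is roughly the send/recv pattern I had in mind for the private off-site copies. Hostnames and dataset names are made up; this is a sketch, not a tested pipeline.

```bash
# Encrypted source dataset; the key never has to leave my site.
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/photos

# First full replication: a raw send (-w) ships the blocks still encrypted, so the
# receiving friend can store and scrub the data without being able to read it.
zfs snapshot -r tank/photos@base
zfs send -w -R tank/photos@base | ssh friend-b zfs recv -u backup/siteA/photos

# Ongoing incrementals over WireGuard/SSH:
zfs snapshot -r tank/photos@2025-06-01
zfs send -w -R -I tank/photos@base tank/photos@2025-06-01 \
  | ssh friend-b zfs recv -u backup/siteA/photos
```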
What I’m hoping you can help me decide (and why)
- Vdev layout for resilience-first (still need ≥80 TB usable):
Would you stick with 7×2-way mirrors + spare for IOPS and rebuild behavior, or steer me toward 3×(5-wide RAIDZ2) for capacity while staying robust? Any best practices for spreading drive batches across vdevs, and for the spare strategy (on-pool hot spare vs. a cold spare on the shelf)?
- Special vdev details (metadata + DDT on special):
Is two 2-way mirrored special vdevs (striped) a sane failure domain, or is a 3-way special mirror worth the capacity hit for home-lab “prosumer” reliability?
What real-world starting point for special_small_blocks would you use for mixed VMs/media/photos/services? (I’m torn between 32K and 64K; I want enough metadata/small-IO acceleration without dragging too many medium-sized files onto the special class.)
Any consumer NVMe pitfalls you’ve seen when used as special vdevs (endurance/thermal), given my FireCuda 530s (no PLP, but good TBW)?
- Dedup scope with DDT-on-special:
If I limit dedup to a few datasets (shared game libraries, VM/template libraries, maybe photo duplicates), is this sane with 512 GB RAM and the DDT living on the special vdevs? I’d love rules of thumb (e.g., DDT size per TB of logical data, RAM headroom) and how they’d fit my profile.
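For what it’s worth, my plan is to simulate before turning anything on; the sizing numbers in the comments are rules of thumb I’ve picked up and may be off, and the dataset names are placeholders.

```bash
# Dry run: zdb -S walks the pool and prints a simulated dedup table and ratio
# without changing any data, which should show whether dedup pays off at all.
zdb -S tank

# Rough sizing I've seen quoted (please correct me): a few hundred bytes of DDT
# per unique block. At 128K records, 1 TB of unique data is ~8M blocks, so on
# the order of 2-3 GB of DDT; small-record datasets inflate this quickly.

# If it looks worthwhile, enable it only where duplicates are expected:
zfs set dedup=on tank/games
zfs set dedup=on tank/vm-templates
```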
- VM disks on ZFS (current best practice):
In 2025 on Proxmox 9 + ZFS, would you put VM disks as sparse files on a dataset (qcow2 or raw) or on zvols?
- If files: preferred dataset props (recordsize, atime, TRIM/discard) and relevant Proxmox settings.
- If zvols: your go-to volblocksize, snapshot/replication considerations, and any current pitfalls.
- Most importantly: what do I truly gain/lose in practice (performance, fragmentation, management) with sparse files vs zvols?
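Not trying to answer my own question, but for context here’s roughly what I’d try first unless someone talks me out of it (all names and values are guesses, not recommendations):

```bash
# Option A: zvol-backed disks (what Proxmox does with a ZFS storage entry).
# 16K is, I believe, the current Proxmox default volblocksize; no idea if it's
# right for my mix, hence the question.
zfs create -s -V 200G -o volblocksize=16K tank/vmstore/vm-101-disk-0

# Option B: file-backed raw/qcow2 on a dataset exposed as a directory storage.
zfs create -o recordsize=64K -o atime=off tank/vmfiles

# Either way: enable discard (and SSD emulation) on the virtual disk in Proxmox
# so TRIM inside the guest actually returns space to the pool.
```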
- Windows “Previous Versions” from inside the VM (no SMB if possible):
I’d love for users to restore files via Explorer’s “Previous Versions” inside the VM without exposing host SMB shares. Is there any practical route (e.g., iSCSI/vdisk + host snapshots surfacing in-guest, or managing VSS inside the guest) that actually works well? Or is Samba’s vfs_shadow_copy2 effectively the only sane option if I want that experience?
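In case it helps frame that last question, the Samba route I keep seeing referenced looks roughly like the share below (untested by me; the path and the shadow:format string are made up and would have to match whatever actually takes the snapshots), and I’d only go there reluctantly.

```bash
# Hypothetical smb.conf share exposing a dataset's ZFS snapshots to Windows
# "Previous Versions" via vfs_shadow_copy2.
cat >> /etc/samba/smb.conf <<'EOF'
[familydocs]
    path = /tank/familydocs
    read only = no
    vfs objects = shadow_copy2
    shadow:snapdir = .zfs/snapshot
    shadow:sort = desc
    shadow:format = autosnap_%Y-%m-%d_%H:%M:%S_hourly
    shadow:localtime = yes
EOF
```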
- PBS on ZFS (for VMs/LXCs/services only):
- Dataset props you recommend (e.g., recordsize, compression choice, dedup=off?).
- For cross-site: use today’s PBS push/sync features, or stick with zfs send/recv of the PBS datastores?
- A resilient but sane retention plan for a homelab (I’m thinking: hourlies ~24–48 h, dailies ~14–30 d, weeklies ~8–12 w, monthlies ~6–12 m; feel free to edit).
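To make the retention question concrete, here’s the kind of policy I’d start from if we end up using sanoid for the ZFS side (pure assumption on my part; dataset names are placeholders, and PBS would get equivalent keep-hourly/daily/weekly/monthly prune settings for the VM side):

```bash
# Hypothetical sanoid policy matching the retention I floated above.
cat >> /etc/sanoid/sanoid.conf <<'EOF'
[template_homelab]
    frequently = 0
    hourly = 48
    daily = 30
    weekly = 12
    monthly = 12
    yearly = 0
    autosnap = yes
    autoprune = yes

[tank/media]
    use_template = homelab
    recursive = yes
EOF
```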
- Tunables & sensible defaults (fit for my mix):
- Compression: stay lz4, or move to zstd (which level) for VM/media mix?
- atime mostly off? Any exceptions you’d keep on?
- Defaults you like for xattr=sa, acltype=nfsv4 (Samba), dnodesize=auto, redundant_metadata=most, logbias=throughput on media datasets, etc.
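For concreteness, this is the property baseline I’m leaning toward; please treat every value as a question rather than a recommendation (dataset names are placeholders, and zstd-3 is just a strawman for the lz4-vs-zstd question above):

```bash
# Defaults on the root dataset so children inherit them.
zfs set compression=zstd-3 atime=off xattr=sa dnodesize=auto tank

# Media: large records, metadata-only extra redundancy, throughput-biased logging.
zfs set recordsize=1M redundant_metadata=most logbias=throughput tank/media

# Anything that might end up shared via Samba:
zfs set acltype=nfsv4 aclinherit=passthrough tank/shares

# Databases for self-hosted services (guessing 16K to roughly match page sizes):
zfs set recordsize=16K logbias=latency tank/apps/db
```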
- Maintenance, health & monitoring (multi-site):
- Even if “pre-burned,” what burn-in would you still run (SMART long, badblocks, fio patterns)?
- Scrubs: monthly? More frequent at first?
- SMART replacement rules: your thresholds (reallocs/pending/etc.) vs “run to failure.”
- Observability: Would you run per-site Prometheus exporters with federation and a central read-only Grafana at my site, or something simpler? (I’m open—just want reliable alerts across all three sites.)
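And on the maintenance side, this is the burn-in/scrub baseline I’d run absent better advice (device paths are placeholders; badblocks -w is destructive, so only on drives with nothing on them):

```bash
# Per new/recert drive, before it joins the pool:
smartctl -t long /dev/sdX          # extended SMART self-test first
badblocks -wsv -b 4096 /dev/sdX    # full destructive write/read pattern pass
smartctl -a /dev/sdX               # recheck reallocated/pending counts afterwards

# Monthly scrub (Debian/Proxmox ships a zfsutils cron job for this already,
# so it may just be a matter of confirming it's enabled):
zpool scrub tank

# Quick health checks to wire into alerting at each site:
zpool status -x
smartctl -H /dev/sdX
```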
Odds & ends / context worth judging me for
- I rarely expose SMB/NFS/iSCSI (I prefer Nextcloud/Seafile/Syncthing/Resilio).
- I know virtualized firewalls have caveats; I’ve run them carefully for a long time.
- Consumer hardware is a compromise—I welcome reality checks specific to this design.