I have the following hardware:
2 servers with 8x 8TB Samsung PM893 enterprise(?) SATA SSDs, connected via onboard SATA;
2 servers with 4x 12TB WD SAS HDDs + 4x 960GB Samsung PM1643a SAS SSDs, connected via an HBA controller.
All servers are Supermicro 8-bay units with 1x E5-2690 v4, 48GB RAM (can be increased), and 2x 10Gbit links per unit.
One of the “hybrid” SSD+HDD servers currently acts as the storage node for ~100 VMs running on 6 half-full compute nodes. VM disks are in qcow2 format on an NFS share, and the virtualization environment is Proxmox.
The storage scheme for this node is LVM on top of an HDD mdadm RAID10, plus lvmcache on top of an SSD mdadm RAID10. It was built as a temporary test solution and, as usual, somehow became “prod”. In terms of performance it… well, at least it works for now.
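For scale, here is a rough sketch of what this layout works out to in usable capacity. It assumes the drive sizes are 12 TB HDDs and 960 GB SSDs, decimal units, and a plain two-copy RAID10; the `raid10_usable` helper is made up for illustration, and the all-flash line is only a hypothetical comparison, since those nodes are not configured yet.

```python
def raid10_usable(disks: int, size_tb: float, copies: int = 2) -> float:
    """Usable capacity in TB of a RAID10 array keeping `copies` copies of each block."""
    if disks < copies:
        raise ValueError("need at least as many disks as copies")
    return disks * size_tb / copies


# Assumed sizes: 12 TB HDDs and 960 GB (0.96 TB) SSDs on the hybrid node.
hdd_tb = raid10_usable(4, 12.0)        # slow data tier under LVM
ssd_cache_tb = raid10_usable(4, 0.96)  # lvmcache tier
# Hypothetical: what an all-flash node would give if it were also RAID10.
allflash_tb = raid10_usable(8, 8.0)

# Fraction of the HDD tier that fits in the SSD cache at any one time.
cache_ratio = ssd_cache_tb / hdd_tb

print(f"hybrid data tier:  {hdd_tb:.2f} TB")
print(f"hybrid cache tier: {ssd_cache_tb:.2f} TB ({cache_ratio:.1%} of data tier)")
print(f"all-flash (if RAID10): {allflash_tb:.2f} TB")
```

Under these assumptions the cache covers only a small share of the HDD tier, which is part of why purely random VM IO ends up hitting the slow disks.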
The other 3 storage nodes are empty now, and I’m in the middle of searching for the best-fitting configuration for the described setup. The VM count will increase to ~600-700, then some of them will eventually move to an on-premises private cloud solution, most likely CloudStack (on top of the existing Proxmox environment, or maybe on bare metal). Most VMs have a “no cache” async disk profile. All hardware is UPS-backed with graceful shutdown.
VM IO load is purely random: there are dev, test, preprod, CI/CD, databases, Java apps, YouTrack, TeamCity, GitLab, etc. Purely random.
At first I did not want to mess with ZFS. I know it at a very basic level (my homelab has happily lived on ZFS from 0.6 to 2.1.12 as of now), without any deep dives. The intended solution was to use LVM VDO for both the hybrid and all-flash storage nodes, but in short, it failed completely in testing. The only things I needed from it were compression on all-flash, and VDO+lvmcache on hybrid to compress both data and cache, saving IOPS when writing back from the cache to the slow HDDs.
I can always do classic mdadm+LVM for all-flash and mdadm+lvmcache for hybrid. But maybe ZFS can help somehow? Faster and smarter rebuilds and data integrity checks would of course be a benefit.
Could you suggest the ZFS setup that best fits the two types of storage nodes described above and gives the maximum possible storage performance? I know words like ARC, ZIL and SLOG but have no real experience with them. Or should I maybe leave it all as it is with the classic solutions?