I have been running a RAIDZ2 pool for a few years with 5x4TB WD Red drives. The pool status is:
$ zpool list -v tank
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank   21.8T  18.7T  3.01T        -         -    27%    86%  1.00x    ONLINE  -
The host is running Arch Linux with a 10G SFP+ NIC.
I am looking to upgrade the pool and have bought 8x20TB drives. The drives are a mix of shucked WD drives, retail Red Pro drives, and re-certified Seagate Exos drives. It is a challenge for me to buy drives (I need to make an international trip to the USA!), so my intention is to store some of these drives in cold storage. That way, if a drive fails in production, I can replace it from cold storage.
The primary criterion I have is low failure risk, since the data is all irreplaceable (family photos/documents). However, I would also like to improve file transfer speed to actually make use of the 10G NIC.
Given the disks you already purchased: either a six-wide Z2 plus two on-hand spares, or a seven-wide Z3 plus one on-hand spare. (My recommendations might have been different if you were asking what to buy.)
I’m really not a big fan of Z3, but if your only real consideration at this point is “most resilient pool I can manage” then a seven-wide Z3 (this is an “ideal” width–not absolutely necessary, but a good idea nonetheless) plus an on-hand spare is about as good as it’s going to get, topology-wise.
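For reference, building that seven-wide Z3 would look something like this (pool name and device paths below are placeholders; use your own /dev/disk/by-id paths, and ashift=12 is the usual choice for modern large drives):

$ zpool create -o ashift=12 newtank raidz3 \
    /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3 \
    /dev/disk/by-id/DISK4 /dev/disk/by-id/DISK5 /dev/disk/by-id/DISK6 \
    /dev/disk/by-id/DISK7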
BUT with that said, if you REALLY want to make sure you don’t lose this data, forget about the topology, what you really need is monitoring and backup. Monitoring to let you know when a failure occurs–so you don’t spend three months blissfully ignorant of two failures that already occurred in your Z2 or three in your Z3–and backup for the many, many types of failure that redundant drives in one host can’t mitigate.
If you can pick up a second machine and set it up as an automated backup host, pulling hourly, daily, and monthly replication from the main host, that's going to be a far better hedge against data loss than extra redundancy. If you are willing to do that, my recommendation changes: now I'd say do either two 2-wide mirrors or one four-wide Z2 on each host; either way giving you roughly triple the effective storage you have now on your five-wide Z2 on 4TB drives.
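As a rough sketch of what that looks like with sanoid/syncoid (hostnames and dataset names below are placeholders): sanoid takes the scheduled snapshots on the main host, and the backup host pulls them on a schedule with syncoid.

# /etc/sanoid/sanoid.conf on the main host
[tank/data]
        use_template = production

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes

# crontab on the backup host: pull replication every hour
0 * * * * syncoid --no-sync-snap root@mainhost:tank/data backuppool/tank-data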
You do still need the monitoring, though, if you’re really taking this seriously.
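A bare-minimum monitoring sketch, assuming your host can send mail (the address is a placeholder); OpenZFS's own zed daemon, with ZED_EMAIL_ADDR set in /etc/zfs/zed.d/zed.rc, is the more complete option:

#!/bin/sh
# /etc/cron.hourly/zpool-health: mail the full status if any pool is unhealthy
status=$(zpool status -x)
[ "$status" = "all pools are healthy" ] || echo "$status" | mail -s "ZFS alert on $(hostname)" admin@example.com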
Thanks for your recommendation. I think a Z2 should be good enough for me. That way I can put an additional disk away in cold storage.
I do have a partial backup of the pool, made possible by your sanoid tool (thank you!). The partial backup covers the important stuff, and that data is mostly static.
The same pool is used to host Docker and LXD containers; this is where pool performance is important.
My initial gut reaction was to use a 5-wide Z2 layout, mimicking what I have at present. That way I could put 3 of the 8 disks in cold storage. However, I suspect the performance won't be the best. I access this server over NFS, and transfer speed over the LAN at present is not that great (nowhere near what iperf shows, i.e. ~10Gb/s). This is with an NVMe device set up as a LOG device (an Intel SSD 750).
The host is a Supermicro SC846 chassis with a SAS2 backplane with expander (SAS2-846EL1). I have procured a SAS3 backplane and am going to make the switch soon.
I am guessing the performance will be better with two 2-wide mirrors than with a four-wide Z2. Is the risk exposure the same as well?
Mirrors will DRASTICALLY outperform Z2–at the six-disk level, you're generally looking at triple the write IOPS and six times the read IOPS. You also get easier management: you can easily add a fourth two-wide mirror to get more space, and even after your bays are fully occupied, you only need to replace two disks (both sides of any one of your vdevs) with larger ones in order to get more space. See the sketch below.
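For illustration (pool name and device names are placeholders), a pool of mirrors grows one vdev at a time:

$ zpool create newtank mirror DISK1 DISK2 mirror DISK3 DISK4 mirror DISK5 DISK6
$ zpool add newtank mirror DISK7 DISK8    # add the fourth mirror later; no rebuild of existing data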
But you do get a bit more risk exposure with it. You can lose more total disks with mirrors and keep the pool operational… But you can also lose the pool on the second failure, if you're unlucky about which drive fails second. By contrast, a single Z2 vdev survives ANY two overlapping failures, but always dies on the third.
If I lose the pool, is it still possible to import it in read-only mode and copy the files off?
Since I have disks from different vendors, should I keep the same vendor within a mirror, or mix vendors within a mirror? I was wondering if the latter approach is better, since it would minimize the chance of both disks in the same mirror failing together. However, the mismatch of drives may become a performance issue, e.g. a 7200 RPM and a 5400 RPM drive in the same mirror.
Losing the pool means exactly what it sounds like. If you lose an entire vdev–which means all disks in a mirror vdev, or more than p disks (where p is the parity level) in a RAIDz vdev–you lose the pool with it.
You could in theory recover SOME data via forensic analysis, but there will be few or no whole, undamaged files in there, because ZFS distributes blocks between vdevs as they come; it doesn't distribute whole files. So even if you can recover the remaining blocks, what you have won't be a "filesystem", it will be a giant mess of data you could try to untangle in order to get some percentage of the content of any individual file back (assuming you can find the remnants of that file at all).
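To directly answer the import question: read-only import helps when the pool is damaged but every vdev still has enough working members to function. In that case you can try something like this (pool name per your system):

$ zpool import -o readonly=on tank
$ zpool import -F -o readonly=on tank    # last resort: ask ZFS to rewind to an older transaction group

Once a whole vdev is gone, though, there is nothing left to import.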
You're thinking about this wrong. No matter what you do, if you can't afford to lose the files, you need another place where they are backed up. ZFS can't do magic: if, say, all disks write the wrong data because of a severe hardware fault, you're out of luck. The same goes for lightning striking and frying your server, or somebody breaking into your home and stealing it.
You can minimize the risk of data loss, but it's never zero, so make sure to back up the irreplaceable data to a second location.