Topology questions for 4x 16TB HDD + 2x 4TB NVMe NAS

Hello! I recently started planning a NAS build and would like some advice. If this is the wrong category, please let me know! I’m happy to provide any follow-up information as needed. Thank you in advance!

Intended use

  • Backups from endpoints (Android phones and Linux machines under a variety of distros)
  • Backups from my primary virtualization server, which runs Proxmox
  • Format shifted media to be accessed by Jellyfin (movies, TV series, music, books)
  • Photos to be accessed by Immich
  • No more than a handful of users using these simultaneously. 95% of the time it will be just one or two.

System

Qotom Q20332G9-S10

Software

Host OS: Proxmox 8.3 on a separate 1TB SATA SSD

Specs
  • Intel C3758
  • 32 GB RAM
  • 4x 16TB Seagate EXOS X16 HDDs (fresh factory recerts)
  • 2x 4TB Western Digital SN700 NVMe SSDs (each running at PCIe x2) (New)
  • 1x 1TB Samsung 870 Evo (OS drive) (Used)

Network

2.5 Gb LAN
1 Gb symmetrical internet connection

Questions

  1. What is a good way to utilize the NVMe drives?
  2. I’m currently planning on implementing 2-drive mirror vdevs as recommended in “ZFS: You should use mirror vdevs, not RAIDZ.” Does this advice still hold true? Is resilvering a mirrored, 2-drive vdev more or less risky than resilvering from RAIDZ1?
  3. The aforementioned blog indicates that it’s more stressful on I/O. Does this mean disk I/O or just total system I/O?
  4. Is there a way to test resilvering times with different topologies before putting real data on the pool?
  5. I don’t presently have any backups for the pool, but am planning on getting an offsite backup put together in the not-so-distant future. Does this change anything in the interim?
  6. All the drives I’m planning on using are 4KiB native sector devices currently in 512B sector emulated mode. Should I consider swapping these over to 4KiB native sectors?
  1. The NVMe drives are probably not going to be very useful directly. You can use one as a CACHE vdev and one as a LOG vdev, if you’re determined to use them in this build. But it might be better to just use them on another project. (Unless you’re doing NFS, which means all writes are sync writes. If you’re doing that, use one of them for a LOG vdev for sure.)
  2. Less risky than RAIDz1, much higher performance than RAIDz1 on your four available drives. Honestly, the choice here isn’t RAIDz1 vs mirrors, it’s RAIDz2 vs mirrors, with the same 50% storage efficiency but (much) higher performance on the mirrors side vs dual redundancy on the Z2 side. (Personally, I’d still take the mirrors every time. But I monitor my systems. YMMV.) A rough command-line sketch of this layout follows the list.
  3. I don’t understand this question.
  4. Create the pool with a given topology, fill it at least 75% with data, fail out a drive and record the length of the process when you resilver. But you might want to wipe the drive before re-adding it; otherwise it may do a fast resilver that isn’t representative of the experience when you bring a brand new drive to the vdev. (If this sounds like way too much work: my advice remains to go with mirrors, for a resilver time roughly 1/2 of what you’d see on a single four wide Z2).
  5. Nope. Keep working on that backup solution though!
  6. Yes.
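
For concreteness, here is a minimal sketch of what that layout could look like at the command line. The pool name and device IDs below are placeholders (use your own /dev/disk/by-id/ paths), and the log/cache lines are optional, per point 1:

```
# Sketch only: "tank" and the device IDs are placeholders.
# ashift=12 forces 4KiB alignment (see question 6) regardless of the
# logical sector size the drives currently report.
zpool create -o ashift=12 tank \
  mirror ata-ST16000NM001G_SERIAL1 ata-ST16000NM001G_SERIAL2 \
  mirror ata-ST16000NM001G_SERIAL3 ata-ST16000NM001G_SERIAL4

# Only if you decide to spend the NVMe drives on this pool (point 1):
zpool add tank log   nvme-WD_SN700_SERIAL5    # SLOG: only helps sync writes (e.g. NFS)
zpool add tank cache nvme-WD_SN700_SERIAL6    # L2ARC: limited value with 32 GB RAM

zpool status tank
```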

Jim has basically answered the question at this point, but I’m curious, how are you connecting these drives to that Qotom appliance?

Thank you for the reply!

Seems reasonable. What kind of monitoring do you use? Is there documentation on setting up a proper monitoring service?

Apologies, I’m not terribly familiar with spinning rust, so my wording wasn’t clear. That being said, I did some more digging and I believe I answered my own question here.

When resilvering a two-drive mirror vdev, the surviving drive will presumably be doing one long, sequential read at its maximum speed for the entirety of the resilver. With parity, however, I believe all of the drives have to spend much more time seeking as they skip around between parity blocks and data blocks, on top of the parity math needed to reconstruct the new drive. Given how data is laid out on disk in each topology, RAIDZ of any level seems to involve additional and more varied head movement across all of the drives compared to a mirror.

Meaning, resilvering a disk in a RAIDZ topology will be more mechanically demanding, more I/O intensive, and more compute intensive than in a mirror topology. At least, that is what I believe after reading through “Primer: How data is stored on-disk with ZFS” and “ZFS RAIDZ stripe width, or: How I Learned to Stop Worrying and Love RAIDZ.”

Applying that logic to your original mirror vs RAIDZ blog post seems to fit well, but please correct me if I’ve missed something or have some other misunderstanding.
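
One way to check this reasoning directly is to run the test resilver from point 4 on scratch data and watch the disks while it rebuilds. A rough outline for the mirror case, again with placeholder device names (for a RAIDZ pool you would use zpool offline plus zpool replace instead of detach/attach):

```
# Pull one side of a mirror out entirely, wipe it, then attach it back as a
# "new" disk so ZFS performs a full resilver rather than a quick diff.
zpool detach tank ata-ST16000NM001G_SERIAL2
wipefs -a /dev/disk/by-id/ata-ST16000NM001G_SERIAL2
zpool attach tank ata-ST16000NM001G_SERIAL1 /dev/disk/by-id/ata-ST16000NM001G_SERIAL2

# Watch it run:
zpool status tank        # progress, elapsed time, estimated completion
zpool iostat -v tank 5   # per-vdev and per-disk ops/bandwidth from ZFS's view
iostat -dxm 5            # OS view: high MB/s with large average request sizes
                         # looks sequential; high %util with low MB/s is the
                         # seek-bound pattern described above
```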

The Qotom appliance has an SFF-8087 port on the side which breaks out to 4 SATA drives. It’s a wonderful feature I wish more devices, especially motherboards, implemented: I can connect 4 drives to a mini-ITX sized board without an expensive, hot, finicky HBA, which rarely seem to have fully working idle states or ASPM.

See the below pics for more details. Don’t judge it too hard quite yet please, I still have lots of assembly to do lol.

System Overview

Qotom Side

Drive Side

Yes. But perhaps more importantly, a RAIDz pool has fewer IOPS to spare for resilvering in the first place.

Aside from that, imagine resilvering two six-disk pools: one of which is a single six-wide Z2, the other being three two-wide mirrors.

EVERY operation on the Z2 pool blocks every device in the pool.

Meanwhile, only 1/3 of the operations on the pool of mirrors are blocked by the resilver operations, since only one of the three vdevs is impacted by resilvering.

It’s a big difference.
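
If you want to see that difference in numbers rather than take it on faith, one rough approach (assuming fio is installed; the path, block size, and runtime here are arbitrary) is to run the same small random-read workload against the pool while a test resilver is underway, once per topology, and compare the reported IOPS and latency:

```
# Run during the test resilver, once on the pool of mirrors and once on a
# RAIDZ layout, then compare the results.
fio --name=randread --directory=/tank/fio-test --rw=randread \
    --bs=8k --size=2g --numjobs=4 --runtime=60 --time_based \
    --ioengine=psync --group_reporting
```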


For anyone who needs help with the format conversion in the future: despite what the Arch Wiki’s entry on Advanced Format suggests, and despite sg_format looking promising given the kernel’s SCSI-to-ATA command translation through libata, my drives (P/N: ST16000NM001G on the current SN04 firmware) would not respond to the generic SET SECTOR CONFIGURATION EXT command. This is despite the product manual alleging otherwise.

To change the sector size on my drives, I had to use Seagate’s SeaChest Lite, which worked flawlessly. Despite what the site seems to indicate, they include a Linux binary in addition to a Windows one.
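
For reference, the checks involved look roughly like the following. The device paths are illustrative, and the exact SeaChest invocation (binary name, -d handle, flag spelling) can differ between releases, so confirm against --help before running anything; changing the sector size destroys all data on the drive, so do it before building the pool:

```
# What the drive currently reports (repeat per drive; /dev/sda is illustrative):
smartctl -i /dev/sda | grep -i 'sector size'   # e.g. "512 bytes logical, 4096 bytes physical"
hdparm -I /dev/sda | grep -i 'sector size'

# Switch to 4Kn with SeaChest Lite (flag name taken from its help text and may
# vary by version; this wipes the drive):
./SeaChest_Lite -d /dev/sg0 --setSectorSize 4096
```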
