ZFS for cold storage (reduce spin-ups, power usage, noise)

Hi,

I’ve just started to learn ZFS (I’ve known of its existence for a long time, but never got around to learning / using it). It will have a place in my PC and some (hobbyist) servers ;-).

I currently use Snapraid for my cold data, i.e. archive data (mostly a media library and backups). It is written very seldom and mostly read quite slowly (to play back music/videos, pull some ancient backup, etc.); so for this use case it is certainly optimal not to spin up the whole array to read a single file. Syncing the parity does not need to be done immediately.

This is the use case Snapraid is optimal for; typical “real” RAID solutions cannot avoid spinning up the whole array when data is read (also: if too many disks fail, all data is lost, whereas with Snapraid all disks that have not failed still have intact data - they are not in any way dependent on each other, and the parity can simply be used to rebuild a failed data disk).
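For context, my Snapraid setup looks roughly like the sketch below; the paths and disk names are just placeholders, not my exact config:

```
# /etc/snapraid.conf (sketch, placeholder paths)
#   parity  /mnt/parity1/snapraid.parity    # parity file lives on its own dedicated disk
#   content /mnt/disk1/snapraid.content     # content lists kept on a couple of data disks
#   content /mnt/disk2/snapraid.content
#   data d1 /mnt/disk1/                     # each data disk stays an independent filesystem
#   data d2 /mnt/disk2/

snapraid sync        # update parity after the occasional batch of writes
snapraid scrub -p 5  # periodically verify a small percentage of the array
snapraid fix -d d1   # after a failure, rebuild only disk d1 onto a replacement
```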

However, I could see some benefits to doing the same in ZFS; it seems it cannot do this currently, or perhaps I’m mistaken (?). To do what Snapraid currently can, ZFS would need these features:

  • Make datasets in a vdev that are tied to one physical device (and, optionally, add a quota so that the dataset cannot overflow the disk); no other disks should spin up when data from this dataset is read (a rough approximation with today’s ZFS is sketched after this list);
  • Perhaps a new vdev / raid type is needed, e.g. raidCold or raidZX-Cold? Or, current types could have an option (perhaps named ColdStorage)? The parity cannot be striped for this use case, but (like the actual data) must be tied to one physical device.
  • Adjust the sync frequency of the parity; it doesn’t matter that much if writes are done seldom, though, as the disk would be spun up quite rarely in any case (i.e. a “nice to have” feature)
  • (for partial recovery) be able to re-import an individual data disk as a temporary pool+vdev - but again, not really required, just a “nice to have” feature (i.e. take a disk temporarily elsewhere, read the dataset, put it back into the original array). For a minimal feature set, only resilvering would be enough (as with a regular pool).
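
(For the first bullet: as far as I understand, the closest you can get today is one single-disk pool per drive, which gives per-disk datasets and quotas but of course no shared parity. This is only a sketch; pool names, dataset names, and device paths are placeholders.)

```
# One pool per physical disk, so reading a dataset only spins up that one drive
zpool create -o ashift=12 coldpool1 /dev/disk/by-id/ata-DISK1
zpool create -o ashift=12 coldpool2 /dev/disk/by-id/ata-DISK2

# Optional per-dataset quotas, mirroring the quota idea from the first bullet
zfs create -o quota=3T coldpool1/media
zfs create -o quota=3T coldpool2/backups
```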

Now, considering the above, I have a few questions:

  • Would this kind of functionality be welcome in ZFS? I don’t want to make a feature request just yet on GitHub since I feel like I could be adding unwanted noise there =)
  • Not knowing ZFS internals, are there some underlying design choices which would make implementing this difficult, or is it perhaps even low-hanging fruit?
  • Or is this a case of “right tool for the job”, i.e. should I just keep using Snapraid if that is what best suits the use case? =)

EDIT: I have mixed up pool and vdev in my post. I hope that’s understandable as I’m new to ZFS ;-). I’ve tried to edit this error out. Also, I pointed out that the parity cannot be striped for this use case (but must be tied to a physical device, just like any non-parity disk).

EDIT: I’ve realized this would mostly be like standard RAID4 … implemented in ZFS, but with the addition of an arbitrary number of parity disks (like in Snapraid). On further reflection this is best implemented at the FS level, just like Snapraid does it - but in case anyone has any thoughts you’re welcome to post them here ;-). It seems like a user had a somewhat similar use case in mind here - but nobody mentioned Snapraid there. Conclusion: just use Snapraid, if that is what you need!

Cheers!

This is not something that we’re looking for inside OpenZFS itself. What you’re looking for here is Unraid using ZFS as its “base layer” filesystem.

Hybrid Approach

ZFS-formatted disks within the Unraid array

Pros:

  • This strategy combines Unraid’s array flexibility (allowing easy capacity expansion) with ZFS’s advanced features, such as data compression and snapshots.
  • Idle disks can be powered down to conserve energy (see the sketch below).
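
Unraid normally handles disk spin-down from its own GUI settings, but purely to illustrate the idea on a generic Linux box (device path and timeout are placeholders):

```
# Ask an idle archive disk to spin down after ~60 minutes of inactivity
# (hdparm -S values 241-251 mean 1-11 units of 30 minutes)
hdparm -S 242 /dev/disk/by-id/ata-ARCHIVE1

# Check whether the drive is currently active or in standby (spun down)
hdparm -C /dev/disk/by-id/ata-ARCHIVE1
```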

In agreement with Jim, I do this with UnRaid and it works great, though I’m evolving slightly away from it and decided to reply with the use case I’m moving toward, since I think it more or less matches the SnapRaid theme of your post (i.e. the advantage of JBOD / spin-down drives for cold storage vs. the advantage of ZFS, and the option to do both with UnRaid).

So as I got more time with ZFS on UnRaid - assuming you also want to leverage the Docker hypervisor (and app store for dockers / plugins) that UnRaid has, along with the high IOPS that ZFS gives you - I think this might be better evolving into dual UnRaid servers. The cold storage / JBOD traditional UnRaid array, if used as “just a NAS”, can run on a VERY low-powered x86 CPU (2-4GB RAM, no cache drives, tiny motherboard, etc.) - a perfect use case for an older, otherwise defunct PC, or say a newer-gen Celeron. Then have a nicely spec’d UnRaid rig with ZFS array(s) so you can leverage the speed / extra protection / fast IOPS for the Docker / VM hypervisor features UnRaid + ZFS gives you. My sweet spot right now would be a 12th-gen i5 or i7 Black Friday special on a DDR4 motherboard. This second rig can essentially be “all ZFS” - while UnRaid still requires at least one “main array” drive (XFS or BTRFS), you can “cheat” and make this a USB disk without parity that you basically ignore, so you don’t have to waste a rust or SSD drive or take up a SATA slot, etc.

My main point was that mixing the “cold” array and the “hot” (ZFS) array on one rig mostly defeats the advantage of using sleep states, etc. Certainly you CAN blend the different arrays on one rig, but intermittent sleeping is definitely not what your ZFS array “wants” for ideal functionality and, of course, availability. So all you’d be gaining is the savings of spinning down the non-ZFS drives. I’m also betting that for most people, splitting this up into 2 machines will be a better match for another reason: the number of drives you want for each use case is more likely to fit the native motherboard ports, vs. buying a JBOD PCIe card, a more expensive server board, etc.

I’m actually coming to a second conclusion in my personal use case: a small-drive-count, high-capacity, low-spec’d second ZFS machine that sleeps / wakes up a couple of times a week to take advantage of Sanoid/Syncoid, and have that as the “cold storage” option (i.e. the same “just a NAS” cold-storage function, but with ZFS). UnRaid makes it very simple (with some plugins) to turn, for example, your shares into ZFS datasets (lots of options - with/without encryption, read-only, etc.) so that you can then leverage Sanoid/Syncoid snapshots in really interesting ways in combination with Share configs on the UnRaid side (rough sketch after the videos below). Here are a couple of videos with more details (his whole channel is mostly dedicated to UnRaid tutorials):

part A. https://youtu.be/RTMMPHc9OoU?si=MswKUdeoITL96Em7
part B. https://youtu.be/bXTeftSu0J0?si=sJj90xelSuDw4ss-
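
To give a flavour of the Sanoid/Syncoid side, here’s a minimal sketch assuming the shares already exist as ZFS datasets; pool names, retention numbers, and the host name are placeholders, not my exact setup:

```
# /etc/sanoid/sanoid.conf on the primary server (sketch, placeholder names)
#   [bigpool/Pictures]
#       use_template = backup
#       recursive = yes
#
#   [template_backup]
#       autosnap = yes
#       autoprune = yes
#       hourly  = 0
#       daily   = 14
#       monthly = 6
#       yearly  = 0

# When the cold-storage box wakes up, replicate the latest snapshots to it
syncoid -r bigpool/Pictures root@coldnas:coldpool/Pictures
```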

Edit: Forgot to mention - the advantage of having UnRaid Shares as ZFS datasets on the primary server also means you can toggle the backup method from ZFS snapshot mode to rsync (i.e. sending to a non-ZFS array), and it works either locally (same machine), across the network, across the web, etc. Another cool advantage of the hybridization of UnRaid’s pre-existing features plus ZFS arrays. I posted a screenshot example of my server: in this example, I’m right-clicking (middle of pic) to create a snapshot of the Pictures Share, with my shares all set up as separate ZFS datasets. Of note, “bigpool” is rust drives (a dual 4-drive RaidZ1 array) and the Cache pool is a pair of striped NVMe drives for max IO for VMs and dockers (which include my media servers); important data is backed up to the large pool, and REALLY important data will be doubly backed up to the secondary server I mention above (probably 2 mirrored pairs of high-capacity drives - still working to complete that).
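
As a rough illustration of that toggle (paths and host names are placeholders, and the exact commands depend on which plugin/scripts drive it):

```
# "rsync mode": send the same share to a non-ZFS target instead of using
# the syncoid replication sketched above; works locally or over SSH
rsync -a --delete /mnt/bigpool/Pictures/ backupnas:/mnt/user/Pictures/
```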