Should I learn ZFS as I go? Or have a Master's degree in ZFS before putting it on anything important?

Hi everyone!

Title may sound provocative, but as a noob on the outside wanting to come in, I actually kinda mean it.

Context:
I have spent the last 6 months or so learning Linux, tinkering with old hardware, creating a home server, etc - been having a blast. I am now getting to the point where I want some of the servers / applications / containers to be more permanent, while keeping a playground area to mess around with. So I guess I’m getting to a “production stage”.

I have read a lot about ZFS and it sounds downright fascinating. However, as I’ve been reading, I get the sense that you really need to “get” ZFS, or you run the risk of catastrophic consequences. For example, I just spent this afternoon wiping all of my disks and reinstalling Proxmox with ZFS as the root filesystem, and allocating drives across my nodes with an eye to running ZFS inside all my VMs - only to come here with questions and learn that I SHOULD NOT format my VMs with ZFS. All of this has given me the impression that ZFS is very powerful, but can actually be dangerous if you don’t know what you’re doing.

Main Question:
My question is: is ZFS something I should “jump in and learn as I go”? Or is it “mess around with it and make sure you REALLY understand it before using it for anything important”?

I was all ready to go today, but the Proxmox/VM realization plus my reading of horror stories has me really questioning things.

Any help or guidance is greatly appreciated - thank you!

PS: while I have you, my plan was to use Proxmox and TrueNAS, but now that I’m learning about ZFS on Proxmox, setting up Proxmox Backup Server, and wanting to look into Syncoid / Sanoid, should I leave it at that and forget TrueNAS? Any thoughts or input here would also be appreciated!

The ideal situation is to set up a spare host and start working with ZFS as you plan to deploy it in a way that won’t result in data loss if you make a mistake. As you gain confidence you can start to rely on ZFS for important stuff. I don’t recall if that was months or years for me, but I have been using ZFS for about 7 years. My first experimental host was in late 2017 and it looks like I put it into service on my (home lab) production file server around May of 2018. It’s been spreading among my PCs like a virus, but in a good way. :smiley:

I can’t speak to Proxmox or TrueNAS (Scale, I presume) as I do not have direct experience with either. I’m using Debian for everything possible and Debian derivatives where necessary (Raspberry Pi OS). One advantage of Debian is that I’m working directly with ZFS at the CLI level. With Proxmox and TrueNAS (I suspect) you’re using some kind of GUI that makes things easier by shielding you from the details, and that has both advantages and disadvantages. Another factor might be how familiar you are with Linux: straight Debian is kind of jumping into the deep end of the pool (but not as deep as Arch), whereas appliance-style packages like Proxmox and TrueNAS make it easier to get going.

The only thing I did wrong WRT sanoid/syncoid was to not start using it soon enough.

HTH,

NB: I’m a huge fan of Debian, ZFS and Sanoid, in case you didn’t notice.


Thank you for your thoughts! Much appreciated. I think you’re right on the gradual transition. Given that 6 months ago I didn’t REALLY understand IP addresses, and I am now staring at a living room full of old server equipment, I guess I’d say “take it slow and be intentional” isn’t my instinct. haha. But I think you’re right. Thanks again for taking the time to respond!

Jump in and learn as you go. The things that you might get wrong that would then need to be set back up again differently aren’t “unsafe” types of things, they’re “unperformant” types of things (using ZFS both outside and inside your VMs being one such example).

You aren’t really going to be able to become a master of ZFS before using ZFS. But, again, it’s okay. Learning ZFS isn’t any more dangerous than learning any other file system, and it’s considerably easier to set yourself up for real success, so dive on in!


“as I go” is how we learn everything on linux, isn’t it? Nonetheless, I’ve found these to be useful references for when I was wondering how to do something or why zfs did something surprising:


I would add that while you learn and figure things out, I would go “belt and suspenders”, especially when it comes to backups. Sanoid and Syncoid are great, but boy will you be glad for your extra good old-fashioned rsync-based (or other) backup if you somehow mess something up.

Arguably this is akin to the 2 different “media” in the 3-2-1 backup rule anyways but I think it’s worth spelling out.
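A sketch of such a second, non-ZFS copy, assuming a dataset `tank/data` mounted at `/tank/data` and a backup target `/mnt/usb-backup/data/` (all three are placeholders). Copying from a temporary snapshot rather than the live dataset gives a consistent point-in-time copy. `DRY_RUN` is a made-up safety switch: the script only prints its commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Second, non-ZFS backup: rsync from a read-only snapshot for consistency.
# DATASET, MNT and DEST are hypothetical placeholders.
set -euo pipefail

DATASET=${DATASET:-tank/data}
MNT=${MNT:-/tank/data}                 # mountpoint of $DATASET
DEST=${DEST:-/mnt/usb-backup/data/}    # e.g. an ext4 USB disk
SNAP="rsync-$(date +%Y%m%d-%H%M%S)"    # temporary snapshot name
DRY_RUN=${DRY_RUN:-1}                  # 1 = print only, 0 = execute

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

run zfs snapshot "$DATASET@$SNAP"
# Snapshots are readable under <mountpoint>/.zfs/snapshot/<name>.
# -a archive mode, -H hard links, -A ACLs, -X extended attributes.
# No --delete: files removed by mistake on the source survive in $DEST.
run rsync -aHAX "$MNT/.zfs/snapshot/$SNAP/" "$DEST"
run zfs destroy "$DATASET@$SNAP"
```

Leaving out `--delete` is a deliberate choice here: the backup grows over time, but a mistake on the ZFS side is not mirrored into the copy you are keeping as a safety net.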


A suggestion I found useful when getting started with ZFS: experiment with pools/vdevs made of ‘virtual disks’ (a.k.a. files)

https://docs.oracle.com/cd/E19604-01/821-0406/usingzfswithvirtualdisks/index.html

https://wiki.archlinux.org/title/ZFS/Virtual_disks

This way I experimented with (almost) everything that I now do on my server.


Thank you all for your help!

Yeah using truncate to create raw files of whatever size and making pools out of them is a great way to play around with things like replication, resilvering, drive replacement.

A quick little for loop and you can set up whatever size vdev you need.
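A minimal sketch of that for loop, assuming bash, GNU coreutils and OpenZFS. The pool name `testpool`, the directory `/tmp/zfs-lab`, four 1 GiB files and the raidz1 layout are arbitrary choices for illustration, and `DRY_RUN` is a made-up safety switch: by default the script only prints the `zpool` commands and runs them only with `DRY_RUN=0` as root.

```shell
#!/usr/bin/env bash
# Build a throwaway pool out of sparse files (for experiments only).
# Hypothetical defaults; override with e.g. POOL=lab COUNT=6 ./mkpool.sh
set -euo pipefail

POOL=${POOL:-testpool}     # hypothetical pool name
DIR=${DIR:-/tmp/zfs-lab}   # where the backing files live (absolute path)
COUNT=${COUNT:-4}          # number of file "disks"
SIZE=${SIZE:-1G}           # apparent size of each file; ZFS needs >= 64M
DRY_RUN=${DRY_RUN:-1}      # 1 = only print zpool commands, 0 = run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

mkdir -p "$DIR"
files=()
for ((i = 1; i <= COUNT; i++)); do
    f="$DIR/disk$i.img"
    truncate -s "$SIZE" "$f"   # sparse: uses almost no real space until written
    files+=("$f")              # file vdevs must be given as absolute paths
done

run zpool create "$POOL" raidz1 "${files[@]}"
run zpool status "$POOL"
# When finished: zpool destroy "$POOL" && rm -f "$DIR"/disk*.img
```

To rehearse a drive replacement on such a pool, create one more file with `truncate -s 1G /tmp/zfs-lab/disk5.img` and run `zpool replace testpool /tmp/zfs-lab/disk2.img /tmp/zfs-lab/disk5.img`, then watch the resilver in `zpool status`.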

(EDIT: rephrased and fixed grammar)
I know I’m late to the party, but what I like most is “use the things and try it”.

So I built my first ZFS pools on trash hardware with different RAIDZ and mirror configurations and then tried to break them: I pulled a SATA cable here for a second, there for a few seconds (spoiler: unfortunately I had to reboot to recover from “too many I/O errors”, but ZFS kept working, never lost anything unexpectedly and protected the files well); swapped disks; used /dev/sd names instead of stable ones; used a controller and cables I had previously retired because of errors (to actually give ZFS something to do); put known-bad disks in the pool (spoiler: they made ZFS slow until the disk was pushed out as failed, and the replacement disk resilvered very quickly); wrote random junk to the raw disks and watched how ZFS handled it (spoiler: very well); and generally challenged its features to understand how they work. I also put a “Proxmox” pool into an Ubuntu machine and force-imported it. And so on.
I used an archive file containing three files: random data, a gigabyte of fixed data, and more random data. I packed them into an ar archive (like a ZIP without compression, or really without anything), then moved, copied, unpacked and repacked it everywhere, usually with fresh random data, and from time to time checked the SHA256 of the fixed data inside the ar file (spoiler: as hard as I tried, I was never able to produce a single wrong SHA256, even after cycling through a petabyte or a few, or with several terabytes on bad disks: it was always correct).
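The “write random junk to the disks” experiment can be repeated safely on a file-backed pool. This sketch assumes a redundant (raidz or mirror) pool named `testpool`, mounted at `/testpool` and built from 1 GiB files in `/tmp/zfs-lab`; those names, the offsets and the sizes are illustrative assumptions, and `zpool scrub -w` needs OpenZFS 2.0 or later. `DRY_RUN` is a made-up safety switch: nothing is executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Corrupt one backing file of a REDUNDANT test pool and let a scrub repair it.
# Assumed (hypothetical) setup: pool "testpool" mounted at /testpool, built
# from 1 GiB files /tmp/zfs-lab/disk*.img. Never use a pool with real data.
set -euo pipefail

POOL=${POOL:-testpool}
VICTIM=${VICTIM:-/tmp/zfs-lab/disk2.img}   # the "disk" we damage
DRY_RUN=${DRY_RUN:-1}                      # 1 = print only, 0 = execute

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# 1. Write some data, force it out to the "disks", note its checksum.
run dd if=/dev/urandom of="/$POOL/junk.bin" bs=1M count=200
run zpool sync "$POOL"
run sha256sum "/$POOL/junk.bin"

# 2. Overwrite 100 MiB of one backing file, starting 16 MiB in, so the
#    vdev labels at the start and end of the file are left alone.
#    conv=notrunc keeps dd from truncating the file afterwards.
run dd if=/dev/urandom of="$VICTIM" bs=1M count=100 seek=16 conv=notrunc

# 3. Scrub and wait for it to finish (-w), then look at the CKSUM column.
run zpool scrub -w "$POOL"
run zpool status -v "$POOL"

# 4. The hash should be unchanged (it may come from cache, so the scrub
#    result is the real test); then reset the error counters.
run sha256sum "/$POOL/junk.bin"
run zpool clear "$POOL"
```

On a raidz or mirror pool the scrub should report checksum errors on the damaged file and repair them from redundancy. If the damage happens to land only in unallocated space, the scrub finds nothing; writing more data first or overwriting a larger range makes a hit more likely.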
Figuring out how to demonstrate that dedup or compression really works helped me a lot, I think. I even bought some cheap old tiny Optane drives to play with. You can also use ZFS on small partitions to get quick scrub times and to hit space limits quickly.
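One way to make compression and dedup visible, sketched under the assumption that a scratch pool `testpool` (mounted at `/testpool`) already exists; the dataset names, data sizes and `/tmp/seed.bin` are arbitrary, and `DRY_RUN` is a made-up switch so the script only prints its commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Make compression and dedup visible on a scratch pool.
# Assumes a throwaway pool "testpool" mounted at /testpool (hypothetical).
set -euo pipefail

POOL=${POOL:-testpool}
DRY_RUN=${DRY_RUN:-1}   # 1 = print only, 0 = execute

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# Compression: highly repetitive text should shrink a lot.
run zfs create -o compression=lz4 "$POOL/comp"
run sh -c "yes 'hello zfs' | head -c 100M > /$POOL/comp/text.txt"
run zpool sync "$POOL"   # flush so the space accounting is current
run zfs get compressratio,logicalused,used "$POOL/comp"

# Dedup (for experiments only: on real pools the dedup table costs RAM).
# dd does plain read()/write(), so the second copy really is written and
# deduplicated, whereas a recent cp may ask the filesystem to clone blocks.
run zfs create -o dedup=on "$POOL/dedup"
run dd if=/dev/urandom of=/tmp/seed.bin bs=1M count=64
run dd if=/tmp/seed.bin of="/$POOL/dedup/a.bin" bs=1M
run dd if=/tmp/seed.bin of="/$POOL/dedup/b.bin" bs=1M
run zpool sync "$POOL"
run zpool get dedupratio "$POOL"
```

The repetitive text should give a `compressratio` far above 1.00x, and two identical random files should push `dedupratio` toward 2x, while random data alone would barely compress at all.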

About the “danger” of ZFS, like using it on a single disk or without ECC RAM: I think that only means ZFS becomes as “dangerous” as ordinary file systems. It can’t guarantee more than the hardware allows, so it loses some of its guarantees. I think if you stack a ZFS filesystem inside a VM on top of a ZFS volume on the host, you just lose performance for a mostly useless “double safety net”, but I don’t think that is actually dangerous in any way? Please correct me if I’m wrong (I haven’t read anything saying so and would like to learn more).

After all, as long as you do things besides working with ZFS, like running VMs, you will probably never “REALLY understand” all of ZFS anyway (I certainly never will). So make sure you can handle the use cases you’ll need (replacing drives, monitoring pool state, finding and fixing the causes of exceeded disk space), and for that I think working on a trash machine is best (for me at least).


This is 100% correct. Not dangerous in any way, but may present vexing performance edge cases.


I would get a high school diploma first, and get my hands dirty
Then work on my bachelor’s degree, complete a capstone project or two
If you fancy a Master’s degree, sure you can pursue it, but it’s most likely not gonna be a requirement for most jobs

This command will be your friend:

zpool checkpoint [pool_name]
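A sketch of the usual checkpoint workflow with OpenZFS. `tank` is a placeholder pool name, `KEEP` is a made-up switch standing in for your own judgement after the risky change, and `DRY_RUN` is a made-up safety switch (nothing runs unless `DRY_RUN=0`). Note that rewinding discards everything written to the pool since the checkpoint, and a checkpoint lives on the same disks, so it is not a backup.

```shell
#!/usr/bin/env bash
# Take a pool-wide checkpoint before a risky change, then keep or rewind.
# "tank" and KEEP are hypothetical; run as root.
set -euo pipefail

POOL=${POOL:-tank}
KEEP=${KEEP:-yes}       # yes = keep the change, anything else = rewind
DRY_RUN=${DRY_RUN:-1}   # 1 = print only, 0 = execute

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

run zpool checkpoint "$POOL"   # only one checkpoint per pool at a time
run zpool status "$POOL"       # status shows the checkpoint and its size

# ... make the risky change here (e.g. zpool add, zfs destroy) ...

if [ "$KEEP" = yes ]; then
    # Happy with the result: discard the checkpoint to free its space.
    run zpool checkpoint -d "$POOL"
else
    # Not happy: roll the whole pool back to the checkpoint.
    run zpool export "$POOL"
    run zpool import --rewind-to-checkpoint "$POOL"
fi
```

Discarding the checkpoint promptly matters: while it exists, space freed after the checkpoint cannot be reused.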
