Background:
I am not an I.T. professional, but I am the “tech guy” for my extended family. I am currently in charge of hosting/backing up several dozen terabytes of files - mostly photos and home movies. Having looked at various options, done what I would consider to be a reasonable amount of reading up on ZFS, and having played around with ZFS in a test environment for a couple of months, I have come to the conclusion that ZFS might be the way to go.
Problem:
Having read numerous discussions on GitHub, Reddit, StackOverflow, this website, and various other things that I found on the internet, I have come away with the following viewpoints:
ZFS is incredibly safe and robust, and it is perfectly fine to use it as your one and only filesystem.
ZFS has several long-standing bugs that cause silent data corruption that can’t be detected by scrubbing and will propagate into your backups via send/receive, silently corrupting them too; so you should have extra backups on other filesystems, such as BTRFS/XFS/EXT4 etc.
Certain versions of ZFS contain serious bugs and should be avoided.
Certain features or combinations of features are buggy and should be avoided.
All of the features of ZFS are fine and work as intended.
ZFS native encryption is known to cause data corruption when used with RAW send/recv.
ZFS native encryption is known to cause data corruption when used with NON-RAW send/recv.
ZFS native encryption is catastrophically poorly written code which isn’t even being maintained. If you want encryption, use a non-native solution like LUKS.
ZFS native encryption is fine, and is infinitely preferable to using non-native encryption solutions like LUKS.
ZFS has trouble handling sparse files, including VM images.
ZFS does NOT have trouble handling sparse files.
ZFS is more dependable on certain Linux distros (such as Proxmox) than it is on others.
ZFS is NOT more dependable on certain Linux distros (such as Proxmox) than it is on others.
ZFS is more dependable on FreeBSD than it is on Linux.
ZFS is NOT more dependable on FreeBSD than it is on Linux.
Question:
Is there any official or authoritative and up-to-date resource I can go to that lays out, in a clear and concise manner, what versions / features / combination-of-features / combination-of-features-on-specific-versions are known to have problems or are generally best to avoid?
Pre-emptive Replies:
If the answer is to follow the discussions on GitHub & Reddit, that’s how I ended up here, asking this question in the first place.
I’m guessing that some people are going to reflexively say that you should always use the latest release, but let’s be honest; most people, including myself, are going to use the version that’s packaged with the distro they’re using, unless they know that particular version has problems.
I’ve never seen any resource like what you’re talking about.
I’m also not aware of any distro that’s still pushing a bugged version of ZFS. Ubuntu, I believe, backported the fixes even into 20.04 and 22.04.
ZFS native encryption bugs are like chasing a ghost - people report them, but if they exist, they appear to be QUITE difficult to hit.
The only feature of ZFS I’d be cautious of enabling is deduplication - even with fast dedupe, it’s a narrow set of use cases where it will be a win.
Sparse files on ZFS are fine. There was a kerfuffle a few months back that was caused by a change in how the coreutils tools (cp, mv, etc.) were accounting for “holes” in sparse files, but this has been fixed.
Here’s the rub - as with all things, bugs in ZFS are going to happen, as they will with any filesystem. It’s on you to make sure you have sufficient backups so that if you DO have an issue, you can recover from it.
That said, I’ve never seen or heard of a corruption issue so severe that it started corrupting earlier snapshots. (Not in ZFS, anyway.)
Thanks for the reply - it’s very much appreciated. Hearing that there’s never been a problem that resulted in earlier snapshots being affected is reassuring. Some of the things I’d read had led me to believe that entire pools could become unreadable/unrecoverable.
Just a few follow-up questions, if you don’t mind:
Do you personally rely solely on ZFS as a backup solution, or do you have a secondary set of backups on an entirely different filesystem?
Given that native encryption bugs are QUITE difficult to hit, do you / would you personally opt for native encryption over non-native encryption, or are you in the “better safe than sorry” camp?
Is there an OS/distro that you personally lean towards when it comes to ZFS management, or conversely, are there distros you lean away from?
I personally like the way NixOS system management is done through a centralised config file, but like Arch, I sometimes get a “living on the edge” feeling with Nix, that I’m not hugely comfortable with when it comes to preserving videos of my granddaughter taking her first steps, etc, etc. Then again, that’s just a gut feeling; it’s not really backed up by anything substantial.
Understandable. In my case, I’m not about to go poking through the files of my kids, and their partners and children - I’m just going to assume that at least some of that stuff is sensitive.
Also, I don’t know how practical this is going to be long-term, but I like the idea of creating a separate dataset for each of them, encrypting each dataset with a unique password, and then giving each family member the password that unlocks their data and their data alone. I’m hoping this might make things easier and less dramatic when I eventually kick the bucket. Of course, I’m going to have to leave instructions on how to do things, regardless of what route I take.
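For what it’s worth, that layout maps naturally onto native encryption. A rough sketch of how it might look, assuming a pool named tank (all pool, dataset, and person names here are placeholders):

```shell
# Hypothetical pool/dataset names - one encrypted dataset per family
# member, each protected by its own passphrase.
zfs create -o encryption=on -o keyformat=passphrase \
    -o keylocation=prompt tank/family/alice
zfs create -o encryption=on -o keyformat=passphrase \
    -o keylocation=prompt tank/family/bob

# Each person can then unlock and mount only their own dataset:
zfs load-key tank/family/alice   # prompts for Alice's passphrase
zfs mount tank/family/alice
```

A nice side effect of dataset-per-person is that snapshots and send/recv also work per dataset, so each person’s data can be backed up and handed over independently.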
I can sympathize with your questions and concerns; you’re basically me two years ago. I have the same background as you and the same reasons for wanting to use ZFS.
I have no authoritative answer to your questions, unfortunately. What I can say is that I have been using it for two years and I love the features. Snapshots in combination with send/recv, compression, native encryption - it all works great. Syncoid/Sanoid is a big help as well. I have yet to encounter any issue that’s not my own fault for not reading the documentation.
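For readers who haven’t seen it, the snapshot-plus-send/recv workflow being described looks roughly like this (pool and snapshot names are made up; syncoid automates the incremental bookkeeping):

```shell
# Hypothetical pools: "tank" (primary) and "backup" (backup pool).
zfs snapshot -r tank/data@2024-06-01

# First, a full replication stream to the backup pool:
zfs send -R tank/data@2024-06-01 | zfs recv -u backup/data

# Afterwards, send only the changes since the last common snapshot:
zfs snapshot -r tank/data@2024-06-08
zfs send -R -I tank/data@2024-06-01 tank/data@2024-06-08 \
    | zfs recv -u backup/data

# Or let syncoid pick the snapshots and incrementals for you:
syncoid -r tank/data backup/data
```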
My opinion is that the corruption issues are blown way out of proportion, at least for a home user. I don’t worry about it at all. I do keep a copy of my data off-site and offline on exFAT-formatted drives as catastrophe mitigation anyway. The primary reason is that I want to be able to plug them into any available machine and access my data easily in case of fire/flood (or, worse yet, my relatives need to be able to access the data if I die for whatever reason).
I like Ubuntu server LTS (currently on 22.04). The reason is that it’s widely used, lots of resources out there, and in my opinion has a good balance of stability and modern features. As a plus, it comes with ZFS installed.
Thanks for the reply. Much appreciated. Just curious about the extra backups that you have stored on exFAT – are there any special tools that you’re using to manage them and keep them up to date with your main ZFS backups? Do you just rsync the contents of a ZFS snapshot to an exFAT drive or is there more to it? Are you generating checksum files and using them to periodically “scrub” your exFAT data? Thanks.
I use rsync to backup my data to the external exFAT drives.
There is a --checksum option that can be used with rsync to check that the files on the backup drives are exactly the same as on your primary storage (by default, I believe, only size and timestamp are compared). Any files actually transferred by rsync are checksummed in any case, but with --checksum you would detect and fix any errors that have occurred on the backup drives since the last backup. It’s not perfect, of course, but it does offer some kind of error detection.
Using it does slow things down a bit. I’ve used it a few times to check but it’s not part of my usual workflow.
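Since exFAT has no checksumming of its own, the periodic “scrub” asked about above can also be approximated with a checksum manifest. A minimal sketch using only coreutils (the temp directories are stand-ins for a mounted snapshot and the exFAT drive):

```shell
set -eu
# Stand-ins for a mounted ZFS snapshot (src) and an exFAT backup drive (dst).
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'family photo\n' > "$src/photo1.jpg"

# 1) Build a SHA-256 manifest over the source data.
(cd "$src" && find . -type f ! -name SHA256SUMS -print0 \
    | xargs -0 sha256sum > "$src/SHA256SUMS")

# 2) Copy data and manifest to the backup drive
#    (in real life: rsync -a "$src/" "$dst/").
cp "$src/photo1.jpg" "$src/SHA256SUMS" "$dst/"

# 3) Periodic "scrub": re-hash every backup file against the manifest.
(cd "$dst" && sha256sum -c SHA256SUMS)   # prints "./photo1.jpg: OK"
```

Note that this detects bit rot on the exFAT copies but, unlike a ZFS scrub, can’t repair anything; a bad file has to be re-copied from the primary.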
Here’s my experience with ZFS: I’ve been using it for just over 10 years. I started with FreeNAS (BSD) and, as things developed, have used it with Proxmox and pure Debian.
TL;DR - Ignore the technical aspects and think about what you’re storing. For me, ZFS allows me to sleep well knowing my data is safe and secure.
Here are some key thoughts:
I’ve been encrypting my datasets for a good number of years. Over that time I’ve moved my ZFS pools from machine to machine, as things evolved or I thought I’d come up with a better way to organize things, and I have never lost any data.
I did hit an error where I could not send & receive a specific dataset, so I guessed it was the encryption bug (I never tried to verify that). It was concerning at the time, but I rolled back to the snapshot that was causing the error and copied snapshots back from my backup pool; I didn’t lose any data.
The peace of mind of being able to do a send & receive and ‘know’ your data has safely transferred cannot be overstated; the confidence it gives is huge.
It is ridiculously easy to move to a new machine without having to spend hours verifying everything has moved successfully.
Mounting an old snapshot to find things you thought you’d never need again is very easy and, critically, very quick.
Being able to have encrypted off-site datasets for backup that you can manage without the keys loaded is great.
I have had disk failures in pools, mostly consumer-grade SSDs and NVMe drives, and have always recovered easily with a disk swap and resilvering.
I could go on but I think what I’m trying to say is ZFS means I’ve got trusted backups of my data going back years and years and I can quickly, confidently & securely get the data where I want it.
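The “manage without the keys loaded” point deserves a concrete illustration: a raw send streams the encrypted blocks as-is, so the receiving machine never needs the key. A rough sketch, where the host, pool, and dataset names are all placeholders:

```shell
# Hypothetical names throughout. --raw (-w) sends the ciphertext blocks,
# so the "offsite" host stores the data without ever holding the key.
zfs snapshot tank/secure@offsite-2024-06-01
zfs send --raw tank/secure@offsite-2024-06-01 \
    | ssh offsite zfs recv -u vault/secure

# The off-site pool can be scrubbed and pruned with the data still locked;
# the key is only needed if someone actually has to read the files:
ssh offsite zfs load-key vault/secure
```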
As an aside, my ZFS journey started with 3x 4TB new WD disks in 2014; I then added 3x 4TB used HGST disks as an additional vdev in 2016. All 6 disks are still spinning, and the used drives have 12 years on them. (Needless to say, this isn’t my primary pool anymore.)
My recommendation:
Stick with 3.5" disks without any special vdev on your critical backup pool - special vdevs add a layer of risk and complication that isn’t warranted without a specific need. Build a non-critical pool to test one if you want to scratch that itch.
Have at least a primary and backup pool - one pool is not a backup plan.
Whilst TrueNAS Scale seems attractive, TrueNAS Core is very stable and stops you from using it as more than a NAS appliance, which keeps added risk down.
Debian 12 or Proxmox are very stable for ZFS in my experience.
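For reference, the disk swaps and resilvering mentioned earlier usually come down to a couple of commands (pool and device names here are placeholders):

```shell
# Hypothetical pool/device names.
zpool status tank                        # identify the FAULTED/UNAVAIL disk
zpool replace tank /dev/sdc /dev/sdf     # swap the failed disk for the new one
zpool status tank                        # watch resilver progress
```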
I’ve been using ZFS for almost 9 years, and native encryption since it was available.
My setup is probably sketchy by some people’s standards. My home ‘server’ is an old Thinkpad running Debian’s ‘unstable’ branch. The storage was two WD USB HDDs (recently upgraded to a mirrored setup with four HDDs). My daily laptop is also Debian unstable with ZFS on an NVMe drive.
I do keep an offline backup pool I send/recv to from both the server and the laptop, and another luks/xfs backup if all else fails.
Sometime during ZFS 2.1.x, I did encounter a bug with raw send/recv. I had bought a new storage device for the laptop. I created a pool on the new storage, then send/recv to the new pool from the old pool. The received datasets were corrupted and couldn’t be mounted. After a few attempts, they received fine. This bug didn’t affect anything received to the backup pool.
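A cheap sanity check after a migration like that one is to confirm that every received dataset actually decrypts and mounts before retiring the old pool; something along these lines (the pool name is a placeholder):

```shell
# Hypothetical pool name. After a raw send/recv migration:
zfs load-key -r newpool                 # load keys for all received datasets
zfs mount -a                            # mount everything mountable
zfs list -r -o name,mounted newpool     # mountable datasets should show "yes"
```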
In the time I’ve been using ZFS, I haven’t come across any corrupted files.