Is native encryption ready for production use?

Hi, all!

I was poking around when I saw this comment by muay_throwaway that said, in relevant part, “There are multiple ongoing bugs [with native encryption] (data will sometimes be written unencrypted, snapshots can become corrupted, etc.) that put your data at risk”.

I used a ZFS-on-LUKS setup for several years, which worked but was slightly less elegant than native encryption is. So when I upgraded my setup (and made a new pool), I went with native because my understanding was that it was ready to go.

On one hand, I’m immediately terrified by that comment, but on the other, all software – even filesystems – has bugs. So, I wonder what the community consensus is about the safety of using native encryption on ZFS.

My setup, if it’s relevant: 8x10 TiB WD Red Pros (CMR) in a RAIDz2 setup.

Don’t use native encryption if you’re not willing to lose the entire pool - and possibly any pool you zfs send an encrypted snapshot to. There are quite a few bugs/issues with native encryption, some of which generally result in the complete destruction of the pool. I use it on my laptop because the benefits of encryption on a mobile device outweigh the risks of the OpenZFS implementation.

Edit: A list of bugs related to native encryption, and their impact: OpenZFS open encryption bugs (public RO) - Google Drive

(“Add a comment” is not working so leaving this as a “reply”…)

I’m surprised to discover this topic. The linked discussion and the links it contains are very concerning. I’m running TrueNAS, and they seem to support commercial/enterprise deployments using encryption. See Storage Encryption. I couldn’t find any callout that encryption is experimental/buggy, other than the warnings that lost keys == lost data.

I don’t understand the negativity either. I’m sure people have their reasons for speaking out against native encryption in ZFS. I think it would be really helpful to see some references to recent / unsolved bugs regarding native encryption that aren’t really obscure edge cases (I’m not aware of any).

I’ve used it for a year or longer on my home server since it was introduced. I remember there was that kernel thing that killed encryption performance (sorry I can’t be bothered to look that up right now). Apart from that never had an issue. After a migration of the pool I didn’t see the point of using encryption on a home server, so I removed it. But then again, I’m just one example and I don’t really do weird things - at least not on my home server! :upside_down_face:

As a matter of fact I am in the process of setting up another server which will run as offsite storage via a wireguard vpn. I will definitely use encryption at rest on it as I will be backing up to it and possibly sending over snapshots etc.

One data point: I’ve used native encryption for about 4 years on a single-drive pool (rpool) on my laptop. I have not yet lost the pool, and I do have backups of anything I wouldn’t want to lose. This is Debian with Root on ZFS, first installed when Buster was Testing. I have reinstalled a couple of times along the way, once to get to the two-pool setup and another time for reasons I don’t recall.

The only issue I’ve had is when I configured syncoid to back up the entire rpool on an hourly basis. Eventually I saw a handful of “permanent errors”, which were a little concerning. They eventually went away, so I concluded they were in something snapshot-related. Nevertheless, I stopped that practice and now only send/receive the datasets that I really want to back up.

I did the same thing on a desktop (hourly snapshots/backups of rpool) and never saw this problem, so I suspect it might have been related to native encryption.

I have used it at work for several years. However, there are only about 5 users who have access to that share.

With software, the problem is usually not the known bugs; it’s the unknown ones that are dangerous…

Myself, I’ve taken the safe route: I don’t use native encryption. It’s a big code base to review that would take a lot of time, and the feature is pretty young compared to the rest of the ZFS codebase.

So in this case I let others find the bugs. My biggest problem with the feature, from what I have understood of it, is that while we still do not have block pointer rewrite, you cannot re-encrypt existing data under a new key if you ever need to revoke one. You would need a full copy of the encrypted data to a temporary pool or other storage, then destroy all encrypted data on the pool (snapshots and all), then write the data back with a new encryption key. As I understand it, the key that actually encrypted the data can’t change once the data exists, so any data encrypted with the old key will still require the old key to be decrypted…

I might be wrong here, but this is the info I gathered when I looked into enabling the feature, and I passed on it.
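To illustrate the distinction the poster is describing, here is a sketch of what key rotation can and cannot do with native encryption (pool and dataset names are hypothetical; `zfs change-key` is real and rotates only the user-supplied wrapping key, not the master key that encrypted the data):

```shell
# Rotate the *wrapping* key (the passphrase/keyfile the user supplies).
# This does NOT re-encrypt any data: the master key stays the same.
zfs change-key -o keyformat=passphrase tank/secure

# True re-encryption requires rewriting the data, e.g. by sending it
# into a freshly created encrypted dataset with its own new keys:
zfs create -o encryption=on -o keyformat=passphrase tank/reenc
zfs snapshot tank/secure@migrate
# A non-raw send decrypts on read; -x encryption makes the received
# dataset inherit encryption (and keys) from the new parent.
zfs send tank/secure@migrate | zfs receive -x encryption tank/reenc/secure

# ...then, once verified, destroy the old dataset and all its snapshots.
```

So revoking a compromised wrapping key is cheap, but revoking the underlying master key really does mean rewriting everything, as described above.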

I cannot answer the production use question, but perhaps I can add some related info.

At home for the past six months, I have had three QEMU/KVM hosts on Linux Mint 21.1 (Ubuntu kernel), each with a zpool of RAID 1+0 using ZFS native encryption. Each zpool has a filesystem-type dataset assigned to each guest, and within each dataset are one or more QCOW2 virtual disks. ZFS with native encryption has worked as well as, if not better than, my previous storage stack of the following:

  • aligned partitions
  • mdraid RAID 6
  • dmcrypt/LUKS crypto block device
  • LVM
  • EXT4

Each of my zpools has only about twenty or thirty files (QCOW2 and ISO). My use case may be different from others.

Before building these, I scoured the ZFS bug reports as best I could…not being a developer. My novice understanding was that ZFS native encryption would be OK for me as long as I steered clear of ZFS snapshots and ZFS send. Understanding that I would need to continue using rsync across SSH until the ZFS bugs were squashed, I have had no issues.
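The rsync-across-SSH fallback mentioned above can be as simple as the following sketch (the paths and the `backuphost` name are hypothetical):

```shell
# Push guest disk images to a backup host over SSH instead of zfs send.
# -a archive mode, -H hard links, -A ACLs, -X xattrs;
# --sparse avoids inflating sparse QCOW2 files on the destination.
rsync -aHAX --sparse --delete \
    /tank/vm-guests/ \
    backup@backuphost:/backup/vm-guests/
```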

Even if I can’t yet use ZFS snapshot and ZFS send, I think I have gained on the aspects of data integrity and transparent compression.

My duplicity backups are battle tested. Worst case scenario, if the zpools died tomorrow, my guests could be restored.

Without browsing any of the issues, it’s fair to say that for me: yes, it’s production-ready. A limited use case:

% zfs get encryption Transcend/VirtualBox
NAME                  PROPERTY    VALUE        SOURCE
Transcend/VirtualBox  encryption  aes-256-gcm  -
  • mostly virtual disk files for VirtualBox guests.

Wreckage (carelessness) and survival

The pool began life years ago, on a ~500 G mobile hard disk drive. One day I accidentally targeted the device as the `of` (output file) for dd. I soon realised my mistake and stopped. No surprise: the overwrite took out two of the four ZFS labels, and part of the encrypted dataset. Briefly:

  • I removed the very few files that were affected
  • in February, I created a new pool on a mobile HDD that’s twice the size
  • sent from the old pool, received at the new.

Some of the larger files that are now in the encrypted dataset (the vast majority existed before the send):

% bfs /media/t1000/VirtualBox -name "*.v*" -exec du -h '{}' + | grep "G"
 56G    /media/t1000/VirtualBox/Windows/Windows.vdi
5.4G    /media/t1000/VirtualBox/misc/LIVEstep/LIVEstep.vdi
3.2G    /media/t1000/VirtualBox/Linux/Kubuntu/Kubuntu.vdi
2.9G    /media/t1000/VirtualBox/Linux/Raspberry Pi/Rasbperry Pi.vdi
 33G    /media/t1000/VirtualBox/Windows/Snapshots/{492fdaf5-8bbe-483e-9b29-cdd191c19852}.vdi
1.3G    /media/t1000/VirtualBox/BSD/others/hello-0.8.0_0H240-FreeBSD-13.1-amd64 a074801/hello-0.8.0_0H240-FreeBSD-13.1-amd64 a074801.vdi
3.9G    /media/t1000/VirtualBox/BSD/others/SoloBSD/SoloBSD.vdi
6.5G    /media/t1000/VirtualBox/BSD/others/CultBSD/cultbsd-alpha2-ufocult.vdi
3.4G    /media/t1000/VirtualBox/BSD/others/NomadBSD/nomadbsd-130R-20210508.amd64.vdi
4.5K    /media/t1000/VirtualBox/BSD/others/STIGMA/STIGMA.vbox
4.5K    /media/t1000/VirtualBox/BSD/others/STIGMA/STIGMA.vbox-prev
 13G    /media/t1000/VirtualBox/BSD/others/MidnightBSD/MidnightBSD.vdi
1.6G    /media/t1000/VirtualBox/BSD/others/lua-httpd/lua-httpd.vdi
5.0G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/FreeBSD-12.2-RELEASE-amd64.vhd
1.0G    /media/t1000/VirtualBox/BSD/FreeBSD/12.3-dvd, 12.4-RELEASE-p2/12.3-dvd.vdi
 54G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 14.0-CURRENT 1400088/FreeBSD 14.0-CURRENT.vdi
2.1G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD-13.2-STABLE/FreeBSD-13.2-STABLE-amd64.vhd
 20G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 13.2-RELEASE, EFI, latest, UFS, KDE Plasma/FreeBSD-13.0-RELEASE-amd64-KDE-Plasma.vdi
 24G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD-13.1-RELEASE-amd64-dvd1/FreeBSD-13.1-RELEASE-amd64-dvd1.vdi
1.6G    /media/t1000/VirtualBox/BSD/others/hello-0.8.0_0H240-FreeBSD-13.1-amd64 a074801/Snapshots/{13df7bba-c589-4b20-a5a4-d2159a64708a}.vdi
8.2G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{c505f334-00ee-4669-947a-31a74b10505e}.vhd
 13G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{47244ba4-7a18-4f46-8327-66f7a00cdb84}.vhd
2.4G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{b31d111b-ef87-40aa-8b29-c641957894c0}.vhd
8.7G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{75d5ec1e-ce05-4eb3-90ec-2c5f2d75fcc1}.vhd
3.6G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{4bc29f0f-f9ec-4e16-88ad-17431314fece}.vhd
5.2G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{af089c11-8df4-474e-bd57-817a2f20b697}.vhd
3.3G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE, latest, UFS, aborted/Snapshots/{d3553ff2-7c05-4126-b583-268256f90a81}.vhd
3.8G    /media/t1000/VirtualBox/BSD/FreeBSD/12.3-dvd, 12.4-RELEASE-p2/Snapshots/{a8f8a5be-07ba-4a3e-ab86-52e6e333a0b9}.vdi
1.0G    /media/t1000/VirtualBox/BSD/FreeBSD/12.3-dvd, 12.4-RELEASE-p2/Snapshots/{ac315b21-30e7-446f-921d-328a2224f998}.vdi
1.7G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE-p2, UFS/Snapshots/{9d567f00-c024-4d8c-bf4b-c6d4b70cd1bf}.vhd
5.5G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 12.4-RELEASE-p2, UFS/Snapshots/{75d559bb-4a0e-454e-82c1-b6bcfd876fde}.vhd
2.2G    /media/t1000/VirtualBox/BSD/FreeBSD/263501-zdb/Snapshots/{d21794d4-7ed8-439e-863b-ac5024bc6847}.vdi
1.5G    /media/t1000/VirtualBox/BSD/FreeBSD/263501-zdb/Snapshots/{847a4dbb-897b-42d1-b750-c816136f2c98}.vdi
7.1G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 13.1-RELEASE-p5, quarterly, UFS/Snapshots/{eac75e3f-b653-46d8-adad-e6b50918279d}.vhd
6.1G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 13.1-RELEASE-p5, quarterly, UFS/Snapshots/{8ebe983b-7c3a-42bb-bee8-c8e48469fc6c}.vhd
 43G    /media/t1000/VirtualBox/BSD/FreeBSD/FreeBSD 13.2-RELEASE, latest, ZFS, crashed during or after the second freebsd-update for 13.2/Snapshots/{2bb9da39-ca9d-4aa5-be23-af53c87c2249}.vdi

I regularly `zfs send` raw streams to a backend. The connection may be dropped at any time.
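For what it’s worth, interrupted streams don’t have to restart from scratch: resumable send/receive picks up from a token saved on the receiving side. A sketch, with hypothetical pool/dataset/host names:

```shell
# Receive with -s so an interrupted stream leaves a resume token behind.
zfs send -w tank/data@snap | ssh backend zfs receive -s backuppool/data

# After a dropped connection, fetch the token from the receiving side...
token=$(ssh backend zfs get -H -o value receive_resume_token backuppool/data)

# ...and resume the raw send from exactly where it stopped.
zfs send -t "$token" | ssh backend zfs receive -s backuppool/data
```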

Shall I be worried?

If encryption is required, is ZFS on LUKS preferred to ZFS native encryption on Linux servers (neglecting the fact that some metadata is not encrypted with ZFS native encryption)?

The clients may or may not use native encryption, and send unencrypted snapshots to the server.

Been using it for the last 2-3 years at work now. Multiple petabytes.

All partitions are encrypted using ZFS native encryption.


Have you used frequent snapshots and the send/recv features on natively encrypted datasets and zvols?

We aren’t using snapshots or zvols at this stage. We are considering using snapshots for send/receive in the future, but it’ll need application support to ensure the data is all clean beforehand (the workload is Cyrus IMAP).

I have concerns about whether ZFS native encryption with unencrypted zfs send of snapshots works properly. I have been suffering from constant permanent errors in zpools after trying to implement ZFS native encryption plus automatic snapshot synchronization using sanoid/syncoid.

This has worked fine for me for years. I use syncoid, and I am not using the “-w” (raw) send option. I send snapshots from one encrypted dataset to another encrypted dataset, and the only send option I use is “-L”.

The pools are internal on SATA drives (4 drives in RAID10) and external on USB (2 drives in RAID1 and 8 drives in RAIDZ2).

I am doing frequent snapshots and backups and never had an issue.
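A setup like the one described above can be sketched roughly as follows (dataset names are hypothetical; syncoid’s `--sendoptions` flag passes raw option letters through to `zfs send`):

```shell
# Non-raw (decrypted-stream) replication between two encrypted datasets,
# passing only -L (large blocks) to zfs send, as described above.
# Omitting "w" here means the stream is decrypted on send and
# re-encrypted under the destination dataset's keys on receive.
syncoid --sendoptions=L tank/encrypted/data backuppool/encrypted/data
```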

Sorry for the late reply, but I just joined this forum.
