TL;DR: a long-standing but very difficult to hit bug got a little easier to hit in OpenZFS 2.2. This one is kinda similar to the old hole_birth bug, in that it falsely detects holes in sparse files, and “preserves” those falsely detected “holes” in copies.
Now that the bug has been detected and identified, the above link is to the FreeBSD team’s patch for it. This work should carry over to Linux reasonably quickly, one way or another.
FYI: ZFS 2.2.2 and 2.1.14 reportedly fix this bug.
It is conjectured that this race condition should occur relatively rarely in normal filesystem usage prior to 2.2; it would require reading from a file essentially immediately after it is created or written to, resulting in an erroneous detection of a hole during the read.
However, in the use case of virtual disks sitting on a ZFS array, would this be a particularly important concern? Virtual disks may undergo many concurrent reads and writes, so that may be a particularly risky situation for this bug to occur. However, given the rarity of previous reports, I don’t know if this would realistically take place. In particular, it seems to occur during highly parallel computations, like compilation.
What are your thoughts?
My thoughts are that nobody should try to use 2.2.0 in production; wait for the patch.
I mean in pre-2.2.
The bug goes all the way back to early ZFS (c. 2006), but it was very unlikely then. However, it was zfs_dmu_offset_next_sync being enabled by default in 2.1.4 that particularly increased the likelihood of this bug occurring.
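If you want to see what that tunable is currently set to on your own box, here is a minimal sketch. It assumes Linux, where ZFS module parameters normally show up under /sys/module/zfs/parameters; the function name is just something I made up for illustration, and other platforms expose the setting differently.

```python
from pathlib import Path

# Assumption: Linux sysfs location of the ZFS module parameter.
PARAM = Path("/sys/module/zfs/parameters/zfs_dmu_offset_next_sync")


def read_offset_next_sync(path=PARAM):
    """Return the tunable as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Module not loaded, not Linux, or unexpected file contents.
        return None


if __name__ == "__main__":
    value = read_offset_next_sync()
    if value is None:
        print("zfs_dmu_offset_next_sync not readable on this system")
    else:
        print(f"zfs_dmu_offset_next_sync = {value}")
```

This only reports the value; it says nothing about whether a given system ever hit the race.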
“Back up your shit, monitor your backups, practice restoration.” Same advice as any other day. If you had any upgrades beyond 2.0 planned that you haven’t actually done yet… hold off a bit. Other than that, it’s just another day, and we’re just waiting for the bugs to get ironed out.
It’s almost impossible to hit in normal circumstances. Unless you have a really, really specific type of workload, it’s effectively impossible to hit, because the race is so tight.
See this email of RobN’s on the zfs on Linux mailing list.
There's a really important subtlety that a lot of people are missing in this. The bug is _not_ in reads. If you read data, it's there. The bug is that sometimes, asking the filesystem "is there data here?" it says "no" when it should say "yes". This distinction is important, because the vast majority of programs do not ask this - they just read.
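(To make the "asking" versus "reading" distinction concrete, here is a rough sketch, not taken from the email. I'm assuming the "is there data here?" question corresponds to lseek() with SEEK_DATA, which Python exposes as os.SEEK_DATA on Linux and FreeBSD. The helper names are made up for illustration.)

```python
import errno
import os
import tempfile


def has_data_at(fd, offset):
    """Ask the filesystem where the next data region at or after `offset` is.

    Returns that offset, or None if the filesystem reports nothing but a
    hole (or end of file) from `offset` onward.
    """
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return None
        raise


def probe_then_read(payload):
    """Write `payload` to a temp file, then both probe it and read it."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        fd = f.fileno()
        probed = has_data_at(fd, 0)                 # "is there data here?"
        data = os.pread(fd, len(payload), 0)        # a plain read
        past_end = has_data_at(fd, len(payload))    # nothing after the data
        return probed, data, past_end
```

A program that only ever does the os.pread() part never asks the question at all, which is why most software is not exposed to a wrong answer.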
Further, the answer only comes back "no" when it should be "yes" if there has been a write on that part of the file, where there was no data before (so overwriting data will not trip it), at the same moment from another thread, and at a time where the file is being synced out already, which means it had a change in the previous transaction and in this one.
And then, the gap you have to hit is in the tens of machine instructions.
This makes it very hard to suggest an actual probability, because this is a sequence and timing of events that basically doesn't happen in real workloads, save for certain kinds of parallel build systems, which combine generated object files into a larger compiled program in very short amounts of time.
And even _then_, all this supposes that you do all this stuff, and don't then use the destination file, because if you did, you would have noticed that it's incomplete.
So while I would never say that no one has ever hit the problem unknowingly, I feel pretty confident that they haven't. And if you're not sure, ask yourself if you've ever had highly parallel workloads that involve writing and seeking the same files at the same moment.
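For anyone wondering how a wrong "no data here" answer turns into a bad copy, here is a hedged sketch of a hole-aware copy plus a byte-for-byte check. It is not how cp or any particular tool is implemented; it's a simplified illustration using SEEK_DATA/SEEK_HOLE, with function names I made up. Any region the filesystem reports as a hole is simply never read, so it ends up as zeros in the destination.

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the regions the filesystem reports as data."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(src, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains
                    break
                raise
            end = os.lseek(src, start, os.SEEK_HOLE)
            offset = start
            while offset < end:
                buf = os.pread(src, min(chunk, end - offset), offset)
                if not buf:
                    break
                os.pwrite(dst, buf, offset)
                offset += len(buf)
            pos = end
        # Skipped regions (holes) read back as zeros in the destination.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)


def files_identical(a, b, chunk=1 << 20):
    """Compare two files byte for byte using plain reads."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(chunk), fb.read(chunk)
            if ba != bb:
                return False
            if not ba:
                return True
```

Running files_identical() on the source and the copy is the "use the destination file" step in miniature: since plain reads of the source are not affected, a copy with a falsely skipped region would fail the comparison.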
For those on Ubuntu jammy, a patch has been posted, if anyone wishes to test:
Ubuntu Launchpad: Multiple data corruption issues in zfs