Possible alternative snapshot scheme

malgam · April 20, 2025, 6:11am

I’m thinking about implementing a zfs snapshot scheme that takes a snapshot at regular intervals and then destroys it if zfs diff shows no difference from the previous snapshot. Some of my datasets only have infrequent changes, so the result would be a relatively sparse set of snapshots. However, any file system changes that span the snapshot interval would be retained indefinitely.

With the typical frequent/daily/monthly/yearly scheme, changes that are made and reverted within, say a month, could easily be lost when the more frequent snapshots are automatically destroyed.

I actually had a problem where a user made a large number of file changes over a weekend and then a poorly considered zfs rollback lost all of this work. This caused me to rethink my snapshot scheme.

I feel like I’m at odds with accepted thinking on this, so I would like to hear from more experienced people about what problems I might have with my proposed scheme.

charles · April 21, 2025, 10:11pm

What are you trying to accomplish? Why do you want to set this up?

In case you are unaware, ZFS snapshots do not consume any space if there has been no changes to the data.

malgam · April 22, 2025, 1:43am

Thanks for replying charles.

I understand that snapshots are very space efficient, but even the accepted snapshots schemes (like Sanoid) destroy snapshots over time, presumable to reduce the long list of snapshots that would otherwise be produced. As I see it, the problem is that only files that exist at the time of each snapshot end up being retained. By the time the snapshots are reduced to monthly or yearly, there is the possibility that a file will be lost if it is accidentally deleted or modified shortly after its creation. With my proposed scheme, any file that exists for longer than the snapshot time interval (say, 15 minutes) will be kept indefinitely and all “empty” snapshots will be removed.

I hope this makes sense.

Regards, Mal.

charles · April 22, 2025, 2:02am

Interesting. I understand the proposed idea but I still do not understand your goal. What are you trying to accomplish? I ask it very broadly because sometimes someone is too focused on “how do I setup XYZ” instead of “how to I increase my systems IOPS performance?” The point of my question is to step back to understand your why before focusing on the how.

That said, I do agree that although snapshots are highly space efficient, it’d generally recommended to not have an extreme quantity of snapshots. For example TrueNAS will throw an informational warning if your system has over 20k (IIRC) snapshots.

Every 15 minutes seems like a VERY frequent snapshot interval. If your goal is to keep snapshots forever and lose as little live data as possible, perhaps consider a two tiered approach:

Monthly or weekly snapshot, retained indefinitely
Snapshot every 15 minutes, retained for 24 hrs

Regardless, I’m happy to brainstorm and will reiterate my prior point: Stepping back to understand what problem you are trying to solve will help myself and other community members to share helpful advice.

If your goal is to never delete files ever- I do not know if possible but perhaps there is a way to set the ACL to prevent deleting files? I do not know why you would want this but I’m sure some applications exist.

malgam · April 22, 2025, 3:36am

I have some datasets which I consider important, including family photos, legal documents, family history, etc. For these, I want to be sure to preserve all files, even when they might be accidentally deleted or modified.

The important datasets have fairly infrequent changes, so I expect to have a relatively low number of retained snapshots.

I think files that are locked up in zfs snapshots are pretty immune to accidental deletion, particularly considering that I regularly replicate my datasets to another machine.

furicle · April 22, 2025, 2:12pm

I’m my opinion, the extra snapshots are deleted as a space saving. If the same file changed daily, then keeping only monthly snapshots saves space (and loses data)

Your idea sounds interesting - I wonder if I’m missing something though

mercenary_sysadmin · April 22, 2025, 5:31pm

You might be better off rigging up an incron-based system to automatically take a snapshot whenever something is changed or modified in one of your Super Most Important But Rarely Changed datasets. But you’d also need to be very careful about limits, and how it’s triggered.

I get what you’re going for, here, but it’s pretty obvious not everybody does. So let’s go over some basics:

Empty Snapshots are Cheap–Not Free

Although empty snapshots take essentially no real disk space, they aren’t quite “free.” The details vary based on hardware and workload, but essentially, you’ll begin to see some performance implications at around the 100-snapshots-per-dataset level, regardless of whether those snapshots are large, small, or even “entirely empty.”

This doesn’t directly manifest in poor performance of normal activity, but enumerating the list of snapshots–as Sanoid must do, to figure out whether or not it should be taking new snapshots, for example–starts getting painful. I’ve seen systems with a few thousand snapshots in one dataset take minutes to enumerate the list of snapshots.

This matters, because it explains why Sanoid doesn’t default to taking “frequentlies” every five minutes–and why Sanoid has relatively small defaults for total numbers kept of hourly, daily, and monthly in the default templates provided.

It also explains why somebody who does want to take a ton of “frequentlies” would want a way to thin the most useless ones.

How such a scheme might work

Basically, if you wanted to work this in conjunction with Sanoid, you’d run a helper script that goes behind Sanoid, does a zfs diff of every “frequently” periodicity snapshot, and–should said snapshot diverge from either the snapshot most recently taken before it, or the snapshot taken immediately after it–rename it to something outside Sanoid’s naming scheme.

This would prevent Sanoid from trying to destroy the frequently-periodicity snapshot that contains unique data. It would then be on you to eventually destroy said snapshots.

Why this approach isn’t widely useful

Any dataset which ends up being the target of any ephemeral data–for example, cache files, temp directories, database back-ends (full fat or SQLite file-based type, permanent or short-lived) will cause every single frequently to be kept, because there will always be new data.

Any mounted dataset will frequently also be a problem, since any number of operations you don’t really think about much on a day to day basis will also cause data or metadata updates. For example, simply lsing a directory in a dataset will be sufficient to cause metadata updates, forcing the frequently taken in that time period to be retained!

If this is really important, consider incron instead

Obviously, the datasets where this sort of approach even can be useful are pretty limited. In order for this idea to be helpful, we need to be looking at datasets which are very important, but also very rarely modified. And we have to be careful about simple metadata updates and changes forcing snapshot retention that we’d prefer not to happen!

So instead of taking frequent snapshots with Sanoid, you might instead want to consider using incron to force a snapshot to be taken whenever a file is added, changed, or deleted. As a bonus, if you make your helper script name those snapshots in Sanoid’s name format, Sanoid can manage pruning them for you–for example, you might have an incrontab like this:

/path/to/folder IN_MODIFY /path/to/snapshotscript.sh

Assuming you baked in enough logic to your “snapshotscript.sh” to figure out what to name a snapshot it takes according to your preferences and needs, that would cause a new snapshot to be taken each time a file in /path/to/folder/ was modified.

This approach has some downsides of its own, of course–most notably, last time I checked, incron does not support recursion, meaning that the incrontab above would fire for creating the directory /path/to/folder/childfolder, but would not fire for then creating the file /path/to/folder/childfolder/testfile.

Probably not an approach that works for most

Ultimately, I don’t think you’re going to have much real-world luck with your idea, even if taking a snapshot separately rather than monkeying around with Sanoid’s snapshots.

I absolutely do not want to discourage you from creating such a thing, if you still think it might be useful for you. I’m just advising some caution; there are some very real pitfalls to automating snapshot orchestration in terms of system operation, resource exhaustion, and potential privacy/infosec concerns (think Microsoft Recall, for an example of how to do it horribly, HORRIBLY wrong).

Good luck! If you get your idea off the ground and find it helpful, I’d love to hear more about it here.

malgam · April 23, 2025, 11:01am

Thanks Jim for taking the time for your detailed response. You’ve given me plenty of food for thought. I’ll play around with some of these ideas and post again when I have something to report.

Regards, Mal

charles · April 26, 2025, 6:53pm

I never knew this. thank you My system definitely has too many snapshots- when I try to browse via the WebUI, it hangs forever. I usually just drop to the CLI but I should probably fix the root cause

malgam · April 29, 2025, 5:08am

I actually started to implement this approach and quickly came to the conclusion that it’s not as simple as it sounds. Sanoid can produce a number of snapshots almost simultaneously (frequently, hourly, daily, etc) and renaming a frequent snapshot will take it out of the Sanoid namespace and possibly cause confusion when the script is next executed. For example, you don’t want to end up comparing a frequent snapshot with an hourly one that was created at about the same time. Rather than trying to figure this out, I am just taking a new snapshot with my own naming scheme when a frequent snapshot differs from the previous frequent snapshot.

As you suggested, Jim, this may not turn out to be practical.

mercenary_sysadmin · April 29, 2025, 5:12am

This one’s pretty simple to figure out, actually: there’s not much point in trying to eliminate anything but your most frequent periodicity, if you’re trying to implement this.

So, if your system takes frequentlies, only look at frequentlies–ignore the hourlies, dailies, and monthlies entirely.

If your system doesn’t take frequentlies, but does take hourlies–ignore the dailies and the monthlies, only look at the dailies when you’re looking for snaps to eliminate. (Hint: assuming you take a daily and a monthly that don’t diverge, yes, this means… the daily will be destroyed, but the monthly will remain.)

You get the idea. It’s still almost certainly not worthwhile, but this is how you’d do it.

mecifa · August 3, 2025, 5:27pm

Out of curiosity, do you know why it’s so expensive?

In my experience, with atime=off, reading a directory results in 0 bytes written. Also, zfs diff ignores atime updates.

mercenary_sysadmin · August 3, 2025, 5:48pm

No, I don’t. I’ve asked, but I’ve never gotten an answer that felt like an answer, if you know what I mean.

Yes, turning off atimes from being updated. It doesn’t do much for other issues that modify data in tiny ways that most people don’t spend much time thinking about. For example, every time you ls a directory with images from a Windows box, that Windows client updates a thumbnail index…

mecifa · August 3, 2025, 11:51pm

I think we’ve all been there.

I didn’t dispute the general point because there’s some truth in it, but I think you’re making this part of the problem out to be bigger than it often is.

Like atime updates, Windows thumbnail caches are easily turned off and fairly commonly turned off for unrelated reasons. We can add .DS_Store files to that list. Temporary files usually follow predictable patterns, so they can be filtered out, and incron would have the same problem. Creating a temporary file counts as modifying its directory, but zfs diff -F makes it easy to filter out directory modifications.

By all means, ask yourself if you really need this, audit your write patterns first, be careful, and maybe don’t mess with sanoid’s snapshots. But I think the time-based approach is worth exploring before involving inotify.

mercenary_sysadmin · August 4, 2025, 1:18pm

I don’t personally think it’s a good approach–too complex, too many ways to screw it up–but it sounds like you’re pretty determined and have thought about this a lot. And the stakes for getting it wrong are pretty low from what I can tell, so, hey, I’m going to stop being a negative Nate.

I would love to see some kind of thorough recipe that covers common cases once you’ve gotten it all up and working.