Removing redundant snapshots

Is there any tool to remove snapshots with no changes since the previous snapshot?

I haven't seen anyone mention a tool specifically for that. Even zero-delta snapshots take space, so it's not an unreasonable idea. The general silence on the topic is probably explained by two things: religious use of a snapshot management utility like Sanoid, and the panicked "delete ten thousand old snapshots" response when your pool stops accepting writes.

I manage my development VM snapshots entirely outside of Sanoid, but my home directories are definitely covered by it. I had a lot of trouble with ZAS (zfs-auto-snapshot) running roughshod over my pools, where thousands of VM snapshots were piling up.

It would be great if someone else could chime in with advice on when a zero-delta snapshot (one whose used property is 0) should not be deleted. The obvious standout condition is when destroying it would break an incremental send to your backup pool.
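
If you want to hunt for candidates by hand, here's a minimal sketch (the pool/dataset name is made up, and it only lists, never destroys):

    # List snapshots of one dataset whose `used` property is 0, i.e. ones
    # holding no unique data relative to their neighbors. Review before
    # destroying anything: a zero-used snapshot may still be the base of
    # an incremental send, or pinned by `zfs hold`.
    zfs list -Hp -d 1 -t snapshot -o name,used tank/data | awk '$2 == 0 {print $1}'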

If you don’t have more than 100 snapshots of a given dataset (a rough rule of thumb, not surgically precise), you aren’t experiencing performance issues from too many snapshots, nor is snapshot metadata eating a significant amount of extra space, so I would advise finding a more productive way to use your admin time and energy. :slight_smile:

Once you acquire more than about 100 snapshots in a single dataset, the dataset itself will still perform fine, but operations like zfs list -t snapshot will begin slowing down noticeably; eventually they become glacial. Until then, this is simply a non-issue. And since accidentally destroying the wrong snapshot can cause you grievous problems while destroying the correct snapshot gains you essentially no benefit… not something I’d recommend obsessing over.
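
If you're curious where you stand, a quick count per dataset (the pool name here is hypothetical):

    # Count snapshots per dataset across the whole pool, busiest first
    zfs list -H -t snapshot -o name -r tank | cut -d@ -f1 | sort | uniq -c | sort -rn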

The real cost, such as it is, of duplicate snapshots is not in HAVING them, it’s in taking them in the first place. You can accumulate a surprising number of metadata writes taking snapshots: enough, in some cases, to have a measurable although minor impact on SSD write endurance. Destroying the snap doesn’t give you your write endurance back!

With that said, again, even the endurance thing is extremely minor, and I don't recommend worrying about it unless you've got a VERY unusual environment (like thousands of datasets all being snapshotted individually and frequently). With, say, a hundred datasets total and a standard Sanoid hourly/daily/monthly routine, it just doesn't add up to enough to bother with.

To be clear, my interest in this is all about human factors.

  • When freeing up space,
    zfs diff foo@2025-01-01_monthly foo@2025-06-01_monthly
    zfs destroy foo@2025-01-01_monthly
    
    is easier and has less opportunity for accidents than the % range form, which destroys every snapshot from the first named through the last:
    zfs diff foo@2025-01-01_monthly foo@2025-06-01_monthly
    zfs destroy foo@2025-01-01_monthly%2025-05-01_monthly
    
  • When looking for files to restore, ls .zfs/snapshot/* is sometimes more expedient than httm.
  • It would make it easier to notice when a dataset that rarely changes has changed unexpectedly (see the sketch after this list).
  • Sanoid at least used to create extra snapshots if it couldn’t write the snapshot list to its cache file. The good news is that it checks timestamps when pruning, so older snapshots aren’t lost; the bad news is that people have ended up with hundreds or thousands of extra snapshots this way, that you have to delete them manually if you don’t want to wait for them to expire, and that there can be snapshots you want to keep interleaved among them.
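
On that "unexpected changes" point, one approach (just a sketch, not an existing tool; the dataset and snapshot names are hypothetical) is the written@snapshot property, which reports how much data has been written to the dataset since the named snapshot:

    # Nonzero output on a normally-quiet dataset is a quick red flag.
    zfs get -Hp -o value written@2025-06-01_monthly tank/quiet-dataset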

The alternative to taking a duplicate snapshot, in the case you reference, is failing to take any snapshot at all when one is due. You’re describing a feature, not a bug. :slight_smile:

It took me a second to realize that you were talking about Sanoid. I figured the behavior was intentional, but cleaning up the extra snapshots is the same chore either way. I’d be interested to learn why just working with the list in memory isn’t an alternative, though.

There is no “list in memory.” If there were, returning it wouldn’t be so expensive.

Sanoid isn’t a daemon; it’s run from cron, so its cache file IS the only cheap way for it to remember state from its last invocation.

More importantly, you can’t rely on memory for managing storage. Sanoid might know it tried to take a snapshot, but until that snapshot shows up in zfs list, it can’t verify the snapshot was actually taken. Similarly, it might know it DID successfully take a snapshot at some point… but until it checks zfs list again, it can’t verify that snapshot hasn’t since been destroyed or corrupted.
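
In shell terms, the principle looks something like this (hypothetical names, and a sketch of the idea rather than Sanoid's actual code):

    # Take a snapshot, then trust only what zfs list reports back.
    zfs snapshot tank/data@hourly-check
    if zfs list -t snapshot tank/data@hourly-check >/dev/null 2>&1; then
        echo "snapshot verified"
    else
        echo "snapshot missing; retry or alert" >&2
    fi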

Sanoid cares about safety, and it cares a lot.