At home, I’ve got a Linux server running Samba on ZFS, and several Macs using it as a network backup target via Time Machine. The server is snapshotted regularly and synced every night to an off-site server. This seems to be working well. There are a couple of hat-on-a-hat aspects to this, though.
This setup is taking snapshots of snapshots. When Time Machine uses a network target, it creates a sparse disk image. It mounts that image, and every hour creates a new folder for that backup. In it, it hard links unchanged files to the correct spot and copies changed files. It keeps weekly snapshots until the target volume runs out of space.
ZFS just sees the sparse disk image. Since its snapshots and incrementals work at the block level, this doesn’t end up being horribly inefficient. Since the Time Machine backups include a long history, I only keep about a month of ZFS snapshots, reasoning that this gives me enough time to notice and fix issues with the backups, while the Time Machine snapshots reach back months or years, if that ever becomes relevant.
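For anyone wanting to script the "about a month" retention, here is a minimal sketch of the selection step only. It assumes snapshots are named `dataset@auto-YYYY-MM-DD`, which is a made-up convention for illustration (the dataset name `tank/tm` is hypothetical too). It only picks the names to prune and does not call `zfs` itself.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(names, now, keep_days=30):
    """Return snapshot names older than keep_days, oldest first.

    Assumes the hypothetical naming scheme dataset@auto-YYYY-MM-DD.
    Names that do not match are skipped, so manual snapshots survive.
    """
    cutoff = now - timedelta(days=keep_days)
    old = []
    for name in names:
        _, _, label = name.partition("@")
        try:
            taken = datetime.strptime(label, "auto-%Y-%m-%d")
        except ValueError:
            continue  # not an automatic snapshot; leave it alone
        if taken < cutoff:
            old.append((taken, name))
    return [name for _, name in sorted(old)]


if __name__ == "__main__":
    example = [
        "tank/tm@auto-2024-02-20",
        "tank/tm@auto-2024-01-01",
        "tank/tm@manual",
    ]
    for snap in snapshots_to_prune(example, now=datetime(2024, 3, 1)):
        print("would destroy:", snap)
```

The returned names could then be handed to whatever destroys snapshots on your system; existing snapshot tools already handle retention, so this is only meant to show the logic.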
The other duplicated functionality is encryption. Time Machine can use an encrypted disk image, and ZFS can encrypt datasets. I think it’s pointless to do both, so I use Time Machine’s encryption, vaguely reasoning that the earlier, the better.
Does this seem reasonable? Is there a way to reduce the snapshots-of-snapshots effect?
IME network Time Machine backups are very flaky, especially over Wi-Fi. Having ZFS snapshots is useful as a fallback if the sparsebundle becomes corrupted. But if it does become corrupted, in practice you will need to start a fresh Time Machine backup.
Yes, I have encountered that flakiness, and had to start new backups. That leads to a really big incremental sync to offsite :). In the unlikely event that a laptop dies right when the TM backup flakes, you can restore the TM image from an older ZFS snapshot and be good to go.
I think one of the reasons for that flakiness is that there is a lot of folklore being passed around the internet on configuring Samba as a Time Machine target. I’ve fiddled with those settings and it’s gotten more reliable.
I’m about to add to the folklore: I think that setting a ZFS quota is bad here; instead you want to set the Samba share quota. I used to set a ZFS quota on the Time Machine datasets, and instead of pruning old snapshots, TM would quit and die when it got full. Maybe that’s because the ZFS quota includes ZFS snapshot overhead, and TM doesn’t understand that, but that’s just a guess.
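To make that guess concrete, here is a toy model, not a measurement or a verified diagnosis. The ZFS `quota` property counts space held by snapshots, while `refquota` counts only the data the dataset currently references. The function name and all numbers below are invented for illustration. The idea: when TM prunes old backups inside the image, the blocks it frees are still held by ZFS snapshots, so under `quota` the free space does not grow.

```python
def free_space_seen(limit_gb, live_gb, snapshot_only_gb, counts_snapshots):
    """Free space left under a dataset limit, in GB (toy model).

    counts_snapshots=True models a limit that charges snapshot-held
    blocks (like the ZFS `quota` property); False models one that
    charges only live data (like `refquota`).
    """
    used = live_gb + (snapshot_only_gb if counts_snapshots else 0)
    return max(limit_gb - used, 0)


LIMIT = 1000  # hypothetical limit in GB

# Before pruning: 900 GB live, nothing held only by snapshots.
before = free_space_seen(LIMIT, 900, 0, counts_snapshots=True)

# TM prunes 200 GB, but ZFS snapshots still reference those blocks.
after_quota = free_space_seen(LIMIT, 700, 200, counts_snapshots=True)
after_refquota = free_space_seen(LIMIT, 700, 200, counts_snapshots=False)

print(before, after_quota, after_refquota)  # 100 100 300
```

In this model, pruning frees nothing under a snapshot-counting limit until the ZFS snapshots themselves expire, which would be consistent with TM giving up when the dataset fills. Whether that is what actually happened here is still a guess.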