Understanding Snapshot Behavior

greSTATi · July 1, 2024, 6:29pm

Apologies for the generic title. I’m trying to understand how ZFS snapshots work when moving them to different machines.

My ultimate goal is to copy snapshots from my main ZFS pool to two backup pools. I’m using Sanoid to take/manage snapshots. And I’d like to use Syncoid to move the snapshots.

Another fork in all this is that I first set this up a while back and I don’t remember exactly what I did. I know I could just blow away the backup pools and start over, but I’d like to try to keep what I have (this also tests my understanding).

So, I wanted to copy snapshots from my Main Pool to Backup Pool A. I ran a basic syncoid command and this was successful.

$ syncoid --recursive --skip-parent main_pool backup_pool_a

Now, I tried to run this same command on the other backup pool and I ran into an error.

CRITICAL ERROR: Target backup_pool_b/dir1 exists but has no snapshots matching with main_pool/dir1!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.

I looked into this and on both my Main Pool and on my Backup Pool A, I see that the datasets all have a snapshot with the suffix @syncoid_ServerName_<time_long_ago>. I’m guessing that that’s the “sync snapshot” that Syncoid takes before it copies data. When I look at Backup Pool B, none of the datasets have that snapshot. So the error makes sense, there’s no common starting point on Backup Pool B.
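
For anyone wanting to check this themselves, here is a minimal sketch of comparing snapshot names between a source and a target dataset. The helper names (snap_suffixes, common_snaps) and the /tmp file paths are made up for illustration. It matches on names only; comparing each snapshot's guid property would be the stricter check.

```shell
#!/bin/sh
# Strip the "dataset@" prefix from each line, leaving only the snapshot name.
snap_suffixes() {
    sed 's/^[^@]*@//' | sort -u
}

# Print snapshot names that appear in both lists (one list file per dataset).
# Hypothetical helper, not part of Sanoid/Syncoid.
common_snaps() {
    src_list=$(mktemp) && dst_list=$(mktemp) || return 1
    snap_suffixes < "$1" > "$src_list"
    snap_suffixes < "$2" > "$dst_list"
    comm -12 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}

# Usage (on a host that can see both pools):
#   zfs list -H -t snapshot -o name -d 1 main_pool/dir1     > /tmp/src.txt
#   zfs list -H -t snapshot -o name -d 1 backup_pool_b/dir1 > /tmp/dst.txt
#   common_snaps /tmp/src.txt /tmp/dst.txt
```

Empty output from common_snaps would be consistent with the error above: nothing for an incremental send to start from.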

Now my questions:

1. In the past, how could I have copied data to Backup Pool B without a common snapshot? (Is the likely answer that the common snapshot got deleted from Main Pool?)
2. How do I move forward while preserving data already on Backup Pool B? After running Syncoid with Backup Pool A, I ended up with Backup Pool A containing my recent snapshots as well as the snapshots that were transferred a while back. How can I move that Syncoid “sync snapshot” over to Backup Pool B? (Is this a job for zfs send | zfs receive?)
3. What are the implications of not creating the “sync snap” (meaning running Syncoid with the --no-sync-snap option)? At first, I thought I should not create it because there is no benefit, since I’m already copying over the Sanoid snapshots. Also, since I am moving snapshots to two destinations, two of those snapshots would end up getting created. But after running into the error above, I’m wondering if it’s a good idea to keep that sync snapshot so I will have multiple snapshots in common between the pools, in case something happens to that one common snapshot.

Hopefully I explained that well. Happy to provide additional info.

bladewdr · July 1, 2024, 7:09pm

In the past, how could I have copied data to Backup Pool B without a common snapshot? (Is the likely answer that the common snapshot got deleted from Main Pool?)

It likely got pruned by your retention policy (set in /etc/sanoid/sanoid.conf). Now that there’s no common snapshot, your only choice is to do another full replication.

How do I move forward while preserving data already on Backup Pool B?

You can’t; see above. If you have the space to do so, you can use zfs rename to move the existing dataset on Pool B to a different name temporarily while you wait for your full replication to finish. That way, you still have SOME sort of backup in the meantime, even if it’s woefully out of date.
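
As a rough sketch of that approach, assuming the dataset names from the error message: the holding name backup_pool_b/dir1_old and the run/park_and_resync helpers are hypothetical. The script only prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Move the stale copy aside, then let syncoid do a fresh full replication.
park_and_resync() {
    old="$1"      # existing target dataset, e.g. backup_pool_b/dir1
    parked="$2"   # hypothetical holding name, e.g. backup_pool_b/dir1_old
    run zfs rename "$old" "$parked"
    run syncoid --recursive --skip-parent main_pool backup_pool_b
}

park_and_resync backup_pool_b/dir1 backup_pool_b/dir1_old
```

Once the new replication has been checked, the parked dataset can be removed with zfs destroy -r.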

What are the implications of not creating the “sync snap”

The sync snap is there as a sort of guardrail, from what I understand, and --no-sync-snap turns it off. When Syncoid runs, it will create a snapshot with a particular name - and if there’s already a sync snap present, it will delete the old one once it’s done with the replication.

It’s basically a safety net in case your retention policies aren’t set up so that the backup server always has a common snapshot to replicate from.

mercenary_sysadmin · July 1, 2024, 9:34pm

This is correct. The major reason --no-sync-snap even exists as an option is that it’s just not that simple when dealing with multiple-target replication (e.g. A->B->C, or A->B + A->C). In those cases, it’s frequently nicer not to have to cope with foreign sync snaps (e.g. the snapshots from replication A->B winding up on machine C) at all, rather than having to go clean them up manually because there’s no mechanism in place to do it for you.
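
A minimal sketch of the A->B + A->C case using the pool names from this thread. The replicate_all helper and the DRY_RUN switch are hypothetical; by default the script only prints the syncoid commands it would run.

```shell
#!/bin/sh
# Print (default) or run one syncoid invocation per backup target.
# --no-sync-snap restricts syncoid to snapshots that already exist on the
# source (e.g. Sanoid's), so no per-target sync snapshots accumulate.
replicate_all() {
    src="$1"; shift
    for target in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "syncoid --no-sync-snap --recursive --skip-parent $src $target"
        else
            syncoid --no-sync-snap --recursive --skip-parent "$src" "$target"
        fi
    done
}

replicate_all main_pool backup_pool_a backup_pool_b
```

With this setup, the only snapshots the pools can have in common are the ones Sanoid takes, so the retention policy on the source has to outlast the gap between replication runs.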

greSTATi · July 3, 2024, 8:47pm

Thank you for the replies, very helpful.

It likely got pruned by your retention policy (set in /etc/sanoid/sanoid.conf). Now that there’s no common snapshot, your only choice is to do another full replication.

You can’t [fix it]; see above. If you have the space to do so, you can use zfs rename to move the existing dataset on Pool B to a different name temporarily while you wait for your full replication to finish. That way, you still have SOME sort of backup in the meantime, even if it’s woefully out of date.

Makes sense. I think that’s the most likely case.

Fortunately, I don’t need any of the older backups, and this is my second backup pool. I’m comfortable just wiping it. (I will probably just try --force-delete first.)

When Syncoid runs, it will create a snapshot with a particular name - and if there’s already a sync snap present, it will delete the old one once it’s done with the replication.

It’s basically a safety net in case your retention policies aren’t set up so that the backup server always has a common snapshot to replicate from.

This is correct. The major reason --no-sync-snap even exists as an option is that it’s just not that simple when dealing with multiple-target replication (e.g. A->B->C, or A->B + A->C). In those cases, it’s frequently nicer not to have to cope with foreign sync snaps (e.g. the snapshots from replication A->B winding up on machine C) at all, rather than having to go clean them up manually because there’s no mechanism in place to do it for you.

So if I understand correctly, the sync snapshot is the guardrail? And --no-sync-snap overrides this safety measure? And if I were to use --no-sync-snap, then it would be imperative that I make sure that my Sanoid retention settings maintain a snapshot in common?

And if I decide to not use --no-sync-snap (and do make the sync snapshots), I will definitely(?) have a snapshot in common with my backup pools? But then the issue would become that the sync snapshots sent to Backup Pool A would end up on Backup Pool B, and they won’t get removed?

Do I have that correct?

Either way, any advice on making sure I maintain a common snapshot between my pools? Of course I’ll plan it carefully when making the retention policies, I’m just trying to make sure I don’t end up shooting myself in the foot (and finding out down the road). Should I make manual snapshots outside of Sanoid (and just remember that I need to manually delete those)?

bladewdr · July 3, 2024, 10:05pm

Make sure that you’re doing another replication before the last snapshot that the systems have in common gets pruned.

Snapshots don’t use much in the way of resources (unless your data has a TON of things being deleted constantly) so I normally err on the side of caution.

If you’re using Syncoid to do automated replication as you should, then there’s really no reason NOT to have it replicate at LEAST once a day. That’s typically what I do for my backups at home, but you may want to have it happen more frequently, at least for the onsite.

If you have a backup that’s offsite, you may want to limit that to just replicate once a day, or once a week, whatever fits your needs and risk profile for this data.

Just make sure that the production server isn’t deleting its last common snapshot before the next replication.

For example, say that the production server only retains 3 days worth of snapshots, but you do replication once per week. What’s going to happen next time your replication tries to run? No snapshots in common.
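
The arithmetic in that example can be sketched as a quick sanity check. The function name retention_covers_interval is made up here, and the numbers are the ones from the example (3 days of retention, replication every 7 days).

```shell
#!/bin/sh
# Succeeds (exit 0) only if snapshots are kept strictly longer than the gap
# between replication runs, so a common snapshot survives until the next run.
retention_covers_interval() {
    retain_days="$1"     # how long the source keeps snapshots
    interval_days="$2"   # how often replication runs
    [ "$retain_days" -gt "$interval_days" ]
}

if retention_covers_interval 3 7; then
    echo "ok: a common snapshot should survive"
else
    echo "risk: snapshots are pruned before the next replication"
fi
```

In practice it is worth leaving a generous margin rather than just clearing the interval, since a missed or failed replication run effectively doubles the gap.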