I’m new to Sanoid/Syncoid, so please bear with me…
Firstly, thanks and congratulations for such a nice tool!
Secondly, I’m in the process of setting up a PoC with the following:
server-A (where the main application runs)
server-B (backup server)
server-C (where the main application will run if server-A dies)
The main application generates data and logs continuously, so in the case of a DR, server-C must have access to the latest information and logs as quickly as possible.
To achieve this, I’ve written a Bash script which runs syncoid to push snapshots to both servers B and C:
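(The script itself is not reproduced here; a minimal sketch of that approach, assuming passwordless root SSH to both targets and the dataset/target names that appear in the output below, might look like this:)

#!/usr/bin/env bash
# Minimal sketch only -- not the actual script from this post.
# Assumes the source pool is PRODPOOL and the target dataset names
# shown in the error output further down.

echo "===== SEND_TO: server-B"
syncoid --recursive --no-sync-snap PRODPOOL root@server-B:BACKUPS

echo "===== SEND_TO: server-C"
syncoid --recursive --no-sync-snap PRODPOOL root@server-C:PRODPOOL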
Even though the application on server-A steadily produces data, syncoid shows this message:
===== SEND_TO: server-B
WARN: --no-sync-snap is set, and getnewestsnapshot() could not find any snapshots on source for current dataset. Continuing.
CRITICAL: no snapshots exist on source PRODPOOL, and you asked for --no-sync-snap.
NEWEST SNAPSHOT: autosnap_2023-08-08_14:00:42_hourly
INFO: no snapshots on source newer than autosnap_2023-08-08_14:00:42_hourly on target. Nothing to do, not syncing.
NEWEST SNAPSHOT: autosnap_2023-08-08_14:00:42_hourly
INFO: no snapshots on source newer than autosnap_2023-08-08_14:00:42_hourly on target. Nothing to do, not syncing.
===== SEND_TO: server-C
WARN: --no-sync-snap is set, and getnewestsnapshot() could not find any snapshots on source for current dataset. Continuing.
CRITICAL: no snapshots exist on source PRODPOOL, and you asked for --no-sync-snap.
NEWEST SNAPSHOT: autosnap_2023-08-08_14:00:42_hourly
INFO: no snapshots on source newer than autosnap_2023-08-08_14:00:42_hourly on target. Nothing to do, not syncing.
NEWEST SNAPSHOT: autosnap_2023-08-08_14:00:42_hourly
INFO: no snapshots on source newer than autosnap_2023-08-08_14:00:42_hourly on target. Nothing to do, not syncing.
…which should not be the case as data and logs are produced at all times…
Questions to you, Sanoid/Syncoid gurus out there:
Am I doing this right? If not, which approach would be better?
WARN: --no-sync-snap is set, and getnewestsnapshot() could not find any snapshots on source for current dataset. Continuing.
Means exactly what it says on the tin: replication is snapshot-based, you're not taking any snapshots, and you've disabled automatic sync snapshots. The best practice here is to run Sanoid on both source and target, using appropriate templates (production on source, backup or hotspare on target), to take snapshots on the source and prune stale ones on both source and target.
Again, this means running Sanoid locally on both the source and the targets.
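As a rough sketch (dataset names are taken from the output above, retention counts from the example later in this thread; treat every value here as an assumption to adjust), the two configs might look like this:

# /etc/sanoid/sanoid.conf on server-A (source) -- take and prune snapshots
[PRODPOOL]
use_template = production
recursive = yes

[template_production]
frequently = 0
hourly = 30
daily = 30
monthly = 3
autosnap = yes
autoprune = yes

# /etc/sanoid/sanoid.conf on server-B (target; server-C is analogous) --
# prune only, never snapshot what replication delivers
[BACKUPS]
use_template = backup
recursive = yes

[template_backup]
frequently = 0
hourly = 60
daily = 90
monthly = 12
autosnap = no
autoprune = yes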
If I remove the --no-sync-snap flag, I get this instead:
===== SEND_TO: server-B
CRITICAL ERROR: Target BACKUPS exists but has no snapshots matching with PRODPOOL!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target BACKUPS dataset is < 64MB used - did you mistakenly run
`zfs create syncoid@server-B:BACKUPS` on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
CRITICAL ERROR: Target PRODPOOL exists but has no snapshots matching with PRODPOOL!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target PRODPOOL dataset is < 64MB used - did you mistakenly run
`zfs create syncoid@server-C:PRODPOOL` on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
Run Sanoid on both source and target servers? I have questions then…
a) doesn’t Sanoid create local snapshots only?
b) how do I interconnect both Sanoid instances?
c) I tried to take snapshots every minute with Sanoid by using this template:
[template_production]
frequently = 0
hourly = 5
daily = 5
monthly = 1
yearly = 0
autosnap = yes
autoprune = yes
frequent_period = 1 <-- this should do the trick, right?
but it didn't work either, which is why I tried my luck with Syncoid… (note that my script runs syncoid every 60 s)
I really need to take and transfer snapshots from the primary node to both the backup and failover node every 60 seconds.
Is that possible at all? If so, could you please provide me/us with some working configs?
CRITICAL ERROR: Target BACKUPS exists but has no snapshots matching with PRODPOOL!
Again, means exactly what it says on the tin. Replication is snapshot-based. You cannot replicate from one dataset to another if they do not have any common snapshots.
This works:
root@box:~# zfs create POOLA/mystuff ; zfs snapshot POOLA/mystuff@1
root@box:~# zfs list POOLB/mystuff
cannot open 'POOLB/mystuff': dataset does not exist
root@box:~# syncoid POOLA/mystuff POOLB/mystuff
INFO: Sending oldest full snapshot POOLA/mystuff@1 (~ 42 KB) to new target filesystem:
45.8KiB 0:00:00 [11.2MiB/s] [=================================] 107%
INFO: Updating new target filesystem with incremental POOLA/mystuff@1 ... syncoid_elden_2023-08-08:12:56:40-GMT-04:00 (~ 102 KB):
52.3KiB 0:00:00 [ 604KiB/s] [================> ] 50%
That works because POOLB/mystuff did not exist yet, so the initial replication created it with a snapshot matching the oldest snapshot on POOLA/mystuff. It then caught up to the current state (as captured by the syncoid sync snapshot, since I didn't use --no-sync-snap) with an incremental replication after the full. This is all part of the single syncoid command you saw there.
Trying to create the target dataset first does NOT work:
root@box:~# zfs destroy POOLB/mystuff ; zfs create POOLB/mystuff
root@box:~# syncoid POOLA/mystuff POOLB/mystuff
CRITICAL ERROR: Target POOLB/mystuff exists but has no snapshots matching with POOLA/mystuff!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target POOLB/mystuff dataset is < 64MB used - did you mistakenly run
`zfs create POOLB/mystuff` on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
Note that the error message tells you what you need to know here, just as it did when you received it.
a) doesn’t Sanoid create local snapshots only?
b) how do I interconnect both Sanoid instances?
c) I tried to take snapshots every minute with Sanoid by using this template:
a) yes
b) they don't "interconnect"; you just define your policies on both ends as appropriate. For example, you might want both ends to keep 30 hourlies, 30 dailies, and 3 monthlies… or you might want that on the source, but 60 hourlies, 90 dailies, and 12 monthlies on a much larger backup system. It'll do what you want, as you tell it to, on both sides. The policies don't have to be identical; they just have to maintain at least one matching snapshot (so as not to break the replication chain).
frequently = 0
This tells Sanoid to keep zero “frequently” snapshots. If you want them taken every minute, you probably want something more along the lines of frequently=60… or at the very least frequently=10, because you don’t want to lose the most recent common snapshot on either end.
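Concretely, that would mean changing the source template along these lines (frequent_period is carried over from the template quoted above, and assumed to be honored there):

frequently = 60        # keep an hour's worth of per-minute snapshots, not zero (10 at minimum)
frequent_period = 1    # take a "frequently" snapshot every minute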
I really need to take and transfer snapshots from the primary node to both the backup and failover node every 60 seconds.
Is that possible at all? If so, could you please provide me/us with some working configs?
Depends on your system load and network throughput, but it’s probably doable. You want frequently=10 (minimum) on both sides, and you want to schedule a syncoid run (with --no-sync-snap, since you’re taking all these frequentlies) once per minute.
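A minimal scheduling sketch for that, as root's crontab on server-A (binary and script paths are assumptions that depend on how sanoid/syncoid were installed; the script name is a hypothetical stand-in for the push script shown earlier):

# sanoid --cron takes and prunes snapshots according to the config;
# the second line runs the per-minute replication push
* * * * * /usr/sbin/sanoid --cron
* * * * * /usr/local/bin/push-snapshots.sh >> /var/log/syncoid-push.log 2>&1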
If the system gets particularly overloaded, you may occasionally not be able to replicate twice in two minutes; but as long as it's not crushed under overload (and neither is your network), that's not a big deal, because an attempt to replicate again while a prior run is underway will simply fail out safely. So the net effect is not "replication is broken," just "we only replicated eight times in the last ten minutes instead of ten."
If you need a hard guarantee of data being synced in real time between systems, without any possibility of a missed sync, replication won't cut it; you'll need proper HA. That can still be on top of ZFS, but you'll be looking for something like DRBD to sit on top of it, with DRBD handling real-time mirroring and locking as necessary.
Thanks for all your answers. I will try to implement your suggestions.
Perhaps the already informative warn/error messages could also offer a (potential) solution to the problem they point out.
The NOTE assumes something that didn't happen… so perhaps it could suggest a fix as well.
My next idea is to create another PoC using GlusterFS or Ceph for real-time data replication instead of ZFS-related tools, since with replication there's always a gap between the servers; but I still want to get a better understanding of how ZFS does all this under the hood. Of course, combining GlusterFS and ZFS is also an option for an extra layer of security.
Correct, everything worked but the mountpoint stuff… and the reason that didn't work is that you need both ZFS permissions and standard system permissions in order to set mountpoints and mount things. So although you successfully delegated the ZFS side of it, you can't do the regular system part of it without sudo.
This is a papercut that you see a lot with delegated replication; the easiest way around it is generally to set canmount=noauto on the targets, so ZFS won’t try to automatically mount them (using the syncoid UID’s permissions, which are insufficient without sudo) during the receive process.
These are the permissions and properties on the pool:
root@zfs-backup:~/zfs_test# zfs allow BACKUPS
---- Permissions on BACKUPS ------------------------------------------
Local+Descendent permissions:
user syncoid bookmark,compression,create,destroy,mount,mountpoint,receive,rollback,send,snapshot
root@zfs-backup:~/zfs_test# zfs get canmount BACKUPS
NAME PROPERTY VALUE SOURCE
BACKUPS canmount noauto local
root@zfs-backup:~/zfs_test#
Still, the snapshot dataset@test1 makes its way from zfs-node1 to zfs-backup and, eventually, to zfs-node2, as I already showed in my previous post.
Now, if I invert the polarity, that is zfs-node2 --push-> zfs-backup <--pull-- zfs-node1, things don't work as expected, even though the ZFS permissions are set as described…
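(For reference, the two legs of that inverted topology would look roughly like the following; the host and user names come from the output above, while the dataset paths are purely illustrative:)

# on zfs-node2: push to the backup host as the delegated syncoid user
syncoid --no-sync-snap POOL/dataset syncoid@zfs-backup:BACKUPS/node2/dataset

# on zfs-backup: pull from zfs-node1 with the same delegated user
syncoid --no-sync-snap syncoid@zfs-node1:POOL/dataset BACKUPS/node1/dataset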