I’m in the process of setting up my first backup machine (machine 2 from now on) for my main NAS (machine 1 from now on).
The setup consists of two machines, both running ZFS, and I have successfully pulled snapshots for one of my datasets by running this command manually from the CLI:
/usr/sbin/syncoid --no-privilege-elevation --recursive --no-sync-snap --create-bookmark syncoid@<machine1>:tank/users/x backup/tank-replication/users/x
I have several datasets on machine 1 and only want to pull some of them, since I don’t consider all of them worth backing up. Let’s say the datasets I want to pull, and their targets on machine 2, look something like this:
tank/users/x -> backup/tank-replication/users/x
tank/users/y -> backup/tank-replication/users/y
tank/users/z -> backup/tank-replication/users/z
tank/groups/x -> backup/tank-replication/groups/z
services/x -> backup/services-replication/x
services/y -> backup/services-replication/y
I also use Ansible to set up my machines, so I created a loop that generates one cron job per dataset I want to pull. This resulted in 5 cron jobs, all scheduled at the same time each day and each running the command listed above for its own dataset.
When the cron jobs run, all of them crash.
I can see that the ZFS datasets have been created on machine 2, but each one contains only a tiny amount of data, maybe 500MB.
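To make the failing setup concrete, here is a rough sketch of the crontab the Ansible loop ends up producing. It is not the actual playbook: the Ansible loop is swapped for a plain POSIX shell loop, the schedule `31 15 * * *` is an assumption read off the 15:31 journal timestamps, `<machine1>` stays a placeholder, and the function name `print_cron_lines` is hypothetical. It prints one line per pair from the list above, all starting at the same minute.

```shell
#!/bin/sh
# Sketch only: prints one crontab line per dataset pair, all with the same
# schedule, mimicking what the Ansible loop generated (Ansible itself not shown).

REMOTE='syncoid@<machine1>'   # placeholder host, kept as in the post
SCHEDULE='31 15 * * *'        # assumed: daily at 15:31, read off the journal timestamps

# "source-on-machine-1 target-on-machine-2", one pair per line
PAIRS='tank/users/x backup/tank-replication/users/x
tank/users/y backup/tank-replication/users/y
tank/users/z backup/tank-replication/users/z
tank/groups/x backup/tank-replication/groups/z
services/x backup/services-replication/x
services/y backup/services-replication/y'

print_cron_lines() {
  # Each pair becomes its own job; nothing stops these jobs from overlapping.
  while read -r src dst; do
    printf '%s /usr/sbin/syncoid --no-privilege-elevation --recursive --no-sync-snap --create-bookmark %s:%s %s\n' \
      "$SCHEDULE" "$REMOTE" "$src" "$dst"
  done <<EOF
$PAIRS
EOF
}

print_cron_lines
```

Because every line has the same schedule, cron starts all the syncoid processes in the same minute, which is the situation in which the pulls fail.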
Here is the journal from machine 2 during that time for one of the jobs:
Aug 19 15:31:01 machine2 CRON: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 19 15:31:01 machine2 CRON: (root) CMD (/usr/sbin/syncoid --no-privilege-elevation --recursive --no-sync-snap syncoid@<machine1>:tank/users/x backup/tank-replication/users/x)
Aug 19 15:31:21 machine2 sSMTP: Creating SSL connection to host
Aug 19 15:31:21 machine2 rsyslogd: action 'action-5-builtin:omfile' resumed (module 'builtin:omfile') [v8.2112.0 try https://www.rsyslog.com/e/2359 ]
Aug 19 15:31:21 machine2 rsyslogd: action 'action-5-builtin:omfile' suspended (module 'builtin:omfile'), retry 0. There should be messages before this one giving the reason for suspension.
Aug 19 15:31:21 machine2 sSMTP: SSL connection using ECDHE_RSA_AES_256_GCM_SHA384
Aug 19 15:31:21 machine2 cron: sendmail: 550 5.7.1 [M12] User [hosting@<mydomain>] not authorized to send on behalf of <root@<mydomain>> (72b7581b-3ea5-11ee-8622-55333ba73462)
Aug 19 15:31:21 machine2 sSMTP: 550 5.7.1 [M12] User [hosting@<mydomain>] not authorized to send on behalf of <root@<mydomain>> (72b7581b-3ea5-11ee-8622-55333ba73462)
Aug 19 15:31:21 machine2 CRON: (root) MAIL (mailed 284 bytes of output but got status 0x0001 from MTA
Aug 19 15:31:21 machine2 CRON: pam_unix(cron:session): session closed for user root
Journal from machine 1:
Aug 19 15:31:03 machine1 systemd: Started Session 13102 of User syncoid.
Aug 19 15:31:03 machine1 sshd: Received disconnect from 192.168.2.180 port 48386:11: disconnected by user
Aug 19 15:31:03 machine1 sshd: Disconnected from user syncoid 192.168.2.180 port 48386
Aug 19 15:31:03 machine1 sshd: pam_unix(sshd:session): session closed for user syncoid
Aug 19 15:31:03 machine1 systemd: session-13103.scope: Deactivated successfully.
Aug 19 15:31:03 machine1 systemd-logind: Session 13103 logged out. Waiting for processes to exit.
Aug 19 15:31:03 machine1 systemd-logind: Removed session 13103.
I don’t see anything obvious here, except that an email isn’t being sent.
On top of that, I tried commenting out all but one of the jobs, and that one ran to completion without issues.
So the command, the connection and the permissions all work; it synced 2.66TB of data during the night.
But somehow, when several jobs run at the same time, they fail.
Is it even possible to pull snapshots of multiple datasets from the same pool in parallel using syncoid?
If so, could you give me an example of your setup?
Thanks in advance
Edit: For now, to get anything pulled at all, I created a bash script with one command per dataset, chained with " && " so that everything runs sequentially instead. My original question still stands.
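The sequential workaround from the edit can be sketched as a loop over the same pairs that stops at the first failure, which matches the behavior of a `cmd1 && cmd2 && ...` chain. This is a POSIX shell sketch, not the actual script: the function name `pull_sequentially`, the `SYNCOID` override variable and the pair-list format are hypothetical, while the flags are the ones from the original command and `<machine1>` remains a placeholder.

```shell
#!/bin/sh
# Sketch only: pulls the datasets one after another and stops at the first
# failure, like chaining the syncoid commands with " && ".

SYNCOID="${SYNCOID:-/usr/sbin/syncoid}"   # path from the post; the override is hypothetical
REMOTE='syncoid@<machine1>'               # placeholder host, kept as in the post

# "source-on-machine-1 target-on-machine-2", one pair per line
PAIRS='tank/users/x backup/tank-replication/users/x
tank/users/y backup/tank-replication/users/y
tank/users/z backup/tank-replication/users/z
tank/groups/x backup/tank-replication/groups/z
services/x backup/services-replication/x
services/y backup/services-replication/y'

pull_sequentially() {
  while read -r src dst; do
    # </dev/null stops ssh (started by syncoid) from reading the remaining pairs.
    "$SYNCOID" --no-privilege-elevation --recursive --no-sync-snap --create-bookmark \
      "$REMOTE:$src" "$dst" </dev/null || return 1
  done <<EOF
$PAIRS
EOF
}
```

The block only defines the function; appending `pull_sequentially` as the last line turns it into a script that a single cron job can run, so the pulls from one run never overlap.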