The details of why I executed sudo zfs destroy -r schwaemm/vm are unimportant, but after immediately realising my mistake and hitting Ctrl-C a lot, I ended up with some of the child datasets gone.
I figured it was no big deal: I have a backup that syncoid pulls every 15 minutes, so I’d just syncoid it back from the backup and all would be good.
prod = lorien
backup = proxima
Here’s the pull command that runs every 15 minutes on the backup server:
/usr/local/sbin/syncoid -r --skip-parent --sendoptions=w --no-sync-snap --no-privilege-elevation lorien:schwaemm tank/backup/schwaemm
(I have some encrypted data on schwaemm, so I just set ‘w’ for everything.)
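To make each flag explicit, here is the same pull command written as a bash array with a comment per flag. The comments paraphrase the syncoid documentation as I understand it rather than quoting it, and the variable names src, dst and cmd are illustrative additions. The script only prints the command; it does not run syncoid.

```bash
#!/usr/bin/env bash
# Annotated copy of the pull job that runs every 15 minutes on proxima.
# This script only prints the command; it does not run syncoid.
set -euo pipefail

src="lorien:schwaemm"        # prod pool, reached over ssh
dst="tank/backup/schwaemm"   # target dataset on the backup server

cmd=(
  /usr/local/sbin/syncoid
  -r                        # also replicate child datasets (vm, vm/win11, ...)
  --skip-parent             # ...but not schwaemm itself
  --sendoptions=w           # pass -w to zfs send: raw send, encrypted data stays encrypted
  --no-sync-snap            # use existing (sanoid) snapshots, don't create a syncoid one
  --no-privilege-elevation  # don't wrap zfs commands in sudo
  "$src" "$dst"
)

# Print the command; to run it instead, replace this line with: "${cmd[@]}"
printf '%s ' "${cmd[@]}"; echo
```

Keeping the flags in an array means each one can be commented, and the command can be printed for review before it is ever executed.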
I ran this command to restore the backup to prod, and it started well enough:
NEWEST SNAPSHOT: autosnap_2025-03-08_20:00:16_hourly
INFO: Sending oldest full snapshot tank/backup/schwaemm/vm/win11@autosnap_2025-01-01_00:00:11_monthly (~ 139.5 GB) to new target filesystem:
139GiB 0:49:30 [48.2MiB/s] [====================================================================>] 100%
cannot mount 'schwaemm/vm/win11': Insufficient privileges
INFO: Updating new target filesystem with incremental tank/backup/schwaemm/vm/win11@autosnap_2025-01-01_00:00:11_monthly ... autosnap_2025-03-08_20:00:16_hourly (~ 114.9 GB):
cannot hold: permission denied
cannot send 'tank/backup/schwaemm/vm/win11': permission denied
624 B 0:00:00 [2.01KiB/s] [> ] 0%
cannot receive: failed to read from stream
CRITICAL ERROR: ssh -S /tmp/syncoid-proxima-1741466254-2463 proxima ' zfs send -I '"'"'tank/backup/schwaemm/vm/win11'"'"'@'"'"'autosnap_2025-01-01_00:00:11_monthly'"'"' '"'"'tank/backup/schwaemm/vm/win11'"'"'@'"'"'autosnap_2025-03-08_20:00:16_hourly'"'"' | lzop | mbuffer -q -s 128k -m 16M' | mbuffer -q -s 128k -m 16M | lzop -dfc | pv -p -t -e -r -b -s 123409717208 | zfs receive -s -F 'schwaemm/vm/win11' failed: 256 at /usr/local/sbin/syncoid line 585.
I’m fine with the above error; I needed to zfs allow user hold schwaemm. However, this is when I noticed that my backup no longer had all its snapshots, and the latest data it held was dated mid-December.
zfs list -t snapshot tank/backup/schwaemm/vm/win11
NAME USED AVAIL REFER MOUNTPOINT
tank/backup/schwaemm/vm/win11@autosnap_2025-01-01_00:00:11_monthly 0B - 127G -
tank/backup/schwaemm/vm/win11@autosnap_2025-03-08_21:30:02_monthly 0B - 127G -
tank/backup/schwaemm/vm/win11@autosnap_2025-03-08_21:30:02_daily 0B - 127G -
tank/backup/schwaemm/vm/win11@autosnap_2025-03-08_21:30:02_hourly 0B - 127G -
sudo ls -la /tank/backup/schwaemm/vm/win11/images/410/
total 133023603
drwxr----- 2 root root 5 May 18 2024 .
drwxr-xr-x 3 root root 3 May 18 2024 ..
-rw-r----- 1 root root 274920112128 Dec 14 17:26 vm-410-disk-0.qcow2
-rw-r----- 1 root root 917504 Dec 14 17:02 vm-410-disk-1.qcow2
-rw-r----- 1 root root 4194304 Dec 15 17:48 vm-410-disk-2.raw
I realised at this point that I probably should have stopped sanoid on prod and the syncoid pull service on my backup server. But I can’t quite understand why there would be an issue even if one syncoid was pulling backup → production while another was pulling production → backup. Production would always have been older than backup, wouldn’t it? Unless sanoid created a new snapshot on production and that then wiped my backup on the next pull.
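The hypothesis in the last sentence can be sketched as a toy bash model, using snapshot names taken from the syncoid log and the zfs list output above. It rests on two unverified assumptions: that the next prod → backup pull uses the newest snapshot present on both sides as its incremental base, and that zfs receive -F (which appears in the syncoid log) rolls the target back to that base, destroying any newer snapshots. No zfs commands are run.

```bash
#!/usr/bin/env bash
# Toy model of the hypothesised sequence. No zfs commands are run; snapshot
# names come from the syncoid log and zfs list output above.
set -euo pipefail

# Backup (proxima) before the restore, oldest first. The snapshots between
# these two also existed but are omitted here for brevity.
backup=(
  autosnap_2025-01-01_00:00:11_monthly
  autosnap_2025-03-08_20:00:16_hourly
)

# Prod (lorien) after the interrupted restore: only the full send of the
# oldest snapshot arrived before the incremental failed.
prod=(autosnap_2025-01-01_00:00:11_monthly)

# Hypothesis: sanoid on prod then snapshots the re-created dataset.
prod+=(
  autosnap_2025-03-08_21:30:02_monthly
  autosnap_2025-03-08_21:30:02_daily
  autosnap_2025-03-08_21:30:02_hourly
)

# Assumption: the pull uses the newest snapshot present on both sides
# as the incremental base (prod is ordered oldest first, so the last match wins).
common=""
for p in "${prod[@]}"; do
  for b in "${backup[@]}"; do
    if [[ "$p" == "$b" ]]; then common="$p"; fi
  done
done

# Assumption: zfs receive -F rolls the backup back to that base,
# destroying every newer snapshot on the backup...
kept=()
for b in "${backup[@]}"; do
  kept+=("$b")
  if [[ "$b" == "$common" ]]; then break; fi
done

# ...and then receives prod's snapshots that are newer than the base.
past_base=0
for p in "${prod[@]}"; do
  if (( past_base )); then kept+=("$p"); fi
  if [[ "$p" == "$common" ]]; then past_base=1; fi
done

echo "incremental base: $common"
printf 'backup now holds: %s\n' "${kept[@]}"
```

Run as-is, the model ends with the 2025-01-01 monthly snapshot plus the three 21:30:02 snapshots, the same set the zfs list output shows. That only means the hypothesis is consistent with what is on disk, not that it is what actually happened.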
My questions:
- For the datasets that survived the Ctrl-C, are they OK? They appear to have all the expected snapshots, and the VMs are still running.
- What exactly happened here? Why was my backup overwritten?
- Apart from obviously being more careful about what I execute, is there a safer way to restore datasets?