Backup vault sanity check

I would like help sanity-checking the results of my ZFS backup googling. This is my understanding of attainable best practice, and I would like to put it into action if it holds up.

  • zpool [source] (mirror)
    • Drive A
    • Drive B
  • zpool [rotating backup 1] (online, onsite)
    • Drive C
  • zpool [rotating backup 2] (offline, offsite)
    • Drive D

[source]/datasets/[useful_data]
[source]/datasets/[important_data]
[source]/datasets/[unimportant_data]

[useful_data]/[important_data] deserve offline/offsite backup
[unimportant_data] only deserves the redundancy of being on a mirror.

So that data is never all in one place, I move the live [rotating backup 1] offsite, then bring [rotating backup 2] onsite.

  • Is cron continuously running zfs send/recv [useful_data]/[important_data] to [rotating backup 1]?
    • Is this what sanoid is for?
    • Are there sync issues if snapshots are deleted?
  • To move [rotating backup 1] offsite, is that just zpool offline [rotating backup 1] then disconnect drive?
    • Does it need to snapshot something? Scrub?
  • To bring [rotating backup 2] onsite, is that just connect drive and zpool online [rotating backup 2]?
  • I would rather not manage whether cron is pointing zfs send at [rotating backup 1] or [rotating backup 2], can it be ignorant of this somehow?
  • I’m concerned with encryption
    • Can I just turn on encryption for a zpool after the fact? For a dataset? I assume not
    • When I am ready can I zfs send [source] to [encrypted source]?
    • How would [encrypted source] go being sent to [rotating backup n]?
    • Once both offline backups have been rotated through and [source] is scuttled, will this all look the way I expect it would?

With this setup I avoid the resilvering and DEGRADED issues of just having drives 3 and 4 in the mirror.

My lack of experience with ZFS means I don’t understand the implications of the steps I’m taking, but I want setting this pipeline up to be my learning exercise. I want this to be where I put my resilient long-term data, but the first pass of all of this would be entirely with dummy data just to see that it works, and I would likely rebuild everything from scratch afterwards. I’m concerned about the implications of potentially adding encryption later, but I’m not in a position to start out with it in place.

Commentary and advice welcome, thanks. Hopefully you can see my question through the ramble; I couldn’t figure out how better to explain it.

Can I just turn on encryption after the fact? For a dataset? I assume not.

You can turn it on per-dataset, but not after the fact. Encryption, like many ZFS properties, only affects data written after the property is set. (Encryption is somewhat special though: it must be enabled at dataset creation, because it has implications for how the block pointer is interpreted.)

To encrypt an existing dataset you would: (a) create an encrypted dataset, (b) send/recv the unencrypted dataset into the encrypted one, (c) destroy the unencrypted dataset. (With the usual caveat that the old unencrypted blocks are still on disk, in space the spacemaps now mark as free, waiting to be overwritten.)
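A minimal sketch of that dance, assuming a pool named “tank” and hypothetical dataset names:

    # create the encrypted parent; encryption cannot be flipped on later
    zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt tank/secure
    zfs snapshot tank/important@migrate
    # a plain send received under an encrypted parent is written encrypted;
    # a -R replication stream would also carry old snapshots, with extra property caveats
    zfs send tank/important@migrate | zfs recv tank/secure/important
    zfs destroy -r tank/important    # only after verifying the copy mounts and reads back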

  • To move [rotating backup 1] offsite, is that just zpool offline [rotating backup 1] then disconnect drive?
  • Does it need to snapshot something? Scrub?

If it’s a removable drive with its own pool, you would want zpool export and zpool import; offline is for taking a device out of service within an already-imported pool. You only really want to use zpool device-level commands (like online/offline) when changing the redundancy or geometry of a vdev.
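For example, with placeholder names for the two rotating pools:

    # cleanly detach the pool that lives on the outgoing drive before pulling it
    zpool export backup1
    # after the other sled is slotted in (a bare "zpool import" lists importable pools)
    zpool import backup2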

You would definitely want to scrub the disk periodically. Lord only knows how many solar flares hit your safety deposit box in the interim! There is nothing particularly magical about a scrub: it literally just reads every allocated block in the pool, that’s it. If it encounters errors, and there is sufficient parity, replicas, or copies to fix the errors, ZFS will fix the blocks as they are read. Otherwise it will report permanent data errors for the affected datasets.

Since you plan to have singleton vdevs for the rotating drives, a scrub will only report errors in your case; it won’t have any ability to recover from those errors. (Unless they are in datasets with copies=n where n > 1, or in metadata, which has implicit extra copies by default.)
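Running one on the (hypothetically named) backup pool is just:

    zpool scrub backup
    zpool status -v backup    # shows scrub progress and any permanent errors it found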

  • Is cron continuously running zfs send/recv [useful_data]/[important_data] to [rotating backup 1]?
  • Is this what sanoid is for?
  • Are there sync issues if snapshots are deleted?

sanoid is run periodically (by cron, systemd timers, your fingers, whatever) and creates a snapshot of monitored datasets if no snapshot matching the retention policy already exists. If you ask it to prune snapshots it will also delete snapshots which (a) match a prefix and (b) are no longer wanted by the retention policy.
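As a rough sketch, a sanoid.conf stanza for one of your datasets could look like this (dataset name and retention counts are placeholders):

    # /etc/sanoid/sanoid.conf (illustrative only)
    [source/datasets/important_data]
            use_template = production

    [template_production]
            hourly = 36
            daily = 30
            monthly = 3
            yearly = 0
            autosnap = yes
            autoprune = yes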

syncoid replicates snapshots from one pool to another. It will automate away a lot of the suck: resumable send/recv, maintaining a snapshot chain, falling back to full sends if the incremental chain is destroyed, buffering, recursing down a list of datasets, etc.
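For your layout that boils down to something like this (names assumed, “backup” being whichever rotating pool is attached):

    syncoid -r source/datasets/important_data backup/important_data
    syncoid -r source/datasets/useful_data backup/useful_data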

Also, you are correct that ZFS needs a common snapshot on both pools to do an incremental send/recv. By default sanoid will only prune snapshots that have a specified prefix, and ideally syncoid would be configured to use a non-conflicting prefix, in which case syncoid will manage its own snapshots and keep one around just long enough to maintain the incremental chain.
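Under the hood that incremental is just (snapshot names hypothetical):

    # @common must already exist on both sides; only blocks changed since then are sent
    zfs send -i source/datasets/important_data@common \
        source/datasets/important_data@latest | zfs recv backup/important_data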

I personally run sanoid on both the sender and receiver, with different retention policies on either side. I keep ~3 months of snapshots on the sender, including 15-minute interval snapshots of key directories. However in some cases I retain multiple years of snapshots on the receiver.
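If it helps to picture it, that is just two different templates on the two machines (numbers made up; a receiver-side template typically prunes but doesn’t create snapshots of its own):

    # sender
    [template_hot]
            frequently = 96    # 15-minute snapshots, one day's worth
            daily = 90
            autosnap = yes
            autoprune = yes

    # receiver
    [template_archive]
            daily = 90
            monthly = 36
            autosnap = no      # snapshots arrive via syncoid
            autoprune = yes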

  • To move [rotating backup 1] offsite, is that just zpool offline [rotating backup 1] then disconnect drive?
  • Does it need to snapshot something? Scrub?
  • To bring [rotating backup 2] onsite, is that just connect drive and zpool online [rotating backup 2]?
  • I would rather not manage whether cron is pointing zfs send at [rotating backup 1] or [rotating backup 2], can it be ignorant of this somehow?

OK, first concern: it sounds like you want relatively long-term connection of your rotating backups rather than connect, watch a backup happen, then disconnect. That means USB is not going to be appropriate; you’re going to need proper SATA or SAS connectivity for your removable drives.

Second: if you don’t want your syncoid cron job to have to detect which removable is attached, just name each removable’s single-disk pool the same thing, e.g. “backup”. syncoid -r mypool/mystuff backup/mystuff will work regardless of which of the two “backup” pools is currently imported.
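So a single cron entry can stay ignorant of which physical drive is plugged in; something like (schedule and install path are assumptions):

    # /etc/cron.d/zfs-backup -- hourly, against whichever "backup" pool is imported
    0 * * * * root /usr/sbin/syncoid -r source/datasets/important_data backup/important_data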

You will need to do a little manual CLI work when you swap backup pools, e.g. zpool export backup ; [ physically remove backup drive 1 ] ; [ physically insert backup drive 2 ] ; zpool import backup.
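And since each drive sits in a deposit box between rotations, it’s worth scrubbing the one that just came back before trusting new syncs to it; roughly:

    zpool export backup       # flush and cleanly detach the outgoing pool
    # ...physically swap drive 1 out, drive 2 in...
    zpool import backup       # the returning pool comes up under the same name
    zpool scrub backup        # verify the drive that spent time offsite
    zpool status -v backup    # check for permanent errors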

To encrypt an existing dataset you would

Got it, thanks. I think I was also confused about whole-disk encryption vs dataset encryption. When I send from [important_data] to [encrypted_important_data], it takes all the snapshots etc. with it; it’s just now encrypted on the disk?

If it’s a removable drive

Likely these would be internal drives on sleds rather than a USB enclosure. I assume the server still knows about the pool; it just knows it isn’t connected right now?

it won’t have any ability to recover from those errors

Do successive zfs send operations correct errors in the destination uncovered by scrub?

(Unless they are in datasets with copies=n where n > 1, or in metadata, which has implicit extra copies by default.)

I don’t think I understand this. A dataset that just has the data in more than one place?

sanoid

I see, sanoid replaces all of the DIY cron scripting you’d do if you wanted to set up your own snapshots. A configurator for what snapshots should exist, and a service that makes it happen.

syncoid

Definitely sounds useful. Rather than having my own cron job constantly calling zfs send, I just specify that I want snapshots of this variety sent from [source] to [rotating backup 1], and the config and service make it happen?

ZFS needs a common snapshot on both pools

sanoid on both the sender and receiver

I think that makes sense though: have relatively conservative retention on the live dataset, send everything to the backup, then let the backup decide the actual retention policy.

I suppose this would mean that the offsite backups end up out of sync then? They’d perhaps have all the daily snapshots, but each offsite backup would miss any snapshots that [source] did not retain?

Thanks!