Dataset/Dataset Content Migration Across Pools? Best Practice: ZFS Replication vs. Rsync via Network Client Between SMB Shares?

SinisterPisces · July 21, 2024, 9:57pm

Okay. That topic title was probably all over the place enough to raise some eyebrows, and my lack of experience already has me worried that I'm not explaining this well.

I have an 8-bay server. Right now, it looks like this:
Pool 1: Single mirror. Used to evacuate all the data from an old 2-bay QNAP NAS that I needed to retire. Right now, all the data is stored in a single “evac” dataset with a single shared folder that a single user/group has access to.

Pool 2: Main storage tank. Right now, 3x mirror (6 drives). Empty. I haven’t even created any datasets yet.

Goal:

With as little friction and as much safety as possible, get all the data from the “evac” dataset to appropriate dataset(s) on Tank.
Change existing share (and create new shares) to point to new dataset(s) as needed.
Empty Pool1 and destroy it.
Add 4th mirror to Tank, using now freed up disks that were part of Pool 1.
Finally be done with pool setup on this server.
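For the last two goals, the pool surgery itself comes down to a few commands. The sketch below is illustrative only: the pool names and the /dev/disk/by-id paths are hypothetical placeholders, and it prints the commands instead of running them unless DRY_RUN=0 is set. On TrueNAS, the GUI's export/disconnect and add-vdev screens are probably the safer route, since the middleware keeps its own record of pools and disks; this just shows what those steps amount to in plain ZFS.

```shell
#!/bin/sh
# Sketch of goals 3 and 4: destroy pool1, then add its two disks to tank
# as a fourth mirror vdev.
# ASSUMPTIONS: pool names and /dev/disk/by-id paths are hypothetical
# placeholders. Dry run by default: commands are printed, not executed,
# unless DRY_RUN=0 is set.
set -eu

OLD_POOL="${OLD_POOL:-pool1}"
NEW_POOL="${NEW_POOL:-tank}"
DISK_A="${DISK_A:-/dev/disk/by-id/EXAMPLE-DISK-A}"
DISK_B="${DISK_B:-/dev/disk/by-id/EXAMPLE-DISK-B}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$*"
    fi
}

retire_and_expand() {
    # Irreversible: only after the data on tank has been verified.
    run "zpool destroy ${OLD_POOL}"
    # -n prints the layout tank would end up with, without changing anything.
    run "zpool add -n ${NEW_POOL} mirror ${DISK_A} ${DISK_B}"
    run "zpool add ${NEW_POOL} mirror ${DISK_A} ${DISK_B}"
    # Show the resulting vdev layout.
    run "zpool status ${NEW_POOL}"
}

retire_and_expand
```

zpool destroy cannot be undone, so it belongs at the very end, after the migrated data has been checked. zpool add -n is a cheap way to confirm the new mirror lands where you expect before the real add.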

So, the hamfisted way to do this, which I'm pretty confident would work, would be to set up new datasets and shares on Tank, mount both the existing “evac” share and the new share over SMB on my 10 GbE client machine, and then run rsync on the client to copy the data from one to the other.

But that seems like an awkward, potentially slow and error-prone boomerang show.

I know I’ve got access to ZFS replication tools, which I, of course, have never used and don’t really know how to use. But it seems like I should be able to use replication to … somehow (?) … do all of this on the server?

If that second approach is The Way, could someone recommend a guide that would conceivably keep me from accidentally torching all my stuff?

Thanks!

EDIT: My question is similar to this one, which seems not to have gotten any replies. Linking it here in case it helps anyone else searching for this sort of thing later. Migrating Data From Failing Pool

waltar · July 22, 2024, 10:51am

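When you have both pools in the same server, why not just use this?
zfs create tank/my-new-dataset (optionally with non-default properties), and then
cp -a /old-pool/evac-dataset/. /tank/my-new-dataset/
(The trailing /. copies the directory's contents including hidden files, which a /* glob would skip.)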

mercenary_sysadmin · July 22, 2024, 8:46pm

ZFS replication ranges from “as efficient as rsync” on the absolute worst workloads for it, to “1,000x or more faster than rsync” on the best possible workloads for it.

Any time you’ve already got ZFS on both sides, replication is almost always going to be the best answer.
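Since both pools sit in the same server here, replication doesn't need SSH at all: zfs send can pipe straight into zfs receive. A minimal sketch, assuming hypothetical names pool1/evac (source), tank/evac (target, which must not exist yet) and a snapshot called @migrate. It prints the commands instead of running them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: one-off local replication of the "evac" dataset tree into tank.
# ASSUMPTIONS: pool1/evac, tank/evac and @migrate are hypothetical names;
# tank/evac must not exist yet. Dry run by default (set DRY_RUN=0 to execute).
set -eu

SRC="${SRC:-pool1/evac}"
DST="${DST:-tank/evac}"
SNAP="${SNAP:-migrate}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$*"
    fi
}

replicate() {
    # Recursive snapshot: one consistent point in time for evac and its children.
    run "zfs snapshot -r ${SRC}@${SNAP}"
    # -R: send the whole tree with its properties and snapshots.
    # -v: progress report on stderr. receive -u: leave the copy unmounted.
    run "zfs send -R -v ${SRC}@${SNAP} | zfs receive -u ${DST}"
    # Confirm the snapshots now exist on tank.
    run "zfs list -r -t snapshot ${DST}"
}

replicate
```

Nothing on pool1 changes beyond the new snapshot, so the original stays intact until you decide to destroy it. Receiving with -u keeps tank/evac unmounted, which also keeps it unmodified if a later incremental send is needed.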

SinisterPisces · July 22, 2024, 9:09pm

Thank you. I’m definitely more concerned about it being reliable and safe than fast.

Is there a guide or tutorial you’d recommend for someone who hasn’t used ZFS replication before?

bladewdr · July 22, 2024, 9:41pm

mercenary_sysadmin · July 22, 2024, 9:50pm

hey, I know those guys!

mercenary_sysadmin · July 22, 2024, 10:09pm

and in addition to the webinar, if you prefer article-style presentation, this isn’t the worst introduction to replication, especially as compared to rsync: rsync.net: ZFS Replication to the cloud is finally here—and it’s fast | Ars Technica

SinisterPisces · July 23, 2024, 7:42pm

Thank you both. I’ll check these out.

I enjoy videos, but often find articles more useful when I need to refresh myself later.

(Also, I just noticed the typo I left in the topic title. Sorry.)

SinisterPisces · August 18, 2024, 11:42pm

Thanks again for all the help.

I had to step away from this effort for a few weeks, as job hunting has become a priority, but between the video and the article links, I feel really confident about what I need to do to make this work.

Even better, I actually feel like I understand how and why ZFS replication works, whereas before it was magic, and I had to twiddle just the right mysterious toggles to get the desired result.

To clarify, the video and articles cover driving ZFS from the CLI, either with zfs commands directly or, preferably, through an orchestrator like Syncoid.

I’m assuming that, since I’m using TrueNAS, I should use its replication task GUI as my orchestration tool and avoid trying to issue direct ZFS replication commands?

mercenary_sysadmin · August 19, 2024, 1:53pm

Up to you. A simple zfs send piped to zfs receive works fine from the TrueNAS CLI, but I'm given to understand that installing new packages is kind of a pain unless you set up a container for that. I don't know how pleasant the replication tooling built into the TrueNAS UI is, myself.
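Along the same lines, if files change on the old share while the bulk copy runs, a second, incremental zfs send | zfs receive can bring Tank up to date right before the share is repointed, still using only built-in commands. A sketch, assuming (hypothetically) that pool1/evac was already fully replicated to tank/evac at a snapshot named @migrate and that tank/evac has not been mounted or modified since; @final is likewise a made-up name. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: final catch-up before switching the share over to tank.
# ASSUMPTIONS: hypothetical names; pool1/evac was already replicated to
# tank/evac at @migrate, and tank/evac has stayed unmounted and unchanged.
# Dry run by default (set DRY_RUN=0 to execute).
set -eu

SRC="${SRC:-pool1/evac}"
DST="${DST:-tank/evac}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$*"
    fi
}

catch_up() {
    from="$1"  # newest snapshot both pools already share
    to="$2"    # new snapshot, taken after clients stop writing
    run "zfs snapshot -r ${SRC}@${to}"
    # -I sends only what changed between the two snapshots (plus any
    # snapshots in between). If receive reports that tank/evac was modified,
    # investigate rather than reaching for -F, which rolls it back.
    run "zfs send -R -v -I @${from} ${SRC}@${to} | zfs receive -u ${DST}"
}

catch_up migrate final
```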

bladewdr · August 19, 2024, 7:39pm

It’s fairly trivial to set up replication in the GUI. The only thing I had to do on the CLI with my TrueNAS box is set ZFS delegation permissions, since the GUI doesn’t support that yet (or at least it didn’t when I set up my replication jobs).

But yeah, it’s pretty much:

Create SSH credentials.
Create a replication job; you can do either push or pull.
If you’d like to avoid using a privileged user, you can drop to the shell and issue a few zfs allow commands.
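For the zfs allow step, a minimal sketch of the delegation. The user name repl and the dataset names are hypothetical, and the permission list is a plausible starting point rather than a verified minimum: depending on what the stream carries, receive may also need individual properties delegated (zfs allow accepts property names as permissions). It prints the commands unless DRY_RUN=0 is set; the real run needs root.

```shell
#!/bin/sh
# Sketch: delegate replication permissions to an unprivileged user.
# ASSUMPTIONS: user "repl", pool1/evac and tank are hypothetical names, and
# this permission set is a starting point, not a verified minimum.
# Dry run by default (set DRY_RUN=0 to execute as root).
set -eu

REPL_USER="${REPL_USER:-repl}"
SRC="${SRC:-pool1/evac}"
DST_PARENT="${DST_PARENT:-tank}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$*"
    fi
}

delegate() {
    # Sending side: snapshot, hold, and send the source tree.
    run "zfs allow -u ${REPL_USER} send,snapshot,hold ${SRC}"
    # Receiving side: receive streams and create/mount datasets under tank.
    run "zfs allow -u ${REPL_USER} receive,create,mount ${DST_PARENT}"
    # With only a dataset given, zfs allow lists the current delegations.
    run "zfs allow ${SRC}"
    run "zfs allow ${DST_PARENT}"
}

delegate
```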

SinisterPisces · August 19, 2024, 10:56pm

Thanks. I’m glad to know the ZFS CLI commands won’t give TrueNAS the vapors.

I’m very glad I watched the video linked up-thread, so I actually know how all this works. I could have learned how to configure TrueNAS without actually understanding why it was working, which isn’t what I wanted. Relying on magic hidden behind GUIs has gotten me into trouble when I’ve done other Linux things, and this is a lot more mission-critical than any of that was.

… The downside is that now I’ve actually snapshotted and replicated a dataset and a trio of nested datasets from one pool to another, and I haven’t just drunk the ZFS Kool-Aid; I’m fully submerged in a vat of it. After using QNAP’s byzantine interface to transfer data from one storage pool to another, I am enamored. The snapshot was so easy and so fast that at first I wasn’t even sure it had happened, and even the replication was significantly faster than rsync. The TrueNAS UI doesn’t tell you how fast it’s going, though: I had to pop over, look at disk I/O, and add up the individual disk stats to get an idea of the total write speed.
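On the “how fast is it going” problem: rather than adding up per-disk stats, zpool iostat reports pool-wide bandwidth, and zfs list can confirm that a snapshot really exists. A sketch with hypothetical names (pool1/evac, tank) that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: confirm a snapshot exists, then watch pool-wide throughput while
# replication runs, instead of summing per-disk stats by hand.
# ASSUMPTIONS: pool1/evac and tank are hypothetical names.
# Dry run by default (set DRY_RUN=0 to execute).
set -eu

SRC="${SRC:-pool1/evac}"
DST_POOL="${DST_POOL:-tank}"
INTERVAL="${INTERVAL:-5}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$*"
    fi
}

check_and_watch() {
    # Snapshots under the source tree, with their creation times.
    run "zfs list -r -t snapshot -o name,creation ${SRC}"
    # Pool totals (operations and bandwidth), refreshed every INTERVAL
    # seconds until interrupted with Ctrl-C.
    run "zpool iostat ${DST_POOL} ${INTERVAL}"
}

check_and_watch
```

With an interval, the first line of zpool iostat output is an average since boot; the following lines show the live transfer rate. When replicating from the shell, zfs send -v also prints its own per-second progress report.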

So, for anyone who finds this later and is using TrueNAS: absolutely watch the video above, and then these two from Tom Lawrence on how to make TrueNAS behave. TrueNAS’ interface makes perfect sense once you’ve been through it, but I’m not sure I’d have figured it out on my own without getting it wrong a few times.

TrueNAS Snapshots:
ZFS Snapshots Explained: How To Protect Your Data From Mistakes, Malware, & Ransomware - YouTube

TrueNAS Replication:
Backup & Recovery Made Easy: TrueNAS ZFS Replication Tutorial - YouTube

Aside: TrueNAS does support SSH+netcat for replication, but it’s in the advanced settings for the replication job.