Pull backup from a TrueNAS system to Synology

Hey all,

Currently I’m using an rsync script I wrote myself on my Synology to pull backups from my TrueNAS system. It runs automatically on a schedule and has been working fine, with the exception of the speed.

The backup takes hours, even though this is a local network connection. Granted, it’s only a 1Gbps link, but in the end it should only be transferring a couple of gigabytes.

The folder it’s copying has a large number of big video files.

This is the actual command it runs; the script is a simple for loop that iterates over a few directories.

rsync -rvs --delete --exclude="*@eaDir*" rsync@10.13.37.3:"/mnt/Pool_1/${dirsCopy[$dir]}" "/volume1/${dirsCopy[$dir]}" --log-file=/var/log/backup.log
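
The full script is nothing fancy; roughly like this (share names changed to placeholders):

#!/bin/bash
# Top-level directories to mirror (placeholder names)
dirsCopy=("Media" "Documents" "Backups")

for dir in "${!dirsCopy[@]}"; do
    rsync -rvs --delete --exclude="*@eaDir*" \
        rsync@10.13.37.3:"/mnt/Pool_1/${dirsCopy[$dir]}" \
        "/volume1/${dirsCopy[$dir]}" \
        --log-file=/var/log/backup.log
done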

I expect the issue is that it’s having to go through and compare the entire filesystem for changes, which with a directory that’s 10+ terabytes in size will take a while.

I’ve been looking into the various command line options available with rsync, and the one that seems most likely to help is --inplace. My concern with that is that some data may end up in an inconsistent state if a file is in the middle of being copied to the source directory while the rsync job is running.

Open to any other tools that may do this job better / more quickly as well.


It’s never going to get a whole lot quicker unless you replace rsync with replication, which would require a ZFS filesystem on both ends. You are correct that rsync must grovel over each individual file looking for potential differences; it also has to chunk the changed files on both ends, individually hash each chunk, then compare the hashes between source and destination to figure out which bits to move or not move.

On a 1Gbps LAN, the last bit (chunking and hashing the files so you don’t need to move the entire file, just the changes) is usually a lose. You can make rsync skip that part, which decreases disk load significantly, with the -W (--whole-file) flag, which is safe as houses.
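
Applied to your loop, that’s a one-letter change; something like:

rsync -rvsW --delete --exclude="*@eaDir*" \
    rsync@10.13.37.3:"/mnt/Pool_1/${dirsCopy[$dir]}" \
    "/volume1/${dirsCopy[$dir]}" \
    --log-file=/var/log/backup.log

With -W, rsync still decides per file whether anything changed (by size and mtime), but when a file has changed it just copies the whole thing instead of chunking and hashing both copies.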

Using --inplace is dangerous on non-CoW filesystems, because if your transfer gets interrupted partway through, you’re left with a corrupt file. (Rsync’s default behavior is to create an entirely new temporary file, work with that, then unlink the old and rename the new into place once it’s finished.)

If you can stomach ditching the Synology in favor of another ZFS system, you can start doing replication, which can be orders of magnitude faster than rsync. Replication neither needs to grovel over the filesystem looking for potentially changed metadata, nor over the insides of changed files; it already knows exactly which bits of which files need updating, and immediately begins chucking them down the wire at the target.

As an example, my /opt directory has 540,767 files in it (because that’s where my Steam repo lives). It gets backed up via replication every night, across a 1Gbps LAN like yours. Actually that’s not true… it gets backed up via replication every hour across a 1Gbps LAN:

root@jrs-dr0:/# crontab -l | grep opt
10 * * * * /usr/local/bin/syncoid -r --no-sync-snap root@banshee:banshee/opt    data/backup/jrs/banshee/opt

And it does not take very long when it does back up, despite the target being simple rust drives, thanks to ZFS replication:

root@jrs-dr0:/# /usr/local/bin/syncoid -r --no-sync-snap root@banshee:banshee/opt    data/backup/jrs/banshee/opt
NEWEST SNAPSHOT: autosnap_2023-07-31_16:00:01_hourly
Sending incremental banshee/opt@autosnap_2023-07-30_04:00:03_hourly ... autosnap_2023-07-31_16:00:01_hourly (~ 314.2 MB):
 313MiB 0:00:12 [25.1MiB/s] [================================> ] 99%            

That twelve-second backup run was actually painfully slow by my standards… because when I manually ran it just now, three or four other backup jobs were also running against the same rust target pool. 🙂


I’m not particularly attached to the Synology. It was my primary NAS, then got moved to backup status when I built my current fileserver running TrueNAS. It does nothing except pull backups once a day; I’m not using any of the apps or other features.

However, building another NAS is not in the cards for the moment just due to funds. It is definitely the goal to eventually move to ZFS replication.

Yeah, that’s exactly what I was afraid of when I read the description of the --inplace flag. I’ll give the -W flag a shot and see if it helps at all. Even if it cut the job down to an hour I’d be happy; currently it takes from 7am until usually around 2pm to finish, which is utterly ridiculous.

Appreciate you taking the time, Jim!


I’m running the same thing, but in the reverse direction: pulling from QNAP/Synology to ZFS (and then creating snapshots after each pull) instead of the other way around like you are doing. My performance is significantly better than yours, though, so perhaps it helps if I share what I’m doing:

I use the following commands on the Ubuntu/ZFS system (a QNAP TS-864eU-RP running Ubuntu with ZFS) to pull data from a QNAP NAS and a Synology NAS:

QNAP TS-431X3
rsync -ax --del --exclude '.streams' --exclude '@Recycle' admin@SOURCE-IP:/share/CACHEDEV1_DATA/SourceDirName/ /mnt/zfs-pool/NAS01/SourceDirName/

Synology RS1219+
rsync -ax --del --rsync-path=/usr/bin/rsync -e "ssh -c aes128-ctr" --exclude '#recycle' --exclude '@eaDir' --exclude 'User-Data' Administrator@SOURCE-IP:/volume1/SourceDirName/ /mnt/zfs-pool/NAS02/SourceDirName/

Checking >3M files on the QNAP and >600K files on the Synology, plus syncing the few differences made during the work day, takes about two hours (total size of the source data is roughly 70-80TB). Of course, if large files have changed it will take longer.

During actual file transfers I discovered that the speed was limited by the SSH encryption speed of the source devices (both have CPUs which are pretty crap). For the Synology I was able to achieve 1Gbit speed by adding the -e "ssh -c aes128-ctr" option, which forces SSH to use a cipher that the Intel CPU in the Synology can accelerate; unfortunately, that is not what SSH chooses by default. The QNAP has an older ARM CPU without AES acceleration, so I don’t use this parameter there (and can achieve only 50-60% of 1Gbit).
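
If you want to see how much the cipher matters on a given box, you can benchmark SSH throughput by itself, without the disks involved. A rough sketch (SOURCE-IP is a placeholder, and pv has to be installed on the pulling side):

# Remote dd makes the source box do the encrypting, just like an rsync
# pull does; compare the MiB/s pv reports for each cipher
ssh admin@SOURCE-IP 'dd if=/dev/zero bs=1M count=1000' | pv > /dev/null
ssh -c aes128-ctr admin@SOURCE-IP 'dd if=/dev/zero bs=1M count=1000' | pv > /dev/null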

All systems use some kind of RAID5/6/Z without any SSD caches, and despite this the performance for scanning such a large number of files is very acceptable. Especially since the 2019 rackmount Synology has such utter crap hardware (a 2013 CPU with 2GB RAM to manage a 96TB array).

Long story short: while I’m doing what you are doing in reverse, my performance is really not bad.
You should maybe check with top or other I/O- and CPU-stat tools which device is limiting the transfer, and whether it’s I/O-limited, CPU-limited, or network-limited during actual file transfers. How many files are there to check? Millions?
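
For example, something like this on each box while the job runs (iostat comes with the sysstat package on most Linux distros; what’s available on the NAS firmware itself may vary):

top           # rsync or ssh pinning a core?         -> CPU/cipher bound
iostat -x 5   # disks at high %util with long await? -> I/O bound
iftop         # link sitting near line rate?         -> network bound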

Jim is of course right here: if you want a really fast transfer, you need ZFS on both sides. But the rsync way works really well for me.


Unfortunately the Synology I’m using has a Realtek RTD1296 SoC. Not sure how I would go about checking the featureset of that chip to see if it has a built-in AES module… somehow I doubt it.

May I ask what the point of your -x flag is? Do you have mount points inside the backup job?

SSH cipher speed is a really good point. I benchmarked the f out of my RPi 4 before settling on one.

Enable SSH, log in as admin, and run cat /proc/cpuinfo:

processor       : 0
model name      : ARMv8 Processor rev 4 (v8l)
BogoMIPS        : 54.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

This is from a DS418j, which has a Realtek RTD1293, either a predecessor or a lower-end variant of the RTD1296. It has AES support (note the aes entry under Features), so I would say it’s extremely likely the RTD1296 has it as well.
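
If you want to check it directly on your box, a quick one-liner (run over that same SSH session):

# On ARM, /proc/cpuinfo lists "aes" under Features when the crypto
# extensions are present
grep -qw aes /proc/cpuinfo && echo "AES supported" || echo "no AES"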

You really need to watch your rsync job to find out where it spends the most time:

  • does it take the most time just checking millions of files?
  • does it take long to actually transfer the changed files?

If it’s the latter, then testing different encryption settings could improve transfer speeds, and thus the overall time needed, a lot.
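
An easy way to find out is a dry run plus rsync’s built-in statistics: the scan still happens, but nothing is transferred, so the dry run’s wall time is almost pure file-checking. A rough sketch (the directory is a placeholder; reuse your real paths and excludes):

# -n (--dry-run) scans and compares but moves no data;
# --stats prints file counts and the bytes that would have been transferred
time rsync -rvsn --delete --exclude="*@eaDir*" --stats \
    rsync@10.13.37.3:/mnt/Pool_1/SomeShare /volume1/SomeShare

If the dry run alone takes hours, the scan dominates; if it finishes quickly, the time is going into transfers, and cipher or whole-file tweaks are the place to look.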

You are correct, that really makes no sense here. I think I put it there because my script pulls from several sources, and one of the first Linux servers I implemented this for had some special mount points; that then just got copy-pasted over to my Synology rsync lines, where it had no reason to be.
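
For anyone else reading: -x is short for --one-file-system, which stops rsync from descending into other mount points under the source. The paths here are just illustrative:

# With -x, anything mounted below the source tree (NFS mounts, bind
# mounts, etc.) is skipped rather than copied into the backup
rsync -ax /home/ /mnt/backup/home/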


Interesting. In 10 years of using Linux I’d never realized that it exposed the featureset of the CPU in this way.

does it take the most time just checking millions of files?

It’s definitely this. When I run the job manually I can watch it sit there for minutes sometimes before I see any progress.

I’ll give that a shot, thanks!