Syncoid over ssh throughput advice

I'm working on two FreeBSD 14 servers, using Syncoid to back up some vm-bhyve VMs.

Both have Xeon CPUs.
The origin server has two HDDs in a mirror; all data is on zroot.
The backup server's pool is four SSDs in two mirror vdevs, a separate data pool.

I'm transferring zvols. One has an 8K volblocksize (don't ask, it's too late now); the others are 128K.

syncoid -r --delete-target-snapshots --no-sync-snap --sshport=2222 --sshcipher=aes128-ctr host:zroot/vm backup/vm

While both servers were on the LAN I tried to zfs send this dataset, but it overloaded the Mikrotik RB3011UiAS router. I couldn't SSH into anything anymore while the send was running; all I had left was the console session that was already connected, so Ctrl+C worked. I figured I'd just do the backup once the server was shipped offsite, since the link would be slower and I wouldn't have this bandwidth problem.

The backup server is now 45 ms away from the origin, and yes, it is slower; overloading the router isn't a problem anymore. I get about 5 MiB/s, at times up to 12 MiB/s.

I tried the aes128-ctr SSH cipher, supposedly the fastest according to the Internet.
I've tried tweaking the mbuffer options: changed -s from 128k to 8k and put it back, upped the memory from -m 16M to 1G and put it back. No difference.
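To narrow down where the time is going, it might help to measure the stages separately rather than tuning mbuffer blind. A rough sketch, assuming an existing snapshot called @test and that pv is installed (both are stand-ins, adjust to your setup):

# on the origin server: how fast can the pool read the stream?
zfs send zroot/vm/somevm@test | pv > /dev/null

# on the backup server: how fast can ssh alone pull data from the origin?
ssh -p 2222 -c aes128-ctr host 'dd if=/dev/zero bs=1m count=2000' | pv > /dev/null

If the local send is fast but the ssh pipe is slow, the limit is the network path or ssh itself, not volblocksize or mbuffer settings.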

Since the volblocksize of this VM's zvol is 8K, does the mbuffer -s flag need to be 8K?

iperf3 between the servers shows 500 Mbps, but file transfers with scp and rsync are still 100 Mbps max.
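It might also be worth comparing a single iperf3 stream against several in parallel, since per-stream shaping only shows up that way (-P is a standard iperf3 flag; the host name here is just a placeholder for whichever end runs iperf3 -s):

iperf3 -c backup-host          # one stream
iperf3 -c backup-host -P 8     # eight parallel streams

If -P 8 gets much closer to line rate than one stream does, something in the path is limiting per-connection throughput rather than total bandwidth.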

I've read about people using netcat, so I thought about using that over WireGuard. Since SSH is limited to one CPU core, that might be faster.
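For reference, the usual netcat approach pipes zfs send straight into nc, so SSH is out of the path entirely. A minimal sketch with a made-up port and snapshot name:

# on the backup server (receiver)
nc -l 9090 | zfs receive -s backup/vm/somevm

# on the origin server (sender)
zfs send zroot/vm/somevm@snap | nc backup-host 9090

The trade-off is the one mentioned below: you lose Syncoid's snapshot bookkeeping and have to drive zfs send/receive (and resume tokens) by hand.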

I also haven't tried HPN-SSH; I wouldn't know how to set that up with Syncoid yet.

If anyone has any advice, feel free.

I wouldn’t really want to use Netcat if that means I can’t use Syncoid.
HPN-SSH sounds good if I can set up Syncoid to use that. How would I go about that?
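If your Syncoid build is recent enough, the --sshcmd option lets you point it at an alternate ssh binary, which would be the natural way to slot in HPN-SSH; check syncoid --help to confirm the option exists in your version, and note that the binary path below is only an assumption about where the HPN-SSH package installs it:

syncoid --sshcmd=/usr/local/bin/hpnssh --sshport=2222 -r --no-sync-snap host:zroot/vm backup/vm

Whether HPN-SSH actually helps depends on whether the bottleneck really is SSH's single-core crypto and buffering rather than the path itself.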

I'm now sending multiple datasets and I can get 150-300 Mbps total. The 128K ones seem to be about twice as fast as the 8K one.

With about 250 Mbps the origin Mikrotik is using 2.5% CPU, and the remote Mikrotik (CRS326-24G-2S+) is using about 25%. Extrapolating linearly, 1 Gbps would take 100%.

I wasn’t able to see CPU usage when I overloaded the main Mikrotik over LAN.

The first sentence tells you that you've got a network problem (since multiple TCP streams are outperforming a single stream). The second one tells you something is likely bottlenecking on packets per second (pps) in between your source and destination (because your storage can go considerably faster than this).

This may not be possible to fix at the LAN level; your problems could be coming in with traffic shaping from your ISP(s) and the interconnects between them (and/or your ISPs’ backbones), because it’s EXTREMELY common IME to see single TCP threads rate limited to about 100Mbps.

Hey, thanks for replying. It's been a journey since I read and re-read your articles on Ars Technica when discovering ZFS :slight_smile:

Yeah, now that I think about it, you might be right about traffic shaping, because on the LAN the link was saturated and even overloaded the router. I kept thinking I had misconfigured volblocksize among other things, since I could never get top speeds.

Speedtests with no file transfer running show about 800 Mbps at each site.

When downloading a test file (an OpenBSD ISO) on the backup server, it starts at about 400 Mbps, then slows down to 200 Mbps or a bit less. I know the sources are good because one is a test file from GitHub and the other is the OpenBSD ISO; both are served from CDNs, and at home they download at a constant 1 Gbps.

On the main server the GitHub test file downloads at 700 Mbps, then halfway through it drops to 200 Mbps and climbs back up. The backup server just does about 200 Mbps.

These both are AT&T Routers with Mikrotiks attached to them.

So speedtest doesn't mean anything anymore? How does traffic shaping work, and how does a router or firewall recognize a single TCP stream? Time for a rabbit hole.

Thanks again for sharing your experience. I also ended up testing an nc file transfer over WireGuard; it was 50 Mbps.


Welp, there we have it. :slight_smile:

IME all residential ISPs* traffic shape on the per-thread level pretty aggressively. So does, for example, Linode–who will sell you a “10Gbps” interface on a VM that consistently can’t manage more than about 50Mbps to a datacenter where you can demonstrate >500Mbps… not on a single thread, anyway. Open up multiple TCP threads, and you get a lot closer to that promised interface speed.

This isn’t always pure fuckery; you have to remember that TCP window scaling is also a thing, and in particular when you’ve got high latency links, window scaling can get pretty wonky. (This is why VPN protocols are almost always UDP, not TCP–it avoids a double TCP window scaling problem, which absolutely murders throughput on, eg, OpenVPN tunnels established over TCP instead of UDP).
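To put rough numbers on the window-scaling point: single-stream TCP throughput is capped at roughly window size divided by round-trip time. At 45 ms, sustaining 500 Mbps needs a window of about 500 Mbit/s × 0.045 s ≈ 2.8 MB, while a stream stuck at an effective 256 KB window tops out around 2.1 Mbit / 0.045 s ≈ 46 Mbps, which is in the same ballpark as the single-stream numbers above. (The actual window in play here is an assumption; the point is just the shape of the math.)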

  • note: you may very well be paying for business connections, but IME AT&T small business plans–as opposed to the AT&T Big Boy Corporate plans that come in at >$700/mo, with big rackmount routers with 10Gbps interfaces–use the same equipment with the same configurations as residential plans, to the point they’re utterly identical. In my neck of the woods, I can’t even keep AT&T techs from constantly resetting small business gateway devices to their standard defaults, which among other things means dynamic WAN and NAT to a private subnet on the LAN, rather than providing a direct static IP to whatever real router is on the LAN side. I am not a big fan of theirs. :slight_smile:

Yeah, these are small business routers, Arris or similar I think. They pass a small subnet of IPs to the Mikrotik, so yeah, just like home equipment.

Hmm, I wonder if it's even in their interest to let people use their own equipment. Maybe the local AT&T router does the traffic shaping, and if everyone used their own equipment, all of it would have to be done at the ISP backbone, which could be CPU intensive. Unless it already is; I wouldn't really know, just thinking.

These aren't mine; I'm just working on things for these people, and this is what I have to work with. The transfer is almost done, and by almost I mean hopefully less than a day now; it's been two days :slight_smile:. Thanks for not making me manually input zfs resume tokens :slight_smile:, Syncoid handles that.

Yeah, I guess I've seen enough to also not be a fan of AT&T.

Yeah, but even so, I still feel better now knowing the Internet is broken and not my server.


I just did an rclone upload to S3 (iDrive e2) and it was doing 600 Mbps, then it would drop to 200 and back up. I guess that's because rclone uses multiple transfers by default, unlike rsync for example.

But my own MinIO on the backup server only receives at 80 Mbps :slight_smile:

Edit:

Oh nice, I can currently do 282 Mbps with rclone sync --multi-thread-streams=20 --transfers 20

Well I don’t need Rclone but it’s nice to confirm suspicions.

Edit:

Now I'm just playing around and testing the speed. It does 650 Mbps with 40 transfers, but after one minute it goes down to 230.

I don't think I'll try multithreading with Syncoid and HPN-SSH for this, because I'm happy with 50 Mbps over SSH for incremental sends. Would that even be possible? As I understand it, a zfs send is a single stream, while rclone syncs actual files and can work on multiple files at once.
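For what it's worth, each zfs send really is a single stream, so the only straightforward way to get parallelism out of Syncoid is to run one invocation per dataset, roughly what rclone's --transfers does at the dataset level instead of the file level. A rough sketch with made-up dataset names:

# one syncoid per VM dataset, three streams in parallel
for ds in vm1 vm2 vm3; do
  syncoid --no-sync-snap --sshport=2222 host:zroot/vm/$ds backup/vm/$ds &
done
wait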

And I guess their algorithm is: throttle a single stream, and if multiple streams reach the promised and paid-for speed, throttle all of them to 25% of the total. :slight_smile:

This is very sneaky of ISPs, because hardly any user would run rclone in multi-thread mode just to use S3. For backups it's kind of OK.

In my younger days I thought the reason upload speeds are usually a tenth of download speeds is that broadcasting lobbyists don't really want to put media broadcasting power in the hands of the people; broadcasting stays centralized on YouTube, for example, where someone can be held responsible for taking down unwanted content. There's no inherent mathematical reason for different speeds when the signal travels the same wire in each direction. Well, I guess it could be about utilization and maxing out the bandwidth of a coax line, and they just decide download should get priority.