Cluster across geography (not HA), timezones and ping

joelishness · July 11, 2023, 9:50pm

I have three boxes with proxmox on them, and I was gearing up to cluster them, but without HA. I would like some of the conveniences (especially single interface), similar to this discussion:
https://www.reddit.com/r/Proxmox/comments/sgyosg/cluster_or_not/

However, I see a couple things on the proxmox cluster manager wiki page that give me pause, due to one of the boxes being in a different geographic location (remote backup target).
Could you please help me understand if I should still go ahead or not?

Concern #1:
Under the requirements it says “Date and time must be synchronized.”

Well, one of the three boxes is in a different timezone. Is this a showstopper or is there a solution?
Would I have to change the time on the box in the other timezone?

Concern #2:
Under Network Requirements it says “The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.”

The remote backup target definitely doesn’t meet this latency requirement, smokeping indicates 40~90 ms.

Is this a dealbreaker? Or does it not matter because I will not be using HA?

Topslakr · July 12, 2023, 3:14pm

I can’t speak to Concern #2, but Concern #1 is a non-issue.

Date and time being synchronized is extremely important, but that isn’t the same as the boxes being in the same timezone. 9AM Eastern is the same instant as 8AM Central, but we messy humans like time to be linked to the sun.

So long as all the servers are sync’d to some kind of agreed upon time source, like NTP, you’ll be fine in that regard.
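To make the distinction concrete, here is a minimal sketch in Python. The two fixed UTC offsets are assumptions for illustration (US Eastern and US Central during daylight time in July 2023). It only demonstrates that two different wall-clock readings can name the same instant; it says nothing about how Proxmox itself checks clocks.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for July 2023 (daylight time in effect):
# US Eastern = UTC-4, US Central = UTC-5.
EASTERN = timezone(timedelta(hours=-4), "EDT")
CENTRAL = timezone(timedelta(hours=-5), "CDT")

nine_eastern = datetime(2023, 7, 12, 9, 0, tzinfo=EASTERN)
eight_central = datetime(2023, 7, 12, 8, 0, tzinfo=CENTRAL)

# Timezone-aware datetimes compare by the instant they represent,
# not by the digits shown on the wall clock.
same_instant = nine_eastern == eight_central

# Both readings convert to one and the same UTC time.
as_utc = nine_eastern.astimezone(timezone.utc)

print(same_instant)        # True
print(as_utc.isoformat())  # 2023-07-12T13:00:00+00:00
```

The timezone setting only changes how the clock is displayed. What has to match across the boxes is the underlying instant, which is what NTP keeps in step.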

mercenary_sysadmin · July 24, 2023, 11:32pm

Under Network Requirements it says “The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.”

This is likely because Proxmox clustering is intended to provide true HA, which means that a write can’t be considered fully committed until it lands on all clustered systems.

Since you don’t want HA, you might be better off doing simple OpenZFS replication as a way of keeping your geographically-distant spares up to date. There are literally no network requirements involved here beyond “packets need to arrive eventually”; I’ve used replication for offsite disaster recovery on systems with tens of TiB of data (in the form of live VM images) over 1Mbps consumer DSL many, many times in the past. (The only reason that’s “in the past” is because 1Mbps DSL is pretty thin on the ground these days… but it would still work fine!)

The simplest way IMO to automate this is using my own tool syncoid and a cron job or systemd timer. It can be as simple as the following:

root@machineB:~# syncoid -r root@machineA:poolA poolB/poolA

Stick that in a cron job, and Bob’s your uncle. Just monitor the remote system to make sure the snapshots are arriving (sanoid --monitor-snapshots works excellently for this, allowing you to configure policy for when to WARN and finally CRIT due to age of newest snapshot) and you’re good to go.
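The idea behind age-based snapshot monitoring can be sketched roughly as follows. This is not sanoid's implementation: the function name `snapshot_status`, the threshold values, and the sample timestamps are all hypothetical, chosen only to show how "age of newest snapshot" maps to OK, WARN, or CRIT.

```python
from datetime import datetime, timedelta, timezone

def snapshot_status(snapshot_times, now, warn_after, crit_after):
    """Classify a dataset by the age of its newest snapshot (hypothetical helper)."""
    if not snapshot_times:
        return "CRIT"  # no snapshots at all is treated as the worst case
    age = now - max(snapshot_times)  # age of the newest snapshot
    if age >= crit_after:
        return "CRIT"
    if age >= warn_after:
        return "WARN"
    return "OK"

# Hypothetical data: snapshots taken 30, 6 and 3 hours before "now".
now = datetime(2023, 7, 25, 12, 0, tzinfo=timezone.utc)
snaps = [now - timedelta(hours=h) for h in (30, 6, 3)]

# Hypothetical policy: warn after 2 hours, go critical after 24 hours.
status = snapshot_status(
    snaps, now,
    warn_after=timedelta(hours=2),
    crit_after=timedelta(hours=24),
)
print(status)  # WARN (newest snapshot is 3 hours old)
```

Only the newest snapshot matters for this check, so a replication target that silently stops receiving snapshots drifts from OK to WARN to CRIT as time passes.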

joelishness · July 25, 2023, 12:17am

Thanks, Jim!
Actually I have ZFS replication with syncoid set up already and it’s working great!
Both sanoid and syncoid have been good to me, and I appreciate you for providing them.
And remote backup is to a location that is not much better than DSL.
(side note: Issue #250 would be nice someday, to save the bandwidth by not sending hourlies, but I digress)

Nagios monitoring (--monitor-snapshots) is next on my to-do list. I have been dragging my feet since it looks like a bit of a pain to set up. Thanks for the encouragement to get on it.

Motivation for cluster (without HA) is primarily laziness: So that I can have one proxmox UI to log into instead of three.