Restoring a zfs replicated VM from a dead node

martinl · July 3, 2023, 3:52am

I am thinking of creating a two-node cluster without HA, with the secondary node powered off most of the time (to save on the power bill).

Single VM, ZFS replication enabled to the other host. Thinking of booting up the secondary host every night to run replication and then powering it off.

Now suppose the primary host dies and I want to use the secondary host: how can I migrate / run the VM there? I gave it a try but got the “no route” error when trying to migrate it. I understand that this is essentially what the HA CRM does when a node with the live VM dies and HA with ZFS replication is enabled, so I believe it should be possible.

Am I having a gross misunderstanding? Thanks!

ik5pvx · July 3, 2023, 5:09am

Did you update the DNS to point to your standby VM?

martinl · July 3, 2023, 7:42am

I did not change any DNS setting. I am still at the stage where I unplug the primary host, where the VM usually lives. Then I go to the web UI of the secondary host (which is a ZFS replication target for the VM), select the (dead) primary host, choose the VM, and try to migrate it to the secondary host.
Clicking migrate brings up the error “Connection error 595: no route to host”, which is understandable since the primary host is inaccessible from the secondary host.

So in that case, how do I spin up the VM on the secondary host, keeping in mind that the secondary host is a ZFS replication target for the VM?

ik5pvx · July 3, 2023, 7:53am

I need a bit more context. Are the two VMs on the same hypervisor? Does the clone have the same IP address as the main one?

martinl · July 3, 2023, 8:28am

There is only one VM. It typically runs on the primary node. I have set up ZFS replication for that VM with the secondary node as the target. No cloning has been done.

ph0t0nix · July 3, 2023, 9:50am

Do I understand correctly that the primary host is down, and then, on the secondary host, you try to migrate the VM from the (disconnected) primary host? That sounds impossible to me. Once the primary host is down, clicking ‘migrate’ won’t work because the machine from which you want to migrate no longer exists.

The ZFS replication has replicated the data to the secondary host, but I don’t think the VM config has been replicated.

(Note, it has been a while since I have actively used Proxmox, so I might be off here and/or there).

martinl · July 3, 2023, 12:21pm

You are correct in describing the situation. However, the reason I suspect that this should be possible is that Proxmox’s HA feature (using ZFS replication) is able to do exactly this automatically:

The VM’s ZFS replication is sent to the target host at a regular interval. The target host does not have the VM on it yet.
If the current VM host gets suddenly disconnected, the HA feature will spin up the VM on the target host automatically, picking up from the last replication job.

wyrdough · July 3, 2023, 9:35pm

At that point you aren’t migrating anything, you’re just running the VM. It just happens to be on a different host.

Without any fancy tools except libvirt, you accomplish this by configuring two hosts with the same network bridges, storage paths, etc., somehow replicating the config and VM images periodically, and then, when the primary host fails, manually starting the VM on the backup host with “virsh start ”.
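
To make the libvirt case concrete, here is a rough dry-run sketch of the backup-host side. The domain name `myvm` and the XML path are hypothetical placeholders, and it assumes the domain XML was saved from the primary earlier (for example with `virsh dumpxml`) and copied over together with the disk images. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the virsh commands instead of running them.
# VM_NAME and XML_PATH are hypothetical placeholders, not values from this thread.
VM_NAME="${VM_NAME:-myvm}"
XML_PATH="${XML_PATH:-/var/backups/libvirt/${VM_NAME}.xml}"

run() {
    # Print the command; set DRY_RUN=0 to actually execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover_start() {
    run virsh define "$XML_PATH"   # register the previously saved domain definition
    run virsh start "$VM_NAME"     # boot the VM from the replicated disk image
}

failover_start
```

Nothing here talks to the dead primary, which is the point: this is a plain start on the backup host, not a migration.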

Migration is only a thing when both hosts are running, the VM is started on one, and you want to make it run on the other. libvirt’s not-live migration will transfer the config and optionally the storage before killing the VM on the source server and starting it on the destination. Live migration does the same, but also transfers the memory and starts the VM on the destination in a paused state so that the VM and the rest of the world don’t see any interruption.
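
As a rough illustration of the two modes, the sketch below only prints what the `virsh migrate` invocations could look like. The domain `myvm` and destination `backup-host` are hypothetical, and storage-copy flags are left out on the assumption that both hosts already see the same disk images. Either command is issued against the source host's libvirt, which is why neither can work once that host is dead.

```shell
#!/bin/sh
# Dry-run sketch: prints virsh migrate commands rather than running them.
# DOMAIN and DEST are hypothetical placeholders.
DOMAIN="${DOMAIN:-myvm}"
DEST="${DEST:-backup-host}"

migrate_cmd() {
    # $1 selects the mode: "live" adds --live, anything else is non-live.
    flags="--persistent"            # keep the domain defined on the destination
    if [ "$1" = "live" ]; then
        flags="--live $flags"       # transfer memory while the guest keeps running
    fi
    echo "virsh migrate $flags $DOMAIN qemu+ssh://$DEST/system"
}

migrate_cmd nonlive
migrate_cmd live
```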

(I’m describing what libvirt does because it’s simpler than what Proxmox and the like do, which makes it easier to explain and to wrap your head around the basic concepts)

martinl · July 5, 2023, 4:54am

I guess you are right. I think I’ve seen a forum post saying that you need to have a copy of the VM configuration as well. Thanks.
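
For reference, a rough dry-run sketch of the manual route on the surviving Proxmox node. It assumes both hosts are in one cluster, so the VM config is already visible under /etc/pve on the survivor, and that the dead node is really powered off (otherwise both nodes could end up running the same VM). VMID 100 and the node names pve1/pve2 are hypothetical. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of a manual failover on the surviving Proxmox node.
# VMID and node names are hypothetical; nothing is executed unless DRY_RUN=0.
VMID="${VMID:-100}"
DEAD_NODE="${DEAD_NODE:-pve1}"
ALIVE_NODE="${ALIVE_NODE:-pve2}"

run() {
    # Print the command; set DRY_RUN=0 to actually execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

manual_failover() {
    conf_src="/etc/pve/nodes/${DEAD_NODE}/qemu-server/${VMID}.conf"
    conf_dst="/etc/pve/nodes/${ALIVE_NODE}/qemu-server/${VMID}.conf"
    run pvecm expected 1            # regain quorum so /etc/pve becomes writable
    run mv "$conf_src" "$conf_dst"  # hand the VM config over to the surviving node
    run qm start "$VMID"            # boot from the last replicated ZFS state
}

manual_failover
```

Anything written inside the VM after the last replication run is lost with this approach.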

codiflow · February 13, 2025, 10:56pm

For others coming across this topic:

Let’s say you have two PROXMOX nodes where one node replicates a VM to the other using ZFS replication. Now the node with the running VM dies and is not accessible anymore. As we don’t have any HA tasks, the VM seems gone.

BUT there is HOPE

Just CREATE an HA task now in your cluster (Datacenter => HA => Add) for the VM. The task recognizes that the node has died and the VM is not running. It just spins up the VM automatically on one of the other nodes with the most recent snapshot.
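
The same thing should be doable from a shell on a surviving node with `ha-manager`. This is only a dry-run sketch: VMID 100 is a hypothetical placeholder, and it assumes the remaining nodes still have quorum, since HA does not act without it.

```shell
#!/bin/sh
# Dry-run sketch: prints the ha-manager commands instead of running them.
# VMID is a hypothetical placeholder.
VMID="${VMID:-100}"

run() {
    # Print the command; set DRY_RUN=0 to actually execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_ha() {
    run ha-manager add "vm:${VMID}" --state started  # register the VM as an HA resource
    run ha-manager status                            # check where the resource ends up
}

enable_ha
```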

A real life saver