Proxmox host system crashes when guest VM with physical GPU reboots

I have Windows 7 running in a virtual machine with access to a GeForce 9800 GT (ancient).
The host is a Gigabyte B450 AORUS-M with 16GB of memory and a Ryzen 5 2600X.

I’ve had a rather odd issue since installing the NVIDIA drivers: the Windows 7 VM starts up just fine the first time after host power-on, but if I dare reboot it (for updates or something), the host apparently kernel panics, and the POST LEDs go nuts until I reset the system.

It only happens when the 9800 GT is passed through. Yes, it is in its own IOMMU group. Yes, I have blacklisted the NVIDIA drivers in Proxmox. The thing is the VM boots up just fine. It’s only when I try to boot it a second time without rebooting the host that it gets mad. So as a test, I started the VM, and immediately killed it when I saw the POST screen. I then tried starting it again, and it started fine. I waited for the Windows login screen (at this point, the GPU has been fully initialized), tried restarting it, and the host crashed.
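For anyone who wants to double-check the IOMMU grouping on their own host, here is a minimal sketch that reads the kernel's group layout from sysfs. The function names and the PCI address in the usage note are made up for illustration; substitute whatever `lspci` reports for your card.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # No such directory: IOMMU is disabled or this is not a Linux host.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(pci_addr, groups):
    """Return (group number, all devices in that group) or None if not found."""
    for number, devices in groups.items():
        if pci_addr in devices:
            return number, devices
    return None
```

Calling `group_of("0000:08:00.0", iommu_groups())` (hypothetical address) should come back with a device list containing only the GPU's own functions if the card really is isolated.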

From what I can tell, the problem likely has to do with the GPU not resetting properly after having been initialized by the NVIDIA drivers. What backs this up is that I was able to reboot the VM multiple times prior to installing the drivers with no issues, so my suspicion is that once the GPU has been initialized by the driver, it fails to reset.
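One way to probe the reset theory is to ask the kernel whether it knows any way to reset the card at all. The sketch below only reads sysfs and changes nothing. It assumes the usual layout under `/sys/bus/pci/devices`: a `reset` file appears when the kernel has some reset method for the device, and newer kernels also list the methods in `reset_method`. The function name and the address in the usage note are placeholders.

```python
from pathlib import Path


def reset_info(pci_addr, root="/sys/bus/pci/devices"):
    """Report whether the kernel exposes a reset hook for a PCI device."""
    dev = Path(root) / pci_addr
    info = {"present": dev.is_dir(), "resettable": False, "methods": []}
    if not info["present"]:
        return info
    # The 'reset' attribute only exists if the kernel found a usable reset method.
    info["resettable"] = (dev / "reset").exists()
    # 'reset_method' is only available on newer kernels; older ones lack the file.
    method_file = dev / "reset_method"
    if method_file.is_file():
        info["methods"] = method_file.read_text().split()
    return info
```

For example, `reset_info("0000:08:00.0")` (hypothetical address) returning `resettable: False` would be consistent with the card never being reset between VM runs, though a listed method is no guarantee that the reset actually works on this hardware.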

I have no clue how to fix this. Has anyone else had a similar problem or knows what’s going on?

There are basically two angles of attack here: fix whatever on the host makes it care about what’s going on in hardware that technically doesn’t belong to it in the first place, or fix whatever in the guest keeps it from resetting the hardware.

You state that you’ve already blacklisted the drivers in Proxmox. Did you blacklist both nouveau and nvidia*?
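If you want to verify the blacklist rather than trust memory, something like the sketch below would do it. It assumes the blacklist entries live in `*.conf` files under `/etc/modprobe.d`, and the default module list (`nouveau`, `nvidia`, `nvidiafb`) is my guess at what is relevant here, since modprobe's `blacklist` takes exact module names rather than wildcards. The function names are made up.

```python
from pathlib import Path


def blacklisted_modules(conf_dir="/etc/modprobe.d"):
    """Collect module names from 'blacklist <module>' lines in *.conf files."""
    names = set()
    base = Path(conf_dir)
    if not base.is_dir():
        return names
    for conf in sorted(base.glob("*.conf")):
        for line in conf.read_text(errors="replace").splitlines():
            # Drop trailing comments, then split into whitespace-separated fields.
            fields = line.split("#", 1)[0].split()
            if len(fields) == 2 and fields[0] == "blacklist":
                # modprobe treats '-' and '_' in module names as equivalent.
                names.add(fields[1].replace("-", "_"))
    return names


def missing_blacklist(wanted=("nouveau", "nvidia", "nvidiafb"),
                      conf_dir="/etc/modprobe.d"):
    """Return the wanted modules that no config file blacklists."""
    wanted_normalized = {name.replace("-", "_") for name in wanted}
    return sorted(wanted_normalized - blacklisted_modules(conf_dir))
```

An empty list from `missing_blacklist()` means every module in the list has a blacklist line somewhere. It does not prove the module is not loaded some other way, so `lsmod` on the host is still worth a look.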

Assuming no joy on that easy try… Do you have the latest version of the Windows driver installed in your VM? If not, try it. If you do have the latest version, is there a slightly older WHQL version? If so, try that.

Are you seeing anything in journalctl to give you a clue what’s breaking?
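Since the host goes down hard, the interesting messages will be in the previous boot's kernel log, which is only there if your journal is persistent. A rough sketch for pulling that log and keeping the lines likely to matter: the keyword list is just my guess at what is relevant to passthrough problems, and the function names are made up.

```python
import re
import subprocess

# Keywords that tend to show up around passthrough and PCIe trouble (a guess, not a complete list).
PATTERN = re.compile(r"vfio|iommu|AMD-Vi|\bAER\b|pcieport|nvidia|nouveau", re.IGNORECASE)


def interesting(lines):
    """Keep only the log lines that mention one of the keywords."""
    return [line for line in lines if PATTERN.search(line)]


def previous_boot_kernel_log():
    """Return kernel messages from the previous boot, or [] if unavailable."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed on this machine.
        return []
    return result.stdout.splitlines()
```

After a crash and reset, `interesting(previous_boot_kernel_log())` would give you a short list to read through. If it comes back empty, the log may simply have been lost when the host went down.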

Are you running any pre and post hooks on your VM?