In this post, I’ll show you how I fixed my dead GPU by simply heating it up.
Some time ago the GPU in my old computer died (black screen). I managed to repair it by slowly heating it up, holding the temperature for some time and then slowly cooling it back down. As my previous PC was already quite old when this happened, I got myself a new one and haven’t used the old PC much since. So I don’t know how much longer the fixed GPU would have lasted.
To clear things up: a graphics card can break for multiple reasons, such as the GPU, VRAM, DC-DC converters, dirt on the card, bad connections, etc. Ideally, you would troubleshoot it like any other piece of electronics: check the voltages, make sure there are no shorts and that nothing is overheating, and so on. But if you lack the knowledge, time or skill, you can simply heat it up and hope that this fixes the problem (and in some cases it does). Be aware that you could scorch the board, melt connectors and damage components. But hey, if you were going to throw it away anyway, you might as well try, right?
This might help only if the actual GPU die went bad. It won’t help if other components are bad or there is a short on the board. A common misconception is that the BGA (ball grid array) connections between the GPU and the graphics card PCB (printed circuit board) are cracked and that reflowing the solder fixes those bad connections. But in most cases there are no cracks in the solder connections, so reflowing the board is not what fixes the issue. Instead, only the GPU has to be heated up for some time and its internals will “magically” fix themselves.
Also, your “fixed” GPU might not last very long (maybe a few days, weeks or months) if the IC (integrated circuit) continues to operate in the same conditions that led to the failure in the first place.
How Does The GPU Actually Get Fixed?
Fully explaining what goes on inside the GPU and why it can “magically” fix itself when heated would be quite a lengthy write-up (I might make a post about it sometime in the future).
But to put it briefly: the metallization lines (the internal interconnects between transistors) erode due to a phenomenon called electromigration. Over time, the flowing electrons push metal atoms along the interconnect, literally eroding it in one spot and leaving a void, then depositing the material further down the line as a hillock. The effect gets worse at higher temperatures because diffusion becomes more likely.
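To get a feel for how strongly temperature matters, electromigration lifetime is often estimated with Black’s equation, an empirical model in which the median time to failure scales as exp(Ea / (k·T)) at a fixed current density. The sketch below compares two operating temperatures using that model. The function name and the 0.7 eV activation energy are illustrative assumptions on my part, not measured values for any particular GPU.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(temp_cool_c, temp_hot_c, activation_energy_ev=0.7):
    """Return MTTF(cool) / MTTF(hot) from Black's equation.

    Assumes the same current density at both temperatures, so only the
    exp(Ea / (k * T)) term differs. The default activation energy is an
    illustrative assumption, not a datasheet value.
    """
    t_cool_k = temp_cool_c + 273.15
    t_hot_k = temp_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_cool_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    ratio = em_lifetime_ratio(60.0, 90.0)
    print(f"60 C vs 90 C: roughly {ratio:.1f}x longer expected lifetime")
```

With these assumed numbers, a chip running 30 degrees cooler comes out several times longer-lived, which is why better cooling is the obvious thing to try if you want a revived card to last.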
A hillock can short two lines together and a void can break a line, but I believe the most common failure mode due to electromigration is timing errors that arise from the impedance mismatch caused by voids and hillocks. An impedance mismatch in the transmission line also causes reflections to occur, which further degrade signal integrity.
Impedance is frequency dependent, which means that the higher the frequency, the worse the timing errors get. This is why a GPU/CPU might sometimes appear to work until you put it under load and your PC crashes: under load the clock frequency gets boosted, and the timing errors become significant enough to cause a fault.
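The “works at idle, crashes under load” behaviour can be pictured with a very simplified timing-slack model. It ignores the impedance details and just treats the damage as extra delay on one signal path, which then has to fit inside one clock period. All names and numbers here (the 0.70 ns path delay, the setup time, the clock speeds) are hypothetical and chosen only to illustrate the idea.

```python
def timing_slack_ns(clock_mhz, path_delay_ns, setup_ns=0.05):
    """Slack = clock period - (path delay + setup time).

    A negative result means the signal arrives too late for the next
    clock edge, i.e. a timing error.
    """
    period_ns = 1000.0 / clock_mhz  # 1000 MHz corresponds to a 1 ns period
    return period_ns - (path_delay_ns + setup_ns)


if __name__ == "__main__":
    # Hypothetical delay of a path slowed down by electromigration damage.
    degraded_delay_ns = 0.70
    for label, clock_mhz in (("idle", 1000.0), ("boost", 1500.0)):
        slack = timing_slack_ns(clock_mhz, degraded_delay_ns)
        status = "OK" if slack >= 0 else "TIMING ERROR"
        print(f"{label:5s} {clock_mhz:6.0f} MHz  slack = {slack:+.3f} ns  {status}")
```

In this toy example the degraded path still meets timing at the lower clock but misses it once the clock is boosted, even though nothing about the path itself changed.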
Now let’s go back to the initial question of how the chip fixes itself. When the atoms are deposited somewhere down the metallization line, they cause internal stress or “pressure” in the material. So when we reheat the chip unpowered, some of the atoms migrate back to where there is less “pressure”. This can improve the impedance of the transmission line enough to make the IC work again. However, the fix will usually not last very long (maybe a few days, weeks or months) if the IC continues to operate in the same conditions that led to the failure in the first place.