Little PSA: der8auer is reporting that he has measured worrying temps at the 12VHPWR connector when using the 5090 Founders Edition. It seems the card isn't drawing current evenly from all the available conductors, so one wire in particular is getting hot.
If you’re lucky enough to have found a 5090 FE to buy, please check the video to determine if you might be affected.
Jay from JayzTwoCents has pointed out that the 12V rails in the 5090 FE all go to a single bus plate. He theorises the issue is either a manufacturing problem (the cables carrying low current aren't making proper contact with the bus inside the card) or a PSU that can't provide full amperage on all cables in the plug.
Both Jay and der8auer are using Corsair PSUs, but the person who experienced the burning was using an Asus Loki PSU.
I am less inclined to think it's the PSU myself, based on the initial reports… but at this stage it's hard to tell. Allegedly some people with third-party 5090s that do individual pin sensing are complaining about power delivery - that sensing happens before the 12V cables are bussed together - but if that part is made by Nvidia then maybe it's exposed to the same manufacturing problem? Or maybe it is the PSU somehow?
This is extremely damning. I have always been suspicious of the connector's design, since sending this much power through such a small connector can't leave a big safety margin, and it seems I was right. Someone on Reddit (yeah, I know…) did a very comprehensive post about the design and power limits of the 12VHPWR cable, and it's not good. It appears it should have been rated for at most 375W, not 600W, and even that leaves less safety margin than the regular old PCIe cable standard…
Edit: I will go into more detail about the nonexistent safety margins once I am off work, but basically cards like the 4090 and especially the 5090 run the cable close to the absolute maximum rating, which incredibly is just a smidge over the design rating. Insanity, especially without monitoring each phase.
First, the TLDR: the 12VHPWR cable should never have been allowed to hit the market as a 600W or even 450W cable. If it had been specified as a 375W cable there wouldn't be a problem, as that leaves a sufficient safety margin when things go wrong, but 600W is so close to the absolute power limit of the pins that things only have to go slightly out of whack to cause problems.
The PCIe 8-pin and 6-pin cables offer huge safety margins. The standard asks for 150W at 12V from the 8-pin cable and 75W at 12V from the 6-pin cable. Since each pin of the connector is rated for at least 9A, each pin can safely deliver 9A x 12V = 108W. This means an 8-pin connector can deliver its rated power over just two of its three 12V pins, and the 6-pin connector needs only one of its three 12V pins in working order to be safe (it's absolutely foolproof!). Most reputable PSU manufacturers use connectors that are rated for 10A per pin!
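To make that margin concrete, here's the same arithmetic as a quick Python sketch. The 9A-per-pin rating and the 150W/75W figures are from the paragraph above; the three-12V-pin counts are the standard PCIe pinouts:

```python
# Back-of-the-envelope margin check for the old PCIe power connectors,
# assuming a conservative 9 A per pin at 12 V (as quoted above).
PIN_RATING_A = 9.0
VOLTS = 12.0
per_pin_w = PIN_RATING_A * VOLTS  # 108 W per 12 V pin

for name, rated_w, pins_12v in [("PCIe 8-pin", 150, 3), ("PCIe 6-pin", 75, 3)]:
    capacity_w = per_pin_w * pins_12v
    print(f"{name}: rated {rated_w} W, pin capacity {capacity_w:.0f} W, "
          f"safety factor {capacity_w / rated_w:.2f}x")

# PCIe 8-pin: rated 150 W, pin capacity 324 W, safety factor 2.16x
# PCIe 6-pin: rated 75 W, pin capacity 324 W, safety factor 4.32x
```

A 2x-4x safety factor is why those connectors almost never make the news even with sloppy builds.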
The connectors on the 12VHPWR cables, on the other hand, use a smaller form factor and come with a maximum rating of 8.5A or 9A. There aren't any connectors of that size rated for more current, so we can safely assume the same holds for the pins of the 12VHPWR connector, even if we don't have an exact Molex equivalent we can easily look up.
Some quick math tells us that to deliver 600W over that single cable, each pin has to continuously carry 8.33A at 12V (12V x 8.33A x 6 = 600W). Even in a best-case scenario there is no room for error. If even one of the pins doesn't make proper contact - be it user error, a manufacturing defect, dirt that somehow got into one of the connectors, or any other cause of improper contact - then the other pins are forced to carry more current than they are designed for, and the way Nvidia designed the power delivery on the GPU means you can't detect that something has gone wrong until you smell burning plastic. The only cards I know of that monitor all six phases are the ASUS 5090 Astral cards, and I don't consider that a fix, rather a bandaid that warns you when things go south so you can shut off the system before you ruin your hardware or burn down the house…
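Here's a rough sketch of that failure mode, assuming (optimistically) that the load splits evenly across whichever pins still make good contact - in reality the split follows contact resistance, so the worst pin can end up far hotter than this:

```python
# What happens to the remaining 12VHPWR pins as contacts fail,
# assuming a 600 W load at 12 V split evenly across whatever still conducts.
LOAD_W = 600
VOLTS = 12.0
TOTAL_PINS = 6
PIN_RATING_A = 9.0

for failed in range(4):
    good = TOTAL_PINS - failed
    amps = LOAD_W / VOLTS / good
    status = "OK" if amps <= PIN_RATING_A else "OVER RATING"
    print(f"{failed} bad pin(s): {amps:.2f} A per remaining pin -> {status}")

# 0 bad pin(s): 8.33 A per remaining pin -> OK
# 1 bad pin(s): 10.00 A per remaining pin -> OVER RATING
# 2 bad pin(s): 12.50 A per remaining pin -> OVER RATING
# 3 bad pin(s): 16.67 A per remaining pin -> OVER RATING
```

One bad contact is all it takes to push every other pin past its rating, and the card has no way to notice.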
Monitoring individual phases can't be a fix if the phases are still bussed together on the board. It's just a warning, if you even care to look, that there's a manufacturing problem with your board, or maybe a PSU problem… but my PSU has all of those 12V wires going to the same busbar as well (yes, through two of the older 8-pin ports).
Yes, but even with split phases like on the 3090 Ti, with safety margins as nonexistent as running a 600W card over a single 12VHPWR cable, the best you can hope for is that the system doesn't turn on, or that it limits the card to something like 150W when one of the phases isn't delivering enough power.
The 5090 needs a cable with proper safety margins, or simply more cables, with a power delivery system that isn't run through a single shunt resistor, so the card knows when there's a bad connection somewhere. The failure points here are not the cables themselves - they have fairly decent safety margins (still a little dicey, though) - it's the connectors where things go horribly wrong. For a case like the one shown in der8auer's video, the card should be able to tell something is wrong and simply refuse to boot, because you are not going to balance the load over a connector with a problem that big (two pins running ~20A, one ~12A, the others only ~2A).
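For what it's worth, the kind of protection being argued for here isn't complicated. A hypothetical per-pin sanity check might look like this - the function name and the 25% imbalance threshold are made up for illustration, and this is not how any shipping card actually does it:

```python
# Hypothetical per-pin sensing: read each pin's current via its own shunt
# and refuse to run if any pin is over its rating or carries a share of
# the load wildly different from an even split (a sign of a bad contact).
PIN_RATING_A = 9.0

def connector_looks_healthy(pin_currents_a, max_imbalance=0.25):
    total = sum(pin_currents_a)
    if total == 0:
        return True  # idle, nothing to judge
    share = total / len(pin_currents_a)  # what an even split would look like
    for amps in pin_currents_a:
        if amps > PIN_RATING_A:
            return False  # pin over its absolute rating
        if abs(amps - share) / share > max_imbalance:
            return False  # load badly unbalanced -> suspect contact
    return True

# The der8auer case from above: wildly uneven, should refuse to boot.
print(connector_looks_healthy([20, 20, 12, 2, 2, 2]))  # False
print(connector_looks_healthy([8.3] * 6))              # True
```

Six shunts and a comparator's worth of logic versus a melted connector seems like an easy trade.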
I only know enough about electrical circuits to minimize my chances of getting zapped.
It seems to me that, from a marketing point of view, it is better that 3% of cards 'melt' the cable due to a dodgy connector and running a 600W card over a single cable than it is to build some type of protection into the card and have 90% of cards failing to boot because they detect a voltage or current issue?
NB: the percentages quoted are not based on actual data; they're a WAG in an attempt to make a point.
That holds until the first actual fire and everybody with a heat-damaged connector joins the class action against you
What they could have done, of course, is design a card that actually works - the more I read, the more I think this is a design fault in the card itself…
Like 10 years later, PCI-SIG still hasn't taken any of the initial testing feedback seriously.
We ALL told them the design is flawed: the wire gauge and the size of each contact are too small for the current they want sustained through it.
So get the hardware into vendors' hands in ample supply, optimize the drivers, see what nVidiAI is doing, and release in a much better state - the drivers/software would be significantly better, and vendors would have full stock.