> So, the motherboard will run Windows 10, no problem, and 3 Titan V video cards running linux @100% for months, no problem at all, but it's bad???
i mean, from your screenshot it looks like it's not running windows 10 "no problem" since it doesn't detect the 4090 at all. the problem is not the OS. you are not understanding the big picture here. what's different between the 4090 and a titan V that might be a clue?
This motherboard WAS running Windows 10 a few years ago on a 1080 Ti, then later 3 Titan V's, then linux with the same Titans, no problem until I put in this card. And linux actually sees the card and loads drivers, but due to the fan sensor it will not run the card at full speed.
i didn't say the motherboard is bad, i said it's the problem.
> There is no other hardware in there. Also, I can't find the text, but it said that 4090's were being stripped to be used on AI devices. It's somewhere in the graphics card thread.
That article has nothing to do with your issue.
Device manager is showing you a bunch of devices for which you haven’t installed drivers. This is also unrelated to your issue.
> well, crap, after trying the 535-open (or something like that) even with nomodeset the screen is blank, I have to reinstall linux.
So what about the output of dmesg?
What is the model name of the motherboard?
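In case it helps, here is a generic way to pull the relevant lines out of dmesg (this is only a sketch, not output from this system): filter for the NVIDIA kernel module, whose messages are prefixed with NVRM, plus any PCIe port / AER errors.
$ sudo dmesg | grep -iE 'nvrm|nvidia|pcieport|aer'
If the driver fails to initialize the card, the NVRM lines usually say why; AER lines would hint at link-level trouble.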
> Forgetting all of that, this motherboard runs both cards, but not the 4090 correctly, and nvidia gives an error on the fans, which do NOT turn using Linux.
You're misinterpreting the article. You really don't understand how GPUs work at all. What you're suggesting is bordering on some kind of conspiracy. The article is basically saying that they are taking these high-power GPUs and using them for AI workloads. Which is what a lot of people are doing.
Again, nothing to do with your issue, even if it were true. You bought this card from Amazon, in the US market, not China. You're getting further and further away from the truth: your motherboard.
> But windows does not even see the card, and linux nvidia-smi reports an error trying to read fan rpm. The card is bad.
the GPU fans don't spin because the card isn't working hard enough to need to spin. all modern GPUs act like this with zero-RPM fan modes, which is baked into the GPU fan controller firmware and again not the root cause of your issue. it's only a symptom. the fan error itself is not causing any problems, nvidia-smi is just having problems communicating with the GPU.
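For what it's worth, the fan and temperature readings can be queried directly; the following is just a generic sketch, not output from this system. On a healthy card idling in zero-RPM mode this simply reports 0 %, whereas a card the driver cannot communicate with properly typically shows an error marker in the fan column of the default nvidia-smi view.
$ nvidia-smi --query-gpu=fan.speed,temperature.gpu --format=csv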
> The motherboard is the ASRock EPYCD8
> https://www.newegg.com/asrock-rack-epycd8-amd-epyc-7000-series-processor-family/p/N82E16813140010
OK. I asked because earlier SP3 motherboard series supported PCIe v3 only (EPYCD8 is one of those), whereas later SP3 motherboards such as ROMED8-NT gained PCIe v4 support. The V100 is a PCIe v3 card, whereas the 4090 is a PCIe v4 card.
> ...especially with the message from device manager that does not see the card, but DOES see a bunch of encryption/decryption devices????
The PCIe encryption controllers are a function of AMD's I/O die, from what I understand.
$ /sbin/lspci | grep Encryption
01:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
02:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
21:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
22:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
22:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
44:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
45:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
62:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
63:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
81:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
82:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
a1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
a2:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
a2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
c1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
c2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
e1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
e2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
$ /sbin/lspci | grep Encryption
02:00.5 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 14ca
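As an aside, a quick sanity check (generic commands, not output from either machine above) is to ask lspci whether the 4090 itself enumerates at all, independent of those normal AMD encryption-controller entries:
$ /sbin/lspci -nn | grep -i nvidia
If no NVIDIA entry appears at all, the card never established a PCIe link with the host, which points at BIOS/board/link-training territory rather than at drivers.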
> OK. I asked because earlier SP3 motherboard series supported PCIe v3 only (EPYCD8 is one of those), whereas later SP3 motherboards such as ROMED8-NT gained PCIe v4 support. The V100 is a PCIe v3 card, whereas the 4090 is a PCIe v4 card.
Good points! My next test is to put it in a 7950X motherboard/CPU combo that has run a 2080 Ti just fine and has all the drivers. I am sure it's PCIe v4. I hope it works!
I do use my 4090 GPUs in PCIe v3 slots as well (on Z270 based consumer PC boards), so I don't expect the GPU to give issues from having to downgrade itself to PCIe v3 mode. In contrast, if this were a PCIe v4 capable board, the fact that it worked for you earlier with more than one PCIe v3 GPU would not mean that stability with a PCIe v4 GPU is a given.
(Even server motherboards are sometimes shipped with bug-ridden BIOSes. Or rather, it's probably fair to say that all BIOSes are buggy; it's just that production server BIOSes *tend* not to have too many severe bugs.)
One other thing though, did you use the same *combination* of PCIe slots before, when you had only V100's in there?
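If it helps answer the slot question, the PCIe topology and the board's slot labels can be dumped with generic commands like these (matching tree positions to physical slots still needs the board manual, so treat this only as a starting point):
$ /sbin/lspci -tv
$ sudo dmidecode -t slot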
> Good points! My next test is to put it in a 7950X motherboard/CPU combo that has run a 2080 Ti just fine and has all the drivers. I am sure it's PCIe v4. I hope it works!
it will work in that motherboard. because like i said, the problem is with your EPYCD8.
> From which computer is this screenshot?
It is working,
> As far as I know, all of the mainboards which support Milan are a newer generation which (a) no longer supports Naples and (b) was designed from the start to support Rome's and Milan's PCIe v4 capability. (Genoa mainboards go as far as supporting PCIe v5 and CXL; not all of the PCIe v5 slots are CXL compatible of course, which is a limitation of Genoa's IOD; and some PCIe connectors might only support PCIe v4 instead of v5, which would be a limitation of the specific board. Genoa's IOD also has a few "bonus" PCIe v3 lanes, but I haven't seen them used for slot connectors on the random few SP5 boards which I looked at.)
And I do have video cards in my Milan and Genoa boxes. They all work fine, but they are all PCIe v4.
> It is working, for the reasons Stefan said, not just because it's a server motherboard.
show me where i said it's "because it's a server motherboard". that doesn't make any sense.
i said it was a problem with YOUR motherboard. it's a problem with the EPYCD8 when used with a Rome processor and a BIOS misconfiguration.
I think you still don't see the big picture. the reason it would work on your Milan and Genoa boxes is the same reason it works on your 7950X, and the same reason it DOESN'T work on your EPYCD8: the PCIe generation, which is determined by... wait for it... the motherboard.
pro tip: you could have gotten this card to work on your EPYCD8, but you need to make some changes to... the motherboard.
as i said all along, it was the motherboard that was the root cause. and more specifically a problem with the EPYCD8 when combined with the Rome processor and the PCIe link speed set to Auto in the BIOS, which you undoubtedly have set. the Rome CPU supports PCIe gen 4. the GPU supports PCIe gen 4. what's between them? THE MOTHERBOARD, which only supports gen 3. when set to Auto with a gen 4 card installed, the CPU tries to negotiate gen 4, but it can't, because the motherboard doesn't support it. it's a bug in the BIOS handling of this setting with this particular combination of parts. the BIOS is on... the motherboard. set the PCIe link speed to Gen3 and all your problems would go away on that EPYCD8 system.
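Assuming the driver can at least partially talk to the card, the negotiated link generation can be read straight from nvidia-smi; this is a generic sketch, and note that the current value is dynamic (it often drops to gen 1 at idle for power saving and only rises under load):
$ nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max --format=csv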
> I do use my 4090 GPUs in PCIe v3 slots as well (on Z270 based consumer PC boards), so I don't expect the GPU to give issues from having to downgrade itself to PCIe v3 mode.
More specifically, these are Z270 boards with Kaby Lake CPUs. That is, not only the board but also the CPU's PCIe controller doesn't support anything beyond PCIe v3. It is therefore impossible for the RTX 4090 to pick PCIe v4 mode in this combo. The RTX 4090 therefore runs safely downgraded to PCIe v3 in these computers of mine right from power-on.
> Screenshot of the 7950x-6 system. It's an MSI motherboard. I can't find the model in my purchase history. But here is a pic:
One of the customer reviews on the Newegg page of the EPYCD8 mentions that a 5700 XT was impossible for them to get to work. That's a PCIe v4 card too.
Oh wait... EPYCD8 was initially designed for EPYC 7001 Naples which had only PCIe v3 support. Later, ASRock implemented an EPYC 7002 Rome compatible BIOS for EPYCD8, but of course the board is still only specified for PCIe v3. The big question is, is PCIe v4 support of Rome's IOD properly disabled by the BIOS *before* PCIe device probing happens?
Things to investigate:
– dmesg output
– whether there is an option in the BIOS to choose between PCIe generations (don't let it use PCIe v4; the CPU's IO die is built for that, but the motherboard's physical design is not)
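For the second item, what the link actually trained at can also be read from PCIe config space; in the LnkCap/LnkSta lines, 16 GT/s corresponds to PCIe v4 and 8 GT/s to PCIe v3 (like the nvidia-smi value, the LnkSta speed can drop at idle). A generic sketch, selecting NVIDIA devices by vendor ID:
$ sudo lspci -vv -d 10de: | grep -E 'LnkCap:|LnkSta:'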
From which computer is this screenshot?
As far as I know, all of the mainboards which support Milan are a newer generation which (a) no longer supports Naples and (b) was designed from the start to support Rome's and Milan's PCIe v4 capability. (Genoa mainboards go as far as supporting PCIe v5 and CXL; not all of the PCIe v5 slots are CXL compatible of course, which is a limitation of Genoa's IOD; and some PCIe connectors might only support PCIe v4 instead of v5, which would be a limitation of the specific board. Genoa's IOD also has a few "bonus" PCIe v3 lanes, but I haven't seen them used for slot connectors on the random few SP5 boards which I looked at.)
Long story short, Milan-capable mainboards are fully PCIe v4 compatible. Rome-capable mainboards which were derived from Naples boards, in contrast, are physically designed for PCIe v3 only, but I don't know if the BIOSes of these boards properly prevent Rome's IOD from establishing PCIe v4 link mode.
> Edit: The problem was that the motherboard was NOT PCIe v4. It works in a 7950X board.
not exactly, the board doesn't *need* gen4. you just need to explicitly set the lane speed to GEN3 in the EPYCD8 BIOS when using the Rome CPU.
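And for anyone wanting to verify that the link really trained at PCIe v3 after pinning Gen3 in the BIOS, sysfs exposes the speed per device; the bus address below is only a placeholder, substitute whatever address lspci reports for the 4090:
$ cat /sys/bus/pci/devices/0000:41:00.0/current_link_speed
$ cat /sys/bus/pci/devices/0000:41:00.0/max_link_speed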