New toy ! Fixed ! (PCIE V4 required)

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,639
14,629
136
Think about the parts you’re using and it will make sense. The problem is the motherboard.
So, the motherboard will run Windows 10, no problem, and 3 Titan V video cards running Linux @100% for months, no problem at all, but it's bad???
 

Markfw

And more thoughts: Did you get the card new in a sealed box? Or was it an open box offer? Or even used?

Can you see whether or not the fan cables (should be two, plus maybe one or another lighting cable) are actually attached to the graphics card?

There, brand new, and it was sealed.
 

gsrcrxsi

Member
Aug 27, 2022
46
26
51
So, the motherboard will run Windows 10, no problem, and 3 Titan V video cards running Linux @100% for months, no problem at all, but it's bad???
i mean, from your screenshot it looks like it's not running windows 10 "no problem" since it doesn't detect the 4090 at all. the problem is not the OS. you are not understanding the big picture here. what's different between the 4090 and a titan V that might be a clue?

i didn't say the motherboard is bad, i said it's the problem.
 

Markfw

This motherboard WAS running Windows 10 a few years ago with a 1080 Ti, later with 3 Titan V's, then Linux with the same Titans, with no problem until I put in this card. And Linux actually sees the card and loads drivers, but due to the fan sensor it will not run the card at full speed.

Look at this. I wonder if this card had been modified and erroneously shipped (especially with the message from Device Manager that does not see the card, but DOES see a bunch of encryption/decryption devices????)

 

gsrcrxsi

That article has nothing to do with your issue.

Device manager is showing you a bunch of devices for which you haven’t installed drivers. This is also unrelated to your issue.
 

Markfw

There is no other hardware in there. Also, I can't find the text, but it said that 4090s were being stripped to be used in AI devices. It's somewhere in the graphics card thread.
 

Markfw

So what about the output of dmesg?

What is the model name of the motherboard?
Well, crap. After trying the 535-open driver (or something like that), the screen is blank even with nomodeset, so I have to reinstall Linux.
 

gsrcrxsi

You’re misinterpreting the article. You really don’t understand how GPUs work at all. What you’re suggesting is bordering on some kind of conspiracy. The article is basically saying that they are taking these high power GPUs and using them for AI workloads. Which is what a lot of people are doing.

Again, nothing to do with your issue, even if it were true. You bought this card from Amazon, in the US market, not China. You’re getting further and further away from the truth, your motherboard.
 

Markfw

Forgetting all of that: this motherboard runs both cards, but not the 4090 correctly, and nvidia-smi gives an error on the fans, which do NOT turn under Linux.

This motherboard also runs the Titan V fine using Windows.

So why is the motherboard at fault?
 

gsrcrxsi

the GPU fans don't spin because the card isn't working hard enough to need to spin. all modern GPUs act like this with zero RPM fan modes, which is baked into the GPU fan controller firmware and again not the root cause of your issue. it's only a symptom. the fan error itself is not causing any problems, nvidia-smi is just having problems communicating with the GPU.
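To tell those two situations apart in a script (a fan that reads 0% because the card is idle versus a fan value the driver cannot read at all), here is a minimal sketch that parses `nvidia-smi` CSV output. It assumes the query shown in the comment and that nvidia-smi reports unreadable values in square brackets (e.g. `[N/A]`); the function and class names are hypothetical.

```python
# Sketch: classify fan readings from nvidia-smi CSV output, e.g. produced by
#   nvidia-smi --query-gpu=index,name,fan.speed,temperature.gpu \
#              --format=csv,noheader,nounits
# Assumption: unreadable values are printed in square brackets, e.g. "[N/A]".
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuFan:
    index: int
    name: str
    fan_percent: Optional[int]  # None when nvidia-smi could not read the value
    temp_c: Optional[int]

    @property
    def status(self) -> str:
        if self.fan_percent is None:
            return "fan read error"  # driver could not query the fan at all
        if self.fan_percent == 0:
            return "idle (zero-RPM mode?)"  # normal for a cool, lightly loaded card
        return "spinning"


def _to_int(field: str) -> Optional[int]:
    field = field.strip()
    if field.startswith("[") and field.endswith("]"):
        return None  # bracketed token = value not available
    return int(float(field))


def parse(csv_text: str) -> List[GpuFan]:
    gpus = []
    for line in csv_text.strip().splitlines():
        # index is the first field; fan and temperature are the last two,
        # so split from both ends in case a name ever contains a comma
        index, rest = line.split(",", 1)
        name, fan, temp = rest.rsplit(",", 2)
        gpus.append(GpuFan(int(index), name.strip(), _to_int(fan), _to_int(temp)))
    return gpus
```

For a made-up row such as `0, NVIDIA GeForce RTX 4090, [Unknown Error], 34`, the status comes out as "fan read error", which points at communication between driver and GPU rather than at the fan itself; a row with `0` in the fan column is just an idle card.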
 

Markfw

But windows does not even see the card, and linux nvidia-smi reports an error trying to read fan rpm. The card is bad.
 

StefanR5R

Elite Member
Dec 10, 2016
5,591
8,013
136
OK. I asked because earlier SP3 motherboard series supported PCIe v3 only (EPYCD8 is one of those), whereas later SP3 motherboards such as ROMED8-NT gained PCIe v4 support. The V100 is a PCIe v3 card, whereas the 4090 is a PCIe v4 card.

I do use my 4090 GPUs in PCIe v3 slots as well (on Z270 based consumer PC boards), so I don't expect the GPU to have issues from having to downgrade itself to PCIe v3 mode. In contrast, if this was a PCIe v4 capable board, the fact that it worked for you earlier with more than one PCIe v3 GPU would not mean that stability with a PCIe v4 GPU is a given.
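To see which mode a card actually negotiated on a given board, the `LnkCap` (what the device can do) and `LnkSta` (what the link is currently running at) lines of `lspci -vv` can be compared. A minimal sketch, assuming the usual `Speed 16GT/s, Width x16` formatting of those lines and that lspci was run as root so the capability block is shown; the function names are hypothetical:

```python
# Sketch: compare advertised (LnkCap) vs negotiated (LnkSta) PCIe link,
# given the text of `lspci -vv -s <bus:dev.fn>` run as root.
import re
from typing import Dict, Tuple

# Per-lane signalling rate in GT/s -> PCIe generation
GTS_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}

# Matches e.g. "Speed 16GT/s, Width x16" and "Speed 8GT/s (downgraded), Width x16 (ok)"
_LINK_RE = re.compile(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def _gen_and_width(line: str) -> Tuple[int, int]:
    m = _LINK_RE.search(line)
    if m is None:
        raise ValueError(f"no speed/width found in: {line!r}")
    return GTS_TO_GEN[float(m.group(1))], int(m.group(2))


def link_summary(lspci_vv: str) -> Dict[str, object]:
    cap = sta = None
    for line in lspci_vv.splitlines():
        s = line.strip()
        # Only "LnkCap:"/"LnkSta:"; the "LnkCap2:"/"LnkSta2:" lines are skipped
        if s.startswith("LnkCap:"):
            cap = _gen_and_width(s)
        elif s.startswith("LnkSta:"):
            sta = _gen_and_width(s)
    if cap is None or sta is None:
        raise ValueError("LnkCap/LnkSta not found (was lspci run as root?)")
    return {
        "capable_gen": cap[0],
        "capable_width": cap[1],
        "running_gen": sta[0],
        "running_width": sta[1],
        "downgraded": sta[0] < cap[0] or sta[1] < cap[1],
    }
```

NVIDIA cards commonly drop the link to a lower speed at idle to save power, so a "downgraded" result only means something while the GPU is under load.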

(Even server motherboards sometimes are shipped with bug ridden BIOSes. Or rather, it's probably fair to say that all BIOSes are buggy, it's just that production server BIOSes *tend* to have not too many too severe bugs.)

One other thing though, did you use the same *combination* of PCIe slots before, when you had only V100's in there?
 

StefanR5R

especially with the message from device manager that does not see the card, but DOES see a bunch of encryption/decription devices ????)
The PCIe encryption controllers are a function of AMD's I/O die, from what I understand.

From Supermicro H11DSi with dual 7452:
Code:
$ /sbin/lspci | grep Encryption
01:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
02:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
21:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
22:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
22:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
44:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
45:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
62:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
63:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
81:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
82:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
a1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
a2:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
a2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
c1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
c2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
e1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
e2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
From Supermicro H13SSL with 9554P:
Code:
$ /sbin/lspci | grep Encryption
02:00.5 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 14ca
I don't recall whether or not memory encryption is enabled in the BIOS of these two machines; I believe I have it disabled.

Edit:
The two "Cryptographic Coprocessor PSPCPP" devices of the dual-7452 and the one "Device 14ca" of the 9554 are associated with AMD's Platform Security Processor (PSP). The 2x 8 "Starship/Matisse PTDMA" devices are part of AMD's PassThru DMA Engine feature for memory<-->memory and memory<-->device transfers.
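To check that every "Encryption controller" entry is one of these AMD platform functions (and not some unexpected add-in device), here is a sketch that tallies them by device name from plain `lspci` output. The regex assumes the default `bus:dev.fn Class: Vendor Device` line format, optionally with a PCI domain prefix; the function names are hypothetical.

```python
# Sketch: tally lspci "Encryption controller" entries by device name.
import re
from collections import Counter

_LINE_RE = re.compile(
    r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s+"  # [domain:]bus:dev.fn
    r"Encryption controller:\s+(?P<name>.+)$"
)
_REV_RE = re.compile(r"\s+\(rev [0-9a-f]+\)$")  # drop a trailing "(rev xx)"


def count_encryption_controllers(lspci_text: str) -> Counter:
    counts: Counter = Counter()
    for line in lspci_text.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:
            counts[_REV_RE.sub("", m.group("name"))] += 1
    return counts


def all_amd(counts: Counter) -> bool:
    """True if every tallied entry names AMD as the vendor."""
    return all("[AMD]" in name for name in counts)
```

Fed the H11DSi listing above, this should report 16 `Starship/Matisse PTDMA` entries and 2 `Cryptographic Coprocessor PSPCPP` entries, all from AMD.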
 

Markfw

Good points! My next test is to put it in a 7950X motherboard/CPU combo that has run a 2080 Ti just fine and has all the drivers. I am sure it's PCIe v4. I hope it works!
 

gsrcrxsi

it will work in that motherboard. because like i said, the problem is with your EPYCD8.
 

Markfw

It is working, for the reasons Stefan said, not just because it's a server motherboard.
 

Markfw

And I do have video cards in my Milan and Genoa boxes. They all work fine, but are all PCIE V4
 

StefanR5R

One of the customer reviews on the Newegg page of EPYCD8 mentions that a 5700 XT was impossible for them to get to work. That's a PCIe v4 card too.

Oh wait... EPYCD8 was initially designed for EPYC 7001 Naples which had only PCIe v3 support. Later, ASRock implemented an EPYC 7002 Rome compatible BIOS for EPYCD8, but of course the board is still only specified for PCIe v3. The big question is, is PCIe v4 support of Rome's IOD properly disabled by the BIOS *before* PCIe device probing happens?

Things to investigate:
– dmesg output
– whether there is an option in the BIOS to choose between PCIe generations (don't let it use PCIe v4; the CPU's IO die is built for that, but the motherboard's physical design is not)
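For the first item, a rough sketch that pulls the lines worth reading out of saved `dmesg` text (e.g. from `sudo dmesg > dmesg.txt`). The patterns are examples of messages printed by the kernel's PCI/AER code and by NVIDIA's `NVRM` driver; they are assumptions chosen for illustration, not an exhaustive list, and `scan_dmesg` is a hypothetical name.

```python
# Sketch: group dmesg lines that are relevant to a PCIe link / GPU problem.
# The pattern list is illustrative, not exhaustive.
import re
from typing import Dict, List

PATTERNS = {
    # kernel PCI core: link running below what the device is capable of
    "bandwidth_limited": re.compile(r"available PCIe bandwidth, limited by"),
    # Advanced Error Reporting: corrected/uncorrected link-level errors
    "aer": re.compile(r"AER:|PCIe Bus Error"),
    # NVIDIA kernel module messages (e.g. "GPU has fallen off the bus")
    "nvrm": re.compile(r"NVRM:"),
}


def scan_dmesg(text: str) -> Dict[str, List[str]]:
    hits: Dict[str, List[str]] = {key: [] for key in PATTERNS}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[key].append(line.strip())
    return hits
```

`scan_dmesg(open("dmesg.txt").read())` returns the matching lines per category; a stream of AER errors logged against the GPU's port would be consistent with a link running faster than the board's physical design supports.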

It is working,
From which computer is this screenshot?

And I do have video cards in my Milan and Genoa boxes. They all work fine, but are all PCIE V4
As far as I know, all of the mainboards which support Milan are of a newer generation which (a) no longer supports Naples and (b) was designed from the start to support Rome's and Milan's PCIe v4 capability. (Genoa mainboards go as far as supporting PCIe v5 and CXL; not all of the PCIe v5 slots are CXL compatible, of course, which is a limitation of Genoa's IOD; and some PCIe connectors might only support PCIe v4 instead of v5, which would be a limitation of the specific board. Genoa's IOD also has a few "bonus" PCIe v3 lanes, but I haven't seen them used for slot connectors on the random few SP5 boards which I looked at.)

Long story short, Milan capable mainboards are fully PCIe v4 compatible. Rome capable mainboards which were derived from Naples boards in contrast are physically designed for PCIe v3 only, but I don't know if the BIOSes of these boards properly prevent Rome's IOD from establishing PCIe v4 link mode.
 

gsrcrxsi

It is working, for the reasons Stefan said, not just because its a server motherboard.
show me where i said it's "because it's a server motherboard". that doesn't make any sense.

i said it was a problem with YOUR motherboard. it's a problem with the EPYCD8 when used with a Rome processor and a BIOS misconfiguration.

I think you still don't see the big picture. the reason it would work on your Milan and Genoa boxes is the same reason it works on your 7950X and the same reason it DOESN'T work on your EPYCD8. the PCIe gen. which is determined by... wait for it... the motherboard.

pro tip: you could have gotten this card to work on your EPYCD8, but you need to make some changes to... the motherboard.

as i said all along, it was the motherboard that was the root cause. and more specifically a problem with the EPYCD8 when combined with the Rome processor and the PCIe link speed set to Auto in the BIOS, which you undoubtedly have. the Rome CPU supports PCIe gen 4. the GPU supports PCIe gen 4. what's between them? THE MOTHERBOARD, which only supports gen3. when set to auto with a Gen4 card installed, the CPU tries to negotiate gen4, but it can't because the motherboard doesn't support it. it's a bug in the BIOS handling of this setting with this particular combination of parts. the BIOS is on... the motherboard. set the PCIe link speed to Gen3 and all your problems would go away on that EPYCD8 system.
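The argument above can be written down as a toy model: link training involves only the two ends of the link (CPU root port and GPU), so the board's electrical rating never enters the negotiation unless the BIOS imposes a cap. This sketch assumes that "Auto" means no cap, which is the hypothesis in this thread, and it ignores the speed fallback real hardware attempts when training fails; the function names are hypothetical.

```python
# Toy model: which PCIe generation a link tries to run at, and whether that
# exceeds what the board's traces were designed for.
from typing import Optional


def negotiated_gen(root_port_max: int, device_max: int,
                   bios_cap: Optional[int] = None) -> int:
    """Highest generation both link partners support, limited by an explicit
    BIOS setting. bios_cap=None models "Auto" (assumed: no cap at all)."""
    gen = min(root_port_max, device_max)
    if bios_cap is not None:
        gen = min(gen, bios_cap)
    return gen


def exceeds_board(root_port_max: int, device_max: int, board_rated: int,
                  bios_cap: Optional[int] = None) -> bool:
    """True if the link would try to run faster than the board is rated for.
    Note the board rating is only compared, never negotiated."""
    return negotiated_gen(root_port_max, device_max, bios_cap) > board_rated
```

With the parts from this thread: `exceeds_board(4, 4, board_rated=3)` (Rome, RTX 4090, EPYCD8 on Auto) is `True`, while forcing `bios_cap=3`, or using a Gen3 card like the Titan V (`exceeds_board(4, 3, board_rated=3)`), gives `False`. A Kaby Lake CPU is itself limited to Gen3, so `exceeds_board(3, 4, board_rated=3)` is also `False`.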
 

Skillz

Senior member
Feb 14, 2014
941
964
136
And this home run just scored you an infraction on the left field even.

(Removed home-run GIF, which is totally inappropriate)

Moderator Aigo
 

StefanR5R

I do use my 4090 GPUs in PCIe v3 slots as well (on Z270 based consumer PC boards), so I don't expect the GPU to give issues from having to downgrade itself to PCIe v3 mode.
More specifically, these are Z270 boards with Kaby Lake CPUs. That is, not only the board but also the CPU's PCIe controller doesn't support anything beyond PCIe v3. It is therefore impossible that the RTX 4090 would pick PCIe v4 mode in this combo. The RTX 4090 therefore runs securely downgraded to PCIe v3 in these computers of mine right away after power-on.

Edit: in the case of EPYC Rome in boards like ASRock EPYCD8 or Supermicro's H11 series, a PCIe v4 capable processor riskily sits on a board with PCIe v3 trace layout, muxes, retimers and whatnot, and it's crucial that the firmware configures the IOD not to go beyond PCIe v3.
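To confirm on a running Linux system that the firmware did cap the link, the kernel exposes each device's link speed in sysfs (`current_link_speed` and `max_link_speed` under `/sys/bus/pci/devices/<domain:bus:dev.fn>/`). A minimal sketch that checks the current speed against an expected generation; the text of these files differs between kernel versions (e.g. `8.0 GT/s PCIe` vs `8 GT/s`), so it only looks at the number before `GT/s`. The helper names are hypothetical.

```python
# Sketch: check that a device's negotiated PCIe link stays within an
# expected generation cap (e.g. 3 on a Naples-era board with a Rome CPU).
import os
import re
from typing import Optional

GTS_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}
SYSFS_PCI = "/sys/bus/pci/devices"


def speed_to_gen(text: str) -> Optional[int]:
    """'8.0 GT/s PCIe' or '8 GT/s' -> 3; None for 'Unknown' or odd values."""
    m = re.search(r"([\d.]+)\s*GT/s", text)
    return GTS_TO_GEN.get(float(m.group(1))) if m else None


def link_ok(speed_text: str, max_gen: int) -> bool:
    gen = speed_to_gen(speed_text)
    return gen is not None and gen <= max_gen


def check_device(bdf: str, max_gen: int, root: str = SYSFS_PCI) -> bool:
    """bdf includes the domain, e.g. '0000:41:00.0' (hypothetical address)."""
    with open(os.path.join(root, bdf, "current_link_speed")) as f:
        return link_ok(f.read(), max_gen)
```

Check both the GPU and the root port above it, and do it under load, since an idle GPU may report a lower speed than it trains to when busy.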
 

Markfw

From which computer is this screenshot?
Screenshot of the 7950x-6 system. It's an MSI motherboard. I can't find the model in my purchase history, but here is a pic:
 


gsrcrxsi

I'm not sure if the SM H11 board has this problem. the Auto setting might work correctly to cap at GEN3.

I've had gen4 cards in my H11DSi with Rome CPUs and don't recall this ever being a problem. but it was a problem on all my EPYCD8 boards with gen4 GPUs. BIOS bug on a "$500 server board"
 

Markfw

I found it. I paid $150 for the MSI PRO X670-P WIFI AM5.

 