Discussion My 6900xt starting to have issues :( - update: probably amd driver issue

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
My 6900xt was purchased March 4 2021 directly from AMD.


If left at factory default settings, it will freeze on the desktop every 15 seconds or so. Mouse freeze, desktop freeze, everything just stops. About 5 seconds later it unfreezes and things start responding again.


A 2% undervolt + upping the power limit +15% fixes everything.


This card is 2.5 years old, a pandemic card, and it was used for mining. I know I am not going to get sympathy on a mining card.


But I am completely shocked, I thought it would last for many years and it is now telling me the end is soon.
 
Reactions: Pohemi

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
Run it cold with solution.
I am not going to water cool a dying card, but you make a really good point.

Turning off the zero fan speed feature will cool all the vrms and all the components that never get cooling with the default profile at idle.

And this issue is happening at idle.
 

In2Photos

Golden Member
Mar 21, 2007
1,644
1,658
136
What are the temps like? Does it need a re-paste?

Edit: Just saw this was at idle, so likely not a temp problem. Interesting that a voltage drop helped.
 
Reactions: Leeea

biostud

Lifer
Feb 27, 2003
18,280
4,801
136
I am not going to water cool a dying card, but you make a really good point.

Turning off the zero fan speed feature will cool all the vrms and all the components that never get cooling with the default profile at idle.

And this issue is happening at idle.
It was a reference to an old joke
 
Reactions: MrPickins and Leeea

GodisanAtheist

Diamond Member
Nov 16, 2006
6,933
7,347
136
I wondered a while ago if GPUs manufactured during the supply shortages of the pandemic would have higher failure rates than previous GPUs thanks to subbing in lower quality parts.

Add in the power spiking on the 6xxx and 3xxx series of GPUs which must be rough on caps etc and you got yourself a stew going.

Wonder if anyone out there is or is capable of gathering data on this.
 
Reactions: Leeea

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
What are the temps like? Does it need a re-paste?

Edit: Just saw this was at idle, so likely not a temp problem. Interesting that a voltage drop helped.
I am not a fan of re-pasting, but that is an interesting question. This is a big vapor chamber design, so easy for me to screw up.


The 2% voltage drop likely just took a touch of pressure off the one vrm that runs at idle. I think biostud accidentally called the solution, turn off zero fan speed feature and keep that vrm cold at idle.


But still, driver updates are suddenly scary, as they always reset to default settings.
 
Reactions: GodisanAtheist

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,450
2,488
146
What is the warranty on the card, 2 years or 3 years? I suspect only 2 years, but it shouldn't matter if you mined on it or not if it is a 3 year warranty. That said, it is probably only a 2 year warranty, in which case you are out of luck.

I am curious though, why would you undervolt and turn up the power limit at the same time? It seems to me that if the stability issue is with voltage too high, just undervolting should be enough. Adding to the power budget may not increase maximum voltage, but it seems to me that it may increase clocks and power consumption a bit, assuming the core voltage isn't too low. If the card is unstable stock, I find it odd that this would help the situation.

I am not saying you are wrong, I am just curious about how you came to that setting, and how in the heck is it working better for you.
 

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
What is the warranty on the card, 2 years or 3 years? I suspect only 2 years, but it shouldn't matter if you mined on it or not if it is a 3 year warranty. That said, it is probably only a 2 year warranty, in which case you are out of luck.

I am curious though, why would you undervolt and turn up the power limit at the same time? It seems to me that if the stability issue is with voltage too high, just undervolting should be enough. Adding to the power budget may not increase maximum voltage, but it seems to me that it may increase clocks and power consumption a bit, assuming the core voltage isn't too low. If the card is unstable stock, I find it odd that this would help the situation.

I am not saying you are wrong, I am just curious about how you came to that setting, and how in the heck is it working better for you.
It was the default overclock setting I had prior to a driver update resetting it to default.


I did the driver update, which reset the overclocking to default. I didn't care about overclocking any more so I just let things be, and then I noticed things started failing.

Restarted the computer, didn't really help.

Started shutting down programs ( steam, etc ), that didn't help.

Finally figured it had to be the GPU crashing and recovering itself. I knew it had just been reset to default profile by a driver update, became suspicious, so I reapplied my previous overclocking profile.

And my problems just disappeared.



Anyway, AMD has already released another driver update for the GPU within days of the previous one, which I updated to yesterday - 23.9.3. Interestingly, the card does not exhibit the behavior with the new driver on default settings. The only thing I have changed with the current updated driver is turning off the zero fan speed feature.

I don't install beta drivers, only the WHQL ones. -eyes AMD suspiciously-


Poking around on reddit yielded this for the 23.9.2 driver:
https://www.reddit.com/r/Amd/comments/16t3tal/amd_software_adrenalin_edition_2393_release_notes/
seems I was not the only one


I am curious why applying an overclocking profile would fix it.

Feeling a bit disappointed with AMD at the moment. I realize both manufacturers have their issues, but they are supposed to confine those issues to the games.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,933
7,347
136
This is why, generally, if a driver works for me and it doesn't have any game specific enhancements for something I'm currently playing (rare, since I tend to play older stuff) I just stick with the driver I have.

Might do a once a year or every couple year update just to catch up on anything I missed, but also only when the driver has been out a while.

Hasn't steered me wrong, AMD or NV.
 
Reactions: Indus and Leeea
Dec 10, 2005
24,153
6,969
136
What is the warranty on the card, 2 years or 3 years? I suspect only 2 years, but it shouldn't matter if you mined on it or not if it is a 3 year warranty. That said, it is probably only a 2 year warranty, in which case you are out of luck.
Depending on how it was purchased, you could get lucky with warranty extension coverage, if it was a benefit on the credit card used to purchase it. Most that offer that tend to double the manufacturer warranty up to 1 year.
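
For anyone wanting to check the dates, here is a rough sketch of the warranty arithmetic being discussed. The March 4 2021 purchase date is from the first post; the 2 vs 3 year term is still unknown, and the "match the manufacturer term, capped at 1 extra year" rule is only an assumption about how such card benefits usually work, so check the actual terms. The function names are made up for illustration.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def coverage_end(purchase: date, warranty_years: int, card_extension: bool = False) -> date:
    """Last day of coverage under the assumed rules described above."""
    # Assumption: the credit card benefit matches the manufacturer term,
    # capped at 1 extra year. Real benefit terms vary.
    extra_years = min(warranty_years, 1) if card_extension else 0
    return add_years(purchase, warranty_years + extra_years)


purchase = date(2021, 3, 4)  # purchase date given in the first post

for years in (2, 3):
    for extended in (False, True):
        label = "with card extension" if extended else "manufacturer only"
        print(f"{years} year warranty, {label}: ends {coverage_end(purchase, years, extended)}")
```

Under those assumptions a plain 2 year warranty would have ended March 4 2023, while a 3 year warranty, or a 2 year warranty plus a 1 year card extension, would run to March 4 2024.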

---
Anyway, if the updated driver works fine at default, seems like the problem is solved. Though, if it did start to act up, I'd probably look for a warranty repair sooner rather than later, if only because the time for that could be running short, and having to move from stock settings to get it to work right shouldn't be a long term fix.
 
Reactions: Shmee and Leeea

coercitiv

Diamond Member
Jan 24, 2014
6,256
12,189
136
@Leeea I had a similar issue recently, actually may still be ongoing as it's very hard to trigger. In my case the system freezes for a few seconds when doing specific actions, such as hovering over one specific thumbnail in a video list or starting a browser. It was 100% linked to UI focus, meaning that if I moved focus away from the problem window, the system would start working after a few seconds. Nothing can be reproduced with confidence: once I had one YT thumbnail trigger the event in Firefox (while the rest in the page were fine), another time I got the freeze from starting Chrome or Edge (I use both only on occasion, work related). Later Edit: I see the Chrome browser issue is widely reported on reddit as well.

No program was able to register the freeze, whether something like Process Explorer or LatencyMon. I uninstalled everything I could think of that was recently added, until the GPU driver came up in line - so it was DDU time. Using just the basic driver from Windows Update the problem "went away" in the sense that I could no longer reproduce it. I then re-installed 23.9.2 and everything seemed fine. Later that day I did however have just one similar event, but just one. I can no longer reproduce it using the usual tricks.

Poking around on reddit yielded this for the 23.9.2 driver:
https://www.reddit.com/r/Amd/comments/16t3tal/amd_software_adrenalin_edition_2393_release_notes/
seems I was not the only one
The experience described by redditors seems to fit mine, though I see some did not realize they could recover by moving the mouse away, changing window focus, or pressing Ctrl-Alt-Del and waiting a few seconds.

I am curious why applying an overclocking profile would fix it.
Yeah, I'm on a custom profile as well. Might be why I was not able to reproduce it anymore. When it started I might have had the profile disabled.


This is a big black ball for AMD, the average user won't know what hit them and their systems may become completely unusable.
 
Reactions: Leeea

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
@Leeea I had a similar issue recently, actually may still be ongoing as it's very hard to trigger. In my case the system freezes for a few seconds when doing specific actions, such as hovering over one specific thumbnail in a video list or starting a browser. It was 100% linked to UI focus, meaning that if I moved focus away from the problem window, the system would start working after a few seconds. Nothing can be reproduced with confidence: once I had one YT thumbnail trigger the event in Firefox (while the rest in the page were fine), another time I got the freeze from starting Chrome or Edge (I use both only on occasion, work related). Later Edit: I see the Chrome browser issue is widely reported on reddit as well.

No program was able to register the freeze, whether something like Process Explorer or LatencyMon. I uninstalled everything I could think of that was recently added, until the GPU driver came up in line - so it was DDU time. Using just the basic driver from Windows Update the problem "went away" in the sense that I could no longer reproduce it. I then re-installed 23.9.2 and everything seemed fine. Later that day I did however have just one similar event, but just one. I can no longer reproduce it using the usual tricks.


The experience described by redditors seems to fit mine, though I see some did not realize they could recover by moving the mouse away, changing window focus, or pressing Ctrl-Alt-Del and waiting a few seconds.


Yeah, I'm on a custom profile as well. Might be why I was not able to reproduce it anymore. When it started I might have had the profile disabled.


This is a big black ball for AMD, the average user won't know what hit them and their systems may become completely unusable.
I have had the issue come back on the 23.9.3 driver. Much more rare, but it does happen once in a while.

It is like you say, random. Online they are saying downgrade to 23.9.1, but I have not done that yet.

I flipped the custom profile on, and it still occurs with the custom profile.


I have experimented a bit. Running two single-threaded, 2D GPU-accelerated applications at once seems to trigger it the most. Thunderbird + MS Edge sort of thing.


I submitted an official bug report to AMD. This really is as you say a big black ball for AMD. The average user is not going to know what hit them. I certainly didn't, I thought my hardware was failing.
 

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
@Leeea when it happens, check if disabling Freesync helps. (assuming you have it enabled already)
Interesting, I just turned it off.

Will let you know if it makes any difference.

I have not rolled back the driver. The issue is pretty rare. The biggest thing that helps it is just loading a custom performance profile.

I don't think it matters what the custom performance profile is. Just any custom performance profile.
 

In2Photos

Golden Member
Mar 21, 2007
1,644
1,658
136
Interesting, I just turned it off.

Will let you know if it makes any difference.

I have not rolled back the driver. The issue is pretty rare. The biggest thing that helps it is just loading a custom performance profile.

I don't think it matters what the custom performance profile is. Just any custom performance profile.
That almost sounds like the "default" profile is corrupt or something.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,450
2,488
146
Hmm that is interesting, I know some games freesync compatibility may be bugged, but generally I have it on, though I have tried turning it off for Quake Champions sometimes, depending on the driver version, sometimes that helped. Problem is if I turn it off globally, at least with my setup, idle TBP seems to increase from about 50W to 100W with my card.
 

Leeea

Diamond Member
Apr 3, 2020
3,645
5,379
136
Hmm that is interesting, I know some games freesync compatibility may be bugged, but generally I have it on, though I have tried turning it off for Quake Champions sometimes, depending on the driver version, sometimes that helped. Problem is if I turn it off globally, at least with my setup, idle TBP seems to increase from about 50W to 100W with my card.
It is super disappointing to turn it off. But desktop experience comes first.

But for a 4K 120 Hz HDR display on an HDMI connection my idle desktop GPU wattage is 35 watts. Which is way less than what you're seeing.

It was quite a bit lower when I was on a 1440p 144 Hz DisplayPort monitor, but 4K over HDMI did jump the idle wattage up quite a bit.
 

MrPickins

Diamond Member
May 24, 2003
9,017
585
126
Hmm that is interesting, I know some games freesync compatibility may be bugged, but generally I have it on, though I have tried turning it off for Quake Champions sometimes, depending on the driver version, sometimes that helped. Problem is if I turn it off globally, at least with my setup, idle TBP seems to increase from about 50W to 100W with my card.
That idle wattage seems awfully high. I know they're different GPUs, but my 7800xt is currently idling at 10-20W with my full development environment running on a triple 1920x1200 monitor (non Freesync) setup via daisy-chained DisplayPort.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,450
2,488
146
Yep. Dell business class IPS monitors @ 1920x1200 60Hz. I suppose 120Hz could increase the wattage, but still seems high 🤷‍♂️
Yeah, it has to do with high resolution and high refresh I think. My monitor has a 270Hz refresh rate at 2560x1440.
 
Reactions: MrPickins

coercitiv

Diamond Member
Jan 24, 2014
6,256
12,189
136
I suppose 120Hz could increase the wattage, but still seems high 🤷‍♂️
I never bothered to read on what exactly affects idle power, but it's clearly affected by overall bandwidth requirements: after a certain threshold is reached, the card enters a less efficient power state.

Example using my 6800XT and a single 1440p 165Hz monitor, idle power with FreeSync off:
  • 60Hz > 7W
  • 120Hz > 7W
  • 165Hz > 33W
Enabling FreeSync brings 165Hz back to 7W.
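
To put rough numbers on the "overall bandwidth requirements" idea, here is a small sketch that compares the raw pixel throughput of the display setups mentioned in this thread. It is a back-of-envelope estimate only: it assumes 24 bits per pixel, ignores blanking intervals, HDR bit depth, and any compression, and says nothing about where the driver actually puts its power state threshold. The function names are made up for illustration.

```python
def pixel_rate(width: int, height: int, refresh_hz: int, displays: int = 1) -> int:
    """Active pixels per second across all displays (blanking ignored)."""
    return width * height * refresh_hz * displays


def data_rate_gbps(width: int, height: int, refresh_hz: int,
                   displays: int = 1, bits_per_pixel: int = 24) -> float:
    """Uncompressed pixel data rate in Gbit/s, assuming 24 bits per pixel by default."""
    return pixel_rate(width, height, refresh_hz, displays) * bits_per_pixel / 1e9


# Display setups described by posters in this thread: (width, height, Hz, number of displays)
setups = {
    "1440p @ 60 Hz": (2560, 1440, 60, 1),
    "1440p @ 120 Hz": (2560, 1440, 120, 1),
    "1440p @ 165 Hz": (2560, 1440, 165, 1),
    "1440p @ 270 Hz": (2560, 1440, 270, 1),
    "4K @ 120 Hz": (3840, 2160, 120, 1),
    "3x 1920x1200 @ 60 Hz": (1920, 1200, 60, 3),
}

for name, (w, h, hz, n) in setups.items():
    print(f"{name:>22}: {pixel_rate(w, h, hz, n) / 1e6:8.1f} Mpixel/s, "
          f"{data_rate_gbps(w, h, hz, n):6.2f} Gbit/s")
```

By this crude measure 1440p at 270 Hz and 4K at 120 Hz push the same number of pixels per second, and both are well above the triple 1920x1200 60 Hz setup, which at least fits the pattern of higher idle power on the high resolution, high refresh displays.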
 