Linux Mint 21.x - apparent graphics card/driver burp - how should I have handled this?

mikeymikec

Lifer
May 19, 2011
17,767
9,727
136
A week or so ago I was using my computer for basic apps (office / browsing) when the screen went blank. The keyboard didn't respond to Caps Lock, and after waiting for a bit I pressed the reset button.

In hindsight I should have tried harder to regain control of the system, but Linux troubleshooting is not a field that I have much experience in (unlike Windows). I'm asking now because I want to learn more.

After rebooting, I took a copy of /var/log/kern.log, and here are the bits that I think are most relevant:

Code:
Jan 22 14:55:56 mikepc kernel: [30967.242712] [drm] VRAM is lost due to GPU reset!
Jan 22 14:55:56 mikepc kernel: [30967.242714] [drm] PSP is resuming...
Jan 22 14:56:02 mikepc kernel: [30972.542393] [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed.
Jan 22 14:56:02 mikepc kernel: [30972.542520] [drm:psp_resume [amdgpu]] *ERROR* Failed to process memory training!
Jan 22 14:56:02 mikepc kernel: [30972.542620] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Jan 22 14:56:02 mikepc kernel: [30972.542712] amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Jan 22 14:56:02 mikepc kernel: [30972.542732] [drm] Skip scheduling IBs!
<last message repeated many times>
Jan 22 14:56:02 mikepc kernel: [30972.656377] amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62
Jan 22 14:56:02 mikepc kernel: [30972.656379] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
Jan 22 14:56:11 mikepc kernel: [30981.799107] [UFW BLOCK] IN=eno1 OUT= MAC=01:00:5e:00:00:01:80:1f:02:fb:41:54:08:00 SRC=192.168.0.200 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=53711 PROTO=2 
Jan 22 14:56:12 mikepc kernel: [30982.658193] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3498097, emitted seq=3498099
Jan 22 14:56:12 mikepc kernel: [30982.658414] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-bin pid 29856 thread firefox:cs0 pid 29918
Jan 22 14:56:12 mikepc kernel: [30982.658583] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Jan 22 14:56:12 mikepc kernel: [30982.658862] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate

The system evidently was still responding on some level, because the log contains firewall activity (the UFW BLOCK entry) after the GPU reset failed.
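
One way to check this from the log itself is to list whatever the kernel logged after the first failed reset that did not come from the GPU driver. This is only a sketch: the function name is made up for illustration, the sample lines are copied from the excerpt above, and on a real system the input would be /var/log/kern.log rather than a here-document.

```shell
# Hypothetical helper (the name is illustrative): print kernel log lines that
# come after the first "GPU reset(N) failed" message and are not from
# amdgpu/drm. Any output suggests the rest of the kernel was still running.
non_gpu_after_reset_failure() {
    awk '
        /GPU reset\([0-9]+\) failed/ { seen = 1; next }   # skip the marker line itself
        seen && !/amdgpu|\[drm/                            # print later non-GPU lines
    '
}

# Sample input: three lines copied from the kern.log excerpt above.
non_gpu_after_reset_failure <<'EOF'
Jan 22 14:56:02 mikepc kernel: [30972.542712] amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Jan 22 14:56:02 mikepc kernel: [30972.542732] [drm] Skip scheduling IBs!
Jan 22 14:56:11 mikepc kernel: [30981.799107] [UFW BLOCK] IN=eno1 OUT= MAC=01:00:5e:00:00:01:80:1f:02:fb:41:54:08:00 SRC=192.168.0.200 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=53711 PROTO=2
EOF
```

With the sample input only the UFW BLOCK line is printed. Against the real file the call would be something like `non_gpu_after_reset_failure < /var/log/kern.log`.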

In hindsight, I think I should have tried:

Ctrl+Alt+F1 to switch to a command terminal
Ctrl+Alt+F7 to switch back to X
Ctrl+Alt+Backspace - does this still restart the X server? A quick google suggests I'd lose any apps that were running in the X session?

If I had managed to switch to a command terminal, I could also have read kern.log, picked up the clue about Firefox, and terminated any Firefox processes.
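
A minimal sketch of picking that clue out of the log, assuming the "Process information" line always has the shape seen in the excerpt above (that is an assumption based on this one log, and the function name is made up for illustration):

```shell
# Sample line copied from the kern.log excerpt above; on a live system the
# input would come from /var/log/kern.log or `sudo dmesg` instead.
sample='Jan 22 14:56:12 mikepc kernel: [30982.658414] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox-bin pid 29856 thread firefox:cs0 pid 29918'

# Hypothetical helper: print "<process name> <pid>" for every amdgpu
# "Process information" line read from stdin; other lines are ignored.
gpu_hang_culprits() {
    sed -n 's/.*Process information: process \([^ ]*\) pid \([0-9][0-9]*\) thread.*/\1 \2/p'
}

printf '%s\n' "$sample" | gpu_hang_culprits
```

For the sample line this prints `firefox-bin 29856`. From a working terminal the input could be `grep 'Process information' /var/log/kern.log | gpu_hang_culprits`, followed by `kill` on the reported pid. Whether killing the process frees the GPU once the reset itself has failed is not something this log can answer.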

Any more suggestions/advice would be appreciated. I've been using Mint for a few years (since 2021/2022?) and on my Haswell rig this kind of incident only happened once. My latest setup (AMD 7000, RX 6700 XT, now 6.5 kernel updated from 6.2) is only a few months old and this has happened just this one time so far.
 

manly

Lifer
Jan 25, 2000
11,099
2,204
126
If it happens very rarely, I wouldn't worry about it. Any alternative you would have chosen would have wasted more of your time. The reality is that although you might have been able to cleanly restart the OS, it's highly unlikely you would have been able to preserve your graphical session. FWIW my XPS 15 has a 4k display, and virtual console text is all but impossible to read. I've never corrected this configuration, so I rarely drop into a virtual console.

Generally speaking, if there's no Caps Lock response, none of the key combos will work either. If you had sshd running and the network stack was working, you could potentially log in from another machine and issue the reboot command.

As a last resort on busy servers (to minimize filesystem data loss), there was this if all else failed:

My new desktop build is similar to yours. Out of curiosity, do you use Linux or Windows more?
 

Tech Junky

Diamond Member
Jan 27, 2022
3,436
1,156
106
I'm on kernel 6.8-rc1, as I manually update weekly to pick up patches. Installed firmware tends to be outdated compared to what's available. I'm working on a Wi-Fi BE setup right now, and bleeding-edge firmware is needed to get things working.

AMD doesn't make it easy to see behind the curtain for quite a bit of what's in their products. I'm using an Arc A380, and dmesg shows it looking for newer firmware than what's installed or available. There will be similar clues for amdgpu. Since the 7900X also uses amdgpu for its iGPU, it loads whatever the kernel includes.
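
A rough way to pull those firmware clues out of dmesg is sketched below. The two patterns are taken from messages in the dmesg output in this post (the i915 "is recommended, but only ... was found" line and the amdgpu "ucode is not available" lines). Other drivers and kernel versions word these messages differently, so treat the patterns and the function name as assumptions.

```shell
# Hypothetical helper: keep only lines that hint at missing or outdated
# firmware. The patterns are based solely on the messages seen in this thread.
firmware_hints() {
    grep -E 'is recommended, but only .* was found|ucode is not available'
}

# Sample input: three lines copied from the dmesg output in this post.
firmware_hints <<'EOF'
[   10.673992] i915 0000:03:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin (70.12.1) is recommended, but only i915/dg2_guc_70.bin (70.8.0) was found
[    5.641797] amdgpu 0000:66:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    5.649346] amdgpu 0000:66:00.0: amdgpu: SMU is initialized successfully!
EOF
```

With the sample input the first two lines are kept and the SMU line is dropped. On a live system the call would be `sudo dmesg | firmware_hints`. Note that the amdgpu lines refer to optional firmware, so a match there is a hint to look closer, not proof of a problem.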

Code:
sudo dmesg | grep amdgpu
[    5.531205] [drm] amdgpu kernel modesetting enabled.
[    5.531236] amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
[    5.531440] amdgpu: ATPX version 1, functions 0x00000000
[    5.543769] amdgpu: Virtual CRAT table created for CPU
[    5.543787] amdgpu: Topology: Add CPU node
[    5.543929] amdgpu 0000:66:00.0: enabling device (0006 -> 0007)
[    5.546062] amdgpu 0000:66:00.0: amdgpu: Fetched VBIOS from VFCT
[    5.546065] amdgpu: ATOM BIOS: 102-RAPHAEL-008
[    5.555563] amdgpu 0000:66:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[    5.555616] amdgpu 0000:66:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
[    5.555618] amdgpu 0000:66:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[    5.555694] [drm] amdgpu: 512M of VRAM memory ready
[    5.555696] [drm] amdgpu: 15599M of GTT memory ready.
[    5.556644] amdgpu 0000:66:00.0: amdgpu: Will use PSP to load VCN firmware
[    5.641797] amdgpu 0000:66:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    5.647688] amdgpu 0000:66:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    5.647691] amdgpu 0000:66:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    5.649346] amdgpu 0000:66:00.0: amdgpu: SMU is initialized successfully!
[    5.650843] snd_hda_intel 0000:66:00.1: bound 0000:66:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    5.689738] amdgpu: HMM registered 512MB device memory
[    5.690848] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    5.690865] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    5.691426] amdgpu: Virtual CRAT table created for GPU
[    5.691549] amdgpu: Topology: Add dGPU node [0x164e:0x1002]
[    5.691551] kfd kfd: amdgpu: added device 1002:164e
[    5.691562] amdgpu 0000:66:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 2, active_cu_number 2
[    5.691567] amdgpu 0000:66:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    5.691568] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    5.691570] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    5.691571] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    5.691572] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    5.691574] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    5.691575] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    5.691576] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    5.691577] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    5.691579] amdgpu 0000:66:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[    5.691580] amdgpu 0000:66:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    5.691581] amdgpu 0000:66:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[    5.691582] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[    5.691584] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[    5.691585] amdgpu 0000:66:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[    5.692569] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:66:00.0 on minor 1
[    5.694681] amdgpu 0000:66:00.0: [drm] Cannot find any crtc or sizes

Code:
sudo dmesg | grep i915
[    5.527804] i915 0000:03:00.0: [drm] VT-d active for gfx access
[    5.528088] i915 0000:03:00.0: vgaarb: deactivate vga console
[    5.528113] i915 0000:03:00.0: [drm] Local memory IO size: 0x000000017c800000
[    5.528115] i915 0000:03:00.0: [drm] Local memory available: 0x000000017c800000
[    5.543839] i915 0000:03:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    5.547629] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[   10.673992] i915 0000:03:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin (70.12.1) is recommended, but only i915/dg2_guc_70.bin (70.8.0) was found
[   10.673996] i915 0000:03:00.0: [drm] GT0: Consider updating your linux-firmware pkg or downloading from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[   10.678989] i915 0000:03:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.8.0
[   10.678992] i915 0000:03:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.7
[   10.690557] i915 0000:03:00.0: [drm] GT0: GUC: submission enabled
[   10.690561] i915 0000:03:00.0: [drm] GT0: GUC: SLPC enabled
[   10.690794] i915 0000:03:00.0: [drm] GT0: GUC: RC enabled
[   10.712909] i915 0000:03:00.0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[   10.722313] [drm] Initialized i915 1.6.0 20230929 for 0000:03:00.0 on minor 0
[   10.723131] i915 display info: display version: 13
[   10.723133] i915 display info: cursor_needs_physical: no
[   10.723134] i915 display info: has_cdclk_crawl: no
[   10.723135] i915 display info: has_cdclk_squash: yes
[   10.723137] i915 display info: has_ddi: yes
[   10.723138] i915 display info: has_dp_mst: yes
[   10.723139] i915 display info: has_dsb: yes
[   10.723140] i915 display info: has_fpga_dbg: yes
[   10.723141] i915 display info: has_gmch: no
[   10.723142] i915 display info: has_hotplug: yes
[   10.723143] i915 display info: has_hti: no
[   10.723144] i915 display info: has_ipc: yes
[   10.723145] i915 display info: has_overlay: no
[   10.723145] i915 display info: has_psr: yes
[   10.723146] i915 display info: has_psr_hw_tracking: no
[   10.723147] i915 display info: overlay_needs_physical: no
[   10.723148] i915 display info: supports_tv: no
[   10.723149] i915 display info: has_hdcp: yes
[   10.723150] i915 display info: has_dmc: yes
[   10.723151] i915 display info: has_dsc: yes
[   10.748987] fbcon: i915drmfb (fb0) is primary device
[   10.748950] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [i915])
[   10.892550] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[   11.461697] i915 0000:03:00.0: [drm] GT0: HuC: authenticated for all workloads
[   11.461702] mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:03:00.0 (ops i915_pxp_tee_component_ops [i915])
 

mikeymikec

manly said:
My new desktop build is similar to yours. Out of curiosity, do you use Linux or Windows more?

I only use Windows for gaming or data recovery work; Linux is my primary OS.

It would be nice to recover my X session by, say, killing Firefox and switching back to it, but I'd settle for restarting the X session.
 