Gentoo Forums
Sporadic crashes in nouveau driver
lue
n00b


Joined: 09 Jun 2014
Posts: 14

Posted: Tue Mar 06, 2018 11:07 pm    Post subject: Sporadic crashes in nouveau driver

This problem first came up a few months ago. I was able to work around it and had forgotten about it until what may be more instances of the same problem showed up recently. To put my main question up front: are all these seemingly different crashes in fact related? My intuition says they are, but I really don't know much about graphics card drivers.

These crashes always present the same way: the screen is completely unresponsive, except for the cursor, which responds to mouse movement but doesn't change its appearance over different parts of the screen (not too surprising considering I'm using the hardware cursor). If audio was playing, it keeps playing for as long as the program producing it keeps running. I can't even switch over to a virtual terminal to look at the state of things after a crash.

So the problems that came up at first were in Krita when I tried making a new document, specifically with color types other than 8-bit integer, and in Plasma when hovering over an application in the task bar long enough to activate the thumbnail popup thing (but only occasionally would that crash it).

Here's an example of what the logs showed when Krita would crash the graphics card:

Code:
Dec 26 17:39:54 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000
Dec 26 17:39:55 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00010000: 00000003
Dec 26 17:40:25 AMD64 kernel: [TTM] Buffer eviction failed
Dec 26 17:40:58 AMD64 kernel: nouveau 0000:02:00.0: fifo: PBDMA0: 00000002 [] ch 2 [003fc71000 X[296]] subc 0 mthd 001c data 00000002
Dec 26 17:42:17 AMD64 kernel: [TTM] Buffer eviction failed
Dec 26 17:42:47 AMD64 kernel: [TTM] Buffer eviction failed
Dec 26 17:44:39 AMD64 kernel: [TTM] Buffer eviction failed
Dec 26 17:45:10 AMD64 kernel: [TTM] Buffer eviction failed


And here's an example of how Plasma would crash:

Code:
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: gr: TRAP ch 2 [003fc71000 X[288]]
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: gr: GPC0/TPC0/TEX: 80000049
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: fifo: read fault at 0004dfb000 engine 00 [PGRAPH] client 01 [GPC0/TEX] reason 02 [PAGE_NOT_PRESENT] on channel 2 [003fc71000 X[288]]
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: fifo: gr engine fault on channel 2, recovering...
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: X[288]: channel 2 killed!
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: gr: TRAP ch 11 [003f8b9000 plasmashell[7147]]
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: gr: GPC0/TPC0/TEX: 80000049
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: fifo: read fault at 0001244000 engine 00 [PGRAPH] client 01 [GPC0/TEX] reason 02 [PAGE_NOT_PRESENT] on channel 11 [003f8b9000 plasmashell[7147]]
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: fifo: gr engine fault on channel 11, recovering...
Dec 28 02:09:05 AMD64 kernel: nouveau 0000:02:00.0: plasmashell[7147]: channel 11 killed!


So that was back at the end of the year. I avoided the problem with Krita by just not using it, and the Plasma issue by turning off the popups for hovering over applications in the taskbar. But just recently I tried running Runescape on my computer and came across another graphics card crash:

Code:
Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: gr engine fault on channel 13, recovering...
Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000
Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: plasmashell[19821]: channel 13 killed!
Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00010000: 00000002


That last line gets printed to the log constantly, and eventually systemd-journald starts noting missed kernel messages. (In another attempt, "channel 13" was "channel 14" instead.) And just today, when I tried to open gwenview, the Plasma panel at the bottom of my screen disappeared. I was able to switch to a virtual terminal to look at what was going on, but soon afterwards the graphics card crashed:

Code:
Mar 06 08:42:22 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 01000000: 00000005
Mar 06 08:42:22 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000
Mar 06 08:42:22 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 01000000: 00000005
Mar 06 08:42:22 AMD64 kcminit[6800]: Initializing  "kcm_input" :  "kcminit_mouse"
Mar 06 08:42:22 AMD64 kcminit[6800]: kcm_input: Using X11 backend
Mar 06 08:42:22 AMD64 kwin_x11[2325]: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41888, resource id: 27262980, major code: 18 (ChangeProperty), minor code: 0
Mar 06 08:42:23 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 01000000: 00000005
Mar 06 08:42:53 AMD64 kernel: [TTM] Buffer eviction failed

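One rough way to compare the incidents above is to reduce each kernel line to a coarse signature and see which signatures recur on which dates. The following is only a sketch: the function names and the normalization rules are made up for illustration, and the sample lines are copied from the logs quoted in this post.

```python
import re
from collections import defaultdict

# Kernel lines from the nouveau driver, e.g.
# "... kernel: nouveau 0000:02:00.0: fifo: INTR 00800000"
NOUVEAU_LINE = re.compile(r"kernel: nouveau [0-9a-f:.]+: (.*)$")
# TTM memory-manager lines, e.g. "... kernel: [TTM] Buffer eviction failed"
TTM_LINE = re.compile(r"kernel: \[TTM\] (.*)$")
FAULT = re.compile(r"(read|write) fault .* reason \w+ \[(\w+)\]")


def signature(line):
    """Reduce a journal line to a coarse error signature, or None if unrelated."""
    m = TTM_LINE.search(line)
    if m:
        return "TTM: " + m.group(1)
    m = NOUVEAU_LINE.search(line)
    if m is None:
        return None
    rest = m.group(1)
    if rest.endswith("killed!"):
        # "X[288]: channel 2 killed!" -> drop process name and channel number
        return "channel killed"
    unit, _, msg = rest.partition(": ")
    if msg.startswith("INTR "):
        # Keep the interrupt bit, drop the trailing status word
        return f"{unit}: INTR {msg.split()[1].rstrip(':')}"
    fault = FAULT.search(msg)
    if fault:
        return f"{unit}: {fault.group(1)} fault [{fault.group(2)}]"
    if msg.startswith("TRAP"):
        return f"{unit}: TRAP"
    # Fallback: keep the message, but hide per-incident channel numbers
    return f"{unit}: " + re.sub(r"channel \d+", "channel N", msg)


def dates_by_signature(lines):
    """Map each signature to the set of dates ("Mon DD") it appeared on."""
    seen = defaultdict(set)
    for line in lines:
        sig = signature(line)
        if sig is not None:
            seen[sig].add(" ".join(line.split()[:2]))
    return seen


# Lines copied from the logs quoted above
SAMPLE = [
    "Dec 26 17:39:54 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000",
    "Dec 26 17:39:55 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00010000: 00000003",
    "Dec 26 17:40:25 AMD64 kernel: [TTM] Buffer eviction failed",
    "Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]",
    "Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000",
    "Feb 23 05:37:50 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00010000: 00000002",
    "Mar 06 08:42:22 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000",
    "Mar 06 08:42:53 AMD64 kernel: [TTM] Buffer eviction failed",
]

for sig, dates in sorted(dates_by_signature(SAMPLE).items()):
    print(f"{sig}: {', '.join(sorted(dates))}")
```

A signature that shows up on several dates only says the driver reported the same kind of event each time. It does not prove a common cause, but it narrows down what to search for in bug trackers.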

So now on to trying to figure out what's causing these failures. First of all, graphics card info:

Code:
02:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Device 196e:104a
        Flags: bus master, fast devsel, latency 0, IRQ 24, NUMA node 0
        Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e8000000 (64-bit, prefetchable) [size=128M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 1100 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau


Between the first boot where I saw crashes and the boot before it, I had updated my system. So I went through all of the packages I had emerged during that last good boot and picked out the ones that could possibly relate to graphics card handling. Of those, the following changed version between what was running while I did the updates and what was running when I experienced my first crash:


  • dev-qt/qtcore: 5.9.2 -> 5.9.3
  • dev-qt/qtgui: 5.9.2 -> 5.9.3
  • dev-qt/qtopengl: 5.9.2 -> 5.9.3
  • dev-qt/qtx11extras: 5.9.2 -> 5.9.3
  • kde-frameworks/kinit: 5.40.0 -> 5.41.0
  • kde-frameworks/plasma: 5.40.0 -> 5.41.0
  • kde-plasma/kwin: 5.11.3 -> 5.11.4
  • kde-plasma/plasma-desktop: 5.11.3 -> 5.11.4
  • media-libs/mesa: 17.3.0_rc5 -> 17.3.0
  • sys-kernel/linux-firmware: 20171123 -> 20171206
  • x11-libs/libXxf86misc: 1.0.3 -> 1.0.3-r1

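A list like the one above can also be pulled from Portage's merge log instead of being assembled by hand. The sketch below assumes that `/var/log/emerge.log` records each finished merge as `<epoch seconds>:  ::: completed emerge (N of M) category/name-version to /`; that line shape and the helper name `completed_merges` are assumptions, so check them against your own log first.

```python
import re

# Assumed line shape in /var/log/emerge.log (verify against your own log):
#   <epoch seconds>:  ::: completed emerge (N of M) category/name-version to /
COMPLETED = re.compile(r"^(\d+):\s+::: completed emerge \(\d+ of \d+\) (\S+) to ")


def completed_merges(lines, start, end):
    """Return (timestamp, package) pairs for merges completed in [start, end).

    start and end are Unix timestamps in seconds, e.g. the boot times of the
    last good boot and of the first boot that crashed.
    """
    merges = []
    for line in lines:
        m = COMPLETED.match(line)
        if m is None:
            continue  # not a "completed emerge" line
        stamp = int(m.group(1))
        if start <= stamp < end:
            merges.append((stamp, m.group(2)))
    return merges
```

Calling it with the open log file as `lines` and the two boot times as `start` and `end` gives every package merged in that window, which can then be filtered down to graphics-related ones.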

I include the Qt5/KDE5 packages because everything that has broken so far is connected to KDE in some way (including Runescape, which features some unspecified support for KDE). I have compared the two versions of linux-firmware, and there were no changes to the nvidia firmware between them.

I should also point out that this is when I went through the 10.0 -> 17.0 profile change, so a lot of stuff had to be remerged. Unfortunately I don't know how to tell whether something broke just from being remerged (it's a lot easier to guess that an update broke something).

Last point: I didn't think it was a hardware failure, because it always seems to be caused by KDE-using applications, it's consistently reproducible (except for gwenview, which worked just fine after a reboot), and it doesn't happen randomly or often enough to seem hardware-based to me. The possibly-gwenview-related crash has me worried that it is in fact the hardware. I could theoretically switch over to the official nvidia drivers to see if the problems still occur, but I've had it with trying to make the nvidia drivers work in the first place.

I know I've thrown a lot of somewhat-disjointed information out there, and if more info is needed I have that sitting around. I hope someone can shed some light on what the deal is here.
lue
n00b


Joined: 09 Jun 2014
Posts: 14

Posted: Fri Apr 27, 2018 4:51 am

I tried running Krita the other day to see if it works now, and in fact it still crashes the video card. It crashed when trying to create a new file. In this case I was trying to create an 18"x24" @ 300dpi image with a 32-bit floating-point color depth.

Code:
Apr 24 18:56:16 AMD64 krita[31359]: OpenGL Info
                                      Vendor:  nouveau
                                      Renderer:  "NVD9"
                                      Version:  "3.0 Mesa 18.0.0-rc4"
                                      Shading language:  1.30
                                      Requested format:  QSurfaceFormat(version 3.0, options QFlags<QSurfaceFormat::FormatOption>(DeprecatedFunctions), depthBufferSize 24, redBufferSize -1, greenBufferSize >
                                      Current format:    QSurfaceFormat(version 3.0, options QFlags<QSurfaceFormat::FormatOption>(DeprecatedFunctions), depthBufferSize 24, redBufferSize 8, greenBufferSize 8>
                                         Version: 3.0
                                         Supports deprecated functions true
                                         is OpenGL ES: false
Apr 24 18:56:16 AMD64 krita[31359]: krita has opengl true
Apr 24 18:56:16 AMD64 krita[31359]: Setting XDG_DATA_DIRS "/usr/bin/../share:/usr/local/share:/usr/share"
Apr 24 18:56:19 AMD64 krita[31359]: Available translations QSet("cs", "tr", "pl", "cy", "he", "pt", "da", "ast", "hi", "de", "lt", "lv", "hne", "ug", "hr", "uk", "hu", "ia", "mk", "uz", "mr", "ms", "pt_BR",>
Apr 24 18:56:19 AMD64 krita[31359]: Available domain translations QSet("cs", "tr", "pl", "cy", "he", "pt", "da", "ast", "hi", "de", "lt", "lv", "hne", "ug", "hr", "uk", "hu", "ia", "mk", "uz", "mr", "ms", ">
Apr 24 18:56:19 AMD64 krita[31359]: Override language: ""
Apr 24 18:56:32 AMD64 krita[31359]: libpng warning: iCCP: too many profiles
Apr 24 18:56:32 AMD64 krita[31359]: libpng warning: iCCP: too many profiles
Apr 24 18:56:32 AMD64 krita[31359]: libpng warning: iCCP: too many profiles
Apr 24 18:56:32 AMD64 krita[31359]: libpng warning: iCCP: too many profiles
Apr 24 18:56:34 AMD64 krita[31359]: QLayout: Attempting to add QLayout "" to QWidget "", which already has a layout
Apr 24 18:56:34 AMD64 krita[31359]:         falling back on QIcon::FromTheme: "edit-clear-locationbar-rtl"
Apr 24 18:59:10 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23411, resource id: 121634832, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:10 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23505, resource id: 73400383, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:10 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23546, resource id: 50331787, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:11 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23613, resource id: 71303613, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:24 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23802, resource id: 39845894, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:26 AMD64 plasmashell[621]: QXcbConnection: XCB error: 2 (BadValue), sequence: 23966, resource id: 121634832, major code: 141 (Unknown), minor code: 3
Apr 24 18:59:53 AMD64 krita[31359]: libpng warning: iCCP: too many profiles
Apr 24 18:59:54 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 01000000: 00000005
Apr 24 18:59:54 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00800000
Apr 24 18:59:54 AMD64 kernel: nouveau 0000:02:00.0: fifo: INTR 00010000: 00000003
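
For a sense of scale, the canvas described in this post can be turned into a raw buffer size. This is plain arithmetic, not a statement about how Krita actually allocates memory: the 4-channel (RGBA) layout is an assumption, and the function name is made up for illustration.

```python
def canvas_bytes(width_in, height_in, dpi, channels, bytes_per_channel):
    """Bytes needed for one uncompressed, full-canvas pixel buffer."""
    width_px = width_in * dpi
    height_px = height_in * dpi
    return width_px * height_px * channels * bytes_per_channel


# 18"x24" at 300 dpi with 32-bit float (4 bytes) per channel.
# channels=4 (RGBA) is an assumption, not something stated in the post.
size = canvas_bytes(18, 24, 300, channels=4, bytes_per_channel=4)
print(f"{size} bytes, about {size / 2**20:.0f} MiB per buffer")
# prints "622080000 bytes, about 593 MiB per buffer"
```

That is 5400 x 7200 pixels at 16 bytes each, so it may be worth comparing the figure against the card's video memory when reading the kernel messages.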
Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 502
Location: NRW, Germany

Posted: Fri Apr 27, 2018 7:09 pm    Post subject: Re: Sporadic crashes in nouveau driver

lue wrote:
I hope someone can shed some light on what the deal is here.

I don't have much to contribute, but after my own experiences with nvidia, I'll make damn sure my next laptop will not have one of their products in it.
All times are GMT