Gentoo Forums
[Solved] Intel HD Graphics 5500 (Broadwell) Troubleshooting
Gentoo Forums Forum Index › Kernel & Hardware

Zeault
n00b


Joined: 03 Aug 2019
Posts: 5
Location: New England, United States

Posted: Tue Aug 27, 2019 7:03 pm    Post subject: [Solved] Intel HD Graphics 5500 (Broadwell) Troubleshooting

Hello Gentoo Forums,

I've been slowly installing Gentoo on my 3-year-old Dell laptop to eventually replace Windows 7. Right now I'm stuck on setting up an X GUI environment that uses my Intel HD Graphics 5500 GPU for 2D and 3D acceleration. I'm experiencing graphical glitches and hangs with both the xf86-video-intel and the generic modesetting drivers. I've been reading the following wiki pages to install and configure the X packages:
Here are my main Gentoo configurations and logs:

uname: Linux ltca 4.19.57-gentoo #14 SMP Sun Aug 4 20:44:34 EDT 2019 x86_64 Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz GenuineIntel GNU/Linux
kernel .config: http://dpaste.com/1MXR5DM
dmesg output (redacted): http://dpaste.com/2ZX23R0
lspci output: http://dpaste.com/1MECBQ0
make.conf: http://dpaste.com/3W5AC29
package.use: http://dpaste.com/3ZCXWSS
xorg-server build info:
Code:
x11-base/xorg-server-1.20.5::gentoo was built with the following:
USE="glamor ipv6 libressl suid udev xorg -debug -dmx -doc -elogind -kdrive -minimal (-selinux) -static-libs -systemd -unwind -wayland -xcsecurity -xephyr -xnest -xvfb" ABI_X86="(64)"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,lazy"

xf86-video-intel build info:
Code:
x11-drivers/xf86-video-intel-2.99.917_p20190301::gentoo was built with the following:
USE="dri sna tools udev uxa xvmc -debug" ABI_X86="(64)"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,lazy"


I set the VIDEO_CARDS variable and then merged xorg-server, xterm, xclock, and twm. I logged in as a user in the audio, video, and wheel groups and ran startx. The screen went black and the system became unresponsive. Xorg.0.log showed errors for missing modules, so I created 10-modesetting.conf with the following contents:
Code:

Section "Device"
   Identifier   "modesetting"
   Driver      "modesetting"
   Option      "AccelMethod"   "glamor"
   Option      "DRI"      "3"
EndSection

The log was now error-free, but the screen was still black and the system was mostly unresponsive. Sometimes I could press the power button to run my ACPI script and shut down the system safely; other times that didn't work and I had to hard reset. Next, I tried merging the deprecated xf86-video-intel package and replacing 10-modesetting.conf with this 10-intel.conf:
Code:

Section "Device"
   Identifier   "Intel Graphics"
   Driver      "intel"
   Option      "AccelMethod"   "sna"
EndSection

This appeared to work, but when I switched xterm to TrueType fonts the system would freeze intermittently. I blamed this on my lack of font configuration, but later saw the same thing when I ran glxgears or glxinfo. Sometimes glxgears ran as expected. Other times, it caused part of the screen to go black and hung the system completely. And other times still, the hang occurred, but X returned to normal after a minute and glxgears exited with the error message "i965: Failed to submit batchbuffer: Input/output error". When I checked the logs for the times X recovered, they said something like "GPU hang detected. Switching to software renderer." Unfortunately I did not save a log from that last case, but the logs generated during the most common case (a hang with no recovery) contain no errors or warnings.

I'd love it if some configuration change could resolve my problem, but given the inconsistency I'm guessing there is a bug that affects my GPU/system. I'll try any configuration you suggest, but I'd also like to know how to start troubleshooting or debugging GPU issues like this. If I can't fix a bug myself, I'll at least try to research and isolate it before I file a bug report.
Thank you.


Last edited by Zeault on Sat Sep 14, 2019 9:31 pm; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

Posted: Tue Aug 27, 2019 8:41 pm

Zeault,

Welcome to Gentoo.

Put your 10-modesetting.conf and 10-intel.conf to one side meanwhile.
Run startx, which should start three xterms and an analogue clock all wrapped up in twm.

If it fails, good. Regardless, once you are back in control, put /var/log/Xorg.0.log onto a pastebin site.
This will tell what Xorg detected and what it did when it tried to start.

Your dmesg is "mostly harmless" and your kernel .config looks OK too.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Zeault
n00b


Joined: 03 Aug 2019
Posts: 5
Location: New England, United States

Posted: Wed Aug 28, 2019 12:04 am

Quote:
Put your 10-modesetting.conf and 10-intel.conf to one side meanwhile.
Run startx, which should start three xterms and an analogue clock all wrapped up in twm.

If it fails, good. Regardless, once you are back in control, put /var/log/Xorg.0.log onto a pastebin site.
This will tell what Xorg detected and what it did when it tried to start.


I removed the config files and ran startx. To my surprise it worked just fine. I tried to provoke the problematic behavior again by opening lots of windows and running many instances of glxgears, but it was stable for as long as I used it. I'll have to work with it some more to see if it is truly 'fixed.'

Here is Xorg.0.log.

Removing those configuration files is not the only thing I changed since the original post. Before I saw your reply, I disabled some kernel configuration options on a whim, hoping they would improve the situation: CONFIG_DRM_I915_USERPTR, CONFIG_AMD_IOMMU, and CONFIG_AMD_IOMMU_V2.
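To confirm which of those options a given kernel actually has, you can read its .config. This is a minimal Python sketch, assuming the usual Gentoo layout where /usr/src/linux points at the configured source tree; the function names and the default path are illustrative, not part of any tool. It maps each option to its value, using `n` for the `# CONFIG_... is not set` lines and `absent` for options that don't appear at all (typically because their dependencies are not met).

```python
import re
from pathlib import Path

# "CONFIG_FOO=y", "CONFIG_FOO=m", 'CONFIG_FOO="string"', ...
_SET = re.compile(r'^(CONFIG_[A-Za-z0-9_]+)=(.*)$')
# "# CONFIG_FOO is not set", the form written for disabled options
_UNSET = re.compile(r'^# (CONFIG_[A-Za-z0-9_]+) is not set$')


def parse_kconfig(text: str) -> dict[str, str]:
    """Map every option mentioned in a kernel .config to its value ('n' if unset)."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := _SET.match(line):
            options[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            options[m.group(1)] = "n"
    return options


def load_kconfig(path: Path = Path("/usr/src/linux/.config")) -> dict[str, str]:
    """Read and parse a .config file (the default path is an assumption)."""
    return parse_kconfig(path.read_text())


def report(options: dict[str, str], names: list[str]) -> dict[str, str]:
    """Look up the given options; 'absent' means not in the file at all."""
    return {name: options.get(name, "absent") for name in names}
```

For example, `report(load_kconfig(), ["CONFIG_DRM_I915_USERPTR", "CONFIG_AMD_IOMMU", "CONFIG_AMD_IOMMU_V2"])` shows the state of the three options named above.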

After I disabled those, but before I cleared out xorg.conf.d, X ran much more stably. I thought it was fixed at first, but I ran into graphical issues after using it for a few minutes. The system did not become unresponsive, though, and I was able to switch VTs and stop the server before it got too bad.

If you think I should change these configs back I will, even if it's just for the sake of clean testing. I don't even know what CONFIG_DRM_I915_USERPTR does.

Thank you for your assistance.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

Posted: Wed Aug 28, 2019 9:00 am

Zeault,

You only have one GPU
Code:
[    82.764] (--) PCI:*(0@0:2:0) 8086:1616
so no danger of Optimus confusing things.

This is the automatic black magic detecting the best GPU driver for your hardware.
Code:
[    82.961] (==) Matched intel as autoconfigured driver 0
[    82.961] (==) Matched modesetting as autoconfigured driver 1
[    82.961] (==) Matched fbdev as autoconfigured driver 2
[    82.961] (==) Matched vesa as autoconfigured driver 3


Code:
[    83.106] (EE) Failed to load module "fbdev" (module does not exist, 0)
[    83.107] (EE) Failed to load module "vesa" (module does not exist, 0)

These can be ignored. It's Xorg trying all the autoconfigured drivers. You don't have those two installed, so the errors are expected.
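To separate the expected errors from the interesting ones in a long Xorg.0.log, a small filter helps. This is a hedged Python sketch that assumes the log format shown in the excerpts above (`[timestamp] (EE) ...`); treating `fbdev` and `vesa` as optional fallback drivers is an assumption taken from this thread, not something Xorg itself defines.

```python
import re

# Real log lines start with "[  timestamp] "; requiring that skips the
# "(EE) error" entry in the marker legend near the top of the log.
_ERROR = re.compile(r'^\[\s*[\d.]+\] \(EE\) ')
_MATCHED = re.compile(r'\(==\) Matched (\S+) as autoconfigured driver (\d+)')
_MISSING = re.compile(r'Failed to load module "([^"]+)" \(module does not exist')


def summarize_xorg_log(text: str, optional: tuple[str, ...] = ("fbdev", "vesa")):
    """Return (autoconfigured drivers in priority order, expected errors, other errors)."""
    drivers: list[tuple[int, str]] = []
    expected: list[str] = []
    other: list[str] = []
    for line in text.splitlines():
        if m := _MATCHED.search(line):
            drivers.append((int(m.group(2)), m.group(1)))
        elif _ERROR.match(line):
            missing = _MISSING.search(line)
            # A fallback driver that simply isn't installed is expected noise.
            if missing and missing.group(1) in optional:
                expected.append(line.strip())
            else:
                other.append(line.strip())
    drivers.sort()  # sort by the driver index Xorg printed
    return [name for _, name in drivers], expected, other
```

Anything left in the third list is what's worth pasting and asking about.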

Code:
[    83.115] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20180719
...
[    83.525] (II) intel(0): switch to mode 1366x768@60.0 on eDP1 using pipe 0, position (0, 0), rotation normal, reflection none
[    83.525] (II) intel(0): Setting screen physical size to 361 x 203

That's the piece of the kernel in use and your Xorg screen setup.
There is nothing wrong with any of that.

The only downside to letting the automatic everything do its thing is that you get a right-handed three-button mouse and an American QWERTY keymap.

Code:
CONFIG_AMD_IOMMU and CONFIG_AMD_IOMMU_V2
are both useless baggage. You have an Intel motherboard chipset.

I don't know what
Code:
CONFIG_DRM_I915_USERPTR
does. DRM_I915 is in use, according to your log.
Have a read of the help on DRM_I915_USERPTR.

You can add some xorg.conf fragments in /etc/X11/xorg.conf.d/ to fine-tune your mouse and keyboard if you prefer.
Meanwhile, the automatics are doing the right thing for your GPU.
Zeault
n00b


Joined: 03 Aug 2019
Posts: 5
Location: New England, United States

Posted: Wed Aug 28, 2019 6:08 pm

Hello again NeddySeagoon,

I was using X today for quite a while with no problems, but eventually the issue occurred again: windows would not redraw and portions of the screen went black. I killed X and saved the log. In it, I found the software-rendering message I described in my original post:
Code:
[  1519.751] (EE) intel(0): Failed to submit rendering commands (Input/output error), disabling acceleration.

Here is the full log, though I don't see anything out of the ordinary besides that error message. Do you know of any verbose debugging options I can enable for the intel module?

Thank you
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

Posted: Wed Aug 28, 2019 7:08 pm

Zeault,

It's too late to ask now, but dmesg would be useful. It will show kernel errors, if any were detected.

Some things to try ...

The modesetting driver.
The testing gentoo-sources kernel. That's 5.2 or even 5.3.
The testing Intel video driver.
Testing mesa.

It's quite possible that there are no newer versions of the Intel driver or mesa. I didn't check.

Don't try them all at once. Go one at a time, or use a binary search, to find which change avoids the problem.
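The binary search idea works when the candidate changes can be applied cumulatively in a fixed order (say, a sequence of package upgrades). This is a minimal Python sketch: `is_fixed(k)` is a hypothetical stand-in for "apply the first k changes, use X until the hang would normally appear, and report whether it did". It assumes the problem is present with no changes applied and that, once some change fixes it, applying the later ones keeps it fixed. With n candidates it needs roughly log2(n) test sessions plus one, instead of n.

```python
from typing import Callable, Optional, Sequence


def first_fixing_change(changes: Sequence[str],
                        is_fixed: Callable[[int], bool]) -> Optional[str]:
    """Find the change that makes the problem go away, by binary search.

    is_fixed(k) applies the first k changes and reports whether the problem
    is gone. Assumes the problem is present with 0 changes and that once a
    change fixes it, every longer prefix stays fixed.
    """
    if not changes or not is_fixed(len(changes)):
        return None               # even all changes together don't help
    lo, hi = 0, len(changes)      # broken with first lo changes, fixed with first hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(mid):
            hi = mid              # the fix is among the first mid changes
        else:
            lo = mid              # the fix comes after change number mid
    return changes[hi - 1]        # the hi-th change is the first one needed
```

`git bisect` applies the same idea to a kernel's commit history.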
Zeault
n00b


Joined: 03 Aug 2019
Posts: 5
Location: New England, United States

Posted: Tue Sep 03, 2019 9:43 pm

NeddySeagoon,

I've made some progress, but the issue persists. I first tried to collect some information from dmesg after one of the hangs. I could see some DMA errors, but they didn't really help, so I upgraded to kernel 5.2.10. As before, I was able to use X for a few hours before I encountered the issue. The new kernel logged more information to dmesg and detected the hang much more consistently, saving a crash dump and causing X to disable acceleration. I pasted the tail of dmesg below.

Code:
[  190.704701] DMAR: DRHD: handling fault status reg 2
[  190.704709] DMAR: [DMA Write] Request device [00:02.0] fault addr 46a000 [fault reason 23] Unknown
[  190.704714] DMAR: DRHD: handling fault status reg 3
[  190.704718] DMAR: [DMA Read] Request device [00:02.0] fault addr 283000 [fault reason 23] Unknown
[  190.704722] DMAR: DRHD: handling fault status reg 3
[  190.704726] DMAR: [DMA Read] Request device [00:02.0] fault addr 2e6000 [fault reason 23] Unknown
[  190.704730] DMAR: DRHD: handling fault status reg 3
[  201.590577] Asynchronous wait on fence i915:X[1611]:742 timed out (hint:intel_atomic_commit_ready+0x0/0x4c)
[  207.278825] i915 0000:00:02.0: GPU HANG: ecode 8:2:0xfffffffe, in X [1611], no progress on bcs0
[  207.278829] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  207.278831] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  207.278832] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  207.278834] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  207.278836] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  207.278850] i915 0000:00:02.0: Resetting bcs0 for no progress on bcs0
[  207.278880] dmar_fault: 1460 callbacks suppressed
[  207.278881] DMAR: DRHD: handling fault status reg 2
[  207.278886] DMAR: [DMA Write] Request device [00:02.0] fault addr 421000 [fault reason 23] Unknown
[  207.278890] DMAR: DRHD: handling fault status reg 2
[  207.278894] DMAR: [DMA Write] Request device [00:02.0] fault addr 423000 [fault reason 23] Unknown
[  207.278897] DMAR: [DMA Write] Request device [00:02.0] fault addr 424000 [fault reason 23] Unknown
[  207.278903] DMAR: DRHD: handling fault status reg 2
[  207.278907] DMAR: [DMA Write] Request device [00:02.0] fault addr 425000 [fault reason 23] Unknown
[  212.470594] Asynchronous wait on fence i915:X[1611]:744 timed out (hint:intel_atomic_commit_ready+0x0/0x4c)
[  223.270642] i915 0000:00:02.0: Resetting bcs0 for no progress on bcs0
[  223.270671] dmar_fault: 1889 callbacks suppressed
[  223.270673] DMAR: DRHD: handling fault status reg 2
[  223.270679] DMAR: [DMA Write] Request device [00:02.0] fault addr 45a000 [fault reason 23] Unknown
[  223.270690] DMAR: DRHD: handling fault status reg 2
[  223.270694] DMAR: [DMA Write] Request device [00:02.0] fault addr 45d000 [fault reason 23] Unknown
[  223.270697] DMAR: [DMA Write] Request device [00:02.0] fault addr 45e000 [fault reason 23] Unknown
[  223.270700] DMAR: DRHD: handling fault status reg 2
[  223.270704] DMAR: [DMA Write] Request device [00:02.0] fault addr 45f000 [fault reason 23] Unknown
[  234.230593] Asynchronous wait on fence i915:X[1611]:746 timed out (hint:intel_atomic_commit_ready+0x0/0x4c)
[  239.270642] i915 0000:00:02.0: Resetting bcs0 for no progress on bcs0
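With hundreds of DMAR lines it helps to tally them. This is a hedged Python sketch, assuming the message formats in the dmesg excerpt above (other kernel versions may word them differently). It counts faults per (device, direction, fault reason), adds up the `callbacks suppressed` counts, and collects the `GPU HANG` summaries. In this excerpt every fault comes from device 00:02.0, which is the integrated GPU (the same PCI address Xorg reported as `PCI:*(0@0:2:0)`).

```python
import re
from collections import Counter

_FAULT = re.compile(r'DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] '
                    r'fault addr ([0-9a-f]+) \[fault reason (\d+)\]')
_SUPPRESSED = re.compile(r'dmar_fault: (\d+) callbacks suppressed')
_HANG = re.compile(r'GPU HANG: (.*)')


def summarize_dmesg(text: str):
    """Return (faults by (device, direction, reason), suppressed count, hang summaries)."""
    faults: Counter = Counter()
    suppressed = 0
    hangs: list[str] = []
    for line in text.splitlines():
        if m := _FAULT.search(line):
            direction, device, _addr, reason = m.groups()
            faults[(device, direction, int(reason))] += 1
        elif m := _SUPPRESSED.search(line):
            # rate-limited fault reports that were not printed at all
            suppressed += int(m.group(1))
        elif m := _HANG.search(line):
            hangs.append(m.group(1).strip())
    return faults, suppressed, hangs
```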


I have been installing more and more of the testing software and collecting crash dumps each time, because none of it fixes the issue. I think it's time to upload one of my crash dumps to bugs.freedesktop.org as the error log suggests, but they want me to try the drm-tip kernel before I file the report. Is there any supported way to get the drm-tip kernel through Portage/Gentoo, or should I just build it myself? Whether or not this fixes the issue, and even if the issue is eventually fixed by a future patch, I want to keep my system and packages managed the Gentoo way if I can. Would this be something I should (or could) write a custom ebuild for?
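Because /sys/class/drm/card0/error lives in memory, a dump is lost on reboot. This is a minimal Python sketch for copying it to a timestamped file first, assuming it runs as root and that the GPU is card0 as in the dmesg above; the destination directory and the `i915-error-...` file naming are arbitrary choices, not anything the driver or the bug tracker requires.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# The path i915 printed in dmesg: "GPU crash dump saved to /sys/class/drm/card0/error"
DEFAULT_DUMP = Path("/sys/class/drm/card0/error")


def save_gpu_crash_dump(dest_dir: Path, src: Path = DEFAULT_DUMP,
                        now: Optional[datetime] = None) -> Path:
    """Copy the GPU error state to a timestamped file and return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"i915-error-{stamp}"
    # read_bytes() reads until EOF, so the copy does not depend on the
    # file size that sysfs reports
    dest.write_bytes(src.read_bytes())
    return dest
```

For example, `save_gpu_crash_dump(Path("/root/gpu-dumps"))` right after a hang keeps each dump alongside the matching dmesg.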
Thank you.
Zeault
n00b


Joined: 03 Aug 2019
Posts: 5
Location: New England, United States

Posted: Sat Sep 14, 2019 9:30 pm

I'm writing this from my new Gentoo laptop as a final update to help anyone else who might be experiencing graphical bugs on a Broadwell system.
First, it turns out that managing multiple kernel versions on Gentoo is super easy, but the bugs I was experiencing did not go away even with the very latest drm-tip kernel and modules installed. I had collected all of my logs and dumps and was about to upload them to bugs.freedesktop.org when I found a similar bug already on the frequent duplicates list: https://bugs.freedesktop.org/show_bug.cgi?id=89360. For some reason it never came up in my original search engine queries.

A patch has yet to be submitted, and I don't think there will be one. To me it looks like a hardware bug that affects Broadwell (Gen8) GPUs. The problem shows up as IOMMU (DMAR) faults on the integrated GPU's DMA, and luckily IOMMU remapping for the integrated graphics can be disabled with a kernel command-line option:
Code:
intel_iommu=igfx_off

This is the workaround the kernel's Intel IOMMU documentation suggests for graphics problems: https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt. Hardware-accelerated graphics and video decoding now work on my machine with the IOMMU disabled for the GPU (this has not been the case for everyone; see the bug report above). From what I can tell, the IOMMU is central to Intel's VT-d feature, which allows virtual machines to access the GPU directly, and although I haven't tested it, I'm guessing GPU passthrough via VT-d wouldn't work with that option set.
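To check that the option actually reached the running kernel, look at /proc/cmdline. This is a hedged Python sketch: it splits the command line with `shlex` (an approximation of the kernel's own parsing), collects the values of every `intel_iommu=` occurrence, assuming the comma-separated form such as `intel_iommu=on,igfx_off`, and reports whether `igfx_off` is among them. The function names are illustrative.

```python
import shlex
from pathlib import Path


def intel_iommu_options(cmdline: str) -> list[str]:
    """Collect the comma-separated values of every intel_iommu= parameter."""
    values: list[str] = []
    for token in shlex.split(cmdline):   # approximates the kernel's quoting rules
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            values.extend(v for v in value.split(",") if v)
    return values


def igfx_iommu_bypassed(cmdline: str) -> bool:
    """True if igfx_off appears among the intel_iommu= values."""
    return "igfx_off" in intel_iommu_options(cmdline)


def running_kernel_bypasses_igfx(path: Path = Path("/proc/cmdline")) -> bool:
    """Check the command line the running kernel was booted with."""
    return igfx_iommu_bypassed(path.read_text())
```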

I don't plan on running virtual machines with hardware accelerated graphics on my laptop, so for me this is a solution.
Thank you to NeddySeagoon and the Gentoo Forums.
All times are GMT