Gentoo Forums
Random crashes [Solved]
irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Sun Feb 26, 2017 5:19 am    Post subject: Random crashes [Solved]

I've been tinkering with the kernel config a lot, mostly fine-tuning: enabling drivers that were previously missing and building a kernel with the Bluetooth firmware included, which got Bluetooth working.

I also replaced a few stable packages with their live git (9999) versions, for example rxvt-unicode, mpv and the i3 window manager.

I know that's a lot of variables, so it's hard to figure out exactly which change made the whole system freeze. Is there a way to look at the logs to trace back and find the source of the crash?

Here's my hardware info:
Code:

# lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:2280] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: iosf_mbi_pci
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:22b1] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: i915
   Kernel modules: i915
00:0b.0 Signal processing controller [1180]: Intel Corporation Device [8086:22dc] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: proc_thermal
00:13.0 SATA controller [0106]: Intel Corporation Device [8086:22a3] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: ahci
   Kernel modules: ahci
00:14.0 USB controller [0c03]: Intel Corporation Device [8086:22b5] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: xhci_hcd
00:1a.0 Encryption controller [1080]: Intel Corporation Device [8086:2298] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: mei_txe
00:1b.0 Audio device [0403]: Intel Corporation Device [8086:2284] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: snd_hda_intel
   Kernel modules: snd_hda_intel
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:22c8] (rev 21)
   Kernel driver in use: pcieport
00:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:22cc] (rev 21)
   Kernel driver in use: pcieport
00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:22ce] (rev 21)
   Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:229c] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: lpc_ich
   Kernel modules: lpc_ich
00:1f.3 SMBus [0c05]: Intel Corporation Device [8086:2292] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: i801_smbus
   Kernel modules: i2c_i801
02:00.0 Network controller [0280]: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter [168c:0036] (rev 01)
   Subsystem: Foxconn International, Inc. QCA9565 / AR9565 Wireless Network Adapter [105b:e091]
   Kernel driver in use: ath9k
   Kernel modules: ath9k
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
   Subsystem: Acer Incorporated [ALI] RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1025:100f]
   Kernel driver in use: r8169
   Kernel modules: r8169



And this is running kernel config:
https://paste.pound-python.org/show/cra6nP0Pd3h7O4EA5fXV/

Output of 'emerge --info'
Code:

emerge --info                                                                                                               
Portage 2.3.3 (python 2.7.12-final-0, default/linux/amd64/13.0, gcc-5.4.0, glibc-2.23-r3, 4.4.39-gentoo x86_64)
=================================================================
System uname: Linux-4.4.39-gentoo-x86_64-Intel-R-_Pentium-R-_CPU_N3700_@_1.60GHz-with-gentoo-2.3
KiB Mem:     3959164 total,   2548460 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Sun, 26 Feb 2017 00:45:01 +0000
sh bash 4.3_p48-r1
ld GNU ld (Gentoo 2.25.1 p1.1) 2.25.1
app-shells/bash:          4.3_p48-r1::gentoo
dev-java/java-config:     2.2.0-r3::gentoo
dev-lang/perl:            5.22.3_rc4::gentoo
dev-lang/python:          2.7.12::gentoo, 3.4.5::gentoo
dev-util/cmake:           3.7.2::gentoo
dev-util/pkgconfig:       0.28-r2::gentoo
sys-apps/baselayout:      2.3::gentoo
sys-apps/openrc:          0.23.2::gentoo
sys-apps/sandbox:         2.10-r3::gentoo
sys-devel/autoconf:       2.13::gentoo, 2.69::gentoo
sys-devel/automake:       1.11.6-r1::gentoo, 1.14.1::gentoo, 1.15::gentoo
sys-devel/binutils:       2.25.1-r1::gentoo
sys-devel/gcc:            4.9.3::gentoo, 5.4.0::gentoo
sys-devel/gcc-config:     1.7.3::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.4::gentoo (virtual/os-headers)
sys-libs/glibc:           2.23-r3::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: webrsync
    sync-uri: rsync://rsync2.cn.gentoo.org/gentoo-portage
    priority: -1000

hamper-overlay
    location: /var/lib/layman/hamper-overlay
    masters: gentoo
    priority: 50

pg_overlay
    location: /var/lib/layman/pg_overlay
    masters: gentoo
    priority: 50

tlp
    location: /var/lib/layman/tlp
    masters: gentoo
    priority: 50

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync webrsync-gpg xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://ftp.lanet.kr/pub/gentoo/ http://ftp.jaist.ac.jp/pub/Linux/Gentoo/ http://gentoo.aditsu.net:8000/ http://mirrors.xmu.edu.cn/gentoo http://ftp.iij.ad.jp/pub/linux/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X aac acl amd64 berkdb branding bzip2 cli consolekit cracklib crypt cxx dbus dri ffmpeg fortran gdbm glamor gnutls gtk gtk3 hscolour iconv infinality jpeg modules mpeg multilib ncurses networkmanager nls nptl opengl openmp openssl pam pcre perl png pulseaudio python readline sasl seccomp session ssl tcpd truetype unicode vaapi xattr zlib zsh-completion" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="synaptics evdev" KERNEL="linux" L10N="en bn bn-BD th" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="en en_US bn bn_BD th th_TH" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" RUBY_TARGETS="ruby21" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p 
iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON



I've also recompiled the kernel with all the Bluetooth support and firmware I had previously enabled removed, but the issue is still there, so the Bluetooth modules and firmware are not to blame.

My netbook is Bay Trail architecture, and I used to have crashes on Arch Linux before I switched to Gentoo. When I first switched to Gentoo I mostly stuck to the stable versions of software, which is probably why I never had those annoying crashes.

Please let me know what else I can do to help fix my issue.

Edit1:
I have also enabled support for these drivers; before, I just ignored them and didn't have crashes. Not sure whether they are to blame, though.
1) mei_txe module (8086:2298)
2) proc_thermal driver (8086:22dc)


Last edited by irenicus09 on Fri Mar 10, 2017 7:03 am; edited 5 times in total

Hu
Moderator
Joined: 06 Mar 2007
Posts: 13836

Posted: Sun Feb 26, 2017 8:58 pm

Please describe the random crashes. What exactly crashes? What do you see that leads you to think a crash happened? Does the system spontaneously reboot? Does the GUI freeze? Do specific programs die? Does the machine remain accessible over the network? Does magic sysrq cease to work (assuming it is configured to work when the system is not crashed)?
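
For reference, a minimal sketch of checking whether magic SysRq is enabled before the next freeze. The file /proc/sys/kernel/sysrq is the standard kernel interface; the helper name `sysrq_state` and its optional file argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# sysrq_state [FILE]: report whether magic SysRq is enabled.
# FILE defaults to /proc/sys/kernel/sysrq; passing another path is a
# hypothetical convenience so the helper can be tried on a sample file.
sysrq_state() {
    file="${1:-/proc/sys/kernel/sysrq}"
    value="$(cat "$file" 2>/dev/null)" || { echo "unknown"; return 1; }
    case "$value" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $value (some functions enabled)" ;;
    esac
}
```

If it reports "disabled", running `echo 1 > /proc/sys/kernel/sysrq` as root enables all functions until the next reboot, assuming the kernel was built with magic SysRq support.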

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Mon Feb 27, 2017 5:24 am

Sorry for not being more specific. I understand it is hard to debug my situation with so little information, so let me try again.

When the crash happens the whole screen freezes. It's similar to a blue screen of death in Windows, except there is no blue screen or message of any kind; everything just halts as if frozen in time. For example, if you were reading this post in Firefox, everything would freeze and become unresponsive even though the text is still in front of you. Pressing keys makes no difference, the touchpad / mouse is frozen as well, and switching to a virtual tty doesn't work. I'm using a window manager (i3) and its status bar freezes too.

I haven't tried connecting to the machine over the network, but the issue seems to have disappeared after I downgraded rxvt-unicode (which runs in daemon mode) and the i3 window manager to their stable versions. Before that I was using the latest git versions, so one of them may be the culprit. I need to use the system a bit more to determine whether the issue has really been resolved.

Any feedback or suggestions would be appreciated, and thanks for your time.

cboldt
l33t
Joined: 24 Aug 2005
Posts: 836

Posted: Mon Feb 27, 2017 10:39 am

From the more detailed description, sounds like a WM or X freeze, not a "whole system" freeze. If you can, have that machine on a network with the sshd service running (before its keyboard and mouse are unresponsive), and you should be able to get to the machine and shut down (kill) the offending program if or when it happens again.

I've had WM and X freezes take the keyboard, so nothing I did at the local machine "got through." Other than the screen freeze and keyboard/mouse takeover, the machine was running just fine. By having sshd running, I was able to get in and temporarily resolve the issue without rebooting.

Obviously this does not get to a root cause, but the running logs are available, as is the "running condition," like whether or not the offending problem is eating up clock cycles (run `top` while the machine is "locked up").
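
As a rough illustration of that check from an ssh session, here is a minimal sketch that ranks processes by CPU share. It assumes the procps `ps` with the `pcpu,pid,comm` output columns; the helper name `busiest` is hypothetical.

```shell
#!/bin/sh
# busiest [N]: read "%CPU PID COMMAND" lines on stdin and print the N rows
# with the highest CPU share (default 3). Hypothetical helper name.
busiest() {
    LC_ALL=C sort -k1,1nr | head -n "${1:-3}"
}

# Intended use while the display is frozen (run over ssh):
#   ps -eo pcpu,pid,comm | tail -n +2 | busiest
#   dmesg | tail -n 30
```

A window manager or X server pinned near 100% points at a busy loop, while an idle list suggests the process is blocked instead.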

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Mon Feb 27, 2017 12:10 pm

Thanks for the tip. ssh is indeed a good way to debug the issue, but at the moment I don't have the problem anymore. I'll keep it in mind for the future.

As for the crash, I also think the i3 window manager might be responsible. Back on Arch Linux I had crashes similar to the one I just described, and at that time I was using i3-wm but not urxvt, so by process of elimination I can only blame i3. :P

There was also a different type of crash in Arch Linux where the screen would flicker and fade to black, and then suddenly the computer would reset. The frustration of too much bleeding edge software and not having control over which version of software you want to have installed and stability issues made me try Gentoo. I guess that's a good thing as through pain and suffering I learned what is good and what is bad :)

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Wed Mar 01, 2017 2:25 pm

So this elusive bug shows up again :lol:

It stayed away for two straight days and then totally surprised me. This time I started sshd, connected to the box, and waited patiently. After a long wait the screen went black. It didn't freeze, so I'm not sure it's the same bug I was looking at.

Anyway, I was already ssh'd into the box, so I ran dmesg and this is the error I got:
Code:

[ 9938.390007] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe B (start=364272 end=364273) time 4 us, min 763, max 767, scanline start 760, end 768
[12256.139395] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun


After some Googling, I figured out a possible solution according to this link -> https://bugzilla.redhat.com/show_bug.cgi?id=1375399

The solution is to set i915.enable_psr=0; according to my grub config it was previously set to 1. I'm now testing whether this actually works.
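
For anyone repeating this, a minimal sketch of confirming that the parameter actually reached the running kernel. /proc/cmdline is the standard place to read the boot command line; the helper name `has_boot_param` and its optional file argument are hypothetical.

```shell
#!/bin/sh
# has_boot_param PARAM [FILE]: succeed if PARAM appears as a whole word in
# the kernel command line. FILE defaults to /proc/cmdline; the second
# argument exists only so the helper can be tried on a sample file.
has_boot_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -Fxq -- "$param"
}

# Example:
#   has_boot_param i915.enable_psr=0 && echo "PSR disabled on the command line"
```

Assuming GRUB2, the parameter would normally go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `grub-mkconfig -o /boot/grub/grub.cfg`. If the file /sys/module/i915/parameters/enable_psr is present, reading it as root shows the value the driver is using.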

Any feedback would be appreciated.

Thanks.

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6228
Location: Room 101

Posted: Wed Mar 01, 2017 3:08 pm

irenicus09 ...

if (as the above bug report suggests) it is a regression in 4.7 (fixed in 4.9) then you, on 4.4.39, shouldn't be hit by it. That said, I found 4.x series kernels so troublesome I went back to using 3.12.x ... though I haven't tried anything more recent than 4.4. If I were in your situation I would probably go back to 3.12.x or try 4.10.1.

best ... khay

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Wed Mar 01, 2017 3:59 pm

Hmm, I hit the bug after switching to kernel 4.9.6-r1. I'm not sure whether I encountered the same bug on 4.4.x, but as you recommend I will try a more stable older kernel (3.12.66).

As for 4.10.1, I've heard that it has some issues with luks+lvm; I'm not sure whether they are fixed.

Thanks for the suggestion.

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6228
Location: Room 101

Posted: Wed Mar 01, 2017 5:07 pm

irenicus09 wrote:
Hmm, I hit the bug after switching to kernel 4.9.6-r1. I'm not sure whether I encountered the same bug on 4.4.x, but as you recommend I will try a more stable older kernel (3.12.66).

irenicus09 ... I was basing that on your 'emerge --info' above, which has you running 4.4.39. Also, the keyworded (stable) package may not be the latest release in that kernel series. I would opt for the most recent (so, 3.12.70) rather than the one keyworded arch. I say that because the level of testing for package stabilisation with sys-kernel/* is minimal at best, and the 'stable' release according to k.org may include fixes and/or backports for CVEs, etc.

best ... khay

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Thu Mar 02, 2017 2:44 am

@khayyam ...

Thanks for letting me know that just because a kernel is listed as stable does not necessarily mean that they are well tested or more stable, I actually wasn't aware of that. From now on I will prefer the more recent, updated version of a kernel over something that has been marked as stable in gentoo-sources.

Another thing is, I installed 3.12.70 according to your recommendation and for some reason X failed to start with error. I'm not sure what exactly the error is related to but here is a log -> https://paste.pound-python.org/show/Khj0RoY8MvphTjQQB0dv/ and this is my 20-intel.conf file in /etc/X11/xorg.conf.d directory -> https://paste.pound-python.org/show/MWid1DJHPorxFP9W7YmW/

Perhaps it has something to do with my messing with the Xorg settings, not sure.

As for the kernel I'm currently running (4.9.6-r1), it seems to run perfectly fine and I haven't encountered a crash so far after changing the PSR setting (i915.enable_psr). I've also noticed one interesting feature in 4.9.x that I didn't observe in 4.4.x: CPU throttling (it showed up in dmesg as something about idle injection), which tries to reduce power dissipation and therefore heat when the temperature of the machine goes above about 65-70, especially when playing 4k / high resolution videos, even though it is a fanless tablet processor.

Thanks for your time.

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6228
Location: Room 101

Posted: Thu Mar 02, 2017 8:10 am

irenicus09 wrote:
Thanks for letting me know that just because a kernel is listed as stable does not necessarily mean that they are well tested or more stable, I actually wasn't aware of that. From now on I will prefer the more recent, updated version of a kernel over something that has been marked as stable in gentoo-sources.

irenicus09 ... you're welcome.

irenicus09 wrote:
Another thing is, I installed 3.12.70 according to your recommendation and for some reason X failed to start with error.

Can you provide the output of 'cat /proc/fb'.

irenicus09 wrote:
Perhaps has something to do with messing with Xorg settings, not sure.

I'm not sure, you may have omitted the inteldrmfb driver, or have some other framebuffer enabled. Your .config might be worth seeing (via pastebin). BTW, how are you creating the .config? You need to be a little careful when reusing a config from a previous kernel series (it's often best to start from scratch when jumping between major kernel versions).

irenicus09 wrote:
As for the kernel I'm currently running (4.9.6-r1), it seems to run perfectly fine and I haven't encountered a crash so far after changing the PSR setting (i915.enable_psr). I've also noticed one interesting feature in 4.9.x that I didn't observe in 4.4.x: CPU throttling (it showed up in dmesg as something about idle injection), which tries to reduce power dissipation and therefore heat when the temperature of the machine goes above about 65-70, especially when playing 4k / high resolution videos, even though it is a fanless tablet processor.

That is the advantage of running the most recent kernel version on recent HW, you get features that may only recently have been added. That alone would probably be a good reason to stick with >4.9.x.

best ... khay

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Thu Mar 02, 2017 10:07 am

@khayyam ...

Oops, you are on point about kernel configs not being compatible across major kernel versions. I wasn't aware of that :oops:

Anyway, since my issues have been fixed for now and the system looks stable, I'll stick to 4.9.x kernel and mark this thread as solved.

Thanks for all your help, really appreciate it! :)

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6228
Location: Room 101

Posted: Fri Mar 03, 2017 7:49 am

irenicus09 wrote:
Thanks for all your help, really appreciate it! :)

irenicus09 ... again, you're welcome. BTW, I think the issue with 4.4/i915 is bug 583522.

best ... khay

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Fri Mar 03, 2017 11:22 am

khayyam wrote:
irenicus09 wrote:
Thanks for all your help, really appreciate it! :)

irenicus09 ... again, you're welcome. BTW, I think the issue with 4.4/i915 is bug 583522.

best ... khay


Hmm, another bug showed up today. I was watching a random YouTube video using mpv with hardware acceleration when the screen suddenly froze, and the sound playing on the speaker got stuck in a 1-2s loop.

I tried connecting over ssh but it wasn't working. Later I set up the laptop again, made sure ssh was working and played long 4k videos, but the bug didn't happen.

Man these bugs are driving me crazy, as if they have a mind of their own lol.

I guess if I do end up finding it, I'll open a new thread.

Thanks.

Section_8
Guru
Joined: 22 May 2004
Posts: 566
Location: Arlington, TX, US

Posted: Fri Mar 03, 2017 4:58 pm

Ok - I'll try a totally different stab at this. With at least 2 video cards in the past, I've experienced random freezes like this, usually when playing a game, because the fan was in its death throes and the card was overheating. I seem to have a talent for buying video cards with crappy fans. Could this be something overheating?

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Sat Mar 04, 2017 2:52 am

Hmm, this is a fanless tablet chip (N3700). I think the issues are mostly not due to heat (though they could be) and have more to do with driver bugs. The crash doesn't only happen when the chip is running hot; it also happens in a semi-idle state at low temperature, when all I'm doing is browsing the web.

Last edited by irenicus09 on Sun Mar 05, 2017 6:37 pm; edited 1 time in total

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Sun Mar 05, 2017 6:28 pm

Sorry for double post, but I think I've finally figured out everything so let me summarize what I've found.

Hardware:
Intel Cherry View Graphics
Code:

00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:22b1] (rev 21)
   Subsystem: Acer Incorporated [ALI] Device [1025:100f]
   Kernel driver in use: i915


Symptoms:
1) Crash was due to the Intel driver / card
2) Crash only happened after resuming from suspend, maybe within the first 5-20 mins after waking up. The whole screen would freeze and become unresponsive.
3) When computer first boots, even playing demanding 4k videos with hardware acceleration didn't lead to crash. It only happened in (2).
4) Changing i915 parameters made little or no difference.

Solution:
Not yet found!

After the crash, connecting over the network (ssh) failed, so that wasted a lot of time. Under uxa it has worked fine so far, but it's hard to say until I've used it long enough. :P


Last edited by irenicus09 on Mon Mar 06, 2017 1:59 am; edited 1 time in total

szatox
Veteran
Joined: 27 Aug 2013
Posts: 1746

Posted: Sun Mar 05, 2017 11:28 pm

Quote:
Under uxa it works fine so far but hard to say until I've used it long enough
It's hard to say until it crashes :lol:
How did you find out what the problem is? Random crashes are a real pain to debug, since you can't reliably reproduce it. And there may not be any trace log available either at that point.

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Mon Mar 06, 2017 2:04 am

szatox wrote:
Quote:
Under uxa it works fine so far but hard to say until I've used it long enough
It's hard to say until it crashes :lol:
How did you find out what the problem is? Random crashes are a real pain to debug, since you can't reliably reproduce it. And there may not be any trace log available either at that point.


You are right. I jumped at the first opportunity when it stopped happening for a while, and today it crashed again! :P

Anyway, I won't be so quick to jump to conclusions until I've tested a solution over a long period of time. Switching to uxa didn't solve the problem, but the symptom is clear: it only crashes after waking up from suspend, or at least that's the most visible pattern to me at the moment.

Any suggestion would be welcome, thanks.

unheatedgarage
n00b
Joined: 19 Sep 2016
Posts: 51

Posted: Wed Mar 08, 2017 4:23 am

irenicus09,

Are you using a very big swap file or partition? Since you mentioned it happens after you wake from suspend, maybe there's an issue there?

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Wed Mar 08, 2017 8:26 am

@unheatedgarage: I just use suspend to RAM via 'pm-suspend' and I don't have a swap partition. I don't use hibernation / suspend to disk, so I guess a swap partition is not required in my case.

Alright, so with kernel version 4.10.1, waking up from suspend leaves a black screen and virtual ttys don't work. I didn't mess around with it further.

The most stable kernel that I've seen so far is 4.4.39. Waking up from suspend works fine. As for the crash after resume: I ran a stress test with 4k video, and within 5-10 mins of playing it the machine crashed. Otherwise the probability of a crash after waking up from suspend is fairly low, although once in a while I do get a random crash without doing anything GPU/CPU intensive.

Edit-1:
Alright so I decided to unset 'CONFIG_INT340X_THERMAL', 'CONFIG_INTEL_MEI_TXE' and 'CONFIG_INTEL_MEI' flags from my kernel config, as a result 'proc_thermal' and 'mei_txe' modules weren't loaded.
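
One way to confirm that the two drivers are really gone after such a rebuild is to look under /sys/bus/pci/drivers, which lists registered PCI drivers whether they are built in or modular. A minimal sketch; the helper name `pci_driver_present` and its optional second argument are hypothetical.

```shell
#!/bin/sh
# pci_driver_present NAME [ROOT]: succeed if a PCI driver called NAME is
# registered. ROOT defaults to /sys/bus/pci/drivers; overriding it is a
# hypothetical convenience for trying the helper on a sample directory.
pci_driver_present() {
    name="$1"
    root="${2:-/sys/bus/pci/drivers}"
    [ -d "$root/$name" ]
}

# Example:
#   for d in proc_thermal mei_txe; do
#       pci_driver_present "$d" && echo "$d is still registered"
#   done
```

`lspci -k -s 00:0b.0` gives the same answer from the device side: with no driver bound, the "Kernel driver in use" line is absent.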

Now after resuming from suspend to ram and 4k video stress test, no more crash so far...need to wait and see if it does eventually. I can't confirm though that having no crash is somehow related to unsetting those flags, need more time to experiment :P


Edit-2:

I have to do more testing, but I tried both with and without the 'CONFIG_INT340X_THERMAL' flag, which is responsible for the 'proc_thermal' module. Without this flag it hasn't crashed so far; with it enabled, it does.

Any ideas?

Thanks.

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Fri Mar 10, 2017 3:48 am

Sorry for the double post, but after consistent observation the most probable cause of the crash comes down to having 'CONFIG_INT340X_THERMAL' set in the kernel config.

I don't really know why that is the case, but I have one possible explanation. When playing 4k videos in mpv (high settings + hardware acceleration using vaapi), the temperature shoots up very easily (60-70) because there is no active cooling (fanless processor), so a bug in the 'proc_thermal' module is probably responsible for the crash as it goes into overdrive trying to cool down the system. Sometimes a random crash also happens when not doing any CPU/GPU intensive task, but that is less frequent. Any other views or explanations would be welcome.
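
To watch those temperatures from a shell (for example over ssh during a 4k playback test), here is a minimal sketch that reads the kernel's thermal zones from sysfs. The `temp` files report millidegrees Celsius; which zones exist depends on the thermal drivers in the kernel, and the helper names `millideg_to_c` and `print_zones` are hypothetical.

```shell
#!/bin/sh
# millideg_to_c VALUE: convert a sysfs temperature (millidegrees Celsius)
# to whole degrees Celsius using integer division.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# print_zones [ROOT]: print "<zone type> <degrees C>" for each readable
# thermal zone. ROOT defaults to /sys/class/thermal; overriding it is a
# hypothetical convenience for trying the helper on sample data.
print_zones() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        temp="$(cat "$zone/temp" 2>/dev/null)" || continue
        [ -n "$temp" ] || continue
        kind="$(cat "$zone/type" 2>/dev/null)"
        printf '%s %s\n' "${kind:-unknown}" "$(millideg_to_c "$temp")"
    done
}
```

Running `print_zones` in a loop with `sleep 5` while the video plays shows whether a freeze lines up with a temperature peak.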

This is my current kernel config for those that are curious -> https://paste.pound-python.org/show/wMLK7ImYbVHK4ZeO8lUJ/

Conclusion: So far (3-4 days), without the flag 'CONFIG_INT340X_THERMAL' set it didn't result in a crash even once (including 4k video stress test) which is good news! :P

irenicus09
Tux's lil' helper
Joined: 07 Jun 2013
Posts: 118

Posted: Thu Apr 06, 2017 8:49 am

Sorry for bumping the thread. It has been almost a month and I can confirm that I have observed zero crashes so far. It's as rock solid as ever, which is what I had come to expect from Gentoo :P

I don't want to jump to any conclusions, but let me put it out there what has changed.

1) Disabled 'CONFIG_MMC', 'CONFIG_INT340X_THERMAL', 'CONFIG_INTEL_MEI', 'CONFIG_INTEL_MEI_ME', 'CONFIG_INTEL_MEI_TXE'
2) Disabled Bluetooth drivers and support completely ('CONFIG_BT').
3) Switched to modesetting driver from 'xf86-video-intel' and rebuilt xorg-server accordingly.
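
To double-check that options like the ones listed above are really off in the config a kernel was built from, a minimal sketch follows. It assumes the config lives at /usr/src/linux/.config (adjust if yours differs); the helper name `kconfig_state` is hypothetical.

```shell
#!/bin/sh
# kconfig_state SYMBOL [FILE]: print the value of a kernel config symbol
# ("y", "m", or a literal value), or "not set" when no SYMBOL= line exists.
# FILE defaults to /usr/src/linux/.config, which is an assumption.
kconfig_state() {
    symbol="$1"
    file="${2:-/usr/src/linux/.config}"
    line="$(grep -E "^${symbol}=" "$file" 2>/dev/null)"
    if [ -n "$line" ]; then
        echo "${line#*=}"
    else
        echo "not set"
    fi
}

# Example:
#   for s in CONFIG_MMC CONFIG_INT340X_THERMAL CONFIG_INTEL_MEI CONFIG_BT; do
#       echo "$s: $(kconfig_state "$s")"
#   done
```

Matching on `SYMBOL=` keeps CONFIG_INTEL_MEI from also matching CONFIG_INTEL_MEI_TXE, and disabled options, which appear as "# CONFIG_X is not set" comments, fall through to "not set".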


This is my current kernel config: https://paste.pound-python.org/show/CpeOcwUOgSX1WhvuYf6W/

Hope this helps!

All times are GMT