Gentoo Forums
How to troubleshoot hanging Gentoo system?
mutiny
n00b


Joined: 06 Aug 2014
Posts: 14

Posted: Tue Mar 31, 2015 4:52 am    Post subject: How to troubleshoot hanging Gentoo system?

Hello all.

I have been experiencing a system hang for a few days now, and have no idea how to diagnose the cause or where to even begin troubleshooting.

Basically, I have been waking up to a complete system hang every morning, after the system has been idle overnight. The keyboard does not respond (the Num Lock LED is on, but pressing Num Lock, Caps Lock, etc. produces no response; the keyboard seems dead), the mouse seems to have no power, and the monitors are stuck in an off/idle state. However, I am able to use Magic SysRq key combinations to recover keyboard input and reboot the system (REISUB works). I cannot get any video output after Alt+SysRq+R, no matter what I try, such as Ctrl+Alt+F1. The system also appears to go offline in this state, as other machines cannot ping it or ssh into it.

I'm not sure if this is a video driver issue, a kernel issue, etc. I have run long-term memtest to test the RAM, as well as some intensive CPU/system tests like mprime, to try to rule out hardware issues. The hang only happens after some period of idle, and has never happened during active use of the system. Are there logs I can check? Additional tests I can perform? How do I go about figuring out what is going on and fixing this issue?

System is ~amd64 with a 3.19.3 kernel and systemd (because I'm using GNOME 3)
Video card is Nvidia GTX 650 with nouveau driver
CPU is Core i7-4790K

Thanks for any ideas and assistance!
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7414
Location: almost Mile High in the USA

Posted: Tue Mar 31, 2015 1:54 pm

Try disabling power saving and see if it still does this?
If it oopsed/panicked it should be blinking the Numlock/Capslock LEDs so this is weird.

Is there any information in the journal (journalctl) that could be interesting at the estimated time of the crash?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
mutiny
n00b


Joined: 06 Aug 2014
Posts: 14

Posted: Tue Mar 31, 2015 7:01 pm

I'm not sure I have any power saving features enabled, except for blanking the screen after 15 minutes in GNOME's settings.

How do I view the journal for a particular time?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7414
Location: almost Mile High in the USA

Posted: Tue Mar 31, 2015 7:15 pm

You could do something like:
# journalctl --since "2015-03-25 00:00:00" --until "2015-03-25 01:00:00"

See if there is anything interesting, oopses, etc. Then again, if it crashed and couldn't sync, likely nothing will be recorded there either.
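The time-window query above can be narrowed to kernel messages of priority "err" or worse. This is only a sketch: the function name kernel_errors_between and the JOURNALCTL override variable are made up for illustration (the override just allows a dry run). It uses the _TRANSPORT=kernel match rather than -k, because -k also implies -b (current boot only), and after a reset the hang belongs to the previous boot.

```shell
# Show kernel messages of priority "err" or worse between two timestamps.
# JOURNALCTL is a hypothetical override hook; by default the real journalctl runs.
# Usage: kernel_errors_between "2015-03-25 00:00:00" "2015-03-25 01:00:00"
kernel_errors_between() {
    "${JOURNALCTL:-journalctl}" --no-pager -p err \
        --since "$1" --until "$2" _TRANSPORT=kernel
}
```

This only helps if the journal is stored persistently and the messages reached the disk before the reset.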

For power saving, try disabling screen blanking for one test (actually, it's best to keep it disabled so you can see any problems that show up if they do), and for another test disable any CPU throttling: let it run at "performance".
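Forcing the "performance" governor could look roughly like this. It assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu and root privileges; the function name set_governor and its optional second argument (an alternative sysfs root, useful for a dry run) are illustrative, not an existing tool.

```shell
# Write the given governor name to every CPU's scaling_governor file.
# $1 = governor (e.g. performance), $2 = sysfs CPU root (defaults to the real one).
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # skip CPUs without cpufreq support
        printf '%s\n' "$gov" > "$f"
    done
}
```

Run it as root with "set_governor performance". For the blanking test in an X session, "xset s off -dpms" turns off the screen saver and DPMS for that session, though GNOME's own blanking setting may also need to be switched off.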

Is this repeatable? Does it fail after every idle period?

Will it fail if Ethernet is disconnected?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
Ant P.
Watchman


Joined: 18 Apr 2009
Posts: 6347

Posted: Tue Mar 31, 2015 7:53 pm

Enable pstore and efivars pstore backend in the kernel, then next time it hangs you can reboot, mount /sys/fs/pstore and get the dmesg log from when it crashed.
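A rough sketch of the reading side, after the reboot. The kernel option names in the comments are the ones I would expect on 3.x kernels (check menuconfig), the helper name show_pstore_dumps is made up, and the dmesg-* file naming is an assumption about how the backend names its records. Note that pstore normally only saves a log when the kernel actually oopses or panics.

```shell
# Kernel options assumed: CONFIG_PSTORE=y and CONFIG_EFI_VARS_PSTORE=y
# If /sys/fs/pstore is not mounted after boot, mount it as root:
#   mount -t pstore pstore /sys/fs/pstore
# Print every saved kernel log found in a pstore directory.
show_pstore_dumps() {
    dir=${1:-/sys/fs/pstore}
    found=0
    for f in "$dir"/dmesg-*; do
        [ -f "$f" ] || continue
        found=1
        printf '===== %s =====\n' "$f"
        cat "$f"
    done
    [ "$found" -eq 1 ] || echo "no saved dumps in $dir"
}
```

Records stay in the backing store until they are deleted, so removing the files from /sys/fs/pstore after copying them elsewhere frees the space for the next crash.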
mutiny
n00b


Joined: 06 Aug 2014
Posts: 14

Posted: Wed Apr 01, 2015 7:09 pm

Thanks for the information.

After last night's hang, I tried something this morning. After Alt+SysRq+R and getting the keyboard back, I tried Alt+SysRq+K, and after a few minutes the monitors came back on with what looked like some error lines on the console. I wasn't able to catch or record what was there... the monitors went blank again and the system "hung" again.
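Since SysRq still responds during the hang, it may also be possible to have the kernel dump task state into its log before rebooting (Alt+SysRq+W for blocked tasks, Alt+SysRq+T for all tasks), followed by Alt+SysRq+S to sync. Whether those keys are permitted depends on the value in /proc/sys/kernel/sysrq. A small sketch for checking that mask; sysrq_allows is a hypothetical helper, and the bit values are the ones listed in the kernel's sysrq documentation.

```shell
# Succeed if a sysrq mask value permits the given function bit.
# Mask: 0 = all disabled, 1 = all enabled, otherwise a bitmask
# (8 = debugging dumps, 16 = sync, 32 = remount read-only, 128 = reboot).
# Usage: sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 8 && echo "task dumps allowed"
sysrq_allows() {
    mask=$1 bit=$2
    [ "$mask" -eq 1 ] && return 0
    [ $(( mask & bit )) -ne 0 ]
}
```

The dumps land in the kernel log, so they only survive the reboot if the log reaches disk (or pstore) first.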

Checked journalctl from last night to this morning:

Code:
-- Logs begin at Mon 2015-03-30 17:45:32 HST, end at Wed 2015-04-01 09:04:16 HST. --
Apr 01 00:00:00 renoir gnome-session[964]: (evolution-alarm-notify:1127): evolution-alarm-notify-WARNING **: alarm.c:253: Reques
Apr 01 07:35:23 renoir kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0004326000 [PTE] from GR/GPC0/T1_0 on channel 0
Apr 01 07:35:23 renoir kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 5, recovering...
Apr 01 07:35:23 renoir kernel: nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 5 [0x003f955000 gnome-shell[1055]]
Apr 01 07:35:23 renoir kernel: nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC0/TEX: 0x80000049
Apr 01 07:35:23 renoir kernel: nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC1/TEX: 0x80000049
Apr 01 07:36:14 renoir synergys[969]: Synergy 1.7.0: NOTE: client "nemesis" is dead
Apr 01 07:40:36 renoir dbus[733]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.f
Apr 01 07:40:36 renoir systemd[1]: Starting Network Service...
Apr 01 07:40:36 renoir systemd[1]: Starting Network Manager Script Dispatcher Service...
Apr 01 07:40:36 renoir dbus[733]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Apr 01 07:40:36 renoir systemd[1]: Started Network Manager Script Dispatcher Service.
Apr 01 07:40:36 renoir nm-dispatcher[6900]: Dispatching action 'dhcp4-change' for eno1
Apr 01 07:40:36 renoir dhclient[930]: bound to 192.168.67.12 -- renewal in 35102 seconds.
Apr 01 07:40:36 renoir systemd-timesyncd[728]: Network configuration changed, trying to establish connection.
Apr 01 07:40:36 renoir systemd-networkd[6899]: Enumeration completed
Apr 01 07:40:36 renoir systemd[1]: Started Network Service.
Apr 01 07:40:36 renoir systemd-timesyncd[728]: Network configuration changed, trying to establish connection.


It seems to be fairly repeatable: after every long idle session, the system will very likely be hung. I'll try tonight with Ethernet disconnected and monitors left on. Judging from the journal entries, nouveau seems to have a problem?
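To see at a glance which GPU engines nouveau complains about across several nights, the error lines can be counted per engine. A sketch, assuming the "nouveau E[ ENGINE]" line format shown in the journal excerpt above; nouveau_fault_summary is a made-up helper name.

```shell
# Count nouveau error lines per engine (PFIFO, PGRAPH, ...) from log text on stdin.
# Usage: journalctl --no-pager _TRANSPORT=kernel | nouveau_fault_summary
nouveau_fault_summary() {
    grep -F 'nouveau E[' |
        sed -E 's/.*nouveau E\[ *([A-Z0-9]+)\].*/\1/' |
        sort | uniq -c
}
```

Fed the excerpt above, this should report two PFIFO lines and three PGRAPH lines.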

I think I may also try replacing the Nvidia card with an older Radeon card I have lying around and switching to the radeon driver, to see if it is a driver/GPU hardware issue.
All times are GMT