Gentoo Forums
System Lockups / Multiple Computers / High IOWait
Gentoo Forums Forum Index » Kernel & Hardware
mud99
n00b
Joined: 03 Nov 2006
Posts: 2

Posted: Wed Mar 26, 2014 3:42 pm    Post subject: System Lockups / Multiple Computers / High IOWait

Hey guys,
Five servers that I administer have been crashing repeatedly. We run a memory- and CPU-hungry application on them, and after a day or so of running, a machine slowly becomes unresponsive and its processes start to lock up.

This has been happening on every kernel from at least 3.7.x up to the latest, 3.13.6.

I've tracked the issue down to a deadlock related to IOWait: one CPU gets stuck in iowait, and anything scheduled to run on that CPU hangs. Here is the bizarre output of iostat. Note the zero I/O on every device but 12.5% iowait (which means 1 of 8 cores is 100% hung):

Code:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   12.50    0.00   87.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
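The 12.5% figure follows from averaging: the avg-cpu line reports the mean over all CPUs, so, assuming each core accumulates the same number of ticks over the sampling interval, one core at 100% iowait and seven idle cores give exactly the numbers above:

```latex
\%\text{iowait}_{\text{avg}} = \frac{1}{N}\sum_{i=0}^{N-1} \%\text{iowait}_i
  = \frac{100\% + 7 \times 0\%}{8} = 12.5\%,
\qquad
\%\text{idle}_{\text{avg}} = \frac{0\% + 7 \times 100\%}{8} = 87.5\%
```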


Here is the per-CPU usage from top; note the hung CPU (Cpu3, at 100% wa):
Code:

Tasks: 176 total,   1 running, 174 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
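To spot the stalled core without an interactive top session, a minimal Python sketch can diff two /proc/stat snapshots and compute each CPU's iowait share. It assumes the per-CPU counter layout documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest counters that are already included in user/nice); the function names and the 95% cutoff are hypothetical, not part of any existing tool.

```python
"""Sketch: find CPUs stuck in iowait by diffing two /proc/stat snapshots.

Assumed per-CPU line layout (proc(5)):
  cpuN user nice system idle iowait irq softirq steal [guest guest_nice]
guest/guest_nice are already counted in user/nice, so only the first
eight counters are summed for the total.
"""
import time

IOWAIT = 4  # index of the iowait counter after the "cpuN" label


def parse_per_cpu(stat_text):
    """Return {"cpuN": [8 counters]} for each per-CPU line."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...).
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:9]]
    return cpus


def iowait_percent(before, after):
    """Per-CPU share of time (0-100) spent in iowait between two snapshots."""
    b, a = parse_per_cpu(before), parse_per_cpu(after)
    result = {}
    for cpu, counters in a.items():
        if cpu not in b:
            continue  # CPU came online between snapshots: no baseline
        deltas = [x - y for x, y in zip(counters, b[cpu])]
        total = sum(deltas)
        result[cpu] = 100.0 * deltas[IOWAIT] / total if total > 0 else 0.0
    return result


def stuck_cpus(before, after, threshold=95.0):
    """CPUs whose iowait share is at least `threshold` percent (hypothetical cutoff)."""
    pct = iowait_percent(before, after)
    return sorted((c for c, p in pct.items() if p >= threshold),
                  key=lambda c: int(c[3:]))


def sample(interval=2.0):
    """Take two real snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return iowait_percent(before, after)
```

Run from the shell pinned to cpu0, `sample()` on the machine above would be expected to show cpu3 near 100% and the other cores near 0%.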


I've managed to catch one machine in this state right now. Setting the processor affinity of bash to cpu0 is letting me keep access to it, but I'm not sure how to diagnose it further. Any idea how to core dump the kernel, or find out which line of code it is stuck on? Usually the machine goes into a spiral of death so fast that I can't connect to it; this is the first time I've caught one partially crashed with a console still working. I don't have any kernel debugging options compiled in.
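One possible first step from the cpu0-pinned shell is to list the threads in uninterruptible sleep (state D), since iowait is typically charged to an idle CPU while a task that last ran there is blocked waiting on I/O, together with the CPU each last ran on and its wchan. The Python below is a minimal sketch under assumptions: it relies on the /proc layout described in proc(5) (field 39 of the stat file is the CPU last executed on), the helper names are hypothetical, and `trigger_sysrq()` only works as root on a kernel built with CONFIG_MAGIC_SYSRQ, where 'w' dumps blocked tasks and 'l' dumps backtraces of all active CPUs to the kernel log.

```python
"""Sketch: list D-state threads, the CPU each last ran on, and its wchan.

Assumes the Linux /proc layout from proc(5). Helper names are
hypothetical; some files (wchan, sysrq-trigger) may need root.
"""
import os


def parse_stat(line):
    """Return (pid, comm, state, last_cpu) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    rest = line[close_paren + 2:].split()
    state = rest[0]            # field 3
    last_cpu = int(rest[36])   # field 39: CPU number last executed on
    return pid, comm, state, last_cpu


def read_text(path):
    """Best-effort read: threads can exit mid-scan and some files need root."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def blocked_tasks(proc="/proc"):
    """Yield (tid, comm, last_cpu, wchan) for every thread in state D."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            line = read_text(os.path.join(task_dir, tid, "stat"))
            if line is None:
                continue
            tid_num, comm, state, last_cpu = parse_stat(line)
            if state == "D":
                wchan = read_text(os.path.join(task_dir, tid, "wchan")) or "?"
                yield tid_num, comm, last_cpu, wchan


def pin_to_cpu0():
    """Keep this diagnostic process off the stalled core (like taskset -c 0)."""
    os.sched_setaffinity(0, {0})


def trigger_sysrq(key):
    """Root only, needs CONFIG_MAGIC_SYSRQ: 'w' dumps blocked tasks and
    'l' dumps backtraces of all active CPUs to the kernel log (see dmesg)."""
    with open("/proc/sysrq-trigger", "w") as f:
        f.write(key)
```

For example, calling `pin_to_cpu0()` and then printing each tuple from `blocked_tasks()` should show whether the stuck threads all last ran on cpu3 and, where the kernel exposes symbol names, which kernel function they are sleeping in. If /proc/<pid>/task/<tid>/stack is available (it needs CONFIG_STACKTRACE and root), reading it for those tids gives the full kernel call chain rather than a single wchan symbol.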
All times are GMT