Gentoo Forums
[SOLVED] Comm: swapper/1 Not tainted
mcbarlo
Tux's lil' helper


Joined: 18 Jul 2005
Posts: 130

Posted: Sat May 10, 2014 6:21 pm    Post subject: [SOLVED] Comm: swapper/1 Not tainted

Several times a day I see this in the logs:

Code:
May  9 09:46:30 bgp kernel: NMI backtrace for cpu 1
May  9 09:46:30 bgp kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.12.13-gentoo #5
May  9 09:46:30 bgp kernel: Hardware name: IBM IBM System X3250 M4 -[2583E1G]-/00D3729, BIOS -[JQE142CUS-1.01]- 05/14/2012
May  9 09:46:30 bgp kernel: task: ffff88007eb73d50 ti: ffff88007eb9e000 task.ti: ffff88007eb9e000
May  9 09:46:30 bgp kernel: RIP: 0010:[<ffffffff812ab127>]  [<ffffffff812ab127>] intel_idle+0xc7/0x130
May  9 09:46:30 bgp kernel: RSP: 0018:ffff88007eb9fdf8  EFLAGS: 00000046
May  9 09:46:30 bgp kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
May  9 09:46:30 bgp kernel: RDX: 0000000000000000 RSI: ffff88007eb9ffd8 RDI: 0000000000000001
May  9 09:46:30 bgp kernel: RBP: ffff88007eb9fe28 R08: 0000000000001fd5 R09: 0000000000000018
May  9 09:46:30 bgp kernel: R10: 000000000000351f R11: 0000000000008d59 R12: 0000000000000004
May  9 09:46:30 bgp kernel: R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff817ac678
May  9 09:46:30 bgp kernel: FS:  0000000000000000(0000) GS:ffff88007ee80000(0000) knlGS:0000000000000000
May  9 09:46:30 bgp kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 09:46:30 bgp kernel: CR2: 00007fe831f07030 CR3: 000000000176d000 CR4: 00000000000407e0
May  9 09:46:30 bgp kernel: Stack:
May  9 09:46:30 bgp kernel: ffff88007eb9fe28 000000018107cedd ffff88007ee98300 ffffffff817ac500
May  9 09:46:30 bgp kernel: 00002a71dc2520e1 0000000000000004 ffff88007eb9fe88 ffffffff8142991a
May  9 09:46:30 bgp kernel: 000000000000001f 0000000002333743 000000000000001f 0000000002333743
May  9 09:46:30 bgp kernel: Call Trace:
May  9 09:46:30 bgp kernel: [<ffffffff8142991a>] cpuidle_enter_state+0x4a/0xd0
May  9 09:46:30 bgp kernel: [<ffffffff81429a3e>] cpuidle_idle_call+0x9e/0x150
May  9 09:46:30 bgp kernel: [<ffffffff8100a409>] arch_cpu_idle+0x9/0x20
May  9 09:46:30 bgp kernel: [<ffffffff81076551>] cpu_startup_entry+0x91/0x170
May  9 09:46:30 bgp kernel: [<ffffffff81028afa>] start_secondary+0x19a/0x1f0
May  9 09:46:30 bgp kernel: Code: 48 8b 34 25 b0 b7 00 00 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <85> 1d 6b 17 50 00 75 0e 48 8d 75 dc bf 05 00 00 00 e8 03 80 dd
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired


When this happens the load is very high, although the CPU, disks etc. are doing nothing. The worst part is that the BGP sessions are dropping. Could this be a hardware problem?
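
To check whether the NMI backtraces and the BIRD hold-timer expiries really coincide, both can be counted per minute. The following is only a minimal sketch, assuming a classic syslog text file in the format shown above; the function name `count_events` and any log path passed to it are illustrative and not from the original post.

```shell
#!/bin/sh
# count_events LOGFILE  (hypothetical helper, not part of any tool)
# Prints one line per minute in which at least one NMI backtrace or
# BGP "Hold timer expired" message was logged, with both counts.
count_events() {
    awk '
        # Syslog lines start with "Mon D HH:MM:SS"; keep "Mon D HH:MM" as the key.
        { minute = $1 " " $2 " " substr($3, 1, 5) }
        /NMI backtrace for cpu/ { nmi[minute]++;  seen[minute] = 1 }
        /Hold timer expired/    { hold[minute]++; seen[minute] = 1 }
        END {
            for (m in seen)
                printf "%s nmi=%d hold=%d\n", m, nmi[m], hold[m]
        }
    ' "$1" | sort
}
```

It could be called as `count_events /var/log/messages`, where the path is an assumption that depends on the syslog configuration. Minutes in which both counters are non-zero suggest that the sessions drop at the same time as the NMI is raised.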


Last edited by mcbarlo on Wed May 14, 2014 10:07 am; edited 1 time in total
blu3bird
Retired Dev


Joined: 04 Oct 2003
Posts: 612
Location: Munich, Germany

Posted: Sat May 10, 2014 7:48 pm    Post subject: Re: Comm: swapper/1 Not tainted

mcbarlo wrote:
Code:
May  9 09:46:30 bgp kernel: NMI backtrace for cpu 1

Do you know what's causing the NMI (http://en.wikipedia.org/wiki/Non-maskable_interrupt)? Does your server have some sort of IML?
_________________
Black Holes are created when God divides by zero!
mcbarlo
Tux's lil' helper


Joined: 18 Jul 2005
Posts: 130

Posted: Sun May 11, 2014 9:25 am

Unfortunately, I don't know.
aCOSwt
Bodhisattva


Joined: 19 Oct 2007
Posts: 2537
Location: Hilbert space

Posted: Sun May 11, 2014 9:39 am

Hmm... the hardware watchdog timer, I presume.

I'm sure you know this one: http://publib.boulder.ibm.com/infocenter/systemx/documentation/index.jsp?topic=/com.ibm.sysx.2583.doc/c_using_imm.html

Could you post the output of
Code:
grep NMI /usr/src/linux/.config
?

And BTW, check your BIOS version. Several of the fixes concern NMIs: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5086587
mcbarlo
Tux's lil' helper


Joined: 18 Jul 2005
Posts: 130

Posted: Sun May 11, 2014 10:16 am

Code:
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y


The IMM is not configured (default settings).

I will check the BIOS version when I have physical access to the server. One of my x3250s has an updated BIOS, but I'm not sure about this one.

What should I do? Turn off the watchdog in the IMM?
aCOSwt
Bodhisattva


Joined: 19 Oct 2007
Posts: 2537
Location: Hilbert space

Posted: Sun May 11, 2014 11:34 am

mcbarlo wrote:
What should I do? Turn off watchdog in IMM?

8O :?
Hmm... only you can answer that one, I presume.
Of course, if *you* decide that *you* do not care about detecting hangs, *you can* disable the watchdog.

http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaai.crashdump%2Fliaaicrashdumpnmiwatch.htm

BTW, whether you should or should not care about detecting hangs is not something I can decide for you here.
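
Separately from the IMM hardware watchdog, the kernel has its own NMI watchdog, and when the lockup detector is built in its state is exposed in `/proc/sys/kernel/nmi_watchdog`. The sketch below only reports that state; the function name `nmi_watchdog_status` and the optional path argument are illustrative additions, and nothing in this thread confirms which of the two watchdogs raised the NMI.

```shell
#!/bin/sh
# nmi_watchdog_status [FILE]  (hypothetical helper)
# Reports the state of the kernel NMI watchdog.
# FILE defaults to the real procfs entry; the argument exists only
# so the function can be tried against a test file.
nmi_watchdog_status() {
    f="${1:-/proc/sys/kernel/nmi_watchdog}"
    if [ ! -r "$f" ]; then
        # Entry missing: the lockup detector is probably not compiled in.
        echo "unavailable"
        return 0
    fi
    case "$(cat "$f")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}
```

If it reports `enabled`, the watchdog can be switched off at runtime with `sysctl kernel.nmi_watchdog=0`, at the cost of no longer detecting hard lockups, which is exactly the trade-off described above.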
mcbarlo
Tux's lil' helper


Joined: 18 Jul 2005
Posts: 130

Posted: Sun May 11, 2014 2:37 pm

OK, I understand. I will move the disks to another server. That should tell me whether it is a hardware or a software problem. Thank you for your reply.
mcbarlo
Tux's lil' helper


Joined: 18 Jul 2005
Posts: 130

Posted: Wed May 14, 2014 10:06 am

I think I solved the problem: I turned off the C1E state in the BIOS. The router has now been running for about 30 hours without any problems.
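
Since the backtrace above ends in `intel_idle`, the idle states the kernel actually uses can be listed from sysfs, for example before and after changing the BIOS setting. This is a minimal sketch that assumes the standard cpuidle sysfs layout; the function name `list_cstates` and its optional directory argument are illustrative and not from the thread.

```shell
#!/bin/sh
# list_cstates [DIR]  (hypothetical helper)
# Prints "stateN NAME" for every idle state the cpuidle driver exposes.
# DIR defaults to the cpuidle directory of CPU 0.
list_cstates() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for d in "$base"/state*; do
        # Skip the unexpanded glob or unreadable entries.
        [ -r "$d/name" ] || continue
        printf '%s %s\n' "${d##*/}" "$(cat "$d/name")"
    done
}
```

A possible software-side alternative, not tested in this thread and not necessarily equivalent to the BIOS switch, would be to limit the deepest state with the `intel_idle.max_cstate=` kernel boot parameter.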