Gentoo Forums
CMCI storm detected: switching to poll mode
P.Kosunen
Guru


Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Mon Sep 04, 2017 8:09 am    Post subject: CMCI storm detected: switching to poll mode

Code:
Sep  2 18:03:54 shuttle kernel: [708926.214851] CMCI storm detected: switching to poll mode
Sep  2 18:03:54 shuttle kernel: [710011.679941] INFO: rcu_sched self-detected stall on CPU
Sep  2 18:03:54 shuttle kernel: [710011.679948] ^I0-...: (2 GPs behind) idle=cc6/140000000000001/0 softirq=10536764/10536766 fqs=0
Sep  2 18:03:54 shuttle kernel: [710011.679949] ^I (t=1294978 jiffies g=5107773 c=5107772 q=62979)
Sep  2 18:03:54 shuttle kernel: [710011.679952] rcu_sched kthread starved for 1294978 jiffies! g5107773 c5107772 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
Sep  2 18:03:54 shuttle kernel: [710011.679954] rcu_sched       S15088     8      2 0x00000000
Sep  2 18:03:54 shuttle kernel: [710011.679959] Call Trace:
Sep  2 18:03:54 shuttle kernel: [710011.679968]  ? __schedule+0x1ef/0x430
Sep  2 18:03:54 shuttle kernel: [710011.679970]  ? schedule+0x2d/0x80
Sep  2 18:03:54 shuttle kernel: [710011.679971]  ? schedule_timeout+0xf3/0x170
Sep  2 18:03:54 shuttle kernel: [710011.679975]  ? mod_timer+0x180/0x180
Sep  2 18:03:54 shuttle kernel: [710011.679977]  ? rcu_accelerate_cbs+0x36/0x190
Sep  2 18:03:54 shuttle kernel: [710011.679978]  ? rcu_gp_kthread+0x489/0x7b0
Sep  2 18:03:54 shuttle kernel: [710011.679981]  ? prepare_to_swait_event+0x1a/0x40
Sep  2 18:03:54 shuttle kernel: [710011.679982]  ? rcu_gp_kthread+0x489/0x7b0
Sep  2 18:03:54 shuttle kernel: [710011.679984]  ? kthread+0xf2/0x130
Sep  2 18:03:54 shuttle kernel: [710011.679986]  ? synchronize_rcu_expedited+0x10/0x10
Sep  2 18:03:54 shuttle kernel: [710011.679987]  ? kthread_create_on_node+0x40/0x40
Sep  2 18:03:54 shuttle kernel: [710011.679989]  ? ret_from_fork+0x22/0x30
Sep  2 18:03:54 shuttle kernel: [710011.679993] NMI backtrace for cpu 0
Sep  2 18:03:54 shuttle kernel: [710011.679996] CPU: 0 PID: 2491 Comm: cw_process Not tainted 4.12.5-gentoo #2
Sep  2 18:03:54 shuttle kernel: [710011.679997] Hardware name: Shuttle Inc. DX30D/FDX30, BIOS 1.02 02/15/2017
Sep  2 18:03:54 shuttle kernel: [710011.679997] Call Trace:
Sep  2 18:03:54 shuttle kernel: [710011.679998]  <IRQ>
Sep  2 18:03:54 shuttle kernel: [710011.680002]  ? dump_stack+0x46/0x61
Sep  2 18:03:54 shuttle kernel: [710011.680004]  ? nmi_cpu_backtrace+0x8a/0x90
Sep  2 18:03:54 shuttle kernel: [710011.680006]  ? irq_force_complete_move+0xe0/0xe0
Sep  2 18:03:54 shuttle kernel: [710011.680008]  ? nmi_trigger_cpumask_backtrace+0x86/0xc0
Sep  2 18:03:54 shuttle kernel: [710011.680009]  ? rcu_dump_cpu_stacks+0x88/0xc1
Sep  2 18:03:54 shuttle kernel: [710011.680011]  ? rcu_check_callbacks+0x642/0x780
Sep  2 18:03:54 shuttle kernel: [710011.680013]  ? update_wall_time+0x474/0x720
Sep  2 18:03:54 shuttle kernel: [710011.680015]  ? update_process_times+0x23/0x50
Sep  2 18:03:54 shuttle kernel: [710011.680016]  ? tick_sched_timer+0x3d/0x130
Sep  2 18:03:54 shuttle kernel: [710011.680018]  ? __hrtimer_run_queues+0xb5/0x120
Sep  2 18:03:54 shuttle kernel: [710011.680019]  ? hrtimer_interrupt+0x9d/0x1e0
Sep  2 18:03:54 shuttle kernel: [710011.680022]  ? smp_trace_apic_timer_interrupt+0x59/0x90
Sep  2 18:03:54 shuttle kernel: [710011.680024]  ? apic_timer_interrupt+0x7f/0x90
Sep  2 18:03:54 shuttle kernel: [710011.680024]  </IRQ>
Sep  2 18:03:54 shuttle kernel: klogd 1.5.1, ---------- state change ----------
Sep  2 18:03:54 shuttle kernel: Loaded 57659 symbols from 13 modules.
Sep  2 18:03:54 shuttle kernel: [710011.682343] Hangcheck: hangcheck value past margin!
Sep  2 18:09:23 shuttle kernel: [710340.172989] CMCI storm subsided: switching to interrupt mode


I got this error on a new Shuttle XPC Slim DX30 with an Intel Celeron J3355 CPU and a Corsair 8GB memory kit (CMSO8GX3M2C1600C11). Is this incompatible or faulty memory, or something else? The clock was off by several hours, and the machine couldn't reboot cleanly the next morning.
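
To get a rough idea of how long a storm lasted, the kernel uptime stamps in square brackets can be compared. This is only a minimal sketch, assuming syslog lines in the same format as the log above; the name `storm_durations` is made up for illustration.

```python
import re

# Matches the kernel uptime stamp and the two CMCI storm messages.
STORM_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CMCI storm (?P<event>detected|subsided)"
)

def storm_durations(lines):
    """Return storm durations in seconds, measured on the kernel uptime clock."""
    durations = []
    start = None
    for line in lines:
        m = STORM_RE.search(line)
        if not m:
            continue
        ts = float(m.group("ts"))
        if m.group("event") == "detected":
            start = ts
        elif start is not None:
            # "subsided" closes the most recent "detected" event.
            durations.append(ts - start)
            start = None
    return durations

# The first and last lines of the log posted above.
log = [
    "Sep  2 18:03:54 shuttle kernel: [708926.214851] CMCI storm detected: switching to poll mode",
    "Sep  2 18:09:23 shuttle kernel: [710340.172989] CMCI storm subsided: switching to interrupt mode",
]
for seconds in storm_durations(log):
    print(f"storm lasted {seconds:.0f} s ({seconds / 60:.1f} min)")
```

For the two lines from the log this comes to about 1414 s on the kernel clock, while the syslog wall-clock stamps are only about five and a half minutes apart. The wall-clock stamps show when the lines were written to the log, not necessarily when the events happened, so the mismatch by itself does not point to a cause.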
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7267
Location: almost Mile High in the USA

Posted: Mon Sep 04, 2017 4:55 pm

It could be bad memory, or possibly a bad CPU. A CMCI (corrected machine check interrupt) storm usually points to a hardware problem, and you may well have to RMA the machine... You may want to try other memory configurations, or experiment with the overclocking options to see if it goes away.

There's also a possibility of bad firmware that needs to be addressed. See if there's a firmware update.

A kernel bug is still a possibility, but unlikely if the same kernel works on other machines.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
P.Kosunen
Guru


Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Thu Sep 07, 2017 4:44 pm

The BIOS is the latest version. The OS selection option in the UEFI/BIOS setup is set to Windows because it also controls switching between UEFI and legacy BIOS boot.

I updated the system and kernel to 4.13.0 and switched the clocksource to hpet; no issues since. It might be too early to tell, but let's hope it was the 4.12.5 kernel or some other software problem.
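
To confirm which clocksource is actually in use after such a switch, the standard sysfs files can be read. The post does not say how the switch was made (the `clocksource=hpet` boot parameter is one common way), so this is just a sketch for checking the result; `read_clocksource` is a made-up helper name.

```python
from pathlib import Path

# Standard sysfs location on Linux. It is a parameter so the function can
# also be pointed at a test directory.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted where expected.
        return None, []
    return current, available

if __name__ == "__main__":
    current, available = read_clocksource()
    print("current:", current)
    print("available:", ", ".join(available))
```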

Edit: Disabling intel_idle in the kernel seems to be a workaround for this problem. I still need to test different intel_idle.max_cstate levels...

Edit2: On a different machine with a Celeron J3455 and Void Linux, the CMCI storm does not happen with the "processor.max_cstate=1 intel_idle.max_cstate=0" kernel boot options.

The CMCI storms usually happen when copying data from the local SSD to a NAS at >100MB/s (full gigabit network load). It might not be faulty hardware, because the same issue occurs on two different boxes.
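
When testing the boot options from Edit2, it helps to verify that they really reached the running kernel. The sketch below parses a kernel command line for the two C-state parameters and reports the active cpuidle driver. The helper names `cstate_options` and `read_text_or_none` are invented for this example; the `/proc` and `/sys` paths are the usual Linux locations.

```python
from pathlib import Path

def cstate_options(cmdline):
    """Extract the C-state related parameters from a kernel command line."""
    wanted = ("processor.max_cstate", "intel_idle.max_cstate")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = value
    return found

def read_text_or_none(path):
    """Read a small procfs/sysfs file, returning None if it is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("boot options:", cstate_options(cmdline))
    # With intel_idle.max_cstate=0 the intel_idle driver is disabled, so a
    # different driver (or none) should be reported here.
    print("cpuidle driver:",
          read_text_or_none("/sys/devices/system/cpu/cpuidle/current_driver"))
```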
All times are GMT