Gentoo Forums
mce Hardware error
Kosmas
Apprentice


Joined: 14 Sep 2006
Posts: 276
Location: Greece

Posted: Fri Jun 02, 2017 6:13 am    Post subject: mce Hardware error

Hello,

I just found out that there is an MCE error in my dmesg:
Code:
mce: [Hardware Error]: Machine check events logged


After running mcelog, I get the following:
Code:
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880014086 ADDR fef1ce80
TIME 1496394010 Fri Jun  2 12:00:10 2017
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
PPIN ee0000000040110a
CPUID Vendor Intel Family 6 Model 142
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 7880014086 ADDR fef1ce40
TIME 1496394010 Fri Jun  2 12:00:10 2017
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
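
For anyone who wants to cross-check mcelog's interpretation by hand, here is a minimal Python sketch that decodes the STATUS value shown above. It assumes the architectural IA32_MCi_STATUS bit layout and the compound cache-hierarchy error code format (000F 0001 RRRR TTLL) described in the Intel Software Developer's Manual. The function and table names are made up for illustration and are not part of mcelog.

```python
# Sketch only: bit positions assume the architectural IA32_MCi_STATUS layout.
# All names below are hypothetical helpers, not an mcelog API.
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

CACHE_LEVEL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic level"}
CACHE_TYPE = {0: "Instruction", 1: "Data", 2: "Generic"}
CACHE_REQUEST = {
    0: "Generic Error", 1: "Generic Read", 2: "Generic Write",
    3: "Data Read", 4: "Data Write", 5: "Instruction Fetch",
    6: "Prefetch", 7: "Eviction", 8: "Snoop",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # bits 15:0, MCA error code
    result = {
        "flags": flags,
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "corrected_filtering": bool(mca_code & 0x1000),  # the F bit (bit 12)
        "cache_error": None,
    }
    # Cache hierarchy errors: 000F 0001 RRRR TTLL (F is ignored by the mask).
    if mca_code & 0xEF00 == 0x0100:
        result["cache_error"] = {
            "level": CACHE_LEVEL[mca_code & 0x3],
            "type": CACHE_TYPE.get((mca_code >> 2) & 0x3, "reserved"),
            "request": CACHE_REQUEST.get((mca_code >> 4) & 0xF, "reserved"),
        }
    return result


decoded = decode_status(0xEE0000000040110A)
for flag in decoded["flags"]:
    print(flag)
print(hex(decoded["mca_code"]), decoded["cache_error"])
```

For ee0000000040110a this agrees with what mcelog printed: overflow, uncorrected, MISC and ADDR valid, processor context corrupt, and a generic Level-2 cache error with the filtering bit set. The EN bit is clear in this value.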


And the hardware (lspci) is the following:
Code:
00:00.0 Host bridge: Intel Corporation Device 5904 (rev 02)                                                                                                                                   
00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02)                                                                                                                     
00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 02)                                                                                           
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)                                                                                                   
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Device 9d58 (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Device 9d71 (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] (rev c3)
02:00.0 Network controller: Intel Corporation Wireless 3165 (rev 79)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)


Does anyone have a clue as to what this error might be?
The hardware is a Dell Inspiron 15 Series 5000 with an i7 (Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz).

Thank you in advance.
krinn
Watchman


Joined: 02 May 2003
Posts: 7071

Posted: Fri Jun 02, 2017 8:28 am    Post subject:

look at: https://forums.gentoo.org/viewtopic-t-1063672.html
but don't forget to read this

Keep in mind that parts of a computer may have different RMA status: check your RMA status with Dell, and check the RMA status of your CPU if the Dell RMA is over.
Kosmas
Apprentice


Joined: 14 Sep 2006
Posts: 276
Location: Greece

Posted: Mon Jun 05, 2017 5:55 pm    Post subject:

Thanks so much krinn.
The laptop is still under warranty, but I have to run diagnostics before Dell will accept the problem.
krinn
Watchman


Joined: 02 May 2003
Posts: 7071

Posted: Mon Jun 05, 2017 7:33 pm    Post subject:

Keep us posted. I don't have one myself, but other Dell users may like to see how Dell deals with your issue.
Kosmas
Apprentice


Joined: 14 Sep 2006
Posts: 276
Location: Greece

Posted: Wed Jun 07, 2017 7:40 am    Post subject:

Hi krinn,

Just to update the forum: I got my hands on a second Dell Inspiron (mine has an i7, the other has an i5), and I get the same hardware error.
It seems that it is not an actual error, but rather a problematic interpretation of the processor cache or something similar.
I will try to run the diagnostics on both laptops, and then if I get any kind of error, I will contact Dell.

Thanks again,
Kosmas.
krinn
Watchman


Joined: 02 May 2003
Posts: 7071

Posted: Wed Jun 07, 2017 10:33 am    Post subject:

You should know that the CPU is reporting an error: it's a CPU event. That means nothing goes and checks whether the CPU has an error; the CPU itself triggers the event when it has one.

I don't know about AMD, but Intel reports heat errors through MCE, and that's really good: a stressed CPU can fail when it is not used in normal conditions (heat, overclocking, unstable or too low/high voltages...). Because of unusual conditions the CPU may fail and trigger the error. That still might not mean the CPU is damaged, and getting it back to normal conditions may fix the issue.

The software part that could be buggy is the kernel thinking the CPU has reported an MCE when it didn't. That would be a pretty huge bug, and many people would hit it.

Or the bug could be in interpreting the error: that's mcelog and other utilities. It is quite possible that the MCE error type is not interpreted correctly because (without really knowing) AMD and Intel tend to use different codes everywhere, and an MCE error code on AMD may mean something different on Intel. The point is that even if you are unsure about the type of the MCE, an MCE still did happen.

That's why I recommend using a livecd to see what's going on. First, and it's important, you must reboot, which resets the CPU and removes whatever bad context has disturbed it. Second, you are using another kernel version, and it's a kernel all livecd users run, which lowers the risk that a bug in that kernel has gone unnoticed by others.

If the MCE still occurs with the livecd, it means your hardware still has the error even after a reboot, giving a sad but good clue that the error is persistent: either the CPU is still in bad conditions or it is damaged. And if you are not yourself putting the CPU in abnormal conditions, then some hardware is failing there.
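
To make the livecd comparison concrete, here is a small Python sketch that pulls the MCE lines out of a saved kernel log, so the output from the installed system and from the livecd can be compared side by side. The marker string is the one from the dmesg line in the first post; the function name and the sample text are only for illustration.

```python
# Sketch only: filters saved kernel log text for machine-check lines.
# find_mce_lines is a hypothetical helper name, not an existing tool.
MCE_MARKER = "mce: [Hardware Error]"


def find_mce_lines(log_text):
    """Return every line of the log text that contains the MCE marker."""
    return [line.strip() for line in log_text.splitlines() if MCE_MARKER in line]


# Illustrative input: the first line is a placeholder, the second is the
# message quoted in the first post.
SAMPLE = (
    "some unrelated kernel message (placeholder)\n"
    "mce: [Hardware Error]: Machine check events logged\n"
)

for line in find_mce_lines(SAMPLE):
    print(line)
```

In practice the log text would come from saving the output of dmesg to a file on each boot and reading that file in. An empty result on the livecd after a clean reboot would point away from a persistent hardware fault.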

About your two computers: while I admit it's low probability and would be really bad luck, it might still be that both computers are indeed failing. I'm a bit surprised you have the same error on both, I must say. Or do you mean "same" because both report an MCE?
cyberhoffman
n00b


Joined: 30 Apr 2016
Posts: 30

Posted: Wed Jun 07, 2017 10:51 pm    Post subject:

krinn wrote:
look at: https://forums.gentoo.org/viewtopic-t-1063672.html


Code:
CONFIG_INTEL_PMC_CORE=y

CONFIG_INTEL_PCH_THERMAL=y


I say it solves the issue.
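
To check whether a kernel config already has these two options, something like the following Python sketch could be used. It only parses the text of a .config file; where that text comes from (for example /usr/src/linux/.config, or /proc/config.gz if the kernel exposes it) is an assumption left to the reader, and the function name is made up for illustration.

```python
import re

# The two options named in the post above.
WANTED = ("CONFIG_INTEL_PMC_CORE", "CONFIG_INTEL_PCH_THERMAL")


def option_states(config_text, wanted=WANTED):
    """Map each wanted option to its value, or "not set" if it is absent."""
    states = {name: "not set" for name in wanted}
    for line in config_text.splitlines():
        # Enabled options look like CONFIG_FOO=y; disabled ones appear
        # as comments ("# CONFIG_FOO is not set") and are skipped here.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states


# Illustrative config fragment, not taken from a real machine.
example = "CONFIG_INTEL_PMC_CORE=y\n# CONFIG_INTEL_PCH_THERMAL is not set\n"
for name, value in option_states(example).items():
    print(name, value)
```

Any option reported as "not set" would need to be enabled and the kernel rebuilt before testing whether the MCE message goes away.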
All times are GMT