Gentoo Forums
low memory corruption, fatal?? -- IT'S BACK!!
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Mon Jan 29, 2018 6:11 am    Post subject: low memory corruption, fatal?? -- IT'S BACK!!

I have an ASUS X79-Deluxe mobo with an Intel i7-4820K CPU @ 3.70GHz and 32GB of memory.

I just happened to look at 'dmesg' to check on something unrelated and
noticed that this was just reported (20180128 @2130):

Code:
[84418.213227] Corrupted low memory at ffff916a4000a000 (a000 phys) = 000897f6
[84418.213235] Memory corruption detected in low memory
[84418.213242] ------------[ cut here ]------------
[84418.213252] WARNING: CPU: 1 PID: 18530 at arch/x86/kernel/check.c:141 check_for_bios_corruption+0xa5/0xf0
[84418.213253] Modules linked in: ntfs fuse nct6775 hwmon_vid nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) b43 drm_kms_helper x86_pkg_temp_thermal ssb coretemp syscopyarea sysfillrect mxm_wmi kvm_intel md4 sysimgblt kvm fb_sys_fops snd_usb_audio irqbypass drm snd_usbmidi_lib snd_rawmidi pcspkr bcma vgastate fb_ddc wmi
[84418.213285] CPU: 1 PID: 18530 Comm: kworker/1:1 Tainted: P           O    4.12.12-gentoo #6
[84418.213287] Hardware name: System manufacturer System Product Name/X79-DELUXE, BIOS 0902 08/19/2014
[84418.213291] Workqueue: events check_corruption
[84418.213293] task: ffff9171b9224240 task.stack: ffffac9a0b1c0000
[84418.213297] RIP: 0010:check_for_bios_corruption+0xa5/0xf0
[84418.213299] RSP: 0018:ffffac9a0b1c3e18 EFLAGS: 00010296
[84418.213301] RAX: 0000000000000028 RBX: ffff916a40010000 RCX: 0000000000000006
[84418.213303] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff91727fc4cba0
[84418.213304] RBP: ffffac9a0b1c3e48 R08: 0000000000000001 R09: 0000000000006510
[84418.213305] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[84418.213307] R13: ffffffffb92988b0 R14: 0000000000000001 R15: 0000000080000000
[84418.213309] FS:  0000000000000000(0000) GS:ffff91727fc40000(0000) knlGS:0000000000000000
[84418.213311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[84418.213312] CR2: 00007f3d2d62cc88 CR3: 000000074e74d000 CR4: 00000000001406e0
[84418.213313] Call Trace:
[84418.213319]  check_corruption+0x9/0x40
[84418.213324]  process_one_work+0x1c7/0x400
[84418.213327]  worker_thread+0x43/0x3e0
[84418.213330]  kthread+0x104/0x140
[84418.213334]  ? trace_event_raw_event_workqueue_work+0x80/0x80
[84418.213336]  ? kthread_create_on_node+0x40/0x40
[84418.213341]  ret_from_fork+0x22/0x30
[84418.213342] Code: 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 80 3d d4 c0 ed 00 00 75 e7 48 c7 c7 00 23 da b8 c6 05 c4 c0 ed 00 01 e8 a1 48 0e 00 <0f> ff eb d0 4c 89 c0 48 89 da 48 2b 05 42 2a df 00 4c 01 fa 48
[84418.213385] ---[ end trace 4cac7e7fd42f7caa ]---
[97197.530320] Corrupted low memory at ffff916a4000a000 (a000 phys) = 00096cde

(I checked 'dmesg' again just before posting this and there is still only this one indication of error.)

Line 4 mentions 'check_for_bios_corruption+0xa5/0xf0'.

Is the memory in question part of normal RAM (32GB, 4x8GB)?
Or is it some sort of BIOS-specific memory?

If it is part of regular RAM, can you tell which stick it is?

I assume that memtest would find it, no?

If it's not ram and it *is* some BIOS specific memory problem, would that be fixable or
would this indicate a fatal error such that the mobo needs to be replaced?

In the last five days, I've done several kernel re-configs and two update @worlds,
but had no problems rebooting, nor indication of problems that I have noticed when rebooting.

atm, the system seems to be running normally.

thank you for any enlightenment you may offer.

.

[Moderator edit: added [code] tags to preserve output layout. -Hu]


Last edited by mjbjr on Sun Feb 04, 2018 12:43 am; edited 2 times in total
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 7163
Location: almost Mile High in the USA

Posted: Mon Jan 29, 2018 6:43 am

While it's possible bad memory could cause this, it's unlikely. Usually it's BIOS itself that's doing the corruption. There's an option in the kernel that you should try increasing:

CONFIG_X86_RESERVE_LOW

to something larger. 64KiB should have been enough unless you changed it to something else?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Mon Jan 29, 2018 7:50 am

eccerr0r wrote:
While it's possible bad memory could cause this, it's unlikely. Usually it's BIOS itself that's doing the corruption. There's an option in the kernel that you should try increasing:

CONFIG_X86_RESERVE_LOW

to something larger. 64KiB should have been enough unless you changed it to something else?


Thank you for your response.

CONFIG_X86_RESERVE_LOW was set to 64 (KB); I upped it to 128 (KB).

On reboot, dmesg showed no problems, but I will be keeping an eye on it.

Thank you for your help.

.
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Tue Jan 30, 2018 2:35 am    Post subject: and it happens again

20180129 1830 - at some point in the last couple of hours 'dmesg' shows the corruption continues:

Code:
[66787.806079] Corrupted low memory at ffff9b8fc000c000 (c000 phys) = 00131242
[66787.806084] Memory corruption detected in low memory
[66787.806090] ------------[ cut here ]------------
[66787.806097] WARNING: CPU: 1 PID: 17349 at arch/x86/kernel/check.c:141 check_for_bios_corruption+0xa5/0xf0
[66787.806098] Modules linked in: ntfs fuse nct6775 hwmon_vid nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) x86_pkg_temp_thermal drm_kms_helper coretemp syscopyarea b43 kvm_intel sysfillrect kvm sysimgblt fb_sys_fops irqbypass ssb drm md4 mxm_wmi pcspkr snd_usb_audio snd_usbmidi_lib snd_rawmidi bcma vgastate fb_ddc wmi
[66787.806139] CPU: 1 PID: 17349 Comm: kworker/1:1 Tainted: P           O    4.12.12-gentoo #6
[66787.806140] Hardware name: System manufacturer System Product Name/X79-DELUXE, BIOS 0902 08/19/2014
[66787.806144] Workqueue: events check_corruption
[66787.806146] task: ffff9b9775e9dcc0 task.stack: ffffb6be45b90000
[66787.806150] RIP: 0010:check_for_bios_corruption+0xa5/0xf0
[66787.806152] RSP: 0018:ffffb6be45b93e18 EFLAGS: 00010296
[66787.806154] RAX: 0000000000000028 RBX: ffff9b8fc0010000 RCX: 0000000000000006
[66787.806155] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9b97ffc4cba0
[66787.806156] RBP: ffffb6be45b93e48 R08: 0000000000000001 R09: 0000000000000450
[66787.806158] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[66787.806159] R13: ffffffffaa6988b0 R14: 0000000000000001 R15: 0000000080000000
[66787.806161] FS:  0000000000000000(0000) GS:ffff9b97ffc40000(0000) knlGS:0000000000000000
[66787.806162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66787.806164] CR2: 0000304ca318c000 CR3: 00000008181e6000 CR4: 00000000001406e0
[66787.806165] Call Trace:
[66787.806170]  check_corruption+0x9/0x40
[66787.806175]  process_one_work+0x1c7/0x400
[66787.806178]  worker_thread+0x43/0x3e0
[66787.806180]  kthread+0x104/0x140
[66787.806184]  ? trace_event_raw_event_workqueue_work+0x80/0x80
[66787.806186]  ? kthread_create_on_node+0x40/0x40
[66787.806190]  ret_from_fork+0x22/0x30
[66787.806192] Code: 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 80 3d d4 c0 ed 00 00 75 e7 48 c7 c7 00 23 1a aa c6 05 c4 c0 ed 00 01 e8 a1 48 0e 00 <0f> ff eb d0 4c 89 c0 48 89 da 48 2b 05 42 2a df 00 4c 01 fa 48
[66787.806231] ---[ end trace 10db5b70058c99d9 ]---


[Moderator edit: added [code] tags to preserve output layout. -Hu]
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 7163
Location: almost Mile High in the USA

Posted: Tue Jan 30, 2018 5:01 pm

That seems very strange. It looks like that should be enough, but maybe not.

Try reserving the whole first 1MB?
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Wed Jan 31, 2018 3:32 am

eccerr0r wrote:
That seems very strange. It looks like that should be enough, but maybe not.

Try reserving the whole first 1MB?


eccerr0r, thank you for responding...

Well, I rebooted about 22 hrs ago, for other purposes, and the corruption hasn't reared
its ugly head... yet.

I've been running Linux since 1995, Gentoo for 8-10 years, and I've never seen this sort
of problem before (not that I was looking), but I'm thinking that
64kB for CONFIG_X86_RESERVE_LOW has "always" been good enough in the past
and most likely 128kB should "certainly" cover any issues.

There is a BIOS upgrade for the mobo that I have downloaded, but haven't installed yet.

I will update this thread when there has been a change in the situation.

Thanks

.
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 7163
Location: almost Mile High in the USA

Posted: Wed Jan 31, 2018 6:04 am

Yeah, this is dependent on the BIOS firmware. I suspect that, although you've been using Linux all this time, it has been on many different machines.

I haven't run into one of these machines yet, at least one that requires more than 64K reserved (I do recall seeing this before but not recently.)
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Sun Feb 04, 2018 1:02 am

Well, after a couple of days without this BIOS memory problem raising its head,
plus a day and one half after updating the BIOS without the memory problem showing up,

it's back... dmesg reports:

Code:
[68815.246904] Corrupted low memory at ffff8cb0c0001000 (1000 phys) = 00052e84
[68815.246922] Memory corruption detected in low memory
[68815.246928] ------------[ cut here ]------------
[68815.246938] WARNING: CPU: 1 PID: 67 at arch/x86/kernel/check.c:141 check_for_bios_corruption+0xa5/0xf0
[68815.246938] Modules linked in: ntfs fuse nct6775 hwmon_vid nvidia_drm(PO) b43 nvidia_modeset(PO) nvidia(PO) ssb x86_pkg_temp_thermal mxm_wmi snd_usb_audio coretemp snd_usbmidi_lib drm_kms_helper kvm_intel snd_rawmidi md4 syscopyarea kvm sysfillrect sysimgblt irqbypass fb_sys_fops pcspkr drm bcma wmi
[68815.246968] CPU: 1 PID: 67 Comm: kworker/1:1 Tainted: P           O    4.12.12-gentoo #10
[68815.246970] Hardware name: System manufacturer System Product Name/X79-DELUXE, BIOS 4805 02/02/2016
[68815.246974] Workqueue: events check_corruption
[68815.246976] task: ffff8cb8dba83500 task.stack: ffffaf7ec3464000
[68815.246980] RIP: 0010:check_for_bios_corruption+0xa5/0xf0
[68815.246982] RSP: 0018:ffffaf7ec3467e18 EFLAGS: 00010296
[68815.246984] RAX: 0000000000000028 RBX: ffff8cb0c0010000 RCX: 0000000000000006
[68815.246986] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8cb8ffc4cba0
[68815.246987] RBP: ffffaf7ec3467e48 R08: 0000000000000001 R09: 0000000000000471
[68815.246988] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[68815.246990] R13: ffffffffbc6978b0 R14: 0000000000000001 R15: 0000000080000000
[68815.246992] FS:  0000000000000000(0000) GS:ffff8cb8ffc40000(0000) knlGS:0000000000000000
[68815.246993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68815.246995] CR2: 00001b422053b000 CR3: 00000007a5e6f000 CR4: 00000000001406e0
[68815.246996] Call Trace:
[68815.247002]  check_corruption+0x9/0x40
[68815.247007]  process_one_work+0x1c7/0x400
[68815.247010]  worker_thread+0x43/0x3e0
[68815.247013]  kthread+0x104/0x140
[68815.247017]  ? trace_event_raw_event_workqueue_work+0x80/0x80
[68815.247019]  ? kthread_create_on_node+0x40/0x40
[68815.247023]  ret_from_fork+0x22/0x30
[68815.247025] Code: 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 80 3d 94 b3 ed 00 00 75 e7 48 c7 c7 e8 2c 1a bc c6 05 84 b3 ed 00 01 e8 a1 48 0e 00 <0f> ff eb d0 4c 89 c0 48 89 da 48 2b 05 42 2a df 00 4c 01 fa 48
[68815.247067] ---[ end trace 7ea4ac97c8de8c67 ]---

[147644.434395] Corrupted low memory at ffff8cb0c0001000 (1000 phys) = 00092add


So, it would seem I have a memory problem...

Is the corrupted memory part of normal RAM, so that all I have to do is replace a stick of memory?

If the corrupted memory is *not* part of normal RAM, but is separate memory just for the BIOS,
is it replaceable?

If it's none of the above, is the only solution to replace the motherboard?

The computer continues to run now, as before, without any obvious problems.

thank you for your help

.

[Moderator edit: added [code] tags to preserve output layout. -Hu]
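
For anyone comparing the reports posted so far, here is a minimal Python sketch that pulls the physical address and the value found out of each "Corrupted low memory" line. The regular expression is an assumption based only on the lines pasted in this thread, and `parse_reports` and `SAMPLE` are names made up for the illustration, not anything from the kernel.

```python
import re

# Matches the report line as it appears in this thread, e.g.
# "[84418.213227] Corrupted low memory at ffff916a4000a000 (a000 phys) = 000897f6"
REPORT = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+Corrupted low memory at "
    r"(?P<virt>[0-9a-f]+) \((?P<phys>[0-9a-f]+) phys\) = (?P<value>[0-9a-f]+)"
)


def parse_reports(log_text):
    """Return (uptime_seconds, physical_address, value_found) for each report line."""
    return [
        (float(m["time"]), int(m["phys"], 16), int(m["value"], 16))
        for m in REPORT.finditer(log_text)
    ]


# Lines copied from the dmesg output posted in this thread (three separate boots).
SAMPLE = """\
[84418.213227] Corrupted low memory at ffff916a4000a000 (a000 phys) = 000897f6
[97197.530320] Corrupted low memory at ffff916a4000a000 (a000 phys) = 00096cde
[66787.806079] Corrupted low memory at ffff9b8fc000c000 (c000 phys) = 00131242
[68815.246904] Corrupted low memory at ffff8cb0c0001000 (1000 phys) = 00052e84
[147644.434395] Corrupted low memory at ffff8cb0c0001000 (1000 phys) = 00092add
"""

if __name__ == "__main__":
    for uptime, phys, value in parse_reports(SAMPLE):
        # Show each address as an offset in KiB from the start of physical memory.
        print(f"uptime {uptime:>10.0f}s  phys {phys:#07x} ({phys // 1024} KiB)  value {value:#010x}")
```

On the lines above, the physical addresses come out as 0xa000, 0xc000 and 0x1000, that is 40 KiB, 48 KiB and 4 KiB: each one is page-aligned and below 64 KiB. To run it on a live system, feed it the output of `dmesg` instead of `SAMPLE`.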
Hu
Moderator
Joined: 06 Mar 2007
Posts: 13987

Posted: Sun Feb 04, 2018 1:48 am

I believe this is normal RAM. A replacement should be sufficient, if it is a RAM problem, but nothing in this thread definitively states that it is a defect in the RAM. Did you ever run memtest after this problem appeared? If so, how long did you let it run? Did it report any problems, anywhere at all?

Can you reproduce this problem with an untainted kernel? This is an unusual failure mode for buggy out-of-tree drivers, but it's not impossible.
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 7163
Location: almost Mile High in the USA

Posted: Sun Feb 04, 2018 2:27 am

Also try adding to your kernel boot command line:

memmap=64K$0 memory_corruption_check=0

It seems that even after the low memory is reserved, the kernel will still check it; this will prevent it from checking. I think the error is now benign if you tell the kernel not to use this memory, so you should be safe in either case.
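
After rebooting, it is worth confirming that both parameters actually reached the kernel, since some boot loader configurations treat `$` specially and may mangle `memmap=64K$0`. A minimal Python sketch for that check follows; `parse_cmdline` is a hypothetical helper written for this post, and it assumes a simple space-separated command line with no quoted values.

```python
import os


def parse_cmdline(cmdline):
    """Split a kernel command line into a {parameter: value} dict.

    Parameters without '=' map to None. If a parameter is repeated,
    the last occurrence wins. Quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    # Print what the running kernel was actually booted with.
    for name in ("memmap", "memory_corruption_check"):
        print(f"{name} = {params.get(name)!r}")
```

If `memmap` shows up as `64K` with the `$0` missing, the boot loader has probably eaten the dollar sign and it needs escaping in the boot loader's own configuration.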
Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 5855

Posted: Sun Feb 04, 2018 3:42 am

You could try swapping RAM sticks between slots (if that's an option) and see if it still happens. Would help to at least rule them out as the cause.
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Mon Feb 05, 2018 5:17 am    Post subject: Memtest

Well, I downloaded the latest memtest and ran it overnight and it did not find any problems.
I ran it a second time choosing a slightly different option and again it did not find any problems.

I'll be messing with the ram sticks next.
roarinelk
Guru
Joined: 04 Mar 2004
Posts: 501

Posted: Mon Feb 05, 2018 7:56 am

If an extended memtest doesn't reveal any problems with the RAM itself, then I don't think you should worry.
This corruption check, and the option in the kernel config to ignore a certain amount of RAM starting at zero,
are there precisely because some BIOSes use that area for themselves (in SMM, for instance) although they
should not, and/or they forget to mark that area as "reserved" in the E820 memory map.
If the kernel placed anything vital there, you'd get random behavior or random panics.

In your particular case, I'd set CONFIG_X86_RESERVE_LOW to 64 (I think that's the default anyway), and then disable
that check (set CONFIG_X86_CHECK_BIOS_CORRUPTION=n) and be done with it.
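
To see how the two options are currently set, a minimal Python sketch along these lines should do. It assumes the running kernel exposes its configuration at `/proc/config.gz` (which needs CONFIG_IKCONFIG_PROC); otherwise point it at the `.config` in your kernel source tree. `read_options` and `load_running_config` are hypothetical names for this illustration.

```python
import gzip
import os
import re

# The two options discussed in this thread.
OPTIONS = ("CONFIG_X86_RESERVE_LOW", "CONFIG_X86_CHECK_BIOS_CORRUPTION")


def read_options(config_text, names=OPTIONS):
    """Return {name: value} for each named option.

    Options that are absent, or disabled ("# CONFIG_FOO is not set"), map to None.
    """
    found = dict.fromkeys(names)
    for line in config_text.splitlines():
        m = re.match(r"(CONFIG_\w+)=(.*)", line)
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2)
    return found


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config; assumes the file exists and is gzip-compressed."""
    with gzip.open(path, "rt") as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/config.gz"):
    for name, value in read_options(load_running_config()).items():
        print(f"{name} = {value if value is not None else 'not set'}")
```

A value of `None` for CONFIG_X86_CHECK_BIOS_CORRUPTION means the check is not built in at all, which is the end state suggested above.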
mjbjr
Apprentice
Joined: 02 Mar 2003
Posts: 233

Posted: Mon Feb 05, 2018 9:33 pm

roarinelk wrote:
If an extended memtest doesn't reveal any problems with the RAM itself, then I don't think you should worry.
This corruption check, and the option in the kernel config to ignore a certain amount of RAM starting at zero,
are there precisely because some BIOSes use that area for themselves (in SMM, for instance) although they
should not, and/or they forget to mark that area as "reserved" in the E820 memory map.
If the kernel placed anything vital there, you'd get random behavior or random panics.

In your particular case, I'd set CONFIG_X86_RESERVE_LOW to 64 (I think that's the default anyway), and then disable
that check (set CONFIG_X86_CHECK_BIOS_CORRUPTION=n) and be done with it.



Thank you for replying.

You make good points for ignoring the issue, though I am not quite there yet.
.