Gentoo Forums
What would cause kernel taints
albright
Advocate


Joined: 16 Nov 2003
Posts: 2540
Location: Near Toronto

Posted: Sun Sep 07, 2014 12:56 pm    Post subject: What would cause kernel taints

An emerge failed this morning with the log messages below.

Is this failing hardware? Memory? Motherboard?

EDIT: The system would not reboot except via SysRq-B.

It's a bit worrying :(

Quote:
Sep 7 08:40:31 olorin kernel: CPU: 1 PID: 18927 Comm: mv Tainted: P D W O 3.16.1-gentoo #1
Sep 7 08:40:31 olorin kernel: Hardware name: System manufacturer System Product Name/P8P67 REV 3.1, BIOS 3602 11/01/2012
Sep 7 08:40:31 olorin kernel: task: ffff88009f912be0 ti: ffff880109980000 task.ti: ffff880109980000
Sep 7 08:40:31 olorin kernel: RIP: 0010:[<ffffffff810eb25d>] [<ffffffff810eb25d>] __kmalloc_track_caller+0x7d/0x140
Sep 7 08:40:31 olorin kernel: RSP: 0018:ffff880109983ce8 EFLAGS: 00010282
Sep 7 08:40:31 olorin kernel: RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000016531
Sep 7 08:40:31 olorin kernel: RDX: 0000000000016529 RSI: 0000000000000000 RDI: 0000000000000000
Sep 7 08:40:31 olorin kernel: RBP: ffff88040e003e00 R08: 0000000000014240 R09: 0000000000000000
Sep 7 08:40:31 olorin kernel: R10: 0000000000000000 R11: ffff8800cb1339c0 R12: ffff0032385f7065
Sep 7 08:40:31 olorin kernel: R13: ffffffff810fcfb0 R14: 00000000000000d0 R15: 0000000000000000
Sep 7 08:40:31 olorin kernel: FS: 00007f4e68761700(0000) GS:ffff88041ec40000(0000) knlGS:0000000000000000
Sep 7 08:40:31 olorin kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 7 08:40:31 olorin kernel: CR2: 00000034508691f0 CR3: 000000012ed81000 CR4: 00000000000407e0
Sep 7 08:40:31 olorin kernel: Stack:
Sep 7 08:40:31 olorin kernel: 00000000000000d0 ffff880326bed9f8 0000000000000005 ffff88008a422d58
Sep 7 08:40:31 olorin kernel: 0000000000000000 ffffffff810c2984 ffff880326bed9c0 ffff8800be2916d8
Sep 7 08:40:31 olorin kernel: ffff8800cb1339c0 ffffffff810fcfb0 ffff88019ed98540 ffff880000000001
Sep 7 08:40:31 olorin kernel: Call Trace:
Sep 7 08:40:31 olorin kernel: [<ffffffff810c2984>] ? kstrdup+0x34/0x70
Sep 7 08:40:31 olorin kernel: [<ffffffff810fcfb0>] ? vfs_rename+0x140/0x700
Sep 7 08:40:31 olorin kernel: [<ffffffff810ff914>] ? SyS_renameat2+0x3d4/0x490
Sep 7 08:40:31 olorin kernel: [<ffffffff810d24e1>] ? vma_rb_erase+0x121/0x230
Sep 7 08:40:31 olorin kernel: [<ffffffff81416d12>] ? system_call_fastpath+0x16/0x1b
Sep 7 08:40:31 olorin kernel: Code: 48 8b 50 08 65 ff 0c 25 60 b8 00 00 74 68 4c 8b 20 48 8b 40 10 4d 85 e4 74 63 48 85 c0 74 5e 48 63 45 20 48 8d 4a 08 4c 8b 45 00 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 ad 48 63
Sep 7 08:40:31 olorin kernel: RIP [<ffffffff810eb25d>] __kmalloc_track_caller+0x7d/0x140
Sep 7 08:40:31 olorin kernel: RSP <ffff880109983ce8>
Sep 7 08:40:31 olorin kernel: ---[ end trace 817a1731096ce035 ]---

_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7070
Location: almost Mile High in the USA

Posted: Sun Sep 07, 2014 1:22 pm

The most common cause of tainted dumps is P: proprietary kernel modules were loaded (Nvidia, ati-drivers, Compaq RAID array drivers, etc.).

However, you have other, more serious problems. D means the kernel already hit an oops or BUG earlier and tried to carry on, so there was another oops before this one. W means a kernel warning was issued earlier. And O means an out-of-tree module was loaded into the kernel. Kernel debuggers don't like the P and O flags because they don't know what they may be dealing with, and the D and W flags could mean secondary corruption.

You'll need to find the first oops and debug that first. Debugging secondary corruption tends to be fruitless, as it may have been caused by the first problem.

Judging by this oops, you badly need a reboot; your kernel is in very bad shape right now.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
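
As a rough illustration of the flag letters discussed in the post above, here is a minimal Python sketch that pulls the taint letters out of an oops header line and attaches a short description to each. The names `TAINT_MEANINGS`, `parse_taint` and `describe` are made up for this example, and the table deliberately lists only the letters that come up in this thread; it is not a complete list of kernel taint flags.

```python
# Descriptions are paraphrased from the kernel's documented taint flags.
# Only the letters mentioned in this thread are listed (an assumption made
# for brevity, not a complete table).
TAINT_MEANINGS = {
    "P": "proprietary module loaded",
    "D": "kernel has already died once (earlier oops or BUG)",
    "W": "kernel issued a warning earlier",
    "O": "out-of-tree module loaded",
    "S": "SMP kernel running on hardware not rated for SMP",
}


def parse_taint(header_line):
    """Return the taint letters found after 'Tainted:' in a report header."""
    _, found, rest = header_line.partition("Tainted:")
    if not found:
        return []
    letters = []
    for token in rest.split():
        # Taint letters are upper-case; the kernel version token that follows
        # (e.g. "3.16.1-gentoo") is not purely alphabetic, so stop there.
        if not (token.isalpha() and token.isupper()):
            break
        letters.extend(token)
    return letters


def describe(header_line):
    """Pair each taint letter with a description, or a placeholder if unlisted."""
    return [(letter, TAINT_MEANINGS.get(letter, "not listed in this sketch"))
            for letter in parse_taint(header_line)]


if __name__ == "__main__":
    line = ("Sep 7 08:40:31 olorin kernel: CPU: 1 PID: 18927 Comm: mv "
            "Tainted: P D W O 3.16.1-gentoo #1")
    for letter, meaning in describe(line):
        print(letter, "-", meaning)
```

The parser stops at the first token that is not purely upper-case letters, which is a heuristic for finding the kernel version string rather than anything the kernel guarantees.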
N8Fear
Tux's lil' helper


Joined: 15 Apr 2013
Posts: 140
Location: Berlin (Germany)

Posted: Sun Sep 07, 2014 1:24 pm

"Tainted" in connection with the kernel means that non-GPL modules are loaded, e.g. nvidia, zfs, or the VirtualBox drivers.
This (likely) has nothing to do with the call trace you get.
Can you reproduce this issue (e.g. by loading a certain module or by running a certain program)?
albright
Advocate


Joined: 16 Nov 2003
Posts: 2540
Location: Near Toronto

Posted: Sun Sep 07, 2014 1:37 pm

thanks for the replies

If I look through /var/log/messages-2014* I see these from
yesterday:

Quote:
messages-20140907:Sep 6 16:56:23 olorin kernel: CPU: 1 PID: 1009 Comm: khubd Tainted: P W O 3.16.1-gentoo #1
messages-20140907:Sep 6 16:56:54 olorin kernel: CPU: 1 PID: 4738 Comm: upowerd Tainted: P D W O 3.16.1-gentoo #1
messages-20140907:Sep 6 17:01:02 olorin kernel: CPU: 1 PID: 16877 Comm: offlineimap Tainted: P D W O 3.16.1-gentoo #1
messages-20140907:Sep 6 17:20:34 olorin kernel: CPU: 1 PID: 17339 Comm: firefox Tainted: P D W O 3.16.1-gentoo #1
messages-20140907:Sep 6 17:39:49 olorin kernel: CPU: 1 PID: 18570 Comm: firefox Tainted: P D W O 3.16.1-gentoo #1


It looks like the khubd error was the first ...

This machine has been suffering random lockups for the last two months but they seemed to be the
result of a failing hard drive which was replaced recently. Maybe that is not the only or most basic
problem with this machine ...
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)
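
The search described in the post above (finding where the D flag first shows up) can be sketched in Python as follows. This is only an illustration: `HEADER`, `taint_letters` and `find_flag_transition` are hypothetical names, and the approach assumes that a report header shows the taint state from before that report added its own flag, so the last header without D is the likely culprit. That assumption matches the log lines quoted here but should be checked against the full log.

```python
import re

# Matches the "Comm: <name> Tainted: <flags> <version>" part of a report header.
HEADER = re.compile(r"Comm: (?P<comm>\S+)\s+Tainted: (?P<rest>.*)")


def taint_letters(rest):
    """Collect upper-case taint letters until the kernel version token."""
    letters = []
    for token in rest.split():
        if not (token.isalpha() and token.isupper()):
            break  # reached something like "3.16.1-gentoo"
        letters.extend(token)
    return letters


def find_flag_transition(lines, flag="D"):
    """Return (last_comm_without_flag, first_comm_with_flag).

    Either element is None if no such header was seen.
    """
    previous = None
    for line in lines:
        match = HEADER.search(line)
        if not match:
            continue  # not a tainted report header
        if flag in taint_letters(match.group("rest")):
            return previous, match.group("comm")
        previous = match.group("comm")
    return previous, None


if __name__ == "__main__":
    # First two header lines quoted in the post above.
    sample = [
        "messages-20140907:Sep 6 16:56:23 olorin kernel: CPU: 1 PID: 1009 "
        "Comm: khubd Tainted: P W O 3.16.1-gentoo #1",
        "messages-20140907:Sep 6 16:56:54 olorin kernel: CPU: 1 PID: 4738 "
        "Comm: upowerd Tainted: P D W O 3.16.1-gentoo #1",
    ]
    print(find_flag_transition(sample))
```

Headers that read "Not tainted" are skipped by this sketch, so it only makes sense on a log where the kernel was already tainted before the first oops, as it is here.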
Hu
Moderator


Joined: 06 Mar 2007
Posts: 13607

Posted: Sun Sep 07, 2014 4:11 pm

N8Fear is inaccurate. Refer to eccerr0r's post instead, since there are ways to get a tainted kernel even without the ability to load modules.

OP: what kernel modules do you load on this system? We should start with accounting for why you have P+O, then deal with the warnings if those are independent of using the out-of-tree modules.
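
One possible way to account for the P and O flags is to ask each loaded module which taint flags it contributes. The Python sketch below assumes the running kernel exposes a per-module `taint` file under `/sys/module/<name>/`, which recent kernels do as far as I know, but verify this on your own system. The function name `tainting_modules` is made up for this example.

```python
from pathlib import Path


def tainting_modules(sys_module_dir="/sys/module"):
    """Map module name -> taint letters for modules with a non-empty taint file.

    Assumes each loaded module may have a 'taint' file in its sysfs
    directory; modules without one (or with an empty one) are skipped.
    """
    result = {}
    for taint_file in sorted(Path(sys_module_dir).glob("*/taint")):
        try:
            flags = taint_file.read_text().strip()
        except OSError:
            continue  # module unloaded meanwhile, or file unreadable
        if flags:
            result[taint_file.parent.name] = flags
    return result


if __name__ == "__main__":
    for name, flags in tainting_modules().items():
        print(name, flags)
```

The directory is a parameter so the function can be pointed at a copy of the tree or a test fixture. Flags such as W and D do not come from a module and will not show up here.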
albright
Advocate


Joined: 16 Nov 2003
Posts: 2540
Location: Near Toronto

Posted: Sun Sep 07, 2014 5:14 pm

I think the only proprietary module is nvidia

Here's the full list:

Quote:
xt_REDIRECT 1686 1
xt_statistic 1167 0
xt_CT 3250 2
xt_LOG 7468 9
xt_connlimit 4283 0
xt_realm 879 0
xt_comment 851 25
xt_recent 8080 1
xt_nat 1761 0
ipt_ULOG 4406 0
ipt_REJECT 2201 4
ipt_MASQUERADE 1658 1
ipt_ECN 1712 0
ipt_CLUSTERIP 6111 0
ipt_ah 1037 0
nf_conntrack_proto_sctp 6238 0
nf_conntrack_netbios_ns 1029 2
nf_conntrack_broadcast 1173 1 nf_conntrack_netbios_ns
xt_TPROXY 2631 0
xt_time 2003 0
xt_TCPMSS 2395 0
xt_tcpmss 1361 0
xt_sctp 2103 0
xt_policy 2370 0
xt_pkttype 979 0
xt_owner 1107 0
xt_NFQUEUE 2318 0
xt_NFLOG 1014 0
nfnetlink_log 7415 1 xt_NFLOG
xt_multiport 1614 16
xt_mark 1069 1
xt_mac 939 0
xt_limit 1737 0
xt_length 1140 0
xt_iprange 1464 0
xt_helper 1251 0
xt_hashlimit 6671 0
xt_dscp 1587 0
xt_dccp 2115 0
xt_conntrack 2993 17
xt_connmark 1701 0
xt_CLASSIFY 1013 0
xt_tcpudp 2295 53
xt_state 1151 0
iptable_raw 1063 1
iptable_nat 2465 1
nf_nat_ipv4 3392 1 iptable_nat
nf_nat 10525 5 ipt_MASQUERADE,nf_nat_ipv4,xt_nat,xt_REDIRECT,iptable_nat
nf_conntrack_ipv4 5978 20
nf_defrag_ipv4 1259 2 xt_TPROXY,nf_conntrack_ipv4
nf_conntrack 55718 15 xt_CT
iptable_mangle 1296 1
nfnetlink 4562 1 nfnetlink_log
iptable_filter 1136 1
ip_tables 15676 4
x_tables 15508 44
w83627ehf 32042 0
hwmon_vid 3084 1 w83627ehf
af_packet 28230 2
usbhid 18960 0
snd_hda_codec_realtek 50803 1
snd_hda_codec_generic 48738 1 snd_hda_codec_realtek
usblp 10914 0
nvidia 10480412 40
snd_hda_intel 15453 6
e1000e 159187 0
snd_hda_controller 15990 1 snd_hda_intel
snd_hda_codec 78145 4 snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_intel,snd_hda_controller
x86_pkg_temp_thermal 2976 0
ptp 10566 1 e1000e
r8169 56259 0
coretemp 4590 0
pps_core 6497 1 ptp
xhci_hcd 98730 0
snd_pcm 71874 3 snd_hda_codec,snd_hda_intel,snd_hda_controller
ehci_pci 3328 0
mii 3819 1 r8169
ehci_hcd 41659 1 ehci_pci
usbcore 160239 5 usblp,ehci_hcd,ehci_pci,usbhid,xhci_hcd
snd_timer 17926 1 snd_pcm
usb_common 1504 1 usbcore
thermal 7950 0
fan 2016 0
acpi_cpufreq 6242 0
processor 21544 1 acpi_cpufreq
unix 27303 1151

_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)
Hu
Moderator


Joined: 06 Mar 2007
Posts: 13607

Posted: Sun Sep 07, 2014 8:13 pm

If you blacklist the nVidia module and reboot, can you reproduce any of the failures?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7070
Location: almost Mile High in the USA

Posted: Mon Sep 08, 2014 8:41 pm

I highly doubt nvidia is causing the problem, but as stated above, taking that variable out of the equation would make things much clearer, so removing it is a good test. The reason: if nvidia-driver contained a function call like "wipe_out_random_memory_location(x)", we would never see it because the source is closed, and that would be the real problem rather than whatever the oops indicates.

As stated earlier, a WARNING can cause a taint. Do you see WARNING (in all caps) show up in your kernel log files?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
albright
Advocate


Joined: 16 Nov 2003
Posts: 2540
Location: Near Toronto

Posted: Mon Sep 08, 2014 9:35 pm

No errors have occurred in the last 30 hours or so.

I have a suspicion that the last error was caused when I plugged
in a bad USB drive (hence the khubd error). It was the same drive
that started the problem in the first place, which I had put in a
USB case to see if I could recover anything. The drive was unreadable ...

Since then the system has been running perfectly.

If the problem recurs, I'll try with the nvidia module blacklisted.
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7070
Location: almost Mile High in the USA

Posted: Mon Sep 08, 2014 10:46 pm

Something really bad must have happened to get the "D" taint. Perhaps that first W caused the death; I don't know.

Kind of funny: these taints all exist just to help the LKML and debuggers decide whether to start looking at a problem. It looks like many of the flags were added after the proprietary-module one. Though you don't have it, I'm curious about the "S" taint, where the kernel detects SMP-incompatible CPUs installed. I used to run a dual Celeron machine that should have qualified for the "S" taint.

In recent times I've never had hard drives cause system freeze-ups; for me they cause slowdowns from retries, and I/O errors when they get offlined. You may have to look into other hardware issues, most of the system freezes I've had were due to bad motherboard devices.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
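
Besides the letters printed in an oops, the kernel also reports its taint state as a number in `/proc/sys/kernel/tainted`, one bit per flag. The sketch below decodes that number into letters. The bit order in `TAINT_BITS` follows the kernel's documented taint table as I understand it for bits 0 to 13; higher bits, added in later kernels, are ignored here. The function names are made up for this example.

```python
# Letter for each bit position, lowest bit first. Covers bits 0-13 only;
# this is an assumption of the sketch, newer kernels define more bits.
TAINT_BITS = "PFSRMBUDAWCIOE"


def decode_tainted(value):
    """Return the taint letters whose bits are set in the integer value."""
    return [letter for bit, letter in enumerate(TAINT_BITS)
            if value & (1 << bit)]


def read_tainted(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    # 1 + 128 + 512 + 4096: the bits for P, D, W and O.
    print(decode_tainted(1 + 128 + 512 + 4096))
```

On a live system, `decode_tainted(read_tainted())` gives the current state. In this table the "S" flag mentioned above is bit 2, i.e. a value of 4.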
albright
Advocate


Joined: 16 Nov 2003
Posts: 2540
Location: Near Toronto

Posted: Tue Sep 09, 2014 1:07 am

Quote:
You may have to look into other hardware issues, most of the system freezes I've had were due to bad motherboard devices.


yes, I fear you are right; I've changed a SATA cable and moved to a different port in
the hopes of placating the hardware gods ;)
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)
All times are GMT