Gentoo Forums
[solved,hack] bcm43xx of 2.6.18 halts CPU suddenly
Gentoo Forums Forum Index :: Gentoo on PPC

timotheus25
Apprentice


Joined: 27 Dec 2005
Posts: 162
Location: *upstate* New York, USA

Posted: Sun Sep 24, 2006 4:41 pm    Post subject: [solved,hack] bcm43xx of 2.6.18 halts CPU suddenly

I have been noticing a problem since upgrading to gentoo-sources-2.6.18. When the machine is idle and left unattended for hours, one of the following happens:

  • the PowerBook becomes unresponsive (both locally and over the network), with the HDD audibly spinning but not doing much
  • the PowerBook turns completely off

This never happened with any earlier kernel, and my system setup is unchanged from 2.6.17-r4. It now happens every day. On this system I compile the standard way with gcc 4.1.1.

The softdog module is not loaded, and the last kernel log output appears to be a controller reset by the bcm43xx driver. Can I disable the NETDEV WATCHDOG, whatever that is? Below is the relevant section of /var/log/kern.log; none of the other log files in /var/log/ have entries as close to the power-off as this one.
Code:

Sep 23 23:25:06 barnabas [   97.738356] SoftMAC: Open Authentication completed with 00:13:46:d6:2b:e4
Sep 23 23:44:26 barnabas [ 1256.935067] agpgart: Putting AGP V2 device at 0000:00:0b.0 into 4x mode
Sep 23 23:44:26 barnabas [ 1256.935081] agpgart: Putting AGP V2 device at 0000:00:10.0 into 4x mode
Sep 23 23:44:27 barnabas [ 1257.200638] [drm] Setting GART location based on new memory map
Sep 23 23:44:27 barnabas [ 1257.200659] [drm] Loading R300 Microcode
Sep 23 23:44:27 barnabas [ 1257.200722] [drm] writeback test succeeded in 1 usecs
Sep 24 06:29:08 barnabas [25538.881387] NETDEV WATCHDOG: eth1: transmit timed out
Sep 24 06:29:08 barnabas [25538.881399] bcm43xx: Controller RESET (TX timeout) ...
Sep 24 06:29:08 barnabas [25538.900393] bcm43xx: Chip ID 0x4306, rev 0x3
Sep 24 06:29:08 barnabas [25538.900398] bcm43xx: Number of cores: 5
Sep 24 06:29:08 barnabas [25538.900403] bcm43xx: Core 0: ID 0x800, rev 0x4, vendor 0x4243, enabled
Sep 24 06:29:08 barnabas [25538.900412] bcm43xx: Core 1: ID 0x812, rev 0x5, vendor 0x4243, disabled
Sep 24 06:29:08 barnabas [25538.900420] bcm43xx: Core 2: ID 0x80d, rev 0x2, vendor 0x4243, enabled
Sep 24 06:29:08 barnabas [25538.900428] bcm43xx: Core 3: ID 0x807, rev 0x2, vendor 0x4243, disabled
Sep 24 06:29:08 barnabas [25538.900436] bcm43xx: Core 4: ID 0x804, rev 0x9, vendor 0x4243, enabled
Sep 24 06:29:08 barnabas [25538.903692] bcm43xx: PHY connected
Sep 24 06:29:08 barnabas [25538.903703] bcm43xx: Detected PHY: Version: 2, Type 2, Revision 2
Sep 24 06:29:08 barnabas [25538.903725] bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)
Sep 24 06:29:08 barnabas [25538.903740] bcm43xx: Radio turned off
Sep 24 06:29:08 barnabas [25538.903751] bcm43xx: Radio turned off
Sep 24 06:29:08 barnabas [25538.903783] bcm43xx: Controller restarted
Sep 24 12:09:39 barnabas [   26.452981] PCI: Probing PCI hardware

_________________
http://tstotts.net/linux/gentoopb.html
http://tstotts.net/linux/gentooinsp640m.html


Last edited by timotheus25 on Tue Sep 26, 2006 9:13 pm; edited 2 times in total
timotheus25

Posted: Sun Sep 24, 2006 5:00 pm

A quick look at the bcm43xx sources shows that bcm43xx_main.c uses the net scheduler, but that dev->watchdog_timeo is never initialized, unlike in orinoco and other drivers. I don't know enough about it to figure this out.
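
For anyone else wondering what the NETDEV WATCHDOG is: as far as I can tell from net/sched/sch_generic.c, the networking core arms a timer for drivers that provide a tx_timeout hook, and when the device's transmit queue has been stopped for longer than dev->watchdog_timeo it logs "transmit timed out" and calls that hook. For bcm43xx the hook appears to be the controller reset that follows it in the log above. My reading is that a driver leaving watchdog_timeo at 0 gets a default of 5*HZ (5 seconds) rather than a disabled watchdog; treat that as an assumption. Below is a userspace model of that check, not kernel code: struct fake_netdev, MODEL_HZ and the fake_* functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace model of the kernel's netdev TX watchdog.
 * Field names mirror struct net_device, but this is NOT kernel code. */
#define MODEL_HZ 250 /* assumed tick rate, for illustration only */

struct fake_netdev {
    const char *name;
    long watchdog_timeo; /* in ticks; 0 means the driver never set it */
    long trans_start;    /* tick at which the last transmit started */
    bool queue_stopped;  /* stands in for netif_queue_stopped() */
    int resets;          /* how many times the tx_timeout hook ran */
};

/* Stand-in for the driver's tx_timeout hook; bcm43xx appears to
 * reset the controller at this point. */
void fake_tx_timeout(struct fake_netdev *dev)
{
    dev->resets++;
}

/* Models bringing the watchdog up: an unset timeout gets a default.
 * The 5-second default is an assumption about kernels of this era. */
void fake_watchdog_up(struct fake_netdev *dev)
{
    assert(dev != NULL);
    if (dev->watchdog_timeo <= 0)
        dev->watchdog_timeo = 5 * MODEL_HZ;
}

/* Models one firing of the watchdog timer at tick 'now'. Returns true
 * if the queue was stopped too long and tx_timeout was called. */
bool fake_watchdog_check(struct fake_netdev *dev, long now)
{
    if (dev->queue_stopped && now - dev->trans_start > dev->watchdog_timeo) {
        printf("NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
        fake_tx_timeout(dev);
        return true;
    }
    return false;
}
```

With the assumed 5-second default, a queue stopped for 4 seconds passes the check, while one stopped for 6 seconds triggers the timeout and one simulated controller reset.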
JoseJX
Retired Dev


Joined: 28 Apr 2002
Posts: 2774

Posted: Sun Sep 24, 2006 11:12 pm

There is a huge thread about this issue on lkml; it's a problem with the driver and it's being worked on.
_________________
Gentoo PPC FAQ: http://www.gentoo.org/doc/en/gentoo-ppc-faq.xml
timotheus25

Posted: Sun Sep 24, 2006 11:47 pm

Is there a known work-around that you've heard of?
JoseJX

Posted: Mon Sep 25, 2006 1:36 am

Use 2.6.17, or follow the thread on lkml; there are some test patches floating around. It also depends on the card: some are affected less than others.
timotheus25

Posted: Tue Sep 26, 2006 9:11 pm

It has something to do with broken preemption code released with 2.6.18. Too bad it also affects kernels built without preemption. A temporary solution that works well:
Code:

diff -u drivers/net/wireless/bcm43xx/bcm43xx_main.c.orig drivers/net/wireless/bcm43xx/bcm43xx_main.c                 
--- drivers/net/wireless/bcm43xx/bcm43xx_main.c.orig    2006-09-26 17:07:31.000000000 -0400
+++ drivers/net/wireless/bcm43xx/bcm43xx_main.c 2006-09-25 22:23:53.000000000 -0400
@@ -3166,7 +3166,7 @@
        if (state % 1 == 0) /* every 15 sec */
                badness += 1;
 
-#define BADNESS_LIMIT  4
+#define BADNESS_LIMIT  40
        return badness;
 }
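
A rough way to see what the hack changes: this function scores each run of the periodic work (going by the comment, one run every 15 seconds), and the score is compared against BADNESS_LIMIT. The diff only shows the last term; the weights for the 30, 60 and 120 second cases below are my assumption, as is the idea that the driver switches to its long, preemptible path only when the score exceeds the limit. model_badness() and model_heavy_runs() are hypothetical names, not functions from bcm43xx_main.c.

```c
#include <assert.h>

/* Sketch of the badness estimate touched by the patch above.
 * Only the "state % 1" term (+1 on every 15 s run) appears in the
 * diff; the 30/60/120 s weights are ASSUMED for illustration. */
int model_badness(unsigned int state)
{
    int badness = 0;

    if (state % 8 == 0) /* every 120 sec (assumed weight) */
        badness += 10;
    if (state % 4 == 0) /* every 60 sec (assumed weight) */
        badness += 5;
    if (state % 2 == 0) /* every 30 sec (assumed weight) */
        badness += 1;
    if (state % 1 == 0) /* every 15 sec, as in the diff */
        badness += 1;
    return badness;
}

/* Counts how many of the first 'runs' periodic-work runs score above
 * 'limit', i.e. would take the (assumed) long, preemptible path. */
unsigned int model_heavy_runs(unsigned int runs, int limit)
{
    unsigned int heavy = 0;

    assert(limit >= 0);
    for (unsigned int state = 0; state < runs; state++)
        if (model_badness(state) > limit)
            heavy++;
    return heavy;
}
```

Under these assumptions, a limit of 4 is exceeded on two out of every eight runs (once a minute), while no run ever scores above 40, so raising the limit effectively keeps the periodic work off the preemptible path.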


JoseJX

Posted: Tue Sep 26, 2006 9:39 pm

That helps by letting the periodic work run longer. The problem is preemption of the periodic work, which can take long periods of time. Making the work preemptible has drastically reduced the long delays people were seeing before, but there is a locking issue with the preemptible work. There are a bunch of patches floating around, but try the ones on Larry's FTP site first: ftp://lwfinger.dynalias.org/
timotheus25

Posted: Tue Sep 26, 2006 11:34 pm

JoseJX wrote:
That helps by letting the periodic work run longer. The problem is preemption of the periodic work, which can take long periods of time. Making the work preemptible has drastically reduced the long delays people were seeing before, but there is a locking issue with the preemptible work.


That's basically what I read on lkml. But it doesn't explain why the first trigger of the NETDEV WATCHDOG always locks up my machine, and sometimes even powers it off. My guess is that 'rebooting' the network device is the whole problem on my particular model, and that it is the wrong way to handle preemption in code that otherwise works very well on this model. I'll volunteer for testing if I have time.
JoseJX

Posted: Wed Sep 27, 2006 1:53 am

The reason it reboots is that the PMU has a built-in watchdog that needs to be checked often. When the bcm43xx card locks up, it prevents the PMU from being checked, so the PMU reboots the machine.
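
This is the usual hardware-watchdog pattern: software must check in within a fixed interval, and if a lockup stops it from doing so, the hardware acts on its own. The toy model below only illustrates that timing argument; the timeout, the tick units and the toy_* names are invented, and it makes no claim about the PMU's real interface or about what it does when it trips.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a hardware watchdog that must be serviced regularly.
 * The timeout and tick granularity are assumptions; this is not the
 * PMU's actual interface. */
struct toy_watchdog {
    unsigned int timeout;      /* ticks allowed between services */
    unsigned int last_service; /* tick of the most recent service */
    bool fired;                /* set once the watchdog has tripped */
};

/* The OS "checks in" with the watchdog at tick 'now'. */
void toy_service(struct toy_watchdog *wd, unsigned int now)
{
    wd->last_service = now;
}

/* Called every tick by the "hardware"; trips if servicing stopped. */
void toy_tick(struct toy_watchdog *wd, unsigned int now)
{
    assert(now >= wd->last_service);
    if (!wd->fired && now - wd->last_service > wd->timeout)
        wd->fired = true; /* real hardware would reset or power off here */
}

/* Simulates 'total' ticks in which the OS services the watchdog every
 * tick except during [hang_start, hang_end), when it is stuck.
 * Returns whether the watchdog tripped. */
bool toy_simulate(unsigned int timeout, unsigned int total,
                  unsigned int hang_start, unsigned int hang_end)
{
    struct toy_watchdog wd = { timeout, 0, false };

    for (unsigned int t = 0; t < total; t++) {
        bool hung = t >= hang_start && t < hang_end;
        if (!hung)
            toy_service(&wd, t);
        toy_tick(&wd, t);
    }
    return wd.fired;
}
```

For example, with a 10-tick timeout a 5-tick hang goes unnoticed, while a 20-tick hang trips the watchdog.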
timotheus25

Posted: Wed Sep 27, 2006 2:55 am

JoseJX wrote:
The reason it reboots is that the PMU has a built-in watchdog that needs to be checked often. When the bcm43xx card locks up, it prevents the PMU from being checked, so the PMU reboots the machine.


Really? I've seen the kernel panic and the machine sit frozen for hours, still powered on and displaying the stack trace, so I'm a bit surprised that the PMU would turn off the CPU because it had not been polled recently. Also, the machine doesn't reboot; it turns off because of the bcm43xx issue. Either way, I don't expect my system to match Apple's published specifications exactly; for example, when the CPU's automatic frequency adjustment is enabled, the machine has occasional memory and I2C exceptions under any OS, even though Freescale and Apple indicate support on the data/product sheets.

Just the same, it is interesting that some PowerBooks do that (or even that mine is supposed to). Perhaps my PMU only halts the machine after the CPU encounters a malformed instruction or memory access (and the resulting CPU halt), but not on mutex deadlocks (and the resulting idle CPU or infinite loop).

All times are GMT