Subject: kern/33440: bce0 network interface freezes system when configured with ifconfig
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <raphael@raphael.g-system.at>
List: netbsd-bugs
Date: 05/08/2006 07:40:00
>Number:         33440
>Category:       kern
>Synopsis:       bce0 network interface freezes system when configured with ifconfig
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 08 07:40:00 +0000 2006
>Originator:     Raphael Langerhorst
>Release:        3.99.19
>Organization:
>Environment:
NetBSD spirit.raphael.g-system.at 3.99.19 NetBSD 3.99.19 (GENERIC-SPIRIT-$Revision: 1.750) #8: Sun May  7 20:38:51 CEST 2006  root@spirit.raphael.g-system.at:/usr/obj/sys/arch/i386/compile/GENERIC_LAPTOP i386
>Description:
On a DELL Inspiron 6000 laptop, the system freezes when ifconfig is used on the bce0 network interface. My guess is that the bug is either in ifconfig or in the if_bce device driver (sys/dev/pci/if_bce.c), but the cause might be somewhere totally different; see my notes under Fix below.

This problem did not occur on the NetBSD 3.0 release. I only observed it after updating to the current NetBSD 4 development version.

A PCMCIA network interface (a Netgear card, attached as tlp0) works fine with 3.99.19.

Note: The laptop has an Intel 915 board.


>How-To-Repeat:
Boot a machine that has a bce0 network interface with NetBSD 3.99.19 (current CVS HEAD). After startup, run "ifconfig bce0 up".

This freezes the system (Ctrl+Alt+Esc still works!).
>Fix:
I added various debug messages to both ifconfig and the if_bce driver, but after two hours of experimenting I could not find the cause. For example, ifconfig actually reaches the end of main(), so it exits gracefully. It also seems (not sure!) that none of the bce driver routines hang; each one runs from beginning to end.
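
For illustration, entry/exit tracing of the kind described above could look roughly like the sketch below. This is a userland stand-in, not the code that was actually added to if_bce.c: the TRACE_ENTER/TRACE_EXIT macros, the counters, and fake_init() are all hypothetical names, and plain printf() is used only so the sketch compiles outside the kernel.

```c
#include <stdio.h>

/* Hypothetical counters, for illustration only. */
static int trace_enter_count;
static int trace_exit_count;

/* Print a line when a routine is entered or left, and count the events. */
#define TRACE_ENTER() do {                      \
        trace_enter_count++;                    \
        printf("%s: enter\n", __func__);        \
} while (0)

#define TRACE_EXIT() do {                       \
        trace_exit_count++;                     \
        printf("%s: exit\n", __func__);         \
} while (0)

/* Stand-in for a driver routine; this is NOT the real bce init code. */
static int
fake_init(void)
{
        TRACE_ENTER();
        /* ... device setup would happen here ... */
        TRACE_EXIT();
        return 0;
}
```

If every "enter" line is followed by its matching "exit" line, the traced routine itself returned, which is what suggests the hang happens elsewhere.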

So the system may be hanging somewhere else (interrupt related?).

It seems to me that the system tries to initialize the device repeatedly(!): I saw it happen twice.

Maybe there is something wrong with the bce_timeout (or related calls)?

I'm not a kernel developer, so I'm a bit stuck now. I hope this bug report helps to solve the issue.