Subject: Re: current panic: trap
To: Chris Tribo <ctribo@college.dtcc.edu>
From: Michael <macallan18@earthlink.net>
List: port-macppc
Date: 12/14/2004 23:30:34
Hello,

> >>  From the land of the inexplicable, I rebooted, same kernel, and it
> >> worked!
> > Now that's weird. It seems to have paniced in establish_intr() or 
> > something related...
> 
> I've booted the same kernel about five times now and nothing weird has 
> happened yet.
Strange, I keep wondering what caused the panic.

> Oddly enough the PCI bridge has a full OF instance. I guess I shouldn't 
> be surprised since it's the same bridge the B&W G3 uses.
OF does support PCI bridges. The S900 has one and runs an unmodified OF 1.0.5
(that's also why some of my PCI cards share IRQ 25).

> >> so then I built a current kernel from todays sources with no
> >> patches applied and that's what the panic and now, non-panic is from.
> > Ok, so the bridge-fix works for you - wasn't sure if it handles 64bit 
> > PCI correctly but apparently it does :)
> 
> In a 32 bit slot at least =]
Well, the bus on the other side of the bridge is 64bit, but that shouldn't have any impact on the IRQ line registers, so it was to be expected that everything works. I just couldn't be sure :)

> > Hmm, what kind of interrupt controller does the G3 have? Apple's Grand 
> > Central or an OpenPIC? Since there are IRQs above 31 in your dmesg I'd 
> > assume it's an OpenPIC so the interrupt code fixes won't affect you.
> > Besides that there seems to be something wrong with the OpenPIC code 
> > too - my iBook seems to lose interrupts, just like the S900 used to 
> > before the patch. Well, not quite as bad and it's still only a 
> > suspicion but looking at it can't hurt.
> 
> It's a Grackle / Heathrow combo, I don't know what delineates OpenPIC. 
> vmstat -i says "pic irq xx"
Ok, Heathrow uses the same interrupt controller as Grand Central, just with a second one added, so it effectively doubles the number of usable IRQs. That also explains the IRQs above 31 in your dmesg, and it means the patches should affect you.
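
To illustrate what "a second controller" means for IRQ numbering, here is a minimal sketch. It is not the code from extintr.c. It assumes 32 lines per controller and that IRQ n lives at bit (n % 32) of controller (n / 32); the names irq_to_slot, struct irq_slot and IRQS_PER_PIC are made up for this example, and the real register bit order may differ.

```c
#include <stdint.h>

#define IRQS_PER_PIC 32	/* assumption: 32 interrupt lines per controller */

struct irq_slot {
	int      pic;	/* 0 = first controller, 1 = second one */
	uint32_t mask;	/* bit for this line in that controller's registers */
};

/*
 * Map a flat IRQ number to a controller index and a bit mask.
 * npics is 1 for a Grand Central style setup, 2 for Heathrow.
 * Returns 0 on success, -1 if the IRQ does not exist on this machine.
 */
static int
irq_to_slot(int irq, int npics, struct irq_slot *slot)
{
	if (irq < 0 || irq >= npics * IRQS_PER_PIC)
		return -1;
	slot->pic = irq / IRQS_PER_PIC;
	slot->mask = (uint32_t)1 << (irq % IRQS_PER_PIC);
	return 0;
}
```

Under these assumptions IRQ 18 (adb) lands on the first controller, while something like IRQ 40 only exists when npics is 2, which is why IRQs above 31 alone don't tell an OpenPIC from a Heathrow.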

> I plugged in a mouse and a PNY USB 256 MB thumb drive and I can mount 
> it and fsck_msdos on it, but then I tried to do "file blah.gz" that's 
> about 7MB and everything times out. I can't kill the process as it 
> seems to be stuck waiting on IO. I ended up yanking it out of the 
> socket which killed the process.
> 
> sd0: fabricating a geometry
> sd0: no NetBSD disk label
> umass0: BBB reset failed, TIMEOUT
> umass0: BBB bulk-in clear stall failed, TIMEOUT
> umass0: BBB bulk-out clear stall failed, TIMEOUT
> umass0: BBB reset failed, TIMEOUT
> 
> # usbdevs
> addr 1: OHCI root hub, Opti
> addr 3: product 0x0110, Samsung Electronics
> addr 2: product 0x0095, Microsoft
Argh, and the interrupt isn't even shared. That leaves three possibilities: the USB stick is weird, you didn't get the interrupt fixes, or there's something wrong with them. In the last case yours would be the first G3 known to have problems. Do other USB devices still work? If all USB stuff is dead it's probably the interrupt code; if the others keep working it's almost certainly the drive, and umass needs another quirks entry.

> The StarFire card driver seems to have some serious issues on macppc. 
> All incoming mac addresses are bogus and the destination mac address in 
> tcpdump that should be the cards MAC address is showing up as 
> de:ad:be:ef:00:16
Ouch, looks as if the driver has serious trouble communicating with the controller.

> So I just tried to download pkgsrc.tar.gz from a nearby mirror, the 
> onboard bm0 NIC wouldn't go faster than 13 kB/sec. My sawtooth G4 
> running OS X just downloaded from the same mirror at the same time at 
> 185 kB/sec. In fact, it's going even slower between another machine on 
> the same switch. 7 kB/sec. I see collisions every time it tries to 
> transmit. The only IRQs that are firing according to vmstat is cpu0 
> clock, cpu0 soft net (at the rate of 1 IRQ/sec) and pic irq 18 (adb) 
> because I'm typing. I pulled the starfire card and rebooted to make 
> sure it wasn't causing anything weird but no change.
Ouch, that looks bad.

> firewire code aside (we have experimental code enabled by default in 
> generic?), things are not quite right. I could do at least 60 kB/sec 
> with bm when it would cooperate under 2.0, now it's less than 10 
> kB/sec. 
When did you pull the kernel source from cvs? Or, which revision does src/sys/arch/macppc/macppc/extintr.c show? It should be 1.45 from 12/09 or newer; if yours is older, please update and try again. If it isn't older, then we're in trouble - well, Allen and I are. The new interrupt code has never been tested with a Heathrow.
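
If you'd rather check the revision mechanically than by eye, something like the following would do. It is only a sketch: it assumes you feed it the $NetBSD$ ident string from the top of extintr.c (the usual "file,v major.minor date ..." layout), and rcs_rev_at_least is a name I made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Compare the revision inside an RCS ident string against a wanted
 * major.minor pair.  Returns 1 if the revision is the same or newer,
 * 0 if it is older, -1 if no revision could be parsed.
 * The comparison is numeric, so 1.100 counts as newer than 1.45.
 */
static int
rcs_rev_at_least(const char *ident, int want_major, int want_minor)
{
	const char *p = strstr(ident, ",v ");
	int major, minor;

	/* The revision follows the ",v " marker as "major.minor". */
	if (p == NULL || sscanf(p + 3, "%d.%d", &major, &minor) != 2)
		return -1;
	if (major != want_major)
		return major > want_major;
	return minor >= want_minor;
}
```

Calling rcs_rev_at_least(ident, 1, 45) on your copy's ident line tells you whether you have the interrupt fixes.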

have fun
Michael