port-macppc: Re: current panic: trap

Subject: Re: current panic: trap
To: Michael <macallan18@earthlink.net>
From: Chris Tribo <ctribo@college.dtcc.edu>
List: port-macppc
Date: 12/14/2004 22:53:42

On Dec 14, 2004, at 8:16 AM, Michael wrote:

> Hello,
>
>>  From the land of the inexplicable, I rebooted, same kernel, and it
>> worked!
> Now that's weird. It seems to have panicked in establish_intr() or
> something related...

I've booted the same kernel about five times now and nothing weird has 
happened yet.

Oddly enough the PCI bridge has a full OF instance. I guess I shouldn't 
be surprised since it's the same bridge the B&W G3 uses.

>> so then I built a current kernel from today's sources with no
>> patches applied, and that's what the panic and now the non-panic are from.
> Ok, so the bridge-fix works for you - wasn't sure if it handles 64bit 
> PCI correctly but apparently it does :)

In a 32 bit slot at least =]

>> Since I should know better than not posting full dmesgs by now, here's
>> the whole thing. IRQ 25 seems to be correct as slot 1 is 23 and slot 2
>> is 24. It looks like it probes everything fine. The Starfire NICs
>> won't send packets, though; tcpdump shows data coming in off the wire,
>> and ifconfig -i shows packet counters incrementing in both directions.
>> So nothing wrong with the pci code for me, keep up the good work!
>> Sorry for the noise!
> Hmm, what kind of interrupt controller does the G3 have? Apple's Grand 
> Central or an OpenPIC? Since there are IRQs above 31 in your dmesg I'd 
> assume it's an OpenPIC so the interrupt code fixes won't affect you.
> Besides that there seems to be something wrong with the OpenPIC code 
> too - my iBook seems to lose interrupts, just like the S900 used to 
> before the patch. Well, not quite as bad and it's still only a 
> suspicion but looking at it can't hurt.

It's a Grackle / Heathrow combo, I don't know what delineates OpenPIC. 
vmstat -i says "pic irq xx"

> Do the other devices ( ohci, firewire and so on ) work? Could you 
> check throughput / data loss on the serials? If they're unreliable 
> we'll definitely have to look at the OpenPIC stuff ( which in turn may 
> fix the problem the snapper audio controller has in my iBook G4 )
> If you have a spare SCSI disk sitting around please hook it up with 
> the mesh, allow sync mode ( in kernel config change the upper 8 bits 
> of the flags field in mesh's line to 0, the lower 8 bits are ignored 
> anyway so you may just set it to 0 ) and see what throughput you get, 
> should be quite a bit better than it used to be.
>
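
A minimal sketch of the bit arithmetic behind the mesh flags change suggested above, assuming the flags value is 16 bits wide (the quoted text only speaks of an upper and a lower 8 bits). The starting value 0xffff and the helper names are hypothetical and not taken from any kernel config; the snippet does not say what the individual bits mean, only which ones get cleared.

```python
# Assumption: the mesh "flags" value is treated as a 16-bit field.
# The helper names and the 0xffff starting value are illustrative only.

def upper_byte(flags: int) -> int:
    """Return bits 8..15 of a 16-bit flags value."""
    return (flags >> 8) & 0xFF


def lower_byte(flags: int) -> int:
    """Return bits 0..7 of a 16-bit flags value."""
    return flags & 0xFF


def clear_upper_byte(flags: int) -> int:
    """Zero bits 8..15 and leave bits 0..7 untouched."""
    return flags & 0x00FF


if __name__ == "__main__":
    example = 0xFFFF  # hypothetical starting value
    cleared = clear_upper_byte(example)
    print(f"before: {example:#06x}  upper byte: {upper_byte(example):#04x}")
    print(f"after:  {cleared:#06x}  upper byte: {upper_byte(cleared):#04x}")
```

Since the quoted advice says the lower 8 bits are ignored anyway, setting the whole field to 0 clears the same upper byte as the mask above.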

I plugged in a mouse and a PNY USB 256 MB thumb drive. I can mount
it and run fsck_msdos on it, but when I tried "file blah.gz" on a file
of about 7 MB, everything timed out. I couldn't kill the process as it
seemed to be stuck waiting on IO. I ended up yanking the drive out of
the socket, which killed the process.

sd0: fabricating a geometry
sd0: no NetBSD disk label
umass0: BBB reset failed, TIMEOUT
umass0: BBB bulk-in clear stall failed, TIMEOUT
umass0: BBB bulk-out clear stall failed, TIMEOUT
umass0: BBB reset failed, TIMEOUT

# usbdevs
addr 1: OHCI root hub, Opti
addr 3: product 0x0110, Samsung Electronics
addr 2: product 0x0095, Microsoft

The StarFire card driver seems to have some serious issues on macppc.
All incoming MAC addresses are bogus, and the destination MAC address in
tcpdump that should be the card's MAC address is showing up as
de:ad:be:ef:00:16

I just tried to download pkgsrc.tar.gz from a nearby mirror, and the
onboard bm0 NIC wouldn't go faster than 13 kB/sec. My Sawtooth G4
running OS X downloaded from the same mirror at the same time at
185 kB/sec. In fact, it's even slower to another machine on
the same switch: 7 kB/sec. I see collisions every time it tries to
transmit. The only IRQs firing according to vmstat are cpu0
clock, cpu0 soft net (at the rate of 1 IRQ/sec) and pic irq 18 (adb)
because I'm typing. I pulled the StarFire card and rebooted to make
sure it wasn't causing anything weird, but there was no change.

I tried to test with FireWire and got a panic. I tried connecting
directly to my G4 so I could try using fw ip, but the node kept going up
and down. So then I unplugged it and plugged it back into
my FW CD burner, which suddenly caused all sorts of weird things to
happen.

fwnode0 at fwohci0 Node 0: UID 00:30:65:ff:fe:42:80:04 <- the computer
that was unplugged earlier
fwnode1 at fwohci0 Node 0: UID 00:d0:4b:01:fe:42:80:04 <- What?! It
looks like it took half of the computer's address and half of the CD
burner's and merged them into one!
fwnode2 at fwohci0 Node 0: UID 00:d0:4b:01:07:16:40:3a <- I think
that's the burner now
fwnode0: link speed: 100 Mb/s, max_rec: 64 bytes
sbpscsi0 at fwnode0
scsibus1 at sbpscsi0
...
multiply freed item 0xd01de920
panic: free: duplicated free
Stopped in pid 2.1 (fwohci0) at netbsd:cpu_Debugger+0x10: lwz r0, r1, 0x14
db> bt
0xd52fdb70: at panic+0x19c
0xd52fdc00: at free+0x218
0xd52fdc40: at sbp2_free+0x10c
0xd52fdc60: at sbpscsi_match+0x94
0xd52fdc80: at mapply+0x40
0xd52fdca0: at config_search_loc+0x168
0xd52fdd00: at config_found_sm_loc+0x40
0xd52fdd20: at p1212_match_units+0xa0
0xd52fdd60: at fwnode_configrom_input+0x234
0xd52fdd80: at fwohci_read_multi_resp+0x164
0xd52fde40: at fwohci_arrs_input+0xa8
0xd52fdef0: at fwohci_event_thread+0x1b0
0xd52fdf20: at fwohci_thread_init+0x21c
0xd52fdf40: at cpu_switchto+0x44
saved LR(0x939d6636) is invalid.
db>
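
The "merged" UID in the fwnode lines above can be checked mechanically. The sketch below is only an illustration using the three UIDs exactly as printed in the log; the function and variable names are made up here, and the split into two 4-byte halves is an assumption about where the mix-up happened, not something the driver output states.

```python
# The three UIDs are copied verbatim from the fwnode lines in the log.
# Names below are illustrative; the 4-byte split point is an assumption.

def parse_uid(text: str) -> bytes:
    """Turn a colon-separated hex UID such as '00:30:65:...' into bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


computer = parse_uid("00:30:65:ff:fe:42:80:04")  # fwnode0, the unplugged G4
merged = parse_uid("00:d0:4b:01:fe:42:80:04")    # fwnode1, the odd one
burner = parse_uid("00:d0:4b:01:07:16:40:3a")    # fwnode2, the CD burner

# Compare each 4-byte half of the odd UID against the two real devices.
upper_from_burner = merged[:4] == burner[:4]
lower_from_computer = merged[4:] == computer[4:]

if __name__ == "__main__":
    print("upper half matches burner:  ", upper_from_burner)
    print("lower half matches computer:", lower_from_computer)
```

Both comparisons hold for the values in the log: the first four bytes of fwnode1 are the burner's and the last four are the computer's.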

FireWire code aside (do we have experimental code enabled by default in
GENERIC?), things are not quite right. I could do at least 60 kB/sec
with bm under 2.0 when it would cooperate; now it's less than 10
kB/sec.