Subject: Re: current panic: trap
To: Michael <macallan18@earthlink.net>
From: Chris Tribo <ctribo@college.dtcc.edu>
List: port-macppc
Date: 12/22/2004 03:49:16
I took a hack day and got back to playing with my beige G3.

The situation seems to be this:

IRQs below 23 and all the PCI slots fire fine and consistently. Sliding 
the mouse around fires the IRQ for that USB card slot. The bm 
definitely seems to be misbehaving though. Its IRQs seem to fire in 
bursts of up to about 100 per second, then drop to 0 for a second, then 
burst again and drop back to none. I'm extracting pkgsrc.tar.gz inside X while 
sliding the mouse around and typing and IRQ 13 is firing at about 1500 
IRQs per second consistently. The StarFire won't work but what's new. I 
can't tell if bm is misbehaving or if something weird is happening in 
TCP land. There certainly are a lot of collisions for being connected 
to a switch. Transfer speed seems to start fast and then slowly drop 
off to about 20 k/sec even with ftp. The oversize frame error sounds 
kind of weird, and when I run dhclient I see:

ip length 576 disagrees with bytes received 580.
accepting packet with data after udp payload.

I remember asking about this on current-users a while back and someone 
said that it was because CRCs/checksums were being ignored somewhere. 
Wow, ok this is odd too. I just sent pkgsrc.tar.gz with scp to my OS X 
machine at over 1 MB/sec and it didn't even flinch once. I try to scp 
the same file I just sent back to the Beige G3 and I'm going at 8.7 
KB/sec. Very strange. Every time the bm transmits it's nothing but 
collisions, then it stops for a second, more collisions, over and over. 
My switch is indicating collisions when I send and receive from my G4 
mac running OS X, but no collisions are registered in netstat -i. 
Packet counters are increasing though. It's pretty consistently going 
from 0 interrupts, to 8/sec, back to 0, when I try to receive a file.

Is XFree86 supposed to work on the built in Rage IIc+ chip? There 
doesn't seem to be an ati driver loading for it. If I switch from vga 
to ati the logs say it loads rage128 and radeon but I didn't see any 
Rage Pro chips listed and the X server bails. Using the vga driver 
results in either a 320x240 screen which shows up as a 1/4" tall line 
across my screen about 1/3 of the way from the bottom, or nothing. 
Using Xmacppc works but is obviously not ideal. Stranger still, if I 
unplug my mouse it sends a broken pipe to the xterm which then kills 
the X server.

Dec 22 02:11:07 StarFire /netbsd: bm0: discarding oversize frame (len=1528)
StarFire# vmstat -i
interrupt                                     total     rate
cpu0 clock                                   138625      100
cpu0 soft clock                                 753        0
cpu0 soft net                                 33994       24
pic irq 23                                    13947       10
pic irq 24                                        5        0
pic irq 25                                      312        0
pic irq 12                                       46        0
pic irq 42                                    11968        8
pic irq 33                                    19913       14
pic irq 13                                    59616       43
pic irq 14                                        4        0
pic irq 18                                    20624       14
Total                                        299807      217
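
The rate column above looks like an average since boot (138625 clock 
interrupts at 100 per second is roughly 23 minutes of uptime), so it 
hides the burst-then-nothing pattern described earlier. A minimal sketch 
of diffing two vmstat -i snapshots to get per-interval rates follows. 
The function names are made up for illustration, and the second snapshot 
is hypothetical data, not something captured on this machine.

```python
def parse_vmstat_i(text):
    """Return {interrupt name: cumulative total} from `vmstat -i` text."""
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name, total, rate
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            if parts[0] != "Total":  # skip the summary row
                totals[parts[0]] = int(parts[1])
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second for each source over the sampling interval."""
    return {name: (after[name] - before[name]) / seconds
            for name in after if name in before}


# First snapshot: rows copied from the vmstat -i output above.
SNAP1 = """\
interrupt                                     total     rate
pic irq 42                                    11968        8
pic irq 13                                    59616       43
Total                                        299807      217
"""

# Second snapshot: HYPOTHETICAL totals, imagined 10 seconds later.
SNAP2 = """\
interrupt                                     total     rate
pic irq 42                                    12008        8
pic irq 13                                    74616       53
"""

before = parse_vmstat_i(SNAP1)
after = parse_vmstat_i(SNAP2)
rates = interval_rates(before, after, seconds=10)

for name, rate in sorted(rates.items()):
    print(f"{name:12s} {rate:8.1f}/sec")
```

Sampling every second or so and watching for an IRQ whose interval rate 
drops to zero gives roughly the same view as systat vmstat 1, but in a 
form that can be logged.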

I pulled down and built a kernel from sources of about an hour ago. It 
booted as far as:
scsibus0: waiting 2 seconds for devices to settle...
umass0 at uhub0 port 1 configuration 1 interface 0
umass0: LaCie USB Mass STORAGE, rev 1.00/2.00, addr 2
umass0: using ATAPI over CBI
atapibus0 at umass0: 2 targets
uhidev0 at uhub1 port 1 configuration 1 interface 0
...
uhid0...
uhid1...
uhub2 at uhub1 port 2
uhub2: Atmel UHB124 hub, class 9/0, rev 1.00/1.00, addr 3
uhub2: 4 ports with 4 removable, bus powered
(freeze)
I can't even drop into ddb. So back to the old kernel from Dec 13th.

Now that I think of it, I remember seeing "discarding oversized frame" 
errors from my Cisco an 350 card in my powerbook. I wonder if that is 
related somehow. Sorry for this discombobulated email.

On Dec 15, 2004, at 6:20 AM, Michael wrote:

> Hello,
>
>> So the uni-north on the first AGP G4 and later are OpenPIC? The B&W G3
>> is Grackle/Heathrow with a CMD IDE chip for the internal HD, and I'm
>> pretty sure it's the same for the PCI graphics G4.
> Apparently.
>
>>>> umass0: BBB reset failed, TIMEOUT
>>>> umass0: BBB bulk-in clear stall failed, TIMEOUT
>>>> umass0: BBB bulk-out clear stall failed, TIMEOUT
>>>> umass0: BBB reset failed, TIMEOUT
>>>
>>> I suspect at least some of this could be missed interrupts.
>>
>> I tried copying off my firewire burner and the data light just goes on
>> and then off for like 5 seconds, then it picks up again and starts
>> copying for about 10 seconds and stops again. Now it's stopped
>> completely. cp is waiting on uvn_fp2 and the fwohci0 thread is waiting
>> on fwohciev.
> Please look at systat 1 vmstat when making experiments like that and 
> see if IRQs stop firing ( for a couple of seconds ) and start again or 
> if they stop completely for any particular IRQ. I /think/ deadlocks 
> related to the interrupt handler ( which would lead to some IRQ(s) 
> stop getting any attention ) shouldn't happen anymore and sure enough 
> I didn't see any since Allen committed the changes to extintr.c but my 
> machine seems to be weird anyway :)
>
>> I dunno if the USB code just left the system in a weird state or what.
>> After the reboot I tried to copy the file down again and it started out
>> at 1MB/sec then dropped slower and slower and slower, now it's down to
>> 40 KB/sec and dropping. Very strange.
> Indeed. So it didn't lock up completely ( as it did for me before the 
> patch ), definitely looks like lost interrupts ( or extremely high 
> latency ) - apparently there's something fishy in the handler, it 
> sometimes seems to get stuck in a high priority level or so - I 
> wonder why all these problems avoid me so I can't properly track them 
> down :/
> Hmm, bm seems to be the only device that uses interrupts from the 2nd 
> controller, somehow I'd have expected that these make problems and the 
> lower 32 work fine
>
> have fun
> Michael