Subject: Re: wi0 going mute?
To: None <current-users@NetBSD.org>
From: David Young <dyoung@pobox.com>
List: current-users
Date: 01/25/2004 20:44:35
On Mon, Jan 26, 2004 at 01:20:49PM +1100, Paul Ripke wrote:
> 
> On Monday, Jan 26, 2004, at 07:01 Australia/Sydney, David Young wrote:
> 
> >On Sun, Jan 25, 2004 at 11:21:36PM +1100, Paul Ripke wrote:
> >>With -current kernel from around 20040119, userland somewhat
> >>more ancient (about 200304??), after giving my system some work,
> >>like building a release, its wireless interface has a nasty
> >>habit of going mute. Packets are still being received, but
> >>nothing sent.
> >
> >I have seen this, too. I think it is a bug in the rate adaptation 
> >code. I
> >don't see what it would have to do with building a release, unless the
> >kernel misses interrupts under those conditions....
> 
> I'm guessing it is missing interrupts. Not sure which device is
> holding things up, but I know wi0 will drop more pings when I'm
> reading/writing from/to a SCSI tape drive. I also notice the
> more interrupt load the system is under, the client machine,
> a Mac OS X 10.2.x laptop with DWL-650 card running
> "WirelessDriver" from sf.net, occasionally the status LED starts
> blinking as if its lost carrier.

"Nevermind!"

Does ifconfig tell you that OACTIVE is set?  Is there a lot of broadcast
traffic on this net?

See if it prolongs the inevitable when you increase WI_NTXRSS from 10
to 40 or 160.  Also, it would be interesting to see what the content of
sc->sc_rssd[] is when wi_stop cleans it up after you 'ifconfig wi0 down'.

Dave

> 
> Is there anything useful I can look at in gdb when the system
> gets into this state? Gives me an excuse to upgrade userland.
> 
> >>thing I've noticed is that the kernel prints a couple of
> >>"wi0: bad idx 6e"
> >
> >Hmm. This could be an endianness issue. What architecture is this?  I 
> >am
> >betting that it is big-endian.  0x6e (110) is Prism's representation 
> >for
> >11Mbps. The data rate and the index both go into the same 16-bit word,
> >but they are u_int8_t's in the wi_frame struct.
> 
> Bzzt. Nice theory, but this is on x86. How much do I win? :)
> Seriously, would adding some code to dump out more of the frame
> in the "bad idx" case help?
> 
> Ahah! Just did some more playing... I can now repeat the muting
> at will...
> Kicking off a "ttcp -s -t -b 65536 -l 65536" over a 100 MBit LAN
> (tlp0 interface) resulted in the following pings over wi0 - note
> that ttcp only ran for 11 seconds, and the ping was running at
> 1 second intervals, from laptop to NetBSD box:
> 
> 64 bytes from 192.168.254.129: icmp_seq=429 ttl=255 time=4.95 ms
> 64 bytes from 192.168.254.129: icmp_seq=430 ttl=255 time=4.798 ms
> 64 bytes from 192.168.254.129: icmp_seq=431 ttl=255 time=5.011 ms
> 64 bytes from 192.168.254.129: icmp_seq=440 ttl=255 time=4.288 ms
> 64 bytes from 192.168.254.129: icmp_seq=441 ttl=255 time=1005 ms
> 64 bytes from 192.168.254.129: icmp_seq=442 ttl=255 time=3008.79 ms
> 64 bytes from 192.168.254.129: icmp_seq=443 ttl=255 time=3008.1 ms
> 64 bytes from 192.168.254.129: icmp_seq=444 ttl=255 time=3007.73 ms
> 64 bytes from 192.168.254.129: icmp_seq=445 ttl=255 time=3007.37 ms
> 64 bytes from 192.168.254.129: icmp_seq=446 ttl=255 time=2877.65 ms
> 64 bytes from 192.168.254.129: icmp_seq=447 ttl=255 time=2006.51 ms
> 64 bytes from 192.168.254.129: icmp_seq=448 ttl=255 time=2006.16 ms
> 64 bytes from 192.168.254.129: icmp_seq=449 ttl=255 time=2874.81 ms
> 64 bytes from 192.168.254.129: icmp_seq=450 ttl=255 time=2026.97 ms
> 64 bytes from 192.168.254.129: icmp_seq=451 ttl=255 time=2007.26 ms
> 64 bytes from 192.168.254.129: icmp_seq=452 ttl=255 time=2006.23 ms
> 64 bytes from 192.168.254.129: icmp_seq=453 ttl=255 time=2008.94 ms
> 64 bytes from 192.168.254.129: icmp_seq=454 ttl=255 time=3874.05 ms
> 64 bytes from 192.168.254.129: icmp_seq=455 ttl=255 time=6010.24 ms
> 64 bytes from 192.168.254.129: icmp_seq=456 ttl=255 time=6009.22 ms
> 64 bytes from 192.168.254.129: icmp_seq=457 ttl=255 time=6009.53 ms
> 64 bytes from 192.168.254.129: icmp_seq=458 ttl=255 time=7009.32 ms
> 64 bytes from 192.168.254.129: icmp_seq=459 ttl=255 time=7009.93 ms
> 64 bytes from 192.168.254.129: icmp_seq=460 ttl=255 time=7069.77 ms
> 64 bytes from 192.168.254.129: icmp_seq=461 ttl=255 time=7009.68 ms
> 64 bytes from 192.168.254.129: icmp_seq=462 ttl=255 time=6126.11 ms
> 64 bytes from 192.168.254.129: icmp_seq=463 ttl=255 time=6010.1 ms
> 64 bytes from 192.168.254.129: icmp_seq=464 ttl=255 time=5123.26 ms
> 64 bytes from 192.168.254.129: icmp_seq=465 ttl=255 time=5008.62 ms
> 64 bytes from 192.168.254.129: icmp_seq=466 ttl=255 time=5010.2 ms
> 64 bytes from 192.168.254.129: icmp_seq=467 ttl=255 time=5009.07 ms
> 64 bytes from 192.168.254.129: icmp_seq=468 ttl=255 time=7011.13 ms
> 64 bytes from 192.168.254.129: icmp_seq=469 ttl=255 time=9012.41 ms
> 64 bytes from 192.168.254.129: icmp_seq=470 ttl=255 time=11100.9 ms
> 64 bytes from 192.168.254.129: icmp_seq=471 ttl=255 time=11013.4 ms
> 64 bytes from 192.168.254.129: icmp_seq=472 ttl=255 time=10097.6 ms
> 64 bytes from 192.168.254.129: icmp_seq=473 ttl=255 time=10024.5 ms
> 64 bytes from 192.168.254.129: icmp_seq=474 ttl=255 time=9190.56 ms
> 64 bytes from 192.168.254.129: icmp_seq=475 ttl=255 time=8301.99 ms
> 64 bytes from 192.168.254.129: icmp_seq=476 ttl=255 time=7479.14 ms
> 64 bytes from 192.168.254.129: icmp_seq=477 ttl=255 time=6716.77 ms
> 64 bytes from 192.168.254.129: icmp_seq=478 ttl=255 time=5968.18 ms
> 64 bytes from 192.168.254.129: icmp_seq=479 ttl=255 time=5090.35 ms
> 64 bytes from 192.168.254.129: icmp_seq=480 ttl=255 time=5179.41 ms
> 64 bytes from 192.168.254.129: icmp_seq=481 ttl=255 time=5095.31 ms
> 64 bytes from 192.168.254.129: icmp_seq=482 ttl=255 time=6033.54 ms
> 64 bytes from 192.168.254.129: icmp_seq=483 ttl=255 time=7163.45 ms
> 64 bytes from 192.168.254.129: icmp_seq=484 ttl=255 time=10091.2 ms
> 64 bytes from 192.168.254.129: icmp_seq=485 ttl=255 time=10080.8 ms
> 64 bytes from 192.168.254.129: icmp_seq=486 ttl=255 time=11080.9 ms
> 64 bytes from 192.168.254.129: icmp_seq=487 ttl=255 time=13082.2 ms
> 64 bytes from 192.168.254.129: icmp_seq=488 ttl=255 time=13489.1 ms
> 64 bytes from 192.168.254.129: icmp_seq=489 ttl=255 time=13014 ms
> 64 bytes from 192.168.254.129: icmp_seq=490 ttl=255 time=13039.2 ms
> 64 bytes from 192.168.254.129: icmp_seq=491 ttl=255 time=13534.8 ms
> 64 bytes from 192.168.254.129: icmp_seq=492 ttl=255 time=13015.9 ms
> 64 bytes from 192.168.254.129: icmp_seq=493 ttl=255 time=13013.6 ms
> 64 bytes from 192.168.254.129: icmp_seq=494 ttl=255 time=13549.4 ms
> 64 bytes from 192.168.254.129: icmp_seq=495 ttl=255 time=13014.3 ms
> 64 bytes from 192.168.254.129: icmp_seq=496 ttl=255 time=13218 ms
> 64 bytes from 192.168.254.129: icmp_seq=497 ttl=255 time=13577.8 ms
> 64 bytes from 192.168.254.129: icmp_seq=498 ttl=255 time=12827.5 ms
> 
> After a while it died completely. Looks like a transmit queue is
> building up, that only gets touched on receipt of an interrupt.
> Once it gets behind, it has a hard time catching up. Guessing it
> is a missed interrupt that starts the landslide. This is all
> plain routing too, no bridging. The "build release" may have been
> a furphy - it was probably more to do with my using another box
> with NFS mounts over the LAN.
> 
> BTW: Full dmesg can be had from:
> <http://stix.homeunix.net:8080/about/dmesg-stix-pc.txt>
> The system is somewhat stretched - not that I'm making excuses
> for it, this still shouldn't happen.
> 
> Cheers,
> --
> Paul Ripke
> Unix/OpenVMS/TSM/DBA
> I love deadlines. I like the whooshing sound they make as they fly by.
> -- Douglas Adams

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933