Subject: Lockup under heavy network use
To: port-cobalt@netbsd.org, port-macppc@netbsd.org
From: John Klos <john@ziaspace.com>
List: port-macppc
Date: 09/08/2005 18:29:30
Hello,

I'm seeing lockups on two different machines. One is a 200 MHz PowerPC
603e system, the other a 250 MHz Cobalt RaQ 2. Each is serving around 20
to 30 Mbps of web traffic, which is about as much as it can serve. I
didn't want faster systems because I didn't want to use much more
bandwidth than that (and altq is not exactly production ready yet).
However, both of them have locked up under heavy network use. The
symptoms are the same: they still respond to ICMP on both IPv4 and IPv6,
but no longer answer actual requests. Unfortunately, both are colocated,
and neither has a serial terminal or console attached (yet).

The only other thing resembling a clue is this output, which appeared in
a root shell on the Cobalt right before the last lockup:

free(100676a8) bad block. (memtop = 100b3800 membot = 10058550)
free(10067688) bad block. (memtop = 100b3800 membot = 10058550)
free(10067668) bad block. (memtop = 100b3800 membot = 10058550)
free(10068608) bad block. (memtop = 100b3800 membot = 10058550)
free(10068c08) bad block. (memtop = 100b3800 membot = 10058550)
free(10067648) bad block. (memtop = 100b3800 membot = 10058550)
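
For anyone who wants to check the numbers, here is a minimal Python
sketch that parses the lines above and tests whether each freed address
lies between membot and memtop. It assumes that membot and memtop are the
lower and upper bounds of the allocator's heap; the message itself does
not say so, and the function names are made up for this example.

```python
import re

# Console lines copied verbatim from the output above.
LOG = """\
free(100676a8) bad block. (memtop = 100b3800 membot = 10058550)
free(10067688) bad block. (memtop = 100b3800 membot = 10058550)
free(10067668) bad block. (memtop = 100b3800 membot = 10058550)
free(10068608) bad block. (memtop = 100b3800 membot = 10058550)
free(10068c08) bad block. (memtop = 100b3800 membot = 10058550)
free(10067648) bad block. (memtop = 100b3800 membot = 10058550)
"""

PATTERN = re.compile(
    r"free\(([0-9a-f]+)\) bad block\. "
    r"\(memtop = ([0-9a-f]+) membot = ([0-9a-f]+)\)"
)


def parse(log):
    """Return (address, memtop, membot) integer tuples, one per matching line."""
    entries = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            entries.append(tuple(int(group, 16) for group in match.groups()))
    return entries


def summarize(entries):
    """Return (hex address, inside-bounds flag, offset from membot) per entry.

    ASSUMPTION: membot/memtop are treated as lower/upper heap bounds.
    """
    return [
        (hex(addr), membot <= addr < memtop, addr - membot)
        for addr, memtop, membot in entries
    ]


if __name__ == "__main__":
    for addr, inside, offset in summarize(parse(LOG)):
        where = "inside" if inside else "outside"
        print(addr, where, "membot..memtop, offset", hex(offset))
```

Under that assumption, all six addresses fall inside the membot..memtop
range, so the complaint is not about a pointer far outside the heap.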

On the PowerMac, I was getting these from time to time, but they hardly
seem all that serious:

wm0: excessive collisions
wm0: late collision
wm0: excessive collisions
wm0: excessive collisions
wm0: late collision
wm0: excessive collisions
wm0: late collision

Both systems are running NetBSD 2.1_RC3. netstat -m shows that they are 
nowhere near exhausting their nmbclusters (which is set to 16k).

Any ideas?

Thanks,
John Klos
-- 
I've seen Sun monitors on fire off the side of the multimedia lab.
I've seen NTU lights glitter in the dark near the Mail Gate.
All these things will be lost in time, like the root partition last week.
Time to die...
                 -- Peter Gutmann