Subject: SS5 loses with "arp_drain: locked; punting" scroll
To: None <tech-kern@netbsd.org, tech-net@netbsd.org>
From: matthew green <mrg@eterna.com.au>
List: tech-net
Date: 04/08/2004 11:05:09
[ sorry for the cross post, but this problem seems ultra strange to me ]


hi folks.


i recently installed an SS5 as my home router.  it has a nell & wi, first
had a be0 card plus onboard le0, then hme0 then le0.  i've not installed
a 2nd hme0 and am not using the onboard le0...  anyway, every couple of
days without fail it has locked up (not always solid, but one time was)
with the console covered in messages "arp_drain: locked; punting".  i've
only ever woken up to it doing this so i've never seen what happens before
or when it starts doing this, but from what i can tell, it starts to fail
to route internal hosts one by one until they all are "gone", and this
process may take an hour or more.  ie, one connection was broken at 6am
while another not until 8am (both "active" connections.)


for now i'm trying out the hme0 card instead, the two predecessors to this
machine had onboard le0 but used two hme0 cards instead...


anyone have any ideas or comments?  why is the arp lock being locked?
looking at the code i can see why it may take a long time for some hosts
to drop off -- the arp list aging also wants to take the lock and will
simply punt & return if it can't.  the lock is only taken in the drain
and timer routines, which both are simple "call free on each item in this
list" functions, plus arp_rtrequest() which takes the lock after a little
bit of checking...
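
to make the "punt & return" behaviour concrete, here is a minimal userland
sketch of the pattern as i read it.  it is not the kernel code: every name
(demo_locked, demo_lock_try, demo_drain, demo_timer) is made up, the list is
reduced to a counter, and there is no interrupt blocking.  it only shows
that if something takes the lock and never releases it, both the drain and
the aging timer give up on every call and entries are never freed or aged.
whether that is what actually happens on the SS5 is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical stand-in for the arp lock: non-zero means "held" */
static int demo_locked;

/* try to take the lock; returns 1 on success, 0 if it is already held */
static int
demo_lock_try(void)
{
	if (demo_locked)
		return 0;
	demo_locked++;
	return 1;
}

/* release the lock; it must currently be held */
static void
demo_unlock(void)
{
	assert(demo_locked > 0);
	demo_locked--;
}

/*
 * models the drain routine: free every entry.
 * returns the number of entries freed, or -1 if it punted.
 */
static int
demo_drain(int *nentries)
{
	int freed;

	if (!demo_lock_try()) {
		printf("demo_drain: locked; punting\n");
		return -1;
	}
	freed = *nentries;
	*nentries = 0;
	demo_unlock();
	return freed;
}

/*
 * models the aging timer: expire one entry per tick.
 * returns 0 if it ran, or -1 if it punted (silently).
 */
static int
demo_timer(int *nentries)
{
	if (!demo_lock_try())
		return -1;
	if (*nentries > 0)
		(*nentries)--;
	demo_unlock();
	return 0;
}
```

in this model, once demo_lock_try() succeeds somewhere without a matching
demo_unlock(), every later demo_drain() prints the punting message and every
demo_timer() tick does nothing, which at least resembles the symptoms above.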


the sparc is running week-old -current, and acts as a router between
my wireless & internal networks, plus the dsl modem to the internet.



.mrg.