Subject: Re: no buffer space available
To: NetBSD User's Discussion List <netbsd-users@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-users
Date: 06/24/2001 22:24:57
[ On Sunday, June 24, 2001 at 21:28:42 (-0400), Steven M. Bellovin wrote: ]
> Subject: Re: no buffer space available 
>
> A lost interrupt could probably produce the same symptoms, if the 
> driver doesn't have adequate recovery code, such as a watchdog timer.
> I haven't looked at any of the NetBSD Ethernet drivers to see what they 
> do.  But I remember the first time I saw a device driver that could 
> handle such situations -- it was amazing how much of a difference it 
> made to system stability.

Ah!  That could be it!  IIRC I've only ever had trouble with the ISA
version of 'le', and now with the PCI 'rtk' (and maybe the ISA 'iy').

Hmmm... seems like that's the problem indeed, though the recovery code
that's apparently supposed to be there already isn't working.  When I
look back in the logs I see:

	May 25 18:40:51 isit /netbsd: rtk0: watchdog timeout
	Jun 20 20:00:40 isit /netbsd: rtk0: watchdog timeout
	Jun 20 20:21:45 isit /netbsd: rtk0: watchdog timeout

and indeed Jun 20 at 8pm was the last time I remember manually downing
and resetting rtk0 to get back online....  Yup!  I had to log in on the
console to see what was wrong and do the ifconfigs:

root      console                   Wed Jun 20 20:09 - shutdown (1+00:52)

and the freeze-up did happen twice in a short time....

Maybe I need a "watchdog timeout" watchdog!  ;-)
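Half seriously, such a thing might look roughly like the Python sketch
below.  It assumes the kernel messages land in syslog in exactly the
format shown above.  Because syslog timestamps carry no year, the year
(2001) and the one-hour window are assumptions passed in as parameters.
It only flags two consecutive timeouts on the same interface that are
closer together than the window.  It doesn't try to reset anything.

```python
import re
from datetime import datetime, timedelta

# Matches kernel messages shaped like the ones in my logs, e.g.
#   "Jun 20 20:00:40 isit /netbsd: rtk0: watchdog timeout"
# (assumption: the message text is exactly "<ifname>: watchdog timeout").
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ /netbsd: "
    r"(?P<ifname>[a-z]+\d+): watchdog timeout$"
)


def find_bursts(lines, year=2001, window=timedelta(hours=1)):
    """Return (ifname, earlier, later) for each pair of consecutive
    watchdog timeouts on the same interface closer than `window`.

    Syslog omits the year, so the caller supplies one (an assumption).
    """
    last_seen = {}  # interface name -> time of its previous timeout
    bursts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a watchdog timeout message
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        ifname = m["ifname"]
        prev = last_seen.get(ifname)
        if prev is not None and when - prev < window:
            bursts.append((ifname, prev, when))
        last_seen[ifname] = when
    return bursts
```

Run over the three log lines above, it would report a single burst:
rtk0 at 20:00:40 and again at 20:21:45 on Jun 20, which is the double
freeze-up.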

It'll soon be time to upgrade that router again anyway; maybe there have
been related improvements since the 1.5F it's running....

>  (Anyone else remember having to pop out 
> the unit number plug on IBM 2314 disk drives, or do a 'vary ...,offline'/
> 'vary ...,online' sequence on IBM mainframes to recover from lost disk 
> drive interrupts?)

I've certainly done lots of similar things with other communications
devices!  ;-)

(and I've learned to manually reset the SCSI bus after disconnecting and
reconnecting devices "live".  That always seems to be necessary with
Adaptec controllers on NetBSD, at least with the few external devices I
have, such as my CMD RAID arrays: if you don't reset the bus first,
you'll almost certainly hang it the first time you try to access the
reconnected device)

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>