Subject: Re: Experiences with 2.0beta?
To: NetBSD port-sparc mailing list <port-sparc@netbsd.org>
From: Julian Coleman <jdc@coris.org.uk>
List: port-sparc
Date: 05/04/2004 12:01:42
>                                          I've started with a current
> kernel 2.0C that caused a total freeze nearly once a day, the error each
> time was "unable to allocate scsipi_xfer" and "unable to allocate ecb". I
> then switched to 2.0D that seemed to solve this issue but showed another
> one: after some time I got frequent "sbrk: grow ... failed, error 12". Now
> I'm at 2.0E that hasn't produced this error yet, but instead I got
> "virtual memory exhausted" during compilation, and no network daemons
> would work anymore (no sshd, no smtpd, even if restarted). And I had the
> same problems from 2.0C again. So I think the memory management code must
> be broken in some way.

There seems to be a kernel pool memory leak somewhere in the networking
code.  I'm running 1.6ZG (i.e. before ipfilter 4) and bridge+ipf on le0,
qe0 and qe1.  Without ipfilter, the machine will run for about a month
before running out of kernel pool memory (usually with an "unable to allocate
scsipi_xfer" message).  With ipfilter, it won't run for a week (currently
I reboot it every day).

I'd be interested to see if you have a similar problem.  Can you look at
the output from `netstat -m` and the mbuf pools from `vmstat -m`?  I see:

  $ uptime                    
  11:37AM  up  9:22, 1 user, load averages: 0.24, 0.17, 0.17
  $ netstat -m                
  2562 mbufs in use:
          2562 mbufs allocated to packet headers
  0 calls to protocol drain routines
  $ vmstat -m
  Memory resource pool statistics
  Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
    ...
  mbpl         256     2571    0        0   162     0   162   162     1   inf    1
  mclpl       2048        5    0        0     7     0     7     7     4   256    4
    ...
  In use 2357K, total allocated 2744K; utilization 85.9%

The number of mbufs allocated to packet headers (mbpl) keeps increasing
and the machine will die sometime after the total allocated goes over
(roughly) 12000K.
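
If you want to track this over time rather than eyeball it, something like
the following could be used.  It is only a rough sketch, assuming the
`vmstat -m` column order shown above; the function names are made up for
illustration and the sample row is copied from my output.  It treats
Requests minus Releases as the number of items still outstanding in a pool,
and flags a pool whose outstanding count rose between every pair of samples.

```python
# Column names after the pool name, in the order printed by `vmstat -m` above.
COLUMNS = ["size", "requests", "fail", "releases", "pgreq", "pgrel",
           "npage", "hiwat", "minpg", "maxpg", "idle"]

# Sample row copied from the output above.
SAMPLE_MBPL = "  mbpl         256     2571    0        0   162     0   162   162     1   inf    1"


def parse_pool_line(line):
    """Turn one pool row into a dict; 'inf' (no page limit) becomes None."""
    fields = line.split()
    row = {"name": fields[0]}
    for key, value in zip(COLUMNS, fields[1:]):
        row[key] = None if value == "inf" else int(value)
    return row


def outstanding(row):
    """Items requested from the pool and not yet released."""
    return row["requests"] - row["releases"]


def grew_every_time(samples):
    """True if each sample (oldest first) is larger than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    row = parse_pool_line(SAMPLE_MBPL)
    print(row["name"], "outstanding items:", outstanding(row))
    print(row["name"], "outstanding bytes:", outstanding(row) * row["size"])
```

Feeding it the mbpl row from a few runs of `vmstat -m` spaced some hours
apart should show whether the outstanding count ever drops back.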

This is triggered by some of the traffic passing through the bridge.  Other
people have reported no problems with similar setups.  I have about 5 NetBSD
machines on le0, and 1 NetBSD and 3 Windows machines on qe1.  qe0 is
connected to a DSL router.  One of the Windows machines runs peer-to-peer
file sharing.  Traffic is both IPv4 and IPv6.

Looking at this is on my list of things to do.  However, I'm unlikely to
get to it soon.  I'm not sure whether the problem is in the bridge code
itself, or somewhere else in the network stack and merely triggered by the
bridge and/or ipfilter code.  Also, I don't see anything in the changes
post-1.6ZG that would affect this (ignoring bugs in ipfilter 4).

J

PS.  This is almost certainly not a sparc-specific problem.

-- 
  My other computer also runs NetBSD    /        Sailing at Newbiggin
        http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/