Subject: NetBSD 3.0_BROKEN
To: None <port-amd64@netbsd.org>
From: Eric Radman <theman@eradman.com>
List: port-amd64
Date: 07/18/2006 17:45:05
Guys,

There is a serious problem with the kernel in the 3.0 release. On my
dual Opteron 250 box, the kernel's memory use (shown here as the
resident size of the ioflush kernel thread in top) keeps growing slowly
for as long as the box is up and under heavy load:

$ uptime
11:15PM  up 18:41, 7 users, load averages: 3.49, 3.22, 2.96

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K  208M syncer/0   1:51  0.00%  0.00% [ioflush]

$ sudo reboot -r now

$ uptime
11:19PM  up 1 min, 2 users, load averages: 1.11, 0.34, 0.13

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K   68M syncer/0   0:00  0.00%  0.00% [ioflush]

$ uptime
11:25PM  up 6 mins, 3 users, load averages: 1.43, 1.20, 0.61

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K   94M syncer/0   0:00  0.00%  0.00% [ioflush]

$ uptime
11:29PM  up 11 mins, 4 users, load averages: 1.70, 1.66, 1.00

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K  100M syncer/0   0:00  0.00%  0.00% [ioflush]


After installing a new kernel built from source dated 2006-07-17:

$ uptime
11:33PM  up 1 min, 1 user, load averages: 2.68, 0.76, 0.28

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K   21M syncer/0   0:00  0.00%  0.00% [ioflush]

$ uptime
11:38PM  up 6 mins, 4 users, load averages: 3.52, 2.53, 1.23

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K   22M syncer/0   0:00  0.00%  0.00% [ioflush]

$ uptime
11:43PM  up 11 mins, 4 users, load averages: 2.44, 2.66, 1.68

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   16 root      18    0     0K   21M syncer/0   0:00  0.00%  0.00% [ioflush]

So the problem has been fixed in 3.99, but the 3.0 ISOs should either be
updated or marked as broken so that others don't suffer from this nasty
bug.

On amd64 this has caused the kernel to freeze suddenly when it reaches
399MB, which can happen in as little as 7 days. On i386 the problem is
also present, but it does not become fatal until a heavy load has been
applied for 10 to 14 days.

This is the kernel configuration I used on amd64; the i386 one is similar:

include "arch/amd64/conf/GENERIC"

options         MULTIPROCESSOR
options         COM_MPLOCK
maxusers        256
options         SHMMAXPGS=59400
options         SHMSEG=512
options         SEMMNI=512
options         SEMMNS=1024
options         SEMMNU=512
options         SEMMAP=512
options         NMBCLUSTERS=4096
options         MAXDSIZ="(1024*1024*1024)"
options         DFLDSIZ="(1024*1024*1024)"
options         OPEN_MAX=512
options         CHILD_MAX=640

--
Eric Radman