Subject: kernel resource leak? (NetBSD-current from February)
To: None <current-users@netbsd.org>
From: Jarle Greipsland <jarle@idt.unit.no>
List: current-users
Date: 10/13/1994 21:07:14
Hi, let me start it off by declaring that this is not a bug report. It's
more of an inquiry to see if someone else has detected the behaviour
described below, and may know something about what causes it. So, if
someone (whoever that might be) finds that this post is lacking in detail,
either press 'd' for delete or email me for more details.
Okay, this is old stuff and I should probably have reported it a long time
ago. Sorry. I'm partly responsible for running a fileserver for a bunch
of PCs, mainly OS/2 machines.
The fileserver:
OS: NetBSD-current from late February 94 (I told you it was old! And don't
bother with why I installed -current, okay :-)
Some options from config 'NFSSERVER, NFSCLIENT, GATEWAY, SCSI, DDB' ++
(more available upon request)
HW: i486-66, 256kb cache, EISA bus, 3 WD8013EBT boards, aha1742,
2 Quantum PD1800S (1800 MB SCSI 2), 1 Maxtor XT-8702S (578MB SCSI 1),
1 Quantum PD1225S (1200 MB SCSI 2). "No-brand" VGA board.
2 16550 uarts, 1 lpt0 printer port
System setup:
serving approx 30 OS/2 boxes on one segment (NFS, BOOTP)
serving approx 20 OS/2, DOS or Windoze boxes on another segment (NFS,BOOTP)
(serving == exports read only several directories with software packages)
hooked up to 'the world' (cisco router, no clients) on the third segment.
maildrop for < 10 people.
hp laserjet on printer port.
28.8 uCom modem on one of the serial ports
runs: 4 nfsd, gated, xntpd, sendmail, lpd + standard stuff
Problem: Over a week's time or so, sometimes more, sometimes less, the load
(as reported by w and uptime) gradually picks up, unnoticeably at first,
then at a more rapid rate. Whenever no external activity takes place it
drops to approx 0, but as soon as an nfs-daemon or printer filter (or
whatever) gets work to do the load increases rapidly. The funny thing is
that almost no cpu time is spent in user mode, the system seems to devour
cpu cycles for its internal use at a terrifying rate. My hunch is that
there is a resource leak somewhere in there, and that this resource
eventually gets scarce enough that the processes have to really compete for
it. This can explain why all processes, even telnets and shells, start to
spend extra time in the system cpu state (as soon as they get active).
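One way to test a hunch like this would be to save the 'vmstat -m' output at
intervals and compare the 'In Use' counts between two snapshots. A count that
keeps growing is only a candidate for a leak, not proof of one. The following
is a minimal Python sketch of that comparison. The function name
bucket_growth and the sample numbers are made up for illustration; they are
not measurements from this machine.

```python
def bucket_growth(before, after):
    """Return {bucket size: increase in 'In Use'} for buckets that grew."""
    growth = {}
    for size, in_use in after.items():
        delta = in_use - before.get(size, 0)
        if delta > 0:
            growth[size] = delta
    return growth


# Hypothetical 'In Use' counts keyed by bucket size (not real measurements).
monday = {64: 300, 128: 190, 512: 400}
friday = {64: 310, 128: 185, 512: 1300}

# Print the buckets that grew, largest increase first.
for size, delta in sorted(bucket_growth(monday, friday).items(),
                          key=lambda kv: kv[1], reverse=True):
    print(f"bucket {size}: +{delta} in use")
```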
I suspect a memory leak, but I don't know for sure. The only thing I see
that I find a bit odd, but don't have enough knowledge about to interpret
properly, is the output from 'vmstat -m' just before we rebooted it. Down
this list I see:
Memory statistics by bucket size
    Size   In Use   Free   Requests  HighWater  Couldfree
      16     1147    901    3299684       1280          0
      32      533    235     648778        640          0
      64     2819    317    1730772        320        215
     128      195    285   69949362        160    5609735
     256      177     79     206599         80        865
     512     1362      6      10615         40          0
    1024       23      5    3899612         20          0
What does the couldfree imply? Does a high number in the couldfree column
signify a problem or is it just an 'interesting tidbit'?
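For a rough idea of where the memory sits, one can multiply each bucket size
by its 'In Use' count. The Python sketch below does that arithmetic. The
numbers are copied from the bucket table above; the names buckets and
bytes_in_use are just illustrative. It only covers the buckets listed here, so
the sum is not expected to match the overall memory totals.

```python
# (bucket size in bytes, allocations in use), copied from the table above
buckets = [
    (16, 1147),
    (32, 533),
    (64, 2819),
    (128, 195),
    (256, 177),
    (512, 1362),
    (1024, 23),
]


def bytes_in_use(rows):
    """Return {bucket size: bytes held by in-use allocations}."""
    return {size: size * in_use for size, in_use in rows}


usage = bytes_in_use(buckets)
total = sum(usage.values())

# List the buckets from largest to smallest share of the in-use memory.
for size, nbytes in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size:5d}  {nbytes / 1024:7.1f}K  {100 * nbytes / total:5.1f}%")
print(f"total  {total / 1024:7.1f}K")
```

By this arithmetic the 512-byte bucket holds 681K of the roughly 983K in use
across the listed buckets.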
Also, it seems that it's the vnodes that really gobble memory:
vnodes 1379 678K 682K 3687K 10527 0 0
but it's the mbufs that have the highest number of requests:
mbuf 8 2K 21K 3687K 69533027 0 0
The rest of the 'stats by type' mostly say below 10K, with a few above 10K,
but all below 100K. The memory totals says:
Memory Totals:  In Use    Free    Wasted    Requests
                 1042K    129K       16K    79989910
Is this the way it ought to be? If not, can this be caused by the OS/2
machines mounting NetBSD partitions and never unmounting them? (I suspect
that client activity triggers the described behaviour, because during
summer break, when no, or just a few, students were using the PCs, the system
behaved impeccably.)
So, I guess my question really is: Has anyone else seen this behaviour?
Anyone know what may cause it? Is it not a memory leak, but something
completely different? If this triggers someone's long-term memory, is it
fixed in 1.0(beta)?
We're planning to upgrade to 1.0 as soon as it becomes available, so that
may solve our problem (But we may have to look for alternatives if it
doesn't. That's why I would like to know.) Anyway, don't waste any time
on this one unless it rings a bell fairly immediately. That includes you,
mycroft :-)
-jarle
PS. The phase of the moon doesn't seem to have any influence. Just thought
some of you might like to know..... DS.
----
"This terminal is no more. It has ceased to be. It's expired and
gone to meet its maker. This is a late terminal. It's a stiff.
Bereft of life, it rests in peace. If you hadn't nailed it to the
bench, it would be pushing up the daisies. It's run down the
curtain and joined the choir invisible. This is an X-Terminal!"
- Unknown