Subject: exceeded mb_map sensibility?
To: None <current-users@NetBSD.ORG>
From: C. Ewen MacMillan <firstname.lastname@example.org>
Date: 09/17/1996 23:55:54
First of all, the identifying information - as you will see, this is not
in fact -current (rather, it is based on -current circa August 19):
tepe#> uname -a
NetBSD tepe.tezcat.com 1.2_BETA NetBSD 1.2_BETA (tepe-new) #1:
Mon Aug 26 12:40:10 CDT 1996
(the above has been reformatted for human-oriented mail-readers).
Hardware: ASUS P90, ASUS PCI200 SCSI Controller, 64MB.
NetBSD 1.2_BETA (tepe-new) #1: Mon Aug 26 12:40:10 CDT 1996
CPU: Pentium (GenuineIntel 586-class CPU)
real mem = 66715648
avail mem = 61054976
using 840 buffers containing 3440640 bytes of memory
isa0 at mainbus0
ep0 at isa0 port 0x300-0x30f irq 10: 3Com 3C509 Ethernet
ep0: aui/utp address 00:20:af:18:d6:67
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
npx0 at isa0 port 0xf0-0xff: using exception 16
pc0 at isa0 port 0x60-0x6f irq 1: color
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
pci0 at mainbus0 bus 0: configuration mode 1
vendor 0x8086 product 0x122d (class bridge, subclass host, revision 0x02) at pci0 dev 0 function 0 not configured
vendor 0x8086 product 0x122e (class bridge, subclass ISA, revision 0x02) at pci0 dev 7 function 0 not configured
vendor 0x10a8 product 0x0000 (class display, subclass VGA, revision 0x00) at pci0 dev 10 function 0 not configured
ncr0 at pci0 dev 12 function 0: NCR 53c810 SCSI
ncr0: interrupting at irq 11
ncr0: restart (scsi reset).
scsibus0 at ncr0
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST51080N, 0943> SCSI2 0/direct fixed
sd0: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
1030MB, 4826 cyl, 4 head, 109 sec, 512 bytes/sect
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST15230N, 0638> SCSI2 0/direct fixed
sd1: sd1(ncr0:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB, 3992 cyl, 19 head, 110 sec, 512 bytes/sect
sd2 at scsibus0 targ 2 lun 0: <SEAGATE, ST15230N, 0638> SCSI2 0/direct fixed
sd2: sd2(ncr0:2:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB, 3992 cyl, 19 head, 110 sec, 512 bytes/sect
biomask 840 netmask c40 ttymask c52
changing root device to sd0a
And the relevant parts of the kernel config (the rest of the configuration
simply reflects deletions from GENERIC, for devices we do not have):
machine i386 # architecture, used by config; REQUIRED
# Some BIOSes don't get the size of extended memory right. If you
# have a broken BIOS, uncomment the following and set the value
# properly for your system.
#options EXTMEM_SIZE=... # size of extended memory
options DUMMY_NOPS # speed hack; recommended
options INSECURE # insecure; allow /dev/mem writing for X
maxusers 64 # estimated number of users
options TIMEZONE=0 # time zone to adjust RTC time by
options DST=0 # daylight savings time used by RTC
options SWAPPAGER # paging; REQUIRED
options VNODEPAGER # mmap() of files
Back around the end of June, I think, I first started running into
"mb_map full" errors - Chris D. (I think) suggested increasing
NMBCLUSTERS to 512, which worked until:
Sep 16 00:21:25 tepe /netbsd: mb_map full
Sep 16 00:30:12 tepe /netbsd: mb_map full
Sep 16 01:28:53 tepe /netbsd: mb_map full
Sep 16 03:36:48 tepe load: 15 minute load average = 5.05
Sep 16 03:39:02 tepe load: 15 minute load average = 5.18
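For reference, the NMBCLUSTERS bump was a one-line addition to the kernel
config shown above (the option name is standard; 512 is the value that was
suggested, and the comment is mine):

```
options NMBCLUSTERS=512 # number of mbuf clusters; sizes mb_map
```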
We have been getting the error periodically for about the last
two weeks. For background that might make some sense of all
of this, we have for some reason climbed from roughly #350 to #115 in the
Freenix ratings (i.e. the Top 1000 Usenet sites).
Obviously, we have experienced a commensurate increase
in network traffic. Unfortunately, increasing NMBCLUSTERS does
not do what one might think it would - it seems to panic the machine
(completely wiping the dmesg buffer for some reason!). I have tried
both 768 and 1024 as values, and the problem is decidedly worse
at 1024. Due to the lack of a 1.1-oriented compatibility interface for
if_name/if_unit, it seems I cannot even use SNMP to generate a trap
at the time the problem occurs. Since this is a production machine, I
cannot leave it set to drop into the debugger either, though my intuition
tells me that whatever kills it, kills it HARD.
I realize that what I am providing is subjective - unfortunately, I cannot
provide you with dmesg output from the increased buffer sizes, as
I cannot produce it. I do recall an experiment with a P75 (we dropped
it in to test the hardware) generating a weird pmap error in the same
situation.
Lastly, the performance curve goes down as the mb_map full errors
increase - based on top, it seems that whenever the threshold
is reached, the machine is effectively non-responsive for a period of at
least 45 seconds. We have effectively dropped to about 25% of
the performance we had about one week ago (based on userland NNTP
statistics, the only metric relevant for this machine), with only a 2-3%
total increase in network traffic to the machine.
I have tried the PR 1903 bufcache patch under a slightly older revision,
but it panicked on me a great deal and would not boot with bufcache above 7%.
Any suggestions, aside from the obvious (i.e. going back to 1.1, or to some
other, more i386-friendly OS)?
C. Ewen MacMillan