Subject: kern/7414: 1.4A processes hang in mclpl - severe mbuf leak?
To: None <gnats-bugs@gnats.netbsd.org>
From: None <proff@suburbia.net>
List: netbsd-bugs
Date: 04/18/1999 09:06:26
>Number: 7414
>Category: kern
>Synopsis: 1.4A processes hang in mclpl - severe mbuf leak?
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 18 09:05:00 1999
>Last-Modified:
>Originator: Julian Assange
>Organization:
>Release: <NetBSD-current source date> NetBSD-current Apr 15 1999
>Environment:
System: NetBSD suburbia.net 1.4A NetBSD 1.4A (SUBURBIA.PROF) #0: Thu Apr 15 21:19:38 EST 1999 root@suburbia.net:/orb/s/netbsd/usr/src/sys/arch/i386/compile/SUBURBIA.PROF i386
>Description:
There appears to be an mbuf cluster leak in 1.4A (see uname above). After 24 hours, userland
processes which attempt to allocate an mbuf cluster hang in mclpl.
This did not occur in -current from Feb 1999, and there have been no configuration changes
likely to affect the issue since then (other than the upgrade to 1.4A).
# netstat -m
2966 mbufs in use:
2579 mbufs allocated to data
387 mbufs allocated to socket names and addresses
7034/256 mapped pages in use
896 Kbytes allocated to network (1567% in use)
0 requests for memory denied
0 requests for memory delayed
3765 calls to protocol drain routines
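
The suspicious figure in the output above is the "mapped pages in use" line, where the in-use count is far larger than the mapped count. A minimal sketch of a check for that condition follows, assuming the `netstat -m` text has been captured to a string. The function names are hypothetical, and the check only flags the inconsistency; it does not locate the leak.

```python
import re

# Sample text copied from the netstat -m output in this report.
SAMPLE = """\
2966 mbufs in use:
2579 mbufs allocated to data
387 mbufs allocated to socket names and addresses
7034/256 mapped pages in use
896 Kbytes allocated to network (1567% in use)
0 requests for memory denied
0 requests for memory delayed
3765 calls to protocol drain routines
"""


def parse_mapped_pages(netstat_m_output):
    """Return (in_use, mapped) from the 'mapped pages in use' line, or None."""
    match = re.search(r"(\d+)/(\d+) mapped pages in use", netstat_m_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def clusters_overcommitted(netstat_m_output):
    """Return True when more pages are reported in use than are mapped."""
    pages = parse_mapped_pages(netstat_m_output)
    return pages is not None and pages[0] > pages[1]


if __name__ == "__main__":
    print(parse_mapped_pages(SAMPLE), clusters_overcommitted(SAMPLE))
```

Run periodically against fresh `netstat -m` output, a check like this could show how long after boot the counters start to diverge.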
# ps axl|fgrep mclpl
98 186 1 0 -22 0 1824 4 mclpl DWs ?? 3:59.93 named -u name
0 199 197 0 -22 0 16 160 mclpl DL ?? 0:00.01 nfsd: server
0 200 197 0 -22 0 16 160 mclpl DL ?? 0:00.02 nfsd: server
0 201 197 0 -22 0 16 160 mclpl DL ?? 0:03.67 nfsd: server
0 202 197 0 -22 0 16 160 mclpl DL ?? 0:00.03 nfsd: server
0 20733 265 0 -22 0 360 4 mclpl DW ?? 0:26.39 /usr/pkg/sbin
87 257 1 0 -22 0 116 4 mclpl DW v0- 5:47.25 qmail-send
88 273 247 0 -22 5 14056 4 mclpl DWN v0- 3:12.68 (squid) (squi
11004 1706 275 1 -22 0 428 4 mclpl DW+ v0 0:00.28 ps -axl
0 1720 276 3 -22 0 432 4 mclpl DW+ v1 0:00.24 ps -axl
The last two entries hung because of the |less, which seems to allocate an mbuf cluster
(presumably for pipe buffers?).
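
To track which processes are stuck as the problem develops, the `ps axl` listing can be filtered on its wait-channel column rather than with fgrep. The sketch below assumes the column order seen above (UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND) and that every row has a non-empty WCHAN field. The function name is hypothetical, and the `init` row in the sample is made up for contrast.

```python
# Two rows copied from the ps output in this report, plus one invented
# row (init, wchan "wait") to show that other wait channels are skipped.
SAMPLE = """\
98 186 1 0 -22 0 1824 4 mclpl DWs ?? 3:59.93 named -u name
87 257 1 0 -22 0 116 4 mclpl DW v0- 5:47.25 qmail-send
0 1 0 0 10 0 300 100 wait Is ?? 0:00.05 init
"""


def blocked_in(ps_axl_output, wchan="mclpl"):
    """Return [(pid, command)] for rows whose WCHAN column equals wchan."""
    blocked = []
    for line in ps_axl_output.splitlines():
        # Split into at most 13 fields so the command keeps its spaces.
        fields = line.split(None, 12)
        if len(fields) < 13 or not fields[1].isdigit():
            continue  # header line or malformed row
        if fields[8] == wchan:
            blocked.append((int(fields[1]), fields[12]))
    return blocked


if __name__ == "__main__":
    for pid, command in blocked_in(SAMPLE):
        print(pid, command)
```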
#netstat -s
ip:
1190728 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with length > max ip packet size
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
152 fragments received
0 fragments dropped (dup or out of space)
0 malformed fragments dropped
0 fragments dropped after timeout
33 packets reassembled ok
1149941 packets for this host
1388 packets for unknown/unsupported protocol
33837 packets forwarded (0 packets fast forwarded)
257 packets not forwardable
0 redirects sent
1421418 packets sent from this host
9 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
1242 output packets discarded due to no route
334 output datagrams fragmented
1388 fragments created
157 datagrams that can't be fragmented
icmp:
1493 calls to icmp_error
0 errors not generated because old message was icmp
Output histogram:
echo reply: 13
destination unreachable: 1478
source quench: 14
57 messages with bad code fields
0 messages < minimum length
0 bad checksums
0 messages with bad length
Input histogram:
echo reply: 8
destination unreachable: 1332
source quench: 1
echo: 13
time exceeded: 55
13 message responses generated
igmp:
0 messages received
0 messages received with too few bytes
0 messages received with bad checksum
0 membership queries received
0 membership queries received with invalid field(s)
0 membership reports received
0 membership reports received with invalid field(s)
0 membership reports received for groups to which we belong
0 membership reports sent
tcp:
1257004 packets sent
1066241 data packets (473637694 bytes)
2066 data packets (598254 bytes) retransmitted
176070 ack-only packets (643130 delayed)
0 URG only packets
43 window probe packets
17074 window update packets
8962 control packets
1007348 packets received
731584 acks (for 473563269 bytes)
8842 duplicate acks
0 acks for unsent data
716652 packets (73405435 bytes) received in-sequence
3962 completely duplicate packets (1635011 bytes)
6 old duplicate packets
192 packets with some dup. data (56705 bytes duped)
9690 out-of-order packets (7373710 bytes)
1722 packets (1721 bytes) of data after window
1721 window probes
1680 window update packets
882 packets received after close
1 discarded for bad checksum
0 discarded for bad header offset fields
0 discarded because packet too short
3222 connection requests
1939 connection accepts
4498 connections established (including accepts)
6816 connections closed (including 146 drops)
426 embryonic connections dropped
647996 segments updated rtt (of 642671 attempts)
3157 retransmit timeouts
41 connections dropped by rexmit timeout
50 persist timeouts (resulting in 0 dropped connections)
164 keepalive timeouts
11 keepalive probes sent
15 connections dropped by keepalive
171337 correct ACK header predictions
236352 correct data packet header predictions
7404 PCB hash misses
3048 dropped due to no socket
60 connections drained due to memory shortage
12 bad connection attempts
1970 SYN cache entries added
0 hash collisions
1939 completed
0 aborted (no space to build PCB)
25 timed out
0 dropped due to overflow
0 dropped due to bucket overflow
6 dropped due to RST
0 dropped due to ICMP unreachable
285 duplicate SYNs received for entries already in the cache
41 SYNs dropped (no route or no space)
udp:
142572 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
1236 dropped due to no socket
0 broadcast/multicast datagrams dropped due to no socket
419 dropped due to full socket buffers
140917 delivered
118133 PCB hash misses
141759 datagrams output
#netstat -n | egrep -v LISTEN
root@suburbia:~# cat netstatn
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 203.4.184.1.25 128.32.43.207.2900 ESTABLISHED
tcp 0 0 203.4.184.1.25 129.132.7.153.60663 ESTABLISHED
tcp 0 0 203.4.184.1.62307 134.53.213.10.113 TIME_WAIT
tcp 0 0 203.4.184.1.62308 206.20.254.16.113 TIME_WAIT
tcp 44 0 127.0.0.1.53 127.0.0.1.62309 CLOSE_WAIT
tcp 0 0 127.0.0.1.62309 127.0.0.1.53 FIN_WAIT_2
tcp 0 0 203.4.184.1.25 200.17.178.18.54749 ESTABLISHED
tcp 0 0 203.4.184.1.25 134.53.213.10.18457 ESTABLISHED
tcp 44 0 127.0.0.1.53 127.0.0.1.62311 CLOSE_WAIT
tcp 0 0 127.0.0.1.62311 127.0.0.1.53 FIN_WAIT_2
tcp 0 0 203.4.184.1.25 203.15.111.1.64055 ESTABLISHED
tcp 0 0 203.4.184.1.25 203.15.111.1.64056 ESTABLISHED
tcp 334 0 203.4.184.1.25 206.20.254.16.15879 ESTABLISHED
tcp 0 0 203.4.184.1.23 203.4.184.224.64179 ESTABLISHED
tcp 45246 0 203.4.184.233.62318 209.81.8.250.80 CLOSE_WAIT
tcp 0 25920 203.4.184.1.3128 203.4.184.224.64180 ESTABLISHED
tcp 0 15784 203.4.184.1.3128 203.4.184.224.64182 ESTABLISHED
tcp 65700 0 203.4.184.233.62327 209.81.8.250.80 ESTABLISHED
tcp 0 0 203.4.184.233.62339 209.81.8.250.80 LAST_ACK
tcp 0 0 203.4.184.233.62341 209.81.8.250.80 LAST_ACK
tcp 65700 0 203.4.184.233.62342 209.81.8.250.80 ESTABLISHED
tcp 0 38952 203.4.184.1.3128 203.4.184.224.64205 ESTABLISHED
tcp 0 0 203.4.184.1.3128 203.4.184.224.64208 FIN_WAIT_2
tcp 320 0 203.4.184.1.22 203.4.184.224.1023 ESTABLISHED
tcp 0 0 203.4.184.1.63940 203.4.184.210.6000 ESTABLISHED
tcp 156 0 203.4.184.233.64332 203.37.45.2.6667 CLOSE_WAIT
tcp 0 0 203.4.184.1.22 203.4.184.227.1023 ESTABLISHED
tcp 65700 0 203.4.184.233.65423 131.170.24.190.1534 ESTABLISHED
tcp 0 0 203.4.184.233.65424 131.170.24.190.21 ESTABLISHED
tcp 52 0 203.4.184.233.65440 204.152.184.75.7326 ESTABLISHED
tcp 0 0 203.4.184.233.65441 203.15.111.22.7979 ESTABLISHED
tcp 0 0 127.0.0.1.65525 127.0.0.1.65524 ESTABLISHED
tcp 0 0 127.0.0.1.65524 127.0.0.1.65525 ESTABLISHED
tcp 0 0 127.0.0.1.65527 127.0.0.1.65526 ESTABLISHED
tcp 0 0 127.0.0.1.65526 127.0.0.1.65527 ESTABLISHED
tcp 0 0 127.0.0.1.65529 127.0.0.1.65528 ESTABLISHED
tcp 0 0 127.0.0.1.65528 127.0.0.1.65529 ESTABLISHED
tcp 0 0 127.0.0.1.65531 127.0.0.1.65530 ESTABLISHED
tcp 0 0 127.0.0.1.65530 127.0.0.1.65531 ESTABLISHED
tcp 0 0 127.0.0.1.65533 127.0.0.1.65532 ESTABLISHED
tcp 0 0 127.0.0.1.65532 127.0.0.1.65533 ESTABLISHED
udp 0 0 127.0.0.1.56140 127.0.0.1.53
udp 0 0 127.0.0.1.56141 127.0.0.1.53
udp 0 0 127.0.0.1.56142 127.0.0.1.53
udp 0 0 127.0.0.1.56143 127.0.0.1.53
udp 0 0 127.0.0.1.56147 127.0.0.1.53
udp 0 0 127.0.0.1.56148 127.0.0.1.53
udp 0 0 127.0.0.1.56152 127.0.0.1.53
udp 0 0 203.4.184.3.53 *.*
udp 0 0 203.4.184.2.53 *.*
udp 0 0 203.4.184.233.53 *.*
udp 5944 0 127.0.0.1.53 *.*
udp 0 0 203.4.184.34.53 *.*
udp 0 0 203.4.184.33.53 *.*
udp 16798 0 203.4.184.222.53 *.*
udp 16662 0 203.4.184.1.53 *.*
udp 0 0 10.0.0.1.53 *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr
f06d6484 dgram 0 0 0 f046f040 0 f07b80c0
f0798bac dgram 0 0 0 f046f040 0 f06fecc0
f0731200 stream 0 0 0 f0717d00 0 0
f07027bc stream 2500 0 0 f070d480 0 0
f073167c stream 0 0 0 f0717a40 0 0
f07313ec stream 0 0 0 f0717a80 0 0
f0731720 stream 0 0 0 f0717b00 0 0
f0731348 stream 0 0 0 f070dfc0 0 0
f07319b0 stream 0 0 0 f072da00 0 0
f07317c4 stream 0 0 0 f072de40 0 0
f07315d8 stream 0 0 0 0 0 0
f0731a54 stream 0 0 0 0 0 0
f07e0db8 stream 0 0 0 0 0 0
f06d6008 dgram 0 0 0 f046f040 0 f06c8bc0
f06a8d78 dgram 0 0 0 f046f040 0 f069a540
f06a8cd4 stream 0 0 0 f06c8cc0 0 0
f06a8c30 stream 0 0 0 f06c8240 0 0
f06a8b8c stream 0 0 0 f06cfd40 0 0
f06a8ae8 stream 0 0 0 f06c82c0 0 0
f06a8a44 stream 0 0 0 f06c8e00 0 0
f06a89a0 stream 0 0 0 f06c8840 0 0
f06a88fc stream 0 0 0 f06cff00 0 0
f06a8858 stream 0 0 0 f06c8e40 0 0
f06a87b4 stream 0 0 0 f06cf700 0 0
f06a8710 stream 0 0 0 f06cff40 0 0
f06a866c stream 0 0 0 f06cf4c0 0 0
f06a85c8 stream 0 0 0 f06cf780 0 0
f06a8338 stream 0 0 0 f06cf580 0 0
f07984a0 stream 0 0 0 f07e6000 0 0
f0798ee0 stream 0 0 0 f077e800 0 0
f0798b08 stream 0 0 0 f0730100 0 0
f0798d98 stream 0 0 0 f06d8100 0 0
f06a83dc stream 0 0 0 f06cf540 0 0
f06d6670 dgram 0 0 0 f046f040 0 f06c8d00
f0702154 stream 0 0 0 f06fefc0 0 0
f07020b0 stream 0 0 0 f06fedc0 0 0
f070200c stream 0 0 0 f06cf0c0 0 0
f06d6ec4 stream 0 0 0 f069aa80 0 0
f0798024 dgram 0 0 0 f046f040 0 f07ac740
f07980c8 stream 0 0 0 f0788300 0 0
f0798c50 stream 0 0 0 f07b8200 0 0
f06d6d7c stream 0 0 0 f07b8180 0 0
f06d633c stream 0 0 0 f07b8240 0 0
f070229c stream 0 0 0 f072de80 0 0
f07021f8 stream 39 0 0 f070d6c0 0 0
f0702a4c stream 0 0 0 f07b8700 0 0
f0702718 stream 0 0 0 f070d980 0 0
f07025d0 stream 0 0 0 f070d940 0 0
f0702e24 stream 0 0 0 f07acd00 0 0
f07023e4 stream 0 0 0 f07e6840 0 0
f0702c38 stream 0 0 0 f06d8580 0 0
f07029a8 stream 0 0 0 f06fe980 0 0
f0702ec8 stream 0 0 0 f07ace80 0 0
f0702d80 stream 0 0 0 f0788c40 0 0
f0702904 stream 0 0 0 f072d580 0 0
f06d6e20 stream 0 0 0 f06d8480 0 0
f0702860 stream 0 0 0 f079e980 0 0
f0702cdc dgram 0 0 0 f046f040 0 f0788540
f06a8524 dgram 0 0 0 f046f040 0 f06549c0
f0684148 dgram 0 0 0 f046f040 0 0
f07e0564 stream 0 0 0 f06d8940 0 0
f07e0608 stream 0 0 0 f06d8cc0 0 0
f07e0044 stream 0 0 0 f0717500 0 0
f07e07f4 stream 0 0 0 f0717280 0 0
f0684000 dgram 4078 0 fadaa6b4 0 f0717600 0 /dev/log
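
One way to gauge how much data is sitting in socket buffers is to total the Recv-Q and Send-Q columns of the Internet connections listing above. This is only a rough sketch with a hypothetical function name. It assumes the column layout shown in that listing, and a large total would merely hint at where clusters are held, not prove a leak.

```python
# Three Internet socket rows and one UNIX domain row copied from the
# netstat -n output in this report.
SAMPLE = """\
tcp 44 0 127.0.0.1.53 127.0.0.1.62309 CLOSE_WAIT
tcp 0 25920 203.4.184.1.3128 203.4.184.224.64180 ESTABLISHED
udp 16798 0 203.4.184.222.53 *.*
f07027bc stream 2500 0 0 f070d480 0 0
"""


def queued_bytes(netstat_n_output):
    """Return (recv_total, send_total) summed over tcp and udp rows."""
    recv_total = 0
    send_total = 0
    for line in netstat_n_output.splitlines():
        fields = line.split()
        # Only Internet socket rows start with the protocol name.
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_total += int(fields[1])
        send_total += int(fields[2])
    return recv_total, send_total


if __name__ == "__main__":
    print(queued_bytes(SAMPLE))
```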
# ipfstat -s | head -11
IP states added:
7906 TCP
13474 UDP
66 ICMP
2436367 hits
211570 misses
0 maximum
0 no memory
124 active
13472 expired
7850 closed
# dmesg
NetBSD 1.4A (SUBURBIA.PROF) #0: Thu Apr 15 21:19:38 EST 1999
root@suburbia.net:/orb/s/netbsd/usr/src/sys/arch/i386/compile/SUBURBIA.PROF
cpu0: family 5 model 2 step 5
cpu0: Intel Pentium (P54C) (586-class)
real mem = 66715648
avail mem = 52948992
using 2430 buffers containing 9953280 bytes of memory
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o enabled, memory enabled
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82437FX System Controller (TSC) (rev. 0x02)
pcib0 at pci0 dev 7 function 0
pcib0: Intel 82371FB PCI-to-ISA Bridge (PIIX) (rev. 0x02)
pciide0 at pci0 dev 7 function 1: Intel 82371FB IDE controller (PIIX)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <Conner Peripherals 1275MB - CFS1275A>
wd0: drive supports 16-sector pio transfers, lba addressing
wd0: 1219MB, 2477 cyl, 16 head, 63 sec, 512 bytes/sect x 2496876 sectors
wd0: 32-bits data port
wd0: drive supports PIO mode 4, DMA mode 2
wd1 at pciide0 channel 0 drive 1: <QUANTUM FIREBALL EL5.1A>
wd1: drive supports 16-sector pio transfers, lba addressing
wd1: 4892MB, 10602 cyl, 15 head, 63 sec, 512 bytes/sect x 10018890 sectors
wd1: 32-bits data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
pciide0: primary channel interrupting at irq 14
pciide0: secondary channel wired to compatibility mode
wd2 at pciide0 channel 1 drive 0: <Conner Peripherals 1275MB - CFS1275A>
wd2: drive supports 16-sector pio transfers, lba addressing
wd2: 1219MB, 2477 cyl, 16 head, 63 sec, 512 bytes/sect x 2496876 sectors
wd2: 32-bits data port
wd2: drive supports PIO mode 4, DMA mode 2
wd3 at pciide0 channel 1 drive 1: <Conner Peripherals 1275MB - CFS1275A>
wd3: drive supports 16-sector pio transfers, lba addressing
wd3: 1219MB, 2477 cyl, 16 head, 63 sec, 512 bytes/sect x 2496876 sectors
wd3: 32-bits data port
wd3: drive supports PIO mode 4, DMA mode 2
pciide0: secondary channel interrupting at irq 15
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd2(pciide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd3(pciide0:1:1): using PIO mode 4, DMA mode 2 (using DMA data transfers)
fxp0 at pci0 dev 8 function 0: Intel EtherExpress Pro 10+/100B Ethernet
fxp0: interrupting at irq 12
fxp0: Ethernet address 00:90:27:13:b0:38
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ncr0 at pci0 dev 9 function 0: ncr 53c810a fast10 scsi
ncr0: interrupting at irq 7
ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus0 at ncr0: 8 targets, 8 luns per target
sd0 at scsibus0 targ 0 lun 0: <DEC, DSP3133LS, X441> SCSI2 0/direct fixed
sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
sd0: 1283MB, 3117 cyl, 10 head, 84 sec, 512 bytes/sect x 2628330 sectors
isa0 at pcib0
ne0 at isa0 port 0x300-0x31f irq 3
ne0: NE2000 Ethernet
ne0: Ethernet address 00:c0:f0:1a:a3:21
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com2 at isa0 port 0x100-0x107 irq 5: ns16550a, working fifo
com3 at isa0 port 0x108-0x10f irq 5: ns16550a, working fifo
com4 at isa0 port 0x110-0x117 irq 5: ns8250 or ns16450, no fifo
com6 at isa0 port 0x120-0x127 irq 5: ns8250 or ns16450, no fifo
com7 at isa0 port 0x128-0x12f irq 5: ns8250 or ns16450, no fifo
com8 at isa0 port 0x130-0x137 irq 5: ns8250 or ns16450, no fifo
com9 at isa0 port 0x138-0x13f irq 5: ns8250 or ns16450, no fifo
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
vt0 at isa0 port 0x60-0x6f irq 1
vt0: mda, mono, 8 scr, mf2-kbd, [R3.32]
vt0: console
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.1 (slowidle)
apm0: A/C state: on
apm0: battery charge state: no battery
apm0: estimated 0m
biomask c080 netmask d088 ttymask d08a
Profiling kernel, textsize=1359264 [f0100000..f024bda0]
>How-To-Repeat:
>Fix:
>Audit-Trail:
>Unformatted: