(re-adding current-users to the discussion - maybe someone has a clue?) I reported:
Well, I just experienced another hang, about an hour ago.
I've been collecting mbuf data every 30 seconds. Around 11:04 AM PST today, the mbufs-in-use started climbing. From an earlier "stable" level of about 530 mbufs, it more than doubled to 1112. It stayed at that level for several minutes, and then dropped slightly. But the machine never actually recovered, and an outbound ping still complained with ENOBUFS. The funny thing is, even though it would appear that the machine had reached a limit on mbufs, the vmstat data still reports 0 failures.
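For reference, the 30-second collection can be done with a loop along these lines. This is just a sketch of my setup, not the exact script; the awk pattern assumes NetBSD's "N mbufs in use" wording from netstat -m, and the sample line stands in for real command output so the snippet is self-contained:

```shell
# Hypothetical sampler sketch. In the real loop, sample_mbufs would
# run "netstat -m" instead of echoing a captured line.
sample_mbufs() {
    echo '531 mbufs in use:' | awk '/mbufs in use/ { print $1 }'
}

# Log one timestamped sample. A real collector would wrap this in
# something like:  while sleep 30; do ... >> mbuf.log; done
printf '%s %s\n' "$(date +%H:%M:%S)" "$(sample_mbufs)"
```

Graphing the logged column against the clock is how the 11:04 climb from ~530 to 1112 showed up.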
Greg responded:
Hmm. I would also check: are there any failures at all in the pool stats? I wonder if there is some other error path, because 1000 mbufs is not really a lot. Where else are mbufs cached? The pool is showing no releases, so something more complicated is going on.
The total line from vmstat also shows 0 failures. I'm also somewhat confused, since vmstat is reporting ~1100 mbufs, yet netstat only mentions ~530. Where are the other 550 mbufs?
Some semi-random questions/musings:

* If I were to force a crash dump, is there any way to grovel through the mbufs to figure out who/what owns them?

* Even though vmstat reports no failures, there's obviously some limit being reached. Perhaps there is a request at some elevated IPL (or under some similarly restrictive condition) where the caller has indicated that waiting is not permitted? If this is a possibility, is there any way for me to force pre-allocation of a large quantity of mbufs?

-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------