Paul Goyette <paul%whooppee.com@localhost> writes:

> BTW, what are "calls to protocol drain routine"? These seem to go up
> very slowly over time, and there were 18 at the time of the failure.

There is a notion, I think from 4.4BSD and not really deeply
implemented, that under memory pressure there are drain callbacks that
can free memory that is not strictly necessary. An example would be
shrinking receive socket buffers.

> I tried again to "ifconfig wm0 down" and the process hung. I tried to
> switch back to another xterm session, and it was unable to re-draw the
> window.

To me this is a clue that the buffers in the wm driver have gotten
messed up (bad ring pointers, etc., but I haven't even looked at the
data structures in use).

I would suggest running not only netstat -m but also vmstat -m in the
background (every 15s?), saved to files; a sketch of such a loop is at
the end of this message. In addition to regular mbufs you should watch
'mclpl'. Note that some drivers (e.g., bnx) allocate a lot of clusters,
as many as 512, to have ready in the receive ring. That's fine if you
only have 1 of them, but we ran into issues on systems with 8 bnx
interfaces.

I have seen problems with dual wm cards under -5 and -6. The issue
seems to be mishandling of the PCI bridge that is on the card (or in
the dual-wm chip). I have not seen this problem with a straight wm
without the extra bridge.

You should increase NMBCLUSTERS and see if that helps; examples of how
to set it are at the end of this message. On a modern machine with
multiple GB of RAM, 4096 or 8192 should be fine. Seeing 'no buffer
space available' is a clue. While there may be a ring corruption bug,
it is far more likely to be triggered under low-memory conditions;
most people bump up NMBCLUSTERS, so few persistently encounter low
memory.
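As an aside, the drain counter Paul quotes comes from the netstat -m
output; if you just want to watch that one line, something like this
works (the grep pattern is an assumption about the exact wording of
your netstat output):

  # print only the protocol-drain counter from the mbuf statistics
  netstat -m | grep -i drain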
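Here is a minimal sketch of the background monitoring loop suggested
above (the log file names and the 15-second interval are placeholders;
adjust to taste):

  #!/bin/sh
  # Append mbuf and kernel-pool statistics to log files every 15
  # seconds, so the state leading up to a failure is preserved on disk.
  while :; do
          date >> /var/tmp/netstat-m.log
          netstat -m >> /var/tmp/netstat-m.log  # mbuf/cluster usage
          date >> /var/tmp/vmstat-m.log
          vmstat -m >> /var/tmp/vmstat-m.log    # per-pool stats; watch 'mclpl'
          sleep 15
  done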
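And two ways to raise NMBCLUSTERS, one at kernel build time and one at
runtime. The sysctl knob is kern.mbuf.nmbclusters on NetBSD; whether a
running kernel lets you raise it depends on the version, so treat the
second form as an assumption to verify on your system:

  # in the kernel configuration file, then rebuild:
  options NMBCLUSTERS=8192

  # or on a running system (the value can only be increased):
  sysctl -w kern.mbuf.nmbclusters=8192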