tech-kern archive


Re: Complete lock-up from using pkgsrc/net/darkstat



Brian Buhrow's note sparked another thought.

> Stop darkstat.  Machine locks.

Is it possible the machine is not, strictly, hung, just doing something
that renders it unresponsive for a human-perceptible time?  You wrote
of having to get remote hands to poke an unresponsive machine; how long
did that take?  Did your remote hands notice whether the disk light was
lit (if there is such a light)?

I've had machines appear to lock up hard when what's actually going on
is that a large process is dumping core.  If the machine has what I
consider insane amounts of core (say, 64G), if darkstat's rlimit lets
it eat most of that, if there's enough free disk to store a substantial
fraction of that, and darkstat has a bug that leads it to core when
it's killed, this could look very similar - trying to write a 40-50 gig
coredump will not be fast, even on the kind of machine that has 40-50
gigs of core to write.
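One way to test the coredump theory before the next lock-up is to check whether the environment that starts darkstat permits a core file at all, and to estimate how long writing one would take. Below is a minimal Python sketch, assuming a Unix-like system with Python's `resource` module. The function names are hypothetical. The 45 GiB size and 100 MiB/s sustained write rate are illustrative assumptions, not measurements, and a thrashing swap device would only make the number worse. Rlimits are inherited, so run it from the same shell or rc script that launches darkstat (`ulimit -c` in that shell reports the same limit).

```python
import resource

def core_limit():
    """Return this process's (soft, hard) RLIMIT_CORE in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    inf = resource.RLIM_INFINITY
    return (None if soft == inf else soft, None if hard == inf else hard)

def dump_seconds(core_bytes, disk_bytes_per_sec):
    """Naive time to write core_bytes at a constant sustained write rate."""
    return core_bytes / disk_bytes_per_sec

if __name__ == "__main__":
    soft, _ = core_limit()
    print("core soft limit:", "unlimited" if soft is None else f"{soft} bytes")
    # Assumed numbers: a 45 GiB image written at 100 MiB/s.
    secs = dump_seconds(45 * 2**30, 100 * 2**20)
    print(f"~{secs:.0f} s ({secs / 60:.1f} min)")
```

With those assumed numbers the estimate is about 461 seconds, or roughly 7.7 minutes of a machine that looks dead. If the soft limit prints as 0, the coredump theory is unlikely for that process.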

Especially if you've configured swap and it's thrashing between reading
swap and writing the coredump.

Another possibility is that it does not truly leak VM, but balloons
until its attempts to grab more memory are rejected and then does some
kind of management of the memory it's been granted.  If its rlimit is
set high enough, the symptoms could be similar.  (Still need to posit
something that makes it try to drop core, though.)
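The "balloons until refused" scenario can be reproduced in miniature. This is a sketch in Python, assuming a Unix-like system. It forks a child, lowers that child's soft RLIMIT_AS, and lets the child allocate fixed-size buffers until the kernel says no. The choice of RLIMIT_AS is an assumption: the post just says "rlimit", and on NetBSD RLIMIT_DATA may be the one that actually bites. `balloon_until_refused` and `CHUNK` are hypothetical names, and nothing here reflects darkstat's real allocation pattern.

```python
import os
import resource

CHUNK = 8 * 2**20  # hypothetical size of each allocation request

def balloon_until_refused(limit_bytes):
    """Fork a child capped at limit_bytes of address space, let it grab
    CHUNK-sized buffers until refused, and return the bytes it was granted."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: must always leave via os._exit
        status = 1
        try:
            os.close(r)
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            soft = limit_bytes
            if hard != resource.RLIM_INFINITY:
                soft = min(soft, hard)  # soft may never exceed hard
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
            chunks = []
            try:
                while True:
                    chunks.append(bytearray(CHUNK))  # zero-filled, so pages are touched
            except MemoryError:
                pass  # refused: a real program would now manage what it holds
            n = len(chunks)
            chunks.clear()  # free the buffers before allocating anything else
            os.write(w, str(n * CHUNK).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        data = f.read()
    os.waitpid(pid, 0)
    return int(data) if data else -1

if __name__ == "__main__":
    cap = 256 * 2**20
    got = balloon_until_refused(cap)
    print(f"granted {got} of a {cap}-byte address-space cap")
```

Only the child is capped, so the caller keeps its own limits. If the cap were raised toward the machine's total memory, the same loop would spend a long time touching pages, and pushing other processes out to swap, before it is refused. That matches the unresponsive-but-not-hung picture described above.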

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

