Re: Complete lock-up from using pkgsrc/net/darkstat
It also might be relevant to note which port you're running. It must
be capable of having re and wm interfaces, since you name them, but
that still includes a fair bit.
I still see no statement which port(s) you're running...
Here my brain equated port with ethernet, not with NetBSD port :P These
are all NetBSD/amd64 systems.
Maybe the problem is related to the number of DNS lookups the child
process is doing rather than the number of TB the parent process is
I know DNS can be a big issue when there are a lot of attempts to look up
reverse DNS, which is often broken, so I run darkstat with DNS lookups
disabled, like so:
darkstat_flags="-i re0 -b 127.0.0.1 --no-dns"
Where I am I don't have all the things I normally have, so I'm still
waiting for a null modem adapter so I can get a serial console on the
machine that's physically local. Once that's here, I'm going to try very
hard to get a lockup.
I'm guessing it's a memory issue with darkstat -- specifically, it has a
memory leak that runs the system it runs on out of RAM. I bet if you add
a ton of swap to a system on which you run darkstat, you'll find it runs
longer before it hangs, and, I'm guessing, you'll notice there is a lot
of swap in use before it hangs.
darkstat is run as user "nobody" and shows 23 megabytes after a week. It
has no special unlimiting of any resources. The systems where issues were
seen range from 4 to 16 gigabytes of memory, and I'd have noticed if any
of them were into swap at all (none were).
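For anyone who wants to test the leak theory with numbers rather than by eyeballing it, here is a rough sketch of how the resident and virtual sizes could be logged over time. The function name sample_mem and the log path are made up for illustration, and it assumes a ps that accepts the rss and vsz keywords (NetBSD's does, as does procps on Linux). It is not anything darkstat ships.

```sh
#!/bin/sh
# Sketch only: sample_mem is a hypothetical helper, not part of darkstat.
# Prints "epoch-seconds pid rss-KiB vsz-KiB" for one process ID.
sample_mem() {
    pid=$1
    # "rss=" and "vsz=" ask ps for bare values with no header line.
    set -- $(ps -o rss= -o vsz= -p "$pid" 2>/dev/null)
    # ps prints nothing if the process is gone; report that as failure.
    [ $# -eq 2 ] || return 1
    printf '%s %s %s %s\n' "$(date +%s)" "$pid" "$1" "$2"
}

# Example use (assumes pgrep is available and the process is named darkstat):
#   for pid in $(pgrep darkstat); do sample_mem "$pid"; done >> /var/tmp/darkstat-mem.log
```

Run from cron every few minutes, a log like this would show whether either darkstat process grows steadily before a hang. On NetBSD, swap usage could be recorded alongside it with something like swapctl -l.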
Is it possible the machine is not, strictly, hung, just doing something
that renders it unresponsive for a human-perceptible time? You wrote of
having to get remote hands to poke an unresponsive machine; how long did
that take? Did your remote hands notice whether the disk light was lit
(if there is such a light)?
When a system was in this specific state, I had someone plug in a USB
keyboard and tell me if any new green text appeared on the screen. It did
not (he sent a photo of the screen, too). I then asked him to press (and
not hold) the power button. He did, and he said nothing happened - nothing
on the screen, no disk lighting, et cetera (I told him to look for that).
The systems normally power themselves down relatively nicely from a simple
I was communicating with him in the morning, hours after attempting to
stop darkstat, so it had plenty of time to recover. I had also logged in
to the backup machine and saw that I couldn't reach the internal interface
of the frozen machine.
I've had machines appear to lock up hard when what's actually going on
is that a large process is dumping core.
If it didn't finish in hours, then that would be a problem :)
I'll post more when I've got a serial console set up.