Hi,
It also might be relevant to note which port you're running. It must be capable of having re and wm interfaces, since you name them, but that still includes a fair bit. I still see no statement which port(s) you're running...
Here my brain equated port with ethernet, not with NetBSD port :P These are all NetBSD/amd64 systems.
Maybe the problem is related to the number of DNS lookups the child process is doing rather than the number of TB the parent process is counting?
I know DNS can be a big issue when there're a lot of attempts to look up reverse DNS, which is often broken, so I run darkstat with DNS lookup disabled, like so:
darkstat_flags="-i re0 -b 127.0.0.1 --no-dns"
Where I am I don't have all the things I normally have, so I'm still waiting for a null modem adapter so I can get a serial console on the machine that's physically local. Once that's here, I'm going to try very hard to get a lockup.
I'm guessing it's a memory issue with darkstat -- specifically, it has a memory leak that runs the system it runs on out of RAM. I bet if you add a ton of swap to a system on which you run darkstat, you'll find it runs longer before it hangs, and I'm guessing you'll notice there is a lot of swap in use before it hangs.
darkstat is run as user "nobody" and shows 23 megabytes after a week. It has no special unlimiting of any resources. The systems where issues were seen range from 4 to 16 gigabytes of memory, and I'd have noticed if any of them were into swap at all (none were).
Is it possible the machine is not, strictly, hung, just doing something that renders it unresponsive for a human-perceptible time? You wrote of having to get remote hands to poke an unresponsive machine; how long did that take? Did your remote hands notice whether the disk light was lit (if there is such a light)?
When a system was in this specific state, I had someone plug in a USB keyboard and tell me if any new green text appeared on the screen. It did not (he sent a photo of the screen, too). I then asked him to press (and not hold) the power button. He did, and he said nothing happened - nothing on the screen, no disk lighting, et cetera (I told him to look for that). The systems normally power themselves down relatively nicely from a simple press.
I was communicating with him in the morning, hours after attempting to stop darkstat, so it had plenty of time to recover. I had also logged in to the backup machine and saw that I couldn't reach the internal interface of the frozen machine.
I've had machines appear to lock up hard when what's actually going on is that a large process is dumping core.
If it didn't finish in hours, then that would be a problem :) I'll post more when I've got a serial console set up.
Thanks,
John