Port-macppc archive

Re: "Stalled" system and runaway forking of 'cron' processes on macppc-6.99.28



On Fri, 6 Dec 2013, Donald Lee wrote:

This sounds *very* familiar, and I believe that the bug - and I believe
it is an OS bug - is in the architecture independent code.

I wonder if port-sun3/48432 is related.  The symptoms sound similar.

I can reproduce these stalls, and if you break into the debugger, you will
find many processes stopped on TSTILE. This appears to me to be a locking
problem. An interrupt handler grabbing a lock, perchance?

While I can switch virtual consoles, I cannot enter the debugger on the
machine that exhibits the problem--at least not with the built-in ADB
keyboard.  I might be able to with a USB keyboard, but it will be awkward
to use.

I have been unable to nail it down, partly because it is hard to reproduce.
I have a test case or two, but it is intermittent, and sometimes takes hours to
trigger.

In my case, I would say it's easy to reproduce, just not on demand.  It
will always occur, given enough time since last reboot--always less than
one day, so contrary to my prior assertion, it may show up when the nightly
maintenance scripts fire off.

I have two separate scenarios. The "hard" hang, and the "soft" one.

The "hard hang" always involves local, ATA disk. It looks like what you
describe. Stuff is still running, pings get responses. TCP connects
work, but no new procs can start. If you break into kernel debugger, you
see everybody (a lot of procs) waiting on TSTILE.

The "soft" hang is the same, with a difference. The soft hang
is "soft" because the disk activity being done is over NFS, and is
therefore interruptible.

I have not experienced anything like your "hard" hang as the machine on
which the problem occurs runs disklessly (being a laptop, the local disk
still has MacOS X 10.4.11 on it).

I set up another machine (PowerMac G4 AGP Graphics--aka "sawtooth" IIRC).
It runs off of local disk and does NOT exhibit the problem.  I cannot
operate this machine disklessly as the ethernet interface (gem0) dies
under the load.  Casual NFS/amd/ssh/etc. usage works fine.

Since no new processes (other than 'cron' it seems) may be forked,
no new SSH connections can be created either outbound or inbound.
Multi-threaded applications (like 'links' web browser) can create
new windows and network connections.

In my case, the problem machine is nominally my "bedroom terminal", so
I'm mostly running a pile of terminal windows with SSH sessions to other
hosts along with attempting to debug building 'firefox' on macppc.  Plus
'links' and a few other things.

--
|/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645


