Subject: Re: tty/thread machine starvation/lockups with 4.99.40 (sparc64)
To: Erik Fair <fair@netbsd.org>
From: Rafal Boni <rafal@pobox.com>
List: port-sparc64
Date: 12/12/2007 21:57:04

Erik Fair wrote:
> I/O to ttys has been the proximate cause of UNIX process unhappiness
> since time-immemorial. You can't kill (or swap) a process involved in
> DMA I/O (the subsequently completed I/O would then end up in some other
> process' memory, which would be ... bad), so the kernel doesn't permit
> that.

Well, you're right in many ways, though of course that's why we put data
in the kernel tty buffers, so all I/O goes there rather than directly to
process memory...

> If you want to watch real badness, hit ^S on a UNIX console, and wait a
> while. Depending on how many processes want to spew on the console, and
> how often, the process table will eventually fill up with unkillable
> processes, or ... zombies. This is one reason why syslogd(8) exists.

Sure, I've seen the pile-up of processes all blocked on a single
resource, be it something stuck in disk-wait (which probably *is* pinned
for the reasons you state above) or the flow-control-induced console
backup.  But this smells different...

[...]
> Just out of curiosity, if you put asterisk on the back side of a pty
> (e.g. with script(1), screen(1), ssh(1), etc), does the hang still
> happen? Or is asterisk directly opening /dev/console itself?

Yes, in fact it *is* on the back side of an SSH pty, and the other
processes that it blocks are *not* sharing that same tty/pty (e.g.,
login on the real console device).  Interrupt delivery also doesn't look
to be the culprit since the interrupt to kick me into DDB arrives fine.
I have not tried just quitting DDB to see if that unwedges anything,
but I suspect not.

I guess I should have been more specific, but when I wrote 'tty lockups'
I really meant 'tty subsystem lockups', because the tty subsystem looks
like the culprit (possibly when there's threading involved as well).

--rafal