tech-userlevel archive

Re: hangup in close(2) after posix_openpt(3)

On Sat, Oct 17, 2015 at 02:34:10AM +0900, Izumi Tsutsui wrote:
> For application side, it looks the behavior (how unread data should be
> handled on close(2)) is something like "implementation-defined."
> I'll try to put some workaround into the application.
> ("tcflush(s, TCIOFLUSH)" might work around)

That may or may not help.  There are some races in our pty code that can
cause the flush to block forever.  I found this out when I was doing the
libcurses test frame, and I think the same problem was experienced
with anita.  I spent a lot of time digging at this, and my conclusion
was as follows.

The master and slave ends of the pty effectively communicate using
"in-band" messages: control messages are passed from the master to the
slave to tell it to flush, change tty settings and the like.  The slave
application does not see these messages; they are processed inside the
slave-end pty read in the kernel and acted upon there.  All this works
fine when the data is flowing, and nobody is the wiser that there are
extra bits in the data stream.  Things come unstuck a bit if the master
and slave do things in the "wrong" order.  With the libcurses test
frame I found that if the slave process performed a read(2) too soon
then things would hang up.  What was happening was that the slave was
going into read, and the master was then running a curses library
function that would result in a "flush your buffers and change your
terminal settings" action.  Inside the pty code this would queue an
internal message to the slave end, carrying the new settings to be
applied after a flush, but since the slave end was blocked in a read
the flush never happened and everything just hung up waiting.  I am
guessing that, naively, we could do a wakeup on the slave end to process
the message despite the read being blocked due to no data being
available to return to the caller - just process the message and return
to sleep.
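
For anyone following along, here is a minimal sketch of the user-space
side being discussed: opening a master/slave pair with posix_openpt(3)
and attempting the tcflush(3) workaround quoted above before close(2).
The helper names open_pty_pair() and flush_and_close() are made up for
illustration and are not taken from the test frame or from anita.  As
noted, the flush may or may not help, and may itself block.

```c
#define _XOPEN_SOURCE 600   /* for posix_openpt, grantpt, unlockpt, ptsname */
#include <fcntl.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a pty master and the matching slave.
 * Returns 0 on success and -1 on failure.
 */
static int open_pty_pair(int *master, int *slave)
{
    int m = posix_openpt(O_RDWR | O_NOCTTY);
    if (m == -1)
        return -1;

    /* The slave must be granted and unlocked before it can be opened. */
    if (grantpt(m) == -1 || unlockpt(m) == -1) {
        close(m);
        return -1;
    }

    const char *name = ptsname(m);
    if (name == NULL) {
        close(m);
        return -1;
    }

    int s = open(name, O_RDWR | O_NOCTTY);
    if (s == -1) {
        close(m);
        return -1;
    }

    *master = m;
    *slave = s;
    return 0;
}

/*
 * Hypothetical helper: discard unread data in both directions, then
 * close.  The tcflush result is ignored on purpose; per the discussion
 * above, this call is only a possible workaround.
 */
static int flush_and_close(int fd)
{
    (void)tcflush(fd, TCIOFLUSH);
    return close(fd);
}
```

An application would call open_pty_pair() once, hand the slave to the
child process, and use flush_and_close() on its own descriptor when
done.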

I worked around the problem by putting a very small delay in the curses
test frame pty code to allow enough time for the slave to handle things.
Also note that whether or not you see this race really depends on your
hardware: when I had the problem I would only infrequently see the issue
on my laptop, but others were able to reliably reproduce the hang.
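
The delay workaround amounts to something like the sketch below.  This
is not the actual test frame code; settle_delay_ms() is a hypothetical
helper, and the right delay value is an assumption that depends on the
hardware.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep */
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for `ms` milliseconds so the slave end
 * has time to handle pending work.  Restarts the sleep if a signal
 * interrupts it.  Returns 0 on success, -1 on error.
 */
static int settle_delay_ms(long ms)
{
    struct timespec ts;

    if (ms < 0) {
        errno = EINVAL;
        return -1;
    }

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;

    /* On EINTR, nanosleep stores the remaining time back into ts. */
    while (nanosleep(&ts, &ts) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

The master side would call, say, settle_delay_ms(10) before anything
that triggers a flush or a settings change.  A delay only narrows the
race window; it does not close it.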

Brett Lymn
Let go, or be dragged - Zen proverb.