Subject: pkg/34442: lang/tcl-expect deadlock on solaris/linux
To: None <firstname.lastname@example.org, email@example.com,>
From: None <firstname.lastname@example.org>
Date: 08/31/2006 23:35:01
>Synopsis: lang/tcl-expect deadlock on solaris/linux
>Arrival-Date: Thu Aug 31 23:35:00 +0000 2006
>Originator: john heasley
>Release: NetBSD 3.99.17
System: NetBSD guelah 3.99.17 NetBSD 3.99.17 (guelah) #1: Tue Apr 18 01:51:21 UTC 2006 root@oak:/sys/arch/sparc64/compile/guelah sparc64
tcl-expect on Linux/Solaris can deadlock. What happens is that data arrives
on a "channel" (e.g., a pty), your script consumes a portion of that data,
then the expect (the input matching loop) is continued on the existing
buffer, which still has data. No new data is available from the channel, but
read(2) is called and we block instead of offering the contents of the
buffer to the matching loop.
If the process expect is communicating with happens to have reached a
prompting point, it will be waiting for input from expect and we deadlock,
because expect is also waiting for input from the process.
This does not occur on *BSD.
The problem is very timing sensitive. It is most prevalent when the network
is fast and the machine running RANCID is slow/loaded.
To work around this, I just added a poll() that checks the descriptor for
readability before read(2) is called. This hack has fixed every complaint
I've received from users of RANCID on Linux/Solaris.
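
The idea behind the poll() hack can be sketched roughly as below. This is
only an illustration and not the actual change made to exp_chan.c; the
helper name read_if_ready() and the -2 "nothing pending" return value are
made up for the example.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the expect sources): read from fd only
 * if poll(2) says it is readable right now.
 * Returns the number of bytes read, 0 on EOF, -1 on error, or -2 when
 * no data is pending, so the caller can fall back to matching against
 * what it already has buffered instead of blocking in read(2).
 */
ssize_t
read_if_ready(int fd, void *buf, size_t len)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        n = poll(&pfd, 1, 0);   /* timeout 0: return immediately */
    } while (n == -1 && errno == EINTR);

    if (n == -1)
        return -1;              /* poll itself failed */
    if (n == 0)
        return -2;              /* nothing to read; do not block */

    return read(fd, buf, len);  /* data, EOF or error is pending */
}
```

A caller that gets -2 back would go on matching the data already in its
buffer rather than waiting on the channel.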
The comments in exp_chan.c suggest that the file descriptor should be
non-blocking. My original hack was to just set it to be non-blocking. That
works for Linux, but caused problems with one of Solaris' streams that would
bugger the user's terminal when expect exited.
Because the poll hack works and the Tcl code is ugly, I've not expended any
effort to find the actual bug, which I believe is somewhere in Tcl.
I'll leave it to the tcl-expect maintainer to determine if this is appropriate
for pkgsrc. I submitted a bug report to the authors of expect about 4 years
ago, but have not heard from them.