Subject: Re: telnet loop while trying to flush revoked tty FD
To: Bill Studenmund <wrstuden@netbsd.org>
From: john heasley <heas@shrubbery.net>
List: tech-net
Date: 06/15/2003 20:48:18
Tue, Jun 03, 2003 at 10:48:44PM +0000, john heasley:
> Mon, Jun 02, 2003 at 02:32:08PM -0700, Bill Studenmund:
> > On Sat, 31 May 2003, john heasley wrote:
> > 
> > > I ran this patch by Matthias and Martin.  They did not notice anything
> > > wrong with the patch described below, but were concerned that it might
> > > be masking a bug that is the root cause and since I am not certain how
> > > telnet reached the described state, they thought it best to solicit
> > > further comment here.
> > 
> > Well, how about we figure out more what's going on then? :-)
> 
> :) I will try to reproduce it.  probably be a week before i can get
> back to this.  Thanks for the reply!
> 
> > > so, here goes....
> > >
> > > I am not positive how telnet gets into this situation; undoubtedly it
> > > is expect or me doing something dumb with expect.  However, it appears
> > > that while telnet is trying to exit, it can get stuck in it's final
> > > attempt to flush it's tty when the tty is already dead.
> > >
> > > In the instance that I caught, lsof indicated that the filedescriptor
> > > had been revoked, like so:
> > >
> > > telnet  22519 heas    0u  VBAD                            (revoked)
> > > telnet  22519 heas    1u  VBAD                            (revoked)
> > > telnet  22519 heas    2u  VBAD                            (revoked)
> > >
> > > causing it to loop in SetForExit->EmptyTerminal.
> > >
> > > Besides updating this comment that i stumbled upon, I changed changed
> > > EmptyTerminal to return when ttyflush returns "permanent failure"
> > 
> > What exactly is EmptyTerminal trying to do? What is the loop you're
> > escaping out of waiting for?
> 
> I am unsure.  could be anything, but I'd guess that it was something
> along the lines of "Connection closed by remote".  that is, the parent
> process (expect) went away in some fashion and at some later time the
> remote host closed the session due to it's inactivity timer.

took me far too long to follow this...telnet code sure is twisted.

I came across
	 bin/20304: Telnet fails to properly handle SIGPIPE on its terminal.
which mentions Fbsd PR 45995.  the fix provided is incomplete.  if instead of
	telnet localhost 25 | grep -q .
you use
	telnet localhost 25 |& grep -q .
telnet ends up in a loop servicing SIGPIPE while trying to display an
exit message.

then, to recreate the issue I bumped into where the tty file descriptors
has been revoked, I used an expect script (free pty allocation :) to
connect to chargen and a program to revoke the pty.

The following fixes both of these issues by
1) artificially consuming any data in the ring if a write error occurs,
   so that functions that check for/try to flush waiting data before
   exiting think it is empty
2) disabling SIGPIPE handling once called the first time. there is no
   need to handle it a second time, thus producing the loop [setjmp()
   .... fwrite() in tn()
and
3) check for errors from polling the tty in process_rings, so that it
   return < 0 on error (like the comment say it should).

I believe this covers the bases.  everything i have tried works as expected.

Index: sys_bsd.c
===================================================================
RCS file: /cvsroot/src/usr.bin/telnet/sys_bsd.c,v
retrieving revision 1.23
diff -u -d -u -r1.23 sys_bsd.c
--- sys_bsd.c	2003/03/15 04:48:22	1.23
+++ sys_bsd.c	2003/06/16 03:32:12
@@ -420,7 +420,8 @@
      * Write any outstanding data before switching modes
      * ttyflush() returns 0 only when there is no more data
      * left to write out, it returns -1 if it couldn't do
-     * anything at all, otherwise it returns 1 + the number
+     * anything at all, it returns -2 if writing returns a
+     * permanent failure, otherwise it returns 1 + the number
      * of characters left to write.
 #ifndef	USE_TERMIO
      * We would really like to ask the kernel to wait for the output
@@ -868,6 +869,7 @@
 deadpeer(sig)
     int sig;
 {
+	signal(SIGPIPE, SIG_DFL);
 	setcommandmode();
 	longjmp(peerdied, -1);
 }
@@ -1188,6 +1190,10 @@
     if (set[0].revents & POLLOUT) {
 	returnValue |= netflush();
     }
+
+    if (set[1].revents & POLLHUP || set[2].revents & POLLHUP)
+	return(-1);
+
     if (set[1].revents & POLLOUT) {
 	returnValue |= (ttyflush(SYNCHing|flushout) > 0);
     }
Index: terminal.c
===================================================================
RCS file: /cvsroot/src/usr.bin/telnet/terminal.c,v
retrieving revision 1.9
diff -u -d -u -r1.9 terminal.c
--- terminal.c	2003/03/15 04:48:22	1.9
+++ terminal.c	2003/06/16 03:32:12
@@ -158,10 +158,12 @@
 	ring_consumed(&ttyoring, n);
     }
     if (n < 0) {
-	if (errno == EAGAIN)
+	if (errno == EAGAIN) {
 	    return -1;
-	else
+	} else {
+	    ring_consumed(&ttyoring, ring_full_count(&ttyoring));
 	    return -2;
+	}
     }
     if (n == n0) {
 	if (n0)