Subject: bin/18984: telnet spins on dead tty
To: None <gnats-bugs@gnats.netbsd.org>
From: john heasley <heas@shrubbery.net>
List: netbsd-bugs
Date: 11/09/2002 01:52:40
>Number:         18984
>Category:       bin
>Synopsis:       telnet spins on dead tty
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 08 17:53:09 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     john heasley
>Release:        NetBSD 1.6H
>Organization:
	
>Environment:
	
	
System: NetBSD guelah 1.6H NetBSD 1.6H (guelahddb) #9: Mon Sep 16 01:23:36 UTC 2002 root@guelah:/u5/src/sys/arch/sparc/compile/guelahddb sparc
Architecture: sparc
Machine: sparc
>Description:
i have a set of scripts that use expect to login to devices via telnet 
(or ssh).  if the scripts are not configured properly, it is possible
that the expect script deadlocks waiting for the output it expects
once a login (username and password) is successful.

a timeout ensues and the expect script closes the tty (pty facing telnet)
and waits for the child to exit.  and sometime after this, the device
closes the connection due to inactivity.

here's the telnet issue; telnet takes a SIGPIPE and spins out of control
trying to flush the file descriptor facing expect, terminal.c:ttyflush().

#0  ttyflush (drop=0) at /home/src/usr.bin/telnet/terminal.c:159
#1  0x804ed93 in TerminalNewMode (f=-1)
    at /home/src/usr.bin/telnet/sys_bsd.c:445
#2  0x8053cc8 in setcommandmode () at /home/src/usr.bin/telnet/terminal.c:249
#3  0x804f143 in deadpeer (sig=13) at /home/src/usr.bin/telnet/sys_bsd.c:869
#4  0x480fc1f0 in __sigtramp_sigcontext_1 () from /usr/lib/libc.so.12
#5  0x804e40a in netflush () at /home/src/usr.bin/telnet/network.c:145
  
so, the telnet process:
  
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
telnet  28617 heas    0u  VBAD                         (revoked)
telnet  28617 heas    1u  VBAD                         (revoked)
telnet  28617 heas    2u  VBAD                         (revoked)
telnet  28617 heas    3u  IPv4             0t0     TCP no PCB, CANTSENDMORE, CANTRCVMORE
telnet  28617 heas    4w  VREG    0,0      331 1022641 / (/dev/wd0a)

the write(fd=1) within TerminalWrite() returns -1 (errno = 5, EIO).  but,
ttyflush() only communicates the following in its return value:

 *              Return value:
 *                      -1: No useful work done, data waiting to go out.
 *                       0: No data was waiting, so nothing was done.
 *                       1: All waiting data was written out.
 *                       n: All data - n was written out.
  
since ttyflush() isn't designed to return a "permanent failure" result,
the caller just calls it again, forever.
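
to illustrate the spin outside of telnet, here is a minimal sketch.  it is
not the telnet source: flush_old(), retry_flush() and dead_fd() are
hypothetical names, a pipe with no reader stands in for the revoked tty
(so the error is EPIPE rather than EIO), and the retry loop is capped
only so the demonstration terminates.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the old ttyflush() convention: every
 * write(2) failure is reported as -1 ("data waiting to go out").
 */
static int
flush_old(int fd, const char *buf, size_t len)
{
	ssize_t n;

	if (len == 0)
		return 0;		/* nothing was waiting */
	n = write(fd, buf, len);
	if (n < 0)
		return -1;		/* EAGAIN and EIO/EPIPE look identical */
	return (int)(len - (size_t)n) + 1;	/* 1 == everything written */
}

/*
 * Retry with the same loop condition as the caller in the backtrace,
 * but give up after max_tries.  Returns the number of attempts made.
 */
static int
retry_flush(int fd, int max_tries)
{
	int old, tries = 0;

	do {
		old = flush_old(fd, "x", 1);
		tries++;
	} while ((old < 0 || old > 1) && tries < max_tries);
	return tries;
}

/* Create a descriptor on which every write fails permanently. */
static int
dead_fd(void)
{
	int p[2];

	signal(SIGPIPE, SIG_IGN);	/* get EPIPE instead of a signal */
	if (pipe(p) < 0)
		return -1;
	close(p[0]);			/* no reader left */
	return p[1];
}
```

calling retry_flush(dead_fd(), 1000) uses up all 1000 attempts, since
flush_old() never reports anything the loop can treat as final.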

>How-To-Repeat:
use an expect script to telnet to something (like a router or smtp), close
the pty facing telnet and wait on it.

in my particular case, i'm using rancid's (www.shrubbery.net/rancid) clogin
script to telnet to a cisco.  since i had misconfigured it, it was expecting
a prompt ending in '#', not '>'.  this is where the expect deadlock occurs
followed by a timeout (expect_after()).
>Fix:
teach ttyflush() about permanent errors from write(2), presumably anything
other than EAGAIN.

the following (src/usr.bin/telnet) works for me reliably + a few aesthetic
fixes.

Index: sys_bsd.c
===================================================================
RCS file: /cvsroot/basesrc/usr.bin/telnet/sys_bsd.c,v
retrieving revision 1.22
diff -d -u -r1.22 sys_bsd.c
--- sys_bsd.c	2002/09/23 12:48:04	1.22
+++ sys_bsd.c	2002/11/09 01:39:41
@@ -423,7 +423,7 @@
      * anything at all, otherwise it returns 1 + the number
      * of characters left to write.
 #ifndef	USE_TERMIO
-     * We would really like ask the kernel to wait for the output
+     * We would really like to ask the kernel to wait for the output
      * to drain, like we can do with the TCSADRAIN, but we don't have
      * that option.  The only ioctl that waits for the output to
      * drain, TIOCSETP, also flushes the input queue, which is NOT
@@ -443,6 +443,8 @@
 	    tcsetattr(tin, TCSADRAIN, &tmp_tc);
 #endif	/* USE_TERMIO */
 	    old = ttyflush(SYNCHing|flushout);
+	    if (old == -2)
+		return;
 	} while (old < 0 || old > 1);
     }
 
@@ -980,7 +982,8 @@
  *	The parameter specifies whether this is a poll operation,
  *	or a block-until-something-happens operation.
  *
- *	The return value is 1 if something happened, 0 if not.
+ *	The return value is 1 if something happened, 0 if not, < 0 if an
+ *	error occurred.
  */
 
     int
@@ -1024,7 +1027,7 @@
 		return 0;
 #	    endif /* defined(TN3270) */
 		    /* I don't like this, does it ever happen? */
-	    printf("sleep(5) from telnet, after select\r\n");
+	    printf("sleep(5) from telnet, after poll\r\n");
 	    sleep(5);
 	}
 	return 0;
Index: terminal.c
===================================================================
RCS file: /cvsroot/basesrc/usr.bin/telnet/terminal.c,v
retrieving revision 1.8
diff -d -u -r1.8 terminal.c
--- terminal.c	2002/06/14 00:30:57	1.8
+++ terminal.c	2002/11/09 01:39:41
@@ -116,6 +116,7 @@
  *		Send as much data as possible to the terminal.
  *
  *		Return value:
+ *			-2: Permanent error writing to FD.
  *			-1: No useful work done, data waiting to go out.
  *			 0: No data was waiting, so nothing was done.
  *			 1: All waiting data was written out.
@@ -156,8 +157,12 @@
 	}
 	ring_consumed(&ttyoring, n);
     }
-    if (n < 0)
-	return -1;
+    if (n < 0) {
+	if (errno == EAGAIN)
+	    return -1;
+	else
+	    return -2;
+    }
     if (n == n0) {
 	if (n0)
 	    return -1;

>Release-Note:
>Audit-Trail:
>Unformatted: