Subject: kern/36674: tail -f exiting unexpectedly (tty code suspect)
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <he@NetBSD.org>
List: netbsd-bugs
Date: 07/21/2007 13:10:01
>Number:         36674
>Category:       kern
>Synopsis:       tail -f exiting unexpectedly (tty code suspect)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 21 13:10:00 +0000 2007
>Originator:     Havard Eidnes
>Release:        NetBSD 4.0_BETA2
>Organization:
	I try...
>Environment:
System: NetBSD bean.urc.uninett.no 4.0_BETA2 NetBSD 4.0_BETA2 (GENERIC) #7: Sat Jul 21 00:03:37 CEST 2007 he@bean.urc.uninett.no:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
	While building a largish package and redirecting the output
	to a file which I could then do "tail -f" on, I noticed that
	sometimes the "tail" process would simply exit with status 1.

	This is while running inside a remote xterm, started via ssh.

	After having this happen a number of times, I started the tail
	under control of ktrace, and this is the end of the trace:

  9939      1 tail     GIO   fd 1 wrote 61 bytes
       "-------------------------------------------------------------"
  9939      1 tail     RET   write 61/0x3d
  9939      1 tail     CALL  write(1,0x805103d,0xc)
  9939      1 tail     RET   write -1 errno 35 Resource temporarily unavailable
  9939      1 tail     CALL  issetugid
  9939      1 tail     RET   issetugid 0
  9939      1 tail     CALL  issetugid
  9939      1 tail     RET   issetugid 0
  9939      1 tail     CALL  break(0x8062000)
  9939      1 tail     RET   break 0
  9939      1 tail     CALL  open(0xbfbfd9d4,0,0xbfbfd944)
  9939      1 tail     NAMI  "/usr/share/nls/nls.alias.db"
  9939      1 tail     RET   open -1 errno 2 No such file or directory
  9939      1 tail     CALL  open(0xbbbd06ef,0,0xbfbfd9a8)
  9939      1 tail     NAMI  "/usr/share/nls/nls.alias"
  9939      1 tail     RET   open 5
  9939      1 tail     CALL  fcntl(5,2,1)
  9939      1 tail     RET   fcntl 0
  9939      1 tail     CALL  __fstat30(5,0xbfbfd938)
  9939      1 tail     RET   __fstat30 0
  9939      1 tail     CALL  mmap(0,0x5f0,1,2,5,0,0,0)
  9939      1 tail     RET   mmap -1146060800/0xbbb08000
  9939      1 tail     CALL  close(5)
  9939      1 tail     RET   close 0
  9939      1 tail     CALL  break(0x8063000)
  9939      1 tail     RET   break 0
  9939      1 tail     CALL  munmap(0xbbb08000,0x5f0)
  9939      1 tail     RET   munmap 0
  9939      1 tail     CALL  open(0xbfbfde5b,0,0xbfbfddc8)
  9939      1 tail     NAMI  "/usr/share/nls/C/libc.cat"
  9939      1 tail     RET   open 5
  9939      1 tail     CALL  __fstat30(5,0xbfbfddc8)
  9939      1 tail     RET   __fstat30 0
  9939      1 tail     CALL  mmap(0,0x10be,1,1,5,0,0,0)
  9939      1 tail     RET   mmap -1146064896/0xbbb07000
  9939      1 tail     CALL  close(5)
  9939      1 tail     RET   close 0
  9939      1 tail     CALL  munmap(0xbbb07000,0x10be)
  9939      1 tail     RET   munmap 0
  9939      1 tail     CALL  write(2,0x804aa76,6)
  9939      1 tail     RET   write -1 errno 35 Resource temporarily unavailable
  9939      1 tail     CALL  write(2,0x804b293,1)
  9939      1 tail     RET   write -1 errno 35 Resource temporarily unavailable
  9939      1 tail     CALL  exit(1)

	So the write to stdout fails with EAGAIN, and so do the
	writes to stderr when tail tries to report the error.
	
	Just to make sure that the tty is *not* set to non-blocking
	mode, I wrote this simple program:

#include <fcntl.h>
#include <stdio.h>

int
main(void)
{
        /* Fetch the file status flags for stdout (fd 1). */
        int fl = fcntl(1, F_GETFL, 0);

        if (fl == -1) {
                perror("fcntl");
                return 1;
        }

        /* Report each flag of interest that is currently set. */
        if (fl & O_NONBLOCK)
                printf("stdout is O_NONBLOCK\n");
        if (fl & O_APPEND)
                printf("stdout is O_APPEND\n");
        if (fl & O_ASYNC)
                printf("stdout is O_ASYNC\n");

        return 0;
}

	and of course it outputs nothing on the tty where this problem
	was experienced.
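
	Note that a check like this samples the flags only once, at
	the moment the program runs.  File status flags belong to the
	open file rather than to the process, so any other process
	that inherited the same descriptor can change them later.
	Nothing in the trace shows that this happened here; the
	following is only a sketch of a helper that a program could
	call right after a write fails with EAGAIN, to see the flag
	state at that instant.  The name fd_is_nonblocking is
	hypothetical and is not part of tail.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: return 1 if O_NONBLOCK is set on fd,
 * 0 if it is clear, or -1 if fcntl() fails (errno is set).
 */
int
fd_is_nonblocking(int fd)
{
        int fl = fcntl(fd, F_GETFL, 0);

        if (fl == -1)
                return -1;
        return (fl & O_NONBLOCK) != 0;
}
```

	A return value of 0 immediately after an EAGAIN would point
	at the tty code rather than at the descriptor flags.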

	I would have thought that the tty code should under such
	circumstances *never* return EAGAIN, but should rather put
	the "tail" process to sleep, waiting for the output to drain.
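
	For illustration, the behaviour expected from the kernel can
	be approximated in user space.  This is a minimal sketch of a
	workaround, not a fix and not what tail does: on EAGAIN it
	waits in poll() until the descriptor is writable and then
	retries.  The name write_all is hypothetical, and whether
	poll() reports writability correctly on the affected tty has
	not been tested.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Write all of buf to fd.  If write() fails with EAGAIN, wait in
 * poll() until fd is writable and retry instead of giving up.
 * Returns 0 on success, -1 on any other error (errno is set).
 */
int
write_all(int fd, const char *buf, size_t len)
{
        while (len > 0) {
                ssize_t n = write(fd, buf, len);

                if (n >= 0) {
                        /* Partial or full write: advance and go on. */
                        buf += n;
                        len -= (size_t)n;
                        continue;
                }
                if (errno == EINTR)
                        continue;
                if (errno != EAGAIN)
                        return -1;

                /* Output is congested: sleep until fd drains. */
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                        return -1;
        }
        return 0;
}
```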

	BTW, this problem appears to affect more than just tail; I saw
	the same thing with "less" on this tty, where it would fail to
	redraw the whole screen, and repeated ^L ("refresh") would
	fail in the same way each time.

>How-To-Repeat:
	Sorry, this appears to be a somewhat intermittent problem, but
	see above for the conditions under which it was experienced.

	This problem may be related to PR#31178.

>Fix:
	Sorry, don't know.