Subject: kern/36674: tail -f exiting unexpectedly (tty code suspect)
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <he@NetBSD.org>
List: netbsd-bugs
Date: 07/21/2007 13:10:01
>Number: 36674
>Category: kern
>Synopsis: tail -f exiting unexpectedly (tty code suspect)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jul 21 13:10:00 +0000 2007
>Originator: Havard Eidnes
>Release: NetBSD 4.0_BETA2
>Organization:
I try...
>Environment:
System: NetBSD bean.urc.uninett.no 4.0_BETA2 NetBSD 4.0_BETA2 (GENERIC) #7: Sat Jul 21 00:03:37 CEST 2007 he@bean.urc.uninett.no:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
While building a largish package and redirecting the output
to a file which I could then do "tail -f" on, I noticed that
sometimes the "tail" process would simply exit with status 1.
This is while running inside a remote xterm, started via ssh.
After having this happen a number of times, I started the tail
under control of ktrace, and this is the end of the trace:
9939 1 tail GIO fd 1 wrote 61 bytes
"-------------------------------------------------------------"
9939 1 tail RET write 61/0x3d
9939 1 tail CALL write(1,0x805103d,0xc)
9939 1 tail RET write -1 errno 35 Resource temporarily unavailable
9939 1 tail CALL issetugid
9939 1 tail RET issetugid 0
9939 1 tail CALL issetugid
9939 1 tail RET issetugid 0
9939 1 tail CALL break(0x8062000)
9939 1 tail RET break 0
9939 1 tail CALL open(0xbfbfd9d4,0,0xbfbfd944)
9939 1 tail NAMI "/usr/share/nls/nls.alias.db"
9939 1 tail RET open -1 errno 2 No such file or directory
9939 1 tail CALL open(0xbbbd06ef,0,0xbfbfd9a8)
9939 1 tail NAMI "/usr/share/nls/nls.alias"
9939 1 tail RET open 5
9939 1 tail CALL fcntl(5,2,1)
9939 1 tail RET fcntl 0
9939 1 tail CALL __fstat30(5,0xbfbfd938)
9939 1 tail RET __fstat30 0
9939 1 tail CALL mmap(0,0x5f0,1,2,5,0,0,0)
9939 1 tail RET mmap -1146060800/0xbbb08000
9939 1 tail CALL close(5)
9939 1 tail RET close 0
9939 1 tail CALL break(0x8063000)
9939 1 tail RET break 0
9939 1 tail CALL munmap(0xbbb08000,0x5f0)
9939 1 tail RET munmap 0
9939 1 tail CALL open(0xbfbfde5b,0,0xbfbfddc8)
9939 1 tail NAMI "/usr/share/nls/C/libc.cat"
9939 1 tail RET open 5
9939 1 tail CALL __fstat30(5,0xbfbfddc8)
9939 1 tail RET __fstat30 0
9939 1 tail CALL mmap(0,0x10be,1,1,5,0,0,0)
9939 1 tail RET mmap -1146064896/0xbbb07000
9939 1 tail CALL close(5)
9939 1 tail RET close 0
9939 1 tail CALL munmap(0xbbb07000,0x10be)
9939 1 tail RET munmap 0
9939 1 tail CALL write(2,0x804aa76,6)
9939 1 tail RET write -1 errno 35 Resource temporarily unavailable
9939 1 tail CALL write(2,0x804b293,1)
9939 1 tail RET write -1 errno 35 Resource temporarily unavailable
9939 1 tail CALL exit(1)
So, writes to stdout fail with EAGAIN (errno 35), and so do
the writes to stderr when tail tries to report the error.
Just to make sure that the tty is *not* set to non-blocking
mode, I wrote this simple program:
#include <fcntl.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
	/* Fetch the file status flags of stdout (fd 1). */
	int fl = fcntl(1, F_GETFL, 0);

	if (fl == -1) {
		perror("fcntl");
		return 1;
	}
	/* Report each flag of interest that is set. */
	if (fl & O_NONBLOCK)
		printf("stdout is O_NONBLOCK\n");
	if (fl & O_APPEND)
		printf("stdout is O_APPEND\n");
	if (fl & O_ASYNC)
		printf("stdout is O_ASYNC\n");
	return 0;
}
and of course it outputs nothing on the tty where this problem
was experienced.
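
The check above runs as a separate process after the failure,
so it cannot show what the flags were at the moment write()
returned EAGAIN. A minimal sketch of a probe that samples the
flags at the point of failure follows. The names
fd_is_nonblocking() and write_probe() are hypothetical and not
part of tail or of any library; this only illustrates how the
information could be gathered, it is not a fix.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: 1 if O_NONBLOCK is set on fd, 0 if it
 * is not, -1 if fcntl() itself fails.
 */
int
fd_is_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL, 0);

	if (fl == -1)
		return -1;
	return (fl & O_NONBLOCK) != 0;
}

/*
 * Hypothetical wrapper around write(): when the write fails
 * with EAGAIN, record in *nb_at_failure whether O_NONBLOCK was
 * set at that moment. errno is preserved for the caller.
 */
ssize_t
write_probe(int fd, const void *buf, size_t len, int *nb_at_failure)
{
	ssize_t n = write(fd, buf, len);

	if (n == -1 && errno == EAGAIN) {
		int saved = errno;

		*nb_at_failure = fd_is_nonblocking(fd);
		errno = saved;
	}
	return n;
}
```

If *nb_at_failure came back as 0 on the affected tty, that
would support the suspicion that EAGAIN is being returned for
a descriptor in blocking mode.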
I would have thought that the tty code should under such
circumstances *never* return EAGAIN, but should rather put
the "tail" process to sleep, waiting for the output to drain.
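
For illustration only, the blocking behaviour described above
can be approximated in userland: on EAGAIN, wait with poll()
until the descriptor accepts output, then retry. This is a
sketch of a workaround, not a fix for the tty code, and
write_all() is a hypothetical name. Whether poll() reports the
tty as writable correctly in this failure mode is not known
from this report.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes of buf to fd.
 * On EAGAIN, sleep in poll() until fd is writable and retry.
 * Returns 0 on success, -1 on any other error (errno set).
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(fd >= 0 && buf != NULL);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n >= 0) {
			/* Partial or full write: advance and go on. */
			p += n;
			len -= (size_t)n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return -1;

		/* Wait, with no timeout, for fd to accept output. */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };

		if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
			return -1;
	}
	return 0;
}
```

A caller would use write_all(1, buf, len) in place of a bare
write(1, buf, len) and treat only a -1 return as fatal.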
BTW, this problem does not appear to affect only tail; I saw
the same problem with "less" on this tty, where it would fail
to redraw the whole screen, and each repeated ^L for "refresh"
would again fail to redraw the whole screen.
>How-To-Repeat:
Sorry, this appears to be a somewhat sporadic problem, but see
above for the conditions where this was experienced.
This problem may be related to PR#31178.
>Fix:
Sorry, don't know.