NetBSD-Bugs archive
kern/46464: lib/librumphijack/t_tcpip:ssh test case randomly fails
>Number: 46464
>Category: kern
>Synopsis: lib/librumphijack/t_tcpip:ssh test case randomly fails
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 18 15:40:00 +0000 2012
>Originator: Andreas Gustafsson
>Release: NetBSD-current as of 2012.05.16.19.12.59
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
The "ssh" test case of the lib/librumphijack/t_tcpip test has been
randomly failing for a long time, perhaps for its entire existence (it
was committed on 2011.02.14.15.14.00, and the first recorded failure
on my test system was ten days later, at 2011.02.24.18.33.06).
The output from a recent failure can be seen at:
http://releng.netbsd.org/b5reports/i386/build/2012.05.16.11.45.08/test.html#lib_librumphijack_t_tcpip_ssh
The above log shows the test failing with the error message:
Timeout, server 127.0.0.1 not responding.
Interestingly, this message is only printed in case of a keep-alive
timeout, and keep-alive timeouts can only happen if ssh is configured
with a non-zero ServerAliveInterval. In this test, ssh is using a
configuration with the default ServerAliveInterval of zero, so there
should be no way this can happen.
To debug this, I added some printfs to print the arguments and return value of
the select() system call in src/crypto/external/bsd/openssh/dist/clientloop.c,
and found that when the error occurs, select() is returning zero (indicating
a timeout) even though its "timeout" argument is a NULL pointer. That's
not supposed to happen, is it? So to me this looks like a bug in select(),
or maybe just in its rump implementation.
My patch adding the printfs is at
http://www.gson.org/netbsd/bugs/atf-ssh-test/ssh-select-debug.patch
and here is an excerpt from a /tmp/select.log file written by the patched
ssh, showing select() returning 0 even though the timeout argument is
NULL:
select nfds=129 tvp=0x0
read 5
read 128
select ret = 0
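The expectation behind the analysis above can be sketched in C: with a
NULL timeout pointer, select() should block until a descriptor is ready
(or fail with -1), so a return value of zero is not expected. This is
only an illustrative sketch, not the code from the debug patch; the
helper name wait_readable_forever() and its log wording are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from OpenSSH or the debug patch):
 * wait until fd is readable, passing a NULL timeout to select(),
 * and log the outcome.  Returns select()'s return value.
 */
static int
wait_readable_forever(int fd, FILE *log)
{
	fd_set readset;
	int ret;

	/* fd_set can only hold descriptors below FD_SETSIZE. */
	assert(fd >= 0 && fd < FD_SETSIZE);

	do {
		/* Rebuild the set each time; select() modifies it. */
		FD_ZERO(&readset);
		FD_SET(fd, &readset);
		fprintf(log, "select nfds=%d tvp=NULL\n", fd + 1);
		ret = select(fd + 1, &readset, NULL, NULL, NULL);
	} while (ret == -1 && errno == EINTR);	/* retry if interrupted */

	fprintf(log, "select ret = %d\n", ret);
	if (ret == 0) {
		/* The case reported here: a "timeout" with no timeout set. */
		fprintf(log, "unexpected: select returned 0 with NULL timeout\n");
	}
	return ret;
}
```

Calling this on the read end of a pipe that already has data written
to it should return 1 on a correctly behaving select(). Whether the
same check reproduces the problem under rump hijacking is untested.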
>How-To-Repeat:
cd /usr/tests/lib/librumphijack/
while atf-run t_tcpip; do true; done
The loop may stop on a failure in a test case other than the ssh one;
if so, run it again.
>Fix: