NetBSD-Bugs archive


kern/46464: lib/librumphijack/t_tcpip:ssh test case randomly fails

>Number:         46464
>Category:       kern
>Synopsis:       lib/librumphijack/t_tcpip:ssh test case randomly fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 18 15:40:00 +0000 2012
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current as of 2012.
System: NetBSD
Architecture: i386
Machine: i386

The "ssh" test case of the lib/librumphijack/t_tcpip test has been
randomly failing for a long time, perhaps for its entire existence (it
was committed on 2011., and the first recorded failure
on my test system was ten days later at 2011.)

The output from a recent failure can be seen at:

The above log shows the test failing with the error message:

  Timeout, server not responding.

Interestingly, this message is only printed in case of a keep-alive
timeout, and keep-alive timeouts can only happen if ssh is configured
with a non-zero ServerAliveInterval, but in this test, ssh is using a
configuration with the default ServerAliveInterval of zero,
so there should be no way this can happen.
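
To make the reasoning above concrete, here is a simplified sketch in C of
how a client loop could derive the select() timeout from the keep-alive
setting.  It is an illustration only, not the code in clientloop.c; the
function name keepalive_timeout and its parameter alive_interval are
hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Simplified sketch, NOT the actual clientloop.c code.
 * alive_interval stands in for the ServerAliveInterval setting (seconds).
 * Returns the timeout pointer to hand to select(): NULL (block
 * indefinitely) when keep-alives are disabled, otherwise tv filled in
 * with the interval.
 */
static struct timeval *
keepalive_timeout(int alive_interval, struct timeval *tv)
{
	if (alive_interval <= 0)
		return NULL;	/* no keep-alive: select() must not time out */
	tv->tv_sec = alive_interval;
	tv->tv_usec = 0;
	return tv;
}
```

With an interval of zero the pointer passed to select() is NULL, so a
return value of 0 from select() should never be observed.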

To debug this, I added some printfs to print the arguments and return value of
the select() system call in src/crypto/external/bsd/openssh/dist/clientloop.c,
and found that when the error occurs, select() is returning zero (indicating
a timeout) even though its "timeout" argument is a NULL pointer.  That's
not supposed to happen, is it?  So to me this looks like a bug in select(),
or maybe just in its rump implementation.

My patch adding the printfs is at

and here is an excerpt from a /tmp/select.log file written by the patched
ssh, showing select() returning 0 even though the timeout argument is
NULL (tvp=0x0):

  select nfds=129 tvp=0x0
    read 5
    read 128
  select ret = 0
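
For readers who want to instrument their own code in a similar way, the
following is a minimal sketch of a logging wrapper around select().  It is
not the patch referred to in this report; the names logged_select and
demo_null_timeout are hypothetical, and the log format is only modelled on
the excerpt above.  The wrapper also flags the anomalous case of select()
returning 0 with a NULL timeout.

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical logging wrapper around select(); not the patch from this PR.
 * Logs nfds, the timeout pointer, the descriptors requested in readfds,
 * and the return value.  Stores the return value of select() in *retp and
 * returns 1 if select() returned 0 although no timeout was given, else 0.
 */
static int
logged_select(FILE *log, int nfds, fd_set *readfds, struct timeval *tvp,
    int *retp)
{
	int fd, ret;

	fprintf(log, "select nfds=%d tvp=%p\n", nfds, (void *)tvp);
	for (fd = 0; fd < nfds; fd++)
		if (readfds != NULL && FD_ISSET(fd, readfds))
			fprintf(log, "  read %d\n", fd);

	ret = select(nfds, readfds, NULL, NULL, tvp);
	fprintf(log, "select ret = %d\n", ret);
	fflush(log);

	*retp = ret;
	return (ret == 0 && tvp == NULL);
}

/*
 * Usage sketch: one readable pipe and a NULL timeout.  A correct select()
 * returns 1 here.  Returns 0 on the expected behaviour, -1 otherwise.
 */
static int
demo_null_timeout(void)
{
	int p[2], ret = -1, anomaly;
	fd_set rfds;

	if (pipe(p) == -1)
		return -1;
	if (write(p[1], "x", 1) != 1)
		return -1;
	FD_ZERO(&rfds);
	FD_SET(p[0], &rfds);
	anomaly = logged_select(stderr, p[0] + 1, &rfds, NULL, &ret);
	close(p[0]);
	close(p[1]);
	return (anomaly || ret != 1) ? -1 : 0;
}
```

The exact rendering of the NULL pointer by %p differs between C libraries,
so the tvp field of the log may not look identical on every system.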


>How-To-Repeat:

  cd /usr/tests/lib/librumphijack/
  while atf-run t_tcpip; do true; done

This may fail in a test case other than the ssh one; if so, retry it.

