tech-net archive


A possible bug with non-blocking sockets and SIGIO



Hello,

I have discovered what looks like a bug in NetBSD's socket layer. I am using NetBSD 5.0.2, i386 port. The details are below. I am new to NetBSD and to kernel development, so some of my assumptions may be wrong; I will be happy if someone corrects me.


Background information
----------------------

Suppose there is a single-threaded server that uses SIGIO to handle incoming
connections and data from clients.

When the process receives SIGIO:
* if there are pending connections on the server socket, the process calls
   accept(2) and adds the accepted client socket to a list.
* if data is available on any client socket (as determined via poll(2)),
   the server processes this data.
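
The building blocks of such a server can be sketched as follows. This is only an illustration of the scheme described above, not the attached test program; the helper names (install_sigio_handler, enable_sigio, has_input) are hypothetical.

```c
#define _DEFAULT_SOURCE /* glibc only: expose O_ASYNC under strict -std modes */
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Set by the signal handler; the main loop does the real work. */
static volatile sig_atomic_t got_sigio = 0;

static void on_sigio(int signo)
{
    (void)signo;
    got_sigio = 1;
}

/* Install the SIGIO handler. Returns 0 on success, -1 on error. */
static int install_sigio_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGIO, &sa, NULL);
}

/* Make this process the SIGIO recipient for fd and turn on O_ASYNC. */
static int enable_sigio(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Zero-timeout poll(2): 1 if fd is readable, 0 if not, -1 on error. */
static int has_input(int fd)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, 0);
    if (n == -1)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

The handler only records that SIGIO arrived; the accept(2) and poll(2) work is meant to happen in the main loop once got_sigio is seen.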


Problem description
-------------------

The server process does not receive SIGIO when data arrives on a client socket.


Problem analysis
----------------

A process receives SIGIO on incoming data if the owner PID and the O_ASYNC option are set for the socket. Debugging showed that the client socket does not have the SB_ASYNC bit set in its so_rcv.sb_flags, so SIGIO is not emitted
in sowakeup().

However, the process sets the required options this way:

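    int oldflags = fcntl (sock, F_GETFL, 0);

    if (-1 == oldflags) {
        exit (1);
    }

    /* Request SIGIO only if F_GETFL does not report O_ASYNC already. */
    if (!(oldflags & O_ASYNC)) {
        if (-1 == fcntl (sock, F_SETFL, oldflags | O_ASYNC)) {
            exit (1);
        }
    }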

i.e. if O_ASYNC is not set, set it. But the issue is that *O_ASYNC was already set in `oldflags`*! So we end up in a situation where O_ASYNC is set but
SB_ASYNC is not.
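
Since the F_GETFL result cannot be trusted in this situation, an application could request asynchronous notification unconditionally instead of checking first. The following is a minimal sketch, assuming that ioctl(2) with FIOASYNC reaches the socket layer and sets the buffer flags regardless of what F_GETFL reports; this has not been verified against the NetBSD 5.0.2 sources, and force_async is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE /* glibc only: expose O_ASYNC under strict -std modes */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make this process the SIGIO recipient for fd and
 * switch on asynchronous notification without consulting F_GETFL first.
 * Returns 0 on success, -1 on error.
 */
static int force_async(int fd)
{
    int on = 1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    if (ioctl(fd, FIOASYNC, &on) == -1)
        return -1;
    return 0;
}
```

This only works around the symptom in userland; it does not change what the kernel does in sonewconn().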

How could this happen? sys_fcntl() returns the value (fp->f_flag - 1) to a
process when it queries the socket flags via fcntl with F_GETFL.
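
The "- 1" reflects the BSD convention of keeping the open flags in the kernel shifted by one, so that O_RDONLY (0), O_WRONLY (1) and O_RDWR (2) become a read/write bit mask (1, 2, 3). Higher bits such as O_NONBLOCK or O_ASYNC pass through unchanged, which is why the flag seen by userland mirrors the one stored in `fp->f_flag`. A small sketch of the conversion; the helper names are hypothetical stand-ins for the kernel macros:

```c
#include <fcntl.h>

/* Userland open(2)/fcntl(2) flags -> kernel-style f_flag value. */
static int to_fflags(int oflags)
{
    return oflags + 1;
}

/* Kernel-style f_flag value -> what F_GETFL hands back to the process. */
static int to_oflags(int fflags)
{
    return fflags - 1;
}
```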

The value of `fp->f_flag` for the client's socket is copied from the server's
`fp->f_flag` in do_sys_accept(). Note that the server socket is non-blocking and
its flags contain all the necessary bits. But in sonewconn():

    so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;

This code sets the SB_AUTOSIZE bit in the client's `so_rcv.sb_flags` if that bit is set in the server's flags. Otherwise the value stays 0. Here we lose all other flags,
including SB_ASYNC. I think this is the bug.
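
The effect of the mask can be reproduced outside the kernel. In the sketch below the DEMO_* bit values are illustrative only (the real definitions live in the kernel headers), and inherit_autosize_only is a hypothetical name for the quoted statement:

```c
/* Illustrative bit values only; not taken from the kernel headers. */
#define DEMO_SB_ASYNC    0x010u
#define DEMO_SB_AUTOSIZE 0x800u

/*
 * Mimics the quoted line: only the AUTOSIZE bit of the listening
 * socket's flags is OR-ed into the new socket's flags.
 */
static unsigned inherit_autosize_only(unsigned child_flags, unsigned head_flags)
{
    child_flags |= head_flags & DEMO_SB_AUTOSIZE;
    return child_flags;
}
```

With a head value of DEMO_SB_ASYNC | DEMO_SB_AUTOSIZE and a child that starts at 0, the result contains DEMO_SB_AUTOSIZE only; the ASYNC bit never reaches the new socket.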


Possible solution
-----------------

Since do_sys_accept() copies the `f_flag` value from the server's fp to the client's fp, and a lot of data is already copied from the server socket to the client socket in sonewconn(),
I think it would be right to do

    so->so_rcv.sb_flags = head->so_rcv.sb_flags;
    so->so_snd.sb_flags = head->so_snd.sb_flags;

instead of

    so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
    so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;

in sonewconn(). This keeps the socket's `sb_flags` and its fp's `f_flag`
in sync.


Tests
-----

A patched kernel seems to work well. I have tested it with opensshd and GNU
Smalltalk's Swazoo application server.

A simple test application is also attached. It waits for an incoming connection on port 8000, then waits for incoming data from the client, and then terminates. It can be built in two modes: with a blocking or a non-blocking server socket.

Blocking version:      gcc -o test_sync  onc.c
Non-blocking version:  gcc -o test_async onc.c -DV_ASYNC

To perform a test, start the application, e.g.:

    $ test_async &

connect to it with telnet:

    $ telnet localhost 8000

type something and hit Enter.

The blocking version works fine without the patch, because initially the server
fp flags do not contain O_ASYNC and the application is able to set it.

The non-blocking version works correctly only with the patched kernel.

Both versions work fine on GNU/Linux (but with SIGPOLL instead of SIGIO).


--------------
Best regards,
Dmitry Matveev

Attachment: netbsd-sb_flags.patch
Description: Binary data

Attachment: onc.c
Description: Binary data


