tech-userlevel archive


Re: eventfd(2) and timerfd(2) APIs

> On Sep 18, 2021, at 12:17 PM, Robert Elz <kre%munnari.OZ.AU@localhost> wrote:
>    Date:        Sat, 18 Sep 2021 10:26:29 -0700
>    From:        Jason Thorpe <>
>    Message-ID:  <>
>  |
> This one contains duplicated text...
>  Because they are associated with a file descriptor, they may be passed
>  to other processes, inherited across a fork, and multiplexed using
>  .Xr kevent ,
>  .Xr poll ,
>  or
>  .Xr select  they are associated with a file descriptor, they may be passed
>  to other processes, inherited across a fork, and multiplexed using
>  .Xr kevent 2 ,
>  .Xr poll 2 ,
>  or
>  .Xr select 2 .
> That should be fixed before anything is committed.

Thanks, fixed.

> Apart from that both man pages contain text like
>  unless the
>  .Nm
>  object was created with

I’m using those names because they are the names used in the Linux API.  If you look at the code (it’s on the thorpej-futex branch), you will see that they are aliases for O_NONBLOCK and O_CLOEXEC.  I will clarify this in the man page.

> Since these things are working with file descriptors, I assume that
> fcntl(2) can be used to manipulate flags like O_NONBLOCK and O_CLOEXEC
> in which case I would guess (and hope) that the state of those flags when the
> object was created isn't what is relevant, but the state of the flags at
> the time of the operation concerned.

Actually, I didn’t plumb fcntl through because just about nothing else plumbs it through either, but I’ll go ahead and do so.

> The man pages should probably be reworded with that in mind.
> The exact relationships of the {event,timer}fd_*() functions
> and read()/write() is also not clear to me - are those just wrappers
> around read/write or are they distinct sys calls of their own?

eventfd_read() and eventfd_write() are in fact just wrappers around read() and write().  They’re implemented in libc, and they’re provided only because Linux also provides them and I was aiming for API compatibility.

In the case of timerfd, Linux does not provide a timerfd_read() wrapper, so I also did not.  timerfd_gettime() and timerfd_settime() are not wrappers around anything.  They are themselves system calls, just as they are on Linux.

> I initially assumed the former, but then I see that timerfd_settime()
> has an extra flags arg, which write() (I presume) has no easy way to
> pass in, so now I am not sure.
> If these are distinct operations, how do the actual read()/write() calls interact?

The behavior of timerfd with respect to read is documented in my man page:

     Each time a timerfd timer expires, an internal counter is incremented.
     Reads return the value of this counter as an unsigned 64-bit integer and
     reset the counter to 0.  If the value of the counter is 0, then reads
     will block, unless the timerfd object was created with TFD_NONBLOCK.

Writes to a timerfd return an error.  I will clarify this in the man page.

> Finally, what does fstat() return about these fds?   What is the dev_t ?
> What is the inode number, is the link count meaningful, how about the
> uid and permissions?    And what affects the time fields?

For timerfd:

        struct timespec tfd_btime;      /* time created */
        struct timespec tfd_mtime;      /* last timerfd_settime() */
        struct timespec tfd_atime;      /* last read */

For eventfd:

        struct timespec efd_btime;      /* time created */
        struct timespec efd_mtime;      /* last write */
        struct timespec efd_atime;      /* last read */

Of course, we don’t document these fields for other kinds of descriptors either, so I didn’t spend a lot of time documenting them for these.  It might certainly be a nice idea to fully document the stat info for every descriptor type in the system, but I don’t think the lack of that information should be considered a blocker for adding these calls; there seems to be no standardized format for it, since no other descriptor type documents it.

-- thorpej
