NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/50094
The following reply was made to PR kern/50094; it has been noted by GNATS.
From: Michael Pratt <mpratt%google.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/50094
Date: Thu, 17 Dec 2020 12:27:40 -0500
This problem is affecting the Go language runtime. See
https://github.com/golang/go/issues/42515 for complete details.
In particular, see
https://github.com/golang/go/issues/42515#issuecomment-747097912,
where I came to the same conclusion as Christof that the problem is
that the kqueue is unlocked with the knote removed, thus a racing
kevent call may miss the event. I've also captured custom ktrace
traces to demonstrate the problem.
However, while I believe Christof's proposed fix would work for
blocking kevent calls, it would still allow non-blocking kevent calls
to miss the event and return without it in violation of the API. It is
not clear to me how this should be fixed as I don't know exactly what
constraints require unlocking in the first place. For reference,
FreeBSD seems to handle this issue by noticing that the kqueue is "in
flux" and waiting for it to settle:
https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c#L1859-L1869.
I have a C reproducer similar to Go's usage at
https://gist.github.com/prattmic/8b5bc6c87437bd4496d5f546fb3226fc,
which is copied at the end of this email for convenience.
The C reproducer should be sufficient to see the issue, but here's
some additional context on Go's usage for reference:
Go uses a single kqueue to track networking file descriptors
("netpoll"). Typically there is one blocking kevent caller to capture
events when the process is otherwise idle, plus multiple non-blocking
callers from threads looking for work (so we don't need complete
synchronization with the blocking thread).
The blocking thread serves a second purpose to implement Go timers.
The timeout on the kevent is set to the earliest expiring timer. When
the kevent times out, we notice and handle the now-expired timer. If
user code sets a timer expiring earlier than any other, then we need
to break out of the blocking kevent to restart with a shorter timeout.
We do this by writing to a special purpose pipe used just for
generating events to break out of kevent. It is these "netpoll break"
events that we are occasionally missing due to this bug.
In Go 1.15 and prior, there was an additional background thread
running periodically that could also notice overrun timers and handle
them (at the expense of latency, since this only runs periodically).
For Go 1.16, various improvements to improve timer latency have made
that periodic check unnecessary, so it has been removed. Now this bug
is loudly exposed as we can completely miss timers for an unbounded
period, hanging programs and causing test timeouts.
As a workaround we will add back the periodic check for NetBSD, but as
mentioned above, it is at the expense of timer latency, so we'd really
like to see this bug fixed.
Regards,
Michael Pratt
---
// $ gcc -pthread kqueue_race.c
// $ ./a.out
// kevent = 0
// Incorrect number of events after 477 calls!
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>
int kq;
void* read_kevent(void* arg) {
int count = 0;
while (1) {
count++;
// Non-blocking.
struct timespec ts = {};
struct kevent ev;
int n = kevent(kq, NULL, 0, &ev, 1, &ts);
if (n < 0) {
perror("kevent");
exit(1);
} else if (n != 1) {
printf("kevent = %d\n", n);
printf("Incorrect number of events after %d calls!\n", count);
exit(2);
}
}
}
int main(int argc, char** argv) {
int p[2];
int ret = pipe(p);
if (ret < 0) {
perror("pipe");
return 1;
}
kq = kqueue();
if (kq < 0) {
perror("kqueue");
return 1;
}
struct kevent ev = {};
EV_SET(&ev, p[0], EVFILT_READ, EV_ADD, 0, 0, 0);
ret = kevent(kq, &ev, 1, NULL, 0, NULL);
if (ret < 0) {
perror("kevent register");
return 1;
}
// Write to pipe, kevent now ready for read end.
char c = 0;
ret = write(p[1], &c, 1);
if (ret != 1) {
perror("write");
return 1;
}
pthread_t t1, t2;
ret = pthread_create(&t1, NULL, read_kevent, NULL);
if (ret < 0) {
perror("pthread_create");
return 1;
}
ret = pthread_create(&t2, NULL, read_kevent, NULL);
if (ret < 0) {
perror("pthread_create");
return 1;
}
pthread_join(t1, NULL);
pthread_join(t2, NULL);
return 0;
}
Home |
Main Index |
Thread Index |
Old Index