NetBSD-Bugs archive
Re: kern/56828: futex calls in Linux emulation sometimes hang
The following reply was made to PR kern/56828; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Thomas Klausner <wiz%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
Jason Thorpe <thorpej%NetBSD.org@localhost>
Subject: Re: kern/56828: futex calls in Linux emulation sometimes hang
Date: Sat, 18 Jan 2025 11:03:37 +0000
> Date: Sat, 18 Jan 2025 11:36:27 +0100
> From: Thomas Klausner <wiz%NetBSD.org@localhost>
>
> The futex tests look much better now, but still quite a lot are
> failing (mostly futex_wait issues):
>
> futex_cmp_requeue01.c:95: TBROK: fork() failed: EAGAIN/EWOULDBLOCK (11)
> tst_test.c:1606: TINFO: Killed the leftover descendant processes
Looks like you hit a process rlimit. Can you bump ulimit -p or
kern.maxproc?
> *** futex_wait03 ***
>
> tst_memutils.c:141: TINFO: oom_score_adj does not exist, skipping the adjustment
> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
> tst_memutils.c:141: TINFO: oom_score_adj does not exist, skipping the adjustment
> futex_wait03.c:63: TINFO: Testing variant: syscall with old kernel spec
> Test timeouted, sending SIGKILL!
> tst_test.c:1612: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> tst_test.c:1614: TBROK: Test killed! (timeout?)
>
> Summary:
> passed 0
> failed 0
> broken 1
> skipped 0
> warnings 0
I suspect this is a bug in NetBSD's implementation of /proc/$pid/stat.
This is the only test case that queries it from another thread, I
think, and it looks like when that happens, /proc/$pid/stat doesn't
correctly report the other thread as sleeping (`S') when it is waiting
in futex(FUTEX_WAIT), so the wait-for-sleep busy loop spins forever
(or until timeout).
You could add a printf after TST_PROCESS_STATE_WAIT (and an fflush
after that) to verify that the test never gets past that loop.
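
For illustration, here is a minimal sketch of how a wait-for-sleep loop
might pull the state letter out of a Linux-style /proc/$pid/stat line
("pid (comm) state ..."). This is not LTP's actual implementation, and
the helper name stat_line_state is hypothetical. If the state reported
for a thread blocked in futex(FUTEX_WAIT) never becomes `S', a loop
built on a check like this would spin until the test timeout.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: extract the one-character state from a
 * Linux-style /proc/<pid>/stat line of the form "pid (comm) state ...".
 * comm may itself contain spaces or parentheses, so look for the LAST
 * ')' rather than splitting on whitespace.
 * Returns the state character, or '\0' if the line is malformed.
 */
char
stat_line_state(const char *line)
{
	const char *p;

	assert(line != NULL);
	p = strrchr(line, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}
```

A caller would read the stat file into a buffer, pass it to
stat_line_state, and retry until the result is 'S'.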
> *** futex_wait05 ***
> [...]
> tst_timer_test.c:263: TINFO: futex_wait() sleeping for 1000us 500 iterations, threshold 450.01us
> tst_timer_test.c:285: TINFO: Found 500 outliners in [20098,13688] range
> tst_timer_test.c:305: TINFO: min 13688us, max 20098us, median 20000us, tr=
unc mean 19976.46us (discarded 25)
> tst_timer_test.c:314: TFAIL: futex_wait() slept for too long
These failures are all about the limited resolution of sleeps. I'm
guessing you're running at 100 Hz. These times are around 1-2 ticks
past the requested deadline, or 10-20ms = 10000-20000us (plus a tiny
slop of a few dozen microseconds). I would expect this to slow things
down but not make them deadlock.
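
The arithmetic can be sketched as follows. This is a hypothetical model,
not NetBSD's actual timeout code: it assumes the request is rounded up
to whole ticks and that one extra tick is added because the sleep can
start anywhere inside the current tick. The function name
modeled_sleep_us is made up for illustration, and hz = 100 is the guess
stated above, not a measured value.

```c
#include <assert.h>

/*
 * Hypothetical model of tick-granular sleeping (an assumption, not the
 * kernel's real code path): round the request up to whole ticks, then
 * add one tick to cover the partially elapsed current tick.
 * Returns the modeled upper bound on the sleep, in microseconds.
 */
long
modeled_sleep_us(long request_us, long hz)
{
	long tick_us, ticks;

	assert(request_us >= 0 && hz > 0);
	tick_us = 1000000 / hz;				/* 10000us at 100 Hz */
	ticks = (request_us + tick_us - 1) / tick_us;	/* round up */
	return (ticks + 1) * tick_us;
}
```

Under these assumptions, a 1000us request at 100 Hz gives 20000us, which
lines up with the reported median of 20000us; the same request at
1000 Hz would give 2000us.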
> I tried the Metalworks demo from jdk17 and it worked fine.
>
> Then I tried the PDF-Over application.
> I could get one successful run through the application, but
> I had about 9 other tries where it didn't complete the process.
> Mostly not show the PDF (step 2 of the process), or show
> just a gray screen.
>
> Right now top says the process is in futex, so I suspect there are
> still more problems. Perhaps the futex_wait() problem bites us here.
Boo. I guess we need to kernhist it up to find what futex events had
recently happened before the deadlock.