NetBSD-Bugs archive


Re: kern/56828: futex calls in Linux emulation sometimes hang



The following reply was made to PR kern/56828; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Thomas Klausner <wiz%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
	Jason Thorpe <thorpej%NetBSD.org@localhost>
Subject: Re: kern/56828: futex calls in Linux emulation sometimes hang
Date: Sat, 18 Jan 2025 11:03:37 +0000

 > Date: Sat, 18 Jan 2025 11:36:27 +0100
 > From: Thomas Klausner <wiz%NetBSD.org@localhost>
 >
 > The futex tests look much better now, but still quite a lot are
 > failing (mostly futex_wait issues):
 >
 > futex_cmp_requeue01.c:95: TBROK: fork() failed: EAGAIN/EWOULDBLOCK (11)
 > tst_test.c:1606: TINFO: Killed the leftover descendant processes
 
 Looks like you hit a process rlimit.  Can you bump ulimit -p or
 kern.maxproc?
 
 > *** futex_wait03 ***
 >
 > tst_memutils.c:141: TINFO: oom_score_adj does not exist, skipping the adjustment
 > tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
 > tst_memutils.c:141: TINFO: oom_score_adj does not exist, skipping the adjustment
 > futex_wait03.c:63: TINFO: Testing variant: syscall with old kernel spec
 > Test timeouted, sending SIGKILL!
 > tst_test.c:1612: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
 > tst_test.c:1614: TBROK: Test killed! (timeout?)
 >
 > Summary:
 > passed   0
 > failed   0
 > broken   1
 > skipped  0
 > warnings 0
 
 I suspect this is a bug in NetBSD's implementation of /proc/$pid/stat.
 
 This is the only test case that queries it from another thread, I
 think, and it looks like when that happens, /proc/$pid/stat doesn't
 correctly report the other thread as sleeping (`S') when it is waiting
 in futex(FUTEX_WAIT), so the wait-for-sleep busy loop spins forever
 (or until timeout).
 
 Could add a printf after TST_PROCESS_STATE_WAIT (and an fflush after
 that) to verify that the test never gets past that loop.
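 
 For illustration, the check that such a wait-for-sleep loop depends on
 can be sketched in C as below.  This is a minimal sketch, not LTP's
 actual code: stat_line_state() and log_proc_state() are hypothetical
 names, and it assumes the Linux-style layout "pid (comm) state ..."
 for the stat line.  The logging helper prints the state it saw and
 flushes immediately, so the output is not lost if the test is killed
 on timeout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Return the state character (e.g. 'S' for sleeping, 'R' for running)
 * from one line of /proc/<pid>/stat, or '\0' if the line is malformed.
 * The comm field may itself contain spaces or parentheses, so look for
 * the last ')' rather than splitting on whitespace.
 */
char
stat_line_state(const char *line)
{
	const char *p;

	assert(line != NULL);
	p = strrchr(line, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}

/*
 * Hypothetical debugging aid: read /proc/<pid>/stat once, print the
 * state that was observed ('?' if it could not be read), and flush
 * stdout right away.
 */
char
log_proc_state(pid_t pid)
{
	char path[64], line[1024];
	char state = '\0';
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
	if ((f = fopen(path, "r")) != NULL) {
		if (fgets(line, sizeof(line), f) != NULL)
			state = stat_line_state(line);
		fclose(f);
	}
	printf("pid %ld: state '%c'\n", (long)pid, state ? state : '?');
	fflush(stdout);
	return state;
}
```

 If log_proc_state() keeps printing something other than 'S' while the
 other thread is blocked in futex(FUTEX_WAIT), that would point at the
 stat reporting rather than at the futex code.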
 
 > *** futex_wait05 ***
 > [...]
 > tst_timer_test.c:263: TINFO: futex_wait() sleeping for 1000us 500 iterations, threshold 450.01us
 > tst_timer_test.c:285: TINFO: Found 500 outliners in [20098,13688] range
 > tst_timer_test.c:305: TINFO: min 13688us, max 20098us, median 20000us, trunc mean 19976.46us (discarded 25)
 > tst_timer_test.c:314: TFAIL: futex_wait() slept for too long
 
 These failures are all about the limited resolution of sleeps.  I'm
 guessing you're running at 100 Hz.  These times are around 1-2 ticks
 past the requested deadline, or 10-20ms = 10000-20000us (plus a tiny
 slop of a few dozen microseconds).  I would expect this to slow things
 down but not make them deadlock.
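 
 The arithmetic can be sketched as follows.  This is a minimal sketch
 under two assumptions that are not confirmed above: the clock runs at
 100 Hz, and a tick-based sleep rounds the timeout up to whole ticks
 and may wait one further tick because it starts partway through the
 current one.  All function names are hypothetical.

```c
#include <assert.h>

/* Length of one clock tick in microseconds at the given tick rate. */
long
tick_us(long hz)
{
	assert(hz > 0);
	return 1000000L / hz;
}

/* Number of whole ticks needed to cover timeout_us, rounding up. */
long
min_ticks(long timeout_us, long hz)
{
	long t = tick_us(hz);

	return (timeout_us + t - 1) / t;
}

/*
 * Assumed worst case: the rounded-up tick count plus one extra tick
 * for the partial tick already in progress when the sleep starts.
 */
long
worst_case_sleep_us(long timeout_us, long hz)
{
	return (min_ticks(timeout_us, hz) + 1) * tick_us(hz);
}
```

 With hz = 100 and the 1000us timeout from the test log, this gives a
 10000us tick and a worst case of 20000us, in line with the reported
 median.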
 
 > I tried the Metalworks demo from jdk17 and it worked fine.
 >
 > Then I tried the PDF-Over application.
 > I could get one successful run through the application, but
 > I had about 9 other tries where it didn't complete the process.
 > Mostly not show the PDF (step 2 of the process), or show
 > just a gray screen.
 >
 > Right now top says the process is in futex, so I suspect there are
 > still more problems. Perhaps the futex_wait() problem bites us here.
 
 Boo.  I guess we need to kernhist it up to find what futex events had
 recently happened before the deadlock.
 

