NetBSD-Bugs archive


kern/55670: pthread_create / pthread_join test may wedge



>Number:         55670
>Category:       kern
>Synopsis:       pthread_create / pthread_join test may wedge
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 19 18:40:00 +0000 2020
>Originator:     he%NetBSD.org@localhost
>Release:        NetBSD 9.0_STABLE
>Organization:
   I Try...
>Environment:
System: NetBSD smistad.uninett.no 9.0_STABLE NetBSD 9.0_STABLE (GENERIC) #0: Sat May 30 02:09:41 CEST 2020 he%smistad.uninett.no@localhost:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	This simple program, adapted from

	https://github.com/rust-lang/rust/issues/76600#issuecomment-695335502

--------------------
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N 800

static pthread_t threads[N];

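/* Each thread makes one allocation and exits; the memory is never freed. */
static void *run(void *arg) {
        (void)arg;
        return malloc(1024);
}

int main(void) {
        /* The calls sit inside assert(), so do not build with -DNDEBUG. */
        for (int i = 0; i != N; ++i) assert(pthread_create(&threads[i], NULL, run, NULL) == 0);
        for (int i = 0; i != N; ++i) assert(pthread_join(threads[i], NULL) == 0);
}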
--------------------

	when built with "cc -pthread t.c" and run repeatedly, may
	eventually wedge (the program, not the system).  When this
	happens, "ps sdw" shows one thread (LWP) in state Z (zombie)
	and all the others, including the main thread, idle on the
	"parked" wait channel:

UID   PID  PPID   CPU LID NLWP PRI NI     VSZ    RSS WCHAN  STAT TTY      LTIME COMMAND
169  6279  5683     0   1    1  85  0   27172   2816 ttyraw I    pts/6  0:00.01 - -tcsh 
169  7549 29786     0   1    1  85  0   27320   2968 pause  I    pts/8  0:00.06   `-- -tcsh 
169 11316  7549     0 618   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 614   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 570   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 436   13  43  0 3405132  23264 -      Z-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 414   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 399   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 386   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 371   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 343   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 317   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 313   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0 301   13  43  0 3405132  23264 parked I-   pts/8  0:00.00     |-- ./a.out 
169 11316  7549     0   1   13  43  0 3405132  23264 parked I    pts/8  0:00.01     |-- ./a.out 

	In my case (4th-gen i7, 4 cores, 8 threads with HT), it took
	25 runs before the program wedged.

	The original reproducer set N to just 4; with that value I
	could not get it to wedge on my host in more than 10000
	attempts.


>How-To-Repeat:
	See above.
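
	The repeated runs described above could be automated with
	something like the following.  This is only a sketch and not
	part of the original report: the names run_with_timeout and
	find_wedge, the 10 ms polling interval and the idea of treating
	"still running after a timeout" as a wedge are all assumptions.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] once as a child process.
 * Returns 0 if the child terminated (for any reason) within timeout_sec
 * seconds, 1 if it was still running and had to be killed (a suspected
 * wedge), and -1 if fork() or waitpid() failed.
 */
int run_with_timeout(char *const argv[], int timeout_sec)
{
        const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
        pid_t pid = fork();

        if (pid == -1)
                return -1;
        if (pid == 0) {
                execvp(argv[0], argv);
                _exit(127);             /* only reached if exec failed */
        }

        /* Poll for termination instead of blocking, so we can time out. */
        for (long waited_ms = 0; waited_ms < timeout_sec * 1000L; waited_ms += 10) {
                pid_t r = waitpid(pid, NULL, WNOHANG);

                if (r == pid)
                        return 0;       /* child is gone, no wedge */
                if (r == -1)
                        return -1;
                nanosleep(&tick, NULL); /* r == 0: still running */
        }

        /* Still alive after the timeout: kill it and reap it. */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return 1;
}

/*
 * Hypothetical helper: run the command up to max_attempts times.
 * Returns the 1-based number of the first attempt that timed out,
 * 0 if none did, and -1 on error.
 */
int find_wedge(char *const argv[], int max_attempts, int timeout_sec)
{
        for (int i = 1; i <= max_attempts; i++) {
                int r = run_with_timeout(argv, timeout_sec);

                if (r == -1)
                        return -1;
                if (r == 1)
                        return i;
        }
        return 0;
}
```

	Calling find_wedge() with an argv of { "./a.out", NULL }, a
	generous attempt limit and a timeout of a few seconds would
	report which attempt hung.  Note that the sketch kills the
	hung process, so the kill() would have to be removed to
	inspect the wedged threads with ps afterwards.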
>Fix:
	Sorry, don't know.


