Subject: threaded applications hang with concurrency enabled
To: NetBSD current <current-users@netbsd.org>
From: Nicolas Joly <njoly@pasteur.fr>
List: current-users
Date: 04/09/2004 12:21:01
Hi,

I ran some tests with pthread concurrency enabled on some of the biological
programs we run here (currently on Alphas running Tru64 UNIX).

Most of the time, it works well. Thanks, Christian!

But I noticed that, under some as yet unknown circumstances, these
programs can get stuck in the `sawait' state and never come back.

1000 5092 1501  28   5    4  56  0 47288 19816 sawait I    p2 5:37.68 ./fasta_t -q db/p
1000 5092 1501  28   4    4  56  0 47288 19816 sawait I    p2 5:37.68 ./fasta_t -q db/p
1000 5092 1501  28   3    4  56  0 47288 19816 sawait I    p2 5:37.68 ./fasta_t -q db/p
1000 5092 1501  28   1    4  56  0 47288 19816 sawait I    p2 5:37.68 ./fasta_t -q db/p

Here is some debug output obtained with PTHREAD_DEBUGLOG=1:

[...]
(up 0x68000000) type 0 LWP 5 ev 0 intr 0
(up 0x68000000) switching to 0x78000000 (uc: U 0x79fff750 pc: 4809ccff)
(recycle 0x78000000) recycling 0x68000000
(setconcurrency 0xbe000000) requested delta 1, current 3
(setconcurrency 0xbe000000) requested 4, now 3, ret 1
(set 0xbe000000 concurrency) now 3
(setconcurrency 0x74000000) requested delta 1, current 3
(setconcurrency 0x74000000) requested 4, now 3, ret 0
(set 0x74000000 concurrency) now 3
(pthread__idle 0x4a000000).
(yield 0x4a000000 concurrency) now 2
(pthread__idle 0x4a000000) yielding.
(pthread__idle 0x4c000000).
(pthread__idle 0x4e000000).
(yield 0x4c000000 concurrency) now 1
(pthread__idle 0x4c000000) yielding.
(yield 0x4e000000 concurrency) now 0
(pthread__idle 0x4e000000) yielding.
(up 0x68000000) type 0 LWP 1 ev 0 intr 0
(up 0x68000000) switching to 0x50000000 (uc: U 0x51fff800 pc: 480a3700)
(recycle 0x50000000) recycling 0x68000000
(pthread__idle 0x50000000).
(yield 0x50000000 concurrency) now 0
(pthread__idle 0x50000000) yielding.
[HANG HERE]

I noticed the same behaviour with three different programs: blast, fasta
and hmmer (though not with the same frequency).

All the programs work fine if concurrency is disabled, or if only one
thread is active.

njoly@hal [~]> uname -a
NetBSD hal.sis.pasteur.fr 2.0C NetBSD 2.0C (HAL) #4: Fri Apr  9 00:39:01 CEST 2004  njoly@hal.sis.pasteur.fr:/local/src/NetBSD/obj/i386/sys/arch/i386/compile/HAL i386

Thanks in advance,
Regards.

-- 
Nicolas Joly

Biological Software and Databanks.
Institut Pasteur, Paris.