Subject: kern/26803: sigexit() has no barrier for other LWPs
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <thorpej@shagadelic.org>
List: netbsd-bugs
Date: 08/29/2004 15:16:52
>Number:         26803
>Category:       kern
>Synopsis:       sigexit() has no barrier for other LWPs
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 29 22:16:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Jason R Thorpe
>Release:        NetBSD 2.0G
>Organization:
        -- Jason R. Thorpe <thorpej@shagadelic.org>
>Environment:
	
	
System: NetBSD yeah-baby.shagadelic.org 2.0G NetBSD 2.0G (YEAH-BABY-XP) #26: Thu Jul 15 08:26:49 PDT 2004 thorpej@yeah-baby.shagadelic.org:/u1/netbsd/src/sys/arch/i386/compile/YEAH-BABY-XP i386
Architecture: i386
Machine: i386
>Description:
	sigexit() has a flaw for multi-threaded programs: while it
	sets a userret hook to suspend other LWPs, it doesn't wait
	for them to actually suspend.

	This means that other LWPs in the process that are
	sleeping in the kernel may wake up and modify the process's
	address space while the core dump is taking place.

	Another issue (which even has an XXX in the code) is that
	other LWPs that might be running in userspace on other
	processors don't get jolted into the kernel to suspend
	themselves; there is simply no code to do this.

	I believe the lack of a barrier has something to do with
	corrupted core files being dumped by a multi-threaded
	application I am working with that performs a lot of
	mmap / write (and thus modifies the process's VM map and
	sleeps a lot while doing it).
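
	For illustration only, the kind of barrier that is missing
	can be sketched in user space with pthreads.  This is not
	kernel code and not a proposed fix for sigexit(); every name
	below (struct suspend_barrier, barrier_checkpoint(),
	barrier_suspend_others(), demo_run(), ...) is hypothetical.
	The checkpoint plays the role of the userret hook, and the
	point of the sketch is that the requesting thread does not
	proceed until every other thread has actually parked.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for per-process state (not struct proc). */
struct suspend_barrier {
	pthread_mutex_t lock;
	pthread_cond_t parked_cv;	/* signalled when a thread parks */
	pthread_cond_t resume_cv;	/* signalled when suspension is lifted */
	bool suspend_requested;
	int nparked;			/* threads currently parked */
	int nothers;			/* threads that must park */
	bool stop;			/* demo only: tells workers to exit */
	long writes;			/* demo only: stands in for map changes */
};

void
barrier_init(struct suspend_barrier *b, int nothers)
{
	pthread_mutex_init(&b->lock, NULL);
	pthread_cond_init(&b->parked_cv, NULL);
	pthread_cond_init(&b->resume_cv, NULL);
	b->suspend_requested = false;
	b->nparked = 0;
	b->nothers = nothers;
	b->stop = false;
	b->writes = 0;
}

/* Called by each worker at a safe point; parks if suspension is requested. */
void
barrier_checkpoint(struct suspend_barrier *b)
{
	pthread_mutex_lock(&b->lock);
	if (b->suspend_requested) {
		b->nparked++;
		pthread_cond_signal(&b->parked_cv);
		while (b->suspend_requested)
			pthread_cond_wait(&b->resume_cv, &b->lock);
		b->nparked--;
	}
	pthread_mutex_unlock(&b->lock);
}

/* Request suspension AND wait until every other thread has parked. */
void
barrier_suspend_others(struct suspend_barrier *b)
{
	pthread_mutex_lock(&b->lock);
	b->suspend_requested = true;
	while (b->nparked < b->nothers)
		pthread_cond_wait(&b->parked_cv, &b->lock);
	pthread_mutex_unlock(&b->lock);
}

void
barrier_resume_others(struct suspend_barrier *b)
{
	pthread_mutex_lock(&b->lock);
	b->suspend_requested = false;
	pthread_cond_broadcast(&b->resume_cv);
	pthread_mutex_unlock(&b->lock);
}

/* Demo worker: keeps "modifying" shared state between checkpoints. */
void *
demo_worker(void *arg)
{
	struct suspend_barrier *b = arg;
	bool stop = false;

	while (!stop) {
		barrier_checkpoint(b);
		pthread_mutex_lock(&b->lock);
		stop = b->stop;
		if (!stop)
			b->writes++;
		pthread_mutex_unlock(&b->lock);
	}
	return NULL;
}

/* Returns 1 if no worker wrote while the others were held at the barrier. */
int
demo_run(void)
{
	enum { NWORKERS = 3 };
	struct suspend_barrier b;
	pthread_t tid[NWORKERS];
	long before, after;
	int i;

	barrier_init(&b, NWORKERS);
	for (i = 0; i < NWORKERS; i++)
		pthread_create(&tid[i], NULL, demo_worker, &b);

	barrier_suspend_others(&b);	/* returns only once all have parked */
	pthread_mutex_lock(&b.lock);
	before = b.writes;
	pthread_mutex_unlock(&b.lock);
	/* the "core dump" would run here, with the other threads parked */
	pthread_mutex_lock(&b.lock);
	after = b.writes;
	b.stop = true;
	pthread_mutex_unlock(&b.lock);

	barrier_resume_others(&b);
	for (i = 0; i < NWORKERS; i++)
		pthread_join(tid[i], NULL);
	return before == after;
}
```

	Dropping the wait loop from barrier_suspend_others() gives
	the analogue of the behavior described above: the request
	is posted, but workers that have not yet reached a
	checkpoint can still write while the dump is in progress.
	The sketch does not cover the second issue (threads that
	never reach a checkpoint on their own), which would need a
	way to force them there.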

>How-To-Repeat:
	I will work on a simple test case to show the problematic
	behavior.

>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: