Subject: Strange userland trap after cpu_lwp_exit
To: None <tech-kern@NetBSD.org>
From: Martin Husemann <firstname.lastname@example.org>
Date: 09/10/2006 19:02:58
I'm a bit puzzled by some debugging "results" I see and would like to solicit
hints for further debugging.
I have an SMP machine (where SMP is not working well yet, so there certainly
are bugs, and they sure are all my fault). This arch does lazy fpu saving.
Now I see the following strange sequence of events:
- a new lwp is created, cpu_lwp_fork is called, finds the parent lwp
to have valid FPU state, and copies that state over to the new lwp
- time passes, the fpu state is saved and restored several times
[all the above happens on various cpus]
- on one cpu I get a call to cpu_lwp_free. The saved fpu state is marked
invalid
- on another cpu I get a userland FPU-not-enabled fault for this same lwp,
which now would access the invalid fpu state
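
One way to pin this down might be a debug check along the following lines.
This is only a userland sketch of the sequence above, not kernel code: the
struct and every model_* name are made up for illustration, and the real
cpu_lwp_fork/cpu_lwp_free have different signatures. The idea is to record
which cpu ran the free path and to complain if an fpu trap for that lwp
shows up afterwards.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the MD part of an lwp; not NetBSD's layout. */
struct model_lwp {
	bool fpu_valid;	/* saved fpu state may be loaded */
	bool freed;	/* set once the free path has run */
	int  freed_on;	/* cpu that ran the free path, -1 if none */
};

/* Models cpu_lwp_fork: the child inherits the parent's fpu validity. */
static void
model_lwp_fork(const struct model_lwp *parent, struct model_lwp *child)
{
	child->fpu_valid = parent->fpu_valid;
	child->freed = false;
	child->freed_on = -1;
}

/* Models cpu_lwp_free: invalidate the saved state, remember the cpu. */
static void
model_lwp_free(struct model_lwp *l, int cpu)
{
	l->fpu_valid = false;
	l->freed = true;
	l->freed_on = cpu;
}

/*
 * Models the FPU-not-enabled trap. Returns 0 if the trap is legitimate,
 * -1 if it arrives for an lwp that already went through the free path.
 */
static int
model_fpu_trap(const struct model_lwp *l, int cpu)
{
	if (l->freed) {
		fprintf(stderr, "fpu trap on cpu%d for lwp freed on cpu%d\n",
		    cpu, l->freed_on);
		return -1;
	}
	return 0;
}
```

In a real kernel the equivalent check would presumably panic or drop into
the debugger instead of returning -1, so that the state of both cpus can be
inspected at the moment the late trap arrives.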
I cannot see how anything in userland could run once some lwp has passed
through cpu_lwp_free - any ideas what is wrong here?
I only came up with radical ones (like "scheduler lock is not working", but
I'm pretty sure the simple_lock code is ok), or the ones I don't want to
hear: something is wrong in the locore scheduler related functions.
Any other ideas?