Subject: Strange userland trap after cpu_lwp_exit
To: None <tech-kern@NetBSD.org>
From: Martin Husemann <martin@duskware.de>
List: tech-kern
Date: 09/10/2006 19:02:58
Folks,

I'm a bit puzzled by some debugging "results" I see and would like to solicit
hints for further debugging.

I have an SMP machine (where SMP is not working well yet, so there certainly
are bugs, and they sure are all my fault). This arch does lazy FPU saving.
Now I see the following strange sequence of events:

  - a new lwp is created, cpu_lwp_fork is called, finds the parent lwp
    to have valid FPU state, and copies that state over to the new lwp
  - time passes, the FPU state is saved and restored several times
  [all the above happens on various CPUs]
  - on one CPU I get a call to cpu_lwp_free. The saved FPU state is marked
    invalid
  - on another CPU I get a userland FPU-not-enabled fault for this same lwp,
    which would now access the invalid FPU state
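
The sequence above can be modelled roughly as follows. This is only a toy
sketch in plain C, not the real kernel code: every toy_* name, the fp_valid
and freed flags, and the result codes are made up for illustration. It shows
the check I would expect to hold, namely that an FPU fault must never arrive
for an lwp that has already been through the free hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: none of these names are real NetBSD structures. */
struct toy_fpstate {
	unsigned long regs[32];		/* stand-in for the FPU register file */
};

struct toy_lwp {
	struct toy_fpstate fp;		/* saved FPU registers */
	bool fp_valid;			/* saved state may be reloaded */
	bool freed;			/* set once toy_lwp_free() has run */
};

enum toy_fault_result {
	TOY_FAULT_OK,			/* state can be loaded lazily */
	TOY_FAULT_NO_STATE,		/* lwp never had FPU state */
	TOY_FAULT_AFTER_FREE		/* the "impossible" case seen here */
};

/* Fork: copy the parent's FPU state only if the parent has valid state. */
static void
toy_lwp_fork(const struct toy_lwp *parent, struct toy_lwp *child)
{
	assert(parent != NULL && child != NULL);
	memset(child, 0, sizeof(*child));
	if (parent->fp_valid) {
		child->fp = parent->fp;
		child->fp_valid = true;
	}
}

/* Free: mark the saved state invalid and remember that the lwp is gone. */
static void
toy_lwp_free(struct toy_lwp *l)
{
	l->fp_valid = false;
	l->freed = true;
}

/* FPU-not-enabled fault: classify what the handler is being asked to do. */
static enum toy_fault_result
toy_fpu_fault(const struct toy_lwp *l)
{
	if (l->freed)
		return TOY_FAULT_AFTER_FREE;
	if (!l->fp_valid)
		return TOY_FAULT_NO_STATE;
	return TOY_FAULT_OK;
}
```

In a real kernel the equivalent of the TOY_FAULT_AFTER_FREE branch would
presumably be a diagnostic assertion or panic in the fault handler, so the
offending CPU and lwp can be inspected at the moment the fault arrives.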

I cannot see how anything in userland could run once some lwp has passed
through cpu_lwp_free - any ideas what is wrong here?

I only came up with radical ones (like "scheduler lock is not working", but
I'm pretty sure the simple_lock code is ok), or the ones I don't want to
hear: something is wrong in the locore scheduler related functions.

Any other ideas?

Martin