Subject: port-i386/2065: i386 calls tss_free() in a context with no curproc, leading to faults
To: None <gnats-bugs@NetBSD.ORG>
From: John Kohl <jtk@kolvir.arlington.ma.us>
List: netbsd-bugs
Date: 02/10/1996 22:51:05
>Number:         2065
>Category:       port-i386
>Synopsis:       i386 calls tss_free() in a context with no curproc, leading to faults
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 10 23:50:03 1996
>Last-Modified:
>Originator:     John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release:        1.1
>Environment:
	
System: NetBSD pattern 1.1A NetBSD 1.1A (PATTERN) #16: Sat Feb 10 18:24:41 EST 1996 jtk@pattern:/u3/NetBSD-current/src/sys/arch/i386/compile/PATTERN i386


>Description:
When a process is exiting on the i386, the kernel can fault by getting
into tsleep() with curproc == 0.

An example stack trace follows (ignore the _lock_clear_recursive entries;
they are red herrings, and those frames are really somewhere inside the
swap pager):

_tsleep(f8d35e50,4,f8188229,0) at _tsleep+0x1e
_lock_clear_recursive(f8847780,f8231c9c,1,100000,1) at _lock_clear_recursive+0x97b
_lock_clear_recursive(f8843140,f8231c9c,1,1,f82cb318) at _lock_clear_recursive+0x490
_vm_pager_get_pages(f8843140,f8231c9c,1,1,f8231cec) at _vm_pager_get_pages+0x4a
_vm_pager_get(f8843140,f82cb318,1) at _vm_pager_get+0x14
_vm_fault(f82ab000,f9dee000,7,1) at _vm_fault+0x226
_vm_fault_wire(f82ab000,f9dee000,f9df0000,19,f87ef900) at _vm_fault_wire+0x35
_vm_map_pageable(f82ab000,f9dee000,f9df0000,0) at _vm_map_pageable+0x285
_swapin(f87ef900) at _swapin+0x23
_gdt_compact(f81f6850,f8231d90,f8192e0f,20,f8231dd0) at _gdt_compact+0x3b
_gdt_put_slot(20,f8231dd0,f8100a6f,f9dd3000,ffffffff) at _gdt_put_slot+0x6a
_tss_free(f9dd3000,ffffffff,0,f820354c,0) at _tss_free+0x17
_switch_exit(f8d3621c,0,f883f770,0,f8231e38) at _switch_exit+0x57
bpendtsleep(f8d3621c,4,f8188229,0) at bpendtsleep
_lock_clear_recursive(f8847780,f8231ea4,1,100000,1) at _lock_clear_recursive+0x97b
_lock_clear_recursive(f8843140,f8231ea4,1,1,f82ea920) at _lock_clear_recursive+0x490
_vm_pager_get_pages(f8843140,f8231ea4,1,1,f8231ef4) at _vm_pager_get_pages+0x4a
_vm_pager_get(f8843140,f82ea920,1) at _vm_pager_get+0x14
_vm_fault(f82ab000,f9dc0000,7,1) at _vm_fault+0x226
_vm_fault_wire(f82ab000,f9dc0000,f9dc2000,3b37,f87af800) at _vm_fault_wire+0x35
_vm_map_pageable(f82ab000,f9dc0000,f9dc2000,0) at _vm_map_pageable+0x285
_swapin(f87af800) at _swapin+0x23
_scheduler(f8777b00,f810f198,22ffb0,22f000,23e000) at _scheduler+0x85
_main(0,0,0,0,0) at _main+0x4e5


What's happened is that we are in switch_exit(), trying to free up
resources.  locore.s shows that switch_exit() leaves curproc == 0 until
it's done cleaning up; then it jumps to switch_search(), which will
select a new process or go into the idle loop.

However, while switch_exit() is cleaning up there's a code path (seen in
the trace above) where curproc is expected to be non-NULL.  Oops.
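
To make the ordering problem concrete, here is a small user-space C
model of it.  This is only a sketch, not kernel code: struct proc is
cut down to two fields, and every model_* name is made up for
illustration.  It assumes that the sleep path dereferences curproc,
which is what the trace suggests, and it returns -1 at the point where
the real kernel would fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct proc. */
struct proc {
	int p_pid;
	const void *p_wchan;	/* channel the process is sleeping on */
};

static struct proc proc0 = { 0, NULL };
static struct proc *curproc = &proc0;

/*
 * Model of a sleep primitive.  It records the wait channel in the
 * current process; with no current process it returns -1 where the
 * real kernel would take a fault.
 */
static int
model_tsleep(const void *chan)
{
	struct proc *p = curproc;

	if (p == NULL)
		return -1;
	p->p_wchan = chan;
	return 0;
}

/*
 * Model of the chain in the trace: freeing the TSS slot may have to
 * page something in, and paging in may sleep.
 */
static int
model_tss_free(void)
{
	static int chan;

	return model_tsleep(&chan);
}

/*
 * Model of switch_exit().  When set_curproc_first is nonzero, curproc
 * is pointed at proc0 before the cleanup call, as the proposed patch
 * does; otherwise it is left null, as in the current code.
 */
static int
model_switch_exit(int set_curproc_first)
{
	curproc = NULL;		/* the exiting process is gone */
	if (set_curproc_first)
		curproc = &proc0;
	return model_tss_free();
}

/* Exercise both orderings. */
static void
model_demo(void)
{
	assert(model_switch_exit(0) == -1);	/* current order: would fault */
	assert(model_switch_exit(1) == 0);	/* patched order: sleep succeeds */
}
```

Calling model_demo() runs both orderings.  The model says nothing about
whether proc0 is the right process to sleep as in the real kernel; it
only shows why curproc has to be non-null before tss_free() is called.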

>How-To-Repeat:

Run a system for a while, build some stuff, and cause some process to
swap out.  Then shut down to reboot and you'll likely hit this bug.

>Fix:

I'm not sure this is quite the right thing, but it seems like we need to
get curproc set to something acceptable before calling tss_free.  I
haven't tested this yet (I wanted to file the bug so that other folks
will be aware and might figure out how to fix it), but my first guess
is:


===================================================================
RCS file: RCS/locore.s,v
retrieving revision 1.38
diff -c -r1.38 locore.s
*** locore.s	1996/02/09 04:21:54	1.38
--- locore.s	1996/02/11 03:50:31
***************
*** 1860,1865 ****
--- 1860,1868 ----
  
  	/* Record new pcb. */
  	movl	%esi,_curpcb
+ 	
+ 	/* Record _curproc as proc0, so we can call tss_free(). */
+ 	movl	%ebx,_curproc
  
  	/* Interrupts are okay again. */
  	sti
>Audit-Trail:
>Unformatted: