Subject: port-i386/2065: i386 calls tss_free() in a context with no curproc, leading to faults
To: None <gnats-bugs@NetBSD.ORG>
From: John Kohl <jtk@kolvir.arlington.ma.us>
List: netbsd-bugs
Date: 02/10/1996 22:51:05
>Number: 2065
>Category: port-i386
>Synopsis: i386 calls tss_free() in a context with no curproc, leading to faults
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: gnats-admin (GNATS administrator)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Feb 10 23:50:03 1996
>Last-Modified:
>Originator: John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release: 1.1
>Environment:
System: NetBSD pattern 1.1A NetBSD 1.1A (PATTERN) #16: Sat Feb 10 18:24:41 EST 1996 jtk@pattern:/u3/NetBSD-current/src/sys/arch/i386/compile/PATTERN i386
>Description:
When a process exits on the i386, the kernel can end up in tsleep()
with curproc == 0 and fault.
An example stack trace follows. Ignore the _lock_clear_recursive frames;
they are red herrings, and those frames are somewhere inside the swap pager:
_tsleep(f8d35e50,4,f8188229,0) at _tsleep+0x1e
_lock_clear_recursive(f8847780,f8231c9c,1,100000,1) at _lock_clear_recursive+0x97b
_lock_clear_recursive(f8843140,f8231c9c,1,1,f82cb318) at _lock_clear_recursive+0x490
_vm_pager_get_pages(f8843140,f8231c9c,1,1,f8231cec) at _vm_pager_get_pages+0x4a
_vm_pager_get(f8843140,f82cb318,1) at _vm_pager_get+0x14
_vm_fault(f82ab000,f9dee000,7,1) at _vm_fault+0x226
_vm_fault_wire(f82ab000,f9dee000,f9df0000,19,f87ef900) at _vm_fault_wire+0x35
_vm_map_pageable(f82ab000,f9dee000,f9df0000,0) at _vm_map_pageable+0x285
_swapin(f87ef900) at _swapin+0x23
_gdt_compact(f81f6850,f8231d90,f8192e0f,20,f8231dd0) at _gdt_compact+0x3b
_gdt_put_slot(20,f8231dd0,f8100a6f,f9dd3000,ffffffff) at _gdt_put_slot+0x6a
_tss_free(f9dd3000,ffffffff,0,f820354c,0) at _tss_free+0x17
_switch_exit(f8d3621c,0,f883f770,0,f8231e38) at _switch_exit+0x57
bpendtsleep(f8d3621c,4,f8188229,0) at bpendtsleep
_lock_clear_recursive(f8847780,f8231ea4,1,100000,1) at _lock_clear_recursive+0x97b
_lock_clear_recursive(f8843140,f8231ea4,1,1,f82ea920) at _lock_clear_recursive+0x490
_vm_pager_get_pages(f8843140,f8231ea4,1,1,f8231ef4) at _vm_pager_get_pages+0x4a
_vm_pager_get(f8843140,f82ea920,1) at _vm_pager_get+0x14
_vm_fault(f82ab000,f9dc0000,7,1) at _vm_fault+0x226
_vm_fault_wire(f82ab000,f9dc0000,f9dc2000,3b37,f87af800) at _vm_fault_wire+0x35
_vm_map_pageable(f82ab000,f9dc0000,f9dc2000,0) at _vm_map_pageable+0x285
_swapin(f87af800) at _swapin+0x23
_scheduler(f8777b00,f810f198,22ffb0,22f000,23e000) at _scheduler+0x85
_main(0,0,0,0,0) at _main+0x4e5
What has happened is that we are in switch_exit(), trying to free the
exiting process's resources. locore.s shows that switch_exit() leaves
curproc == 0 until it is done cleaning up, then jumps to switch_search(),
which selects a new process or goes into the idle loop.
However, while switch_exit() is cleaning up, there is a code path (seen in
the trace above: tss_free() -> gdt_put_slot() -> gdt_compact() ->
swapin() -> ... -> tsleep()) where curproc is expected to be non-NULL. Oops.
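To make the ordering problem concrete, here is a small user-space model of the path in the trace. It is only a sketch under stated assumptions: every function ending in `_model` is hypothetical and stands in for the kernel routine of the same prefix, the swap pager frames are collapsed into a single tsleep_model() call, and a NULL curproc is reported as -1 instead of faulting. It shows that switch_exit() clears curproc before tss_free() runs, so any page-in on the way down reaches tsleep() with no current process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real kernel code. */
struct proc {
    const char *p_comm;             /* command name, illustration only */
};

static struct proc *curproc;        /* NULL means "no current process" */

/* Model of tsleep(): it needs curproc to know who is sleeping.
 * Here a NULL curproc is reported as -1 rather than faulting. */
static int tsleep_model(void)
{
    if (curproc == NULL)
        return -1;                  /* the real kernel faults here */
    return 0;                       /* pretend the page-in completed */
}

/* In the trace, swapin() wires pages via vm_map_pageable() and
 * vm_fault_wire(), and the swap pager sleeps waiting for the I/O. */
static int swapin_model(bool uarea_swapped_out)
{
    return uarea_swapped_out ? tsleep_model() : 0;
}

/* In the trace, gdt_compact() calls swapin(). */
static int gdt_compact_model(bool uarea_swapped_out)
{
    return swapin_model(uarea_swapped_out);
}

static int gdt_put_slot_model(bool uarea_swapped_out)
{
    return gdt_compact_model(uarea_swapped_out);
}

static int tss_free_model(bool uarea_swapped_out)
{
    return gdt_put_slot_model(uarea_swapped_out);
}

/* Ordering as described for locore.s: curproc is cleared first and
 * stays NULL while the old process's resources are released. */
static int switch_exit_model(bool uarea_swapped_out)
{
    curproc = NULL;
    int error = tss_free_model(uarea_swapped_out);
    assert(curproc == NULL);        /* still NULL until switch_search() */
    return error;
}
```

With nothing swapped out, switch_exit_model(false) returns 0; once a page-in is needed, switch_exit_model(true) returns -1, which corresponds to the fault seen in the report.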
>How-To-Repeat:
Run a system for a while, build some stuff, and cause some process to
swap out. Then shut down to reboot and you'll likely hit this bug.
>Fix:
I'm not sure this is quite the right thing, but it seems like we need to
get curproc set to something acceptable before calling tss_free. I
haven't tested this yet (I wanted to file the bug so that other folks
will be aware and might figure out how to fix it), but my first guess
is:
===================================================================
RCS file: RCS/locore.s,v
retrieving revision 1.38
diff -c -r1.38 locore.s
*** locore.s 1996/02/09 04:21:54 1.38
--- locore.s 1996/02/11 03:50:31
***************
*** 1860,1865 ****
--- 1860,1868 ----
/* Record new pcb. */
movl %esi,_curpcb
+
+ /* Record _curproc as proc0, so we can call tss_free(). */
+ movl %ebx,_curproc
/* Interrupts are okay again. */
sti
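The proposed patch, restated as a C-level sketch of the same simplified model: after proc0's pcb is installed, curproc is pointed at proc0 (the `movl %ebx,_curproc` line, assuming %ebx holds proc0 at that point as the patch comment states) before tss_free() is called. All `_model` names are hypothetical, the deep call chain is collapsed into one tsleep_model() call, and what happens to curproc after cleanup is not modeled. Whether sleeping as proc0 at this point is actually safe is exactly the open question the report raises; this only shows that tsleep() would no longer see a NULL curproc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real kernel code. */
struct proc {
    const char *p_comm;             /* command name, illustration only */
};

static struct proc proc0 = { "swapper" };
static struct proc *curproc;        /* NULL means "no current process" */

/* Model of tsleep(): a NULL curproc is reported as -1 (the fault). */
static int tsleep_model(void)
{
    if (curproc == NULL)
        return -1;
    return 0;                       /* can sleep on behalf of curproc */
}

/* Stands in for tss_free() -> gdt_put_slot() -> gdt_compact()
 * -> swapin() -> ... -> tsleep() from the trace. */
static int tss_free_model(bool must_page_in)
{
    return must_page_in ? tsleep_model() : 0;
}

/* switch_exit() with the proposed change applied. */
static int switch_exit_fixed_model(bool must_page_in)
{
    curproc = NULL;                 /* cleared while proc0's context loads */
    /* ... proc0's pcb is recorded as curpcb here ... */
    curproc = &proc0;               /* the added "movl %ebx,_curproc" */
    assert(curproc != NULL);        /* tss_free() may now reach tsleep() */
    return tss_free_model(must_page_in);
}
```

Here switch_exit_fixed_model(true) returns 0 rather than -1, because tsleep_model() finds proc0 as the current process.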
>Audit-Trail:
>Unformatted: