Subject: cpu1: spinout - MP kernel panic in sys/arch/x86/x86/lock_machdep.c
To: None <current-users@netbsd.org>
From: Brian A. Seklecki <lavalamp@spiritual-machines.org>
List: current-users
Date: 03/03/2004 01:26:19
All:

I've got a pretty heavily loaded box acting as a 'network appliance'.
I can crash it on a pretty consistent basis by performing just about any
operation in pkgsrc; actually anything with a lot of disk I/O.  I'm not
quite sure what to make of the error in order to begin debugging the
problem.


At line 92 of sys/arch/x86/x86/lock_machdep.c (revision 1.3) there's a
preprocessor conditional:


#if defined(DEBUG) && defined(DDB)
                spincount++;
                if (spincount == limit) {
                        extern int db_active;
                        spincount = 0;

                        if (db_active) {
                                db_printf("cpu%d: spinout while in debugger\n",
                                    cpu);
                                while (db_active)
                                        ;
                        }
                        db_printf("cpu%d: spinout\n", cpu);
                        Debugger();
                }
#endif
        }


...this is a very strange error.  As I read it, the spinout check is only
compiled in when both DEBUG and DDB are defined, so the panic could only
occur on a kernel built with debugging support.  I've never heard of such
a setup: extremely large amounts of incomprehensible debugging output to
the console, yes, but panic(), no.
Additionally, int spin_limit is defined with a fixed value earlier in the
code:

#if defined (DEBUG) && defined(DDB)
int spin_limit = 10000000;
__cpu_simple_lock_t *wantlock[X86_MAXPROCS], *gotlock[X86_MAXPROCS];
#endif
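
For readers who want to see the mechanism in isolation, the following is a
rough userland sketch of a spin lock with a spin-count limit, loosely
modelled on the excerpt above.  It is not the kernel code: the names
sketch_lock_t, sketch_lock(), sketch_unlock(), sketch_spin_limit and
sketch_spinouts are all made up for illustration, C11 atomics stand in for
the machine-dependent lock primitive, and a message on stderr stands in for
db_printf() and Debugger().  Unlike the kernel loop, this version can give
up after a caller-supplied number of spinout reports so that it terminates.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical userland stand-in for __cpu_simple_lock_t. */
typedef struct {
	atomic_flag held;
} sketch_lock_t;

/* Hypothetical limit; the kernel excerpt uses spin_limit = 10000000. */
static const unsigned long sketch_spin_limit = 1000;

/* Number of times the limit was reached (stands in for entering DDB). */
static unsigned long sketch_spinouts;

/*
 * Spin until the lock is acquired.  Every sketch_spin_limit failed
 * attempts, report a "spinout" and reset the counter, as the excerpt
 * does.  Assumption for this sketch only: give up and return 0 after
 * max_spinouts reports; return 1 once the lock is held.
 */
static int
sketch_lock(sketch_lock_t *l, unsigned long max_spinouts)
{
	unsigned long spincount = 0, reported = 0;

	while (atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire)) {
		if (++spincount == sketch_spin_limit) {
			spincount = 0;
			sketch_spinouts++;
			fprintf(stderr, "sketch: spinout\n");
			if (++reported == max_spinouts)
				return 0;
		}
	}
	return 1;
}

/* Release the lock so that another spinner can take it. */
static void
sketch_unlock(sketch_lock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The point of the sketch is that the counter only reports that some other
holder has kept the lock for a very long time; it says nothing about who
holds the lock or why, which is presumably what the wantlock[] and
gotlock[] arrays in the real code are there to help with.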


Anyway, I've provided a backtrace from both CPUs below.  My
kernel config for this box is attached.  Any input on these crashes
would be appreciated.  Once -current stabilizes, I'm going to try to
re-roll this custom kernel without debugging support.


-Brian

===> databases/ruby-mysql
cpu1: spinout
Stopped in pid 7713.1 (sh) at   netbsd:breakpoint+0x4:  leave
db{1}>
db{1}>
db{1}>

db{1}> machine cpu 0
using cpu 0
db{1}>
db{1}> bt
acquire(c053d420,c66e3d40,400040,0,600) at netbsd:acquire+0x7c
_lockmgr(c053d420,400042,0,c0495860,309) at netbsd:_lockmgr+0x8a2
x86_intlock(c66e3da0,6,c02c0010,30,c0510010) at netbsd:x86_intlock+0x2c
Bad frame pointer: 0xc0d52c00
db{1}>
db{1}> machine cpu 1

using cpu 1
db{1}> bt
breakpoint(c0495d22,1,c891fe3c,c0367758,c0518f28) at netbsd:breakpoint+0x4
cpu_Debugger(c0518f28,c0ca8000,1,1312d00,1) at netbsd:cpu_Debugger+0xb
__cpu_simple_lock(c051a964,c0ca8000,c891fe5c,c02c5073,0) at netbsd:__cpu_simple_lock+0xce
_simple_lock(c051a964,c0491a7f,152,29,c0caae00) at netbsd:_simple_lock+0x133
uvm_uarea_alloc(c891febc,1,c891feac,c02c56e0,d) at netbsd:uvm_uarea_alloc+0x1d
fork1(c7976be0,0,14,0,0) at netbsd:fork1+0x16a
sys_fork(c7976be0,c891ff64,c891ff5c,c03823fd,6) at netbsd:sys_fork+0x22
syscall_plain(c891ffa8,1f,1f,1f,1f) at netbsd:syscall_plain+0x12a
db{1}> sync
syncing disks...