NetBSD-Users archive


Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system



On Sun, Jan 15, 2012 at 10:08:42PM -0800, Greg A. Woods wrote:
> So I was finally able to get a new server, and it's a nice big Dell
> PE2950 with 32GB RAM, lots-o-disk on a PERC-6/i, and a pair of zippy
> Intel Xeon E5440 CPUs (quad-cores x2).

At a guess, look for a locking bug in the PERC driver.  Aside from
that, this is a pretty ordinary system, much like the ones others
run with LOCKDEBUG all the time.

Usually the SPL NOT LOWERED business means a missing unlock.  It could
also mean sleeping with a lock held in such a way that, with a
preemptible kernel, you can return to userspace without releasing the
lock.
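
To make the missing-unlock case concrete, here is a minimal userland
sketch.  It is a toy model, not NetBSD kernel code: every toy_* name
is hypothetical, and the only assumption is that taking a spin lock
raises the interrupt priority level and releasing it restores the
previous level.  An error path that returns without unlocking leaves
the level raised, which is what a check at syscall exit would catch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum { TOY_IPL_NONE = 0, TOY_IPL_VM = 4 };

struct toy_cpu  { int ilevel; };                 /* current priority level */
struct toy_spin { bool held; int saved_ilevel; };

/* Taking the lock raises the level and remembers the old one. */
static void toy_spin_enter(struct toy_cpu *c, struct toy_spin *l, int ipl)
{
    assert(!l->held);
    l->saved_ilevel = c->ilevel;
    if (c->ilevel < ipl)
        c->ilevel = ipl;
    l->held = true;
}

/* Releasing the lock restores the level saved at enter time. */
static void toy_spin_exit(struct toy_cpu *c, struct toy_spin *l)
{
    assert(l->held);
    l->held = false;
    c->ilevel = l->saved_ilevel;
}

/* Buggy path: the error branch returns with the lock still held. */
static int toy_driver_buggy(struct toy_cpu *c, struct toy_spin *l, bool fail)
{
    toy_spin_enter(c, l, TOY_IPL_VM);
    if (fail)
        return -1;              /* BUG: missing toy_spin_exit() */
    toy_spin_exit(c, l);
    return 0;
}

/* Fixed path: a single exit point always releases the lock. */
static int toy_driver_fixed(struct toy_cpu *c, struct toy_spin *l, bool fail)
{
    int error = 0;

    toy_spin_enter(c, l, TOY_IPL_VM);
    if (fail)
        error = -1;
    toy_spin_exit(c, l);
    return error;
}

/* Stand-in for the check made on the way back to userspace. */
static bool toy_syscall_exit_ok(const struct toy_cpu *c)
{
    if (c->ilevel != TOY_IPL_NONE) {
        printf("WARNING: SPL NOT LOWERED ON SYSCALL EXIT\n");
        return false;
    }
    return true;
}
```

Calling toy_driver_buggy() with fail set and then toy_syscall_exit_ok()
prints the warning; the fixed variant does not.  In a real driver the
thing to audit is the same: every early return between a lock enter
and its matching exit.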

Thor
> 
> It's the first real hardware I've tried to do anything serious with
> netbsd-5 on -- previously I'd only run netbsd-5 in VirtualBox (though
> with two CPUs, on my iMac).
> 
> Everything looked OK during initial install of NetBSD-5, but during the
> first real load test (build.sh -j 4) it crashed:  PR# 45827.
> 
> I've since seen this a bunch more times.  The machine is basically
> useless because it seems to crash almost immediately under any decent
> load.
> 
> Most recently I've been trying to use it to run sysinst to install to a
> CF card that's connected via a USB reader.  The first attempt ended in
> the middle of unpacking the man.tgz set with a similar crash to PR#
> 45827, and the second attempt ended in the middle of comp.tgz with:
> 
> 
> panic: WARNING: SPL NOT LOWERED ON SYSCALL EXIT
> LOCKDEBUGWARNING: SPL NOT LOWERED ON SYSCALL EXIT
> 
> WARNING: SPL NOT LOWERED ON TRAP EXIT
> fatal breakpoint trapWARNING: SPL NOT LOWERED ON TRAP EXIT
>  in supervisor mode
> WARNING: SPL NOT LOWERED ON SYSCALL EXIT
> trap type 1 code 0 eip c05cc4ac cs 8 eflags 246 cr2 bbb90000 ilevel 0
> Stopped in pid 2751.1 (systat) at       netbsd:breakpoint+0x4:  popl    %ebp
> db{4}> trace
> breakpoint(c0bfe3da,dcc4bac8,c3398800,c04fcb9f,0,1,0,0,dcc4bac8,2) at 
> netbsd:breakpoint+0x4
> panic(c0b907a0,c0b8d6c7,c093658b,c0b90800,c04e06b2,1c62a60,0,8,1,c0d52cc0) at 
> netbsd:panic+0x1b0
> lockdebug_abort1(c0b90800,1,1,c0d5c120,0,0,c0d5c120,c3ba8db8,68,7fffffff) at 
> netbsd:lockdebug_abort1+0xbb
> rw_vector_exit(c0d52cc0,68,dcc4bc0c,c0534005,0,bbb909e0,68,1,c053eb38,dcc62a60)
>  at netbsd:rw_vector_exit+0xc8
> sysctl_unlock(0,bbb909e0,68,1,c053eb38,dcc62a60,0,18,1,dc81d7d0) at 
> netbsd:sysctl_unlock+0x12
> sysctl_dobuf(dcc4bca4,4,bbb55000,dcc4bccc,0,0,dcc4bc9c,dcc62a60,c336c980,0) 
> at netbsd:sysctl_dobuf+0xc5
> sysctl_dispatch(dcc4bc9c,6,bbb55000,dcc4bccc,0,0,dcc4bc9c,dcc62a60,c336c980,0)
>  at netbsd:sysctl_dispatch+0xcf
> sys___sysctl(dcc62a60,dcc4bd00,dcc4bd28,dcc4bd40,c05b8f02,c0d7dc20,ca,bfbfdf2c,6,bbb55000)
>  at netbsd:sys___sysctl+0xd6
> syscall(dcc4bd48,b3,ab,1f,1f,bfbfdf2c,bbb55000,bfbfde88,0,6) at 
> netbsd:syscall+0x100
> db{4}> 
> 
> 
> 
> I've also got what seems to be a 100% repeatable panic (cpu_switchto:
> switching above IPL_SCHED) happening if I try to power the system down
> with "halt -p".  I don't know if that is in any way related or not.
> 
> 
> So, I've been wondering, is anyone else running netbsd-5 on a many-core
> system with a LOCKDEBUG + DIAGNOSTICS + DEBUG kernel?
> 
> 
> -- 
>                                               Greg A. Woods
> 
> +1 250 762-7675                                RoboHack 
> <woods%robohack.ca@localhost>
> Planix, Inc. <woods%planix.com@localhost>      Secrets of the Weird 
> <woods%weird.com@localhost>



-- 
Thor Lancelot Simon                                    tls%panix.com@localhost
  "All of my opinions are consistent, but I cannot present them all
   at once."    -Jean-Jacques Rousseau, On The Social Contract

