Port-sparc64 archive


Re: sparc64 5.1_RC1 SMP crashes



It's probably not an SMP issue per se, but has to do with the total number 
of running processes.

UltraSPARC processors have 4K context IDs, one of which is required for 
the pmap of each active process (well, 4095 actually since 0 is reserved 
for the kernel).  When you have more than 4K processes, and the kernel 
wants to execute a new process, a context ID must be stolen from another 
process and that context flushed from the MMU.  The code that does the 
context stealing has gotten much more complicated over the years and 
now seems to be tickling some sort of bug.
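
As a rough illustration of the mechanism described above, here is a toy context-ID allocator in C. It is only a sketch and not the NetBSD pmap.c code: every name (toy_pmap, toy_ctx_alloc, toy_flush_ctx, NCTX) is hypothetical, the table is shrunk to 8 entries, and the round-robin choice of victim is an assumption made for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define NCTX       8   /* toy size; the post describes 4K IDs */
#define KERNEL_CTX 0   /* context 0 is reserved for the kernel */

/* Hypothetical stand-in for a per-process pmap. */
struct toy_pmap {
    int ctx;           /* KERNEL_CTX means "no user context assigned" */
};

static struct toy_pmap *ctx_owner[NCTX]; /* ctx_owner[id] = pmap holding id */
static int next_victim = 1;              /* round-robin steal cursor (assumed policy) */
static int flush_count;                  /* number of simulated MMU flushes */

/* Stand-in for flushing one context's translations from the MMU. */
static void toy_flush_ctx(int ctx)
{
    (void)ctx;
    flush_count++;
}

/* Give pm a context ID, stealing one if every user ID is in use. */
static int toy_ctx_alloc(struct toy_pmap *pm)
{
    int id;

    if (pm->ctx != KERNEL_CTX)
        return pm->ctx;                  /* already has a context */

    /* First try to find a free ID (1 .. NCTX-1). */
    for (id = 1; id < NCTX; id++) {
        if (ctx_owner[id] == NULL) {
            ctx_owner[id] = pm;
            pm->ctx = id;
            return id;
        }
    }

    /* All IDs taken: steal one from another pmap. */
    id = next_victim;
    next_victim = (next_victim % (NCTX - 1)) + 1;

    /* The victim must really own a non-kernel context. */
    assert(ctx_owner[id] != NULL && ctx_owner[id]->ctx == id);

    toy_flush_ctx(id);                   /* drop the victim's MMU entries */
    ctx_owner[id]->ctx = KERNEL_CTX;     /* victim must re-allocate later */
    ctx_owner[id] = pm;
    pm->ctx = id;
    return id;
}
```

The assert in the steal path plays a role loosely similar to the diagnostic assertion in the panic quoted below: a pmap that is about to lose its context should never already be at context 0.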

Eduardo


On Sat, 15 May 2010, Chris Ross wrote:

>  Further research shows that this is due to running multiple compilations
> simultaneously, so it's an SMP issue.  If I run a pkgsrc build without setting
> MAKE_JOBS (to 3 in my use case), it seems to build just fine.  But, at one
> point I tried running a separate build of a separate package at the same time,
> and that *also* caused this sort of crash.
> 
>  The crashes noted below, I apologize for not mentioning at the time, were
> from single pkgsrc package builds, but with MAKE_JOBS set (to 3), and within a
> package that honors that setting and so runs the build with the intended -j.
> 
>                                   - Chris
> 
> On May 13, 2010, at 01:09, Chris Ross wrote:
> > Twice this evening, while building packages from pkgsrc, I've had the
> > following two panics (at the same place) on my Quad-processor E420R, which
> > is using RAIDframe to RAID1 its two SCSI disk drives:
> > 
> > This is a custom kernel, but I do have a netbsd.gdb to run against it.  I
> > think my /var/crash is full, so I'll have to fix that when it comes back up,
> > but.
> > 
> > Anyone have any suggestions as to how to diagnose this one?
> > 
> > Thanks...
> > 
> >                                    - Chris
> > 
> > 
> > panic: kernel diagnostic assertion
> > "pmap_ctx(LIST_FIRST(&curcpu()->ci_pmap_ctxlist)) != 0" failed: file
> > "/data/NetBSD/src-5/sys/arch/sparc64/sparc64/pmap.c", line 3168
> > Begin traceback...
> > End traceback...
> > Frame pointer is at 0x11dbf181
> > Call traceback:
> > 1319340(11, 5, 0, 0, 1846800, 0, 11dbf251) fp = 11dbf251
> > 1243a40(104, 0, ffff, 150f511, 1243780, 0, 11dbf311) fp = 11dbf311
> > 1382fcc(1519100, 14a6540, 150e1f0, 150d880, c60, 104, 11dbf3e1) fp = 11dbf3e1
> > 131f150(14a6540, 150d880, c60, 150e1f0, 4093e750, 800, 11dbf4a1) fp = 11dbf4a1
> > 120ee04(11dd5700, 0, 0, 6, badcafe, badcafe, 11dbf561) fp = 11dbf561
> > 100ab60(e0018000, 1210f420, 0, 18b5a80, badcafe, 11eb3bd0, 11dbf621) fp = 11dbf621
> > 409bfb10(0, badcafe, badcafe, badcafe, badcafe, badcafe, ffffffffffff8b11) fp = ffffffffffff8b11
> > 
> > 
> > panic: kernel diagnostic assertion
> > "pmap_ctx(LIST_FIRST(&curcpu()->ci_pmap_ctxlist)) != 0" failed: file
> > "/data/NetBSD/src-5/sys/arch/sparc64/sparc64/pmap.c", line 3168
> > Begin traceback...
> > End traceback...
> > Frame pointer is at 0x1e2a2cd1
> > Call traceback:
> > 1319340(11, 5, 0, 0, 1846800, 0, 1e2a2da1) fp = 1e2a2da1
> > 1243a40(104, 0, ffff, 150f511, 1243780, 0, 1e2a2e61) fp = 1e2a2e61
> > 1382fcc(1519100, 14a6540, 150e1f0, 150d880, c60, 104, 1e2a2f31) fp = 1e2a2f31
> > 131f150(14a6540, 150d880, c60, 150e1f0, 1e2a3850, 6376600, 1e2a2ff1) fp = 1e2a2ff1
> > 11c9ae8(11daf640, ffffffffffffffff, 1e25f800, 404, 1814800, 100040, 1e2a30b1) fp = 1e2a30b1
> > 1205b7c(1200ca30, 0, ffffffffffffffff, 1e2a0000, 0, 112512a2, 1e2a3171) fp = 1e2a3171
> > 1323e4c(c, 0, 64, 11250000, 1205360, 4, 1e2a3511) fp = 1e2a3511
> > 10092d4(1e2a3ed0, 1e2a3f58, 4053ede0, 0, 4053ede0, 800, 1e2a3621) fp = 1e2a3621
> > 10fa20(40818280, 4080fc00, 40810400, 1, 0, 4f4e53, ffffffffffff8791) fp = ffffffffffff8791
> 

