Subject: Re: 1.5S vs sparc/MP
To: None <pk@cs.few.eur.nl>
From: Simon J. Gerraty <sjg@quick.com.au>
List: tech-smp
Date: 03/07/2001 01:17:12
> I fixed a few things last week-end that might help a bit against those
> watchdogs. Also, the kernel lock is now acquired when a process enters
> kernel mode (where it matters, I hope). So, modulo cache flushe issues,
> a MP kernel should run again without violating the locking protocol.

I've had mixed results today.  No watchdog resets, but without any of
the printfs in my semaphore routines, I again get a "lockmgr: no
context" panic and the machine locks up solid; I cannot get to ddb.
I'm wondering if the semaphores are helping at all...

Matt Green booted a kernel with the semaphore stuff on his SS20 with
dual SuperSPARCs (which doesn't need them, so it is a good test to see
if I broke anything) and it seemed to work OK.  So I've uploaded the
actual kernel that panics and locks up on my machine to
ftp.netbsd.org:~sjg/tmp/netbsd.mp to see what it does on his.  Feel
free to try it ;-)

If I enable the printfs for, say, smp_cache_flush() only, everything
runs fine, but we eventually hang after getting to:

IPsec: Initialized Security Association Processing.
root on sd0a dumps on sd0b
root file system type: ffs
...
{0}sema_init(0xf02b1534, 0, semcflush)
{0}sema_signal(0xf02b1534,1) == 1
{0}sema_wait(0xf02b1534) == 0
{1}sema_signal(0xf02b1534,1) == 1
{0}sema_wait(0xf02b1534) == 0
{0}sema_clear(0xf02b1534) count==0, sleepers==0
[BREAK]
Stopped at      cpu_Debugger+0x4:       jmpl            [%o7 + 0x8], %g0
db{0}> ps
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
 5                  0          0          0 3 0xa0204         aiodoned semvseg
 4                  0          0          0 3 0xa0204          ioflush  syncer
 3                  0          0          0 3 0x20204           reaper  reaper
 2                  0          0          0 3 0xa0204       pagedaemon pgdaemo
 1                  0          1          0 3 0x84004             init vmmaplk
 0                 -1          0          0 3 0xa0204          swapper schedul
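
The FLAGS column in the listing above can be compared bit by bit.  A
minimal sketch in C, assuming P_BIGLOCK is the 0x80000 bit (an
assumption inferred from the listing, where it is the only bit that
separates reaper from the other kernel threads); the names
P_BIGLOCK_ASSUMED, ps_row, has_biglock() and first_without_biglock()
are hypothetical helpers, not kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: P_BIGLOCK is bit 0x80000, inferred from the ps listing. */
#define P_BIGLOCK_ASSUMED 0x80000u

/* Hypothetical record for one row of the ddb "ps" output. */
struct ps_row {
    const char *command;
    unsigned    flags;
};

/* FLAGS values copied from the ps listing. */
static const struct ps_row ps_rows[] = {
    { "aiodoned",   0xa0204 },
    { "ioflush",    0xa0204 },
    { "reaper",     0x20204 },
    { "pagedaemon", 0xa0204 },
    { "init",       0x84004 },
    { "swapper",    0xa0204 },
};

/* Non-zero if the assumed big-lock bit is set in flags. */
int has_biglock(unsigned flags)
{
    return (flags & P_BIGLOCK_ASSUMED) != 0;
}

/* Return the command of the first row lacking the bit, or NULL if none. */
const char *first_without_biglock(const struct ps_row *rows, size_t n)
{
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++)
        if (!has_biglock(rows[i].flags))
            return rows[i].command;
    return NULL;
}
```

Under that assumption, calling first_without_biglock(ps_rows, 6)
picks out reaper as the only process in the listing without the bit.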

Does this suggest that the reaper has the kernel lock?  It doesn't look
like he'd ever hold it when he goes to sleep.  All the others have
P_BIGLOCK set, though.  Interesting that aiodoned is shown sleeping on
semvsegment (truncated to "semvseg" in the WAIT column), which is the
name cache_semaphore carries when used for smp_vcache_flush_segment(),
yet the semaphore was last used by smp_cache_flush() and is clear (see
below).  So I'm not sure why aiodoned is still sleeping on semvsegment.
Because he couldn't be woken up due to P_BIGLOCK?  Hmm, should
sema_signal() be doing anything with the kernel lock?  The sleeper
should probably decrement sleepers itself when it wakes up, rather than
having sema_signal() do it when it calls wakeup_one().  I'll try that
shortly.
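
The idea of having the sleeper do its own accounting can be sketched
in userland.  This is only a rough model under stated assumptions, not
the kernel routines discussed here: it uses POSIX mutexes and condition
variables in place of ltsleep() and wakeup_one(), and the names
struct usema, usema_init(), usema_post() and usema_wait() are
hypothetical.  The return values (the count after the operation) are
likewise an assumption, chosen to resemble the "== 1" and "== 0"
values in the printf output.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical userland model of a counting semaphore. */
struct usema {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             count;     /* units posted but not yet consumed */
    int             sleepers;  /* threads currently blocked in usema_wait() */
    const char     *name;      /* wait message, e.g. "semcflush" */
};

void usema_init(struct usema *s, int count, const char *name)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->count = count;
    s->sleepers = 0;
    s->name = name;
}

/* Post one unit and wake at most one sleeper; returns the new count.
 * Note that sleepers is deliberately NOT modified here. */
int usema_post(struct usema *s)
{
    pthread_mutex_lock(&s->lock);
    int c = ++s->count;
    if (s->sleepers > 0)
        pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->lock);
    return c;
}

/* Consume one unit, blocking while none is available; returns the new count. */
int usema_wait(struct usema *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0) {
        s->sleepers++;
        pthread_cond_wait(&s->cv, &s->lock);
        s->sleepers--;              /* the woken sleeper fixes up the count */
        assert(s->sleepers >= 0);
    }
    int c = --s->count;
    pthread_mutex_unlock(&s->lock);
    return c;
}
```

Because the waiter both increments and decrements sleepers under the
same lock, the field only reaches zero once every waiter has actually
run again, so a sleeper that was signalled but never scheduled stays
visible in the count.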

db{0}> x/x cache_semaphore
cache_semaphore:        0
db{0}> 
cache_semaphore+0x4:    f02371c0
db{0}> 
cache_semaphore+0x8:    0
db{0}> 
cache_semaphore+0xc:    0
db{0}> 
cache_semaphore+0x10:   0
db{0}> 
cachestats:     88
db{0}> x/s f02371c0
openboot_special4m.194+0x498:   semcflush
db{0}> trace
zsc_intr_hard(0x8, 0xf0600ed0, 0xf0254800, 0xfe000000, 0x809c4000, 0xa00) at zsc_intr_hard+0x68
zshard(0x0, 0xf01a514c, 0x0, 0xf00, 0xf0002000, 0xf00) at zshard+0x40
sparc_interrupt44c(0x1e9000e5, 0xf0293c00, 0xfe000004, 0x0, 0xf0002000, 0xf0002000) at sparc_interrupt44c+0x120
mi_switch(0xf605d588, 0x80, 0xf606b220, 0xf605d588, 0xf0257a40, 0x3) at mi_switch+0x1cc
ltsleep(0x0, 0x28, 0xf02122c8, 0x64, 0x0, 0xf0269800) at ltsleep+0x2b0
sched_sync(0xf0258400, 0xf0258400, 0xf0254400, 0xf0212000, 0xf0259800, 0xf0263400) at sched_sync+0x210
proc_trampoline(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) at proc_trampoline+0x8
db{0}> 

At least we got as far as exec'ing init (sort of ;-)

Anything jump out at you?

--sjg