tech-kern archive


Re: Problems with hangs under NetBSD-5.x



On Thu, Aug 06, 2009 at 04:42:13PM +0100, Mindaugas Rasiukevicius wrote:
> Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote:
> > It turns out my issue seems to be caused by a hardware bug.
> > In my case the kernel was completely dead, except that I could still
> > enter ddb from the serial console. Disabling hyperthreading seems to have
> > helped in my case.
> 
> Why do you think it is a hardware bug, i.e. do you have some way to validate
> this? The absence of problems with HT disabled can also point to a
> synchronization issue.

See my post to port-i386/port-amd64, but basically:
the diagnostic code I added shows that we call splx() with a bogus value
(which I guess is what causes various problems later, when it's not checked).
Looking in ddb at the address the value comes from, the value stored there
is correct.
This bogus value always comes from a struct kmutex, which on i386 is
32 bits wide and is read/written as bytes. The byte next to
mtxs_ipl is the simple lock used for the mutex...
mtxs_ipl itself is initialised when the lock is created and never changed
afterwards.
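The layout in question can be pictured with a rough userland sketch: a 32-bit word whose fields are accessed as single bytes, with the IPL byte directly next to a byte-wide spin lock that is taken with a locked exchange. The field names mtxs_ipl and mtxs_lock come from the message; the union shape, the padding bytes, the IPL_HIGH_SKETCH bound and the helper functions are assumptions for illustration, not a copy of the NetBSD x86 headers. ipl_is_sane() is a hypothetical stand-in for the kind of diagnostic check that flags a bogus level before it reaches splx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound of the IPL range (hypothetical; IPLs assumed to start at 0). */
#define IPL_HIGH_SKETCH 8

/*
 * Approximation of an i386 spin mutex: one 32-bit word whose
 * fields are read and written as individual bytes.
 */
struct kmutex_sketch {
	union {
		uint32_t mtxa_owner;          /* whole-word view */
		struct {
			uint8_t mtxs_dummy;   /* padding (assumed) */
			uint8_t mtxs_ipl;     /* set once at init, never changed */
			uint8_t mtxs_lock;    /* byte-wide spin lock, next to mtxs_ipl */
			uint8_t mtxs_unused;  /* padding (assumed) */
		} s;
	} u;
};

static_assert(sizeof(struct kmutex_sketch) == 4, "the mutex must be 32 bits wide");

/* The only place mtxs_ipl is ever written. */
static void
mutex_init_sketch(struct kmutex_sketch *mtx, uint8_t ipl)
{
	mtx->u.mtxa_owner = 0;
	mtx->u.s.mtxs_ipl = ipl;
}

/* Locked byte exchange on the lock byte only (GCC/Clang __atomic builtin). */
static bool
spin_trylock_sketch(struct kmutex_sketch *mtx)
{
	return __atomic_exchange_n(&mtx->u.s.mtxs_lock, 1, __ATOMIC_ACQUIRE) == 0;
}

static void
spin_unlock_sketch(struct kmutex_sketch *mtx)
{
	__atomic_store_n(&mtx->u.s.mtxs_lock, 0, __ATOMIC_RELEASE);
}

/* Hypothetical sanity check on a level about to be handed to splx(). */
static bool
ipl_is_sane(uint8_t ipl)
{
	return ipl <= IPL_HIGH_SKETCH;
}
```

Nothing in this code writes mtxs_ipl after mutex_init_sketch(), so on a correct CPU the exchange on mtxs_lock should never change it. The bogus values seen here suggest that a read of that byte returned something else, even though the memory inspected from ddb was correct.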

> 
> Also, if it's a CPU issue (e.g. one that requires a special patch/workaround
> on the software side), then issues would be seen in random subsystems,

In my case, I suspect that locked byte operations can affect the values around
the byte on the other HT in a non-trivial way.

> while
> the problem is currently isolated in the VFS/FFS layers.

No, in my case it was a complete hang of the system: dead VFS, dead
network, dead soft interrupts (including softclock) on CPU0.
hardclock and serial were still working.
So we may be talking about different issues here.
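The suspicion that a locked byte operation on one hyperthread can disturb neighbouring bytes as seen from the sibling could be probed from userland with something like the following hypothetical harness (not code from the thread). One thread repeatedly takes and releases a byte-wide lock with a locked exchange (GCC/Clang __atomic builtins), while another keeps reading the adjacent "IPL" byte, which is never written after initialisation, and counts every value that differs. The struct and function names are made up; on sound hardware the count should always be 0. For the probe to say anything about HT specifically, the two threads would have to be pinned to sibling hyperthreads of the same core through the platform's affinity interface, which this sketch leaves out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical 32-bit word: an IPL byte right next to a byte-wide lock. */
struct word_sketch {
	_Alignas(4) uint8_t dummy;
	uint8_t ipl;     /* written once before the threads start, then only read */
	uint8_t lock;    /* hammered with locked byte exchanges */
	uint8_t unused;
};

static_assert(sizeof(struct word_sketch) == 4, "the word must be 32 bits wide");

struct probe_args {
	struct word_sketch *w;
	long iterations;
	uint8_t expected_ipl;
	long mismatches;     /* filled in by the watcher thread */
};

/* Takes and releases the lock byte with locked operations. */
static void *
locker(void *arg)
{
	struct probe_args *pa = arg;

	for (long i = 0; i < pa->iterations; i++) {
		while (__atomic_exchange_n(&pa->w->lock, 1, __ATOMIC_ACQUIRE) != 0)
			;	/* spin until acquired */
		__atomic_store_n(&pa->w->lock, 0, __ATOMIC_RELEASE);
	}
	return NULL;
}

/* Counts reads of the neighbouring IPL byte that do not match its initial value. */
static void *
watcher(void *arg)
{
	struct probe_args *pa = arg;
	long bad = 0;

	for (long i = 0; i < pa->iterations; i++) {
		if (__atomic_load_n(&pa->w->ipl, __ATOMIC_RELAXED) != pa->expected_ipl)
			bad++;
	}
	pa->mismatches = bad;
	return NULL;
}

/* Runs both threads once; returns the number of corrupted reads, or -1 on error. */
static long
probe_adjacent_byte(long iterations, uint8_t ipl)
{
	struct word_sketch w = { .ipl = ipl };
	struct probe_args pa = { .w = &w, .iterations = iterations, .expected_ipl = ipl };
	pthread_t t1, t2;

	if (pthread_create(&t1, NULL, locker, &pa) != 0)
		return -1;
	if (pthread_create(&t2, NULL, watcher, &pa) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return pa.mismatches;
}
```

A non-zero result from probe_adjacent_byte() on otherwise healthy hardware would support the hardware theory; a consistent 0 with the threads pinned to HT siblings would point back towards a software synchronization issue.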

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

