tech-net archive


Re: hangs with awge(4) on pine64 rock64 board



On Sun, Jul 21, 2019 at 03:17:18PM +1000, matthew green wrote:
> hi folks..
> 
> 
> i've been debugging a hang on the rock64.  it's fairly easy to
> trigger -- send a lot of data at it.
> 
> from ddb i would usually see one cpu with an lwp, usually the
> idle lwp, fast lwp switched to softnet, and again fast switched
> to the softser lwp.  it seemed to be a kernel lock issue as the
> kernel lock was held and at least one thread was waiting for
> it.  i couldn't really tell what was up.
> 
> i tried enabling NET_MPSAFE (which changes the behaviour of
> awge(4) / dwc_gmac.c, beyond the network stack.)  that kernel
> ran for a lot longer, but ended up locking up again, this time
> the rt_lock was being waited upon.  but again, i couldn't find
> where it was held or what context should be giving it up, though
> i did again think about arm's pic_dispatch() being the last
> lock and unlock of kernel_lock.  then i realised that even with
> NET_MPSAFE, awge(4)'s frontends don't set up MPSAFE interrupts.
> with a kernel patched to do that under NET_MPSAFE i've had over
> 5 hours of heavy network access without a hang.
> 
> i don't know what is the underlying issue here.  it could be
> some network stack bug, it could be an awge/gmac bug, it could
> be an arm or arm64 bug.. 
> 
> anyone have a clue where to investigate next?  alternatively,
> how far off is NET_MPSAFE default? :)

It looks like something I fixed some time ago in the arm pmap:
http://mail-index.netbsd.org/source-changes/2019/04/23/msg105355.html

Maybe arm64 has a similar issue.


-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference

