Port-arm archive


hangs with awge(4) on pine64 rock64 board



hi folks..


i've been debugging a hang on the rock64.  it's fairly easy to
trigger -- send a lot of data at it.

from ddb i would usually see one cpu with an lwp, usually the
idle lwp, that had fast-switched to the softnet lwp and then
fast-switched again to the softser lwp.  it seemed to be a
kernel lock issue as the kernel lock was held and at least one
thread was waiting for it.  i couldn't really tell what was up.

i tried enabling NET_MPSAFE (which changes the behaviour of
awge(4) / dwc_gmac.c as well as the network stack).  that kernel
ran for a lot longer, but ended up locking up again, this time
with rt_lock being waited upon.  but again, i couldn't find
where it was held or what context should be giving it up, though
i did again think about arm's pic_dispatch() being the last
lock and unlock of kernel_lock.  then i realised that even with
NET_MPSAFE, awge(4)'s frontends don't set up MPSAFE interrupts.
with a kernel patched to do that under NET_MPSAFE i've had over
5 hours of heavy network access without a hang.

i don't know what the underlying issue is here.  it could be
some network stack bug, it could be an awge/gmac bug, it could
be an arm or arm64 bug..

anyone have a clue where to investigate next?  alternatively,
how far off is NET_MPSAFE being the default? :)


.mrg.

