Subject: Re: interesting sonic results...
To: Mark Abene <phiber@radicalmedia.com>
From: Wayne Knowles <wdk@frc.niwa.cri.nz>
List: port-arc
Date: 02/16/2001 08:48:14
On Thu, 15 Feb 2001, Mark Abene wrote:

> I've enabled "ethdebug" in the sonic driver, and was surprised to see that
> even though userland was frozen, I could still see incoming packets hitting
> the sonic driver if I pinged the Magnum.  Of course there were no echo replies
> being sent by the Magnum, so the plot thickens.  Also, I saw periodically...
> 
> sonic: receive descriptors exhausted
> sonic: receive buffers exhausted
> 
> ...which I normally don't see without ethdebug enabled when pinging the hung
> machine.  Not sure why.  But I'm still leaning towards a possible DMA
> corruption problem.  I can think of no other reason why the kernel's IP stack
> would no longer reply to echo requests after userland hangs.  This has really
> turned out to be one hell of a puzzle!

Mark,

When a packet is received, the ethernet driver places it into a FIFO queue
and schedules a software interrupt via schednetisr().

Once the kernel is in a safe state the software interrupt will be
processed.  By the sounds of it, either the packet isn't getting queued for
processing or the softnet interrupt isn't getting delivered because the
kernel is tied up doing other things.
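
To make the queue-then-softint flow concrete, here is a small userland toy
model in C.  It is only a sketch and not NetBSD code: every name in it
(driver_rx, raise_ipl, restore_ipl, softnet_intr, the IPL_* values and the
queue length) is made up for illustration.  It assumes a single priority
level variable and a soft interrupt that only runs once that level drops
below the softnet level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names and values here are hypothetical. */
enum { IPL_NONE = 0, IPL_SOFTNET = 1, IPL_HIGH = 7 };
enum { QLEN = 4 };

static int    pktq[QLEN];        /* FIFO of received "packets"          */
static size_t q_head, q_count;
static int    cur_ipl = IPL_NONE;
static bool   softnet_pending;   /* set by the driver, like schednetisr */
static int    replies_sent, drops;

/* Software interrupt: drain the FIFO and "reply" to each packet. */
void softnet_intr(void)
{
    softnet_pending = false;
    while (q_count > 0) {
        int pkt = pktq[q_head];
        (void)pkt;               /* a real stack would process it here  */
        q_head = (q_head + 1) % QLEN;
        q_count--;
        replies_sent++;
    }
}

/* Run the soft interrupt only if it is pending and not masked. */
void run_pending(void)
{
    if (softnet_pending && cur_ipl < IPL_SOFTNET)
        softnet_intr();
}

/* Raise the priority level; returns the old level for restore_ipl(). */
int raise_ipl(int ipl)
{
    int old = cur_ipl;
    if (ipl > cur_ipl)
        cur_ipl = ipl;
    return old;
}

/* Drop back to the saved level and deliver anything that was held off. */
void restore_ipl(int old)
{
    cur_ipl = old;
    run_pending();
}

/* Hardware receive path: queue the packet and schedule the softint. */
void driver_rx(int pkt)
{
    if (q_count == QLEN) {       /* nothing is dequeuing: queue is full */
        drops++;
        return;
    }
    pktq[(q_head + q_count) % QLEN] = pkt;
    q_count++;
    softnet_pending = true;
    run_pending();
}
```

In this model, calling driver_rx() at IPL_NONE bumps replies_sent right
away.  After raise_ipl(IPL_HIGH), further driver_rx() calls still queue
packets but replies_sent stays put and drops starts counting once the queue
fills, which is at least consistent with seeing packets hit the driver
while no echo replies go out.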

Try placing a Debugger() or panic() call in a good spot in the network
drivers and do a stack trace, which will show what the kernel was up to when
the interrupt arrived.  You can also look at the CPU registers to see which
interrupts are masked off, etc.
Repeat a few times to see if the kernel is looping in a constant area when
the problem occurs.
By the sounds of it, the kernel is tied up at a high spl level where serial
interrupts (preventing DDB entry) as well as softnet interrupts are always
masked....

The sonic warnings are probably a side effect of the problem, since nothing
is dequeuing the packets.

Also, if your splnet mask isn't set correctly you could receive a recursive
network interrupt that will smash your FIFO queue pointers....
The same applies to spltty, splbio, splclock and others.  Have you updated
intr.h to reflect the correct interrupts?
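
As a rough illustration of how a recursive interrupt can smash queue
pointers, here is another userland toy in C.  It is a sketch under stated
assumptions, not the sonic driver or the real IF_ENQUEUE code: the struct,
the function names and the "masked" flag (standing in for a correct splnet
mask) are all hypothetical, and the nested interrupt is injected by hand
between reading the tail pointer and writing it back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical. */
enum { RING = 8 };

struct toyq {
    int    slot[RING];
    size_t tail;      /* index of the next free slot                    */
    bool   masked;    /* stands in for "network interrupts are blocked" */
    int    deferred;  /* packet held back while masked, or -1           */
};

void toyq_init(struct toyq *q)
{
    q->tail = 0;
    q->masked = false;
    q->deferred = -1;
    for (size_t i = 0; i < RING; i++)
        q->slot[i] = -1;
}

void toyq_enqueue_raw(struct toyq *q, int pkt, int nested_pkt);

/* A nested receive interrupt: runs now, or is held off if masked. */
void toyq_interrupt(struct toyq *q, int pkt)
{
    if (q->masked)
        q->deferred = pkt;
    else
        toyq_enqueue_raw(q, pkt, -1);
}

/* Non-atomic enqueue; nested_pkt >= 0 injects an interrupt mid-update. */
void toyq_enqueue_raw(struct toyq *q, int pkt, int nested_pkt)
{
    size_t t = q->tail;              /* 1: read the tail pointer        */
    if (nested_pkt >= 0)
        toyq_interrupt(q, nested_pkt);
    q->slot[t % RING] = pkt;         /* 2: store using the stale index  */
    q->tail = t + 1;                 /* 3: write back the stale pointer */
}

/* Enqueue with or without the mask held around the update. */
void toyq_enqueue(struct toyq *q, int pkt, int nested_pkt, bool protect)
{
    if (protect)
        q->masked = true;            /* like s = splnet()               */
    toyq_enqueue_raw(q, pkt, nested_pkt);
    if (protect) {
        q->masked = false;           /* like splx(s)                    */
        if (q->deferred >= 0) {
            int d = q->deferred;
            q->deferred = -1;
            toyq_enqueue_raw(q, d, -1);
        }
    }
}
```

With protect set to false, enqueueing packet 10 while packet 20 arrives in
the middle leaves tail at 1 with only packet 10 in the queue, so packet 20
is silently overwritten.  With protect set to true, both packets end up
queued in order and tail is 2.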

Regards,
Wayne
-- 
  _____	   	Wayne Knowles,  Systems Manager
 / o   \/   	National Institute of Water & Atmospheric Research Ltd
 \/  v /\   	P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
  `---'     	Email:   w.knowles@niwa.cri.nz