Subject: Re: Data corruption with dump (mmap related??)
To: Wayne Knowles <w.knowles@niwa.cri.nz>
From: Wayne Knowles <w.knowles@niwa.cri.nz>
List: port-mipsco
Date: 08/29/2000 11:09:12
On Mon, 28 Aug 2000, Wayne Knowles wrote:

> On 27 Aug 2000, Chris G. Demetriou wrote:
> > If it was read from a raw device, the userland buffer
> > would be filled directly by device DMA.
> 
> That was what I was trying to find out in all my unix books and sources
> yesterday, but couldn't find that one simple sentence.
> 
> I can now see where you are coming from.   Yes, you were pointing me
> back in the right direction.  There is a chance the last FIFO flush didn't
> happen on the last block - but I think that is less likely.

Problem solved!!

The cause of the problem was that the DMA chaining interrupt was at a
lower priority than the NCR SCSI controller interrupt.

The last segment to be DMA'ed was only 4 bytes in length, which fits into
the NCR 53c94 output FIFO.   The DMA controller will interrupt, requiring
the DMA to be set up for the last 4-byte transfer.

If that DMA chaining interrupt can't be serviced immediately (because the
kernel is running at a higher spl level doing something else), then a few
microseconds later the NCR controller will fill its FIFO and also
interrupt the CPU.

The SCSI interrupt handler sees that the terminal count has been reached
and calls asc_dma_intr to finish the job off.   The FIFO cannot be flushed
because the block count hasn't been set up for the last DMA segment (the
DMA chaining interrupt still hasn't been serviced).

Since the NCR 53c94 FIFO is only 16 bytes in size, any short final DMA
segment that fits within it, combined with the machine 'doing something
else', causes the problem to occur.
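
As a rough illustration of that condition, here is a minimal sketch of a
check for "at risk" trailing segments.  The helper name and the constant
are made up for this mail and are not taken from the mipsco sources; the
16-byte figure is the FIFO size quoted above, and treating "fits entirely
in the FIFO" as the risk condition is my reading of the failure.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stdbool.h>
#include <stddef.h>

/* FIFO depth of the NCR 53c94, as quoted in the text above. */
#define ASC_FIFO_SIZE 16

/*
 * Hypothetical helper (not in the real driver): returns true when the
 * final DMA segment is short enough to sit entirely in the controller
 * FIFO, which is the case where the SCSI interrupt can arrive before
 * the DMA chaining interrupt has been serviced.
 */
static bool
tail_segment_at_risk(size_t seglen)
{
	return seglen > 0 && seglen <= ASC_FIFO_SIZE;
}
```

With this sketch, the 4-byte tail from the failing dump is flagged, while
a full 512-byte segment is not.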

Servicing the DMA chaining interrupt before the NCR SCSI interrupt solves
this problem. 
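
To show why the ordering matters, here is a toy model of the two
handlers.  It is only a sketch: the struct, the function names and the
way the FIFO flush is modelled are all invented for illustration and do
not match the real asc_dma_intr or the actual interrupt dispatch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state for the last DMA segment; every field is hypothetical. */
struct sim {
	size_t next_seg_len;	/* length of the segment still to be set up */
	size_t dma_count;	/* bytes the DMA engine is programmed to take */
	size_t fifo_bytes;	/* bytes waiting in the controller FIFO */
	size_t mem_bytes;	/* bytes of the last segment now in memory */
};

/* Models the DMA chaining interrupt: program the count for the tail. */
static void
chain_intr(struct sim *s)
{
	s->dma_count = s->next_seg_len;
	s->next_seg_len = 0;
}

/* Models the SCSI interrupt: flush as much of the FIFO as DMA allows. */
static void
scsi_intr(struct sim *s)
{
	size_t n = s->fifo_bytes < s->dma_count ? s->fifo_bytes : s->dma_count;

	s->mem_bytes += n;
	s->fifo_bytes -= n;
	s->dma_count -= n;
}

/*
 * Run both pending interrupts in the chosen order and return how many
 * bytes of the final segment reached memory.
 */
static size_t
run_last_segment(size_t seglen, bool chain_first)
{
	struct sim s = { seglen, 0, seglen, 0 };

	assert(seglen <= 16);	/* tail must fit in the 16-byte FIFO */
	if (chain_first) {
		chain_intr(&s);
		scsi_intr(&s);
	} else {
		scsi_intr(&s);
		chain_intr(&s);
	}
	return s.mem_bytes;
}
```

In this model run_last_segment(4, false) leaves all 4 bytes stuck in the
FIFO, while run_last_segment(4, true) delivers them, which is the effect
of servicing the chaining interrupt first.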

> 
> Looks like the problem is somewhere around bus_dmamap_load where it builds
> the DMA segments for dma chaining.   IIRC, the same bus_space.c code is
> shared between mipsco, pmax, hpcmips and arc ports.   Hopefully that is
> why pmax is also infected.

Everything in bus_dmamap checks out correctly - I went through it with a
fine-tooth comb last night.

I'm not sure why the pmax fails a similar test.  Perhaps it suffers from a
similar problem with the DMA and the FIFO.  It also has an NCR SCSI
controller.

Will commit the fix later today along with some extra paranoia checks in
the DMA code.

A big thanks for your help Chris.

Wayne
-- 
  _____	   	Wayne Knowles,  Systems Manager
 / o   \/   	National Institute of Water & Atmospheric Research Ltd
 \/  v /\   	P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
  `---'     	Email:   w.knowles@niwa.cri.nz