Subject: Re: Data corruption with dump (mmap related??)
To: Chris G. Demetriou <cgd@sibyte.com>
From: Wayne Knowles <w.knowles@niwa.cri.nz>
List: port-mips
Date: 08/27/2000 20:59:00
On 27 Aug 2000, Chris G. Demetriou wrote:

> Wayne Knowles <w.knowles@niwa.cri.nz> writes:
> > What I did the other day tracking down the cause was overflush the cache.
> > I added 32 to the size flushed and it didn't change the fault.  At one
> > stage I flushed the entire cache to be sure.
> 
> (and those didn't help, i'm assuming. 8-)

Correct - No change to the problem

> more thinking aloud:
> 
> * you might check to see if some other chunk of space is being
> clobbered with the data you want (some suspects might be the area
> right _before_ the buffer read address, for instance).  (if it really
> were this, i wouldn't expect it to get better by retrying though.)
> (a more complex check would, upon detecting the error, troll /dev/mem
> to look for the desired pattern probably in the first bytes of some
> page, and see how that page and your process's physical page at that
> VA relate.)

I suspect that could be what is happening, but as you know it is
difficult to prove.  The machine is good for several make builds over
many days without a problem, but just earlier rsync died on me for no
explainable reason, and a repeat run worked :(
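
For reference, the page scan suggested above could be sketched roughly as
follows.  This is only a userland illustration that assumes the memory
image has already been read into a buffer (e.g. from /dev/mem); the
function name, its parameters and the page size passed in are all
hypothetical and not taken from any existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look for `pat` in the first bytes of each whole
 * page of a memory image.  Returns the index of the first matching page
 * at or after `first_page`, or -1 if no page starts with the pattern.
 * A trailing partial page is ignored.
 */
long
find_page_with_prefix(const unsigned char *img, size_t img_len,
    size_t page_size, size_t first_page,
    const unsigned char *pat, size_t pat_len)
{
	size_t npages, page;

	assert(page_size > 0);
	if (pat_len == 0 || pat_len > page_size)
		return -1;

	npages = img_len / page_size;
	for (page = first_page; page < npages; page++) {
		if (memcmp(img + page * page_size, pat, pat_len) == 0)
			return (long)page;
	}
	return -1;
}
```

Calling it again with first_page set to one past the last hit walks
through every candidate page, which can then be compared against the
physical page backing the process's VA.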

> 
> * does your DMA controller for your SCSI chip have any 'weird'
> alignment requirements?  in particular, looking through your asc.c
> code i note you have code to prime the DMA fifo if the start address
> isn't block aligned, is there anything similar necessary for the
> tail-end of the buffer, and in particular does a partial fifo's worth
> of data at the end of an xfer get dumped into memory?  in
> asc_dma_intr, i notice resid has a max value of 15, and you seem to
> throw away any data in the FIFO in the !DMA_PULLUP case... if the
> residual data doesn't get dumped into memory, you'd lose it here.
> might add a printf, unless you know this isn't the cause.  (again,

The DMA has a number of restrictions - mainly it has to be aligned at 64b
and it can only xfer 64 bytes at a time.
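
To make the consequence of those restrictions concrete, here is a small
sketch of how a transfer splits into an unaligned head, whole blocks and
a tail.  It assumes that "64b" means 64 bytes and that the same value is
both the alignment and the transfer unit; DMA_BLOCK, struct dma_split
and dma_split_xfer are made-up names for illustration and are not the
logic in asc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_BLOCK 64u	/* assumed: alignment and transfer unit, in bytes */

struct dma_split {
	size_t head;	/* bytes before the first aligned boundary */
	size_t blocks;	/* whole DMA_BLOCK-sized transfers */
	size_t tail;	/* leftover bytes after the last whole block */
};

/* Split [addr, addr + len) into head, whole blocks and tail. */
struct dma_split
dma_split_xfer(uintptr_t addr, size_t len)
{
	struct dma_split s;
	size_t misalign;

	/* The mask trick below needs a power-of-two block size. */
	assert((DMA_BLOCK & (DMA_BLOCK - 1)) == 0);

	misalign = (size_t)(addr & (DMA_BLOCK - 1));
	s.head = misalign ? DMA_BLOCK - misalign : 0;
	if (s.head > len)
		s.head = len;
	len -= s.head;
	s.blocks = len / DMA_BLOCK;
	s.tail = len % DMA_BLOCK;
	return s;
}
```

A non-zero head or tail marks the bytes that cannot go through a plain
block transfer and so need priming or draining through the FIFO.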

The !PULLUP case is writing data to the SCSI controller.  We need to fill
the FIFO to complete the DMA operation for a SCSI read, but on a write the
controller has the data already, so we can just flush the FIFO when the
operation is complete.

> behave the same way. 8-) (BTW, in the priming code, why don't you use
> read/write_multi_2? 8-)

Good point about multi_2 - will change that shortly.

My impression is that the SCSI DMA stuff isn't related to the problem,
since the userland buffer isn't being changed at all.  If it were the DMA
FIFO, the buffer would get overwritten with some random data from the FIFO.

I have managed to add some debug code to uiomove that picks up on the
situation - hopefully I can use that to trigger some extra debugging.

Wayne
-- 
  _____	   	Wayne Knowles,  Systems Manager
 / o   \/   	National Institute of Water & Atmospheric Research Ltd
 \/  v /\   	P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
  `---'     	Email:   w.knowles@niwa.cri.nz