Subject: Re: Data corruption with dump (mmap related??)
To: Chris G. Demetriou <cgd@sibyte.com>
From: Wayne Knowles <w.knowles@niwa.cri.nz>
List: port-mips
Date: 08/27/2000 16:09:06
On 26 Aug 2000, Chris G. Demetriou wrote:

> > In doing this test I observed a very useful piece of information:
> > 
> >   The buffer addresses that dump reads into are not aligned - in fact they
> >   are offset by 4 bytes from an aligned address.  It also fails when I set
> >   it to 8 (see below)
> 
> where does it fail when you set it to 8?  is it always the last 4
> bytes, or in the 8-byte case is it the 4 previous to that?

Chris,

Sorry - I forgot to mention that - it corrupts the last 8 bytes of the
block.  The +12 case is similar, so the corruption grows with the
offset.

> 
> Raw devices cause user pages to be wired and the data to be DMA'd
> directly.  One potentially interesting question here is, is the cache
> flushing routine going to work right.  All of the mips cache flush
> routines assume that the source and destination are line-aligned...

What I did the other day while tracking down the cause was overflush the
cache.  I added 32 to the size flushed and it didn't change the fault.  At
one stage I flushed the entire cache to be sure.

> > In the above case if buf=0x20044804 or 08 or 0c and we do a 2k read it
> > will fail.
> 
> how about greater values?  does one of your caches have 16-byte lines?

It works when we hit 0x10 and above - I didn't go much higher than 0x14
which also worked according to my notes.

> if the only values that work are < your cache line size, this is
> probably a flushing bug.  (if not, it's something in fault or TLB
> handling.)

Cache line size is 4 bytes on an R3k... simple and easy - no L2 cache.  It
is also a physically indexed cache, so unless I'm mistaken it doesn't need
any flushing to stay coherent across context switches (section 6.2.1 of
Schimmel).
We should have coherency between kernel and userland when the TLB entries
get updated.
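
For reference, here is a minimal sketch of the range a line-aligned flush
would have to cover for an unaligned buffer.  It is not the NetBSD flush
code; the struct and function names are made up for illustration, and the
16-byte line size in the note below is only the hypothetical case from the
question above.  The 4-byte line size and the 0x20044804 address are the
ones discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not taken from the NetBSD sources. */
struct flush_range {
    uintptr_t start;  /* first line-aligned address to flush */
    size_t    len;    /* bytes to flush, a multiple of the line size */
};

/*
 * Round [addr, addr + len) outwards to cache-line boundaries.
 * 'line' is the cache line size in bytes and must be a power of two.
 */
static struct flush_range
flush_range_for(uintptr_t addr, size_t len, size_t line)
{
    struct flush_range r;
    uintptr_t end;

    assert(line != 0 && (line & (line - 1)) == 0);

    r.start = addr & ~(uintptr_t)(line - 1);                /* round down */
    end = (addr + len + line - 1) & ~(uintptr_t)(line - 1); /* round up */
    r.len = (size_t)(end - r.start);
    return r;
}
```

With 4-byte lines, a 2k buffer at 0x20044804 is already line-aligned, so
the range is just the buffer itself.  With 16-byte lines the same buffer
would need 2064 bytes flushed starting at 0x20044800.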

> what kinds of cpus are in the machines that you noticed this on?
> (MIPS I, MIPS III, etc.)

I am experiencing it on a MIPS I class CPU.  The machine is a MIPS Magnum
3000.  Originally I thought it was something wrong in my port, until
Toru Nishimura duplicated the fault with NetBSD/pmax (I'm not sure of the
exact configuration, but I assume R3K based).

Nobody else has reported anything back yet (either good or bad).

NetBSD 1.5E (RC3230) #35: Thu Aug 24 02:03:22 NZST 2000
   wdk@grunt.zl2bkc.niwa.cri.nz:/home2/netbsd/src/sys/arch/mipsco/compile/RC3230
Mips 3230 Magnum (Pizazz)
total memory = 33554432
avail mem = 29253632
using 435 buffers containing 1781760 bytes of memory
mainbus0 (root)
cpu0 at mainbus0: MIPS R3000 CPU (0x230) Rev. 3.0 with MIPS R3010 FPC Rev. 4.0
cpu0: 32KB Instruction, 32KB Data, direct mapped cache

> it seems to me that if it's not a cache problem, then it's gotta be
> something weird happening when that next page is being mapped into
> your address space, and for that i'd look at the 'pagefault' code in
> trap.c...

Thanks... will take a closer look in that area.
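
One bit of arithmetic that bears on the next-page idea: the failing
buffers start just past the middle of a page, so a 2k read runs over the
page boundary.  Below is a small sketch of that calculation.  The helper
name is hypothetical and 4 KB pages are an assumption; it shows only the
address arithmetic, not the cause of the fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of [addr, addr + len) lie beyond
 * the page that contains addr.  'pagesz' must be a power of two
 * (4096 is assumed in the note below).
 */
static size_t
bytes_past_first_page(uintptr_t addr, size_t len, size_t pagesz)
{
    uintptr_t next_page;

    assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);

    /* Start of the page following the one that holds addr. */
    next_page = (addr & ~(uintptr_t)(pagesz - 1)) + pagesz;

    if (addr + len <= next_page)
        return 0;                       /* read stays within one page */
    return (size_t)(addr + len - next_page);
}
```

For a 2048-byte read with 4096-byte pages this gives 0 for buf=0x20044800,
and 4, 8 and 12 for 0x20044804, 0x20044808 and 0x2004480c.  It also gives
16 for 0x20044810, which is a case that works, so the overlap alone does
not account for the behaviour.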

It would also be nice to know if the problem exists on an R4000 machine...
anyone else out there???

Wayne
-- 
  _____	   	Wayne Knowles,  Systems Manager
 / o   \/   	National Institute of Water & Atmospheric Research Ltd
 \/  v /\   	P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
  `---'     	Email:   w.knowles@niwa.cri.nz