Subject: Cache on 3/260 - how does it work?
To: None <port-sun3@NetBSD.ORG>
From: David Jones <dej@eecg.toronto.edu>
List: port-sun3
Date: 04/08/1995 20:41:14

I have been having plenty of unexplainable problems with NetBSD on my 3/260.
I have now looked closely at two reproducible panics, and I think the problem
might be with the 260's VAC.

First, a question: What does it mean for the VAC to be disabled?
Does it mean:

1) That tag lookups no longer occur, so all accesses go to the main memory, or
2) That tag UPDATES in response to reads/writes no longer occur.

If #2 happens, but not #1, then when the VAC is disabled, accesses to previously
cached data will still fetch the cached data.  In other words, in addition to
disabling the cache, you also need to flush it.
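The difference between the two interpretations can be sketched with a toy
model.  This is only an illustration under stated assumptions: a single
write-through cache line in front of one memory word, with two hypothetical
control bits (lookups_on, updates_on).  None of these names come from the
3/260 hardware, whose real behaviour is exactly what is unknown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one cache line in front of one word of main memory.
 * The control bits are hypothetical, not real 3/260 VAC registers. */
struct toy_cache {
    bool     valid;       /* line holds a copy of the memory word */
    uint32_t data;        /* the cached copy */
    bool     lookups_on;  /* interpretation #1 turns this off */
    bool     updates_on;  /* interpretation #2 turns only this off */
};

static uint32_t toy_read(struct toy_cache *c, const uint32_t *mem)
{
    if (c->lookups_on && c->valid)
        return c->data;           /* hit: main memory is not consulted */
    if (c->updates_on) {          /* miss: fill the line if allowed */
        c->data = *mem;
        c->valid = true;
    }
    return *mem;
}

static void toy_write(struct toy_cache *c, uint32_t *mem, uint32_t v)
{
    *mem = v;                     /* write-through to memory */
    if (c->updates_on && c->valid)
        c->data = v;              /* keep the line coherent if allowed */
}

/* Cache a value, "disable" the cache, overwrite memory, read it back. */
static uint32_t read_after_disable(bool lookups_stay_on, bool flush_first)
{
    uint32_t mem = 0x11111111u;
    struct toy_cache c = { false, 0, true, true };

    (void)toy_read(&c, &mem);     /* cache enabled: line now holds old value */
    c.updates_on = false;         /* disable the cache ... */
    c.lookups_on = lookups_stay_on;
    if (flush_first)
        c.valid = false;          /* ... and optionally flush it */
    toy_write(&c, &mem, 0x22222222u);
    return toy_read(&c, &mem);
}
```

In this model, read_after_disable(true, false) returns the old value
0x11111111 (stale line still hit), while disabling lookups or flushing
first returns the new value 0x22222222.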

I understand that there are no docs on the 3/260 VAC.  If there were, we'd be
enabling it and flushing it properly.  In any case, on to my observations:


case 1:

panic in vm_page_activate()

Registers:

00000004 00002404 0001e000 00000000 00000001 00000000 00000001 02048000
30000000 0e100024 02033b38 0e58f000 0e565180 0e113260 0ffe3e98 0dfffd8c

Exception frame is at 0xffe3e38 and is as follows:

SR: 2000
PC: 0e0705f8
Format/vector: b008
SSW: 0155 (data fault, byte-read, supervisor data)
VA: 2033b5b
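
For reference, the SSW annotation above can be reproduced mechanically.  The
sketch below assumes the MC68020 special status word layout for a format $B
bus-error frame (DF in bit 8, RM in bit 7, RW in bit 6, SIZ in bits 5-4,
function code in bits 2-0); the struct and function names are made up for
illustration.

```c
#include <stdint.h>

/* Selected SSW fields, assuming the MC68020 format $B layout. */
struct ssw_fields {
    int data_fault;     /* DF, bit 8: fault on a data cycle */
    int is_rmw;         /* RM, bit 7: read-modify-write cycle */
    int is_read;        /* RW, bit 6: 1 = read, 0 = write */
    int size_bytes;     /* decoded from SIZ, bits 5-4 */
    int function_code;  /* FC2-FC0, bits 2-0 (5 = supervisor data) */
};

static struct ssw_fields ssw_decode(uint16_t ssw)
{
    /* SIZ encoding: 00 = long, 01 = byte, 10 = word, 11 = three bytes */
    static const int siz_to_bytes[4] = { 4, 1, 2, 3 };
    struct ssw_fields f;

    f.data_fault    = (ssw >> 8) & 1;
    f.is_rmw        = (ssw >> 7) & 1;
    f.is_read       = (ssw >> 6) & 1;
    f.size_bytes    = siz_to_bytes[(ssw >> 4) & 3];
    f.function_code = ssw & 7;
    return f;
}
```

With this layout, ssw_decode(0x0155) gives a data fault on a one-byte read
with function code 5, which agrees with the annotation.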

vm_page_activate starts as follows:

link a6,#0
movl a2,sp@-
moval a6@8,a2
btst #0,a2@(0x23)

Stack frame at 0xffe3e94:

0e11d620  - previous value of a2
0ffe3ef8  - old A6, A6 now points here
0e06b21a  - return address
0e11d620  - argument to vm_page_activate()


Now, it is clear that "a6@8" refers to the argument to vm_page_activate(),
in this case 0xe11d620.  So you'd expect 0xe11d620 to be in a2 at the time
of the fault.  Yet, the register dump clearly shows a2 to be 0x2033b38.
And indeed, 0x2033b38+0x23 is 0x2033b5b, the faulting address.

How did a load of 0xe11d620 result in 0x2033b38 being read?

Note that the argument passed to vm_page_activate() is stored
on the stack at VA 0xffe3ea0.
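
The address arithmetic behind these statements is small enough to check in
code.  A minimal sketch, assuming the usual frame layout after "link a6,#0"
(old a6 at a6@(0), return address at a6@(4), first argument at a6@(8)); the
helper names are hypothetical.

```c
#include <stdint.h>

/* Address of the n-th 32-bit argument (n = 0 for the first), given the
 * frame pointer a6 of the called function after "link a6,#0". */
static uint32_t arg_slot(uint32_t fp, unsigned n)
{
    return fp + 8u + 4u * n;
}

/* Effective address touched by "btst #0,a2@(0x23)". */
static uint32_t btst_ea(uint32_t a2)
{
    return a2 + 0x23u;
}
```

With a6 = 0x0ffe3e98 from the register dump, arg_slot(0x0ffe3e98, 0) is
0x0ffe3ea0, and btst_ea(0x02033b38) is 0x02033b5b, the faulting VA.  Had a2
held the expected 0x0e11d620, the access would have gone to 0x0e11d643.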


case 2, another panic:

panic in lock_done()

Registers:

00000004 0000000d 0000a000 00000000 0000c000 00000001 00000000 00000001
00000004 000006ed 0e10d860 0e52a500 0e52e600 0e10d838 0ffe3e8c 0dfffdc8

Exception frame is at 0xffe3e30 and is as follows:

SR: 2000
PC: e068b54
Format/vector: b008
SSW: 0145 (data fault)
VA: 18

Stack frame is as follows:

ffe3e8c:
  ffe3e98   old A6
  e06cf46   return address
        4   argument to lock_done()

ffe3e98:
  ffe3ef8   old A6
  e069ca4   return address, called from vm_map_lookup_done()
        0   argument to vm_map_lookup_done(): map
  e52e5c0   argument: entry

The next few values are registers d2-d7/a2-a5 as saved by vm_fault().
They are irrelevant here.

ffe3ed0:
  e0771be
  20000000
        1
        0
        5
     a000
  e52e540
  e52e5c0

ffe3ef8: (the frame pointer of vm_fault)
  ffe3f44   old A6
  e07e308   return address
  e52a500   map, argument to vm_fault()
  c000      va, argument to vm_fault()
  1         type
  0         wiring


Prior to the call to vm_map_lookup_done, vm_fault() was executing as follows:

movl a6@(-0x4),sp@-
movl a6@(0x8),sp@-
bsrl _vm_map_lookup_done

Now, a6@(0x8) is e52a500 (map), yet the value seen by vm_map_lookup_done() is 0.
a6@(-0x4) is e52e5c0 (entry), and this is correctly passed to
vm_map_lookup_done().

How did a load of 0xe52a500 result in 0 being read?

Note that the address of the argument passed to vm_map_lookup_done()
on the stack is VA 0xffe3ea0.

This is the SAME address misread in the first case!

I don't think I have a RAM failure - substitution of one value for another
is not symptomatic of RAM failure.  Flipped bits, maybe, but complete
substitution, no.  Could it be a dirty cache line in the VAC?


-- 
David Jones, M.A.Sc student, Electronics Group (VLSI), University of Toronto
           email: dej@eecg.toronto.edu, finger for PGP public key
         For a good time, telnet torfree.net and log in as `guest'.