Subject: Re: Xscale optimisations
To: None <Richard.Earnshaw@arm.com>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 10/14/2003 18:07:09
> > Mmmm IIRC we only ever saw bursts of the memory bus for cache line writes.
> > (Although it wasn't me driving the analyser that day.)
> 
> Hmm, yes, I suspect I was mistaken on that.   The SA110 timing apps note 
> does seem to confirm your observations.

Yes, we were expecting to see burst writes, but didn't.

> > I know I got faster memcpy (on sa1100) by fetching the target buffer
> > into the data cache (an lda offset by a magic number would do the trick,
> > didn't stall since the target data was never used!)
> 
> Which would be faster would probably depend on the relative 
> sequential/non-sequential times and the number of words to be written to a 
> line.  Plus some compensation for the fact that other useful data will 
> likely be cast out of the cache.  It is believable that 2(N+7S) < 8N (ie 
> 2.33 S < N) for many memory systems and thus that fetching a line into 
> cache would most likely be more efficient than writing to memory that was 
> out of the cache.

Here N is the time for the first (non-sequential) access and S the time
for each subsequent (sequential) one.
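
As a rough check of the inequality quoted above, here is a minimal C sketch
of the two costs for one 8-word line. The function names are made up for
illustration, and the model assumes exactly what the quoted text assumes:
direct stores cost N each, while a line fill plus its later write-back
costs two bursts of one N and seven S cycles.

```c
/* Cost, in bus cycles, of 8 separate stores that each pay the
 * non-sequential time N (no bursting on writes that miss the cache). */
static unsigned direct_store_cost(unsigned n)
{
    return 8u * n;
}

/* Cost of fetching the line and later writing it back:
 * two bursts, each one N access followed by seven S accesses. */
static unsigned fill_and_writeback_cost(unsigned n, unsigned s)
{
    return 2u * (n + 7u * s);
}

/* Non-zero when 2(N+7S) < 8N, i.e. when 14S < 6N (about 2.33 S < N). */
static int prefetch_wins(unsigned n, unsigned s)
{
    return fill_and_writeback_cost(n, s) < direct_store_cost(n);
}
```

With S = 1, N = 3 favours fetching the line (20 cycles against 24), while
N = 2 does not (18 against 16). N = 7, S = 3 sits exactly on the boundary.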

> Actually, the DNARD PAL comments suggest it's more complicated than that: 
> AFAICT a cache line fill will take 14 clock ticks and a line write 12 
> clocks.  8 individual stores could take as many as 56 clocks, so there 
> would be a clear win to pre-fetching the line (potentially a factor 4 
> performance improvement).
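
For anyone wanting to try the destination pre-fetch idea, a minimal C
sketch follows. It is not the original sa1100 routine: the function name is
hypothetical, the 32-byte (8-word) line size is an assumption, and it only
helps on a cache that allocates lines on reads but not on writes. It also
touches the line it is about to write rather than one a fixed offset ahead,
so unlike the version described above the load may stall, and it counts
lines from the start of dst rather than from true line boundaries.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* assumption: 8-word data cache line */

/* Copy n_words 32-bit words. Before storing into each destination line,
 * read one word of it so a read-allocate cache pulls the line in and the
 * following stores hit the cache. */
static void copy_words_touch_dst(uint32_t *dst, const uint32_t *src,
                                 size_t n_words)
{
    const size_t words_per_line = LINE_BYTES / sizeof(uint32_t);
    size_t i;

    for (i = 0; i < n_words; i++) {
        if (i % words_per_line == 0) {
            /* Volatile read so the compiler keeps the load; the value
             * is discarded, only the cache fill matters. */
            (void)*(volatile const uint32_t *)&dst[i];
        }
        dst[i] = src[i];
    }
}
```

Whether this beats a plain store loop would need measuring on the target,
for the reasons given in the quoted text.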

We were running with a 200MHz (ish) clock and:
        .long   DRAM_CTL        /* MDCNFG - DRAM config */
        .long   0x0c0c0c0f      /* MDCAS0 - DRAM CAS wave form */ 
        .long   0x0c0c0c0c      /* MDCAS1 - DRAM CAS wave form */
        .long   0xfffffffc      /* MDCAS2 - DRAM CAS wave form */

#define DRAM_DE         0xf     /* Enable banks 0 - 3 */
#define DRAM_DRAC       3       /* 12*10 drams */
#define DRAM_CDB2       0       /* CAS clock is DCLK */
#define DRAM_TRP        3       /* RAS precharge 30ns */
#define DRAM_TRASR      5       /* RAS width during refresh 60ns */
#define DRAM_TDL        3       /* number of cycles after CAS to latch data */
#define DRAM_DRI        368     /* number of cycles between DRAM refreshes */

#define DRAM_CTL        ((DRAM_DRI   << 17) | (DRAM_TDL  << 15) | \
                         (DRAM_TRASR << 11) | (DRAM_TRP  <<  7) | \
                         (DRAM_CDB2  <<  6) | (DRAM_DRAC <<  4) | DRAM_DE)
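
To save working it out by hand, the word these defines pack into can be
computed with a few lines of C. This is only a sketch: the defines are
copied from above, and the helper name dram_ctl_value is made up for
illustration.

```c
#include <stdint.h>

#define DRAM_DE         0xf     /* Enable banks 0 - 3 */
#define DRAM_DRAC       3
#define DRAM_CDB2       0
#define DRAM_TRP        3
#define DRAM_TRASR      5
#define DRAM_TDL        3
#define DRAM_DRI        368

#define DRAM_CTL        ((DRAM_DRI   << 17) | (DRAM_TDL  << 15) | \
                         (DRAM_TRASR << 11) | (DRAM_TRP  <<  7) | \
                         (DRAM_CDB2  <<  6) | (DRAM_DRAC <<  4) | DRAM_DE)

/* Returns the packed value, 0x02e1a9bf for the settings above. */
static uint32_t dram_ctl_value(void)
{
    return (uint32_t)DRAM_CTL;
}
```

None of the fields overlap at these shift positions, so the OR is just the
sum of the shifted values.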

but I no longer have either the datasheets or a system.

	David

-- 
David Laight: david@l8s.co.uk