Subject: Re: Xscale optimisations
To: None <Richard.Earnshaw@arm.com>
From: David Laight <firstname.lastname@example.org>
Date: 10/14/2003 18:07:09
> > Mmmm IIRC we only ever saw bursts of the memory bus for cache line writes.
> > (Although it wasn't me driving the analyser that day.)
> Hmm, yes, I suspect I was mistaken on that. The SA110 timing apps note
> does seem to confirm your observations.
Yes - we were expecting to see burst writes, but didn't.
> > I know I got faster memcpy (on sa1100) by fetching the target buffer
> > into the data cache (an lda offset by a magic number would do the trick,
> > didn't stall since the target data was never used!)
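
A minimal C sketch of the trick described above, not the original sa1100 code: before each line's worth of stores, do a dummy load from the destination so the line is allocated in the data cache. The function name copy_with_dst_touch and the LINE_BYTES value (8 words of 4 bytes, matching the "8 individual stores" per line mentioned later in this thread) are assumptions for illustration. The original loaded ahead by a "magic number" offset, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line size: 8 words of 4 bytes. */
#define LINE_BYTES 32u

/*
 * Copy n bytes from src to dst (buffers must not overlap).
 * Before writing each LINE_BYTES chunk, read one byte of the destination
 * so that a read-allocate cache pulls the line in first.
 * Simplification: chunks start at dst, so the touches only line up with
 * real cache lines when dst itself is line-aligned.
 */
void *copy_with_dst_touch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* volatile so the compiler keeps the otherwise unused load */
    volatile const unsigned char *touch = dst;
    size_t i, j;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < n; i += LINE_BYTES) {
        size_t end = (n - i < LINE_BYTES) ? n : i + LINE_BYTES;

        (void)touch[i];            /* dummy load: value is never used */
        for (j = i; j < end; j++)
            d[j] = s[j];
    }
    return dst;
}
```

Whether this wins on a given board depends on the memory timings discussed below; the sketch only shows the access pattern.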
> Which would be faster would probably depend on the relative
> sequential/non-sequential times and the number of words to be written to a
> line. Plus some compensation for the fact that other useful data will
> likely be cast out of the cache. It is believable that 2(N+7S) < 8N (ie
> 2.33 S < N) for many memory systems and thus that fetching a line into
> cache would most likely be more efficient than writing to memory that was
> out of the cache.
N = first (non-sequential) access, S = subsequent (sequential) access.
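
The inequality quoted above can be checked for a particular memory system with a few lines of C. This is only a sketch of that arithmetic: the function names are made up for illustration, and it assumes an 8-word line, one N plus seven S per line burst, and S no larger than N.

```c
#include <assert.h>

/* Cost of moving one 8-word line as a burst: one first access, seven sequential. */
unsigned line_burst_cost(unsigned n, unsigned s)
{
    return n + 7u * s;
}

/* Fetch the line into cache, then write it back later: two bursts. */
unsigned fill_then_writeback_cost(unsigned n, unsigned s)
{
    return 2u * line_burst_cost(n, s);
}

/* Eight separate stores to uncached memory, each paying the first-access time. */
unsigned eight_single_stores_cost(unsigned n)
{
    return 8u * n;
}

/* Non-zero when 2(N+7S) < 8N, which rearranges to 14S < 6N, i.e. 2.33 S < N. */
int prefetch_wins(unsigned n, unsigned s)
{
    assert(s <= n);  /* assumption: sequential accesses are not slower */
    return fill_then_writeback_cost(n, s) < eight_single_stores_cost(n);
}
```

For example, N = 3, S = 1 gives 20 against 24, so fetching the line wins; N = 2, S = 1 gives 18 against 16, so it does not.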
> Actually, the DNARD PAL comments suggest it's more complicated than that:
> AFAICT a cache line fill will take 14 clock ticks and a line write 12
> clocks. 8 individual stores could take as many as 56 clocks, so there
> would be a clear win to pre-fetching the line (potentially a factor 4
> performance improvement).
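
As a rough check of those DNARD figures, the sketch below compares the 56 clocks for eight individual stores with the prefetch path, both with and without counting the eventual 12-clock line write. The names are hypothetical and only the three clock counts come from the text above. Counting the fill alone gives the factor of 4; counting fill plus write gives a little over 2.

```c
#include <assert.h>

enum {
    FILL_CLOCKS         = 14,  /* cache line fill */
    LINE_WRITE_CLOCKS   = 12,  /* cache line write */
    EIGHT_STORES_CLOCKS = 56   /* worst case for 8 individual stores */
};

/* Speed-up times 100 (integer arithmetic), so 400 means a factor of 4.00. */
unsigned speedup_x100(unsigned slow_clocks, unsigned fast_clocks)
{
    assert(fast_clocks != 0);
    return slow_clocks * 100u / fast_clocks;
}

/* Prefetch path when only the line fill is counted. */
unsigned speedup_fill_only_x100(void)
{
    return speedup_x100(EIGHT_STORES_CLOCKS, FILL_CLOCKS);
}

/* Prefetch path when the later line write is counted as well. */
unsigned speedup_fill_and_write_x100(void)
{
    return speedup_x100(EIGHT_STORES_CLOCKS, FILL_CLOCKS + LINE_WRITE_CLOCKS);
}
```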
We were running with a 200MHz (ish) clock and:
.long DRAM_CTL /* MDCNFG - DRAM config */
.long 0x0c0c0c0f /* MDCAS0 - DRAM CAS wave form */
.long 0x0c0c0c0c /* MDCAS1 - DRAM CAS wave form */
.long 0xfffffffc /* MDCAS2 - DRAM CAS wave form */
#define DRAM_DE 0xf /* Enable banks 0 - 3 */
#define DRAM_DRAC 3 /* 12*10 drams */
#define DRAM_CDB2 0 /* CAS clock is DCLK */
#define DRAM_TRP 3 /* RAS precharge 30ns */
#define DRAM_TRASR 5 /* RAS width during refresh 60ns */
#define DRAM_TDL 3 /* number of cycles after CAS to latch data */
#define DRAM_DRI 368 /* number of cycles between DRAM refreshes */
#define DRAM_CTL ((DRAM_DRI << 17) | (DRAM_TDL << 15) | \
                  (DRAM_TRASR << 11) | (DRAM_TRP << 7) | \
                  (DRAM_CDB2 << 6) | (DRAM_DRAC << 4) | DRAM_DE)
but I no longer have the datasheets nor a system.
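
For anyone comparing against another board's setup, here is a small C sketch that evaluates the DRAM_CTL expression above. The field values and shifts are copied from the defines in this mail; the helper names dram_ctl_value and dram_ctl_refresh_interval are made up for illustration, and nothing here has been checked against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define DRAM_DE    0xf  /* Enable banks 0 - 3 */
#define DRAM_DRAC  3    /* 12*10 drams */
#define DRAM_CDB2  0    /* CAS clock is DCLK */
#define DRAM_TRP   3    /* RAS precharge */
#define DRAM_TRASR 5    /* RAS width during refresh */
#define DRAM_TDL   3    /* cycles after CAS to latch data */
#define DRAM_DRI   368  /* cycles between DRAM refreshes */

#define DRAM_CTL ((DRAM_DRI << 17) | (DRAM_TDL << 15) | \
                  (DRAM_TRASR << 11) | (DRAM_TRP << 7) | \
                  (DRAM_CDB2 << 6) | (DRAM_DRAC << 4) | DRAM_DE)

/* The word the first .long above would emit, as a 32-bit value. */
uint32_t dram_ctl_value(void)
{
    return (uint32_t)DRAM_CTL;
}

/* Pull the refresh interval back out, assuming it occupies bit 17 upwards. */
uint32_t dram_ctl_refresh_interval(uint32_t ctl)
{
    uint32_t dri = ctl >> 17;

    assert(dri != 0);  /* a zero interval would not be a usable setting */
    return dri;
}
```

With the values above, dram_ctl_value() comes out as 0x02e1a9bf.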
David Laight: email@example.com