Subject: _delay: and SCSI
To: None <port-sun3@NetBSD.ORG>
From: Michael Richardson <mcr@latour.sandelman.ocunix.on.ca>
List: port-sun3
Date: 12/04/1995 23:42:12
  The comments in the ncr5380 code say that we get screwed because
delay takes a lot longer than it should.
  Looking at the code, I can sort of see why, but not entirely.
  sun3_startup.c:sun3_verify_hardware() sets cpuspeed=20. Right.

  At 20 MHz, a CPU cycle takes 50 ns. Working things through, I see
that things work out properly. (I wrote three paragraphs and then
noticed that the subql takes 8 off each time.)

  If the overhead is 80 clocks = 10 loops @ 400 ns = 4 us, doesn't a
great deal of this overhead come from the multiplication? Alas, I don't
think my good old 68000 timing charts will help me here.
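
  The overhead arithmetic above can be sketched in C. This is only a
model of the numbers quoted in this message (8 clocks per loop pass, an
allowance of 80 clocks); the function names are made up for
illustration and are not part of the kernel.

```c
#include <assert.h>

/* Clocks per pass of the subql/jgt loop, as stated in the code comment. */
enum { CLOCKS_PER_LOOP = 8 };

/* Nanoseconds per CPU clock at the given speed in MHz (20 MHz -> 50 ns).
 * Integer division, so speeds that do not divide 1000 are truncated. */
long ns_per_clock(long cpuspeed_mhz)
{
	assert(cpuspeed_mhz > 0);
	return 1000 / cpuspeed_mhz;
}

/* Time in ns represented by an overhead allowance given in clocks. */
long overhead_ns(long overhead_clocks, long cpuspeed_mhz)
{
	long loops = overhead_clocks / CLOCKS_PER_LOOP;	/* 80 / 8 = 10 */

	return loops * CLOCKS_PER_LOOP * ns_per_clock(cpuspeed_mhz);
}
```

  With an allowance of 80 clocks at 20 MHz, overhead_ns(80, 20) gives
4000 ns, i.e. the 4 us mentioned above.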
  The comment indicates that the minimum delay is about 5 us. By the
time you do the multiplication and manipulate the stack, I can see why.
How important is this to the SCSI code?

  Instead of multiplying, why not put one loop inside another? Hmm. Let's see.
  E.g. (please excuse my rusty Motorola-syntax 68k):

	.globl	_delay
_delay:
	| d0 = usec
	movel	sp@(4),d0
	| subtract some overhead (number to be determined)
	moveq	#80,d1
	subl	d1,d0
	| save it somewhere safe
	movel	d0,a0
	| d1 = _cpuspeed - 1, since dbf runs the loop count+1 times
	movel	_cpuspeed,d1
	subql	#1,d1
Ldelay2:
	| reload the inner count for each outer pass
	movel	a0,d0
| This loop takes 8 clocks per cycle.
Ldelay:
	subql	#8,d0
	jgt	Ldelay
	dbf	d1,Ldelay2
	rts

  This adds 8 clock ticks per MHz of CPU speed.
  That is, the actual delay is _cpuspeed*8/_cpuspeed = 8 us longer than
expected. Hmm. Not much better.
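
  A rough C model of the nested loop shows where the extra 8 us comes
from. It assumes that one outer pass (the reload plus the dbf) costs
8 clocks, as estimated above, and it ignores the setup code and the
overhead subtraction; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed cost in clocks of one outer pass (movel a0,d0 plus dbf). */
enum { OUTER_OVERHEAD_CLOCKS = 8 };

/* Count the clocks spent in the nested loops for a given request. */
long nested_delay_clocks(long cpuspeed_mhz, long usec)
{
	long clocks = 0;

	assert(cpuspeed_mhz > 0);
	/* dbf with d1 = cpuspeed - 1 gives cpuspeed outer passes */
	for (long d1 = cpuspeed_mhz - 1; d1 >= 0; d1--) {
		long d0 = usec;			/* movel a0,d0 */

		do {				/* subql #8,d0 ; jgt */
			d0 -= 8;
			clocks += 8;
		} while (d0 > 0);
		clocks += OUTER_OVERHEAD_CLOCKS;
	}
	return clocks;
}

/* Convert a clock count to microseconds at the given speed in MHz. */
long clocks_to_usec(long clocks, long cpuspeed_mhz)
{
	assert(cpuspeed_mhz > 0);
	return clocks / cpuspeed_mhz;
}
```

  For a request of 80 us the model spends 88 clocks per outer pass, so
the result is 88 us at 16, 20 or 25 MHz alike: the extra 8 us does not
depend on the CPU speed.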
  Perhaps we need a dispatch table with a _smalldelay() for each of
_cpuspeed = 16, 20, 25 MHz?
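
  One possible shape for such a table, sketched in C. Everything here
is hypothetical: the smalldelay_16/20/25 routines, the table and the
lookup function do not exist in the tree, and the real routines would
presumably be hand-tuned assembly rather than empty C functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*smalldelay_fn)(void);

/* Hypothetical per-speed routines; each would hold a loop tuned for
 * exactly one clock rate, so no multiplication is needed at run time. */
void smalldelay_16(void) { /* tuned loop for 16 MHz would go here */ }
void smalldelay_20(void) { /* tuned loop for 20 MHz would go here */ }
void smalldelay_25(void) { /* tuned loop for 25 MHz would go here */ }

struct smalldelay_entry {
	int mhz;
	smalldelay_fn fn;
};

static const struct smalldelay_entry smalldelay_table[] = {
	{ 16, smalldelay_16 },
	{ 20, smalldelay_20 },
	{ 25, smalldelay_25 },
};

/* Return the routine for this CPU speed, or NULL if none matches. */
smalldelay_fn smalldelay_lookup(int cpuspeed_mhz)
{
	size_t n = sizeof(smalldelay_table) / sizeof(smalldelay_table[0]);

	assert(cpuspeed_mhz > 0);
	for (size_t i = 0; i < n; i++) {
		if (smalldelay_table[i].mhz == cpuspeed_mhz)
			return smalldelay_table[i].fn;
	}
	return NULL;
}
```

  The lookup would be done once at startup, after cpuspeed is set, and
the result kept in a pointer so that callers pay only for an indirect
call.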