Subject: Re: Nubus card debugging
To: David A. Gatwood <dgatwood@gatwood.net>
From: Scott Reynolds <scottr@clank.org>
List: port-mac68k
Date: 09/06/2002 02:34:31
On Monday, August 5, 2002, at 03:43 PM, David A. Gatwood wrote:

> Judging by that, the code is really, really overkill, and could be
> reduced to eight accesses (four pairs) and two non-nested for loops.

It's not as much overkill as you seem to think. In its worst case the 
current code makes 32 accesses. Sure, it's 4x what you're talking about 
here, but how much time does this actually take? (By the way, notice 
how the iterator in the outer loop is not merely decremented but rather 
right-shifted.)

> char array[4];
>
> for (i=1; i<=4; i++) {
> 	array[4-i] = 0;
> 	if (mac68k_bus_space_probe(base + size - i)) {
> 		array[4-i] = inb(base + size - i);
> 	}
> }
>
> for (i=1; i<16; i++) {
> 	if (i<2) pos=1;
> 	else if (i<4) pos=2;
> 	else if (i<8) pos=3;
> 	else pos=4;
> 	if (array[i] == (~(i << 4) & 0xf0) | i) {
> 		break;
> 	}
> }

I'm not sure why this is any more clear than the existing code. If 
you're concerned about efficiency I'd propose that you simply move the 
mac68k_bus_space_probe() and the bus_space_read_1() (or in your case 
the inb()) out of the inner loop and use a single temporary variable, 
rather than an array. This reduces the worst case number of accesses 
from 32 to 8, just as yours does.

Also, all of these conditionals will add significantly to the execution 
time of your code, as they'll cause some amount of flushing to happen 
in the execution stages on missed branches. I'm not sure your code is 
more efficient than mine even without modification. At least it puts a 
bit more load on the instruction cache. :-)

> if (i== 16) {
> 	// no (valid) card present
> } else {
> 	// the byte lanes are a bitmask stored in i,
> 	// i.e. byte lane 0 is 2^0, byte lane 1 is 2^1, etc.
> }

Finally, and this is just a personal preference, having to test an 
index to see if it went past the end of the values used in a for loop 
seems less clear than using a flag variable. I generally code flags 
into loops simply because they make the intent of the loops clear. They 
are not clever, nor are they particularly efficient, but I'd rather not 
lose clarity for the sake of efficiency when it's simply not critical.

I seem to recall that there's a relevant Dykstra quote about the evils 
of optimizing, but I'll be hanged if I can remember it...

--scott