Subject: Re: bus_space and barriers
To: Witold J. Wnuk <w.wnuk@zodiac.mimuw.edu.pl>
From: Witold J. Wnuk <w.wnuk@zodiac.mimuw.edu.pl>
List: tech-kern
Date: 10/20/2000 23:10:42
I'm repeating myself a bit in this mail, but I want to make it clear.


On 19-Oct-00, 15:00:27 Witold J. Wnuk wrote:
>  
>  On 18-Oct-00, 23:52:44 Chris G. Demetriou wrote:
> >  "Witold J. Wnuk" <w.wnuk@zodiac.mimuw.edu.pl> writes:
> > > [ ... ]
> > > It would need to be read-write barrier after read and after write
> > > in fourth case then. Alternatively before.
> >  
> >  yes, the question is, how do you encode that reasonably.  on the
> >  alpha, read-write barriers really are fairly expensive, and for
> >  many, many devices all you need are write barriers a very large
> >  fraction of the time.
> >  
> >  The point is, if you embed them, then you pretty much need to
> >  choose something close to the most complete combination... which
> >  can be kinda painful.
>  


More data:

Common context:

1. Setting device parameters - 3-10 register writes - usually no
barriers needed at all (some hardware may require a specific order, but
that isn't common)

2. Starting DMA - setting device and DMA parameters and then starting
the transfer (for example by writing the GO bit in the status register;
sometimes DMA starts as soon as the address is loaded) - write barrier
needed before GO

3. Updating register - reading and writing modified value - no barriers
are needed - order is logically enforced

4. Reading status register and conditional action - order is logically
enforced

5. Using indirect access registers (write) - writing DATA register and
then writing INDEX register or writing INDEX register and then writing
DATA register - write barrier needed

6. Using indirect access registers (read) - writing INDEX register and
then reading DATA register - read-write barrier needed

7. Same but with polling control register in between - read-write
barrier needed before polling

8. Polling - no barriers needed during or after

9. Probe - write and read back - read-write barrier needed

10. Other similar uses, perhaps some not similar at all
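
To make cases 2 and 6 concrete, here is a rough user-space sketch. The
mock_write(), mock_read() and mock_barrier() helpers, the BAR_* flags
and the register offsets are all made up for this mail; they only log
what a driver would issue and are not the bus_space API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical barrier flags and register layout, for illustration only. */
#define BAR_READ      0x1
#define BAR_WRITE     0x2

#define REG_DMA_ADDR  0x00
#define REG_DMA_LEN   0x04
#define REG_STATUS    0x08      /* holds the GO bit */
#define REG_INDEX     0x0c
#define REG_DATA      0x10
#define STATUS_GO     0x1
#define NREGS         8

/* Log of issued operations: 'w' write, 'r' read,
 * 'W' write barrier, 'M' read-write barrier. */
static char oplog[64];
static size_t oplen;
static uint32_t regs[NREGS];    /* fake register file, index = offset / 4 */

static void log_op(char c)
{
        assert(oplen < sizeof(oplog) - 1);
        oplog[oplen++] = c;
        oplog[oplen] = '\0';
}

static void mock_write(unsigned off, uint32_t v)
{
        assert(off / 4 < NREGS);
        regs[off / 4] = v;
        log_op('w');
}

static uint32_t mock_read(unsigned off)
{
        assert(off / 4 < NREGS);
        log_op('r');
        return regs[off / 4];
}

/* Anything other than a pure write barrier is logged as a full barrier. */
static void mock_barrier(int flags)
{
        log_op(flags == BAR_WRITE ? 'W' : 'M');
}

/* Case 2: all parameters must be written before GO. */
static void start_dma(uint32_t addr, uint32_t len)
{
        mock_write(REG_DMA_ADDR, addr);
        mock_write(REG_DMA_LEN, len);
        mock_barrier(BAR_WRITE);
        mock_write(REG_STATUS, STATUS_GO);
}

/* Case 6: the INDEX write must be ordered before the DATA read. */
static uint32_t indirect_read(uint32_t index)
{
        mock_write(REG_INDEX, index);
        mock_barrier(BAR_READ | BAR_WRITE);
        return mock_read(REG_DATA);
}
```

In this model start_dma() issues two writes, one write barrier and the
GO write; indirect_read() issues a write, a read-write barrier and a
read.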



Example - dev/pci/fms.c needs one read-write barrier and three write
barriers - for 8 reads and 26 writes


Yes, excessive barriers are no good.



It does not seem possible to have only some implicit barriers (i.e. only
write barriers after writes). That would not reflect common use and
would be very messy and hard to understand.



And, there is nothing wrong with having two sets of functions - one with
barriers and one without. It can't be worse than it is now.
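
A minimal sketch of what I mean by two sets, in user space. The names
reg_write_raw() and reg_write_ordered() are placeholders I made up (the
naming question is still open), and fake_barrier() only counts calls
instead of issuing a real barrier.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 4

static uint32_t fake_reg[NREGS];
static unsigned barriers_issued;

/* Stand-in for a real barrier instruction; it only counts. */
static void fake_barrier(void)
{
        barriers_issued++;
}

/* Variant without a barrier: the caller is responsible for ordering. */
static void reg_write_raw(unsigned idx, uint32_t v)
{
        assert(idx < NREGS);
        fake_reg[idx] = v;
}

/* Variant with a barrier: ordered against every earlier access. */
static void reg_write_ordered(unsigned idx, uint32_t v)
{
        fake_barrier();
        reg_write_raw(idx, v);
}

/* Parameter setup where only the final GO write has to be ordered. */
static void setup_then_go(void)
{
        reg_write_raw(0, 0x10);
        reg_write_raw(1, 0x20);
        reg_write_raw(2, 0x30);
        reg_write_ordered(3, 1);        /* GO */
}
```

Here setup_then_go() pays for one barrier instead of four, and a driver
author who doesn't want to think about it can use the ordered variant
everywhere.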


And, I am more convinced now: bus_space_read should do a read from the
bus POV, and bus_space_write should do a write from the bus POV. C
guarantees that functions are executed in order. We should mimic that
behaviour.

This is what bus_space is for, right?



Converting a driver to be correct with the current semantics is easy for
a person who knows the hardware in question: 5 to 20 minutes of work.
It is a very risky process though. A mistake would be almost impossible
to find without examining the code - running the driver on Alpha
wouldn't help much.




It is possible to save an "mb" between consecutive writes, with
something like:

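/* Nonzero if the last access was a read, so the next write needs "mb". */
extern int readsp;

/* "mb", "wmb", "write" and "read" are placeholders for real instructions. */
#define write() do {                            \
                if (readsp) asm("mb");          \
                else asm("wmb");                \
                readsp = 0;                     \
                asm("write");                   \
        } while (0)


#define read() do {                             \
                asm("mb");                      \
                asm("read");                    \
                readsp = 1;                     \
        } while (0)

void
o(void)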
{
        write();
        write();
        write();
        read();
        read();
        read();
        write();
        read();
        write();
        read();
        write();
}

gcc optimizes this to:


        cmpl $0,readsp
        je .L5
        mb
        jmp .L6
        .align 4
.L5:
        wmb
.L6:
        write
        wmb
        write
        wmb
        write
        mb
        read
        mb
        read
        mb
        read
        mb
        write
        mb
        read
        mb
        write
        mb
        read
        mb
        write
        movl $1,readsp


It is a hack, but it works. It needs -O2 to be optimized, and if write()
and read() were much more complicated they wouldn't be optimized. Note
that this output is from i386. It is questionable how it will behave
with interrupts, etc., but I couldn't find an example where it would
fail.
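
For anyone who wants to check the barrier selection without reading
assembler, here is a small user-space simulation of the same idea. The
sim_write(), sim_read() and sim_o() names and the log buffer are
invented for this sketch; no real barriers are issued, the code only
records which one the macro would pick ('W' for wmb, 'M' for mb).

```c
#include <assert.h>
#include <stddef.h>

static int readsp;              /* 1 if the previous access was a read */
static char barrier_log[32];
static size_t barrier_count;

/* Record which barrier would have been issued. */
static void note(char c)
{
        assert(barrier_count < sizeof(barrier_log) - 1);
        barrier_log[barrier_count++] = c;
        barrier_log[barrier_count] = '\0';
}

/* Same decision as the write() macro: mb after a read, else wmb. */
static void sim_write(void)
{
        note(readsp ? 'M' : 'W');
        readsp = 0;
}

/* Same decision as the read() macro: always mb. */
static void sim_read(void)
{
        note('M');
        readsp = 1;
}

/* Same access sequence as o(). */
static void sim_o(void)
{
        sim_write();
        sim_write();
        sim_write();
        sim_read();
        sim_read();
        sim_read();
        sim_write();
        sim_read();
        sim_write();
        sim_read();
        sim_write();
}
```

With readsp starting at 0, sim_o() logs three wmb followed by eight mb,
which is the wmb branch of the listing above.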




Ideas, comments?

Should I assume that nobody disagrees with the need for two separate
versions - with barriers and without barriers?

What should they be called?


Greetings,


        Witold J. Wnuk




PS: There are some questionable cases, for example a write and _then_
delay(). A write barrier won't help here; a barrier, followed by a read
and use of the value read, will. If I understand the hardware manual
right (I'm not sure that I do - I have just glanced at it), the 21264
may wait as long as 1000 cycles before flushing its write buffer.
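
A toy model of that last case, to show the idea only. Everything here
(toy_write(), toy_wmb(), toy_read(), the single-entry write buffer) is
an assumption made up for this sketch; in particular, the rule that a
read from the device drains the buffer is a simplification, not a
statement about what the 21264 really does.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t device_reg;     /* what the device has actually seen */
static uint32_t pending_val;    /* value still sitting in the write buffer */
static int pending;             /* 1 while a write is buffered */

/* A write only reaches the write buffer, not the device. */
static void toy_write(uint32_t v)
{
        pending_val = v;
        pending = 1;
}

/* In this model a write barrier orders writes but does not drain them. */
static void toy_wmb(void)
{
}

/* In this model a read from the device drains the buffer first. */
static uint32_t toy_read(void)
{
        if (pending) {
                device_reg = pending_val;
                pending = 0;
        }
        return device_reg;
}

static int device_has_seen(uint32_t v)
{
        return !pending && device_reg == v;
}

/* Write, barrier, read back, and use the value; delay() would follow. */
static int write_then_confirm(uint32_t v)
{
        uint32_t seen;

        toy_write(v);
        toy_wmb();
        seen = toy_read();
        assert(!pending);       /* buffer drained by the read-back */
        return seen == v;
}
```

In the model, toy_write() plus toy_wmb() alone leaves the value in the
buffer, while write_then_confirm() guarantees the device has it before
any delay() starts counting.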