tech-kern archive


Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM



On Tue, 11 Feb 2014, David Laight wrote:

> On Tue, Feb 11, 2014 at 04:19:26PM +0000, Eduardo Horvath wrote:
> > 
> > We really should enhance the bus_dma framework to add bus_space-like 
> > accessor routines so we can implement something like this.  Using bswap is 
> > a lousy way to implement byte swapping.  Yes, on x86 you have byte swap 
> > instructions that allow you to work on register contents.  But most RISC 
> > CPUs do the byte swapping in the load/store path.  That really doesn't 
> > map well to the bswap API.  Instead of one load or store operation to 
> > swap a 64-bit value, you need a load/store plus another dozen shift and 
> > mask operations.  
> > 
> > I proposed such an extension years ago.  Someone might want to resurrect 
> > it.
> 
> What you don't want to have is an API that swaps data in memory
> (unless that is really what you want to do).
> 
> IIRC modern gcc detects uses of its internal byteswap function
> that are related to memory read/write and uses the appropriate
> byte-swapping memory access.
> 
> I can see the advantage of being able to do byteswap in the load/store
> path, but sometimes that can't be arranged and a byteswap instruction
> is very useful.

When do you ever really want to byte swap the contents of one register to 
another register?  Byte swapping almost always involves I/O, which 
means reading or writing memory or a device register.  In this case we 
are specifically talking about DMA, in which case there is always a load 
or store operation involved.

The current API we have using the bswap routines is a real pain in the 
neck for DMA.  You really want the byte swaps to happen when needed.  They 
should be controlled by the DMA attributes of the device you're talking to 
along with the characteristics of the CPU and page in question.  A 
big-endian CPU talking to a device that runs only little-endian needs to 
do byte swapping when accessing DMA structures.  But what if the device 
can also support big-endian DMA?  So each driver needs to determine at 
setup time whether byte swapping is required, and have code to 
conditionally byte swap data on each access to a structure that is 
shared with the device through DMA.
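
A minimal sketch of the per-driver pattern described above, to make the 
cost concrete.  Every name here (xx_softc, sc_swap, xx_setup, 
xx_desc_read32, xx_desc_write32, xx_swap32) is hypothetical and is not 
part of bus_dma or any existing driver; the endianness of the CPU and the 
device is passed in as plain booleans purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; not a real kernel structure. */
struct xx_softc {
	bool sc_swap;		/* decided once, at setup time */
};

/* Plain shift-and-mask 32-bit swap, used only when sc_swap is set. */
static inline uint32_t
xx_swap32(uint32_t x)
{
	return (x << 24) | ((x << 8) & 0x00ff0000U) |
	    ((x >> 8) & 0x0000ff00U) | (x >> 24);
}

/* Setup: swapping is needed only when CPU and device byte orders differ. */
static void
xx_setup(struct xx_softc *sc, bool cpu_big_endian, bool dev_big_endian)
{
	sc->sc_swap = (cpu_big_endian != dev_big_endian);
}

/* Every read of a DMA-shared field has to go through the conditional. */
static inline uint32_t
xx_desc_read32(const struct xx_softc *sc, const volatile uint32_t *field)
{
	uint32_t v = *field;

	return sc->sc_swap ? xx_swap32(v) : v;
}

/* Likewise for every write. */
static inline void
xx_desc_write32(const struct xx_softc *sc, volatile uint32_t *field,
    uint32_t v)
{
	*field = sc->sc_swap ? xx_swap32(v) : v;
}
```

In this sketch the test and the swap are repeated at every access site 
in the driver, which is the duplication an accessor provided by the 
framework could hide.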

> I really can't imagine implementing it being a big problem!

Yes, it is a big problem.  For a 2 byte swap you need to do 2 shift 
operations, one mask operation (if you're lucky) and one or operation.  
Double that for a 4 byte swap.  And even if you argue that a dozen CPU 
cycles here or there don't make much difference, the byte swap code is 
replicated all over the place since the routines are macros, so you're 
paying for it with your I$ bandwidth.
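
For illustration, a portable shift-and-mask version of the swaps counted 
above might look like the following.  The names swap16 and swap32 are 
made up for this sketch and are not the system's bswap routines; a given 
compiler may or may not reduce them to a single instruction.

```c
#include <stdint.h>

/* 2-byte swap: two shifts, one mask, one OR. */
static inline uint16_t
swap16(uint16_t x)
{
	return (uint16_t)(((x << 8) & 0xff00) | (x >> 8));
}

/* 4-byte swap: four shifts, two masks, three ORs. */
static inline uint32_t
swap32(uint32_t x)
{
	return (x << 24) | ((x << 8) & 0x00ff0000U) |
	    ((x >> 8) & 0x0000ff00U) | (x >> 24);
}
```

When these are macros or inline functions, that sequence is expanded at 
each call site rather than shared.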

Eduardo

