Subject: Re: FreeBSD Bus DMA (was Re: AdvanSys board support)
To: "Justin T. Gibbs" <gibbs@plutotech.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 06/10/1998 23:10:15
Justin,

I don't see how this answers the specific case I asked about at all.
Let me try again.  Suppose we have the following fictional case, based
closely on a real one, with some names changed to something you may
be more familiar with.

Suppose we have an Alpha CPU with a physical address space greater
than 4 Gbytes.  The Alpha CPU has a PCI bus attachment.  To DMA any
datum over the PCI bus, any datum *at all*, we must set a hardware
mapping register _in the bus controller_. The PCI bridge has mapping
registers for each 8K page of PCI memory space. Each register maps an
8K page of bus-address-space used for the request (on the PCI bus) to
a specific system memory address.  If that mapping isn't set up ahead
of time, we get a bus error and/or a machine check.  This bus-adaptor
mapping is independent of and orthogonal to any "scatter/gather" mapping
that goes on in the host.
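
To make the fictional case concrete, here is a minimal sketch in C of
such a bridge: one mapping register per 8K bus page, and a translate
step that fails when no register has been programmed.  It is a toy
model under stated assumptions, not any real chipset's programming
interface; every name in it (struct bridge, bridge_map, and so on) and
the 64-register size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the fictional bridge; not a real chipset. */
#define BRIDGE_PAGE_SIZE 8192u       /* one mapping register per 8K bus page */
#define BRIDGE_NREGS     64u         /* arbitrary size for the sketch */
#define BRIDGE_UNMAPPED  UINT64_MAX  /* marks a register that is not set */

struct bridge {
    uint64_t reg[BRIDGE_NREGS];      /* system page base for each bus page */
};

void bridge_init(struct bridge *b)
{
    for (size_t i = 0; i < BRIDGE_NREGS; i++)
        b->reg[i] = BRIDGE_UNMAPPED;
}

/*
 * Program registers so that the physically contiguous range
 * [sys_addr, sys_addr + len) is visible on the bus starting at bus page
 * first_page.  Returns the bus address of the first byte, or -1 if the
 * range does not fit in the register file.
 */
int64_t bridge_map(struct bridge *b, unsigned first_page,
                   uint64_t sys_addr, size_t len)
{
    uint64_t off = sys_addr % BRIDGE_PAGE_SIZE;
    size_t npages = (size_t)((off + len + BRIDGE_PAGE_SIZE - 1) / BRIDGE_PAGE_SIZE);

    if (len == 0 || first_page >= BRIDGE_NREGS ||
        npages > BRIDGE_NREGS - first_page)
        return -1;
    for (size_t i = 0; i < npages; i++)
        b->reg[first_page + i] =
            (sys_addr - off) + (uint64_t)i * BRIDGE_PAGE_SIZE;
    return (int64_t)first_page * BRIDGE_PAGE_SIZE + (int64_t)off;
}

/* Tear the mapping down again once the transfer is done. */
void bridge_unmap(struct bridge *b, unsigned first_page, size_t npages)
{
    for (size_t i = 0; i < npages && first_page + i < BRIDGE_NREGS; i++)
        b->reg[first_page + i] = BRIDGE_UNMAPPED;
}

/*
 * What the hardware does on each DMA cycle: translate, or fault.
 * Returning -1 stands in for the bus error / machine check.
 */
int bridge_translate(const struct bridge *b, uint32_t bus_addr,
                     uint64_t *sys_addr)
{
    uint32_t page = bus_addr / BRIDGE_PAGE_SIZE;

    if (page >= BRIDGE_NREGS || b->reg[page] == BRIDGE_UNMAPPED)
        return -1;
    *sys_addr = b->reg[page] + bus_addr % BRIDGE_PAGE_SIZE;
    return 0;
}
```

Note that nothing in this sketch depends on whether the device itself
does scatter/gather: a single contiguous buffer still needs the
register set before the first bus cycle.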

The real example I'm asking about is even worse.
The topology looks like this:

    CPU <-- bridge 1 --> 32-bit bus <--bridge 2 -->  16-bit bus

where there are two independent mappings, with different pagesizes,
one on each of the bridges.  Imagine that there's an ISA bridge on the
far side of the PCI bus, and that ISA bridge has its *own* mapping
registers, for 4k pages, for its entire 16meg DMA space. Again, no DMA
to or from ISA space is possible without setting up a mapping from ISA
address to PCI address, and from that PCI address to system addresses.
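
As a rough illustration of the two-bridge case, the sketch below chains
two independent page-mapping tables with different page sizes (4K on
the ISA side, 8K on the PCI side).  It is an assumption-laden toy, not
a description of any real bridge: struct mapper, isa_to_system and the
register counts are hypothetical names and sizes chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic mapping-register file; not a real device. */
#define UNMAPPED UINT64_MAX
#define MAX_REGS 4096u               /* 16 MB of ISA space / 4K pages */

struct mapper {
    uint32_t page_size;              /* 4096 for the ISA bridge, 8192 for PCI */
    uint32_t nregs;                  /* number of mapping registers in use */
    uint64_t reg[MAX_REGS];          /* target base address for each page */
};

void mapper_init(struct mapper *m, uint32_t page_size, uint32_t nregs)
{
    assert(page_size != 0 && nregs <= MAX_REGS);
    m->page_size = page_size;
    m->nregs = nregs;
    for (uint32_t i = 0; i < nregs; i++)
        m->reg[i] = UNMAPPED;
}

/* Point one page of this bridge's address space at target. */
int mapper_set(struct mapper *m, uint32_t page, uint64_t target)
{
    if (page >= m->nregs)
        return -1;
    m->reg[page] = target;
    return 0;
}

/* One level of translation; -1 models the bus error. */
int mapper_translate(const struct mapper *m, uint64_t addr, uint64_t *out)
{
    uint64_t page = addr / m->page_size;

    if (page >= m->nregs || m->reg[page] == UNMAPPED)
        return -1;
    *out = m->reg[page] + addr % m->page_size;
    return 0;
}

/*
 * ISA address -> PCI address -> system address.  Both levels must have
 * been set up; a missing mapping at either bridge fails the transfer.
 */
int isa_to_system(const struct mapper *isa, const struct mapper *pci,
                  uint32_t isa_addr, uint64_t *sys)
{
    uint64_t pci_addr;

    if (mapper_translate(isa, isa_addr, &pci_addr) != 0)
        return -1;
    return mapper_translate(pci, pci_addr, sys);
}
```

Because the page sizes differ, one 8K PCI register can back two 4K ISA
pages, so the two tables cannot simply share one set of entries.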

On systems like this, you can't _ever_ get away with using a
"default map".



Perhaps the special case of a linear, no-op mapping from bus addresses
to system memory addresses comes up often enough that it's worth
optimizing for.  But as far as I can see, changing the API to assume
there's a "default object" and to use callbacks, just doesn't work.
More details below. If I'm wrong, I'd be very happy for you to point
out where and why.

>> Am I misunderstanding what you mean by
>>hierarchy?

>Most likely.  Perhaps my example above clears this up.

No, if anything it makes me more confident that I'm not misunderstanding.
One basic problem I keep coming back to here is bus adaptors which
require DMA-mapping setup. I haven't seen you address those cases.
Am I missing something?


>In order to cut the space, you must move to a callback to win in all cases.

Yes. But there are machines where the above _has_ to be done, for all
transfers, even for devices where (in your worldview) there 

	   "is no S/G map"

and so the information to construct the relevant mappings simply
doesn't exist.

I see two problems here:

1) In your world, (again, if I understand it correctly) the device
driver can decide that since _it_ doesn't need a map for S/G purposes,
it needn't construct one at all.  As above, this just doesn't work.

I don't see any way to square that with the original claim, that (in
comparison to your reworked system), the NetBSD interface

    }sacrifices speed and memory resources for absolutely no gain in
    }portability.

Am I missing something here?


2) I just don't think the callbacks really cut it.  You're trading
   space for time, on the assumption that most of the time, the
   address mapping required by the bus adaptor (e.g., host bridge) is
   the identity mapping (system memory addresses and bus addresses for
   DMA are the same).  That may be a good assumption for x86es, but
   it's just not a valid assumption for the machines NetBSD runs on.

>The
>AdvanSys controllers, for instance, simply PIO their S/G list directly to
>the card (not a great design, but that's what it is) so no static storage
>of any type is wanted.

On an x86, perhaps. But there are other machines where static storage
_is_ needed, because even this kind of device *Just Won't Work* unless
you also set up mapping registers in the CPU-to-IObus bridge, or
bridges, with the DMA address used by the transfer. And, possibly,
tear them down when it's done.

(Sorry to keep hitting it.  I did say this before, but the point
seems not to have gotten through.)


> If you are willing to force a single S/G copy, you
>would have to export the S/G list format in some way into the MI code so
>that it could be constructed properly.  This could turn nasty.

Yes.  As I keep saying, there are systems where you have *no choice*
but to do this. And it could indeed turn very nasty, especially if the
I/O topology is such that you need to walk the dmamap more than
once. I know of two or three systems that _need_ that, just off the top
of my head.  I think in your design, that just doesn't work at all
(due to the "lifetime" restrictions of the callback).  Is that right,
or did I read too fast?

So, I think the right way to avoid the "nastiness" is to live with the
MI representation of a dmamap.

(BTW, the dmamap isn't a "S/G". It can be _used_ for that, but it may
also be needed for setting mapping registers in a host bus adaptor.
Which seem to be outside your definition of when an "S/G" is
necessary.  Maybe I'm wrong, but could using the "S/G" terminology be
clouding some of these issues?)
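
A minimal sketch of what I mean by walking a persistent,
machine-independent map more than once: the same segment list is first
consumed by a host-bridge pass (which assigns bus addresses) and then
by a device pass (which builds the card's own S/G format).  This is an
illustration only.  The structures and function names are hypothetical
and are not the actual NetBSD bus_dma definitions, and the bridge here
just hands out consecutive bus addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_MAXSEGS 8                /* arbitrary limit for the sketch */

/* Hypothetical MI map: a list of (address, length) segments. */
struct seg    { uint64_t addr; uint32_t len; };
struct dmamap { int nsegs; struct seg segs[MAP_MAXSEGS]; };

/* What the (toy) bridge remembers for each window it has set up. */
struct bridge_win { uint64_t sys_addr; uint32_t bus_addr; uint32_t len; };

/* The device's own S/G descriptor format: 32-bit bus address + length. */
struct dev_sg { uint32_t addr; uint32_t len; };

/*
 * Pass 1, host bridge: give every segment a bus address starting at
 * bus_base, record the system address it stands for, and rewrite the
 * map so later consumers see bus addresses.  win must have room for
 * map->nsegs entries.  Returns the number of windows set up.
 */
int bridge_pass(struct dmamap *map, struct bridge_win *win, uint32_t bus_base)
{
    uint32_t next = bus_base;

    for (int i = 0; i < map->nsegs; i++) {
        win[i].sys_addr = map->segs[i].addr;
        win[i].bus_addr = next;
        win[i].len = map->segs[i].len;
        map->segs[i].addr = next;    /* the device must see bus addresses */
        next += map->segs[i].len;
    }
    return map->nsegs;
}

/*
 * Pass 2, device driver: walk the same map again, after pass 1 has
 * returned, and emit the card-specific S/G list.  Returns the number
 * of descriptors written, or -1 if the list does not fit.
 */
int device_pass(const struct dmamap *map, struct dev_sg *sg, int max)
{
    if (map->nsegs > max)
        return -1;
    for (int i = 0; i < map->nsegs; i++) {
        sg[i].addr = (uint32_t)map->segs[i].addr;
        sg[i].len = map->segs[i].len;
    }
    return map->nsegs;
}
```

The second pass only works because the map outlives the first one,
which is the lifetime question raised above.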

>The CAM SCSI layer is quite paranoid about keeping the order of
>transactions the same as that specified by the client.  

So the SCSI CAM layer blocks?  Good for it.  But my question was
specifically asking about network interfaces and the network
subsystem.  If the SCSI drivers or NIC drivers block, they could use
the WAITOK flag to NetBSD's bus_dma interface and so serialize their
memory requirements.  That could be achieved inside the bus-dma layer
for a given host/bus combination. What is it I'm missing here?


Aside: 

I'm a bit puzzled about the 256k for a 1Mbyte transfer with a 1542,
and the claim that FreeBSD cuts that in half.  I get 8 bytes of DMAmap
per 4k page, which at 256 entries for 1 Mbyte, comes to 64k
bytes. That doesn't seem like a horrendous overhead, for that big a
transfer, on top of what the controller itself needs.  (And note that
in this case the dmamap needn't be in <= 16M memory.)

I don't know the device intimately enough to calculate how much space
it needs, but I don't see where you can get any more than a 64k saving.
Then again, I admit I'm a neophyte at SCSI controllers, so maybe I'm
missing something.