Subject: Re: How do I allocate physically contiguous RAM?
To: None <khym@bga.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 10/30/1997 15:35:34
> On Wed, 29 Oct 97 12:37:28 WET 
>  Dave Huang <khym@bga.com> wrote:
> 
>>  > I'm trying to write an ethernet driver, and would like to get a 32K
>>  > chunk or so of page-aligned, physically contiguous memory (for DMA
>>  > porpoises). Is there an easy way to do this? I took a look at the
>>  > bus_dma stuff, but it looks like it's unimplemented on mac68k. I doubt
>>  > if I have the skills to implement the bus_dma interface, so I'm
>>  > looking for another way :)
>> 
>> You _really_ ought to implement bus_dma... 


>I think Dave's hesitancy stemmed from the fact that any DMA engine
>he's using is on the card, not on the motherboard. It's like bus
>mastering in ISA (I think).

I took that as read. If I can try and fill in some of the edges in
Jason's summary: What you need depends a lot on the NIC board hardware
and the capabilities of the DMA engine it's using.

If the card and DMA engine in question have programmable scatter/gather
registers that can access mbufs (or whatever packet buffers you use)
anywhere in host physical memory, and don't have requirements about
physical contiguity for multipage packets (or you can guarantee
packets never span multiple pages), then bus_dma really is not that
much extra overhead.
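For concreteness, the transmit path for such a card might look roughly
like the sketch below.  The softc layout and the xx_set_sg_entry()
register-programming helper are invented for illustration; the bus_dma
calls themselves are the documented bus_dma(9) interface:

```c
/*
 * Hypothetical transmit path for a scatter/gather-capable NIC.
 * struct xx_softc and xx_set_sg_entry() are assumptions for the
 * sake of the example.
 */
int
xx_start_xmit(struct xx_softc *sc, struct mbuf *m)
{
	bus_dmamap_t map = sc->sc_txmap;
	int error, i;

	/* Resolve the mbuf chain into physical DMA segments. */
	error = bus_dmamap_load_mbuf(sc->sc_dmat, map, m, BUS_DMA_NOWAIT);
	if (error)
		return (error);

	/* Hand each physical segment to the card's S/G registers. */
	for (i = 0; i < map->dm_nsegs; i++)
		xx_set_sg_entry(sc, i, map->dm_segs[i].ds_addr,
		    map->dm_segs[i].ds_len);

	/* Flush CPU caches / bounce buffers before the device reads. */
	bus_dmamap_sync(sc->sc_dmat, map, 0, map->dm_mapsize,
	    BUS_DMASYNC_PREWRITE);
	/* ... kick the DMA engine ... */
	return (0);
}
```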

If for some reason the card/DMA engine insists on a physically
contiguous 32kbyte region of memory -- for packet-buffer descriptors,
or NIC status, or a communication area with the host interface (some
FDDI interfaces really want almost this much) -- then you do need to
allocate 32kbytes of physically contiguous memory, *and* (to be
portable) you need to call into bus_dma to *make* it DMA-safe.  The
last step is necessary because on some systems -- like Alphas, or the
Qbus adaptors in the Black Daemon Book -- bus_dma has to allocate
bus-level resources to set up a DMA mapping.
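That allocate-then-make-DMA-safe sequence looks something like this
(error handling elided; the softc fields are invented, the bus_dma(9)
calls are real):

```c
/* Sketch: 32kbytes of physically contiguous, DMA-safe memory. */
#define	XX_BUFSIZE	(32 * 1024)

bus_dma_segment_t seg;
int rseg;
void *kva;

/* Asking for one segment forces physical contiguity;
 * page alignment via the alignment argument. */
bus_dmamem_alloc(sc->sc_dmat, XX_BUFSIZE, PAGE_SIZE, 0,
    &seg, 1, &rseg, BUS_DMA_NOWAIT);
bus_dmamem_map(sc->sc_dmat, &seg, rseg, XX_BUFSIZE, &kva,
    BUS_DMA_NOWAIT | BUS_DMA_COHERENT);

/* Creating and loading a map is the step that lets the bus
 * back-end allocate whatever bus-level resources it needs
 * (IOMMU entries, Qbus map registers, ...). */
bus_dmamap_create(sc->sc_dmat, XX_BUFSIZE, 1, XX_BUFSIZE, 0,
    BUS_DMA_NOWAIT, &sc->sc_cmap);
bus_dmamap_load(sc->sc_dmat, sc->sc_cmap, kva, XX_BUFSIZE,
    NULL, BUS_DMA_NOWAIT);

/* sc->sc_cmap->dm_segs[0].ds_addr is what you program into
 * the card; kva is what the driver reads and writes. */
```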

If you need the 32K as a physically-contiguous DMAable shared-memory
`packet buffer', like the wd80x3/old SMC cards do, then maybe it's
appropriate to allocate the buffer up front and to explicitly copy
packets into and out of that buffer, like the if_ed driver does (or
did).  See bus_dmamem_map().
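With a buffer mapped that way, the copy-style transmit path reduces to
something like the fragment below ("off" being the card's notion of
where the packet goes in the shared buffer; the softc fields are again
invented):

```c
/* if_ed-style copy-out transmit: copy the mbuf chain into the
 * shared buffer instead of DMA'ing from the mbufs directly. */
m_copydata(m, 0, m->m_pkthdr.len, (caddr_t)sc->sc_bufkva + off);
bus_dmamap_sync(sc->sc_dmat, sc->sc_cmap, off, m->m_pkthdr.len,
    BUS_DMASYNC_PREWRITE);
/* ... tell the card to transmit from offset "off" ... */
```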

I could be misremembering, but I _think_ that's the most recent advice
I got from Jason on how to best handle similarly-ugly hardware on the
TurboChannel bus.  The issue here is that bus_dmamap_load_mbuf() (or
_uio()) isn't what you want, because that's a bus-specific method.  It
knows about the constraints of the bus hardware (the motherboard DMA
engine, in this case) but *not* about any *card*-specific constraints
of your specific hardware. The driver has to deal with those.  Or
preferably, you write a driver that has hooks for a bus-specific
``attachment'' or front-end, that jumps through whatever hoops are
necessary on a given bus.  The LANCE driver is a great example of
this: compare the ISA and PCI front-ends to the TurboChannel
front-end, for example.
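The shape of that split, in the style of the MI LANCE code, is a softc
whose bus-specific pieces are function pointers the front-end fills in
at attach time (the member names here are illustrative, not the actual
am7990 ones):

```c
/*
 * Sketch of a machine-independent chip driver with hooks for a
 * bus-specific front-end.  Member names are assumptions.
 */
struct xx_softc {
	struct device		sc_dev;
	bus_space_tag_t		sc_iot;
	bus_space_handle_t	sc_ioh;
	bus_dma_tag_t		sc_dmat;

	/* Filled in by the bus front-end (ISA, PCI, TC, ...): */
	void	(*sc_copytobuf)(struct xx_softc *, void *, int, int);
	void	(*sc_copyfrombuf)(struct xx_softc *, void *, int, int);
	void	(*sc_hwreset)(struct xx_softc *);
};
```

The MI code only ever calls through the hooks, so each bus front-end
can jump through its own hoops without the chip driver knowing.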

If this is modern hardware with reasonable design, then I'd assume the
NIC chip itself can DMA packets to anywhere in a 32-bit address space;
and any constraints are from the DMA engine. In that case, the
chipset-specific driver shouldn't use *any* knowledge of the
motherboard DMA. It should just make bus_dma or bus.h calls. That
way, all the motherboard-specific stuff is inside bus_dma, and the
chip-specific driver should be usable elsewhere.

I don't know that much about recent Macs, but I think there's so much
variation between models that maybe the Alpha ports' bus.h is a better
starting model than the i386 version.


>There are some serial cards which could use such support
>if it ever gets in. :-)

Yes, we know, we know.  IMO, doing this properly really needs a rework
of the interface between the tty subsystem and the lower-level tty
drivers. On transmit, you want to pass multiple chars to a serial
device, so the serial device can choose between DMA and per-char I/O.
The receive side has a more complicated throughput/latency tradeoff.