Subject: On bounce buffers
To: None <tech-kern@netbsd.org>
From: Frank van der Linden <fvdl@netbsd.org>
List: tech-kern
Date: 05/07/2003 05:44:53
Recently I looked at making the ISA DMA bounce support in the bus_dma
backend for i386 and amd64 more generic, to support bounce buffers for
32-bit PCI devices on systems with more than 4G of memory.

I bumped into some issues, most notably:

	* The code uses kernel_map for allocating bounce buffers,
	  which is not interrupt-safe (see the sketch after this
	  list).
	* That in turn means the BUS_DMA_ALLOCNOW flag must always
	  be used, since it avoids on-the-fly bounce buffer
	  allocation.
	* On-the-fly allocation may also be prone to deadlock.
	  Consider the case where pageout goes through a disk
	  controller which needs to allocate a bounce buffer of the
	  same size as the pageout, and out of a specific physical
	  memory range to boot. It's not hard to see how that might
	  deadlock.
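
To make the first point concrete, here is roughly what on-the-fly
bounce allocation looks like (a sketch modeled loosely on the i386
code; the function name and structure are mine, not a quote of the
source):

	/*
	 * Sketch: allocate a bounce buffer on the fly.  The
	 * bus_dmamem_map() step takes KVA from kernel_map, which
	 * may sleep, so this whole path is unusable from
	 * interrupt context.
	 */
	static int
	alloc_bouncebuf(bus_dma_tag_t t, bus_size_t size, int flags)
	{
		bus_dma_segment_t seg;
		caddr_t kva;
		int nsegs, error;

		/* Finding bounceable physical pages is fine... */
		error = bus_dmamem_alloc(t, round_page(size),
		    PAGE_SIZE, 0, &seg, 1, &nsegs, flags);
		if (error)
			return (error);

		/* ...but this maps them into kernel_map; may sleep! */
		error = bus_dmamem_map(t, &seg, nsegs,
		    round_page(size), &kva, flags);
		if (error) {
			bus_dmamem_free(t, &seg, nsegs);
			return (error);
		}
		/* ... stash seg/kva in the map's cookie ... */
		return (0);
	}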

There are some solutions for this:

	* Use an interrupt-safe map, submapped out of kernel_map.
	  This would work, with the proper splvm() protection,
	  but the problem is that submaps take a statically sized
	  chunk out of kernel virtual memory, which may be a waste
	  (certainly, the i386 port can be short on it at times).
	* Always pre-allocate bounce buffers, i.e. make BUS_DMA_ALLOCNOW
	  the default (see the sketch after this list). This is,
	  again, a bit of a waste, because it's unlikely (though not
	  totally impossible) that it'll all actually be used at the
	  same time.
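
Pre-allocation doesn't require anything new from drivers; a map
created with BUS_DMA_ALLOCNOW simply gets its worst-case bounce pages
at creation time. A minimal sketch, from a hypothetical driver attach
function (sc and the sizes are made up):

	/*
	 * Create the DMA map with BUS_DMA_ALLOCNOW, so any bounce
	 * pages it could ever need are reserved here at attach
	 * time, never in interrupt context.
	 */
	error = bus_dmamap_create(sc->sc_dmat, MAXPHYS,
	    MAXPHYS / PAGE_SIZE + 1, MAXPHYS, 0,
	    BUS_DMA_NOWAIT | BUS_DMA_ALLOCNOW, &sc->sc_dmamap);
	if (error) {
		printf("%s: unable to create DMA map, error = %d\n",
		    sc->sc_dev.dv_xname, error);
		return;
	}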

All things considered, I think that always pre-allocating the buffers
is the way to go. Systems with busmastering ISA cards need to do
that already, or they won't work because of the kernel_map usage,
which isn't safe for interrupts. So no change there. And in a system
with more than 4G, the extra memory usage is not a big deal. I think
it should be no more than 100M on a system stacked with several
controllers and gigabit ethernet devices. Example: the ahc driver
would grab some 16M. But again, I think that's acceptable on a system
with that much memory (note that there's NO change for systems with
less than 4G).
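
To put some arithmetic behind the ahc figure (the map count is an
assumption on my part, roughly one map per SCB):

	256 maps * 64K (MAXPHYS) worst case per map = 16M of bounce pages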

It's a bit of a shame to waste kernel virtual memory on this; when
you look at it, the thought occurs that just allocating the physical
pages, and mapping them in at bus_dmamap_load or even only
bus_dmamap_sync time, would work. It would indeed, but the problem
is that you're then back to having to allocate virtual memory in an
interrupt-safe way, which can only be done from a submap, which is
statically sized, so that defeats the whole purpose. Also, there'd
be far more pmap enter/remove operations, with the accompanying TLB
shootdown traffic on MP systems.
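
For reference, that rejected alternative would look something like
this (illustrative only; a single page, no error handling):

	/*
	 * Allocate bounce pages below 4G without mapping them.
	 * Getting the physical pages at interrupt time is fine...
	 */
	struct pglist mlist;
	vaddr_t va;
	int error;

	error = uvm_pglistalloc(PAGE_SIZE, 0, (paddr_t)0xffffffff,
	    PAGE_SIZE, 0, &mlist, 1, 0 /* can't wait */);

	/*
	 * ...but the KVA is the problem: uvm_km_valloc() on
	 * kernel_map may sleep, so at interrupt time the VA would
	 * have to come from a statically sized submap after all.
	 * And every map/unmap here means pmap traffic plus TLB
	 * shootdowns on MP.
	 */
	va = uvm_km_valloc(kernel_map, PAGE_SIZE);	/* may sleep */
	pmap_kenter_pa(va, VM_PAGE_TO_PHYS(TAILQ_FIRST(&mlist)),
	    VM_PROT_READ | VM_PROT_WRITE);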

Lastly, it seems to be a good idea to have different strategies
(VM_PHYSSEG strategies) for anon vs. file/exec pages. Anon pages
are much less likely to be subject to I/O than file/exec pages.
So, if I have a system with, say, 16G, I'd like the anon pages
to be allocated out of the > 4G range, whereas the file pages
should come out of the < 4G range, to avoid bounce buffers.
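
A hypothetical sketch of what that could look like, using the
existing uvm_pagealloc_strat() interface (the VM_FREELIST_BELOW4G and
VM_FREELIST_ABOVE4G freelists don't exist, they're just here to
illustrate the idea):

	/*
	 * Hypothetical freelist split at 4G.  File/exec pages are
	 * the likely I/O targets, so take them from below 4G only;
	 * anon pages prefer the high range and fall back if needed.
	 */

	/* file/exec page: below 4G, to avoid bouncing */
	pg = uvm_pagealloc_strat(uobj, offset, NULL, 0,
	    UVM_PGA_STRAT_ONLY, VM_FREELIST_BELOW4G);

	/* anon page: above 4G preferred, any freelist if that's empty */
	pg = uvm_pagealloc_strat(NULL, 0, anon, 0,
	    UVM_PGA_STRAT_FALLBACK, VM_FREELIST_ABOVE4G);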

Comments?

- Frank

-- 
Frank van der Linden                                            fvdl@netbsd.org
===============================================================================
NetBSD. Free, Unix-like OS. > 45 different platforms.    http://www.netbsd.org/