Subject: Re: But why?
To: None <miguel@nuclecu.unam.mx>
From: David S. Miller <davem@caip.rutgers.edu>
List: tech-kern
Date: 10/23/1996 22:56:47
   Date: Wed, 23 Oct 1996 21:09:36 -0500
   From: Miguel de Icaza <miguel@nuclecu.unam.mx>

   Anyways, Alan Cox is one of my heroes: with this optimization of his
   for Linux/SPARC, we can deliver a UDP datagram to user space with
   just one memcpy (we allocate skbuffs in the DMA area that the lance
   card has access to, and flip skbuffs around.  Move the skbuff up and
   down as much as we want in the networking stack and just do one
   memcpy to userland at the final delivery time).

   So nice.

Indeed nice.  On the SS10 with Happy Meal ethernet cards the code path
at each end works out to something like:

write(sockfd, &buf, size)
	trap --> sys_write() --> sock_write()
		csum_copy_partial(skb)
			ip_header_csum()
				happy_meal_xmit(skb) --> the wire

the wire --> happy_meal_rx() (the card dma's right into the networking
			      buffer and checksums for me)
	memcpy_tofs(user_buf, skb->data, len);
	trap --> sys_read() (in libc)
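
The csum_copy_partial() step in the transmit path is where the data gets
touched once: the copy and the checksum happen in the same pass.  A
minimal portable sketch of that idea follows.  It is not the kernel's
routine; copy_and_csum() is a hypothetical name, and it computes the
plain RFC 1071 Internet checksum over the bytes as it copies them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's implementation): copy len bytes
 * from src to dst and return the 16-bit Internet checksum (RFC 1071) of
 * the data, reading each source byte only once.
 */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;
    size_t i;

    assert(len == 0 || (dst != NULL && src != NULL));

    /* Copy two bytes at a time and add them as one big-endian word. */
    for (i = 0; i + 1 < len; i += 2) {
        d[i] = s[i];
        d[i + 1] = s[i + 1];
        sum += ((uint32_t)s[i] << 8) | s[i + 1];
    }

    /* A trailing odd byte is treated as if padded with a zero byte. */
    if (i < len) {
        d[i] = s[i];
        sum += (uint32_t)s[i] << 8;
    }

    /* Fold the carries back in (ones' complement addition). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A real kernel version would be written in assembly and would work a word
or more at a time, but the shape is the same: one loop that both moves
and sums the data, so no second pass over the packet is needed.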

I see the data _once_ at each end, at the point where it is copied
in/out of user space, and that is it.  I use no extraneous buffers for
tx/rx, as the card can DMA directly into any buffer I give it.  Now
picture this running full duplex at 100 Mbit and it's pretty
intense.
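
To make the "no extraneous buffers" point concrete, here is a small
user-space sketch of a receive ring where a filled buffer is handed up
by pointer and a fresh one is swapped into its slot, so the driver never
copies the packet.  Everything in it (rx_ring, ring_receive(), the
owned_by_card flag, the sizes) is hypothetical and stands in for real
descriptor handling; malloc() stands in for a DMA-capable allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4      /* hypothetical ring depth */
#define BUF_SIZE  1536   /* room for one Ethernet frame */

struct rx_desc {
    unsigned char *buf;  /* buffer the card would DMA into */
    size_t len;          /* bytes the card wrote */
    int owned_by_card;   /* 1: card may fill it, 0: driver may take it */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    unsigned next;       /* next descriptor the driver examines */
};

/* Give every descriptor a buffer.  Returns 0 on success, -1 on failure. */
static int ring_init(struct rx_ring *r)
{
    assert(r != NULL);
    for (unsigned i = 0; i < RING_SIZE; i++) {
        r->desc[i].buf = malloc(BUF_SIZE);
        if (r->desc[i].buf == NULL) {
            while (i-- > 0)
                free(r->desc[i].buf);
            return -1;
        }
        r->desc[i].len = 0;
        r->desc[i].owned_by_card = 1;
    }
    r->next = 0;
    return 0;
}

/*
 * Take the next filled buffer and flip a fresh one into its slot.
 * Returns the packet buffer (the caller frees it), or NULL if nothing
 * is ready or no replacement buffer could be allocated.
 */
static unsigned char *ring_receive(struct rx_ring *r, size_t *len)
{
    struct rx_desc *d;
    unsigned char *pkt, *fresh;

    assert(r != NULL && len != NULL);
    d = &r->desc[r->next];
    if (d->owned_by_card)
        return NULL;             /* card has not filled this slot yet */

    fresh = malloc(BUF_SIZE);
    if (fresh == NULL)
        return NULL;             /* leave the slot alone, retry later */

    pkt = d->buf;                /* hand the filled buffer up as-is */
    *len = d->len;
    d->buf = fresh;              /* flip in the replacement */
    d->len = 0;
    d->owned_by_card = 1;
    r->next = (r->next + 1) % RING_SIZE;
    return pkt;
}
```

The only data copy left in such a design is the one into the user's
buffer at read time; everything before that moves pointers around.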

Also note that the points at which I do touch the data happen to warm
up the cache/TLB for the user mode task as a nice side effect.

David S. Miller
davem@caip.rutgers.edu