Subject: Re[2]: calculation in UDP
To: None <tech-net@netbsd.org>
From: Thomas Finneid <tfinneid@ifi.uio.no>
List: tech-net
Date: 12/08/2002 20:50:59
On Sun, 8 Dec 2002 07:49:06 -0800 Jason R Thorpe <thorpej@wasabisystems.com> wrote:

> On Sun, Dec 08, 2002 at 04:31:30PM +0100, Thomas Finneid wrote:
> 
>  > Ok, so this is why the ip processing is consuming so many clock
>  > cycles. I haven't gotten to the details of the ip part just yet. I
>  > thought the checksum might have been done in the socket layer when
>  > copying the data from the user process, but I couldn't find it there
>  > either.
> 
> Right.  And, to be accurate, it's not like this makes IP processing "more
> expensive", because the deferred checksum only happens if a higher-level
> protocol set it up, and that higher-level protocol would have had to pay
> the cost of the checksum anyway.  We are just deferring it until we know
> whether or not it will be needed.

Ok, I understand. The checksum thing almost made me pull out all my hair in
one go; fortunately it didn't come to that.
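
For anyone following along, the deferral described in the quoted reply can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the actual NetBSD code: struct pkt, struct iface,
PKT_CSUM_PENDING, udp_output_sketch() and ip_output_sketch() are all
hypothetical names made up for the example. Only the checksum arithmetic
itself (16-bit one's-complement sum, as in RFC 1071) is standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the upper layer wants a checksum, not yet computed. */
#define PKT_CSUM_PENDING 0x1

/* Hypothetical packet and interface descriptors (not the real mbuf/ifnet). */
struct pkt {
    const uint8_t *data;   /* bytes covered by the checksum */
    size_t         len;
    unsigned       flags;
    uint16_t       csum;   /* filled in once the checksum is computed */
};

struct iface {
    int hw_csum;           /* non-zero if the NIC computes the checksum */
};

/* Internet checksum: one's-complement of the one's-complement sum of
 * 16-bit big-endian words. The 32-bit accumulator is fine for
 * packet-sized buffers (well under 128 KB). */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                      /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                  /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* The transport layer only records that a checksum is needed. */
static void udp_output_sketch(struct pkt *pk)
{
    assert(pk != NULL);
    pk->flags |= PKT_CSUM_PENDING;
}

/* The IP layer knows the outgoing interface, so it decides here.
 * Returns 1 if the checksum was computed in software, 0 otherwise. */
static int ip_output_sketch(struct pkt *pk, const struct iface *ifp)
{
    assert(pk != NULL && ifp != NULL);
    if ((pk->flags & PKT_CSUM_PENDING) == 0)
        return 0;                      /* nobody asked for a checksum */
    if (ifp->hw_csum)
        return 0;                      /* leave the flag set for the NIC */
    pk->csum = inet_csum(pk->data, pk->len);
    pk->flags &= ~PKT_CSUM_PENDING;
    return 1;
}
```

In this sketch the software cost shows up inside the ip output step whenever
the interface cannot do the work, which would match the cycles being
accounted to ip rather than to udp.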

I did observe some odd behaviour with the ip layer processing though: the
processing times with hw checksumming are slightly higher than the
processing times without it. Take a look at the measurement results
(these are preliminary) at

http://www.stud.ifi.uio.no/~tfinneid/results/meta-data-p003-g003.html

I am using a 3c905C 10/100 card, which according to what I found out
supports hw checksumming. I turned on hw checksumming for the interface
with the command

        ifconfig ex0 ip4csum

There are several interesting points about these results, but what I am
thinking about right now is that most of the results show that hw assisted
checksumming does not offload the cpu much. There might be obvious reasons
for this, such as the ip layer being idle/waiting while the hw performs the
checksumming operation. Those would then be wasted processor cycles, of
sorts, which could have been used for processing other tcp/udp/ip packets
in the meantime. Or it may just be that I was wrong about the nic
supporting hw checksumming.

Any comments or thoughts?


BTW, the focus of my thesis is on how to reduce the number of processing
cycles used, so that the load on the machine becomes lighter. In other
words, it does not necessarily need to be faster; it just means that the
saved processor cycles can be used elsewhere in the meantime, such as for
application level processing. The domain is high performance servers,
which need any extra clock cycles they can get their hands on.




-- 
Thomas Finneid

email: tfinneid@ifi.uio.no