Subject: Re: [RFC] Interface to hardware-assisted data movers
To: None <cgd@broadcom.com>
From: Darren Reed <darrenr@reed.wattle.id.au>
List: tech-kern
Date: 06/22/2002 12:04:02
In some email I received from cgd@broadcom.com, sie wrote:
> 
> * the load balancing algorithm, etc., seems a bit ad-hoc.
>   additionally, static assignments of sessions to back-ends for all
>   times also seems limiting.  why restrict by describing it that way?
> 
>   random thought that popped into my head: if you have some kind of HW
>   assist module which gets removed from the system (!!), in current
>   scheme all dmover clients who happened to have their sessions
>   assigned to that module will need to squish and create sessions
>   anew.
> 
>   requirement that hw be used first is kinda lame...  what if your xor
>   engine is maxed out but you've got a dual-processor system that's
>   idle waiting on xors to finish?

I've skipped some parts of this discussion, but perhaps I can add
a few comments here...

...it would seem, from this point, that dmover/xform back ends should
only be allowed to register for operations that the kernel already
supports without hardware assistance.  That way the kernel can do a
small request (say) itself while the hardware is busy doing a big
request, without being penalised.  Although it didn't get mentioned,
it'd otherwise seem possible to compile a kernel without DES/3-DES
but then issue requests to a xform backend.  hmmm, would that be
considered a "useful" feature?
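
A minimal sketch of that registration rule, in C.  Everything here
(sw_ops, sw_op_supported, backend_register_op, the operation names)
is hypothetical and only illustrates the check; it is not the
proposed dmover interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of operations this kernel implements in software. */
static const char *const sw_ops[] = { "zero", "copy", "xor2" };

/* True if the kernel has a hardware-unassisted version of the op. */
static bool
sw_op_supported(const char *op)
{
	for (size_t i = 0; i < sizeof(sw_ops) / sizeof(sw_ops[0]); i++)
		if (strcmp(sw_ops[i], op) == 0)
			return true;
	return false;
}

/*
 * A back end may only register for an op that has a software
 * fallback.  Returns 0 on success, -1 if the registration is refused.
 */
static int
backend_register_op(const char *backend, const char *op)
{
	(void)backend;		/* unused in this sketch */
	return sw_op_supported(op) ? 0 : -1;
}
```

With this rule a kernel built without DES would refuse a "des-cbc"
registration from a crypto card, which is exactly the case the
question above is about.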

...maybe during autoconfiguration, information about how much work
each dmover/xform "backend" can do is stored somewhere.  At bootup
the kernel would try to measure how fast its own native dmover/xform
operations are, whereas cards would have some sort of table with this
info in it.  To use the bid idea you mentioned, Chris, maybe this
is a seed for calculating the value of a "bid" ?
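
One way such a seed could feed a bid, sketched in C.  The struct, the
field names and the "lower bid wins" convention are all assumptions
made for illustration: the bid is the estimated time until a new
request would finish, given the stored rate and the work already
queued on that back end.

```c
#include <stdint.h>

/* Hypothetical per-back-end record filled in at autoconfiguration. */
struct backend_rate {
	const char *name;
	uint64_t bytes_per_sec;	/* measured at boot (CPU) or from a table (card) */
	uint64_t queued_bytes;	/* work already outstanding on this back end */
};

/*
 * Bid = estimated microseconds until a request of len bytes completes.
 * Lower is better.  The multiplication can overflow for absurdly large
 * queues; a real implementation would have to guard against that.
 */
static uint64_t
backend_bid(const struct backend_rate *b, uint64_t len)
{
	if (b->bytes_per_sec == 0)
		return UINT64_MAX;	/* unusable back end never wins */
	return (b->queued_bytes + len) * 1000000 / b->bytes_per_sec;
}
```

Because queued work is part of the bid, a fast engine that is maxed
out would lose to an idle CPU for a small request, rather than the
hardware always being used first.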

Furthermore, there may be times when it is faster to use the CPU in
a system than try and program a device to do some particular work.
There was a paper at last year's Usenix Security symposium on what
a particular engineer had to do in order to get a particular crypto
card to do better than a few kb/sec of crypto.  Although this may be
device specific, if setup times for hardware to do particular ops
are larger than for CPU based ones, why use the hardware ?
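
The setup-time trade-off can be sketched as a simple cost model in C.
The names (cost_model, est_us, prefer_hw) are hypothetical, and any
setup times or rates plugged in would be made up or measured, not
taken from a real device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cost model: fixed setup time plus a streaming rate. */
struct cost_model {
	uint64_t setup_us;	/* time to program the device / enter the routine */
	uint64_t bytes_per_sec;	/* sustained rate once running; must be nonzero */
};

/* Estimated microseconds to process len bytes. */
static uint64_t
est_us(const struct cost_model *m, uint64_t len)
{
	return m->setup_us + len * 1000000 / m->bytes_per_sec;
}

/* Use the hardware only when it is expected to finish sooner. */
static bool
prefer_hw(const struct cost_model *cpu, const struct cost_model *hw,
    uint64_t len)
{
	return est_us(hw, len) < est_us(cpu, len);
}
```

For example, with a CPU at 100 MB/sec and 1us of setup against a card
at 1 GB/sec and 500us of setup, a 1k request is estimated at 11us on
the CPU and 501us on the card, so the CPU wins; at 1MB the card wins.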