NetBSD-Bugs archive


Re: PR/36864 CVS commit: src/sys/opencrypto



Hi,

I've had a look at it.

In general it is a good idea - see conclusion below.

Let's do some calculations ...

I've compressed /dev/zero with gzip -9 - result:
   size     output
    1024 ->   29  -> factor 35
   10240 ->   45  -> factor 227
  102400 ->  133  -> factor 769
 1024000 -> 1026  -> factor 998
10240000 -> 9981  -> factor 1025

remark: 1024000000 -> 995308 -> factor 1028; it looks like we won't get much better factors than this.
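
(For reference, the same kind of numbers can be reproduced without gzip with a small user-level zlib test program like the one below. This is only a sketch: it uses compress2() at level 9 on all-zero buffers, so the byte counts differ slightly from gzip output because of the different header/trailer.)

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int
main(void)
{
        static const unsigned long sizes[] =
            { 1024, 10240, 102400, 1024000, 10240000 };
        unsigned int i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
                unsigned long insz = sizes[i];
                unsigned char *in = calloc(1, insz);    /* all-zero input */
                uLongf outsz = compressBound(insz);     /* worst-case output size */
                unsigned char *out = malloc(outsz);

                if (in == NULL || out == NULL ||
                    compress2(out, &outsz, in, insz, 9) != Z_OK)
                        return 1;
                printf("%10lu -> %8lu -> factor %lu\n",
                    insz, (unsigned long)outsz, insz / (unsigned long)outsz);
                free(in);
                free(out);
        }
        return 0;
}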

ZBUF is 10 at the moment, so we get the following size factors,
aggregated size factors, and buffer sizes (assuming the last buffer is 1000000):

   4 -    4 -    1950
   8 -   12 -    3901
  16 -   28 -    7812
  32 -   60 -   15625
  64 -  124 -   31250
 128 -  292 -   62500
 256 -  548 -  125000
 512 - 1060 -  230000
1024 - 2084 -  500000
2048 - 4132 - 1000000 -> max buffer 1978038 (near 2 MB)

remark: If we never get a factor much better than about 1030, ZBUF = 8 would be large enough. My test above seems to point to that.
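
To make the doubling arithmetic reproducible, here is a tiny user-level sketch (not the actual deflate.c code; the start size of 4 * input size and the slot count are just the values discussed above):

#include <stdio.h>

#define ZBUF    10      /* number of buffer slots, as in the current source */

int
main(void)
{
        unsigned long inlen = 45;               /* e.g. 10k of zeros after gzip -9 */
        unsigned long bufsz = 4 * inlen;        /* assumed start size: 4 * input size */
        unsigned long total = 0;
        int i;

        for (i = 0; i < ZBUF; i++) {
                total += bufsz;
                printf("slot %2d: buffer %8lu, aggregated %8lu\n",
                    i, bufsz, total);
                bufsz *= 2;
        }
        return 0;
}

With inlen = 45 the aggregated size passes 10k after the sixth slot - that is the IPSEC case discussed below.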

There are two use-cases for the opencrypto stuff - as far as I know:
1. IPSEC
2. /dev/crypto

Let's look at 1 first:
The reasonable maximum size for network packets on common network types is 1500 bytes, or 9k for jumbo frames, so let's assume 10k. This gives a maximum factor of 227 when compressing nothing but zero bytes. That means 45 bytes of input, resulting in a 180-byte start buffer. To do the decompression you then need 6 allocations (180+360+720+1440+2880+5760 = 11340).
This is much better than the 10240/180 = 57 buffer allocations needed with linear extension.

In my PR, I mentioned a minimum start size for the algorithm. I fixed it at 2k with the idea that a power of two is a good choice and all normal network packets will fit into one buffer when decompressed. By now I think it would be better to create a sysctl variable that defines a minimum start buffer size to be used whenever 4*size is smaller. The variable should default to 0 - no minimum size. This allows the admin to set it to e.g. 1500 (or 9k if jumbo frames are used for compressed traffic) so that "normal" network packets fit into one buffer regardless of the compression factor.
For 10k packets, 1500 will lead to only 3 allocations (1500, 3000, 6000).
I think this would be an even better performance boost for IPSEC, because if we can get everything into one buffer, we avoid copying everything again at the end of decompression ...
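
As a sketch of what I mean - the variable name deflate_minstart, its default, and the helper function are made up for illustration, and the sysctl registration itself is omitted:

/*
 * Admin-tunable minimum start buffer size; 0 (the default) means
 * "no minimum", i.e. keep the plain 4 * input-size heuristic.
 */
static int deflate_minstart = 0;

static u_int32_t
initial_bufsize(u_int32_t inlen)
{
        u_int32_t sz = 4 * inlen;       /* current heuristic */

        if (deflate_minstart > 0 && sz < (u_int32_t)deflate_minstart)
                sz = deflate_minstart;  /* e.g. 1500, or 9k for jumbo frames */
        return sz;
}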


Let's look at 2:
In general we cannot assume anything about the size that gets compressed or decompressed, so the limit of 1000000 bytes maximum chunk size may be a hard restriction here. (Sorry, I do not use /dev/crypto, so I don't know anything about really reasonable sizes.) In general, I think for /dev/crypto it would be better to allow any size for compression/decompression and do the copyin/copyout in parts. (There must be a reason why all compression library interfaces are designed like this ...) Again, I do not know whether this would break some semantics of the /dev/crypto interface with overlapping in-/out-buffers.
And I have no idea whether all hardware devices would be able to handle this.
Perhaps there is already an interface mode for /dev/crypto that allows partial work and continuations - I've never looked for this up to now. remark: That may conflict with the change done some PRs ago that sets Z_FINISH for compression - that would need to become "tunable" to support this ...
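
To illustrate what I mean by doing the work "in parts": zlib's interface is already built for piecewise processing, roughly as in this user-level sketch (the consume() callback stands in for a hypothetical partial copyout; error handling is minimal):

#include <string.h>
#include <zlib.h>

#define CHUNK   4096

/*
 * Decompress in/inlen, handing each chunk of output to consume()
 * as soon as it is produced, so the full result never has to sit
 * in memory at once.
 */
static int
inflate_in_parts(unsigned char *in, unsigned int inlen,
    int (*consume)(const unsigned char *, unsigned int))
{
        z_stream zs;
        unsigned char out[CHUNK];
        int ret;

        memset(&zs, 0, sizeof(zs));
        if (inflateInit(&zs) != Z_OK)
                return -1;
        zs.next_in = in;
        zs.avail_in = inlen;

        do {
                zs.next_out = out;
                zs.avail_out = CHUNK;
                ret = inflate(&zs, Z_NO_FLUSH);
                if (ret != Z_OK && ret != Z_STREAM_END)
                        break;
                if (consume(out, CHUNK - zs.avail_out) != 0)
                        break;
        } while (ret != Z_STREAM_END);

        inflateEnd(&zs);
        return ret == Z_STREAM_END ? 0 : -1;
}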

The current patch limits decompression to a little less than 2 MB. This may be OK, but I would be surprised if it is, because 2 MB is not very much for user-level work.


Conclusion:
For IPSEC this change is a great step in the right direction,
but I recommend adding a sysctl variable for a minimum start size too.
Since I do not use /dev/crypto, I cannot judge whether the limit of ca. 2 MB for decompression is acceptable or not.


Last, some ideas from my side for /dev/crypto:

For /dev/crypto usage I think the internal interfaces of opencrypto are not optimal. The approach of putting everything into "objects" and operating on them, creating new objects, sounds like a commonly used approach in object-oriented programming. It is not a good fit for data-stream processing, because it assumes we have endless memory when processing larger amounts of data - it will not scale. It is not a good idea to do this at user level, and inside the kernel such an approach has no place at all - my opinion. The current patch will speed up /dev/crypto work, so it is a step in the right direction for /dev/crypto usage too.

In the /dev/crypto interface the user needs to reserve enough user-level memory to copy everything back, so the user process already knows something about the expected size of the result. (Or it makes a dummy call just to get the required size information without any data transfer.) Perhaps this can be used for even better heuristics when allocating buffer space. If there are limit checks at the /dev/crypto syscall level, the sizes used there should be used here too. (At least a comment referencing these values is missing here.) Perhaps a maximum result size parameter makes sense in the calling interface of the deflate/gzip routines, to avoid useless decompression that cannot be copied back to user level for size reasons. Or the "rest" gets decompressed into a static garbage buffer just to obtain the result size, if required.

The "best solution" would be (from my point of view) to change the whole internal interface so that partial data transfer can happen directly from/to user level and the result buffer gets allocated (and expanded if needed) by the caller. Of course this would break interchangeability with other implementations, and I'm not sure it really makes sense. I also simply do not know what kind of "productive" user-level programs are designed to use compression via /dev/crypto. All network data stream related things should be inside the kernel - my opinion - for performance reasons. Sorry, I'm no friend of placing kernel functionality (mainly data stream or network packet processing) in user-level programs. (racoon (IKE) and IPSEC are a good example here: (fast) data-stream processing inside the kernel - key exchange, access validation and other complex things in a user-level program that configures the kernel IPSEC parts.) For testing and development, /dev/crypto might be a very nice thing, of course.
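
As an illustration of the "maximum result size" and "garbage buffer" ideas above - again only a user-level sketch with made-up names, not the real opencrypto code:

#include <string.h>
#include <zlib.h>

#define SCRATCH 4096

/*
 * Inflate in/inlen into out (at most outmax bytes).  Returns the total
 * decompressed size, which may be larger than outmax; the excess is
 * dumped into a scratch buffer but still counted, so the caller learns
 * how much space a retry would need.  Returns 0 on error.
 */
static unsigned long
inflate_limited(unsigned char *in, unsigned int inlen,
    unsigned char *out, unsigned long outmax)
{
        static unsigned char scratch[SCRATCH];  /* overflow sink */
        z_stream zs;
        unsigned long before, total = 0;
        int ret;

        memset(&zs, 0, sizeof(zs));
        if (inflateInit(&zs) != Z_OK)
                return 0;
        zs.next_in = in;
        zs.avail_in = inlen;

        do {
                if (total < outmax) {
                        zs.next_out = out + total;
                        zs.avail_out = outmax - total;
                } else {
                        zs.next_out = scratch;
                        zs.avail_out = SCRATCH;
                }
                before = zs.total_out;
                ret = inflate(&zs, Z_NO_FLUSH);
                total += zs.total_out - before;
                if (ret != Z_OK && ret != Z_STREAM_END)
                        break;
        } while (ret != Z_STREAM_END);

        inflateEnd(&zs);
        return ret == Z_STREAM_END ? total : 0;
}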

Best regards

W. Stukenbrock


Matthias Drochner wrote:

The following reply was made to PR kern/36864; it has been noted by GNATS.

From: "Matthias Drochner" <drochner%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc:
Subject: PR/36864 CVS commit: src/sys/opencrypto
Date: Fri, 18 Feb 2011 22:02:10 +0000

 Module Name:   src
 Committed By:  drochner
 Date:          Fri Feb 18 22:02:09 UTC 2011
Modified Files:
        src/sys/opencrypto: deflate.c
Log Message:
 redo result buffer allocation, to avoid dynamic allocations:
 -use exponentially growing buffer sizes instead of just linear extension
 -drop the dynamic allocation of buffer metadata introduced in rev.1.8 --
  if the initial array is not sufficient something is wrong
 -apply some (arbitrary, heuristic) limit so that compressed data
  which extract into insane amounts of constant data don't kill the system
 This addresses PR kern/36864 by Wolfgang Stukenbrock. Some tuning
 might be useful, but hopefully this is an improvement already.
To generate a diff of this commit:
 cvs rdiff -u -r1.17 -r1.18 src/sys/opencrypto/deflate.c
Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.



