Subject: Re: m_defrag() addition
To: David Laight <david@l8s.co.uk>
From: Sam Leffler <sam@errno.com>
List: tech-kern
Date: 03/02/2005 17:14:39
[picking on a semi-random entry in this thread]
David Laight wrote:
>>So, I think the way it needs to be done is:
>>
>> error = m_defrag(&m, ...);
>> if (error != 0) {
>> /* m_defrag() failed, m still points to old chain. */
>> } else {
>> /* m_defrag() succeeded, m now points to new chain, old
>> chain freed. */
>> }
>
>
> Why not follow the semantics of realloc()?
>
> new = m_defrag(old, ...);
> if (!new) {
> /* couldn't defrag, old still intact */
> ....
> return;
> }
> /* new contains mbuf seq */
>
> Which saves taking the address of the pointer variable.
> (and all the effects that has on optimisers)
I worked on m_defrag and variants last week in response to the interest
shown in sharing an api with FreeBSD. I found that depending on the
application you may want very different algorithms. For example in
drivers you often want minimal effort to comply with fragmentation
requirements before resorting to tossing a frame and _may_ not want to
spend the effort (and resources) to fully compact and copy the original
mbuf chain. This implies (to me) an in-place algorithm that can make a
best effort. OTOH it may be important to do anything possible to
accomplish the goal in which case you may choose a different strategy.
To deal with these conflicting needs I'm thinking of two "defrag"
methods, one that works in-place and makes a best effort to meet a goal
and another that is similar to the current FreeBSD m_defrag--that does
optimal compaction and fails only if there aren't mbufs/clusters
available (I wanted this flavor to use a different name but
unfortunately it's got history behind it).
As to this suggestion, it can easily be done but if you use an algorithm
that modifies the mbuf chain (e.g. by compacting in-place) then the
caller must be aware that on return the chain may not only not be
compacted but also be different. This adds complexity to the api that
is worrisome (e.g. in looking at usage I found a couple of drivers doing
the wrong thing with it--including my own :)) and contrary to the normal
api's for mbuf routines. OTOH it is necessary for doing something like:
m = m_defrag_try(old, ..., maxfrags);
if (m == NULL)
m = m_defrag_optimal(old, ...);
What's going on right now in FreeBSD is that I'm encouraging the
m_defrag owner to stop using it for things like aligning tx frames to
satisfy h/w requirements (by using a different routine). Then I've done
optimization of the current m_defrag code to eliminate some O(n^2)
behaviour and avoid some wasted work. Past that I want to test code
that uses a maxfrags parameter and a different algorithm for coalescing
data.
I've been staying in touch with Jaromir and obviously you folks can go
your own route but I'd like to see us share api's.
Sam