tech-kern: Re: m_defrag() addition

Subject: Re: m_defrag() addition
To: David Laight <david@l8s.co.uk>
From: Sam Leffler <sam@errno.com>
List: tech-kern
Date: 03/02/2005 17:14:39
[picking on a semi-random entry in this thread]

David Laight wrote:
>>So, I think the way it needs to be done is:
>>
>>	error = m_defrag(&m, ...);
>>	if (error != 0) {
>>		/* m_defrag() failed, m still points to old chain. */
>>	} else {
>>		/* m_defrag() succeeded, m now points to new chain, old 
>>		chain freed. */
>>	}
> 
> 
> Why not follow the semantics of realloc()?
> 
> 	new = m_defrag(old, ...);
> 	if (!new) {
> 		/* couldn't defrag, old still intact */
> 		....
> 		return;
> 	}
> 	/* new contains mbuf seq */
> 
> Which saves taking the address of the pointer variable.
> (and all the effects that has on optimisers)

I worked on m_defrag and variants last week in response to the interest 
shown in sharing an api with FreeBSD.  I found that depending on the 
application you may want very different algorithms.  For example in 
drivers you often want minimal effort to comply with fragmentation 
requirements before resorting to tossing a frame and _may_ not want to 
spend the effort (and resources) to fully compact and copy the original 
mbuf chain.  This implies (to me) an in-place algorithm that can make a 
best effort.  OTOH it may be important to do anything possible to 
accomplish the goal in which case you may choose a different strategy. 
To deal with these conflicting needs I'm thinking of two "defrag" 
methods, one that works in-place and makes a best effort to meet a goal 
and another that is similar to the current FreeBSD m_defrag--that does 
optimal compaction and fails only if there aren't mbufs/clusters 
available (I wanted this flavor to use a different name but 
unfortunately it's got history behind it).

As to this suggestion, it can easily be done but if you use an algorithm 
that modifies the mbuf chain (e.g. by compacting in-place) then the 
caller must be aware that on return the chain may not only not be 
compacted but also be different.  This adds complexity to the api that 
is worrisome (e.g. in looking at usage I found a couple of drivers doing 
the wrong thing with it--including my own :)) and contrary to the normal 
api's for mbuf routines.  OTOH it is necessary for doing something like:

	m = m_defrag_try(old, ..., maxfrags);
	if (m == NULL)
		m = m_defrag_optimal(old, ...);

What's going on right now in FreeBSD is that I'm encouraging the 
m_defrag owner to stop using it for things like aligning tx frames to 
satisfy h/w requirements (by using a different routine).  Then I've done 
optimization of the current m_defrag code to eliminate some O(n^2) 
behaviour and avoid some wasted work.  Past that I want to test code 
that uses a maxfrags parameter and a different algorithm for coalescing 
data.

I've been staying in touch with Jaromir and obviously you folks can go 
your own route but I'd like to see us share api's.

	Sam