tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: qsort_r
On Sun, Dec 08, 2013 at 10:50:15PM +0000, David Holland wrote:
> > > I have done it by having the original, non-_r functions provide a
> > > thunk for the comparison function, as this is least invasive. If we
> > > think this is too expensive, an alternative is generating a union of
> > > function pointers and making tests at the call sites; another option
> > > is to duplicate the code (hopefully with cpp rather than C&P) but that
> > > seems like a bad plan.
> >
> > I'd prefer to not have another indirect call. The only difference
> > is the definition and expanding a CMP macro differently?
>
> Yes. But I'd rather not duplicate the code...
So, to ground this a bit, the realistic options are:
starting from this (various details elided):
void
sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
{
...
r = cmp(a, b);
...
}
(1) thunks [what's in my current patch]
void
sort_r(void *p, size_t n, size_t sz,
int (*cmp)(const void *, const void *, void *), void *data)
{
...
r = cmp(a, b, data);
...
}
struct sort {
int (*cmp)(const void *, const void *);
};
static int
thunk(const void *a, const void *b, void *data)
{
return ((struct sort *)data)->cmp(a, b)
}
void
sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
{
struct sort x = { .cmp = cmp };
sort_r(p, n, sz, thunk, &x);
}
pros:
+ already written
+ noninvasive
+ reasonably tidy
+ does not duplicate the output code
cons:
- extra indirect calls
(2) union of function pointers
union sort {
int (*cmp)(const void *, const void *);
int (*cmp_r)(const void *, const void *, void *);
};
static void
sort_internal(void *p, size_t n, size_t sz, bool is_r, union sort cmpu,
void *data)
{
...
r = is_r ? cmpu.cmp(a, b) : cmpu.cmp_r(a, b, data);
...
}
void
sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
{
union sort cmpu = { .cmp = cmp };
sort_internal(p, n, sz, true, cmpu, NULL);
}
void
sort_r(void *p, size_t n, size_t sz,
int (*cmp_r)(const void *, const void *, void *), void *data)
{
union sort cmpu = { .cmp_r = cmp };
sort_internal(p, n, sz, true, cmpu, data);
}
pros:
+ not excessively invasive
+ no indirect calls
+ does not duplicate the output code
+ most likely to have unnecessary logic removed by the compiler
cons:
+ a bit ugly
(3) cloning the code with cpp
--- in sort.c ---
#ifdef MAKE_ME_REENTRANT
#define SORT sort_r
#define EXTRA , void *data
#define CMP(a, b) cmp(a, b, data)
#else
#define SORT sort
#define EXTRA
#define CMP(a, b) cmp(a, b)
#endif
void
SORT(void *p, size_t n, size_t sz,
int (*cmp)(const void *, const void * EXTRA) EXTRA)
{
...
r = CMP(a, b);
...
}
--- in sort_r.c ---
#define MAKE_ME_REENTRANT
#include sort.c
pros:
+ noninvasive
+ no runtime overhead
cons:
+ ugly
+ duplicates the output code
Of these I think if we're concerned about the overhead of the thunks
in #1, #2 is the best bet.
While with normal calling conventions sort can end up being identical
object code to sort_r, as various people have noted, I think this kind
of optimization is best left to the compiler. With #2, the compiler
has a fighting chance of being able to do this. (I haven't checked;
I'd be moderately, but not greatly, surprised if any of the compilers
we have do.)
--
David A. Holland
dholland%netbsd.org@localhost
Home |
Main Index |
Thread Index |
Old Index