tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: qsort_r



On Sun, Dec 08, 2013 at 10:50:15PM +0000, David Holland wrote:
 >  > > I have done it by having the original, non-_r functions provide a
 >  > > thunk for the comparison function, as this is least invasive. If we
 >  > > think this is too expensive, an alternative is generating a union of
 >  > > function pointers and making tests at the call sites; another option
 >  > > is to duplicate the code (hopefully with cpp rather than C&P) but that
 >  > > seems like a bad plan.
 >  > 
 >  > I'd prefer to not have another indirect call. The only difference
 >  > is the definition and expanding a CMP macro differently?
 > 
 > Yes. But I'd rather not duplicate the code...

So, to ground this a bit, the realistic options are:

starting from this (various details elided):

   void
   sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
   {
           ...
           r = cmp(a, b);
           ...
   }


(1) thunks [what's in my current patch]

   void
   sort_r(void *p, size_t n, size_t sz,
          int (*cmp)(const void *, const void *, void *), void *data)
   {
           ...
           r = cmp(a, b, data);
           ...
   }

   struct sort {
           int (*cmp)(const void *, const void *);
   };

   static int
   thunk(const void *a, const void *b, void *data)
   {
           return ((struct sort *)data)->cmp(a, b)
   }

   void
   sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
   {
           struct sort x = { .cmp = cmp };
           sort_r(p, n, sz, thunk, &x);
   }

pros:
   + already written
   + noninvasive
   + reasonably tidy
   + does not duplicate the output code
cons:
   - extra indirect calls


(2) union of function pointers

   union sort {
           int (*cmp)(const void *, const void *);
           int (*cmp_r)(const void *, const void *, void *);
   };

   static void
   sort_internal(void *p, size_t n, size_t sz, bool is_r, union sort cmpu,
                 void *data)
   {
           ...
           r = is_r ? cmpu.cmp(a, b) : cmpu.cmp_r(a, b, data);
           ...
   }

   void
   sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *))
   {
           union sort cmpu = { .cmp = cmp };
           sort_internal(p, n, sz, true, cmpu, NULL);
   }

   void
   sort_r(void *p, size_t n, size_t sz,
          int (*cmp_r)(const void *, const void *, void *), void *data)
   {
           union sort cmpu = { .cmp_r = cmp };
           sort_internal(p, n, sz, true, cmpu, data);
   }

pros:
   + not excessively invasive
   + no indirect calls
   + does not duplicate the output code
   + most likely to have unnecessary logic removed by the compiler
cons:
   + a bit ugly


(3) cloning the code with cpp

   --- in sort.c ---

   #ifdef MAKE_ME_REENTRANT
   #define SORT            sort_r
   #define EXTRA           , void *data
   #define CMP(a, b)       cmp(a, b, data)
   #else
   #define SORT            sort
   #define EXTRA
   #define CMP(a, b)       cmp(a, b)
   #endif

   void
   SORT(void *p, size_t n, size_t sz,
        int (*cmp)(const void *, const void *  EXTRA)  EXTRA)
   {
           ...
           r = CMP(a, b);
           ...
   }

   --- in sort_r.c ---

   #define MAKE_ME_REENTRANT
   #include sort.c


pros:
   + noninvasive
   + no runtime overhead
cons:
   + ugly
   + duplicates the output code


Of these I think if we're concerned about the overhead of the thunks
in #1, #2 is the best bet.

While with normal calling conventions sort can end up being identical
object code to sort_r, as various people have noted, I think this kind
of optimization is best left to the compiler. With #2, the compiler
has a fighting chance of being able to do this. (I haven't checked;
I'd be moderately, but not greatly, surprised if any of the compilers
we have do.)

-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index