tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

[PATCH] implementing a faster algorithm for strpbrk(3) in libc



The new algorithm does not use any array initialisation.
Instead of that the only integer variable "index" is initialized.
It is not using any bitwise operations and shifts as well.

The well-known algorithm (an efficient representation for sparse sets) is
mentioned as exercise 2.12 in "The Design and Analysis of Computer Algorithms"
by Alfred Aho, John Hopcroft and Jeffrey Ullman. It is described here
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.7319
and here
http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html
---
 lib/libc/string/strpbrk.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/lib/libc/string/strpbrk.c b/lib/libc/string/strpbrk.c
index 62c83f5..2f663d4 100644
--- a/lib/libc/string/strpbrk.c
+++ b/lib/libc/string/strpbrk.c
@@ -33,23 +33,40 @@ __RCSID("$NetBSD$");
 #include <limits.h>
 #include <string.h>
 
+#define FAST_STRPBRK 1
+#define UC(a) ((unsigned int)(unsigned char)(a))
+
+#if FAST_STRPBRK
+#define ADD_NEW_TO_SET(i) (set[inv[i]=index++]=(i))
+#define IS_IN_SET(i) (inv[i]<index && set[inv[i]]==(i))
+#define ADD_TO_SET(i) (IS_IN_SET(i)||ADD_NEW_TO_SET(i))
+#else
+#define IS_IN_SET(i) (set[(i) >> 3] & idx[(i) & 7])
+#define ADD_TO_SET(i) (set[(i) >> 3] |= idx[(i) & 7])
+#endif
+
 char *
 strpbrk(const char *s, const char *charset)
 {
+#if FAST_STRPBRK
+       uint8_t set[256], inv[256], index=0;
+#else
        static const size_t idx[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };
        uint8_t set[32];
-#define UC(a) ((unsigned int)(unsigned char)(a))
+#endif
 
        _DIAGASSERT(s != NULL);
        _DIAGASSERT(charset != NULL);
 
+#if !FAST_STRPBRK
        (void)memset(set, 0, sizeof(set));
+#endif
 
        for (; *charset != '\0'; ++charset)
-               set[UC(*charset) >> 3] |= idx[UC(*charset) & 7];
+               ADD_TO_SET(UC(*charset)) ;
 
        for (; *s != '\0'; ++s)
-               if (set[UC(*s) >> 3] & idx[UC(*s) & 7])
+               if (IS_IN_SET(UC(*s)))
                        return __UNCONST(s);
        return NULL;
 }
-- 
1.5.6.3



Home | Main Index | Thread Index | Old Index