NetBSD-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: memove performance of NetBSD



Hi,

 Chuck Swiger <cswiger%mac.com@localhost>:
> On Jan 24, 2010, at 10:14 PM, Channa wrote:
>> Thanks you very much for the information i checked the links.
>
> You're most welcome.
>
> [ ... ]
>> However i tested the algorithm as below:
>>
>> memmove(dst,src,sizeof(src) // Performance is good
>> memmove(dst,src+4,sizeof(src) // Performance is good since 'src' is aligned
>>
>> If I perform memove as below:
>>
>> memmove(dst,src + 1,sizeof(src) // Performance degrades
>> memmove(dst,src + 2,sizeof(src) // Performance degrades
>>
>> since in the above calls of memmove() the source is unaligned.
>
[..]
For what it's worth, people who have benchmarked bcopy/memmove tend to
find that unaligned accesses happen infrequently and generally for
fairly small lengths (ie, typical string / log message processing).
If your code does frequent large unaligned copies, you will likely
find that adjusting structs and your code to work with the native
alignment will result in more benefit than trying to hack on
bcopy/memmove....
[..]

I checked the NetBSD-current implementation , in the below file in
NetBSD CVS repo:
'src/common/lib/libc/string/bcopy.c'

I verified the current implementation , the copy operation or move
operation is done word by word.

I have modified the algorithm as below:
<1> Align the source if not aligned by copying only few bytes to destination
<2> Copy 4 words at a time.

I verified the modification with the performance of memmove for
aligned and unaligned combination, the performance does not degrade
for unaligned access.

I have attached the patch for 'bcopy.c' file showing the modifications
i have done.

Please let me know your view on the same.

Thanks & Regards,
Channagoud
--- bcopy.c.org 2010-01-28 10:27:28.000000000 +0530
+++ bcopy.c     2010-01-28 10:34:00.000000000 +0530
@@ -68,6 +68,11 @@
 #define        wmask   (wsize - 1)
 
 /*
+ * Add 4 bytes blokcs to be copied at a time
+ */
+#define        FOUR_BLOCKS     (sizeof (long) << 2)
+
+/*
  * Copy a block of memory, handling overlap.
  * This is the routine that actually implements
  * (the portable versions of) bcopy, memcpy, and memmove.
@@ -87,6 +92,10 @@
        const char *src = src0;
        size_t t;
        unsigned long u;
+       /*
+        * copy length
+        */
+       int len =length;
 
 #if !defined(_KERNEL)
        _DIAGASSERT(dst0 != 0);
@@ -107,18 +116,32 @@
                 * Copy forward.
                 */
                u = (unsigned long)src; /* only need low bits */
+               /*
+                * This loop is entered when source and dest are not aligned
+                */
                if ((u | (unsigned long)dst) & wmask) {
                        /*
                         * Try to align operands.  This cannot be done
                         * unless the low bits match.
                         */
+#if 0
                        if ((u ^ (unsigned long)dst) & wmask || length < wsize)
                                t = length;
                        else
                                t = wsize - (size_t)(u & wmask);
                        length -= t;
                        TLOOP1(*dst++ = *src++);
+#endif 
+                       /*
+                        * Align the source 
+                        */
+                       while ((unsigned long) src % sizeof(long)) {
+                               *dst++ = *src++;
+                               len--;
+                       }
+
                }
+#if 0
                /*
                 * Copy whole words, then mop up any trailing bytes.
                 */
@@ -126,6 +149,25 @@
                TLOOP(*(word *)(void *)dst = *(const word *)(const void *)src; 
src += wsize; dst += wsize);
                t = length & wmask;
                TLOOP(*dst++ = *src++);
+#endif
+               /*
+                * Insted of copying one word at a time , copy 4 words
+                */
+               while (len >= FOUR_BLOCKS) {
+                       *dst++ = *src++;
+                       *dst++ = *src++;
+                       *dst++ = *src++;
+                       *dst++ = *src++;
+                       len -= FOUR_BLOCKS;
+               }
+       
+               /*
+                * Copy if left out bytes whose size is less than FOUR_BLOCKS
+                */
+               while (len) {
+                       *dst++ = *src++:
+                       len--;
+               }
        } else {
                /*
                 * Copy backwards.  Otherwise essentially the same.
@@ -138,17 +180,47 @@
                _DIAGASSERT((unsigned long)src >= (unsigned long)src0);
                u = (unsigned long)src;
                if ((u | (unsigned long)dst) & wmask) {
+#if 0
                        if ((u ^ (unsigned long)dst) & wmask || length <= wsize)
                                t = length;
                        else
                                t = (size_t)(u & wmask);
                        length -= t;
                        TLOOP1(*--dst = *--src);
+#endif
+                       /*
+                        * Align the source 
+                        */
+                       while ((unsigned long) src % sizeof(long)) {
+                               *--dst = *--src;
+                               len--;
+                       }
+
                }
+#if 0
                t = length / wsize;
                TLOOP(src -= wsize; dst -= wsize; *(word *)(void *)dst = 
*(const word *)(const void *)src);
                t = length & wmask;
                TLOOP(*--dst = *--src);
+#endif
+               /*
+                * Insted of copying one word at a time , copy 4 words
+                */
+               while (len >= FOUR_BLOCKS) {
+                       *--dst = *--src;
+                       *--dst = *--src;
+                       *--dst = *--src;
+                       *--dst = *--src;
+                       len -= FOUR_BLOCKS;
+               }
+       
+               /*
+                * Copy if left out bytes whose size is less than FOUR_BLOCKS
+                */
+               while (len) {
+                       *--dst = *--src;
+                       len--;
+               }
        }
 done:
 #if defined(MEMCOPY) || defined(MEMMOVE)


Home | Main Index | Thread Index | Old Index