Subject: lib/35535: memcpy() is very slow if not aligned
To: None <lib-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <fuyuki@hadaly.org>
List: netbsd-bugs
Date: 02/01/2007 11:20:01
>Number:         35535
>Category:       lib
>Synopsis:       memcpy() is very slow if not aligned
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    lib-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 01 11:20:00 +0000 2007
>Originator:     Kimura Fuyuki
>Release:        4.99.9
>Organization:
>Environment:
NetBSD lapis.hadaly.org 4.99.9 NetBSD 4.99.9 (LAPIS) #59: Thu Feb  1 16:18:21 JST 2007  fuyuki@lapis.hadaly.org:/usr/obj/sys/arch/amd64/compile/LAPIS amd64
>Description:
On NetBSD/amd64 (and perhaps on i386 too), memcpy() can be very slow because it makes no effort to align the copy.
It is sometimes very difficult or impossible for an application to align the destination address, so the library code should make at least a minimal alignment effort itself.
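
To illustrate what "minimal effort" could mean, here is a rough C sketch. It is not the code from the patch (which is assembly), and the function name copy_align_dst and the 8-byte word size are assumptions made only for this example. It copies single bytes until the destination is 8-byte aligned, then moves whole words, then finishes the tail byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (name and word size are assumptions, not taken
 * from bcopy.S.patch): align the destination first, then copy words.
 * Like memcpy(), it does not support overlapping buffers.
 */
void *copy_align_dst(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: number of bytes needed to bring d up to an 8-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)d & 7);
    if (head > len)
        head = len;
    len -= head;
    while (head-- > 0)
        *d++ = *s++;

    /*
     * Body: d is now 8-byte aligned; s may still be unaligned.
     * The fixed-size memcpy() calls keep the loads and stores well
     * defined in C; compilers usually turn them into single moves.
     */
    while (len >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        len -= sizeof w;
    }

    /* Tail: whatever is left over, one byte at a time. */
    while (len-- > 0)
        *d++ = *s++;

    return dst;
}
```

Aligning the destination rather than the source is a choice made for this sketch; when both cannot be aligned at once, one side still sees unaligned accesses.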

>How-To-Repeat:
Use the following benchmark program to see how slow an unaligned memcpy() is.

http://www.hadaly.org/fuyuki/memcpy_bench.c
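
In case the link is unavailable, a benchmark of this shape can be sketched roughly as follows. This is a hypothetical stand-in, not the contents of memcpy_bench.c: the function name bench_memcpy and its parameters are assumptions, chosen to match the four command-line arguments used in the runs (length, iteration count, destination offset, source offset).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the benchmark (the real memcpy_bench.c may
 * differ): copy len bytes iters times, with the destination and source
 * pointers shifted by dst_off and src_off bytes from what malloc() returns.
 * Returns 0 on success, -1 on allocation failure or a bad copy.
 */
int bench_memcpy(size_t len, unsigned long iters, size_t dst_off, size_t src_off)
{
    /* Calling through a volatile pointer keeps the compiler from
     * inlining or eliding the libc memcpy() being measured. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    /* +1 so that a zero-length request still allocates something. */
    char *dst_base = malloc(len + dst_off + 1);
    char *src_base = malloc(len + src_off + 1);
    if (dst_base == NULL || src_base == NULL) {
        free(dst_base);
        free(src_base);
        return -1;
    }

    char *dst = dst_base + dst_off;
    char *src = src_base + src_off;
    memset(src, 0xa5, len);

    printf("dst:%p src:%p len:%zu\n", (void *)dst, (void *)src, len);

    for (unsigned long i = 0; i < iters; i++)
        copy(dst, src, len);

    /* Sanity check that the last copy produced the expected bytes. */
    int rc = memcmp(dst, src, len) == 0 ? 0 : -1;

    free(dst_base);
    free(src_base);
    return rc;
}
```

A small main() that parses the four arguments with strtoul() and calls bench_memcpy() would let it be run under time(1) in the same way as the runs shown here.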

<plain libc>

$ time ./memcpy_bench 65536 1000000 0 0
dst:0x503000 src:0x513000 len:65536
./memcpy_bench 65536 1000000 0 0  7.30s user 0.00s system 99% cpu 7.349 total

$ time ./memcpy_bench 65536 1000000 1 1
dst:0x503001 src:0x514001 len:65536
./memcpy_bench 65536 1000000 1 1  48.46s user 0.00s system 99% cpu 48.713 total

<patch (below) applied>

$ time ./memcpy_bench 65536 1000000 0 0
dst:0x503000 src:0x513000 len:65536
./memcpy_bench 65536 1000000 0 0  7.36s user 0.00s system 99% cpu 7.406 total

$ time ./memcpy_bench 65536 1000000 1 1
dst:0x503001 src:0x514001 len:65536
./memcpy_bench 65536 1000000 1 1  11.40s user 0.00s system 99% cpu 11.468 total

>Fix:
The following patch reduces amarok's CPU usage to below 1% on my Prescott Celeron.

http://www.hadaly.org/fuyuki/bcopy.S.patch

See also (especially 8.2):
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF