Subject: patch for amd64 asm string functions
To: None <port-amd64@netbsd.org>
From: Blair Sadewitz <blair.sadewitz@gmail.com>
List: port-amd64
Date: 08/04/2007 18:20:31
------=_Part_91149_4505636.1186266031495
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I'd like to hear opinions on the following:

I've been using the amd64 string functions
(src/common/lib/libc/arch/x86_64/string/) modified by
<fuyuki@hadaly.org> (see netbsd-bugs in January)
for months now without incident (except when troubleshooting to make
sure that it wasn't a problem).  I have an EM64T processor, and after
rebuilding the tree with -mtune=nocona and applying this patch, the
system is noticeably faster.  I have confirmed the results he spoke of
in his emails to netbsd-bugs with his memcpy benchmark (see:
<http://www.hadaly.org/fuyuki/>).
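
For anyone who wants to try a comparable measurement, a minimal sketch
of a memcpy throughput test in C follows.  It is not the benchmark
linked above; the name sketch_memcpy_rate, the buffer size, and the
iteration count are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not the benchmark referenced in this mail).
 * Copies `size` bytes `iters` times and returns an approximate rate in
 * MB/s, 0.0 if the run was too short to time, or -1.0 if allocation
 * failed.
 */
double sketch_memcpy_rate(size_t size, unsigned iters)
{
    /* Calling through a volatile pointer keeps the compiler from
     * inlining or eliding the copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    unsigned char *src, *dst;
    clock_t start, stop;
    double secs;
    unsigned i;

    assert(size > 0 && iters > 0);

    src = malloc(size);
    dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);    /* touch the pages before timing */
    memset(dst, 0, size);

    start = clock();            /* CPU time, not wall-clock time */
    for (i = 0; i < iters; i++)
        copy(dst, src, size);
    stop = clock();

    free(src);
    free(dst);

    secs = (double)(stop - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)size * iters) / (secs * 1e6);
}
```

Calling sketch_memcpy_rate((size_t)4 << 20, 256) with the stock and the
patched libc gives a rough comparison; varying the size shows how the
rate changes for buffers smaller and larger than the caches.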

Also, when we move to gcc 4.2, we should probably build the tree with
-mtune=generic, which tunes fairly for both AMD and Intel processors.
Until then--according to what I've read on the GCC lists and such--the
best thing to do is use -mtune=nocona, as the performance hit for AMD
processors is negligible (they do, after all, have to compete, and
that means running code optimized for Intel processors).  On the other
hand, EM64T processors pay a substantial price (up to 20% loss in some
benchmarks I've seen) for -mtune=k8 (from -march=k8, our default).  I
don't think there's much of a difference between -march=nocona and
-mtune=nocona anyway, especially now that Intel has cloned AMD's
instruction set more faithfully.

The only thing I changed from the author's original patch is adding an
#ifdef _KERNEL...#endif at lines 62 and 81 of memset.S, as the kernel
doesn't do huge memcpy operations AFAIK, and so using the hints there
has virtually no chance of being productive and a substantial chance
of being counterproductive.
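
For readers who would rather not decode the attachment, the hinted path
in memset.S can be sketched roughly in C as below.  This is an
illustration, not the patch itself: sketch_memset and NT_THRESHOLD are
made-up names, and the non-temporal store is written with the SSE2
intrinsic _mm_stream_si64 (which compiles to movnti) instead of
assembly.  Note that in the diff as posted the guard reads
"#if !defined(_KERNEL)" with the hinted loop in the "#else" branch,
which looks like the reverse of the intent described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si64 (movnti), _mm_sfence */
#endif

/* Hypothetical name for the patch's `cmpq $0xfff,%rdx` cutoff. */
#define NT_THRESHOLD 0xfffu

/* Hypothetical function; not part of the NetBSD tree. */
void *sketch_memset(void *dst, int c, size_t len)
{
    unsigned char *p = dst;
    /* Replicate the fill byte into all eight bytes of a word. */
    uint64_t word = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
    size_t i;

    assert(dst != NULL || len == 0);

    if (len > 15) {             /* short fills skip the alignment work */
        size_t head = (size_t)(-(uintptr_t)p & 7);
        size_t words;

        for (i = 0; i < head; i++)      /* set until word aligned */
            *p++ = (unsigned char)c;
        len -= head;
        words = len >> 3;

#if defined(__x86_64__)
        if (len > NT_THRESHOLD) {
            long long w;

            memcpy(&w, &word, sizeof w);
            /* Large fill: store around the cache, then fence. */
            for (i = 0; i < words; i++, p += 8)
                _mm_stream_si64((long long *)p, w);
            _mm_sfence();
        } else
#endif
        {
            /* Ordinary word stores (rep stosq in the assembly). */
            for (i = 0; i < words; i++, p += 8)
                memcpy(p, &word, sizeof word);
        }
        len &= 7;               /* remainder handled by bytes below */
    }
    for (i = 0; i < len; i++)
        *p++ = (unsigned char)c;
    return dst;
}
```

The cutoff is kept at 0xfff here only to mirror the diff; the comment
in the patch suggests the hints pay off mainly when the fill is larger
than the last-level cache.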

Regards,

--Blair

------=_Part_91149_4505636.1186266031495
Content-Type: application/octet-stream; name=amd64-string.diff
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="amd64-string.diff"

SW5kZXg6IGNvbW1vbi9saWIvbGliYy9hcmNoL3g4Nl82NC9zdHJpbmcvYmNvcHkuUwo9PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09ClJDUyBmaWxlOiAvY3Zzcm9vdC9zcmMvY29tbW9uL2xpYi9saWJjL2FyY2gveDg2XzY0L3N0
cmluZy9iY29weS5TLHYKcmV0cmlldmluZyByZXZpc2lvbiAxLjEKZGlmZiAtdSAtZCAtcjEuMSBi
Y29weS5TCi0tLSBjb21tb24vbGliL2xpYmMvYXJjaC94ODZfNjQvc3RyaW5nL2Jjb3B5LlMJMjAg
RGVjIDIwMDUgMTk6Mjg6NTEgLTAwMDAJMS4xCisrKyBjb21tb24vbGliL2xpYmMvYXJjaC94ODZf
NjQvc3RyaW5nL2Jjb3B5LlMJMjcgSnVsIDIwMDcgMjE6MjY6NDggLTAwMDAKQEAgLTUwLDE2ICs1
MCwyMyBAQAogI2VuZGlmCiAjZW5kaWYKICNpZiBkZWZpbmVkKE1FTUNPUFkpIHx8IGRlZmluZWQo
TUVNTU9WRSkKLQltb3ZxCSVyZGksJXIxMQkvKiBzYXZlIGRlc3QgKi8KKwltb3ZxCSVyZGksJXJh
eAkvKiBzYXZlIGRlc3QgKi8KICNlbHNlCiAJeGNoZ3EJJXJkaSwlcnNpCiAjZW5kaWYKLQltb3Zx
CSVyZHgsJXJjeAotCW1vdnEJJXJkaSwlcmF4Ci0Jc3VicQklcnNpLCVyYXgKLQljbXBxCSVyY3gs
JXJheAkvKiBvdmVybGFwcGluZz8gKi8KKwltb3ZxCSVyZGksJXJjeAorCXN1YnEJJXJzaSwlcmN4
CisJY21wcQklcmR4LCVyY3gJLyogb3ZlcmxhcHBpbmc/ICovCiAJamIJMWYKIAljbGQJCQkvKiBu
b3BlLCBjb3B5IGZvcndhcmRzLiAqLworCXRlc3RxCSVyZHgsJXJkeAorNDoJanoJMmYKKwl0ZXN0
cQkkNywlcmRpCQkvKiBkZXN0IGlzIGFsaWduZWQ/ICovCisJanoJM2YKKwltb3ZzYgorCWRlY3EJ
JXJkeAorCWptcAk0YgorMzoJbW92cQklcmR4LCVyY3gKIAlzaHJxCSQzLCVyY3gJCS8qIGNvcHkg
Ynkgd29yZHMgKi8KIAlyZXAKIAltb3ZzcQpAQCAtNjcsMTEgKzc0LDkgQEAKIAlhbmRxCSQ3LCVy
Y3gJCS8qIGFueSBieXRlcyBsZWZ0PyAqLwogCXJlcAogCW1vdnNiCi0jaWYgZGVmaW5lZChNRU1D
T1BZKSB8fCBkZWZpbmVkKE1FTU1PVkUpCi0JbW92cQklcjExLCVyYXgKLSNlbmRpZgotCXJldAor
MjoJcmV0CiAxOgorCW1vdnEJJXJkeCwlcmN4CiAJYWRkcQklcmN4LCVyZGkJLyogY29weSBiYWNr
d2FyZHMuICovCiAJYWRkcQklcmN4LCVyc2kKIAlzdGQKQEAgLTg2LDggKzkxLDUgQEAKIAlzdWJx
CSQ3LCVyZGkKIAlyZXAKIAltb3ZzcQotI2lmIGRlZmluZWQoTUVNQ09QWSkgfHwgZGVmaW5lZChN
RU1NT1ZFKQotCW1vdnEJJXIxMSwlcmF4Ci0jZW5kaWYKIAljbGQKIAlyZXQKSW5kZXg6IGNvbW1v
bi9saWIvbGliYy9hcmNoL3g4Nl82NC9zdHJpbmcvYnplcm8uUwo9PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09ClJDUyBmaWxl
OiAvY3Zzcm9vdC9zcmMvY29tbW9uL2xpYi9saWJjL2FyY2gveDg2XzY0L3N0cmluZy9iemVyby5T
LHYKcmV0cmlldmluZyByZXZpc2lvbiAxLjEKZGlmZiAtdSAtZCAtcjEuMSBiemVyby5TCi0tLSBj
b21tb24vbGliL2xpYmMvYXJjaC94ODZfNjQvc3RyaW5nL2J6ZXJvLlMJMjAgRGVjIDIwMDUgMTk6
Mjg6NTEgLTAwMDAJMS4xCisrKyBjb21tb24vbGliL2xpYmMvYXJjaC94ODZfNjQvc3RyaW5nL2J6
ZXJvLlMJMjcgSnVsIDIwMDcgMjE6MjY6NDggLTAwMDAKQEAgLTEsNDQgKzEsMyBAQAotLyoKLSAq
IFdyaXR0ZW4gYnkgSi5ULiBDb25rbGluIDxqdGNATmV0QlNELm9yZz4uCi0gKiBQdWJsaWMgZG9t
YWluLgotICogQWRhcHRlZCBmb3IgTmV0QlNEL3g4Nl82NCBieSBGcmFuayB2YW4gZGVyIExpbmRl
biA8ZnZkbEB3YXNhYmlzeXN0ZW1zLmNvbT4KLSAqLwotCi0jaW5jbHVkZSA8bWFjaGluZS9hc20u
aD4KLQotI2lmIGRlZmluZWQoTElCQ19TQ0NTKQotCVJDU0lEKCIkTmV0QlNEOiBiemVyby5TLHYg
MS4xIDIwMDUvMTIvMjAgMTk6Mjg6NTEgY2hyaXN0b3MgRXhwICQiKQotI2VuZGlmCi0KLUVOVFJZ
KGJ6ZXJvKQotCW1vdnEJJXJzaSwlcmR4Ci0KLQljbGQJCQkJLyogc2V0IGZpbGwgZGlyZWN0aW9u
IGZvcndhcmQgKi8KLQl4b3JxCSVyYXgsJXJheAkJLyogc2V0IGZpbGwgZGF0YSB0byAwICovCi0K
LQkvKgotCSAqIGlmIHRoZSBzdHJpbmcgaXMgdG9vIHNob3J0LCBpdCdzIHJlYWxseSBub3Qgd29y
dGggdGhlIG92ZXJoZWFkCi0JICogb2YgYWxpZ25pbmcgdG8gd29yZCBib3VuZHJpZXMsIGV0Yy4g
IFNvIHdlIGp1bXAgdG8gYSBwbGFpbgotCSAqIHVuYWxpZ25lZCBzZXQuCi0JICovCi0JY21wcQkk
MTYsJXJkeAotCWpiCUwxCi0KLQltb3ZxCSVyZGksJXJjeAkJLyogY29tcHV0ZSBtaXNhbGlnbm1l
bnQgKi8KLQluZWdxCSVyY3gKLQlhbmRxCSQ3LCVyY3gKLQlzdWJxCSVyY3gsJXJkeAotCXJlcAkJ
CQkvKiB6ZXJvIHVudGlsIHdvcmQgYWxpZ25lZCAqLwotCXN0b3NiCi0KLQltb3ZxCSVyZHgsJXJj
eAkJLyogemVybyBieSB3b3JkcyAqLwotCXNocnEJJDMsJXJjeAotCWFuZHEJJDcsJXJkeAotCXJl
cAotCXN0b3NxCi0KLUwxOgltb3ZxCSVyZHgsJXJjeAkJLyogemVybyByZW1haW5kZXIgYnkgYnl0
ZXMgKi8KLQlyZXAKLQlzdG9zYgotCi0JcmV0CisvKgkkTmV0QlNEOiBiemVyby5TLHYgMS4xIDIw
MDUvMTIvMjAgMTk6Mjg6NTEgY2hyaXN0b3MgRXhwICQJKi8KKyNkZWZpbmUgQlpFUk8KKyNpbmNs
dWRlICJtZW1zZXQuUyIKSW5kZXg6IGNvbW1vbi9saWIvbGliYy9hcmNoL3g4Nl82NC9zdHJpbmcv
bWVtc2V0LlMKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PQpSQ1MgZmlsZTogL2N2c3Jvb3Qvc3JjL2NvbW1vbi9saWIvbGli
Yy9hcmNoL3g4Nl82NC9zdHJpbmcvbWVtc2V0LlMsdgpyZXRyaWV2aW5nIHJldmlzaW9uIDEuMQpk
aWZmIC11IC1kIC1yMS4xIG1lbXNldC5TCi0tLSBjb21tb24vbGliL2xpYmMvYXJjaC94ODZfNjQv
c3RyaW5nL21lbXNldC5TCTIwIERlYyAyMDA1IDE5OjI4OjUxIC0wMDAwCTEuMQorKysgY29tbW9u
L2xpYi9saWJjL2FyY2gveDg2XzY0L3N0cmluZy9tZW1zZXQuUwkyNyBKdWwgMjAwNyAyMToyNjo0
OCAtMDAwMApAQCAtMTAsMTEgKzEwLDE5IEBACiAJUkNTSUQoIiROZXRCU0Q6IG1lbXNldC5TLHYg
MS4xIDIwMDUvMTIvMjAgMTk6Mjg6NTEgY2hyaXN0b3MgRXhwICQiKQogI2VuZGlmCiAKKyNpZm5k
ZWYgQlpFUk8KIEVOVFJZKG1lbXNldCkKKyNlbHNlCitFTlRSWShiemVybykKKyNlbmRpZgorI2lm
bmRlZiBCWkVSTwogCW1vdnEJJXJzaSwlcmF4Ci0JYW5kcQkkMHhmZiwlcmF4CiAJbW92cQklcmR4
LCVyY3gKIAltb3ZxCSVyZGksJXIxMQorI2Vsc2UKKwltb3ZxCSVyc2ksJXJjeAorCXhvcnEJJXJh
eCwlcmF4CQkvKiBzZXQgZmlsbCBkYXRhIHRvIDAgKi8KKyNlbmRpZgogCiAJY2xkCQkJCS8qIHNl
dCBmaWxsIGRpcmVjdGlvbiBmb3J3YXJkICovCiAKQEAgLTI2LDYgKzM0LDkgQEAKIAljbXBxCSQw
eDBmLCVyY3gKIAlqbGUJTDEKIAorI2lmbmRlZiBCWkVSTworCWFuZHEJJDB4ZmYsJXJheAorCiAJ
bW92YgklYWwsJWFoCQkJLyogY29weSBjaGFyIHRvIGFsbCBieXRlcyBpbiB3b3JkICovCiAJbW92
bAklZWF4LCVlZHgKIAlzYWxsCSQxNiwlZWF4CkBAIC0zNCwyNiArNDUsNDkgQEAKIAltb3ZsCSVl
YXgsJWVkeAogCXNhbHEJJDMyLCVyYXgKIAlvcnEJJXJkeCwlcmF4CisjZW5kaWYKIAogCW1vdnEJ
JXJkaSwlcmR4CQkvKiBjb21wdXRlIG1pc2FsaWdubWVudCAqLwogCW5lZ3EJJXJkeAogCWFuZHEJ
JDcsJXJkeAotCW1vdnEJJXJjeCwlcjgKLQlzdWJxCSVyZHgsJXI4CisJc3VicQklcmR4LCVyY3gK
Kwl4Y2hncQklcmR4LCVyY3gKIAotCW1vdnEJJXJkeCwlcmN4CQkvKiBzZXQgdW50aWwgd29yZCBh
bGlnbmVkICovCi0JcmVwCisJcmVwCQkJCS8qIHNldCB1bnRpbCB3b3JkIGFsaWduZWQgKi8KIAlz
dG9zYgogCi0JbW92cQklcjgsJXJjeAorCW1vdnEJJXJkeCwlcmN4CiAJc2hycQkkMywlcmN4CQkJ
Lyogc2V0IGJ5IHdvcmRzICovCisKKyNpZiAhZGVmaW5lZChfS0VSTkVMKQkJCS8qIFhYWCBJIGRv
bid0IHRoaW5rIHdlIG5lZWQgdGhpcworCQkJCQkgICBpbiB0aGUga2VybmVsLiAgQW0gSSByaWdo
dD8gICAgKi8KIAlyZXAKKyNlbHNlCisJLyoKKwkgKiBJZiB0aGUgc3RyaW5nIGlzIHZlcnkgbG9u
ZywgaXQncyB3b3J0aCBhdm9pZGluZyBjYWNoZQorCSAqIHBvbGx1dGlvbiBieSB1c2luZyBub24t
dGVtcG9yYWwgaGludHMuICBTaG91bGQgYmUgZmFzdGVyIGlmCisJICogdGhlIHN0cmluZyBzaXpl
IGlzIGJpZ2dlciB0aGFuIHRoZSBsYXN0LWxldmVsIGNhY2hlIHNpemUuCisJICovCisJY21wcQkk
MHhmZmYsJXJkeAkJLyoga2VlcCB6ZXJvcGFnZSBmcm9tIGNhY2hlIHB1cmdpbmcgKi8KKwlqbGUJ
TDIKKworTDM6CW1vdm50aQklcmF4LCglcmRpKQorCWFkZHEJJDgsJXJkaQorCWRlY3EJJXJjeAor
CWpueglMMworCXNmZW5jZQorCWptcAlMNAorCitMMjoJcmVwCisjZW5kaWYKIAlzdG9zcQogCi0J
bW92cQklcjgsJXJjeAkJLyogc2V0IHJlbWFpbmRlciBieSBieXRlcyAqLworTDQ6CW1vdnEJJXJk
eCwlcmN4CQkvKiBzZXQgcmVtYWluZGVyIGJ5IGJ5dGVzICovCiAJYW5kcQkkNywlcmN4CiBMMToJ
cmVwCiAJc3Rvc2IKKyNpZm5kZWYgQlpFUk8KIAltb3ZxCSVyMTEsJXJheAorI2VuZGlmCiAKIAly
ZXQK
------=_Part_91149_4505636.1186266031495--