tech-kern archive
notes from running will-it-scale
Hello,
I recently took an opportunity to run cross-systems microbenchmarks
with will-it-scale and included NetBSD (amd64).
https://people.freebsd.org/~mjg/freebsd-dragonflybsd-netbsd-v2.txt
[no linux in this doc, I will probably create a new one soon(tm)]
The system has a lot of problems in the vfs layer. vm is a mixed bag,
with multithreaded cases lagging behind and some singlethreaded ones
being pretty good (at least one wins against the other systems).
Notes:
- rdtscp is very expensive in VMs, yet the kernel executes it
unconditionally by calling vfs_timestamp. Both FreeBSD and DragonflyBSD
have a knob to change the resolution (and consequently avoid the
instruction). I think you should introduce it and default to less
accuracy in VMs. Sample results:
stock pipe1: 2413901
patched pipe1: 3147312
stock vfsmix: 13889
patched vfsmix: 73477
- sched_yield is apparently a nop when the binary is not linked with
pthread. This does not match other systems and is probably a bug.
- pmap_zero_page/pmap_copy_page compile to atrocious code which keeps
checking for alignment. The compiler does not know what values can be
assigned to pmap_direct_base and improvises.
0xffffffff805200c3 <+0>: add 0xf93b46(%rip),%rdi # 0xffffffff814b3c10 <pmap_direct_base>
0xffffffff805200ca <+7>: mov $0x1000,%edx
0xffffffff805200cf <+12>: xor %eax,%eax
0xffffffff805200d1 <+14>: test $0x1,%dil
0xffffffff805200d5 <+18>: jne 0xffffffff805200ff <pmap_zero_page+60>
0xffffffff805200d7 <+20>: test $0x2,%dil
0xffffffff805200db <+24>: jne 0xffffffff8052010b <pmap_zero_page+72>
0xffffffff805200dd <+26>: test $0x4,%dil
0xffffffff805200e1 <+30>: jne 0xffffffff80520116 <pmap_zero_page+83>
0xffffffff805200e3 <+32>: mov %edx,%ecx
0xffffffff805200e5 <+34>: shr $0x3,%ecx
0xffffffff805200e8 <+37>: rep stos %rax,%es:(%rdi)
0xffffffff805200eb <+40>: test $0x4,%dl
0xffffffff805200ee <+43>: je 0xffffffff805200f1 <pmap_zero_page+46>
0xffffffff805200f0 <+45>: stos %eax,%es:(%rdi)
0xffffffff805200f1 <+46>: test $0x2,%dl
0xffffffff805200f4 <+49>: je 0xffffffff805200f8 <pmap_zero_page+53>
0xffffffff805200f6 <+51>: stos %ax,%es:(%rdi)
0xffffffff805200f8 <+53>: and $0x1,%edx
0xffffffff805200fb <+56>: je 0xffffffff805200fe <pmap_zero_page+59>
0xffffffff805200fd <+58>: stos %al,%es:(%rdi)
0xffffffff805200fe <+59>: retq
0xffffffff805200ff <+60>: stos %al,%es:(%rdi)
0xffffffff80520100 <+61>: mov $0xfff,%edx
0xffffffff80520105 <+66>: test $0x2,%dil
0xffffffff80520109 <+70>: je 0xffffffff805200dd <pmap_zero_page+26>
0xffffffff8052010b <+72>: stos %ax,%es:(%rdi)
0xffffffff8052010d <+74>: sub $0x2,%edx
0xffffffff80520110 <+77>: test $0x4,%dil
0xffffffff80520114 <+81>: je 0xffffffff805200e3 <pmap_zero_page+32>
0xffffffff80520116 <+83>: stos %eax,%es:(%rdi)
0xffffffff80520117 <+84>: sub $0x4,%edx
0xffffffff8052011a <+87>: jmp 0xffffffff805200e3 <pmap_zero_page+32>
The thing to do in my opinion is to just provide dedicated asm funcs.
This is the equivalent on FreeBSD (ifunc'ed):
ENTRY(pagezero_std)
PUSH_FRAME_POINTER
movl $PAGE_SIZE/8,%ecx
xorl %eax,%eax
rep
stosq
POP_FRAME_POINTER
ret
END(pagezero_std)
ENTRY(pagezero_erms)
PUSH_FRAME_POINTER
movl $PAGE_SIZE,%ecx
xorl %eax,%eax
rep
stosb
POP_FRAME_POINTER
ret
END(pagezero_erms)
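For illustration, a rough userland C sketch of the same idea: two
page-zeroing variants plus a selector that picks one based on the ERMS
CPUID bit (leaf 7, subleaf 0, EBX bit 9). A plain function pointer stands
in for ifunc resolution here. All sketch_* names and SKETCH_PAGE_SIZE are
made up for this example; this is not the FreeBSD or NetBSD code, and the
inline asm assumes GCC or Clang on amd64 (other targets fall back to
memset so the file still builds).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the example */

typedef void (*sketch_pagezero_fn)(void *);

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Zero one page with 8-byte stores, in the spirit of pagezero_std. */
void
sketch_pagezero_std(void *page)
{
	size_t n = SKETCH_PAGE_SIZE / 8;

	assert(page != NULL);
	__asm__ volatile("rep stosq"
	    : "+D" (page), "+c" (n)
	    : "a" (0UL)
	    : "memory");
}

/* Zero one page with byte stores, in the spirit of pagezero_erms. */
void
sketch_pagezero_erms(void *page)
{
	size_t n = SKETCH_PAGE_SIZE;

	assert(page != NULL);
	__asm__ volatile("rep stosb"
	    : "+D" (page), "+c" (n)
	    : "a" (0UL)
	    : "memory");
}

/* ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9. */
int
sketch_cpu_has_erms(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 9) & 1;
}
#else
/* Portable fallback so the sketch still builds on other targets. */
void sketch_pagezero_std(void *page) { memset(page, 0, SKETCH_PAGE_SIZE); }
void sketch_pagezero_erms(void *page) { memset(page, 0, SKETCH_PAGE_SIZE); }
int sketch_cpu_has_erms(void) { return 0; }
#endif

/* Pick the variant once (e.g. at boot); callers then use the pointer. */
sketch_pagezero_fn
sketch_pagezero_select(void)
{
	return sketch_cpu_has_erms() ? sketch_pagezero_erms : sketch_pagezero_std;
}
```

A caller would do something like "sketch_pagezero_fn zero =
sketch_pagezero_select();" once and then "zero(page);" per page. Unlike a
real ifunc, this costs an indirect call on every use.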
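Going back to the vfs_timestamp note, the kind of knob meant there could
look roughly like the userland sketch below. In coarse mode the timestamp
is copied from a value refreshed by a periodic tick, so the expensive
clock read is skipped; in precise mode the clock is read on every call.
Everything named sketch_* is hypothetical, clock_gettime merely stands in
for the rdtscp-backed read, and this does not reproduce the FreeBSD or
DragonflyBSD implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical precision levels for the example. */
enum sketch_ts_precision {
	SKETCH_TS_COARSE,	/* reuse the value cached by the last tick */
	SKETCH_TS_PRECISE	/* query the clock on every call */
};

enum sketch_ts_precision sketch_ts_knob = SKETCH_TS_COARSE;
struct timespec sketch_ts_cached;
unsigned long sketch_precise_reads;	/* how often the costly path ran */

/* Stand-in for a periodic timer tick refreshing the cached time. */
void
sketch_ts_tick(void)
{
	clock_gettime(CLOCK_REALTIME, &sketch_ts_cached);
}

/* Stand-in for vfs_timestamp: only the precise mode reads the clock. */
void
sketch_vfs_timestamp(struct timespec *ts)
{
	assert(ts != NULL);
	if (sketch_ts_knob == SKETCH_TS_PRECISE) {
		sketch_precise_reads++;
		clock_gettime(CLOCK_REALTIME, ts);
	} else {
		*ts = sketch_ts_cached;
	}
}
```

With the knob left at SKETCH_TS_COARSE, any number of
sketch_vfs_timestamp calls between two ticks return the same cached value
and never touch the clock. The trade-off is that files modified within
the same tick get identical timestamps.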
--
Mateusz Guzik <mjguzik gmail.com>