tech-kern archive


notes from running will-it-scale



Hello,

I recently had the opportunity to run cross-system microbenchmarks
with will-it-scale and included NetBSD (amd64).

https://people.freebsd.org/~mjg/freebsd-dragonflybsd-netbsd-v2.txt
[no Linux in this doc, I will probably create a new one soon(tm)]

The system has a lot of problems in the vfs layer. vm is a mixed bag:
multithreaded cases lag behind, while some singlethreaded ones are
pretty good (and at least one wins against the other systems).

Notes:
- rdtscp is very expensive in virtual machines, yet the kernel
unconditionally executes it by calling vfs_timestamp. Both FreeBSD and
DragonflyBSD have a knob to change the timestamp resolution (and
consequently avoid the instruction). I think you should introduce one
and default to less accuracy on virtual machines. Sample results:
stock pipe1: 2413901
patched pipe1: 3147312
stock vfsmix: 13889
patched vfsmix: 73477
- sched_yield is apparently a nop when the binary is not linked with
pthread. This does not match other systems and is probably a bug.
- pmap_zero_page/pmap_copy_page compile to atrocious code which keeps
checking for alignment. The compiler does not know what values can be
assigned to pmap_direct_base and improvises.

   0xffffffff805200c3 <+0>:       add    0xf93b46(%rip),%rdi        # 0xffffffff814b3c10 <pmap_direct_base>
   0xffffffff805200ca <+7>:       mov    $0x1000,%edx
   0xffffffff805200cf <+12>:      xor    %eax,%eax
   0xffffffff805200d1 <+14>:      test   $0x1,%dil
   0xffffffff805200d5 <+18>:      jne    0xffffffff805200ff <pmap_zero_page+60>
   0xffffffff805200d7 <+20>:      test   $0x2,%dil
   0xffffffff805200db <+24>:      jne    0xffffffff8052010b <pmap_zero_page+72>
   0xffffffff805200dd <+26>:      test   $0x4,%dil
   0xffffffff805200e1 <+30>:      jne    0xffffffff80520116 <pmap_zero_page+83>
   0xffffffff805200e3 <+32>:      mov    %edx,%ecx
   0xffffffff805200e5 <+34>:      shr    $0x3,%ecx
   0xffffffff805200e8 <+37>:      rep stos %rax,%es:(%rdi)
   0xffffffff805200eb <+40>:      test   $0x4,%dl
   0xffffffff805200ee <+43>:      je     0xffffffff805200f1 <pmap_zero_page+46>
   0xffffffff805200f0 <+45>:      stos   %eax,%es:(%rdi)
   0xffffffff805200f1 <+46>:      test   $0x2,%dl
   0xffffffff805200f4 <+49>:      je     0xffffffff805200f8 <pmap_zero_page+53>
   0xffffffff805200f6 <+51>:      stos   %ax,%es:(%rdi)
   0xffffffff805200f8 <+53>:      and    $0x1,%edx
   0xffffffff805200fb <+56>:      je     0xffffffff805200fe <pmap_zero_page+59>
   0xffffffff805200fd <+58>:      stos   %al,%es:(%rdi)
   0xffffffff805200fe <+59>:      retq
   0xffffffff805200ff <+60>:      stos   %al,%es:(%rdi)
   0xffffffff80520100 <+61>:      mov    $0xfff,%edx
   0xffffffff80520105 <+66>:      test   $0x2,%dil
   0xffffffff80520109 <+70>:      je     0xffffffff805200dd <pmap_zero_page+26>
   0xffffffff8052010b <+72>:      stos   %ax,%es:(%rdi)
   0xffffffff8052010d <+74>:      sub    $0x2,%edx
   0xffffffff80520110 <+77>:      test   $0x4,%dil
   0xffffffff80520114 <+81>:      je     0xffffffff805200e3 <pmap_zero_page+32>
   0xffffffff80520116 <+83>:      stos   %eax,%es:(%rdi)
   0xffffffff80520117 <+84>:      sub    $0x4,%edx
   0xffffffff8052011a <+87>:      jmp    0xffffffff805200e3 <pmap_zero_page+32>
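
The alignment tests in the listing above look like what a compiler
emits for a fixed-size memset when it cannot prove that the destination
is aligned. Purely as an illustration (this is not NetBSD code and not
the fix suggested in this message), here is a minimal user-space C
sketch that hands the alignment fact to the compiler through the
GCC/Clang builtin __builtin_assume_aligned. The names sketch_zero_page
and SKETCH_PAGE_SIZE are hypothetical, and whether the checks really
disappear depends on the compiler version and flags in use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE on amd64. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Zero one page. The caller promises that va is page-aligned; the
 * builtin passes that promise on to the compiler so it is free to
 * skip the 1/2/4-byte alignment fixups when expanding memset.
 */
static void
sketch_zero_page(void *va)
{
	/* Debug check of the caller contract; compiled out with NDEBUG. */
	assert(((uintptr_t)va % SKETCH_PAGE_SIZE) == 0);

	/* GCC/Clang extension: returns va, annotated as 4096-aligned. */
	void *p = __builtin_assume_aligned(va, SKETCH_PAGE_SIZE);

	memset(p, 0, SKETCH_PAGE_SIZE);
}
```

A hint like this still leaves the choice of instruction sequence to the
compiler, which is one reason to prefer hand-written routines.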

The thing to do in my opinion is to just provide dedicated asm funcs.
This is the equivalent on FreeBSD (ifunc'ed):

ENTRY(pagezero_std)
        PUSH_FRAME_POINTER
        movl    $PAGE_SIZE/8,%ecx
        xorl    %eax,%eax
        rep
        stosq
        POP_FRAME_POINTER
        ret
END(pagezero_std)

ENTRY(pagezero_erms)
        PUSH_FRAME_POINTER
        movl    $PAGE_SIZE,%ecx
        xorl    %eax,%eax
        rep
        stosb
        POP_FRAME_POINTER
        ret
END(pagezero_erms)
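
Going back to the first note, the resolution knob can be pictured
roughly as follows. This is a minimal user-space C sketch under stated
assumptions, not the FreeBSD or DragonflyBSD implementation: the names
ts_precision, TS_PREC_SEC, TS_PREC_FULL and sketch_timestamp are
hypothetical, and time(NULL) merely stands in for a cached seconds
counter that a kernel could read without touching the TSC.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical precision levels for file timestamps. */
enum ts_precision {
	TS_PREC_SEC,	/* whole seconds only, no precise clock read */
	TS_PREC_FULL	/* full resolution, reads the clock on every call */
};

/* Hypothetical tunable; in a kernel this would be a sysctl. */
static enum ts_precision ts_precision = TS_PREC_FULL;

/*
 * Fill *tsp with the current time at the configured precision.
 * The low-precision path is where a kernel would avoid rdtscp.
 */
static void
sketch_timestamp(struct timespec *tsp)
{
	assert(tsp != NULL);

	switch (ts_precision) {
	case TS_PREC_SEC:
		/* Assumption: stands in for a cheap cached seconds value. */
		tsp->tv_sec = time(NULL);
		tsp->tv_nsec = 0;
		break;
	case TS_PREC_FULL:
	default:
		clock_gettime(CLOCK_REALTIME, tsp);
		break;
	}
}
```

With the tunable set to TS_PREC_SEC, every timestamp comes back with
tv_nsec equal to 0, which is the accuracy traded for the cheaper path.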

-- 
Mateusz Guzik <mjguzik gmail.com>


