NetBSD-Bugs archive


port-i386/38619: Possible context switch / benchmark improvements



>Number:         38619
>Category:       port-i386
>Synopsis:       Possible context switch / benchmark improvements
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    port-i386-maintainer
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Fri May 09 21:55:00 +0000 2008
>Originator:     Andrew Doran
>Release:        4.99.62
>Organization:
The NetBSD Project
>Environment:
n/a
>Description:
Truncated output from tprof/tpfmt.sh, sampled during a 10 minute run 
with mysql sysbench, 200 client threads, on a dual core system.

It shows some areas where possible improvements could be made for
context switch performance, esp. where multithreaded processes are
involved.

Many of the samples seem to be from points where MMU state is being 
modified, and from points where cache lines would regularly move
between the two cores.

58147   c010ced2        mutex_enter+0x12
36515   c0790ab7        _atomic_inc_32+0x7
34714   c010cff4        mutex_spin_enter+0x34
29616   c0790a4e        _atomic_add_32_nv+0xe
24226   c0100392        cpu_switchto+0x22
23846   c05136d7        lcr3+0x7
22954   c0790b20        _atomic_cas_32+0x10
18604   c01004b0        Xsyscall
18339   c0513697        lldt+0x7
18258   c0790a6b        _atomic_and_32+0xb
14277   c050208a        pmap_activate+0x2a
14055   c04525ef        sleepq_wake+0x2f
11758   c0790adb        _atomic_or_32+0xb
11477   c0790b43        _membar_consumer+0x3
10178   c0100c8c        spllower+0x2c
10103   bb9eec67        start
9788    c0100723        copyout+0x33
9226    bb7dccc7        start
8641    c0435472        fd_putfile+0x32
8619    c049ba67        uipc_usrreq+0x2a7
8596    c048fb83        mbstat_type_add+0x33
8272    c0495de8        soreceive+0x758
8189    c049ba71        uipc_usrreq+0x2b1
8047    c0518367        syscall+0x47
7950    c0495690        soreceive
7709    c0790aad        _atomic_dec_32_nv+0xd

>How-To-Repeat:

>Fix:
_atomic_add_32_nv is from chgsbsize(), which is operating on a
single record for uid 0. It gets done twice for each send/receive
pair. By default the ulimit is infinite, so it would be nice to avoid
the atomic op unless needed.
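
A rough userland sketch of that idea, using C11 atomics rather than the
kernel's atomic_ops. Everything here (struct uid_account, charge_sbsize(),
SB_LIMIT_INFINITY, the atomic_ops counter) is a hypothetical stand-in and
not the real chgsbsize() interface. It also assumes the limit never changes
from infinite to finite while buffers are charged, since the fast path skips
the accounting entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_LIMIT_INFINITY UINT64_MAX	/* stand-in for an infinite rlimit */

/* Hypothetical per-uid record; not the kernel's struct uidinfo. */
struct uid_account {
	_Atomic uint64_t sb_bytes;	/* socket buffer bytes charged to the uid */
	atomic_uint atomic_ops;		/* instrumentation for this sketch only */
};

/*
 * Move one buffer's high-water mark from *hiwat to `to`.
 * Returns false, leaving everything unchanged, if growing the buffer
 * would push the uid's total over `limit`.
 */
static bool
charge_sbsize(struct uid_account *ua, uint64_t *hiwat, uint64_t to,
    uint64_t limit)
{
	assert(ua != NULL && hiwat != NULL);

	if (limit == SB_LIMIT_INFINITY) {
		/* Fast path: nothing to enforce, so leave the shared line alone. */
		*hiwat = to;
		return true;
	}

	/* Slow path: account the delta on the shared per-uid counter. */
	int64_t diff = (int64_t)to - (int64_t)*hiwat;
	uint64_t total = atomic_fetch_add(&ua->sb_bytes, (uint64_t)diff) +
	    (uint64_t)diff;
	atomic_fetch_add(&ua->atomic_ops, 1);

	if (diff > 0 && total > limit) {
		/* Over the limit: undo the charge and refuse. */
		atomic_fetch_sub(&ua->sb_bytes, (uint64_t)diff);
		return false;
	}
	*hiwat = to;
	return true;
}
```

With an infinite limit the shared counter is never touched, so the cache
line holding the uid 0 record would no longer bounce between the cores on
every send/receive.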

_atomic_inc_32 is mainly from fd_getfile(). That typically operates on
thread private state unless one fd is shared between multiple threads
so there is not much that can be done? It's also used by pmap_load() to 
tweak reference counts. An ugly hack would be to remember a 'last pmap' 
in addition to the current pmap, to try and avoid changing pmap ref
counts so often.
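
The 'last pmap' hack could look roughly like the following userland model.
The types and names (struct pmap_model, struct cpu_model, cpu_switch_pmap(),
ref_ops) are made up for illustration and are not the pmap(9) interface. The
model assumes each CPU keeps a reference on both its current and its last
pmap, which would delay pmap teardown in a real kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a pmap: only the reference count matters here. */
struct pmap_model {
	atomic_uint refcnt;
};

/* Hypothetical per-CPU state: the CPU holds a reference on cur and last. */
struct cpu_model {
	struct pmap_model *cur;
	struct pmap_model *last;
	unsigned ref_ops;	/* atomic refcount operations, for this sketch */
};

static void
cpu_switch_pmap(struct cpu_model *ci, struct pmap_model *next)
{
	assert(ci != NULL && next != NULL);

	if (next == ci->cur)
		return;			/* same address space, nothing to do */

	if (next == ci->last) {
		/* Ping-pong between two pmaps: swap, no refcount traffic. */
		ci->last = ci->cur;
		ci->cur = next;
		return;
	}

	/* A third pmap: take a reference on it, drop the one on the old 'last'. */
	atomic_fetch_add(&next->refcnt, 1);
	ci->ref_ops++;
	if (ci->last != NULL) {
		atomic_fetch_sub(&ci->last->refcnt, 1);
		ci->ref_ops++;
	}
	ci->last = ci->cur;
	ci->cur = next;
}
```

A CPU that alternates between two address spaces, for example a server
process and the kernel pmap, would then touch the reference counts only on
the first two switches.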

lldt is from pmap_load(). Maybe we can use a generic LDT if the 
application has not modified its LDT, and avoid switching LDT in most
cases.
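
As a sketch of the generic LDT idea, in a userland model: every process
that has not modified its LDT points at one shared selector, and the
(simulated) lldt is issued only when the selector actually changes.
GENERIC_LDT_SEL, struct cpu_ldt_state, model_lldt() and ldt_switch() are
hypothetical names and the selector value is arbitrary. None of this is
taken from the i386 pmap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENERIC_LDT_SEL 0x20	/* arbitrary value for the shared default LDT */

/* Hypothetical per-CPU record of which LDT selector is currently loaded. */
struct cpu_ldt_state {
	uint16_t loaded_sel;
	unsigned lldt_count;	/* simulated lldt instructions, for this sketch */
};

/* Stand-in for the real lldt instruction. */
static void
model_lldt(struct cpu_ldt_state *ci, uint16_t sel)
{
	ci->loaded_sel = sel;
	ci->lldt_count++;
}

/* Called on context switch with the incoming process's LDT selector. */
static void
ldt_switch(struct cpu_ldt_state *ci, uint16_t proc_sel)
{
	assert(ci != NULL);

	/* Processes sharing the generic LDT never trigger a reload. */
	if (ci->loaded_sel != proc_sel)
		model_lldt(ci, proc_sel);
}
```

Switching among any number of processes that all use the generic LDT costs
no lldt at all; only a process with a private LDT forces a reload on the
way in and again on the way out.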

_atomic_or_32 and _atomic_and_32 are probably from pmap_load(). It might
be possible and worthwhile to enhance the lazy loading / shootdown code
to clear the masks in the shootdown IPI handlers instead of on pmap
switch.

_atomic_cas_32 is probably from fd_putfile()?


