NetBSD-Bugs archive
port-i386/38619: Possible context switch / benchmark improvements
>Number: 38619
>Category: port-i386
>Synopsis: Possible context switch / benchmark improvements
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: port-i386-maintainer
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Fri May 09 21:55:00 +0000 2008
>Originator: Andrew Doran
>Release: 4.99.62
>Organization:
The NetBSD Project
>Environment:
n/a
>Description:
Truncated output from tprof/tpfmt.sh, sampled during a 10-minute run
of mysql sysbench with 200 client threads on a dual-core system.
It shows some areas where context switch performance could be
improved, especially where multithreaded processes are involved.
Many of the samples seem to come from points where MMU state is being
modified, and from points where cache lines would regularly move
between the two cores.
58147 c010ced2 mutex_enter+0x12
36515 c0790ab7 _atomic_inc_32+0x7
34714 c010cff4 mutex_spin_enter+0x34
29616 c0790a4e _atomic_add_32_nv+0xe
24226 c0100392 cpu_switchto+0x22
23846 c05136d7 lcr3+0x7
22954 c0790b20 _atomic_cas_32+0x10
18604 c01004b0 Xsyscall
18339 c0513697 lldt+0x7
18258 c0790a6b _atomic_and_32+0xb
14277 c050208a pmap_activate+0x2a
14055 c04525ef sleepq_wake+0x2f
11758 c0790adb _atomic_or_32+0xb
11477 c0790b43 _membar_consumer+0x3
10178 c0100c8c spllower+0x2c
10103 bb9eec67 start
9788 c0100723 copyout+0x33
9226 bb7dccc7 start
8641 c0435472 fd_putfile+0x32
8619 c049ba67 uipc_usrreq+0x2a7
8596 c048fb83 mbstat_type_add+0x33
8272 c0495de8 soreceive+0x758
8189 c049ba71 uipc_usrreq+0x2b1
8047 c0518367 syscall+0x47
7950 c0495690 soreceive
7709 c0790aad _atomic_dec_32_nv+0xd
>How-To-Repeat:
>Fix:
_atomic_add_32_nv is from chgsbsize(), which operates on a single
record for uid 0. It is called twice for each send/receive pair.
By default the ulimit is infinite, so it would be nice to avoid
the atomic op unless it is needed.
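
A rough user-space sketch of the chgsbsize() idea, using C11 atomics rather
than the kernel's atomic ops. The names sb_charge, struct uid_account and
SB_LIMIT_INFINITY are hypothetical stand-ins, not the kernel's. One open
question with this approach: while the limit is infinite the per-uid total
is not maintained, so it would be stale if a finite limit were set later.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an "infinite" resource limit. */
#define SB_LIMIT_INFINITY ULONG_MAX

/* Hypothetical stand-in for the per-uid accounting record. */
struct uid_account {
	atomic_ulong sbsize;	/* socket buffer space charged to this uid */
};

/*
 * Change one socket's reservation from *hiwat to `to`.
 * Returns true on success, false if the limit would be exceeded.
 */
static bool
sb_charge(struct uid_account *ua, unsigned long *hiwat, unsigned long to,
    unsigned long limit)
{
	if (limit == SB_LIMIT_INFINITY) {
		/* Nothing to enforce: do not touch the shared cache line. */
		*hiwat = to;
		return true;
	}

	/* Unsigned wrap-around makes a shrinking request a subtraction. */
	unsigned long diff = to - *hiwat;
	bool growing = to > *hiwat;
	unsigned long next = atomic_fetch_add(&ua->sbsize, diff) + diff;

	if (growing && next > limit) {
		/* Over the limit: undo the charge and refuse. */
		atomic_fetch_sub(&ua->sbsize, diff);
		return false;
	}
	*hiwat = to;
	return true;
}
```

With an infinite limit the call reduces to a plain store to *hiwat, which
is private to the socket.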
_atomic_inc_32 is mainly from fd_getfile(). That typically operates on
thread-private state unless one fd is shared between multiple threads,
so there may not be much that can be done. It's also used by pmap_load()
to tweak reference counts. An ugly hack would be to remember a 'last
pmap' in addition to the current pmap, to try to avoid changing pmap
ref counts so often.
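
A minimal sketch of the 'last pmap' hack, again in user-space C with C11
atomics. struct pmap_sketch, struct cpu_sketch and cpu_switch_pmap are
hypothetical names, and the ref_ops counter exists only to make the saving
visible. The CPU keeps a reference on both the current and the previous
pmap, so switching back and forth between two address spaces costs no
atomic ops. A real implementation would also need to drop the cached
reference when a pmap is destroyed, which is not shown here.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a pmap: only the reference count matters. */
struct pmap_sketch {
	atomic_uint refcnt;
};

/* Hypothetical per-CPU state: current pmap plus one cached 'last' pmap. */
struct cpu_sketch {
	struct pmap_sketch *cur;
	struct pmap_sketch *last;
	unsigned ref_ops;	/* atomic refcount operations performed */
};

static void
cpu_switch_pmap(struct cpu_sketch *ci, struct pmap_sketch *pm)
{
	if (pm == ci->cur)
		return;			/* same address space, nothing to do */

	if (pm == ci->last) {
		/* Both are already referenced by this CPU: just swap. */
		ci->last = ci->cur;
		ci->cur = pm;
		return;
	}

	/* Miss: reference the new pmap and evict the old 'last' one. */
	atomic_fetch_add(&pm->refcnt, 1);
	ci->ref_ops++;
	if (ci->last != NULL) {
		atomic_fetch_sub(&ci->last->refcnt, 1);
		ci->ref_ops++;
	}
	ci->last = ci->cur;
	ci->cur = pm;
}
```

Whether this helps depends on how often a CPU alternates between exactly
two pmaps; with more than two in rotation every switch is a miss.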
lldt is from pmap_load(). Maybe we can use a generic LDT if the
application has not modified its LDT, and avoid switching LDT in most
cases.
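
The generic LDT idea might look roughly like this. It is a sketch only:
DEFAULT_LDT_SEL, struct ldt_cpu, struct ldt_proc and fake_lldt are
hypothetical, and fake_lldt merely counts calls in place of the real
instruction. It assumes a selector value uniquely identifies the LDT
contents, which would have to hold for private LDTs too.

```c
#include <stdint.h>

/* Hypothetical selector for a shared, unmodified LDT. */
#define DEFAULT_LDT_SEL 0x28u

/* Hypothetical per-CPU record of what is currently loaded. */
struct ldt_cpu {
	uint16_t loaded_sel;
	unsigned lldt_calls;	/* how many times the LDT was reloaded */
};

/* Hypothetical per-process state: default unless a private LDT exists. */
struct ldt_proc {
	uint16_t ldt_sel;
};

/* Stand-in for the lldt instruction: record the selector and count it. */
static void
fake_lldt(struct ldt_cpu *ci, uint16_t sel)
{
	ci->loaded_sel = sel;
	ci->lldt_calls++;
}

/* Reload the LDT on switch only if the selector actually changes. */
static void
ldt_activate(struct ldt_cpu *ci, const struct ldt_proc *p)
{
	if (ci->loaded_sel != p->ldt_sel)
		fake_lldt(ci, p->ldt_sel);
}
```

Processes that never modify their LDT all carry DEFAULT_LDT_SEL, so
switching among them never reaches fake_lldt.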
_atomic_or_32, _atomic_and_32 are probably from pmap_load(). It might
be possible and worthwhile to enhance the lazy loading / shootdown code
to clear the masks in the shootdown IPI handlers instead of on pmap
switch.
_atomic_cas_32 is probably from fd_putfile()?