tech-kern archive


Re: 70,000 TLB shootdown IPIs per second



On Wed, Dec 05, 2012 at 08:48:31AM -0500, Thor Lancelot Simon wrote:
> I have been doing some testing on a fileserver recently donated to TNF.
> 
> The system has an Areca 1280 controller (arcmsr driver) with a single
> RAID6 volume configured on 12 disks; 32GB of RAM; two quad-core Xeon L5420
> CPUs.
> 
> I have tested under NetBSD-6 and NetBSD-current as well as the tls-maxphys
> branch.  The test is a 'dd bs=2m of=/test count=65536'.  Write throughput,
> while acceptable (300-350MB/sec; 350-400MB/sec with tls-maxphys) is about
> 2/3 what I get on the same hardware under Linux.  Read throughput is less
> good, 250-300MB/sec.
> 
> The filesystem is FFSv2, 32k block/4k frag, with WAPBL.
> 
> Watching systat while I do the dd tests, I see up to 70,000 TLB shootdown
> IPIs per second.  Is this really right?  I am not sure I know how to count
> these under Linux but I don't see any evidence of them.  Is there a pmap
> problem?  I'm running port-amd64.
> 
> I have also seen some other odd things I'll detail elsewhere, but just for
> a start, can anyone explain to me whether this number of TLB shootdowns
> should be expected, whether each should really generate its own IPI, and
> what the performance impact may be?


time for a little fun with dtrace, now that it works on amd64:


#!/usr/sbin/dtrace -qs

/* count calls to pmap_tlb_shootnow(), keyed by kernel stack trace */
fbt::pmap_tlb_shootnow:entry
{
        @a[ stack() ] = count();
}



and the top few entries from running that during a portion of your dd test
are (counts ascending, so the heaviest callers are at the bottom):


              netbsd`pmap_deactivate+0x3b
              netbsd`mi_switch+0x329
              netbsd`idle_loop+0xe0
              netbsd`0xffffffff80100817
             4349

              netbsd`pmap_deactivate+0x3b
              netbsd`mi_switch+0x329
              netbsd`kpreempt+0xe2
              netbsd`0xffffffff80114295
              netbsd`ubc_uiomove+0x113
              netbsd`ffs_write+0x2c5
              netbsd`VOP_WRITE+0x37
              netbsd`vn_write+0xf9
              netbsd`dofilewrite+0x7d
              netbsd`sys_write+0x62
              netbsd`syscall+0x94
              netbsd`0xffffffff801006a1
             6168

              netbsd`pmap_deactivate+0x3b
              netbsd`softint_dispatch+0x3a2
              netbsd`0xffffffff8011422f
             8028

              netbsd`pmap_deactivate+0x3b
              netbsd`mi_switch+0x329
              netbsd`idle_loop+0xe0
              netbsd`cpu_hatch+0x16b
              netbsd`0xffffffff805cd345
            18813

              netbsd`pmap_deactivate+0x3b
              netbsd`mi_switch+0x329
              netbsd`sleepq_block+0xa4
              netbsd`cv_wait+0x101
              netbsd`workqueue_worker+0x4e
              netbsd`0xffffffff80100817
            20973

              netbsd`pmap_update+0x3b
              netbsd`uvm_pagermapout+0x29
              netbsd`uvm_aio_aiodone+0x94
              netbsd`workqueue_worker+0x7f
              netbsd`0xffffffff80100817
            20990

              netbsd`pmap_update+0x3b
              netbsd`uvm_unmap_remove+0x316
              netbsd`uvm_pagermapout+0x69
              netbsd`uvm_aio_aiodone+0x94
              netbsd`workqueue_worker+0x7f
              netbsd`0xffffffff80100817
            20990

              netbsd`pmap_update+0x3b
              netbsd`uvm_pagermapin+0x18e
              netbsd`genfs_gop_write+0x2f
              netbsd`genfs_do_putpages+0xc74
              netbsd`VOP_PUTPAGES+0x3a
              netbsd`ffs_write+0x316
              netbsd`VOP_WRITE+0x37
              netbsd`vn_write+0xf9
              netbsd`dofilewrite+0x7d
              netbsd`sys_write+0x62
              netbsd`syscall+0x94
              netbsd`0xffffffff801006a1
            85652

              netbsd`pmap_update+0x3b
              netbsd`ubc_alloc+0x514
              netbsd`ubc_uiomove+0xe1
              netbsd`ffs_write+0x2c5
              netbsd`VOP_WRITE+0x37
              netbsd`vn_write+0xf9
              netbsd`dofilewrite+0x7d
              netbsd`sys_write+0x62
              netbsd`syscall+0x94
              netbsd`0xffffffff801006a1
           685211

              netbsd`pmap_update+0x3b
              netbsd`ubc_release+0x26a
              netbsd`ubc_uiomove+0x113
              netbsd`ffs_write+0x2c5
              netbsd`VOP_WRITE+0x37
              netbsd`vn_write+0xf9
              netbsd`dofilewrite+0x7d
              netbsd`sys_write+0x62
              netbsd`syscall+0x94
              netbsd`0xffffffff801006a1
           685211
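
to put rough proportions on the counts above, here is a small sketch that
groups them by caller. the three bucket names are my own grouping, not part
of the dtrace output, and since only the top few stacks are listed the
fractions are shares of the listed calls, not of all shootdowns.

```python
# counts copied from the dtrace aggregation above; the grouping into
# three buckets is an assumption made for illustration.
COUNTS = {
    "ubc_alloc / ubc_release": [685211, 685211],
    "uvm_pagermapin / uvm_pagermapout": [85652, 20990, 20990],
    "pmap_deactivate": [20973, 18813, 8028, 6168, 4349],
}


def shares(counts):
    """Return {bucket: (calls, fraction of all listed calls)}."""
    total = sum(sum(v) for v in counts.values())
    return {name: (sum(v), sum(v) / total) for name, v in counts.items()}


if __name__ == "__main__":
    for name, (calls, frac) in shares(COUNTS).items():
        print(f"{calls:>9}  {frac:6.1%}  {name}")
```

with these numbers the two UBC paths account for roughly 88% of the listed
calls, the pager map-in/map-out paths for about 8%, and pmap_deactivate for
the rest.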



so the bulk of these come from ubc_alloc() and ubc_release(), i.e. from
mapping and unmapping windows of cached file data during each write.  this
is working as currently designed, though obviously there's plenty of room
for improvement.  other OSs have been moving to using large-page permanent
mappings of RAM for accessing cached file data (linux has done that for
ages), and I've been wanting to do it for us too but I haven't had the time
to embark on it.  I did add support for the requisite "direct map" stuff to
amd64 just over a year ago, so at least that part is done.

the second biggest IPI offender in this workload (the uvm_pagermapin and
uvm_pagermapout stacks) is the current need to map pages into the kernel's
address space in order to send them to a disk driver for I/O, even if the
disk driver just ends up arranging for the pages to be read or written by
DMA.  this is another thing that I've wanted to improve for a long time,
but that's also a non-trivial project.

as for a short-term workaround, some workloads will probably do better
with a larger setting of UBC_WINSHIFT, since each window then covers more
data per map/unmap, but that will most likely hurt other workloads.
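
as a rough illustration of how the window size scales the UBC map/unmap
count for the dd test, here is a back-of-the-envelope sketch. it is a model,
not a measurement: it assumes a write is handled one UBC window at a time,
that a window is 1 << UBC_WINSHIFT bytes, and that each window costs one
pmap_tlb_shootnow() call in ubc_alloc() and one in ubc_release() (the two
equal counts in the trace suggest this). the shift values in the loop are
illustrative only; check your kernel config for the real setting.

```python
# rough model only: one ubc_alloc() + one ubc_release() shootdown call per
# UBC window touched by a write. both the per-window cost and the shift
# values used below are assumptions for illustration.

def windows_per_write(write_bytes, winshift):
    """UBC windows needed to cover one window-aligned write."""
    window_bytes = 1 << winshift
    return (write_bytes + window_bytes - 1) // window_bytes  # round up


def shootnow_calls_per_write(write_bytes, winshift, calls_per_window=2):
    """Estimated pmap_tlb_shootnow() calls from the UBC paths per write."""
    return windows_per_write(write_bytes, winshift) * calls_per_window


if __name__ == "__main__":
    write_bytes = 2 * 1024 * 1024          # dd bs=2m
    for winshift in (13, 14, 15, 16):      # illustrative values only
        print(winshift, 1 << winshift,
              windows_per_write(write_bytes, winshift),
              shootnow_calls_per_write(write_bytes, winshift))
```

under this model each increment of the shift halves the estimated number of
calls per 2 MB write. it says nothing about the cost to other workloads.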

-Chuck

