Subject: Re: memory stats on softupdates
To: None <tech-kern@netbsd.org>
From: George Georgalis <george@galis.org>
List: tech-kern
Date: 10/02/2007 18:48:55
On Fri, Sep 14, 2007 at 05:16:56PM +0100, Andrew Doran wrote:
>On Fri, Sep 14, 2007 at 11:51:00AM -0400, George Georgalis wrote:
>
>> I've been working on reproducing an archive (with public data)
>> that causes a kernel panic on netbsd-3 and RC1. It was suggested
>> I use sysctl to get stats on how much memory softupdates is using,
>> but that seems available in FreeBSD only.
>> 
>> Is there a way I can get memory stats for soft updates in a netbsd
>> generic?  It would also be useful to know how many pipe resources
>> are being used.
>> 
>> Maybe there is some other resources I could look at too? The
>> crash happens when extracting a 21Gb tar.bz2 archive with lots of
>> hardlinks and data with 10:1 compression ratio.
>
>Yes, have a look at the output of vmstat -m. The softdep pools are as below.
>Obviously this system is not running softdep, so there are no numbers. You
>probably also want to track the number of buffers in use. Look at the buf*
>pools and/or use 'systat bufcache'.
>
>The maximum number of softdep operations is bounded by max_softdeps, by
>default it's set to:
>
>	max_softdeps = desiredvnodes * 4;
>
>Andrew
>
>Memory resource pool statistics
>Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
>sdpcpool     124        0    0        0     0     0     0     0     0   inf    0
>pagedeppl     68        0    0        0     0     0     0     0     0   inf    0
>inodedeppl    88        0    0        0     0     0     0     0     0   inf    0
>newblkpl      36        0    0        0     0     0     0     0     0   inf    0
>bmsafemappl   36        0    0        0     0     0     0     0     0   inf    0
>allocdirectpl 80        0    0        0     0     0     0     0     0   inf    0
>indirdeppl    32        0    0        0     0     0     0     0     0   inf    0
>allocindirpl  64        0    0        0     0     0     0     0     0   inf    0
>freefragpl    40        0    0        0     0     0     0     0     0   inf    0
>freeblkspl   172        0    0        0     0     0     0     0     0   inf    0
>freefilepl    36        0    0        0     0     0     0     0     0   inf    0
>diraddpl      36        0    0        0     0     0     0     0     0   inf    0
>mkdirpl       32        0    0        0     0     0     0     0     0   inf    0
>dirrempl      36        0    0        0     0     0     0     0     0   inf    0
>newdirblkpl   20        0    0        0     0     0     0     0     0   inf    0
>

Thanks. I take it these are the pools related specifically to soft updates?
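
For tracking these over a run, here is a rough sketch in Python that turns a
vmstat -m listing into a per-pool byte estimate. It assumes the column order
shown above (Name, Size, Requests, Fail, Releases, ...) and treats Requests
minus Releases as the number of items still allocated; that reading of the
counters is an assumption, and the function name and sample text are only
illustrative.

```python
# Pool names taken from the vmstat -m listing quoted above.
SOFTDEP_POOLS = {
    "sdpcpool", "pagedeppl", "inodedeppl", "newblkpl", "bmsafemappl",
    "allocdirectpl", "indirdeppl", "allocindirpl", "freefragpl",
    "freeblkspl", "freefilepl", "diraddpl", "mkdirpl", "dirrempl",
    "newdirblkpl",
}

def softdep_usage(vmstat_m_text):
    """Estimate bytes held per softdep pool from `vmstat -m` text.

    Assumption: outstanding items = Requests - Releases, so the
    estimate is Size * (Requests - Releases). Page-level overhead
    is ignored.
    """
    usage = {}
    for line in vmstat_m_text.splitlines():
        # Tolerate mail quoting ("> ") in front of pasted output.
        fields = line.lstrip("> \t").split()
        if len(fields) < 5 or fields[0] not in SOFTDEP_POOLS:
            continue  # header line, blank line, or an unrelated pool
        size = int(fields[1])
        requests = int(fields[2])
        releases = int(fields[4])
        usage[fields[0]] = size * (requests - releases)
    return usage

# First three pools from the listing above (an idle, non-softdep system).
sample = """\
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
sdpcpool     124        0    0        0     0     0     0     0     0   inf    0
pagedeppl     68        0    0        0     0     0     0     0     0   inf    0
inodedeppl    88        0    0        0     0     0     0     0     0   inf    0
"""

usage = softdep_usage(sample)
print(usage, "total bytes:", sum(usage.values()))
```

Feeding it the output of vmstat -m captured at intervals during the extract
would show which pool, if any, grows without bound.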

To narrow down my problem I applied the note in
ftp://ftp.netbsd.org/pub/NetBSD-daily/netbsd-3/200709300000Z/LAST_MINUTE
regarding setting vm.bufmem_hiwater and vm.bufmem_lowater; my
hiwater was negative 1.7 GB (-1718112256) on a 16GB quad-core
(2x2 CPU) Opteron. The problem persisted with reasonable values set
at boot. The next step was to reduce to 2GB RAM, remove FC altogether,
and stress test SATA (w/o softupdates): no problem. I'm now
stressing over LSI FC with 2GB RAM. If that passes, I'll try again
with soft updates. If that works, I think I've identified the cause
as having over 2GB of RAM on this host (it failed with 4GB too).
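
As a sanity check on that negative hiwater value, the arithmetic below (a
Python sketch, helper names are mine) reinterprets -1718112256 as an unsigned
32-bit quantity. That the value was stored in a signed 32-bit variable is an
assumption on my part; the sketch only shows what the number would be if it
had wrapped.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF

def as_signed32(value):
    """Wrap an integer into the signed 32-bit range."""
    return ((value + 2**31) % 2**32) - 2**31

reported = -1718112256              # vm.bufmem_hiwater as reported
unwrapped = as_unsigned32(reported)

print(unwrapped)                    # 2576855040
print(unwrapped / 2**30)            # size in GiB, a little under 2.4
print(as_signed32(unwrapped))       # -1718112256, round-trips to the report
```

Anything above 2147483647 (2^31 - 1) bytes would go negative this way, which
would be consistent with trouble showing up only on hosts with a lot of RAM.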

As I began this run, I got "deadbeef" on stderr from the vmstat -H
command... it doesn't do it now, but how significant is a
corrupted hash chain?

All recent testing has been on a pre-RC2 netbsd-4 kernel and
userland.

// George



+ vmstat -efH
vmstat: kptr deadbeefdeadbf67: hash chain corrupted: kvm_read: Bad address
5609 forks total
144 forks blocked parent
165 forks shared address space with parent

event                                         total     rate type
uvmmap ubackmerge                            429657       77 misc
uvmmap uforwmerge                                 2        0 misc
uvmmap unomerge                              355362       64 misc
uvmmap kbackmerge                             15150        2 misc
uvmmap kforwmerge                               458        0 misc
uvmmap kbimerge                               10521        1 misc
uvmmap knomerge                             3607786      650 misc
uvmmap map_call                             4418936      797 misc
uvmmap mlk_call                            25763213     4647 misc
uvmmap mlk_hint                            21661633     3907 misc
uvmmap uke_alloc                              30382        5 misc
uvmmap uke_free                               28126        5 misc
uvmmap ukh_alloc                                147        0 misc
uvmmap ukh_free                                  45        0 misc
pdpolicy reactexec                            49096        8 misc
pdpolicy reactanon                            90103       16 misc
vmcmd calls                                   98091       17 misc
vmcmd extends                                  5006        0 misc
vmcmd kills                                   10374        1 misc
timecounter binuptime                     114252449    20612 misc
timecounter bintime                       114252521    20612 misc
timecounter nanotime                       85544836    15432 misc
timecounter microtime                      28709081     5179 misc
timecounter getnanouptime                     10197        1 misc
timecounter getmicrouptime                  4226616      762 misc
timecounter getmicrotime                   37979641     6851 misc
timecounter setclock                              2        0 misc
bus_dma nbouncebufs                               1        0 misc
bus_dma loads                               2466191      444 misc
cpu0 softclock                               552687       99 intr
cpu0 softnet                                   3779        0 intr
cpu0 softserial                                   1        0 intr
cpu0 timer                                   554867      100 intr
cpu0 FPU flush IPI                                2        0 intr
cpu0 FPU synch IPI                              531        0 intr
cpu0 TLB shootdown IPI                     61738730    11138 intr
cpu1 timer                                   552869       99 intr
cpu1 FPU flush IPI                                6        0 intr
cpu1 FPU synch IPI                              549        0 intr
cpu1 TLB shootdown IPI                    107342194    19365 intr
cpu2 timer                                   553216       99 intr
cpu2 FPU flush IPI                                1        0 intr
cpu2 FPU synch IPI                              654        0 intr
cpu2 TLB shootdown IPI                    108080308    19498 intr
cpu3 timer                                   552016       99 intr
cpu3 FPU flush IPI                                1        0 intr
cpu3 FPU synch IPI                              638        0 intr
cpu3 TLB shootdown IPI                    105741881    19076 intr
ioapic0 pin 21                                 4596        0 intr
ioapic0 pin 14                                    6        0 intr
ioapic0 pin 22                               155263       28 intr
ioapic0 pin 23                                   34        0 intr
ioapic0 pin 16                              1955320      352 intr
ioapic0 pin 17                                  254        0 intr
ioapic0 pin 3                                     1        0 intr

                    total     used     util      num  average  maximum
hash table        buckets  buckets        %    items    chain    chain


-- 
George Georgalis, information system scientist <IXOYE><