tech-kern archive


Re: lockup on memory shortage



Hi,
I got a lockup again. I had top running; here's what it displayed before
the box wedged:

|load averages:  2.20,  1.24,  0.98;               up 3+14:52:30        03:40:25
|40 processes: 3 runnable, 35 sleeping, 1 zombie, 1 on CPU
|CPU states:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
|Memory: 294M Act, 144M Inact, 12M Wired, 11M Exec, 77M File, 16K Free
|Swap: 256M Total, 256M Used, 4K Free
|
|  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
| 3744 root      43    0  5520K  348K RUN        2:50  1.10%  0.24% cc1
|    1 root      85    0   748K    4K wait       0:49  0.00%  0.00% <init>
| 5951 bouyer    43    0   756K  796K CPU        0:33  0.00%  0.00% top
|  408 bouyer    85    0   764K  412K select     0:12  0.00%  0.00% screen-4.0.3
|  380 bouyer    85    0   824K  656K select     0:08  0.00%  0.00% sshd
|  218 root      85    0  1780K 5672K pause      0:06  0.00%  0.00% ntpd
|11203 root      85    0   748K    4K wait       0:05  0.00%  0.00% <pbulk-build
|  400 bouyer    43    0   756K   12K RUN        0:01  0.00%  0.00% screen-4.0.3
|  339 root      85    0   752K    4K kqueue     0:01  0.00%  0.00% <master>
|24186 root      85    0   748K    4K kqueue     0:01  0.00%  0.00% <tail>
|15925 root      85    0   124K  116K RUN        0:00  0.00%  0.00% sh
|13806 root      85    0   752K  908K piperd     0:00  0.00%  0.00% cron
|  364 root      85    0   756K  172K nanoslp    0:00  0.00%  0.00% getty
|  326 root      85    0   756K  172K nanoslp    0:00  0.00%  0.00% getty
|  395 root      85    0  1156K    4K pause      0:00  0.00%  0.00% <tcsh>
|  375 bouyer    85    0  1084K    4K pause      0:00  0.00%  0.00% <tcsh>
|  194 bouyer    85    0  1020K    4K pause      0:00  0.00%  0.00% <tcsh>

This time I don't understand where the memory has gone, because there are
no big processes running (unless cc1 grew a lot between the last top
refresh and the hang).
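A quick sanity check on the top snapshot is to sum the RES column of the processes top listed. Note that it showed only 17 of the 40 processes, so this is a partial sum. The values are copied from the output above, in KiB as top prints them. RES counts shared pages (e.g. libc) once per process, so if anything the sum overstates private usage. It comes to about 9 MiB, nowhere near the 294M active plus 144M inactive.

```python
# RES column (KiB) of the processes shown in the top snapshot above.
# top listed only 17 of the 40 processes, so this is a partial sum.
res_kib = {
    3744: 348,   # cc1
    1: 4,        # init
    5951: 796,   # top
    408: 412,    # screen-4.0.3
    380: 656,    # sshd
    218: 5672,   # ntpd
    11203: 4,    # pbulk-build
    400: 12,     # screen-4.0.3
    339: 4,      # master
    24186: 4,    # tail
    15925: 116,  # sh
    13806: 908,  # cron
    364: 172,    # getty
    326: 172,    # getty
    395: 4,      # tcsh
    375: 4,      # tcsh
    194: 4,      # tcsh
}

total_kib = sum(res_kib.values())
# "294M Act, 144M Inact" from top's Memory line, converted to KiB.
active_plus_inactive_kib = (294 + 144) * 1024

print(f"resident in listed processes: {total_kib} KiB (~{total_kib / 1024:.1f} MiB)")
print(f"active + inactive pages:      {active_plus_inactive_kib} KiB")
```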

I had this on the console:

|Out of memory allocating ksiginfo for pid 218
|Out of memory allocating ksiginfo for pid 218
|Out of memory allocating ksiginfo for pid 218
|Out of memory allocating ksiginfo for pid 218
|Out of memory allocating ksiginfo for pid 218

And here's some poking around in ddb:

|db> show uvm
|Current UVM status:
|  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
|  127464 VM pages: 75264 active, 36757 inactive, 3014 wired, 0 free
|  pages  91023 anon, 19645 file, 2884 exec
|  freemin=256, free-target=341, wired-max=42488
|  faults=-522007880, traps=-519495790, intrs=92796461, ctxswitch=525566965
|  softint=228469838, syscalls=-1350524482, swapins=2478, swapouts=2516
|  fault counts:
|    noram=2248, noanon=0, pgwait=68, pgrele=0
|    ok relocks(total)=180961(180970), anget(retrys)=735139129(40445), amapcopy=496101704
|    neighbor anon/obj pg=702075445/2109263411, gets(lock/unlock)=1486862887/140522
|    cases: anon=485256283, anoncow=201489739, obj=1273861484, prcopy=213001397, przero=1523229842
|  daemon and swap counts:
|    woke=1168338, revs=1167792, scans=32311371, obscans=12587557, anscans=18608493
|    busy=35500, freed=12698798, reactivate=240287, deactivate=22692608
|    pageouts=31461, pending=79093, nswget=39981
|    nswapdev=1, swpgavail=65535
|    swpages=65535, swpginuse=65535, swpgonly=59752, paging=0
|
|db> ps /l
| PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
| 15925         1 2         0           d1766080                 sh
| 13806         1 3        80           d1766a00               cron piperd
| 3744          1 2         0           d1767c80                cc1
| 29722         1 3        80           cb44ace0                gcc wait
| 21814         1 3        80           d17667a0                 sh wait
| 28972         1 3        80           d17662e0               make wait
| 21184         1 3        80           cd57a360                 sh wait
| 18671         1 3        80           cb53bc80               make wait
| 4071          1 3        80           d1766c60                 sh wait
| 2134          1 3        80           cd57aa80               make wait
| 21839         1 3        80           cb760340                 sh wait
| 16547         1 2         0           cb44aa80             pickup
| 5951          1 2         0           d1766540                top
| 11203         1 3        80           cb7600e0        pbulk-build wait
| 27865         1 3        80           cb760800                 sh wait
| 24186         1 3        80           d1661d00               tail kqueue
| 17250         1 3        80           cd57ace0                 sh wait
| 194           1 3        80           cb7605a0               tcsh pause
| 395           1 3        80           cb760cc0               tcsh pause
| 413           1 3        80           cb6e00c0                ksh pause
| 401           1 3        80           cb6e0320               tcsh pause
| 408           1 2         0           cb6e0580       screen-4.0.3
| 400           1 2         0           cb6e07e0       screen-4.0.3
| 375           1 3        80           cb6e0a40               tcsh pause
| 380           1 2         0           cb6e0ca0               sshd
| 293           1 3        80           cb53b0a0               sshd netio
| 326           1 2         0           cb53b300              getty
| 318           1 2         0           cb53b560              getty
| 364           1 2         0           ca26a020              getty
| 367           1 2         0           ca271c20              getty
| 360           1 2         4           ca272780               cron
| 351           1 2         0           cb4a4080               qmgr
| 347           1 3        80           cb53ba20              inetd kqueue
| 339           1 2         0           cb53b7c0             master
| 246           1 2         0           cb4a42e0               sshd
| 228           1 3        80           ca2722c0             powerd kqueue
| 218           1 2   1000000           cb4a4540               ntpd
| 109           1 2         0           ca26a280            syslogd
| 1             1 3        80           ca2712a0               init wait
|>0            35 5       204           d1767300           (zombie)
|              31 3       204           cb4a47a0              nfsio nfsiod
|              30 3       204           cb4a4a00              nfsio nfsiod
|              29 3       204           cb4a4c60              nfsio nfsiod
|              28 3       204           ca272060              nfsio nfsiod
|              27 3       204           ca272520            physiod physiod
|              26 3       204           ca2719c0        vmem_rehash vmem_rehash
|              25 3       204           ca271760           aiodoned aiodoned
|              24 2       204           ca271500            ioflush
|           >  23 7       204           ca271040           pgdaemon
|              22 3       204           ca26a4e0          cryptoret crypto_wait
|              21 2       204           ca2729e0             xenbus
|              20 3       204           ca272c40           xenwatch evtsq
|              10 3       204           ca26a740           pmfevent pmfevent
|               9 3       204           ca26a9a0            cachegc cachegc
|               8 3       204           ca26ac00              vrele vrele
|               7 3       204           ca267000            xcall/0 xcall
|               6 1       204           ca267260          softser/0
|               5 1       204           ca2674c0          softclk/0
|               4 1       204           ca267720          softbio/0
|               3 1       204           ca267980          softnet/0
|               2 1       205           ca267be0             idle/0
|               1 3       204           c044c080            swapper schedpwait
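The `show uvm` page counts can be lined up with top's summary by converting them to MiB, using the 4096-byte page size that ddb reports. The active, inactive, wired, file and exec counts come out at roughly 294, 144, 12, 77 and 11 MiB, which match top. Swap is completely full (swpginuse == swpages == 65535). Treating `swpgonly` as "anonymous pages whose only copy is in swap" is an assumption based on the counter name, not something the output states. Under that reading, adding it to the resident `anon` count gives roughly 589 MiB of anonymous memory.

```python
PAGE_SIZE = 4096  # "pagesize=4096" in the show uvm output

# Page counts copied from "db> show uvm" above.
uvm = {
    "total": 127464,
    "active": 75264,
    "inactive": 36757,
    "wired": 3014,
    "anon": 91023,
    "file": 19645,
    "exec": 2884,
    "swpages": 65535,
    "swpgonly": 59752,
}


def mib(pages: int) -> float:
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / (1024 * 1024)


for name, pages in uvm.items():
    print(f"{name:>9}: {pages:7d} pages = {mib(pages):6.1f} MiB")

# Assumption: "anon" counts anonymous pages resident in RAM and
# "swpgonly" counts anonymous pages that exist only in swap, so the
# sum approximates all anonymous memory in the system.
anon_total = uvm["anon"] + uvm["swpgonly"]
print(f"anon in RAM + swap-only: {anon_total} pages = {mib(anon_total):.1f} MiB")
```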


ddb bt/a isn't of much use:
|db> bt/a ca271040
|trace: pid 0 lid 23 at 0xca797f38
|breakpoint(ffffff00,80,ca797f68,9,1,a,ca797fa8,c03aa6ff,ca77268c,c05bc009) at netbsd:breakpoint+0x4
|xencons_tty_input(ca77268c,c05bc009,1,c03a3b9b,3b9aca00,0,6,0,4,2) at netbsd:xencons_tty_input+0xa6
|xencons_handler(ca77268c,ca79ac0c,0,64,0,4,0,0,c03a1b85,4) at netbsd:xencons_handler+0x5f
|evtchn_do_event(2,ca79ac0c,ca79abc4,0,fatal page fault in supervisor mode
|trap type 6 code 0 eip c038e6d1 cs 9 eflags 10246 cr2 ca798000 ilevel 8
|kernel: supervisor trap page fault, code=0

Here's a ps /a and the 'show map' output for all processes:

| PID          COMMAND      STRUCT PROC *            UAREA *     VMSPACE/VM_MAP
| 15925             sh           cd578528           cb346da0           cb6f30d4
| 13806           cron           cd578bf8           cb28bda0           ca278750
| 3744             cc1           d0733aa8           cb2d5da0           d166f008
| 29722            gcc           cc32ca38           cb242da0           d166fea8
| 21814             sh           cb539da4           cbc4dda0           cb6f3c34
| 28972           make           cb6eb6d8           cbb0eda0           d166f348
| 21184             sh           d0733070           cb312da0           cb6f3414
| 18671           make           cd578890           cb246da0           d166f0d8
| 4071              sh           cd57800c           cb4dada0           d166fa98
| 2134            make           d0733e10           cbc4ada0           d166fdd8
| 21839             sh           cb6eb1bc           cd8b2da0           cb6f3684
| 16547         pickup           cc32cda0           cb56eda0           cb6f3a94
| 5951             top           d17658d0           d0502da0           d166f828
| 11203     pbulk-buil           cb6eb008           cd8c2da0           cb6f3754
| 27865             sh           cb6eb524           cb792da0           cb6f3b64
| 24186           tail           cd578374           d1582da0           cb6f3344
| 17250             sh           cd578dac           cd8c5da0           cb6f38f4
| 194             tcsh           cb6eb370           cb79ada0           cb6f39c4
| 395             tcsh           cb6eb88c           cb75ada0           cb6f3d04
| 413              ksh           cb6eba40           cb752da0           cb6f3dd4
| 401             tcsh           cb6ebbf4           cb71eda0           ca278000
| 408       screen-4.0           cb6ebda8           cb715da0           cb6f3ea4
| 400       screen-4.0           cb539004           cb70bda0           ca2780d0
| 375             tcsh           cb5391b8           cb6d7da0           ca2781a0
| 380             sshd           cb53936c           cb6ceda0           ca278270
| 293             sshd           cb539520           cb6c6da0           ca278340
| 326            getty           cb5396d4           cb6bada0           ca278410
| 318            getty           cb539888           cb583da0           ca2784e0
| 364            getty           ca273a38           ca8a3da0           ca278d00
| 367            getty           ca273bec           ca8a6da0           ca278dd0
| 360             cron           ca2736d0           cb2cada0           ca278b60
| 351             qmgr           ca273000           cb562da0           ca278820
| 347            inetd           cb539bf0           cb57bda0           ca278680
| 339           master           cb539a3c           cb587da0           ca2785b0
| 246             sshd           ca2731b4           cb51bda0           ca2788f0
| 228           powerd           ca27351c           cb2fada0           ca278a90
| 218             ntpd           ca273368           cb51eda0           ca2789c0
| 109          syslogd           ca273884           ca8a0da0           ca278c30
| 1               init           ca273da0           ca8b2da0           ca278ea0
|>0             system           c044bec0           cb404da0           c04aa6c0
|
|db> sh map cb6f30d4
|MAP 0xcb6f30d4: [0x0->0xbf800000]
|        #ent=18, sz=68423680, ref=1, version=192, flags=0x41
|        pmap=0xca27945c(resident=1, wired=0)
|db> sh map ca278750
|MAP 0xca278750: [0x0->0xbf800000]
|        #ent=17, sz=70053888, ref=1, version=12, flags=0x41
|        pmap=0xca27907c(resident=1, wired=0)
|db> sh map d166f008
|MAP 0xd166f008: [0x0->0xbf800000]
|        #ent=983, sz=663093248, ref=1, version=51374, flags=0x41
|        pmap=0xd12ff9b4(resident=1, wired=0)
|db> sh map d166fea8
|MAP 0xd166fea8: [0x0->0xbf800000]
|        #ent=12, sz=69980160, ref=1, version=69, flags=0x41
|        pmap=0xd12ffb28(resident=1, wired=0)
|db> sh map cb6f3c34
|MAP 0xcb6f3c34: [0x0->0xbf800000]
|        #ent=19, sz=70107136, ref=1, version=200, flags=0x41
|        pmap=0xd12ff8bc(resident=1, wired=0)
|db> sh map d166f348
|MAP 0xd166f348: [0x0->0xbf800000]
|        #ent=16, sz=70053888, ref=1, version=64, flags=0x41
|        pmap=0xca279364(resident=1, wired=0)
|db> sh map cb6f3414
|MAP 0xcb6f3414: [0x0->0xbf800000]
|        #ent=20, sz=70107136, ref=1, version=55, flags=0x41
|        pmap=0xca279aa8(resident=1, wired=0)
|db> sh map d166f0d8
|MAP 0xd166f0d8: [0x0->0xbf800000]
|        #ent=21, sz=72151040, ref=1, version=71, flags=0x41
|        pmap=0xd12ffd18(resident=1, wired=0)
|db> sh map d166fa98
|MAP 0xd166fa98: [0x0->0xbf800000]
|        #ent=20, sz=70107136, ref=1, version=43, flags=0x41
|        pmap=0xd12ff5d4(resident=1, wired=0)
|db> sh map d166fdd8
|MAP 0xd166fdd8: [0x0->0xbf800000]
|        #ent=21, sz=72151040, ref=1, version=44, flags=0x41
|        pmap=0xd12ff2ec(resident=1, wired=0)
|db> sh map cb6f3684
|MAP 0xcb6f3684: [0x0->0xbf800000]
|        #ent=20, sz=70107136, ref=1, version=44, flags=0x41
|        pmap=0xca2790f8(resident=1, wired=0)
|db> sh map cb6f3a94
|MAP 0xcb6f3a94: [0x0->0xbf800000]
|        #ent=26, sz=72003584, ref=1, version=221, flags=0x41
|        pmap=0xca279000(resident=1, wired=0)
|db> sh map d166f828
|MAP 0xd166f828: [0x0->0xbf800000]
|        #ent=20, sz=70127616, ref=1, version=214, flags=0x41
|        pmap=0xd12fff08(resident=1, wired=0)
|db> sh map cb6f3754
|MAP 0xcb6f3754: [0x0->0xbf800000]
|        #ent=23, sz=75304960, ref=1, version=305, flags=0x41
|        pmap=0xca279174(resident=1, wired=0)
|db> sh map cb6f3b64
|MAP 0xcb6f3b64: [0x0->0xbf800000]
|        #ent=20, sz=70107136, ref=1, version=197, flags=0x41
|        pmap=0xca2793e0(resident=1, wired=0)
|db> sh map cb6f3344
|MAP 0xcb6f3344: [0x0->0xbf800000]
|        #ent=13, sz=69980160, ref=1, version=201, flags=0x41
|        pmap=0xd12ffe8c(resident=1, wired=0)
|db> sh map cb6f38f4
|MAP 0xcb6f38f4: [0x0->0xbf800000]
|        #ent=20, sz=70107136, ref=1, version=204, flags=0x41
|        pmap=0xca27926c(resident=1, wired=0)
|db> sh map cb6f39c4
|MAP 0xcb6f39c4: [0x0->0xbf800000]
|        #ent=26, sz=69271552, ref=1, version=270, flags=0x41
|        pmap=0xca2792e8(resident=1, wired=0)
|db> sh map cb6f3d04
|MAP 0xcb6f3d04: [0x0->0xbf800000]
|        #ent=26, sz=69414912, ref=1, version=430, flags=0x41
|        pmap=0xca2794d8(resident=1, wired=0)
|db> sh map cb6f3dd4
|MAP 0xcb6f3dd4: [0x0->0xbf800000]
|        #ent=13, sz=69980160, ref=1, version=348, flags=0x41
|        pmap=0xca279554(resident=1, wired=0)
|db> sh map ca278000
|MAP 0xca278000: [0x0->0xbf800000]
|        #ent=26, sz=69259264, ref=1, version=267, flags=0x41
|        pmap=0xca27964c(resident=1, wired=0)
|db> sh map cb6f3ea4
|MAP 0xcb6f3ea4: [0x0->0xbf800000]
|        #ent=23, sz=70107136, ref=1, version=190, flags=0x41
|        pmap=0xca2795d0(resident=1, wired=0)
|db> sh map ca2780d0
|MAP 0xca2780d0: [0x0->0xbf800000]
|        #ent=23, sz=70107136, ref=1, version=237, flags=0x41
|        pmap=0xca2796c8(resident=1, wired=0)
|db> sh map ca2781a0
|MAP 0xca2781a0: [0x0->0xbf800000]
|        #ent=28, sz=69341184, ref=1, version=314, flags=0x41
|        pmap=0xca279744(resident=1, wired=0)
|db> sh map ca278270
|MAP 0xca278270: [0x0->0xbf800000]
|        #ent=69, sz=75927552, ref=1, version=192, flags=0x41
|        pmap=0xca2797c0(resident=1, wired=0)
|db> sh map ca278340
|MAP 0xca278340: [0x0->0xbf800000]
|        #ent=69, sz=75927552, ref=1, version=246, flags=0x41
|        pmap=0xca27983c(resident=1, wired=0)
|db> sh map ca278410
|MAP 0xca278410: [0x0->0xbf800000]
|        #ent=19, sz=70070272, ref=1, version=207, flags=0x41
|        pmap=0xca2798b8(resident=1, wired=0)
|db> sh map ca2784e0
|MAP 0xca2784e0: [0x0->0xbf800000]
|        #ent=19, sz=70070272, ref=1, version=210, flags=0x41
|        pmap=0xca279934(resident=1, wired=0)
|db> sh map ca278d00
|MAP 0xca278d00: [0x0->0xbf800000]
|        #ent=19, sz=70070272, ref=1, version=207, flags=0x41
|        pmap=0xca279e0c(resident=1, wired=0)
|db> sh map ca278dd0
|MAP 0xca278dd0: [0x0->0xbf800000]
|        #ent=19, sz=70070272, ref=1, version=96, flags=0x41
|        pmap=0xca279e88(resident=1, wired=0)
|db> sh map ca278b60
|MAP 0xca278b60: [0x0->0xbf800000]
|        #ent=17, sz=70053888, ref=1, version=3675, flags=0x41
|        pmap=0xca279d14(resident=106, wired=0)
|db> sh map ca278820
|MAP 0xca278820: [0x0->0xbf800000]
|        #ent=26, sz=72003584, ref=1, version=211, flags=0x41
|        pmap=0xca279b24(resident=1, wired=0)
|db> sh map ca278680
|MAP 0xca278680: [0x0->0xbf800000]
|        #ent=22, sz=70131712, ref=1, version=12, flags=0x41
|        pmap=0xca279a2c(resident=1, wired=0)
|db> sh map ca2785b0
|MAP 0xca2785b0: [0x0->0xbf800000]
|        #ent=26, sz=72003584, ref=1, version=640, flags=0x41
|        pmap=0xca2799b0(resident=1, wired=0)
|db> sh map ca2788f0
|MAP 0xca2788f0: [0x0->0xbf800000]
|        #ent=50, sz=73154560, ref=1, version=5440, flags=0x41
|        pmap=0xca279ba0(resident=1, wired=0)
|db> sh map ca278a90
|MAP 0xca278a90: [0x0->0xbf800000]
|        #ent=19, sz=70111232, ref=1, version=11, flags=0x41
|        pmap=0xca279c98(resident=1, wired=0)
|db> sh map ca2789c0
|MAP 0xca2789c0: [0x0->0xbf800000]
|        #ent=26, sz=72691712, ref=1, version=108, flags=0x45
|        pmap=0xca279c1c(resident=1418, wired=1413)
|db> sh map ca278c30
|MAP 0xca278c30: [0x0->0xbf800000]
|        #ent=19, sz=70082560, ref=1, version=172, flags=0x41
|        pmap=0xca279d90(resident=1, wired=0)
|db> sh map ca278ea0
|MAP 0xca278ea0: [0x0->0xbf800000]
|        #ent=20, sz=70090752, ref=1, version=77, flags=0x41
|        pmap=0xca279f04(resident=1, wired=0)
|db> sh map c04aa6c0
|MAP 0xc04aa6c0: [0x0->0xbfdfc000]
|        #ent=0, sz=0, ref=1, version=1, flags=0x41
|        pmap=0xc04ce780(resident=2174, wired=1629)

The map for cc1 looks large, but I'm not sure it explains where the
memory went. I wonder if there's a memory leak in the kernel that
only shows up with certain usage patterns (the box can build packages for
days with almost no swap in use, so it's not a slow memory leak).
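For scale, cc1's map can be compared with the rest of the system. `sz` in the `show map` output is the size of the mapped virtual address space. That space may include ranges that were never touched, so it is an upper bound on what cc1 could hold, not a measurement of its footprint. With that caveat, its 663093248 bytes are 161888 pages. That is roughly 84% of RAM plus swap, compared with about 17k pages for a typical shell's map.

```python
PAGE_SIZE = 4096  # "pagesize=4096" from show uvm

# "sz" values (bytes) from the "show map" output above.
cc1_sz = 663093248      # cc1, map 0xd166f008, #ent=983
typical_sz = 70107136   # e.g. the sh processes, #ent=20

ram_pages = 127464      # "127464 VM pages" from show uvm
swap_pages = 65535      # "swpages=65535" from show uvm

# sz is mapped virtual size, not resident memory: treat as an upper bound.
cc1_pages = cc1_sz // PAGE_SIZE
typical_pages = typical_sz // PAGE_SIZE
backing_pages = ram_pages + swap_pages

print(f"cc1 map:     {cc1_pages} pages ({cc1_sz / 2**20:.1f} MiB)")
print(f"typical map: {typical_pages} pages ({typical_sz / 2**20:.1f} MiB)")
print(f"RAM + swap:  {backing_pages} pages")
print(f"cc1 map as share of RAM + swap: {cc1_pages / backing_pages:.0%}")
```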

