Subject: Re: 2.0_BETA weirdness (was Re: 1.6.2 weirdness)
To: Andrey Petrov <petrov@NetBSD.org>
From: ali (Anders Lindgren) <dat94ali@ludat.lth.se>
List: port-sparc64
Date: 09/19/2004 16:02:19
On Fri, 17 Sep 2004, Andrey Petrov wrote:
>
> That's exactly 24648. If you can find what application (ps/l) and under
> what conditions causes that, it would be big help in resolving an issue.

  It's happened again. ps/l and some other output included.

  It looks like it's a thread context switch in named9 (from pkgsrc) that
panics the machine? I was just running a "make install" in
/usr/pkgsrc/databases/p5-DBI when this happened, but I'm not sure it has
anything to do with disk load; the very first time I don't know what
was going on when it happened. The second time, with my DEBUG kernel,
it was probably doing nothing because it had finished the perl5 build
I started on it. Third time it was building mysql4-server from pkgsrc,
and now the fourth time, it was installing p5-DBI.

  I have built several other fairly large packages (apache, perl5) without
triggering this bug, so I'm not going to make guesses... named9 was at
the time running in a chroot jail in /var/chroot/named. As always, the
load on this machine is quite low; rarely uses swap or runs low on memory.

  Please let me know if there is anything more I can do to provide useful
information, and let me know if you want me to try to arrange access to
the serial console for you (it's on someone else's box, so I can't promise
anything).

ps/l output:
--------8<--------8<--------
db> ps/l
 PID         LID S     FLAGS       STRUCT LWP *            UAREA * WAIT
 9399          1 2       0x4          0x64b3320          0x6114000
 20704         1 3      0x84          0x64b2e20          0x55a8000 wait
 6441          1 3      0x84          0x64b3220          0x6118000 wait
 11031         1 3      0x80          0x64b2520          0x6286000 wait
 6651          1 3      0x80          0x64b2720          0x628c000 wait
 673           1 3      0x80          0x64b2d20          0x626a000 wait
 2290          1 3      0x80          0x64b2820          0x61e8000 wait
 24084         7 3 0x30900084          0x64b2420          0x6220000 sawait
               5 3  0xd00084          0x64b2f20          0x60dc000 select
               3 3  0xd00084          0x64b2120          0x655a000 select
               4 3  0xd00080          0x64b2020          0x60d8000 select
               2 3  0xd00080          0x64b2b20          0x7bf0000 sigwait
 27565         1 3      0x80          0x64b2220          0x66fa000 wait
 28478         1 3      0x80          0x5e93100          0x7ea8000 pause
 13482         1 3      0x80          0x5e93400          0x7ea0000 pause
 9631          1 3      0x84          0x5e92d00          0x7ece000 select
 25317         1 3      0x80          0x5e92700          0x7ed2000 netio
 27583         1 3      0x80          0x5e93d00          0x6766000 netcon
 18151         1 3      0x80          0x5e93c00          0x6718000 netcon
 18530         1 3      0x80          0x5e93700          0x6750000 netcon
 23673         1 3      0x80          0x5e93a00          0x6770000 netcon
 16125         1 3      0x80          0x5e93600          0x6738000 netcon
 22667         1 3      0x80          0x5e92600          0x6724000 netcon
 17033         1 3      0x80          0x5e93b00          0x675c000 netcon
 12776         1 3      0x80          0x5e93900          0x671c000 netcon
 26954         1 3      0x80          0x5e93e00          0x6756000 netcon
 21658         1 3      0x80          0x5e93500          0x65ce000 select
 24542         1 3      0x84          0x5e92f00          0x6744000 select
 23086         1 3      0x80          0x5e92e00          0x66bc000 select
>16856         5 8  0x900004          0x64b2320          0x60d4000
           >   2 3 0x4900004          0x5e92a00          0x6710000 biowait
               3 3  0xd00080          0x5e93300          0x667e000 select
               1 3  0x500080          0x5e93200          0x6670000 sigwait
 5006          1 3      0x80          0x5e92500          0x5eb8000 ttyin
 269           1 3      0x84          0x5e92c00          0x63d6000 nanosleep
 264           1 3      0x80          0x5e92b00          0x607e000 kqread
 208           1 3      0x80          0x5e92900          0x606e000 select
 120           1 3      0x80          0x5e92800          0x606a000 poll
 5             1 3       0x4          0x5e92400          0x5ea6000 aiodoned
 4             1 3       0x4          0x5e92300          0x5ea2000 syncer
 3             1 3       0x4          0x5e92200          0x5e9e000 pgdaemon
 2             1 3       0x4          0x5e92100          0x5e94000 sccomp
 1             1 3      0x84          0x5e92000          0x5e80000 wait
 0             1 3       0x4          0x1890eb8          0x1c02000 scheduler

--------8<--------8<--------

Output from "ps" command:
--------8<--------8<--------
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 9399         20704     2290          0 2  0x4002    1             perl
 20704         6441     2290          0 2  0x4002    1               sh    wait
 6441         11031     2290          0 2  0x4002    1             make    wait
 11031         6651     2290          0 2  0x4002    1               sh    wait
 6651           673     2290          0 2  0x4002    1             make    wait
 673           2290     2290          0 2  0x4002    1               sh    wait
 2290         28478     2290          0 2  0x4002    1             make    wait
 24084        27565     3613       1001 2  0x4502    5           mysqld       *
 27565            1     3613          0 2  0x4002    1               sh    wait
 28478        13482    28478          0 2  0x4002    1              ksh   pause
 13482         9631    13482        100 2  0x4002    1              ksh   pause
 9631         25317    25317        100 2   0x100    1             sshd  select
 25317          208    25317          0 2       0    1             sshd   netio
 27583        24542    24542       1000 2   0x100    1            httpd  netcon
 18151        24542    24542       1000 2   0x100    1            httpd  netcon
 18530        24542    24542       1000 2   0x100    1            httpd  netcon
 23673        24542    24542       1000 2   0x100    1            httpd  netcon
 16125        24542    24542       1000 2   0x100    1            httpd  netcon
 22667        24542    24542       1000 2   0x100    1            httpd  netcon
 17033        24542    24542       1000 2   0x100    1            httpd  netcon
 12776        24542    24542       1000 2   0x100    1            httpd  netcon
 26954        21658    24542       1000 2  0x4000    1             perl  netcon
 21658        24542    24542       1000 2   0x100    1            httpd  select
 24542            1    24542          0 2       0    1            httpd  select
 23086            1    23086          8 2   0x100    1      exim-4.42-1  select
>16856            1    16856         14 2   0x500    4            named       *
 5006             1     5006          0 2  0x4002    1              ksh   ttyin
 269              1      269          0 2       0    1             cron nanosle
 264              1      264          0 2       0    1            inetd  kqread
 208              1      208          0 2       0    1             sshd  select
 120              1      120          0 2       0    1          syslogd    poll
 5                0        0          0 2 0x20200    1         aiodoned aiodone
 4                0        0          0 2 0x20200    1          ioflush  syncer
 3                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 2                0        0          0 2 0x20200    1         scsibus0  sccomp
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
--------8<--------8<--------

The "bt" output again, for reference.

--------8<--------8<--------
db> bt
ltsleep(1c64200, 204, 11bc680, 0, 6058bc8, 1891800) at netbsd:ltsleep+0x664
uvmfault_anonget(0, 5eb1ce0, 6058bc0, 11bc400, 1897800, 0) at netbsd:uvmfault_anonget+0x13c
uvm_fault(6000, 42000000, ffffffffffffffff, 6712da0, 1, 3) at netbsd:uvm_fault+0x420
data_access_fault(6712f90, 30, 100b0d4, 41ffe2a1, 11081f, 1814c00) at netbsd:data_access_fault+0x2d4
?(67100a8, 41ffebc0, 78, 6710000, 100b168, 1ca3200) at 0x1008e40
rwindow_save(41ffebc0, 0, 0, 620e050, 3e2a000, 2000) at netbsd:rwindow_save+0xcc
cpu_getmcontext(5e92a00, 620e080, 620e040, 1e74750, 6713440, 1891800) at netbsd:cpu_getmcontext+0x10
sa_upcall_getstate(620e040, 5e92a00, 372, 11b2568, 620e000, 64b2348) at netbsd:sa_upcall_getstate+0x10
sa_upcall0(620e000, 2, 5e92a00, 0, 0, 0) at netbsd:sa_upcall0+0x98
sa_switch(6083960, 6078000, 1ca, 11b84c0, 1, 1891800) at netbsd:sa_switch+0x2a4
ltsleep(212e230, 11, 0, 0, 212e240, 0) at netbsd:ltsleep+0x560
biowait(212e230, 212e230, 3, 11b2568, 67139cc, 1891800) at netbsd:biowait+0x50
uvm_swap_io(0, cfb0, 1, 100000, 1897990, 1ca3200) at netbsd:uvm_swap_io+0x1a8
uvm_swap_get(5, 11bf000, 1, 11bc4a8, 46, 0) at netbsd:uvm_swap_get+0x5c
uvmfault_anonget(6713ca0, 5eb1ce0, 6058bc0, 11bc400, 1897800, 0) at netbsd:uvmfault_anonget+0x5ac
uvm_fault(6000, 42000000, ffffffffffffffff, 6713ce0, 1, 3) at netbsd:uvm_fault+0x420
data_access_fault(6713ed0, 30, 41307250, 41ffe2a1, 11081d, 1814c00) at netbsd:data_access_fault+0x2d4
?(44200000, 44200000, badcafe, badcafe, badcafe, badcafe) at 0x1008e40
ddb_regs(ffffffffffffffff, ffffffffffffffff, ffffffffffffffff,
ffffffffffffffff, ffffffffffffffff, ffffffffffffffff) at 0x4130c688
--------8<--------8<--------

... and the lock complaints while trying to reboot (Reboot hangs,
but serial console is functional enough to break into OFW and
reset the box from there):

--------8<--------8<--------
db> reboot
syncing disks...
simple_lock: lock held
lock: 0x1810af8, currently at: ../../../../kern/kern_synch.c:1159
last locked: ../../../../kern/kern_synch.c:421
last unlocked: ../../../../kern/kern_synch.c:658
hme0: status=30001<GOTFRAME,RXTOHOST,NORXD>

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4

switching with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_fault.c:1034
--------8<--------8<--------

-- 
/ali
:wq