Port-powerpc archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

locking problems since 7.0?



Hi,

I'm running two sandpoint NAS servers 24h and repeatedly observed two
problems since upgrading them to 7.0:

1. Sometimes /dev/null turns into a regular file with size 0 during reboot,
so running multiuser fails and I have to attach a serial console to fix the
problem.

2. Sometimes the file system locks up over night, usually during /etc/daily.
I can still ping the system, but everything accessing the file system, like
a log in, locks up.


For the first case I inserted some debugging output in /etc, but only to
find out that /dev/null is still correct shortly before the reboot/halt,
but is changed into a regular when the system comes up again.


For the second case I have established a constant serial connection with a
second server, so I can drop into ddb(4) and analyze the problem when it
happens. I also equipped one of the sandpoint servers with a LOCKDEBUG
kernel (no problem since then).

Attached you find output from a ddb-session after such a lock-up. It seems
like it happened while executing find(1) during /etc/daily. Many processes,
like find, are in "biowait", waiting for "softbio", which is stuck in
"tstile" and cannot handle softbio interrupts.

Anybody seen similar problems since 7.0, or has any idea how to debug or fix
that?

-- 
Frank Wille
db> ps  
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1997     1 3   0         0            2be7a00               find biowait
20743    1 3   0        80            1fd2300           postdrop netio  
15398    1 3   0        80            23c16e0           sendmail pipe_rd
17386    1 3   0        80            1fd25c0                tee pipe_rd
3380     1 3   0        80            3c3e0c0                 sh wait   
12519    1 3   0        80            2be7480                 sh wait
19166    1 3   0        80            38ee940               cron pipe_rd
11696    5 3   0         0            3b5e0e0              slave tstile 
11696    4 3   0         0            38ee3c0              slave tstile
11696    3 3   0         0            23c1c60              slave biowait
11696    2 3   0         0             c0b1a0              slave tstile 
11696    1 3   0        80            3b5e3a0             master select
15768    1 3   0        80            31c6c20          rpc.lockd select
7473     1 3   0        80            1fd2040          rpc.statd select
2898     1 3   0        80            38ee100              dhcpd select
134      1 3   0         0            3edc0a0              getty biowait
700      1 3   0        80            2c05980              httpd socket 
719      1 3   0        80            3c3e380              httpd socket
725      1 3   0        80            3c3e900              httpd socket
899      1 3   0        80            3c3e640              httpd socket
894      1 3   0        80            3c3ebc0              httpd socket
651      1 3   0        80             c0b720              httpd select
660     11 3   0        80             c0b9e0             mysqld sigwait
660     10 3   0        80            122a700             mysqld parked 
660      9 3   0        80             c0bca0             mysqld select
660      8 3   0        80            122a180             mysqld select
660      7 3   0        80            122a440             mysqld select
660      5 3   0        80            122a9c0             mysqld parked
660      4 3   0        80            122ac80             mysqld parked
660      3 3   0        80            23c1420             mysqld parked
660      2 3   0        80            23c1160             mysqld parked
660      1 3   0        80            2c05140             mysqld select
603      1 3   0         0            23c19a0               cron biowait
543      1 3   0   1000000            2c05400               qmgr tstile 
562      1 3   0        80            2c05c40              inetd kqueue
601      1 3   0        80            2c056c0                 sh wait  
602      1 3   0   1000000            31c63e0             master biowait
356      1 3   0         0            31c6120               sshd biowait
221      1 3   0        80            3b5ebe0             mountd select 
173      1 3   0         0            3b5e920            rpcbind biowait
150      1 3   0        80            38eec00            syslogd kqueue 
1        1 3   0        80            3edd340               init wait  
0       44 5   0       200            2be7740           (zombie)     
0       41 3   0       200            3b5e660            physiod physiod
0       40 3   0       200            3edc360           aiodoned aiodoned
0       39 3   0       200            3eddb80            ioflush biolock 
0       38 3   0       200            3edc620           pgdaemon pgdaemon
0       35 3   0       200            3f39060               usb2 usbevt  
0       34 3   0       200            3f39320               usb1 usbevt
0       33 3   0       200            3edd8c0         usbtask-dr usbtsk
0       32 3   0       200            3edd600         usbtask-hc usbtsk
0       31 3   0       200            3edc8e0               usb0 usbevt
0       30 3   0       200            3edd080              unpgc unpgc 
0       29 3   0       200            3edcba0        vmem_rehash vmem_rehash
0       19 3   0       200            3f395e0            atabus1 atath      
0       18 3   0       200            3f398a0            atabus0 atath
0       17 3   0       200            3f39b60             sysmon smtaskq
0       16 3   0       200            3f4e040               iic0 iicintr
0       15 3   0       200            3f4e300         pmfsuspend pmfsuspend
0       14 3   0       200            3f4e5c0           pmfevent pmfevent  
0       13 3   0       200            3f4e880         sopendfree sopendfr
0       12 3   0       200            3f4eb40           nfssilly nfssilly
0       11 3   0       200            3f68020            cachegc cachegc 
0       10 3   0       200            3f682e0              vrele vrele  
0        9 3   0       200            3f685a0             vdrain tstile
0        8 3   0       200            3f68860          modunload mod_unld
0        7 3   0       200            3f68b20            xcall/0 xcall   
0       32 3   0       200            3edd600         usbtask-hc usbtsk
0       31 3   0       200            3edc8e0               usb0 usbevt
0       30 3   0       200            3edd080              unpgc unpgc 
0       29 3   0       200            3edcba0        vmem_rehash vmem_rehash
0       19 3   0       200            3f395e0            atabus1 atath      
0       18 3   0       200            3f398a0            atabus0 atath
0       17 3   0       200            3f39b60             sysmon smtaskq
0       16 3   0       200            3f4e040               iic0 iicintr
0       15 3   0       200            3f4e300         pmfsuspend pmfsuspend
0       14 3   0       200            3f4e5c0           pmfevent pmfevent  
0       13 3   0       200            3f4e880         sopendfree sopendfr
0       12 3   0       200            3f4eb40           nfssilly nfssilly
0       11 3   0       200            3f68020            cachegc cachegc 
0       10 3   0       200            3f682e0              vrele vrele  
0        9 3   0       200            3f685a0             vdrain tstile
0        8 3   0       200            3f68860          modunload mod_unld
0        7 3   0       200            3f68b20            xcall/0 xcall   
0        6 1   0       200            3f6c000          softser/0      
0        5 1   0       200            3f6c2c0          softclk/0
0        4 3   0       200            3f6c580          softbio/0 tstile
0        3 1   0       200            3f6c840          softnet/0       
0    >   2 7   0       201            3f6cb00             idle/0
0        1 3   0       200             304220            swapper uvm

db> show event                       
evcnt type 0: vmcmd kills = 648284
evcnt type 0: vmcmd calls = 4772700
evcnt type 0: vmemev bt_inuse = 240
evcnt type 0: vmemev bt_count = 1660
evcnt type 0: vmemev bt_pages = 10  
evcnt type 0: pdpolicy reactanon = 29378421
evcnt type 0: pdpolicy reactfile = 2170945 
evcnt type 0: pdpolicy reactexec = 58945706
evcnt type 0: softint net/0 = 6731232      
evcnt type 0: softint net block/0 = 635
evcnt type 0: softint bio/0 = 535231551
evcnt type 0: softint bio block/0 = 1607207
evcnt type 0: softint clk/0 = 677594406    
evcnt type 0: softint clk block/0 = 93 
evcnt type 0: softint ser/0 = 4033653 
evcnt type 0: callout late/0 = 1047121
evcnt type 0: crosscall unicast = 2   
evcnt type 0: crosscall broadcast = 253866
evcnt type 0: namecache entries scanned = 4954107613
evcnt type 0: namecache entries collected = 4286155112
evcnt type 0: namecache over scan target = 5467673    
evcnt type 0: namecache under scan target = 38964477
evcnt type 0: namecache forced reclaims = 16177057  
evcnt type 1: cpu0 clock = 2825668254             
evcnt type 2: cpu0 traps = 309712409 
evcnt type 2: cpu0 kernel DSI traps = 78352599
evcnt type 2: cpu0 user DSI traps = 102150711 
evcnt type 2: cpu0 user ISI traps = 46654819 
evcnt type 2: cpu0 system call traps = 10299863570
evcnt type 2: cpu0 PGM traps = 424                
evcnt type 2: cpu0 FPU unavailable traps = 14043008
evcnt type 2: cpu0 FPU context switches = 12724359 
evcnt type 1: mpcpic irq 40 = 3695739             
evcnt type 1: mpcpic irq 41 = 366925 
evcnt type 1: mpcpic irq 16 = 549630106
evcnt type 1: mpcpic irq 17 = 222024   
evcnt type 1: mpcpic irq 18 = 9479612

"find" process:
db> show proc 0t1997                                                   
find: pid 1997 proc 3a46398 vmspace/map 3e881e0 flags 4000
  lwp 1 2be7a00 pcb d2430000                              
    stat 3 flags 0 cpu 0 pri 42
    wmesg biowait wchan 329be34
db> trace/t 0t1997             
trace: pid 1997 lid 1 at 0xd2433b70
0xd2433bd0: at cpu_switchto+0x30   
0xd2433be0: at mi_switch+0x17c  
0xd2433c20: at sleepq_block+0xe0
0xd2433c40: at cv_wait+0x9c     
0xd2433c60: at biowait+0x38
0xd2433c70: at bread+0x30  
0xd2433c90: at ffs_read+0x40c
0xd2433d10: at VOP_READ+0x3c 
0xd2433d40: at ufs_readdir+0x190
0xd2433dd0: at VOP_READDIR+0x44 
0xd2433e00: at vn_readdir+0x128
0xd2433e90: at sys___getdents30+0x98
0xd2433ec0: at syscall_plain+0x1cc  
0xd2433f20: user SC trap #390 by 0xfdf231cc: srr1=0xd032
            r1=0xffffdb10 cr=0x28224042 xer=0x20000000 ctr=0xfdf231c4


db> show sched_qs                                                      
	FL: flags
#define LW_IDLE         0x00000001 /* Idle lwp. */
#define LW_LWPCTL       0x00000002 /* Adjust lwpctl in userret */
#define LW_SINTR        0x00000080 /* Sleep is interruptible. */ 
#define LW_SYSTEM       0x00000200 /* Kernel thread */
#define LW_WSUSPEND     0x00020000 /* Suspend before return to user */
#define LW_BATCH        0x00040000 /* LWP tends to hog CPU */
#define LW_WCORE        0x00080000 /* Stop for core dump on return to user
#*/
#define LW_WEXIT        0x00100000 /* Exit before return to user */          
#define LW_PENDSIG      0x01000000 /* Pending signal for us */     
#define LW_CANCELLED    0x02000000 /* tsleep should not sleep */
#define LW_WREBOOT      0x08000000 /* System is rebooting, please suspend */
#define LW_UNPARKED     0x10000000 /* Unpark op pending */
#define LW_RUMP_CLEAR   0x40000000 /* Clear curlwp in RUMP scheduler */
#define LW_RUMP_QEXIT   0x80000000 /* LWP should exit ASAP */
	ST: RQ(run queue), SQ(sleep squeue)
	TCI: target cpu (not used)
	LRTICKS: ticks since last run

Run-queue (CPU = 0):
 pid.lid = 0.2, r_count = 0, r_avgcount = 0, maxpri = 0, mlwp = 0x0
 bitmap[6] => [ 0 (0x0) ]                                          
 bitmap[5] => [ 0 (0x0) ]
 bitmap[4] => [ 0 (0x0) ]
 bitmap[3] => [ 0 (0x0) ]
 bitmap[2] => [ 0 (0x0) ]
 bitmap[1] => [ 0 (0x0) ]
 bitmap[0] => [ 0 (0x0) ]
     LID  PRI EPRI         FL  ST                LWP  CPU  TCI LRTICKS
 /- 1997 (find)                                                       
 |     1   42   85 0x00000000  SQ          0x2be7a00    0   -1 48189742
 /- 20743 (postdrop)                                                   
 |     1   43   85 0x00000080  SQ          0x1fd2300    0   -1 48483584
 /- 15398 (sendmail)                                                   
 |     1   43   85 0x00000080  SQ          0x23c16e0    0   -1 48483584
 /- 17386 (tee)                                                        
 |     1   43   85 0x00000080  SQ          0x1fd25c0    0   -1 48483595
 /- 3380 (sh)                                                          
 |     1   43   85 0x00000080  SQ          0x3c3e0c0    0   -1 48483578
 /- 12519 (sh)                                                         
 |     1   43   85 0x00000080  SQ          0x2be7480    0   -1 48483638
 /- 19166 (cron)                                                       
 |     1   43   85 0x00000080  SQ          0x38ee940    0   -1 48483639
 /- 11696 (nfsd)                                                       
 |     5   43  117 0x00000000  SQ          0x3b5e0e0    0   -1 31202903
 |     4   43  117 0x00000000  SQ          0x38ee3c0    0   -1 39836233
 |     3   43  117 0x00000000  SQ          0x23c1c60    0   -1 39843211
 |     2   43  117 0x00000000  SQ           0xc0b1a0    0   -1 39829343
 |     1   43   85 0x00000080  SQ          0x3b5e3a0    0   -1 52772301
 /- 15768 (rpc.lockd)                                                  
 |     1   43   85 0x00000080  SQ          0x31c6c20    0   -1 352
 /- 7473 (rpc.statd)                                              
 |     1   43   85 0x00000080  SQ          0x1fd2040    0   -1 1417
 /- 2898 (dhcpd)                                                   
 |     1   43   85 0x00000080  SQ          0x38ee100    0   -1 51632865
 /- 134 (getty)                                                        
 |     1   43   85 0x00000000  SQ          0x3edc0a0    0   -1 189797
 /- 700 (httpd)                                                      
 |     1   38   83 0x00000080  SQ          0x2c05980    0   -1 2825662762
 /- 719 (httpd)                                                          
 |     1   38   83 0x00000080  SQ          0x3c3e380    0   -1 2825662763
 /- 725 (httpd)                                                          
 |     1   38   83 0x00000080  SQ          0x3c3e900    0   -1 2825662765
 /- 899 (httpd)
 |     1   38   83 0x00000080  SQ          0x3c3e640    0   -1 2825662761
 /- 894 (httpd)                                                          
 |     1   38   83 0x00000080  SQ          0x3c3ebc0    0   -1 2825662766
 /- 651 (httpd)                                                          
 |     1   43   85 0x00000080  SQ           0xc0b720    0   -1 75
 /- 660 (mysqld)                                                 
 |    11   40   84 0x00000080  SQ           0xc0b9e0    0   -1 2825663109
 |    10   40   40 0x00000080  SQ          0x122a700    0   -1 2825663205
 |     9   43   85 0x00000080  SQ           0xc0bca0    0   -1 303       
 |     8   43   85 0x00000080  SQ          0x122a180    0   -1 86 
 |     7   43   85 0x00000080  SQ          0x122a440    0   -1 57
 |     5   40   40 0x00000080  SQ          0x122a9c0    0   -1 2825663375
 |     4   40   40 0x00000080  SQ          0x122ac80    0   -1 2825663313
 |     3   40   40 0x00000080  SQ          0x23c1420    0   -1 2825663310
 |     2   40   40 0x00000080  SQ          0x23c1160    0   -1 2825663375
 |     1   40   84 0x00000080  SQ          0x2c05140    0   -1 2825663247
 /- 603 (cron)                                                           
 |     1   43   85 0x00000000  SQ          0x23c19a0    0   -1 48189700
 /- 543 (qmgr)                                                         
 |     1   43  117 0x01000000  SQ          0x2c05400    0   -1 48183652
 /- 562 (inetd)                                                        
 |     1   43   85 0x00000080  SQ          0x2c05c40    0   -1 121562489
 /- 601 (sh)                                                            
 |     1   41   84 0x00000080  SQ          0x2c056c0    0   -1 2825663535
 /- 602 (master)                                                         
 |     1   43  117 0x01000000  SQ          0x31c63e0    0   -1 48188711
 /- 356 (sshd)                                                         
 |     1   43   85 0x00000000  SQ          0x31c6120    0   -1 233420
 /- 221 (mountd)                                                     
 |     1   43   85 0x00000080  SQ          0x3b5ebe0    0   -1 2488
 /- 173 (rpcbind)                                                  
 |     1   43   85 0x00000000  SQ          0x3b5e920    0   -1 10864375
 /- 150 (syslogd)                                                      
 |     1   43   85 0x00000080  SQ          0x38eec00    0   -1 1740
 /- 1 (init)                                                       
 |     1   43   85 0x00000080  SQ          0x3edd340    0   -1 51497003
 /- 0 (system)                                                         
 |    44   96   96 0x00000200   -          0x2be7740    0   -1 2184306522
 |    41  123  123 0x00000200  SQ          0x3b5e660    0   -1 56346222  
 |    40  125  125 0x00000200  SQ          0x3edc360    0   -1 48210347
 |    39  124  221 0x00000200  SQ          0x3eddb80    0   -1 48189743
 |    38  126  126 0x00000200  SQ          0x3edc620    0   -1 48453627
 |    35   96   96 0x00000200  SQ          0x3f39060    0   -1 4264    
 |    34   96   96 0x00000200  SQ          0x3f39320    0   -1 809 
 |    33   96   96 0x00000200  SQ          0x3edd8c0    0   -1 2825668203
 |    32   96   96 0x00000200  SQ          0x3edd600    0   -1 2825668203
 |    32   96   96 0x00000200  SQ          0x3edd600    0   -1 2825668203
 |    31   96   96 0x00000200  SQ          0x3edc8e0    0   -1 809       
 |    30   96   96 0x00000200  SQ          0x3edd080    0   -1 2825668224
 |    29  125  125 0x00000200  SQ          0x3edcba0    0   -1 775       
 |    19   96   96 0x00000200  SQ          0x3f395e0    0   -1 2825668220
 |    18   96   96 0x00000200  SQ          0x3f398a0    0   -1 2825667916
 |    17   96   96 0x00000200  SQ          0x3f39b60    0   -1 2825668254
 |    16   96   96 0x00000200  SQ          0x3f4e040    0   -1 78        
 |    15   96   96 0x00000200  SQ          0x3f4e300    0   -1 2825668254
 |    14   96   96 0x00000200  SQ          0x3f4e5c0    0   -1 2825668254
 |    13   96   96 0x00000200  SQ          0x3f4e880    0   -1 51596332  
 |    12   96   96 0x00000200  SQ          0x3f4eb40    0   -1 2825668254
 |    11  125  125 0x00000200  SQ          0x3f68020    0   -1 78        
 |    10  125  125 0x00000200  SQ          0x3f682e0    0   -1 78
 |     9  125  221 0x00000200  SQ          0x3f685a0    0   -1 48189643
 |     8  125  125 0x00000200  SQ          0x3f68860    0   -1 52740406
 |     7  127  127 0x00000200  SQ          0x3f68b20    0   -1 52124382
 |     6  223  223 0x00000200   -          0x3f6c000    0   -1 2825668254
 |     5  220  220 0x00000200   -          0x3f6c2c0    0   -1 52256642  
 |     4  221  221 0x00000200  SQ          0x3f6c580    0   -1 48189861
 |     3  222  222 0x00000200   -          0x3f6c840    0   -1 51649303
 |     2    0    0 0x00000201   -          0x3f6cb00    0   -1 2825668254
 |     1  125  125 0x00000200  SQ           0x304220    0   -1 78        


db> show callout                                    
hardclock_ticks now: -1469299042
    ticks  wheel               arg  func
        1  0/159                 0  pfslowtimo
        5  0/163                 0  pffasttimo
       15  0/173           122a180  sleepq_timeout
       22  0/180           3f68020  sleepq_timeout
       22  0/180            304220  sleepq_timeout
       22  0/180           3f682e0  sleepq_timeout
       22  0/180           3f4e040  sleepq_timeout
       26  0/184            c0b720  sleepq_timeout
       34  0/192                 0  if_slowtimo   
       34  0/192                 0  nd6_timer  
       34  0/192                 0  rt_timer_timer
       44  0/202           122a440  sleepq_timeout
       83  0/241                 0  rnd_skew      
      225  1/327                 0  vmem_rehash_all_kick
      198  1/327            c0bca0  sleepq_timeout      
      513  1/328           3b5ebe0  sleepq_timeout
     1261  1/331           38eec00  sleepq_timeout
     1584  1/332           1fd2040  sleepq_timeout
     1736  1/333           3f39060  sleepq_timeout
     2649  1/336           31c6c20  sleepq_timeout
     2716  1/337                 0  arptimer      
     5191  1/346           3f39320  sleepq_timeout
     5191  1/346           3edc8e0  sleepq_timeout
   331773  2/625                 0  nd6_slowtimo  
   490604  2/627           223cbd8  tcp_timer_keep
   487128  2/627           223cdd0  tcp_timer_keep
   480654  2/627           30e9008  tcp_timer_keep
  5665767  2/706                 0  in6_tmpaddrtimer


db> show uvm                                        
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1  15062 VM pages: 4367 active, 231 inactive, 2 wired, 4454 free
  pages  2025 anon, 231 file, 2353 exec
  freemin=75, free-target=100, wired-max=5020
  cpu0:                                      
    faults=227175182, traps=309712409, intrs=3379844322, ctxswitch=1491617966
    softint=1223590842, syscalls=10299863570
  fault counts:                             
    noram=33, noanon=0, pgwait=1653, pgrele=0
    ok relocks(total)=48976745(48976768), anget(retrys)=16227774(2146116), amapcopy=12790303
    neighbor anon/obj pg=14312066/159921221, gets(lock/unlock)=93450491/46830654
    cases: anon=9920567, anoncow=6150518, obj=79370017, prcopy=14080449, przero=44485986
  daemon and swap counts:
    woke=403132, revs=401151, scans=235944725, obscans=131324495, anscans=2648851
    busy=697292, freed=133973344, reactivate=10779015, deactivate=238086308
    pageouts=126077, pending=1788090, nswget=2147440                       
    nswapdev=1, swpgavail=262205
    swpages=262205, swpginuse=5126, swpgonly=4517, paging=0


Home | Main Index | Thread Index | Old Index