NetBSD-Bugs archive


kern/38669: NFS deadlock?



>Number:         38669
>Category:       kern
>Synopsis:       NFS deadlock?
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 15 20:35:00 +0000 2008
>Originator:     Martin Husemann
>Release:        NetBSD 4.99.62
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD quadrophenia.duskware.de 4.99.62 NetBSD 4.99.62 (SUNNY.MP) #43: Thu May 15 20:27:19 CEST 2008 martin%sunny-weather.duskware.de@localhost:/usr/src/sys/arch/sparc64/compile/SUNNY.MP sparc64
Architecture: sparc64
Machine: sparc64
>Description:

I am still trying to hunt down my NFS problems on sparc64 SMP kernels, this
time on a different machine with root on sd0, so that the NFS hang does not
take the whole machine down with it.

Testcase: make -j 8 in a kernel compile directory.

After a short time all activity stops, top shows:

load averages:  0.00,  0.66,  0.82                  up 0 days,  0:25   21:50:02
39 processes:  38 sleeping, 1 on CPU
CPU0 states:  1.0% user,  0.0% nice,  3.5% system,  0.0% interrupt, 95.5% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU3 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: 128M Act, 7792K Wired, 12M Exec, 42M File, 3838M Free
Swap: 1026M Total, 1026M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 4235 martin    85    0   184K   11M select/2   0:42  0.00%  0.00% make
    0 root     125    0     0K  129M schedu/0   0:17  0.00%  0.00% [system]
  673 martin   109    0  4504K   27M tstile/1   0:05  0.00%  0.00% cc1
 1279 martin   109    0  4440K   24M tstile/0   0:04  0.00%  0.00% cc1
  375 martin    43    0   112K 1744K CPU/3      0:01  0.00%  0.00% top
  333 martin    85    0   344K 3992K select/2   0:01  0.00%  0.00% sshd
 1111 martin   114    0   408K 2232K nfsrcv/0   0:00  0.00%  0.00% as
 1209 martin   110    0   168K 1312K tstile/1   0:00  0.00%  0.00% sh
 5075 martin   109    0   408K 2232K tstile/3   0:00  0.00%  0.00% as
 5138 martin    85    0  4416K 5856K netio/0    0:00  0.00%  0.00% cc1
  221 root      85    0   920K 5600K pause/0    0:00  0.00%  0.00% ntpd
  357 root      85    0   344K 5008K netio/1    0:00  0.00%  0.00% sshd
  340 root      85    0   344K 5008K netio/1    0:00  0.00%  0.00% sshd

(yes, userland is slightly older)

Breaking into ddb I got this:

db{0}> ps/w                                              
 PID        LID          COMMAND     EMUL  PRI WAIT-MSG    WAIT-CHANNEL
 5075         1               as   netbsd   27 tstile       11959f90   
 1209         1               sh   netbsd   29 tstile       11959f90
 1111         1               as   netbsd   27 nfsrcv       11839828
 4784         1               as   netbsd   28 nfsrcv       11839828
 5138         1              cc1   netbsd   43 netio        5bb1e60 
 4904         1               cc   netbsd   28 wait         12539cc0
 4966         1               sh   netbsd   27 wait         1265ad70
 5118         1              cc1   netbsd   28 nfsrcv       11839828
 5097         1               cc   netbsd   27 wait         1265ba40
 5086         1               sh   netbsd   27 wait         11d94fb0
 1279         1              cc1   netbsd   27 tstile       11959f90
 1149         1               cc   netbsd   27 wait         12538ff0
 5070         1               sh   netbsd   27 wait         12502d40
 673          1              cc1   netbsd   27 tstile       118ec1c0
 4880         1               cc   netbsd   27 wait         117cc2d0
 5048         1               sh   netbsd   27 wait         1265b000
 4783         1               cc   netbsd   27 wait         127c9ce0
 5106         1               sh   netbsd   27 wait         1265bcd0
 672          1               cc   netbsd   27 wait         1265aae0
 999          1               sh   netbsd   28 wait         12502070
 5099         1               cc   netbsd   27 wait         1265a330
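
One way to read the listing above is to group the sleeping LWPs by wait
message and wait channel, which shows how many of them are queued on the
same object. The Python sketch below is not part of the original report: the
helper name group_by_wait is hypothetical, and only the rows that are not in
plain "wait" (parents waiting for children) are copied in. It assumes the
seven-column ps/w layout shown above.

```python
from collections import defaultdict

# Rows copied from the ddb ps/w output above; LWPs sleeping in "wait"
# (parents waiting for their children) are left out.
PS_OUTPUT = """\
 5075         1               as   netbsd   27 tstile       11959f90
 1209         1               sh   netbsd   29 tstile       11959f90
 1111         1               as   netbsd   27 nfsrcv       11839828
 4784         1               as   netbsd   28 nfsrcv       11839828
 5138         1              cc1   netbsd   43 netio        5bb1e60
 5118         1              cc1   netbsd   28 nfsrcv       11839828
 1279         1              cc1   netbsd   27 tstile       11959f90
 673          1              cc1   netbsd   27 tstile       118ec1c0
"""


def group_by_wait(text):
    """Map (wait message, wait channel) -> list of (pid, command)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 7-column process row.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        pid, _lid, command, _emul, _pri, wmesg, wchan = fields
        groups[(wmesg, wchan)].append((int(pid), command))
    return dict(groups)


# Print the most contended wait channels first.
by_wait = group_by_wait(PS_OUTPUT)
for (wmesg, wchan), lwps in sorted(by_wait.items(), key=lambda kv: -len(kv[1])):
    pids = ", ".join(f"{pid} ({cmd})" for pid, cmd in lwps)
    print(f"{wmesg:8} {wchan:>10}  {len(lwps)} waiter(s): {pids}")
```

Grouped this way, three LWPs share the tstile channel 11959f90, one sits on
tstile channel 118ec1c0, and three sleep in nfsrcv on 11839828. Since tstile
is the wait message for LWPs blocked on a kernel lock, the holders of those
two locks are a natural place to look next.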

Any hints on how to debug this further are welcome.

Martin

>Fix:

Unknown

