NetBSD-Bugs archive
kern/38669: NFS deadlock?
>Number: 38669
>Category: kern
>Synopsis: NFS deadlock?
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 15 20:35:00 +0000 2008
>Originator: Martin Husemann
>Release: NetBSD 4.99.62
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD quadrophenia.duskware.de 4.99.62 NetBSD 4.99.62 (SUNNY.MP) #43:
Thu May 15 20:27:19 CEST 2008
martin%sunny-weather.duskware.de@localhost:/usr/src/sys/arch/sparc64/compile/SUNNY.MP
sparc64
Architecture: sparc64
Machine: sparc64
>Description:
I am still trying to hunt down my NFS problems on sparc64 SMP kernels, this
time on a different machine with root on sd0, so the NFS lockup does not kill
the machine completely.
Test case: run make -j 8 in a kernel compile directory.
After a short time all activity stops, and top shows:
load averages: 0.00, 0.66, 0.82 up 0 days, 0:25 21:50:02
39 processes: 38 sleeping, 1 on CPU
CPU0 states: 1.0% user, 0.0% nice, 3.5% system, 0.0% interrupt, 95.5% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU3 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 128M Act, 7792K Wired, 12M Exec, 42M File, 3838M Free
Swap: 1026M Total, 1026M Free
PID  USERNAME PRI NICE  SIZE   RES STATE     TIME  WCPU   CPU COMMAND
4235 martin    85    0  184K   11M select/2  0:42 0.00% 0.00% make
0    root     125    0    0K  129M schedu/0  0:17 0.00% 0.00% [system]
673  martin   109    0 4504K   27M tstile/1  0:05 0.00% 0.00% cc1
1279 martin   109    0 4440K   24M tstile/0  0:04 0.00% 0.00% cc1
375  martin    43    0  112K 1744K CPU/3     0:01 0.00% 0.00% top
333  martin    85    0  344K 3992K select/2  0:01 0.00% 0.00% sshd
1111 martin   114    0  408K 2232K nfsrcv/0  0:00 0.00% 0.00% as
1209 martin   110    0  168K 1312K tstile/1  0:00 0.00% 0.00% sh
5075 martin   109    0  408K 2232K tstile/3  0:00 0.00% 0.00% as
5138 martin    85    0 4416K 5856K netio/0   0:00 0.00% 0.00% cc1
221  root      85    0  920K 5600K pause/0   0:00 0.00% 0.00% ntpd
357  root      85    0  344K 5008K netio/1   0:00 0.00% 0.00% sshd
340  root      85    0  344K 5008K netio/1   0:00 0.00% 0.00% sshd
(yes, userland is slightly older)
Breaking into ddb I got this:
db{0}> ps/w
PID  LID COMMAND EMUL   PRI WAIT-MSG WAIT-CHANNEL
5075 1   as      netbsd 27  tstile   11959f90
1209 1   sh      netbsd 29  tstile   11959f90
1111 1   as      netbsd 27  nfsrcv   11839828
4784 1   as      netbsd 28  nfsrcv   11839828
5138 1   cc1     netbsd 43  netio    5bb1e60
4904 1   cc      netbsd 28  wait     12539cc0
4966 1   sh      netbsd 27  wait     1265ad70
5118 1   cc1     netbsd 28  nfsrcv   11839828
5097 1   cc      netbsd 27  wait     1265ba40
5086 1   sh      netbsd 27  wait     11d94fb0
1279 1   cc1     netbsd 27  tstile   11959f90
1149 1   cc      netbsd 27  wait     12538ff0
5070 1   sh      netbsd 27  wait     12502d40
673  1   cc1     netbsd 27  tstile   118ec1c0
4880 1   cc      netbsd 27  wait     117cc2d0
5048 1   sh      netbsd 27  wait     1265b000
4783 1   cc      netbsd 27  wait     127c9ce0
5106 1   sh      netbsd 27  wait     1265bcd0
672  1   cc      netbsd 27  wait     1265aae0
999  1   sh      netbsd 28  wait     12502070
5099 1   cc      netbsd 27  wait     1265a330
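To make it easier to see which processes sleep on the same wait channel, the
ps/w listing can be grouped by wait message and channel. The following is only
a rough sketch in Python, not part of any NetBSD tooling: the names PS_OUTPUT,
group_by_wait and shared_channels are made up for illustration, and the rows
are simply copied from the ddb output above.

```python
from collections import defaultdict

# Rows copied from the ddb "ps/w" output in this report.
# Columns: PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
PS_OUTPUT = """\
PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
5075 1 as netbsd 27 tstile 11959f90
1209 1 sh netbsd 29 tstile 11959f90
1111 1 as netbsd 27 nfsrcv 11839828
4784 1 as netbsd 28 nfsrcv 11839828
5138 1 cc1 netbsd 43 netio 5bb1e60
4904 1 cc netbsd 28 wait 12539cc0
4966 1 sh netbsd 27 wait 1265ad70
5118 1 cc1 netbsd 28 nfsrcv 11839828
5097 1 cc netbsd 27 wait 1265ba40
5086 1 sh netbsd 27 wait 11d94fb0
1279 1 cc1 netbsd 27 tstile 11959f90
1149 1 cc netbsd 27 wait 12538ff0
5070 1 sh netbsd 27 wait 12502d40
673 1 cc1 netbsd 27 tstile 118ec1c0
4880 1 cc netbsd 27 wait 117cc2d0
5048 1 sh netbsd 27 wait 1265b000
4783 1 cc netbsd 27 wait 127c9ce0
5106 1 sh netbsd 27 wait 1265bcd0
672 1 cc netbsd 27 wait 1265aae0
999 1 sh netbsd 28 wait 12502070
5099 1 cc netbsd 27 wait 1265a330
"""


def group_by_wait(text):
    """Map (wait message, wait channel) to a list of (pid, command)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 7-column process row.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        pid, _lid, command, _emul, _pri, wmesg, wchan = fields
        groups[(wmesg, wchan)].append((int(pid), command))
    return dict(groups)


def shared_channels(groups):
    """Keep only the channels that have more than one sleeper."""
    return {key: procs for key, procs in groups.items() if len(procs) > 1}


if __name__ == "__main__":
    for (wmesg, wchan), procs in shared_channels(group_by_wait(PS_OUTPUT)).items():
        print(wmesg, wchan, procs)
```

For the listing above this reports two shared channels: three processes
(5075, 1209, 1279) in tstile on 11959f90 and three (1111, 4784, 5118) in
nfsrcv on 11839828. Process 673 sleeps in tstile on a different channel,
118ec1c0.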
Any hints on how to debug this further are welcome.
Martin
>Fix:
Unknown