tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

So it seems "umount -f /nfs/mount" still doesn't work.....



So I've been meaning to test crash(8) for some time now, but had never
found the occasion.

Well today I found a good reason.  I've been re-arranging some servers
and ended up leaving some NFS mounts in place when I should not have.
This of course causes no end of problems if the nightly cron jobs try to
run as various processes can get stuck, e.g. the "find"s that run.  The
good news is that it is now possible to kill these processes (whereas in
older releases they were forever unkillable despite using interruptable
mounts).  However "umount -f /nfs/mount" is still not working.  It
appears, from "ps", that it gets stuck in a socket call.

# umount -f /future/build
[ 2436610.5372572] load: 0.88  cmd: umount 2940 [tstile] 0.00u 0.00s 0% 236k
[ 2436611.6273278] load: 0.88  cmd: umount 2940 [tstile] 0.00u 0.00s 0% 236k

.... here I killed the stuck find processes from another window ....

[ 2436643.7493935] load: 1.03  cmd: umount 2940 [nfscn2] 0.00u 0.00s 0% 236k
[ 2436645.2894833] load: 1.03  cmd: umount 2940 [nfscn2] 0.00u 0.00s 0% 236k
[ 2436788.9486762] load: 1.00  cmd: umount 2940 [nfscn2] 0.00u 0.01s 0% 236k
[ 2436794.2290140] load: 1.00  cmd: umount 2940 [nfscn2] 0.00u 0.01s 0% 236k
[ 2437106.0289114] load: 1.04  cmd: umount 2940 [nfscn2] 0.00u 0.02s 0% 236k
[ 2441119.0864077] load: 1.00  cmd: umount 2940 [nfscn2] 0.00u 0.14s 0% 236k
[ 2441394.6442117] load: 1.08  cmd: umount 2940 [nfscn2] 0.00u 0.16s 0% 236k


Note also that fstat(1) is entirely useless at finding those stuck
processes as it just gets stuck itself:

# fstat /future/build
[ 2436819.2006111] load: 1.22  cmd: fstat 6722 [tstile] 0.00u 0.00s 0% 440k
[ 2436820.8607171] load: 1.22  cmd: fstat 6722 [tstile] 0.00u 0.00s 0% 440k


Note as you'll see below this particular example is from a
not-very-current current system, but I see the exact same behaviour on a
stock 9.0 and on a much more current 9.99.64.

So, let's see if we can figure out why "umount -f" isn't forcing things:


12:00 [1.60] # ps alx
  UID   PID  PPID  CPU PRI NI   VSZ    RSS WCHAN   STAT TTY           TIME COMMAND
    0     0     0    0 125  0     0 159628 uvm     DKl  ?       1013:51.76 [system]
    0     1     0  671  85  0  3968    472 wait    Is   ?          0:00.01 init 
32767   187     1    0  85  0  3060   1272 select  Ss   ?          1:05.62 /usr/sbin/mdnsd 
    0   249     1    0  85  0  8732   1216 kqueue  Ss   ?          0:06.69 /usr/sbin/syslogd -s 
    0   283     1    0  85  0 26552   1200 nanoslp Ss   ?          0:08.00 /usr/pkg/sbin/xenwatchdogd 30 15 
   19   528     1    0  85  0  5256    480 select  Ss   ?          1:38.50 /usr/sbin/rwhod -u _rwhod 
    0   563     1    0  85  0 46832   1944 select  Isl  ?          0:00.00 /usr/pkg/sbin/xl create /etc/xen/more.conf 
    0   574     1    0  85  0 26552    984 nanoslp Ss   ?          0:07.99 /usr/pkg/sbin/xenwatchdogd 30 15 
    0   591     1    0  85  0 33000   1864 select  Sl   ?         63:21.44 /usr/pkg/sbin/xenconsoled --log=guest 
   12   595   913    0  85  0  6216   1228 kqueue  I    ?          0:02.83 qmgr -l -t unix -u 
    0   599     1    0  85  0  9504   8712 pause   Ss   ?          2:51.72 /usr/sbin/ntpd -p /var/run/ntpd.pid -g 
 1000   609  1906    0  85  0 13660   7076 select  Is   ?          0:00.81 xterm -class UXTerm 
    0   611     1   20  85  0  2344    316 kqueue  Is   ?          0:00.00 /usr/sbin/powerd 
    0   622     1    4  85  0  6756   1076 select  Is   ?          0:00.00 /usr/sbin/sshd 
    0   650     1    0  85  0 31720   1884 select  I    ?          0:08.23 /usr/pkg/sbin/xenstored --pid-file /var/run/xenstored.pi
    0   658     1    0  85  0 26512   3044 select  Is   ?          0:00.10 conserver: master: port 782, 1 local, 0 remote 
    0   913     1    0  85  0  2788    864 kqueue  Is   ?          0:11.68 /usr/libexec/postfix/master -w 
    0   997     1    0  85  0  2544    620 nanoslp Is   ?          0:05.55 /usr/sbin/cron 
 1000  1002 13272  174  85  0 21204   4344 select  Ss   ?          0:00.15 xterm -class UXTerm 
    0  1004     1    0  85  0  2796    864 kqueue  Ss   ?          0:00.02 /usr/sbin/inetd -l 
    0  1906  1004    0  85  0  2624    684 select  I    ?          0:00.01 rshd -L 
    0  3812  1004    0  85  0  2624    684 select  I    ?          0:00.01 rshd -L 
 1000  4007  3812    0  85  0 26472  20676 select  Is   ?          0:14.22 xterm -class UXTerm 
    0  5999  1004    0  85  0  2620    648 select  I    ?          0:00.01 rshd -L 
    0  8861   658    0  85  0 26512   3124 select  S    ?          0:04.76 conserver: group 1: port 65506, 1 console 
 1000  9003 16131    0  85  0 13648   1680 select  Ss   ?          0:02.53 xload 
    0 13272  1004  699  85  0  6124    652 select  S    ?          0:00.01 rshd -L 
 1000 14064  5999    0  85  0 21428   7896 select  Ss   ?          0:00.38 xterm -class UXTerm 
    0 15270  1004    0  85  0  6028    652 select  I    ?          0:00.01 rshd -L 
   12 16008   913    0  85  0  7860   1140 kqueue  I    ?          0:00.01 pickup -l -t unix -u 
    0 16131  1004    0  85  0  2620    644 select  I    ?          0:00.01 rshd -L 
 1000 20090 15270    0  85  0 25056   7028 select  Is   ?          0:00.24 xterm -class UXTerm 
 1000  1768   609    0  85  0 11708   1056 ttyraw  Is+  pts/1      0:00.09 -ksh 
    0  2940  7584    0 117  0 17264    236 nfscn2  D+   pts/2      0:00.14 umount -f /future/build 
 1000  4103  4007    0  85  0  4120   1064 pause   Is   pts/2      0:00.09 -ksh 
    0  7584  4103    0  85  0  7656   1088 pause   I    pts/2      0:00.35 ksh 
    0  6722 27639    0 127  0 16768    440 tstile  D+   pts/3      0:00.00 fstat /future/build 
 1000 21172 14064  489  85  0  3728   1060 pause   Is   pts/3      0:00.09 -ksh 
    0 27639 21172    0  85  0  9648   1048 pause   I    pts/3      0:00.08 ksh 
 1000 13000 20090  722  85  0  3600   1056 ttyraw  Is+  pts/4      0:00.11 -ksh 
    0  3707 19523 1057  85  0 11736   1044 pause   S    pts/5      0:00.08 ksh 
    0  4176  3707 1057  43  0 12900    624 -       O+   pts/5      0:00.00 ps -alx 
 1000 19523  1002 3550  85  0  3188   1056 pause   Ss   pts/5      0:00.09 -ksh 
    0  1013     1  527  85  0  2660    652 ttyraw  Is+  ttyE0      0:00.08 -ksh 
    0   822     1  126  85  0  2524    412 ttyraw  Is+  ttyE1      0:00.00 /usr/libexec/getty Ws ttyE1 
    0   828     1  126  85  0  2524    416 ttyraw  Is+  ttyE2      0:00.00 /usr/libexec/getty Ws ttyE2 
    0   957     1  126  85  0  2528    412 ttyraw  Is+  ttyE3      0:00.00 /usr/libexec/getty Ws ttyE3 
    0   862     1  126  85  0  4188    408 ttyraw  Is+  ttyE4      0:00.00 /usr/libexec/getty Ws ttyE4 
    0  1023     1  126  85  0  2524    424 ttyraw  Is+  ttyE5      0:00.00 /usr/libexec/getty Ws ttyE5 
    0  1050     1  126  85  0  2524    428 ttyraw  Is+  ttyE6      0:00.00 /usr/libexec/getty Ws ttyE6 
    0   668     1    0  85  0  2528    416 ttyraw  Is+  xencons    0:00.01 /usr/libexec/getty console constty 
12:00 [1.61] # crash
Crash version 8.99.32, image version 8.99.32.
Output from a running system is unreliable.
crash> bt /t 0t2940
trace: pid 2940 lid 1 at 0xffffaf808a4748f0
sleepq_block() at sleepq_block+0xfd
kpause() at kpause+0xdf
nfs_reconnect() at nfs_reconnect+0x8b
nfs_request() at nfs_request+0xf3a
nfs_getattr() at nfs_getattr+0x175
VOP_GETATTR() at VOP_GETATTR+0x49
vn_stat() at vn_stat+0x3d
do_sys_statat() at do_sys_statat+0x97
sys___lstat50() at sys___lstat50+0x25
syscall() at syscall+0x9c
--- syscall (number 441) ---
43292a:
crash> bt /t 0t6722
trace: pid 6722 lid 1 at 0xffffaf808a488920
sleepq_block() at sleepq_block+0x99
turnstile_block() at turnstile_block+0x337
rw_vector_enter() at rw_vector_enter+0x169
genfs_lock() at genfs_lock+0x3c
VOP_LOCK() at VOP_LOCK+0x71
vn_lock() at vn_lock+0x90
nfs_root() at nfs_root+0x2b
lookup_once() at lookup_once+0x38e
namei_tryemulroot() at namei_tryemulroot+0x453
namei() at namei+0x29
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x87
sys___stat50() at sys___stat50+0x28
syscall() at syscall+0x9c
--- syscall (number 439) ---
43c94a:
crash> 


So it would seem that even though umount is trying to force an unmount
of an NFS mount, the kernel is first trying to reconnect to the server!


BTW, I have another system running a quite recent i386 build where
crash(8) is unable to do a backtrace:

# ktrace crash
Crash version 9.99.64, image version 9.99.64.
Kernel compiled without options LOCKDEBUG.
Output from a running system is unreliable.
crash> trace /t 0t4003
crash: kvm_read(0x4, 4): kvm_read: Bad address
trace: pid 4003 lid 4003
crash> 


-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

Attachment: pgp_AUKSZjQ3J.pgp
Description: PGP signature



Home | Main Index | Thread Index | Old Index