tech-kern archive

Re: So it seems "umount -f /nfs/mount" still doesn't work.....



At Tue, 30 Jun 2020 12:52:37 -0700, "Greg A. Woods" <woods%planix.com@localhost> wrote:
Subject: So it seems "umount -f /nfs/mount" still doesn't work.....
> 

Curiously, the kernel now does something I didn't quite expect when one
tries to reboot a system with a stuck mount.  I was able to see this
because I was running a kernel that verbosely logs all its shutdown
unmounts and detaches.  In prior times I had reached for the power switch.

At first it just hangs:

lilbit# reboot -q
[ 1131744.8297338] syncing disks... 3 3 done
[ 1131744.9797408] unmounting 0xc1f27000 /more/work (more.local:/work)...
[ 1131744.9907053] ok
[ 1131744.9907053] unmounting 0xc1f24000 /more/archive (more.local:/archive)...
[ 1131745.0004431] ok
[ 1131745.0004431] unmounting 0xc1f21000 /more/home (more.local:/home)...
[ 1131745.0097426] ok
[ 1131745.0097426] unmounting 0xc1f1f000 /once/build (once.local:/build)...
[ 1131745.0097426] ok
[ 1131745.0210854] unmounting 0xc1f1b000 /future/build (future.local:/build)...
[ 1131745.0210854] ok
[ 1131745.0304676] unmounting 0xc1f11000 /building/build (building.local:/build)...

  .... this is me hitting ^T to try to see what's going on ....

[ 1131753.2800902] load: 0.52  cmd: reboot 7414 [fstcnt] 0.00u 0.16s 0% 424k
[ 1132107.6651517] load: 0.48  cmd: reboot 7414 [fstcnt] 0.00u 0.16s 0% 424k
[ 1133247.8436109] load: 0.48  cmd: reboot 7414 [fstcnt] 0.00u 0.16s 0% 424k

   .... then I hit ^C and immediately it proceeded ....

^C[ 1133249.3636755] unmounting 0xc1f0f000 /proc (procfs)...
[ 1133249.3636755] ok
[ 1133249.3636755] unmounting 0xc1f0d000 /dev/pts (ptyfs)...
[ 1133249.3788641] unmounting 0xc1ecb000 /kern (kernfs)...
[ 1133249.3843127] ok
[ 1133249.3843127] unmounting 0xc1ec9000 /cache (/dev/wd1a)...
[ 1133249.7636916] ok
[ 1133249.7636916] unmounting 0xc1ec6000 /home (/dev/wd0g)...
[ 1133249.7736976] unmounting 0xc1dd7000 /usr/pkg (/dev/wd0f)...
[ 1133250.0737098] unmounting 0xc1ab1000 /var (/dev/wd0e)...
[ 1133250.1537121] unmounting 0xc1804000 / (/dev/wd0a)...
[ 1133251.0337515] unmounting 0xc1f11000 /building/build (building.local:/build)...
[ 1133251.0469644] unmounting 0xc1f0d000 /dev/pts (ptyfs)...
[ 1133251.0469644] unmounting 0xc1ec6000 /home (/dev/wd0g)...
[ 1133251.0579007] unmounting 0xc1dd7000 /usr/pkg (/dev/wd0f)...
[ 1133251.0637673] unmounting 0xc1ab1000 /var (/dev/wd0e)...
[ 1133251.0637673] unmounting 0xc1804000 / (/dev/wd0a)...
[ 1133251.0750403] sd0: detached
[ 1133251.0750403] scsibus0: detached
[ 1133251.0750403] gpio1: detached
[ 1133251.0853614] sysbeep0: detached
[ 1133251.0853614] midi0: detached
[ 1133251.0853614] wd1: detached
[ 1133251.0949369] uhub0: detached
[ 1133251.0949369] com1: detached
[ 1133251.0949369] usb0: detached
[ 1133251.1045456] gpio0: detached
[ 1133251.1045456] ohci0: detached
[ 1133251.1045456] pchb0: detached
[ 1133251.1151702] unmounting 0xc1f11000 /building/build (building.local:/build)...
[ 1133251.1151702] unmounting 0xc1f0d000 /dev/pts (ptyfs)...
[ 1133251.1279509] unmounting 0xc1ec6000 /home (/dev/wd0g)...
[ 1133251.1279509] unmounting 0xc1dd7000 /usr/pkg (/dev/wd0f)...
[ 1133251.1393918] unmounting 0xc1ab1000 /var (/dev/wd0e)...
[ 1133251.1448739] unmounting 0xc1804000 / (/dev/wd0a)...
[ 1133251.1448739] forcefully unmounting /building/build (building.local:/build)...
[ 1133251.1587138] forceful unmount of /building/build failed with error -3
[ 1133251.1653872] rebooting...


So it seems there's some contention between the kernel's internal attempt
to unmount the stuck NFS filesystem(s) and the reboot system call itself.
If the reboot command is interrupted, though, the kernel can get on with
its shutdown procedures, and eventually it tries to force the unmount of
the stuck NFS filesystem (although, as the log shows, that forced unmount
of /building/build failed with error -3 just before the reboot).
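
For a sense of how long the reboot sat there before the ^C, the two
timestamps on either side of the hang in the log above can simply be
subtracted.  What follows is only a minimal sketch in Python: the helper
name stall_seconds() and the regular expression are my own illustrative
assumptions, not anything taken from the kernel or from reboot(8).

```python
import re

# Matches the "[ seconds.fraction]" prefix the kernel puts on each console line.
# (Assumed format, inferred only from the log lines quoted in this message.)
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

def stall_seconds(before: str, after: str) -> float:
    """Return the time elapsed between two timestamped console lines."""
    t0 = float(STAMP.search(before).group(1))
    t1 = float(STAMP.search(after).group(1))
    return t1 - t0

# Last line printed before the hang, and first line printed after the ^C.
hung = "[ 1131745.0304676] unmounting 0xc1f11000 /building/build (building.local:/build)..."
resumed = "^C[ 1133249.3636755] unmounting 0xc1f0f000 /proc (procfs)..."

elapsed = stall_seconds(hung, resumed)
print(f"stalled for about {elapsed:.0f} s ({elapsed / 60:.0f} min)")
# prints: stalled for about 1504 s (25 min)
```

So the unmount of /building/build had been blocking the shutdown for
roughly 25 minutes when the reboot command was interrupted.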

Another interesting thing to note is that /future/build was also stuck,
as future.local is offline at this time.  However, that's the filesystem
I tried to clear first by hand with "umount -f /future/build", and that
command was itself stuck, apparently in the same call to nfs_reconnect().
It seems it had done enough that, when reboot() triggered the unmounting,
the kernel could complete the unmount without problems.  (The other
mounts on more.local and once.local were responding, so they unmounted
normally.)

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>
