Current-Users archive

Re: kern/54969 (Disk cache is no longer flushed on shutdown)



So, the main reason I jumped from what I thought was a relatively stable
point on the main -current branch to a more recent version was what I
believe is a problem related to PR kern/54969.

I had noticed long fscks on large filesystems following normal clean
reboots and started investigating.

Maybe what remains an issue here is just related to dm(4) partitions, as
only my /dev/mapper partition(s) have had problems recently.

Unfortunately, though, this is still happening with 9.99.81 (2021-03-10),
with both GENERIC and XEN3_DOM0 kernels.

In any case I would say this is the single most critical, serious, and
important issue in -current (and netbsd-9)!  It totally kills system
reliability (though maybe only if one is using LVM).

Just for evidence, I added a bunch more printfs to the kernel and rc.d
scripts (and '-v' flags to fsck, mount, etc.) to help me see for myself
better what exactly is going on.  This is the console after a truly
normal complete safe reboot using shutdown(8).

In this example all processes except the shutdown scripts should be dead.
The NFS mount on /more/work probably fails to unmount at first because I
likely started shutdown(8) without first doing "cd /" (and without doing
"exec shutdown"), so my CWD was still on that NFS mount.  This could maybe
be fixed by having reboot/halt/powerdown kill its parent process first,
and perhaps also doing chdir("/").
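
The chdir("/") idea can be sketched in a few lines of C.  This is only an
illustration, not the actual reboot(8)/halt(8) code: the helper names
release_cwd() and hangup_parent() are hypothetical, and the choice of
SIGHUP for the parent is an assumption.

```c
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: move this process off whatever file system its
 * current working directory is on, so the CWD no longer keeps that
 * mount busy.
 */
int
release_cwd(void)
{
    if (chdir("/") == -1) {
        perror("chdir");
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: ask the parent (e.g. an interactive shell whose
 * CWD may also sit on the mount) to go away.  SIGHUP is an assumption;
 * init (pid 1) is never signalled.
 */
int
hangup_parent(void)
{
    pid_t ppid = getppid();

    if (ppid <= 1)
        return 0;
    return kill(ppid, SIGHUP);
}
```

A shutdown path would call release_cwd() (and possibly hangup_parent())
before it starts unmounting file systems.  The parent's own CWD is only
released once the parent actually exits.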

There's no excuse I can find for /build not unmounting though, and
definitely no excuse for '/' not unmounting either, though later '/'
is forcefully unmounted, and on reboot '/' appears to be clean.  However
the forceful unmount of /build doesn't work properly (the kernel reports
it as done, yet on reboot it is NOT clean).

Note also that /build will sometimes unmount quickly and cleanly if it
hasn't been dirtied since the last boot, but it seems even creating one
file can leave it dirty on reboot.
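
For reference, the "error 16" in the console output below is EBUSY
("Device busy") in NetBSD's errno numbering.  A minimal sketch of picking
the failed mount point and errno out of such a console line follows.  The
helper name parse_unmount_failure() is hypothetical, and the line format
is assumed to be exactly the one in this log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a console line of the form
 *   "... unmount of <mntpt> (<device>) failed with error <n>"
 * Returns 1 and fills in mntpt/err on a match, 0 otherwise.
 */
int
parse_unmount_failure(const char *line, char *mntpt, size_t len, int *err)
{
    const char *p = strstr(line, "unmount of ");
    char buf[256];
    int e;

    if (p == NULL)
        return 0;
    /* %*[^)] skips the device name between the parentheses. */
    if (sscanf(p, "unmount of %255s (%*[^)]) failed with error %d",
        buf, &e) != 2)
        return 0;
    snprintf(mntpt, len, "%s", buf);
    *err = e;
    return 1;
}
```

Comparing the parsed value against EBUSY and passing it to strerror(3)
gives the human-readable reason.  Lines such as "unmounted procfs from
/proc" do not match and are skipped.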

Maybe what remains an issue here is just related to dm(4) partitions?


[Wed Mar 24 20:42:56 2021][ 715713.0781096] syncing disks... done
[Wed Mar 24 20:42:56 2021][ 715713.2081201] unmounted more.local:/vcs from /more/vcs, type nfs
[Wed Mar 24 20:42:56 2021][ 715713.2481211] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/home from /more/home, type nfs
[Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/archive from /more/archive, type nfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted procfs from /proc, type procfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted ptyfs from /dev/pts, type ptyfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted kernfs from /kern, type kernfs
[Wed Mar 24 20:42:58 2021][ 715714.6782049] unmounted /dev/dk3 from /usr/pkg, type ffs
[Wed Mar 24 20:42:58 2021][ 715714.7282939] unmounted /dev/dk2 from /var, type ffs
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
[Wed Mar 24 20:42:59 2021][ 715716.5383256] brgphy1: detached

	[[ ... almost all the rest of devices detach ... ]]

[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] sd1: detached
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting more.local:/work from /more/work...
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted more.local:/work from /more/work, type nfs
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting /dev/mapper/scratch-build from /build...
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
[Wed Mar 24 20:43:02 2021][ 715718.5384534] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5384534] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounting /dev/dk0 from /...
[Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounted /dev/dk0 from /, type ffs
[Wed Mar 24 20:43:02 2021][ 715718.5384534] unmounting done
[Wed Mar 24 20:43:02 2021][ 715718.5384534] turning off swap... done
[Wed Mar 24 20:43:02 2021][ 715718.5384534] dk0 at sd0 (/) deleted
[Wed Mar 24 20:43:02 2021][ 715718.5384534] sd0: detached
[Wed Mar 24 20:43:02 2021][ 715718.5384534] scsibus0: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] mfi0: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] pci8: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] ppb7: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] unmounting done
[Wed Mar 24 20:43:02 2021][ 715718.7184994] turning off swap... done
[Wed Mar 24 20:43:02 2021][ 715718.7184994] rebooting...

	[[ ... why is "turning off swap" seen twice? .. ]]

	[[ ... and then the reboot, until rc scripts say ... ]]

[Wed Mar 24 20:44:51 2021]Starting root file system check:
[Wed Mar 24 20:44:51 2021]/dev/rdk0: file system is clean; not checking
[Wed Mar 24 20:44:51 2021]start / wait fsck_ffs -p /dev/rdk0


[Wed Mar 24 20:44:52 2021]Starting file system checks:
[Wed Mar 24 20:44:52 2021]/dev/rdk2: file system is clean; not checking
[Wed Mar 24 20:44:52 2021]/dev/rdk3: file system is clean; not checking

	[[ ... here I hit ^T on the console as it was taking too long ... ]]

[Wed Mar 24 20:44:58 2021][  15.0201108] load: 0.08  cmd: sleep 345 [nanoslp] 0.00u 0.00s 0% 512k
[Wed Mar 24 20:44:58 2021]/dev/mapper/rscratch-build: phase 1: cyl group 24 of 345 (6%)
[Wed Mar 24 20:46:09 2021]/dev/mapper/rscratch-build: phase 1: cyl group 284 of 345 (82%)
[Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: 1400986 files, 36172587 used, 28347707 free (17403 frags, 3541288 blocks, 0.0% fragmentation)
[Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: MARKING FILE SYSTEM CLEAN
[Wed Mar 24 20:49:30 2021]start /var nowait fsck_ffs -p /dev/rdk2
[Wed Mar 24 20:49:30 2021]start /build nowait fsck_ffs -p /dev/mapper/rscratch-build
[Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk2 (/var) = 0x0
[Wed Mar 24 20:49:30 2021]start /usr/pkg nowait fsck_ffs -p /dev/rdk3
[Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk3 (/usr/pkg) = 0x0
[Wed Mar 24 20:49:30 2021]done ffs: /dev/mapper/rscratch-build (/build) = 0x0
[Wed Mar 24 20:49:30 2021]Script /etc/rc.d/fsck running
[Wed Mar 24 20:49:30 2021]Currently sourcing /etc/rc.d/fsck
[Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
[Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
[Wed Mar 24 20:49:30 2021]/dev/dk2 on /var type ffs (local, fsid: 0xa802/0x78b, reads: sync 1 async 0, writes: sync 2 async 0)


--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



