NetBSD-Bugs archive


kern/54969: Disk cache is no longer flushed on shutdown



>Number:         54969
>Category:       kern
>Synopsis:       Disk cache is no longer flushed on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 16 09:40:00 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2017.08.21.09.00.21, and -9
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

The disk controller on one of my systems is logging an error message
on every power-on, indicating that the controller's battery backed
cache still contains data from the previous time the system was
powered on:

  POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator

This means that from the controller's perspective, the system was not
shut down cleanly.  But the system has in fact been shut down cleanly,
at least as far as the kernel is concerned, by running "halt -p".

By adding some printfs to the sd(4) driver, I found that sd_flush() is
not being called during the shutdown, and neither is sd_lastclose().

The serial console shows "detached" messages from a large number of
devices including the non-root disk sd1 (which was never mounted), but
the root disk sd0 is conspicuously absent:

  Feb  9 05:32:24 hostname halt: halted by root
  Feb  9 05:32:24 hostname syslogd[167]: Exiting on signal 15
  [ 8086.8109260] syncing disks... done
  [ 8086.9609971] sd1: detached
  [ 8086.9910100] cd0: detached
  [ 8087.0210241] brgphy3: detached
  [ 8087.0610430] brgphy2: detached
  [ 8087.0910569] brgphy1: detached
  [ 8087.1310757] brgphy0: detached
  [ 8087.1710944] atapibus0: detached
  [ 8087.2011089] uhub5: detached
  [ 8087.2411278] uhub3: detached
  [ 8087.2711418] uhub2: detached
  [ 8087.3111606] uhub1: detached
  [ 8087.3411746] com1: detached
  [ 8087.4312167] bnx3: detached
  [ 8087.5212591] bnx2: detached
  [ 8087.6113014] bnx1: detached
  [ 8087.7013435] bnx0: detached
  [ 8087.7313577] atabus1: detached
  [ 8087.7757913] atabus0: detached
  [ 8087.8122505] usb5: detached
  [ 8087.8455835] usb4: detached
  [ 8087.8789168] usb2: detached
  [ 8087.9122492] usb1: detached
  [ 8087.9455816] pci11: detached
  [ 8087.9799570] pci10: detached
  [ 8088.0143320] pci9: detached
  [ 8088.0476650] pci8: detached
  [ 8088.0809989] pci7: detached
  [ 8088.1143312] pci6: detached
  [ 8088.1476642] pci5: detached
  [ 8088.1809984] pci4: detached
  [ 8088.2143316] pci3: detached
  [ 8088.2476648] pci2: detached
  [ 8088.2809972] sysbeep0: detached
  [ 8088.3184975] midi0: detached
  [ 8088.3516482] ehci0: detached
  [ 8088.3816623] uhci4: detached
  [ 8088.4216811] uhci2: detached
  [ 8088.4516952] uhci1: detached
  [ 8088.4817099] ppb10: detached
  [ 8088.5217278] pchb12: detached
  [ 8088.5517425] pchb11: detached
  [ 8088.5917607] pchb10: detached
  [ 8088.6217747] pchb9: detached
  [ 8088.6617935] pchb8: detached
  [ 8088.6918073] pchb7: detached
  [ 8088.7318261] pchb6: detached
  [ 8088.7618402] pchb5: detached
  [ 8088.8018594] pchb4: detached
  [ 8088.8318730] pchb3: detached
  [ 8088.8618871] pchb2: detached
  [ 8088.9019059] pchb1: detached
  [ 8088.9319202] ppb9: detached
  [ 8088.9719389] ppb8: detached
  [ 8089.0019528] ppb7: detached
  [ 8089.0319669] ppb6: detached
  [ 8089.0719858] ppb5: detached
  [ 8089.1019997] ppb4: detached
  [ 8089.1320139] ppb3: detached
  [ 8089.1720326] ppb2: detached
  [ 8089.2020466] ppb1: detached
  [ 8089.2320607] pchb0: detached

  [ 8089.2820849] The operating system has halted.
  [ 8089.2820849] Please press any key to reboot.
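
The check done by eye above (which disks never logged a "detached"
message) can also be scripted.  The following is a minimal Python
sketch, not part of the original investigation.  It assumes console
lines have the form shown above, "[ timestamp] name: detached", and
the function names detached_devices() and missing_disks() are
hypothetical helpers introduced only for illustration.

```python
import re

# Matches kernel console lines such as "[ 8086.9609971] sd1: detached".
# The line format is an assumption based on the log excerpt above.
DETACHED_RE = re.compile(r"^\s*\[\s*\d+\.\d+\]\s+(\w+): detached\s*$")


def detached_devices(log_text):
    """Return the set of device names reported as detached in a console log."""
    names = set()
    for line in log_text.splitlines():
        m = DETACHED_RE.match(line)
        if m:
            names.add(m.group(1))
    return names


def missing_disks(log_text, expected):
    """Return the expected disks that never logged a "detached" message."""
    return sorted(set(expected) - detached_devices(log_text))


# A few lines copied from the shutdown log above.
sample = """\
  [ 8086.8109260] syncing disks... done
  [ 8086.9609971] sd1: detached
  [ 8086.9910100] cd0: detached
  [ 8089.2820849] The operating system has halted.
"""

print(missing_disks(sample, ["sd0", "sd1"]))  # prints ['sd0']
```

Run against the full console log, a non-empty result for the root
disk would correspond to the symptom described in this report.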

This is an HP DL360 G7 server with a P410i disk controller and the
BBWC option.  A full console log is at:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.15.12.45.05/test.log

By grepping historic logs from the TNF i386 testbed for the
corresponding "wd0: detached" messages, I found that they were present
until the following commit, and absent thereafter:

  2017.08.21.09.00.21 hannken src/sys/kern/vfs_mount.c 1.67
  2017.08.21.09.00.21 hannken src/sys/kern/vfs_vnode.c 1.98
  2017.08.21.09.00.21 hannken src/sys/sys/vnode_impl.h 1.16

The commit message was "Change forced unmount to revert open device
vnodes to anonymous devices."
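
The search through historic logs described above can be sketched as
follows.  This is only an illustration in Python, not the procedure
actually used: find_transition() is a hypothetical helper, the logs
are assumed to be available as a mapping from source date to log text,
and the dates and log contents in the example are made up.  Source
dates in the fixed-width form "YYYY.MM.DD.hh.mm.ss" sort
chronologically as plain strings, which the sketch relies on.

```python
def find_transition(logs, message):
    """Find where `message` stops appearing in a series of logs.

    `logs` maps a source date string to the text of the log built from
    that date.  Returns (last date whose log contains the message,
    next date whose log lacks it), or None if no such transition exists.
    """
    last_present = None
    for date in sorted(logs):
        if message in logs[date]:
            last_present = date
        elif last_present is not None:
            return last_present, date
    return None


# Made-up example data, for illustration only.
example_logs = {
    "2017.08.20.12.00.00": "syncing disks... done\nwd0: detached\n",
    "2017.08.21.08.00.00": "syncing disks... done\nwd0: detached\n",
    "2017.08.21.10.00.00": "syncing disks... done\n",
}

print(find_transition(example_logs, "wd0: detached"))
```

The commits that fall between the two returned dates are then the
candidates for having introduced the change in behavior.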

This issue looks like it has the potential to cause data loss.  For
example, the HP system will presumably lose the cached data if powered
off long enough to drain the BBWC battery.  The -9 branch is also
affected.

>How-To-Repeat:

>Fix:


