NetBSD-Bugs archive
kern/56329: nvme(4) takes long time to umount
>Number: 56329
>Category: kern
>Synopsis: nvme(4) takes long time to umount
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 25 04:00:00 +0000 2021
>Originator: Paul Goyette
>Release: NetBSD 9.99.87
>Organization:
+--------------------+--------------------------+----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul%whooppee.com@localhost |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette%netbsd.org@localhost |
| | | pgoyette99%gmail.com@localhost |
+--------------------+--------------------------+----------------------+
>Environment:
System: NetBSD speedy.whooppee.com 9.99.87 NetBSD 9.99.87 (SPEEDY 2021-07-23 13:58:03 UTC) #0: Sat Jul 24 00:01:27 UTC 2021 paul%speedy.whooppee.com@localhost:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
An nvme(4) device is used as both the source AND the output device
for standard system builds, so there is a lot of file creation,
re-creation, and deletion. Device details are
<lspci>
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
<dmesg>
[ 8.118210] nvme0 at pci3 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a804 (rev. 0x00)
Depending on just how much activity has occurred, it can take
up to 15 seconds (or even longer) for the device/file-system
to umount. During this time, neither xosview nor systat seems
to observe any disk activity (presumably, no commands are being
issued), yet the device does not respond. It "feels like" the
device is dequeuing a large number of deferred operations which
are all being processed by the nvme controller without any host
intervention.
Here is a time-stamped console log of a system shutdown showing
the lengthy delay for umount (in this case, ~15 seconds):
...
[ 900738.108348] unmounting 0xffff9afce05b4000 /kern (kernfs)...
[ 900738.108348] unmounted kernfs on /kern type kernfs
[ 900738.108348] unmounting 0xffff9afce0d30000 /build (/dev/ld0e)...
[ 900753.244179] unmounted /dev/ld0e on /build type ffs
[ 900753.244179] unmounting 0xffff9afce052f000 /home (/dev/wd0f)...
[ 900753.624326] unmounted /dev/wd0f on /home type ffs
...
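As a rough aid for reading logs like the one above, the following is a
minimal sketch (not part of the original report) that pairs each
"unmounting" line with its matching "unmounted" line and reports the
elapsed time. The function name unmount_durations and the regular
expressions are illustrative assumptions based only on the line format
shown in this log; other kernels or versions may format these messages
differently.

```python
import re

# Assumed line formats, taken from the console log in this report:
#   [ 900738.108348] unmounting 0xffff9afce0d30000 /build (/dev/ld0e)...
#   [ 900753.244179] unmounted /dev/ld0e on /build type ffs
START = re.compile(r"\[\s*(\d+\.\d+)\]\s+unmounting\s+\S+\s+(\S+)\s+\((\S+)\)")
DONE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+unmounted\s+(\S+)\s+on\s+(\S+)\s+type\s+(\S+)"
)


def unmount_durations(lines):
    """Return a list of (mount_point, fs_type, seconds) tuples.

    A duration is the difference between the timestamp on the
    "unmounted" line and the timestamp on the earlier "unmounting"
    line for the same mount point.
    """
    started = {}  # mount point -> timestamp of the "unmounting" line
    results = []
    for line in lines:
        m = START.search(line)
        if m:
            started[m.group(2)] = float(m.group(1))
            continue
        m = DONE.search(line)
        if m and m.group(3) in started:
            finished = float(m.group(1))
            begun = started.pop(m.group(3))
            results.append((m.group(3), m.group(4), finished - begun))
    return results


# The log excerpt from this report, used here as sample input.
SAMPLE_LOG = """\
[ 900738.108348] unmounting 0xffff9afce05b4000 /kern (kernfs)...
[ 900738.108348] unmounted kernfs on /kern type kernfs
[ 900738.108348] unmounting 0xffff9afce0d30000 /build (/dev/ld0e)...
[ 900753.244179] unmounted /dev/ld0e on /build type ffs
[ 900753.244179] unmounting 0xffff9afce052f000 /home (/dev/wd0f)...
[ 900753.624326] unmounted /dev/wd0f on /home type ffs
""".splitlines()

for mount_point, fs_type, seconds in unmount_durations(SAMPLE_LOG):
    print(f"{mount_point} ({fs_type}): {seconds:.3f} s")
```

Applied to the excerpt above, this attributes roughly 15.1 seconds to
/build on the nvme(4)-backed ld0e, against well under a second for
/home on wd0f.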
Given that there seems to be some queue of deferred operations,
it is not surprising that, after a system crash, a forced
``fsck -fp'' identifies [tens of] thousands of unreferenced
files. So far, fsck has been successful at restoring the file
system without any data loss.
This misbehavior occurs whether or not the file system is mounted
with ``-o log'' (wapbl), and it also occurs without ``-o discard''.
Not sure how to debug this further. However, dh@ suggests
that this behavior is definitely suboptimal, and that it
should be a blocker for the NetBSD-10 release. (At least, we
ought to know enough to describe the actual bug, and maybe
provide a work-around.)
>How-To-Repeat:
See above
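For anyone trying to reproduce the delay outside of a full shutdown,
one possible approach is simply to time the umount command after a
build. The sketch below is an assumption-laden illustration, not a
procedure from the original report: the helper name time_command is
hypothetical, and it only measures wall-clock time for whatever command
it is given.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (exit_status, elapsed_seconds).

    Uses a monotonic clock so the measurement is not affected by
    wall-clock adjustments while the command runs.
    """
    start = time.monotonic()
    proc = subprocess.run(argv, check=False)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed


def report(argv):
    """Time argv and print a one-line summary to stderr."""
    status, elapsed = time_command(argv)
    print(f"{' '.join(argv)}: exit {status}, {elapsed:.3f} s", file=sys.stderr)
    return status, elapsed
```

A hypothetical invocation, run as root after a build has finished on
the affected file system, would be report(["umount", "/build"]); the
mount point /build is taken from the log in this report and would need
to match the local setup.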
>Fix:
Yes, please
>Unformatted: