NetBSD-Bugs archive


kern/53183: System stops servicing I/O requests and eventually deadlocks



>Number:         53183
>Category:       kern
>Synopsis:       System stops servicing I/O requests and eventually deadlocks
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 14 20:45:00 +0000 2018
>Originator:     Sevan Janiyan
>Release:        NetBSD-HEAD
>Organization:
>Environment:
i386 build
>Description:
Following on from the nvme deadlock PR kern/52769 [1], I have spent some more time trying to gather information on how my system deadlocks when I cvs update. An easy trigger is updating a tree which has a lot of catching up to do, especially in src/external. The system is still technically alive but it will not perform any disk I/O operations. That is, in X11, I can have one window displaying top(1), another running iostat(1), and a few others running cvs updates of a pkgsrc and a src tree at the same time. Eventually the system will stop servicing I/O, but the top and iostat windows continue to operate, showing the system as completely idle.

I suspected the system on which I frequently hit this issue, a Thinkpad flashed with coreboot. To rule this machine out, I moved the SSD to another Thinkpad which does not run coreboot, and the issue was present there too. It could well be the SSD at fault, but I experienced the same problem on an ageing SATA HDD before investing in the SSD. I have not ruled out being doubly unlucky by trying a third disk or system. I did attempt to use VirtualBox on macOS, and on both of 2 attempts I ended up hard resetting the host. It seems that 2 concurrent CVS checkouts is too much.

I was previously using discard and log but stopped, and the issue persisted.

Once the system deadlocks, it's possible to enter ddb once; after resuming, the system eventually locks hard.

[1] http://mail-index.netbsd.org/netbsd-bugs/2018/03/04/msg055906.html
>How-To-Repeat:
Start two concurrent checkouts or updates from CVS (pkgsrc and src trees).
Optional (guarantees failure in my case): add some CPU load. I have been recompiling a kernel with -j2 on a dual-core system.
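
The steps above could be scripted roughly as follows. This is only a sketch and not something taken from my setup: the tree locations /usr/src and /usr/pkgsrc, the cvs options (-q, update -dP) and the helper names start_all, wait_all and reproduce are all assumptions made for illustration.

```python
import subprocess

# Assumed checkout locations; the report does not say where the trees live.
TREES = ["/usr/src", "/usr/pkgsrc"]


def start_all(commands):
    """Start every (argv, cwd) pair without waiting; return the Popen handles."""
    return [subprocess.Popen(argv, cwd=cwd) for argv, cwd in commands]


def wait_all(procs):
    """Block until every process has exited; return exit codes in start order."""
    return [p.wait() for p in procs]


def reproduce(trees=TREES):
    """Run one 'cvs update' per tree, all at the same time."""
    # -q, -d and -P are assumed options; the report only says "cvs update".
    commands = [(["cvs", "-q", "update", "-dP"], tree) for tree in trees]
    return wait_all(start_all(commands))
```

Calling reproduce() starts both updates at once and returns their exit codes. On an affected system the call would presumably never return once I/O stops being serviced. The optional CPU load could be added as a third (argv, cwd) entry passed to start_all.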
>Fix:


