NetBSD-Bugs archive


kern/56932: x68k frequently hangs up after uvm change in 9.99.75

>Number:         56932
>Category:       kern
>Synopsis:       x68k frequently hangs up after uvm change in 9.99.75
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jul 18 04:55:00 +0000 2022
>Originator:     Tetsuya Isaki
>Release:        NetBSD-current (9.99.75)
NetBSD 9.99.75 x68k
x68k frequently hangs up after uvm change in 9.99.75.
According to a bisect (between 9.99.1 and 9.99.98), the following commit
seems to trigger it.

> Module Name:  src
> Committed By: chs
> Date:     Wed Nov  4 01:30:19 UTC 2020
> Modified Files:
>   src/sys/kern: init_main.c
>   src/sys/uvm: uvm_aobj.c uvm_init.c uvm_pdaemon.c
> Log Message:
> In uvmpd_tryownerlock(), if the initial try-lock of the owner lock fails
> then rather than do more try-locks and eventually sleep for a tick,
> take a hold on the current owner's lock, drop the page interlock,
> and acquire the lock that we took the hold on in a blocking fashion.
> After we get the lock, check if the lock that we acquired is still
> the lock for the owner of the page that we're interested in.
> If the owner hasn't changed then can proceed with this page,
> otherwise we will skip this page and move on to a different page.
> This dramatically reduces the amount of time that the pagedaemon
> sleeps trying to get locks, since even 1 tick is an eternity to sleep
> in this context and it was easy to trigger that case in practice,
> and with this new method the pagedaemon only very rarely actually blocks
> to acquire the lock that it wants since the object locks are adaptive,
> and when the pagedaemon does block then the amount of time it spends
> sleeping will be generally be much less than 1 tick.
> To generate a diff of this commit:
> cvs rdiff -u -r1.531 -r1.532 src/sys/kern/init_main.c
> cvs rdiff -u -r1.151 -r1.152 src/sys/uvm/uvm_aobj.c
> cvs rdiff -u -r1.54 -r1.55 src/sys/uvm/uvm_init.c
> cvs rdiff -u -r1.130 -r1.131 src/sys/uvm/uvm_pdaemon.c
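
For readers who want to see the shape of the locking sequence described in the log message, here is a minimal userland sketch using pthreads. It is an illustration only and is not the code from uvm_pdaemon.c: the names owner_lock, page, page_lock_owner and demo_change_owner are hypothetical, a pthread mutex stands in for the kernel's owner lock, and a plain reference count stands in for "taking a hold" on that lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted lock standing in for an owner's lock. */
struct owner_lock {
	pthread_mutex_t mtx;
	atomic_int refs;
};

/* Hypothetical page; the interlock protects the owner pointer. */
struct page {
	pthread_mutex_t interlock;
	struct owner_lock *owner;	/* NULL if the page has no owner */
};

static struct owner_lock *owner_lock_new(void)
{
	struct owner_lock *l = malloc(sizeof(*l));

	assert(l != NULL);
	pthread_mutex_init(&l->mtx, NULL);
	atomic_init(&l->refs, 1);	/* the owner's own reference */
	return l;
}

static void owner_lock_release(struct owner_lock *l)
{
	if (atomic_fetch_sub(&l->refs, 1) == 1) {	/* last reference gone */
		pthread_mutex_destroy(&l->mtx);
		free(l);
	}
}

/*
 * Call with pg->interlock held; it is always released before returning.
 * Returns the owner's lock, now held by the caller, or NULL to skip the page.
 */
static struct owner_lock *page_lock_owner(struct page *pg)
{
	struct owner_lock *held = pg->owner, *cur;

	if (held == NULL || pthread_mutex_trylock(&held->mtx) == 0) {
		/* No owner, or the initial try-lock succeeded. */
		pthread_mutex_unlock(&pg->interlock);
		return held;
	}

	/* Try-lock failed: take a hold, drop the interlock, then block. */
	atomic_fetch_add(&held->refs, 1);
	pthread_mutex_unlock(&pg->interlock);
	pthread_mutex_lock(&held->mtx);

	/* Check that the lock we now hold still belongs to the page's owner. */
	pthread_mutex_lock(&pg->interlock);
	cur = pg->owner;
	pthread_mutex_unlock(&pg->interlock);
	if (cur != held) {
		pthread_mutex_unlock(&held->mtx);
		owner_lock_release(held);	/* drop our hold; may free it */
		return NULL;
	}
	owner_lock_release(held);	/* owner's reference keeps it alive */
	return held;
}

/* Demo only: hold the old lock until a waiter appears, then re-own the page. */
struct demo {
	struct page pg;
	struct owner_lock *old_lock, *new_lock;
	atomic_int locked;
};

static void *demo_change_owner(void *arg)
{
	struct demo *d = arg;

	pthread_mutex_lock(&d->old_lock->mtx);
	atomic_store(&d->locked, 1);
	while (atomic_load(&d->old_lock->refs) < 2)	/* wait for the hold */
		sched_yield();
	pthread_mutex_lock(&d->pg.interlock);
	d->pg.owner = d->new_lock;
	pthread_mutex_unlock(&d->pg.interlock);
	pthread_mutex_unlock(&d->old_lock->mtx);
	return NULL;
}
```

To exercise the blocking path, a caller can start demo_change_owner() in a second thread, wait for the locked flag, take pg.interlock and call page_lock_owner(). In that scenario the call should return NULL, because the owner changed while the caller was blocked, which corresponds to the "skip this page" case in the log message.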

The hang happens randomly but is highly reproducible.  Three runs
are sufficient to reproduce it.  In other words, I cannot complete a
sequence of power-on, login, and shutdown successfully 3 times in a row.
A kernel built without this change (that is, from the immediately
preceding commit in git) doesn't hang.

When the hang occurs, echo back sometimes works and sometimes does
not.  NMI (entering DDB) still works.

Here are the dmesg and the output of swapctl -a.  I collected them on the
kernel without this uvm change (the immediately preceding commit in git),
because it's hard to work on the affected kernel.

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
    2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 9.99.75 (GENERIC) #15: Mon Jul 18 00:25:57 JST 2022
X68030 (m68030 CPU/MMU, m68882 FPU, 30MHz clock)
total memory = 12288 KB
avail memory = 7744 KB
entropy: no seed from bootloader
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
mainbus0 (root)
intio0 at mainbus0 mapped at 0x38f000
ne0 at intio0 addr 0xece300 intr 0xf9: Nereid Ethernet
ne0: NE2000 (RTL8019) Ethernet
ne0: Ethernet address XX:XX:XX:XX:XX:XX
ne0: 10base2, 10baseT, 10baseT-FDX, auto, default [0x00 0x10] auto
mfp0 at intio0 addr 0xe88000 intr 0x40
clock0 at mfp0: MFP timer C
kbd0 at mfp0
powsw0 at mfp0: Front Switch
rtc0 at intio0 addr 0xe8a000: RP5C15
dmac0 at intio0 addr 0xe84000: HD63450 DMAC
dmac0: 4 channels available.
zsc0 at intio0 addr 0xe98000 intr 0x70
zstty0 at zsc0 channel 0
ms0 at zsc0 channel 1
neptune0 at intio0 addr 0xece000 intr 0xf9: no device found.
neptune1 at intio0 addr 0xece400 intr 0xf9: no device found.
opm0 at intio0 addr 0xe90000
vs0 at intio0 addr 0xe92000 using DMA ch3 intr 0x6a and 0x6b
dmac0: allocating ch 3 for vs.
vs0: MSM6258V ADPCM voice synthesizer
audio0 at vs0: playback, capture, half duplex
audio0: slinear_be:16 -> adpcm:4 1ch 15625Hz, blk 625 bytes (80ms) for playback
audio0: slinear_be:16 <- adpcm:4 1ch 15625Hz, blk 625 bytes (80ms) for recording
spkr0 at audio0: PC Speaker (synthesized)
fdc0 at intio0 addr 0xe94000 intr 0x60 using DMA ch0 intr 0x64 and 0x65
dmac0: allocating ch 0 for fdc.
fdc0: uPD72065 FDC
fd0 at fdc0 drive 0: 1.2MB/[1024bytes/sector], 77 cyl, 2 head, 8 sec
fd1 at fdc0 drive 1: 1.2MB/[1024bytes/sector], 77 cyl, 2 head, 8 sec
par0 at intio0 addr 0xe8c000 intr 0x63: parallel port (write only, interrupt)
scsirom0 at intio0 addr 0xfc0000: On-board at 0xe96020
spc0 at scsirom0
scsibus0 at spc0: 8 targets, 8 luns per target
sram0 at intio0 addr 0xed0000: 16k bytes accessible
bmd0 at intio0 addr 0xece3f0: Nereid Bank Memory Disk
bmd0: 16 MB, 0xee0000(64KB) x 256 pages
grfbus0 at mainbus0
grf0 at grfbus0: 768 x 512 16 colors builtin display
ite0 at grf0: rows 32 cols 96
grf1 at grfbus0: 768 x 512 16 colors graphic display
ite at grf1 not configured
enabling interrupts
entropy: WARNING: extracting entropy too early
timecounter: Timecounter "mfp" frequency 20000 Hz quality 100
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <CFSxSI, NAGATO, 0101> disk fixed
sd0: 955 MB, 1942 cyl, 16 head, 63 sec, 512 bytes/sect x 1957536 sectors
sd0: async, 8-bit transfers
aes: BearSSL aes_ct
aes_ccm: self-test passed
chacha: Portable C ChaCha
blake2s: self-test passed
bell0: YM2151 OPM bell emulation.
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
entropy: ready

# swapctl -a
Device      1K-blocks     Used    Avail Capacity  Priority
/dev/bmd0c      16384     2908    13476    18%    0
/dev/sd0b      132016        0   132016     0%    2
Total          148400     2908   145492     2%

As described above, the hang happens randomly, but I used the following
procedure to reproduce it: power on, log in as root, run "shutdown -r now",
boot again, and repeat.
