NetBSD-Bugs archive


kern/54253: disk mis-detection panic on HP Pavilion w/NVIDIA chipset



>Number:         54253
>Category:       kern
>Synopsis:       disk mis-detection panic on HP Pavilion w/NVIDIA chipset
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 31 14:45:00 +0000 2019
>Originator:     John D. Baker
>Release:        NetBSD/amd64-8.99.41
>Organization:
>Environment:
NetBSD dpe2850c.technoskunk.fur 8.99.41 NetBSD 8.99.41 (GENERIC) #255: Thu May 23 23:15:52 CDT 2019 sysop%yggdrasil.technoskunk.fur@localhost:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC amd64
>Description:
I have access to an older HP Pavilion system that I sometimes use to
netboot -current, mostly for testing "nouveau" (it's the one machine on
which nouveau has always worked).

(The machine normally boots a recent Linux Mint from local disk.)

NetBSD 8.99.41 (GENERIC) #255: Thu May 23 23:15:52 CDT 2019
        sysop%yggdrasil.technoskunk.fur@localhost:/r0/build/current/obj/amd64/sys/arch/amd64/compile/GENERIC
total memory = 8191 MB
avail memory = 7927 MB
WARNING: module error: module `nfs' pushed by boot loader already exists
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
userconf: configure system autoconfiguration:
uc> disable nouveau
nouveau* disabled
uc> exit
Continuing...
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
HP-Pavilion NP218AA-ABA p6142p ( )
[...]
cpu0 at mainbus0 apid 0
cpu0: AMD Phenom(tm) 9650 Quad-Core Processor, id 0x100f23
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 1
cpu1: AMD Phenom(tm) 9650 Quad-Core Processor, id 0x100f23
cpu1: package 0, core 1, smt 0
cpu2 at mainbus0 apid 2
cpu2: AMD Phenom(tm) 9650 Quad-Core Processor, id 0x100f23
cpu2: package 0, core 2, smt 0
cpu3 at mainbus0 apid 3
cpu3: AMD Phenom(tm) 9650 Quad-Core Processor, id 0x100f23
cpu3: package 0, core 3, smt 0

I don't recall exactly when this started (possibly around 8.99.35), but
recent -current panics during disk detection.  The following is
hand-transcribed, as the machine has no serial port:

[...]
wd1 at atabus2 drive 0
panic: kernel diagnostic assertion "mutex_owned(&chp->ch_lock)" failed: file "/x/current/src/sys/dev/ata/ata_subr.c", line 275
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x160
stge_eeprom_wait.isra.4() at netbsd:stge_eeprom_wait.isra.4
ahci_reset_drive() at netbsd:ahci_reset_drive+0x2b
wd_get_params.constprop.5() at netbsd:wd_get_params.constprop.5+0x9a
wdattach() at netbsd:wdattach+0x104
config_attach_loc() at netbsd:config_attach_loc+0x1a5
config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
atabusconfig_thread() at netbsd:atabusconfig_thread+0x2f1
cpu0: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff8021ddad cs 0x8 rflags 0x202 cr2 0 ilevel 0 rsp 0xffffba80afa37cb0
curlwp 0xffff9b4309e29240 pid 0.83 lowest kstack 0xffffba80afa342c0
Stopped in pid 0.83 (system) at netbsd:breakpoint+0x5:  leave
db{0}> 

The machine has only one disk, wd0, so the "wd1 at atabus2 drive 0" is
spurious.  The device on atabus2 should instead attach as atapibus0 and
cd0.  For comparison, here is a dmesg.boot from a boot that succeeded:

[...]
ahcisata0 at pci0 dev 9 function 0: NVIDIA nForce MCP77 AHCI Controller (rev. 0xa2)
LSA0: Picked IRQ 21 with weight 1
ahcisata0: 64-bit DMA
ahcisata0: ignoring broken port multiplier support
ahcisata0: AHCI revision 1.20, 4 ports, 32 slots, CAP 0xe3209f03<PMD,ISS=0x2=Gen2,SCLO,SAL,SSNTF,SNCQ,S64A>
ahcisata0: interrupting at ioapic0 pin 21
atabus0 at ahcisata0 channel 0
atabus1 at ahcisata0 channel 1
atabus2 at ahcisata0 channel 2
atabus3 at ahcisata0 channel 3
[...]
ahcisata0 port 1: device present, speed: 3.0Gb/s
ahcisata0 port 2: device present, speed: 1.5Gb/s
autoconfiguration error: ahcisata0 port 2: clearing WDCTL_RST failed for drive 0
autoconfiguration error: ahcisata0 port 1: clearing WDCTL_RST failed for drive 0
ehci1: handing over low speed device on port 2 to ohci1
wd0 at atabus1 drive 0
wd0: <WDC WD6400AAKS-65A7B2>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 596 GB, 1240341 cyl, 16 head, 63 sec, 512 bytes/sect x 1250263728 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), NCQ (32 tags)
wd0(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)
atapibus0 at atabus2: 1 targets
cd0 at atapibus0 drive 0: <ATAPI   DVD A  DH16A6L-C, 249920422616, ZHCH> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
cd0(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
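
The difference at atabus2 between the two boots can also be checked
mechanically.  Below is a minimal sketch in Python, assuming autoconf attach
lines have the form "<child> at <parent> ...".  The function name
attachments() is made up for illustration, and the sample strings are copied
from the excerpts above.

```python
import re

# Matches autoconf attach lines such as "wd0 at atabus1 drive 0".
# Traceback lines like "vpanic() at netbsd:..." do not match, because
# the first word must be followed directly by " at ".
ATTACH_RE = re.compile(r"^(\w+) at (\w+)\b")

def attachments(dmesg_text):
    """Map each parent device to the list of children attached to it."""
    result = {}
    for line in dmesg_text.splitlines():
        m = ATTACH_RE.match(line.strip())
        if m:
            child, parent = m.groups()
            result.setdefault(parent, []).append(child)
    return result

# Excerpt from the failing boot (hand-transcribed).
failing = """\
wd1 at atabus2 drive 0
"""

# Excerpt from the successful boot.
good = """\
wd0 at atabus1 drive 0
atapibus0 at atabus2: 1 targets
cd0 at atapibus0 drive 0: <ATAPI   DVD A  DH16A6L-C, 249920422616, ZHCH> cdrom removable
"""

bad_map = attachments(failing)
good_map = attachments(good)

# Report every parent that appears in both logs with different children.
for bus in sorted(set(bad_map) & set(good_map)):
    if bad_map[bus] != good_map[bus]:
        print(f"{bus}: failing boot {bad_map[bus]}, good boot {good_map[bus]}")
```

With these excerpts the only bus reported is atabus2, where the failing boot
attached wd1 and the good boot attached atapibus0.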

Curiously, stge(4) code always appears in the traceback, although the
machine has no stge(4) interface (it has nfe(4)).

The panic occurs while "nouveau" is attaching, so the screen is blank.
At first I thought it was a "nouveau" regression, so I disabled "nouveau"
via userconf.  Only then did I see the actual panic.

>How-To-Repeat:
Boot recent -current (e.g. 8.99.41 GENERIC) on the machine described above.
>Fix:


