NetBSD-Bugs archive


kern/47594: Hang during boot on Xen + CURRENT/6.01 + ASUS M2A-VM



>Number:         47594
>Category:       kern
>Synopsis:       Hang during boot on Xen + CURRENT/6.01 + ASUS M2A-VM
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 25 19:35:00 +0000 2013
>Originator:     Toby Karyadi
>Release:        CURRENT, 6.01
>Organization:
>Environment:
NetBSD server01.bogus.com 6.99.11 NetBSD 6.99.11 (XEN3_DOM0) #0: Sat Feb 22 
16:58:01 EST 2013  
bob%server01.bogus.com@localhost:/mnt/v01/build/src/current/2012.09.23.00.33.00/obj/sys/arch/amd64/compile/XEN3_DOM0
 amd64

(I could not get the real uname -a output, since the kernel does not boot, 
and I have replaced the user and host names; otherwise the string above is 
accurate.)
>Description:
The system hangs during boot while the SATA disks are being detected. 
It pauses at:

<... more kernel messages ...>
ahcisata0: <text>
wd0 <text> <and then a long pause, ~1-2 minutes>
wd1 <text> <and then another long pause, ~1-2 minutes>
wd2 <text> <and then another long pause, ~1-2 minutes>
<and then the boot up just does not advance any further>

The bug occurs only when I use the XEN3_DOM0 kernel on this particular 
motherboard, an ASUS M2A-VM, which has an Athlon 3600 CPU and an ATI SB 600 
south bridge chip. With the GENERIC kernel, the hang does not occur. The 
problem is not reproducible on my other ASUS motherboard of about the same 
vintage, an ASUS M2NPV-VM, which has some NVIDIA-based south bridge chip. 
Switching the BIOS setting for the SATA disks between IDE and AHCI did not 
make any difference.

When I pressed Ctrl-Alt-Esc during the hang, the backtrace showed:
<keyboard stuff>
-- interrupt 
<text> Xspllower <more text>
<text> netbsd:idle_loop

I narrowed down the problem to this checkin:
http://netbsd.sonnenberger.org/timeline?c=2012-09-23+02%3A31%3A05
Note that the time listed on the timeline on that website is UTC+2 for some 
reason.

I know the bug is not necessarily easy to reproduce, because it is specific 
to a motherboard that is about 5 years old, but I'll be happy to test patches 
etc. to help debug when I have the time. My concern is that the new logic to 
walk the device tree based on ACPI does not behave exactly as the old logic 
did, and that the problem may crop up on other motherboards as well.
>How-To-Repeat:
- amd64 XEN3_DOM0 that has the above checkin, e.g. current branch checked out 
from cvs with -D 2012.09.23.00.33.00
- ASUS M2A-VM motherboard
>Fix:
An obvious workaround is to use a kernel that does not have the checkin above, 
e.g. the current branch checked out from cvs with -D 2012.09.23.00.28.00. 
Alternatively, you can reverse-apply the patch for the checkin:
http://netbsd.sonnenberger.org/vpatch?from=ec1f945961a0b2fb&to=cb0f6fdf0cec8684
I have not tried the reverse patch, though.
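
For anyone who wants to narrow a similar regression by checkout date, here 
is a minimal sketch of date bisection between a known-good and a known-bad 
cvs -D timestamp. The report does not say how the two dates were actually 
found, so this is only one plausible approach. The function names are 
hypothetical, the date format is the dotted form used in this report, and 
the cvs invocation assumes CVSROOT is already set in the environment.

```python
from datetime import datetime

# Dotted date form used in this report, e.g. 2012.09.23.00.33.00
CVS_DATE_FORMAT = "%Y.%m.%d.%H.%M.%S"


def parse_cvs_date(text):
    """Parse a dotted cvs -D date string into a naive datetime."""
    return datetime.strptime(text, CVS_DATE_FORMAT)


def midpoint(good, bad):
    """Return the date halfway between a good and a bad checkout date."""
    good_time = parse_cvs_date(good)
    bad_time = parse_cvs_date(bad)
    if good_time >= bad_time:
        raise ValueError("the good date must be earlier than the bad date")
    middle = good_time + (bad_time - good_time) / 2
    return middle.strftime(CVS_DATE_FORMAT)


def checkout_command(date, module="src"):
    """Build the cvs argument list for checking out a module at a date.

    Assumes CVSROOT is set in the environment (hypothetical setup).
    """
    return ["cvs", "checkout", "-D", date, "-P", module]


if __name__ == "__main__":
    good = "2012.09.23.00.28.00"  # boots, per the Fix section
    bad = "2012.09.23.00.33.00"   # hangs, per How-To-Repeat
    print(" ".join(checkout_command(midpoint(good, bad))))
```

Each round, build and boot a XEN3_DOM0 kernel from the midpoint checkout, 
then replace the good or the bad date with the midpoint depending on 
whether it hangs. The two dates given in this report are already only five 
minutes apart, so no further rounds should be needed here.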


