NetBSD-Bugs archive
kern/47594: Hang during boot on Xen + CURRENT/6.01 + ASUS M2A-VM
>Number: 47594
>Category: kern
>Synopsis: Hang during boot on Xen + CURRENT/6.01 + ASUS M2A-VM
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 25 19:35:00 +0000 2013
>Originator: Toby Karyadi
>Release: CURRENT, 6.01
>Organization:
>Environment:
NetBSD server01.bogus.com 6.99.11 NetBSD 6.99.11 (XEN3_DOM0) #0: Sat Feb 22
16:58:01 EST 2013
bob%server01.bogus.com@localhost:/mnt/v01/build/src/current/2012.09.23.00.33.00/obj/sys/arch/amd64/compile/XEN3_DOM0
amd64
(I wasn't able to capture the actual uname -a string, since the kernel can't
boot, and I've replaced the user and host names, but other than that, it's
accurate.)
>Description:
I experienced a hang during boot while the SATA disks were being detected;
it would pause at:
<... more kernel messages ...>
ahcisata0: <text>
wd0 <text> <and then a long pause, ~1-2 minutes>
wd1 <text> <and then another long pause, ~1-2 minutes>
wd2 <text> <and then another long pause, ~1-2 minutes>
<and then the boot up just does not advance any further>
The bug occurs only when I use the XEN3_DOM0 kernel on this particular
motherboard, the ASUS M2A-VM, which has an Athlon 3600 CPU and an ATI SB600
southbridge. With the GENERIC kernel, the hang does not occur. The problem is
also not reproducible on my other ASUS motherboard of about the same vintage,
the ASUS M2NPV-VM, which has an NVIDIA-based southbridge. Switching the BIOS
setting for the SATA disks between IDE and AHCI mode made no difference.
When I pressed Ctrl-Alt-Esc during the hang, the backtrace showed:
<keyboard stuff>
-- interrupt
<text> Xspllower <more text>
<text> netbsd:idle_loop
I narrowed the problem down to this checkin:
http://netbsd.sonnenberger.org/timeline?c=2012-09-23+02%3A31%3A05
Note that, for some reason, the times listed on that website's timeline are
in UTC+2.
I know the bug is not necessarily easy to reproduce, because it is specific to
a motherboard that is about 5 years old, but I'll be happy to test patches etc.
to help debug when I have the time. My concern is that the new ACPI-based
logic for walking the device tree does not behave exactly like the old code,
and that the problem may crop up on other motherboards as well.
>How-To-Repeat:
- amd64 XEN3_DOM0 that has the above checkin, e.g. current branch checked out
from cvs with -D 2012.09.23.00.33.00
- ASUS M2A-VM motherboard
>Fix:
An obvious workaround is to use a kernel that does not include the checkin
above, e.g. the current branch checked out from cvs with -D 2012.09.23.00.28.00.
Alternatively, you could reverse-apply the patch for that checkin (I haven't
tried this):
http://netbsd.sonnenberger.org/vpatch?from=ec1f945961a0b2fb&to=cb0f6fdf0cec8684