NetBSD-Bugs archive


Re: install/58932: NetBSD-10.99.12-i386-install.img ACPI problems Compal DL-75 laptop



> Date: Mon, 23 Dec 2024 09:10:01 +0000 (UTC)
> From: ea1abz%gmail.com@localhost
> 
> The dmesg booting and panic screen (I do not know a way to record
> that in a file, sorry):

You may be able to:

1. boot in the way that crashes
2. type `reboot' (or `sync') at the ddb prompt
3. boot in the way that works
4. run `dmesg' (or check /var/run/dmesg.boot)

This should show the previous boot's dmesg at the beginning, as long
as the machine was rebooted or reset rather than powered off and back
on.

> https://ea4nz.cloudns.cc/hamradio/20241223_084701.jpg
> [...]

Here's the panic, summarized:

[   1.7092374] panic: cpu0: time has not advanced in 1501 heartbeats
[   1.7092374] cpu0: Begin traceback...
[   1.7092374] [...]
[   1.7092374] panic
[   1.7092374] heartbeat
[   1.7092374] hardclock
[   1.7092374] --- switch to interrupt stack ---
[   1.7092374] Xresume_lapic_timer
[   1.7092374] --- interrupt ---
[   1.7092374] inl
[   1.7092374] AcpiHwRead
[   1.7092374] AcpiGetTimer
[   1.7092374] acpitimer_read_fast
[   1.7092374] binuptime
[   1.7092374] microtime
[   1.7092374] auich_finish_attach
[   1.7092374] config_interrupts_thread

The panic shows that the system timecounter state has not advanced in
about 15 seconds.  This is weird because the hardclock interrupt seems
to be firing on cpu0, and that normally advances the system
timecounter state.

When the heartbeat timer timed out, it was probably in this loop in
the auich(4) PCI audio driver:

   1646 	/* start */
   1647 	kpreempt_disable();
   1648 	microtime(&t1);
   1649 	bus_space_write_1(sc->iot, sc->aud_ioh, ICH_PCMI + ICH_CTRL, ICH_RPBM);
   1650 
   1651 	/* wait */
   1652 	nciv = ociv;
   1653 	do {
   1654 		microtime(&t2);
   1655 		if (t2.tv_sec - t1.tv_sec > 1)
   1656 			break;
   1657 		nciv = bus_space_read_1(sc->iot, sc->aud_ioh,
   1658 					ICH_PCMI + ICH_CIV);
   1659 	} while (nciv == ociv);
   1660 	microtime(&t2);
   1661 
   1662 	/* stop */
   1663 	bus_space_write_1(sc->iot, sc->aud_ioh, ICH_PCMI + ICH_CTRL, 0);
   1664 	kpreempt_enable();

https://nxr.netbsd.org/xref/src/sys/dev/pci/auich.c?r=1.161#1646

Specifically, it was probably in one of the microtime() calls, reading
out the ACPI timecounter register.  I also note that the dmesg
timestamps have not advanced: every line of the panic output carries
the same timestamp, 1.7092374.

The available timecounters appear to be:

timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0

This makes me suspect that, in spite of acpi_timer.c's attempts to
test the ACPI hardware timer before committing to use it, that timer
is broken.  That would explain all three symptoms: the timecounter is
not advancing, auich(4) cannot break out of that loop, and the dmesg
timestamps are not moving forward.

Unfortunately, there's no userconf(4) trick for disabling the ACPI
timecounter at boot, but I would be curious to see what happens if you
patch acpitimer_init in sys/dev/acpi/acpi_timer.c to just return -1
unconditionally without doing anything to attach a timecounter.

If that produces the same results, you could try additionally patching
acpipmtimer_attach in sys/dev/ic/acpipmtimer.c to just return NULL
unconditionally without doing anything to attach a timecounter.  Then
your system might boot with i8254 -- suboptimal, but better than not
booting, and it would help to confirm this hypothesis about what's
happening.

I would also be curious to see output of `acpidump -dt' on this system
(whether from NetBSD, if you can get it that way, or even from Linux
if you can't, with the caveat I'm not 100% sure the command-line
options are the same).

