Subject: IMPORTANT: CRITICAL BUG FIX
To: None <port-hp300@NetBSD.ORG>
From: Jason Thorpe <thorpej@NetBSD.ORG>
Date: 10/04/1996 02:15:43
Folks! This is important, for anyone running NetBSD/hp300 1.2 or
some close approximation thereof (specifically, anyone whose kernel
prints out "delay constant for this cpu...").
Herb Peyerl reported a nasty bug to me last weekend, basically
recursive panic/trap whenever he attempted to use DDB. Scott Reynolds
and Dave Carrel confirmed this problem on their systems, as did I
on my 319. We diagnosed this as a corruption of the vector table;
stray pointer scribbling over kernel text. "ick".
So, with a tip from Charles Hannum, I found the bug tonight. The
commit message is included here for your reading enjoyment:
date: 1996/10/04 08:55:04; author: thorpej; state: Exp; lines: +9 -1
At the end of the delay calibration routine, explicitly reset the timer.
This fixes a critical bug where a clock interrupt would happen sometime
between the call to hp300_calibrate_delay() and when proc0 is initialized.
This ends up dereferencing a bad pointer in itimerdecr(), which scribbles
over the first page of kernel text, specifically vectors 46 and 47 (decimal).
To complicate matters, the way the bug manifested itself was different
depending on whether or not DDB was configured into the kernel. When
DDB is in the kernel, kernel text is mapped read/write. When DDB is not
in the kernel, kernel text is mapped read-only. Note that the kernel
scribble happens early, typically before the console is initialized.
In the non-DDB case, the kernel will hang as soon as it's loaded because
the access causes a fault (before the console is initialized, so you
don't see the trap).
In the DDB case, the access does _not_ cause a fault. However, the
mechanism used to enter the kernel debugger is to issue a "trap #15".
Conveniently, this is one of the corrupted vectors (47), thus rendering
DDB useless (it actually caused a recursive panic/trap loop).
This _WILL_ be in the first 1.2 official patch.
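For anyone who wants to check the vector numbers above, here is a minimal sketch in C. It relies on two m68k facts: each exception vector is 4 bytes wide, and "trap #n" is dispatched through vector 32 + n. The function and enum names are made up for illustration and do not appear in the kernel source; the outcome helper just restates the DDB / non-DDB cases from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* m68k: each exception vector is a 4-byte pointer; TRAP #0 is vector 32. */
enum { VECTOR_SIZE = 4, TRAP_VECTOR_BASE = 32 };

/* Hypothetical labels for the two failure modes described above. */
enum scribble_outcome { HANG_AT_BOOT, DDB_UNUSABLE };

/* Vector number used by a "trap #n" instruction (n is 0..15). */
unsigned trap_vector(unsigned n)
{
    assert(n <= 15);
    return TRAP_VECTOR_BASE + n;
}

/* Byte offset of a vector from the start of the vector table. */
uint32_t vector_offset(unsigned vec)
{
    return (uint32_t)vec * VECTOR_SIZE;
}

/*
 * Kernel text is mapped read/write only when DDB is configured, so the
 * stray write either faults (no DDB) or silently lands on the vectors (DDB).
 */
enum scribble_outcome classify_scribble(bool ddb_configured)
{
    bool text_writable = ddb_configured;
    return text_writable ? DDB_UNUSABLE : HANG_AT_BOOT;
}
```

So trap_vector(15) gives 47, and vectors 46 and 47 sit at byte offsets 184 and 188 (0xB8 and 0xBC) from the start of the table.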
As the commit message implies, I'm going to make sure and get this into
the first NetBSD 1.2 Official Patch. However, I'm not sure when that's
going to go out. So, I'm going to mail this out to you folks now.
NOTE: if you do not run with this patch to clock.c, and you take
DDB out of your kernel config, YOUR KERNEL WILL NOT BOOT.
With that in mind, the (simple) patch is appended below.
I'll try to get updated kernel images up soon; if nothing else,
I'll make a new miniroot and kernel set when Official Patch 1
comes out. I'm really sorry that this snuck into the release
(announced today, if you didn't notice :-), but that's water
under the bridge now.
If you have any questions, don't hesitate to ask, but _please_
post them to the list; I'm extremely busy at work until the
end of November, and am not as quick to reply to e-mail as I'd like
to be (as some of you have no doubt noticed). If you post them to
the list, it's likely that someone else will be able to help you out.
Thanks for your patience, folks!
Jason R. Thorpe
The port-hp300 guy
----- snip -----
Apply this patch to src/sys/arch/hp300/hp300/clock.c
RCS file: /usr/og/devsrc/netbsd/src/sys/arch/hp300/hp300/clock.c,v
retrieving revision 220.127.116.11
diff -c -r18.104.22.168 clock.c
*** clock.c 1996/06/23 08:57:15 22.214.171.124
--- clock.c 1996/10/04 08:20:25
*** 165,170 ****
--- 165,178 ----
+ * Make sure the clock interrupt is disabled. Otherwise,
+ * we can end up calling hardclock() before proc0 is set up,
+ * causing a bad pointer deref.
+ clk->clk_cr2 = CLK_CR1;
+ clk->clk_cr1 = CLK_RESET;
* Sanity check the delay_divisor value. If we totally lost,
* assume a 50MHz CPU;
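
The ordering problem the patch closes can be modeled in user space. This is only a sketch under stated assumptions, not the kernel code: the struct and function names below are hypothetical, the "clock" is a single flag rather than the real timer registers, and a NULL pointer stands in for "proc0 not set up yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct model_proc { int itimer_ticks; };
struct model_clock { bool irq_enabled; };

/* NULL models "proc0 has not been initialized yet". */
static struct model_proc *model_curproc = NULL;

/*
 * Model of delay calibration: the timer runs during the measurement.
 * With reset_timer set, the timer is quiesced before returning, which
 * is what the patch arranges at the end of the real routine.
 */
void model_calibrate_delay(struct model_clock *clk, bool reset_timer)
{
    assert(clk != NULL);
    clk->irq_enabled = true;       /* timer running for the measurement */
    /* ... the delay loop measurement would happen here ... */
    if (reset_timer)
        clk->irq_enabled = false;  /* the fix: no tick can fire afterwards */
}

/* True if a clock tick now would reach process state that is not set up. */
bool model_tick_is_unsafe(const struct model_clock *clk)
{
    assert(clk != NULL);
    return clk->irq_enabled && model_curproc == NULL;
}
```

Calling model_calibrate_delay() without the reset leaves model_tick_is_unsafe() true until proc0 exists; with the reset it is false, which is the window the patch removes.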