amiga-dev: Re: Strange kernel problems

Subject: Re: Strange kernel problems
To: None <amiga-dev@sun-lamp.cs.berkeley.edu>
From: Michael L. Hitch <osymh@gemini.oscs.montana.edu>
List: amiga-dev
Date: 02/28/1994 08:08:42
On Feb 28, 11:51am, Niklas Hallqvist wrote:
> >>>>> "Michael" == Michael L Hitch <osymh@gemini.oscs.montana.edu> writes:
> 
> Michael>   Try out this fix and see if it helps any.  It has fixed the
> Michael> problem I was having with the 940219 kernel.  I finally
> Michael> tracked it down when I created a kernel that would give me a
> Michael> "init died" panic every time I tried to boot it.  If I varied
> Michael> the memory size, I could get it to run with no problem.
> 
> I LOVE YOU!
> 
> How on earth did you find this one, step by step?  I was really going crazy...

  I first ran into the problem when I was just getting the 5380 driver
code working on my homebuilt SCSI board.  It would panic with "init
died" when trying to start /sbin/init.  I tried to figure out what was
going on, but didn't spend too much time at it.  When I configured all
the SCSI drivers, the problem went away.  Then, when the 940219 kernel
came out, I had problems similar to yours when I tried to run it on an
8M system.  The same kernel ran fine on my other system with 16M, but
failed in the same way if I restricted the memory to 8M.  It would get
lots of errors on files when I tried to start up multi-user mode.  This
last weekend, I updated to the latest sources and was checking to see if
the problem still existed.  I found that 8M (8192K) would run fine, but
when I reduced the memory to 7500K, it started failing.  With 6500K, I
got the "init died" panic - so I figured the two problems were related.

  The final straw came when I generated a kernel that would give me the
panic when booting on a 16M system:  16384K would panic, but 16351K
would run fine.  At that point I decided I had to figure out what the
problem was.  After the panic, I rebooted AmigaDOS and looked at the
kernel stack (my AmigaDOS startup doesn't add the 32 bit memory, so I
can examine all the NetBSD memory from AmigaDOS;  I've even got a little
program that lets me examine the memory using virtual addresses).  From
the stack, it appeared that the execve of /sbin/init was failing, so I
started putting printf statements in the execve routine to determine
what error was occurring.  Once I had narrowed it down to an error
return from copyinstr, I started looking closely at that routine.  It
took me quite a while to figure out why it was returning an "error" code
of 0x0004000.  That turned out to be the maximum length of the string to
copy.  The gas bug that doesn't handle branch instructions properly
resulted in that bogus return value.
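
  For anyone who wants to see why 0x4000 stood out as an "error" code,
here is a rough user-space sketch in C of the contract copyinstr is
supposed to honor: return 0 on success or an errno value on failure,
and report the number of bytes copied separately.  The name
model_copyinstr is made up for illustration, and this is not the
kernel's assembly routine, only a model of its expected behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the copyinstr contract (not the
 * kernel routine).  Copies a NUL-terminated string of at most maxlen
 * bytes, including the terminating NUL, from src to dst.
 *
 * Returns 0 on success or ENAMETOOLONG if no NUL was found within
 * maxlen bytes.  The byte count goes out through *done, never through
 * the return value.
 */
static int
model_copyinstr(const char *src, char *dst, size_t maxlen, size_t *done)
{
    size_t i;
    int error = ENAMETOOLONG;   /* assume failure until a NUL is copied */

    assert(src != NULL && dst != NULL);

    for (i = 0; i < maxlen; i++) {
        dst[i] = src[i];
        if (src[i] == '\0') {
            i++;                /* count the terminating NUL as copied */
            error = 0;
            break;
        }
    }
    if (done != NULL)
        *done = i;
    return error;
}
```

  A caller such as execve only tests the return value against zero, so
a routine that hands back the maximum length instead of an errno looks
like a failed copy even when the string was copied intact.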

> To everyone: Are there any clear evidence that -O2 is bad?  I believe Chris
> took it out just in case.  And I wonder: Why didn't the copyinstr bug hit
> harder?  It was *really* evil!

  I haven't noticed any problems with the kernel using the -O2 option
(this is with the 2.5.6 version of gcc).  Any problems I've had were all
attributed to other bugs in the code.  One bug only showed up with the
-O2 option though.  I originally thought it might be the optimization
that caused the problem, but the real problem was an extraneous
reference to an uninitialized variable.

  The copyinstr bug only hit if the destination address had the low 16
bits of the address zero.  The size of available memory had a
significant impact on when that would occur.  In your case, you had to
run quite a while before it occurred.  I happened to get "lucky" and
have it occur when trying to start init, or when starting multi-user
mode.  I think if gas had assembled the branch instruction correctly, it
would have worked better, although not correctly.  The incorrect code
probably would have failed under certain conditions, depending upon the
maximum length of the copy and upon the actual length of the string.
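
  To give a feel for how rarely the trigger condition comes up, here is
a small C sketch that tests whether an address has its low 16 bits zero
and counts how many such addresses fall in a given range.  The helper
names are invented for illustration and nothing here comes from the
kernel source.  Only one byte address in every 64K qualifies, so whether
a destination buffer lands on one depends on where memory happens to be
laid out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the low 16 bits of addr are zero. */
static int
low16_is_zero(uint32_t addr)
{
    return (addr & 0xFFFFu) == 0;
}

/*
 * Hypothetical helper: number of addresses in [start, end) whose low
 * 16 bits are zero, i.e. the 64K-aligned addresses in that range.
 * 64-bit arguments let end be one past the top of a 32-bit space.
 */
static uint64_t
count_low16_zero(uint64_t start, uint64_t end)
{
    uint64_t first;

    assert(start <= end);

    /* Round start up to the next 64K boundary. */
    first = (start + 0xFFFFu) & ~(uint64_t)0xFFFFu;
    if (first >= end)
        return 0;
    return (end - 1 - first) / 0x10000u + 1;
}
```

  For example, a 16K buffer starting at 0x00200000 contains exactly one
such address (its first byte), while the same buffer moved up by one
byte contains none.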

Michael

------------------------------------------------------------------------------