Subject: Re: NetBSD-current on Amiga?
To: None <port-amiga@NetBSD.org>
From: Michael L. Hitch <mhitch@gemini.msu.montana.edu>
List: port-amiga
Date: 09/11/2006 12:07:05
> > It's apparently some default in gcc which has changed. Perhaps -m68020
> > needs to be explicitly set as the default so that 64 bit operations are
> > not generated. The alternative of making -m68060 sets isn't very tidy. But
> > if anyone wants -m68060 built sets, please let me know.
>
>   I doubt it's anything to do with 64 bit operations - I'm not seeing any
> great number of unimplemented instruction traps.

  It is not anything to do with the toolchain.

> > That was where I was stuck before. I guessed that that was the problem,
> > but I didn't have the time to wait for the system to finish booting.
> >
> > What kind of faults are you seeing mostly?
>
>   They are all pagefaults.
>
>   A quick test during lunch shows this seems to happen after the fork()
> when the shell is doing the exec().  It's almost like it's getting write
> faults on a read-only page and the page is never getting set to writable
> until later.  I think I'm going to add some instrumentation to trap.c
> to capture the last N page faults and see what the PC and faulting VA
> addresses are for them.

  Nobody must have been running much on an m68060 Amiga using -current for
over a year.  I started tracking this back to figure out when the problem
first appeared, and found the exact day.  Sources from 2005-04-01 do not
exhibit the problem, but sources from 2005-04-02 do.  I rather suspected
this change, since comparing the differences between 3.1_RC1 and 4.0_BETA
seemed to indicate the only thing likely to explain what I was seeing was
UVM and pmap changes.

  Some of the kernels I ran had far fewer page faults when
starting programs, and the problem was not nearly as noticeable.  Anyone
who did run a -current kernel since that change, and happened to hit the
case where the number of faults was relatively low, may not have noticed
the problem.  When the number of faults becomes rather excessive (like John
had), it becomes much more apparent.

  I've been trying to figure out exactly what is causing this,
but have had little success so far.  What it looks like to me is that in
certain conditions, a page fault on a missing page does not get completely
handled, and the hardware does not get the new page information set up
correctly, and when the instruction is restarted, it faults again.  The
fault handler is still happy about the fault and tries to fault the page
in again - over and over and over....  At some point, the MMU information
does get set up correctly, and things run fine after that - until the next
time that kind of fault occurs.

  I had added code in the trap fault handler to save information about the
last 256 page faults, and I can see where it gets many page faults that
have the same program counter, fault address, and fault status longword.
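
  For anyone who wants to try the same kind of instrumentation, here is a
minimal sketch of a 256-entry fault log.  It is not the code that went
into trap.c; the names fault_rec, fault_log_record() and
fault_log_repeat_count() are hypothetical, and it assumes the fault
handler can supply the PC, faulting VA and fault status longword as
32-bit values.

```c
#include <stdint.h>

#define FAULT_LOG_SIZE 256	/* must be a power of two */
#define FAULT_LOG_MASK (FAULT_LOG_SIZE - 1)

/* One saved page fault (hypothetical layout, for illustration only). */
struct fault_rec {
	uint32_t pc;	/* program counter at the fault */
	uint32_t va;	/* faulting virtual address */
	uint32_t fslw;	/* fault status longword */
};

static struct fault_rec fault_log[FAULT_LOG_SIZE];
static unsigned int fault_log_head;	/* next slot to overwrite */
static unsigned int fault_log_count;	/* valid entries, at most 256 */

/* Record one fault, overwriting the oldest entry once the log is full. */
static void
fault_log_record(uint32_t pc, uint32_t va, uint32_t fslw)
{
	struct fault_rec *r = &fault_log[fault_log_head];

	r->pc = pc;
	r->va = va;
	r->fslw = fslw;
	fault_log_head = (fault_log_head + 1) & FAULT_LOG_MASK;
	if (fault_log_count < FAULT_LOG_SIZE)
		fault_log_count++;
}

/*
 * Return how many of the most recent entries, counting the newest one,
 * are identical to the newest entry.  Returns 0 for an empty log.
 */
static unsigned int
fault_log_repeat_count(void)
{
	const struct fault_rec *last, *r;
	unsigned int n;

	if (fault_log_count == 0)
		return 0;
	last = &fault_log[(fault_log_head - 1) & FAULT_LOG_MASK];
	for (n = 1; n < fault_log_count; n++) {
		r = &fault_log[(fault_log_head - 1 - n) & FAULT_LOG_MASK];
		if (r->pc != last->pc || r->va != last->va ||
		    r->fslw != last->fslw)
			break;
	}
	return n;
}
```

  Calling fault_log_record() from the page fault path and checking
fault_log_repeat_count() (from a debugger, say) would show a long run of
identical entries when the same fault keeps being retaken.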

  I'm not all that familiar with the 68060 differences, but it's acting
like the level 2 segment table entry is not getting updated.  It acts like
the table entry is not getting written to memory - either the entry is
never written or it's being written to a cache-enabled page.  I haven't
yet been able to determine if that's what is happening;  the pmap code
appears to be doing the correct thing.

Michael