Subject: Re: R140, now mounting root over NFS
To: Ben Harris <bjh21@NetBSD.ORG>
From: Kjetil B. Thomassen <kjetil@thomassen.priv.no>
List: port-arm32
Date: 12/17/2000 21:17:45
On Sun 10 Dec, Ben Harris wrote:
> On Sun, 10 Dec 2000, Kjetil B. Thomassen wrote:
> 
> > On Sat 09 Dec, Ben Harris wrote:
> > > On Sun, 3 Dec 2000, Kjetil B. Thomassen wrote:
> > > 
> > > > I made myself a test kernel where I added DDB and stripped out some of the
> > > > things I didn't think I needed. The config file has been attached to
> > > > this email.
> > > > 
> > > > The last output before it hangs is:
> > > > root file system type: nfs
> > > > init: copying out path '/sbin/init' 11
> > > > 
> > > > I broke into DDB with CTRL-ALT-Esc
> > > 
> > > Is it possible for you to try typing "x/i 0,8" at DDB here?  This will
> > > dump the exception vectors, and on my system, I find that the vector at
> > > 0x8 (the SWI vector) has been corrupted to "andeq r0,r0,r0" (ie all
> > > zeroes).  It looks like something in the kernel is writing through a NULL
> > > pointer.  That'll be fun to debug.
> > 
> > The vectors are not corrupted on my R140. They look fine to me.
> 
> How odd.  So they were all of the form "ldr r15, fiqhandler+0x100" etc?  
> Incidentally, if you get revision 1.15 or later of
> sys/arch/arm26/arm26/except.c, the kernel will check the vectors itself
> every time it returns to user mode (if DIAGNOSTIC's defined).
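
A minimal sketch of the kind of vector check described above, written only to illustrate the idea; it is not the code in except.c. It assumes every vector is a PC-relative load of the form `ldr r15, [r15, #imm]` with a positive offset (the "ldr r15, ..." form quoted above). The names `first_bad_vector` and `vector_looks_sane` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of ARM exception vectors dumped by "x/i 0,8". */
#define NVECTORS 8

/*
 * Assumption: an intact vector is "ldr pc, [pc, #imm]" with U=1, so the
 * top 20 bits are fixed (0xE59FF) and only the 12-bit offset varies.
 */
#define LDR_PC_PC_MASK 0xFFFFF000u
#define LDR_PC_PC_INSN 0xE59FF000u

/* Return 1 if insn has the expected "ldr pc, [pc, #imm]" shape. */
static int vector_looks_sane(uint32_t insn)
{
    return (insn & LDR_PC_PC_MASK) == LDR_PC_PC_INSN;
}

/*
 * Scan a copy of the vector words. Returns the index of the first
 * suspicious vector, or -1 if all look sane. An all-zero word
 * ("andeq r0, r0, r0") fails the test, which is the corruption seen
 * at 0x8, the SWI vector (index 2).
 */
static int first_bad_vector(const uint32_t vec[NVECTORS])
{
    for (size_t i = 0; i < NVECTORS; i++) {
        if (!vector_looks_sane(vec[i]))
            return (int)i;
    }
    return -1;
}
```

Comparing against a fixed instruction shape, rather than a saved snapshot of the vector page, keeps the sketch short; a snapshot comparison would also catch a vector that was overwritten with a different but well-formed load.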

I updated the kernel sources yesterday and built a new kernel. Now I get
a bit further, but it panics:

NetBSD 1.5N (R140) #7 ...
...
Avail 1920 KB
...
panic: m_copym0 overrun
Stopped in nfsio at cpu_Debugger+0x10 bl kbd_trap

The interesting thing now is that it managed to create a new process:
PID  PPID  PGRP  UID  S  FLAGS  COMMAND  WAIT
 10     1    10    0  3  0x6    init     uvn_fp1

The trace shows:
panic
m_dup
m_copym
nfs_request
nfs_writerpc
nfs_doio
nfssvc_iod
start_nfsio
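
Judging from its name, an "overrun" panic in the mbuf copy code most likely means the caller asked for more bytes than the mbuf chain actually holds. The sketch below illustrates that condition on a hypothetical, much-simplified chain type (`struct chunk` and `chain_copy` are made up for illustration and are not the real `struct mbuf` or `m_copym`); where the kernel would panic, the sketch returns -1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a linked list of byte buffers. */
struct chunk {
    struct chunk *next;
    size_t len;
    const unsigned char *data;
};

/*
 * Copy len bytes, starting at byte offset off into the chain, to out.
 * Returns 0 on success, or -1 if the chain ends before off + len bytes
 * have been seen (the "overrun" case).
 */
static int chain_copy(const struct chunk *c, size_t off, size_t len,
                      unsigned char *out)
{
    /* Skip whole chunks until off falls inside the current one. */
    while (c != NULL && off >= c->len) {
        off -= c->len;
        c = c->next;
    }
    while (len > 0) {
        if (c == NULL)
            return -1;              /* ran off the end of the chain */
        size_t n = c->len - off;    /* bytes available in this chunk */
        if (n > len)
            n = len;
        memcpy(out, c->data + off, n);
        out += n;
        len -= n;
        off = 0;                    /* later chunks are read from the start */
        c = c->next;
    }
    return 0;
}
```

With a 3-byte chunk followed by a 4-byte chunk, copying 6 bytes from offset 2 asks for one byte more than the chain holds and hits the overrun path.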

I don't know what is going on here, but at least it is a bit of
progress. My config file can be found at:
http://www.thomassen.priv.no/NetBSD/R140.conf

And a compiled kernel can be found at:
http://www.thomassen.priv.no/NetBSD/R140.gz

They are both linked from:
http://www.thomassen.priv.no/NetBSD/

> > Yes, I got several pages of output, and I couldn't understand much of
> > it. Also, the stuff in uvm_stat.c is above my head, so I think I need to
> > understand more of this before I can do anything more.
> >
> > I used sources from some time yesterday, but it did not get any further
> > than it has done before. The R140 is up and running as it has mounted
> > the root directory and is answering when I ping it. The delay is around
> > 5 ms.
> > 
> > Is there anything else I can do in DDB to try to track this down?
> 
> One possibility is to look at the UVM histories, making a note of the
> timestamps of the last few entries, then type "c", wait a bit, drop back
> into the debugger and re-run the history and see if any more things have
> happened.  This can give you some idea of what's going on.

I did that, and there was activity from time to time, but as I don't
understand the kernel internals, I have no idea what is going on.

If I am going to be of more help here, I think I need to learn more
about how the kernel works internally. Any suggestions on how to get
more insight?

Building a new kernel takes around 60-90 minutes, but it runs in the
background, so I will keep updating the sources with cvs and trying
again at irregular intervals.

I guess this panic calls for a send-pr, so if you agree, I will file
one.

I also looked through the source code for a driver for my HCCS IDE card,
but there is none. That is why the card is not being configured, so I
guess I will have to find the necessary information somewhere and adapt
the wd driver to work with it.

Any progress on finding the basic program to replace SetStation? If not,
do you remember what it takes to just set a station number?

TIA!

Kjetil B.
mailto:kjetil@thomassen.priv.no
http://www.thomassen.priv.no/