Subject: Re: Ultra 10, anyone?
To: Andrey Petrov <petrov@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: port-sparc64
Date: 09/02/2002 11:43:13
On Wed, Aug 28, 2002 at 09:05:44PM -0700, Andrey Petrov wrote:
> On Wed, Aug 28, 2002 at 06:06:07PM -0700, Chuck Silvers wrote:
> > On Wed, Aug 28, 2002 at 01:54:38PM -0700, Andrey Petrov wrote:
> > > On Wed, Aug 28, 2002 at 01:36:12AM -0700, Chuck Silvers wrote:
> > > > hmm, actually this isn't an aliasing issue at all,
> > > > pmap_kremove() is just not flushing the cache.
> > > > note that when we remove the TTE entirely (such as in
> > > > pmap_page_protect(VM_PROT_NONE)), we do need to flush
> > > > the cache there as well.
> > > > 
> > > > please try the attached patch.
> > > > 
> > > 
> > > Nope, didn't pass the test:  build over nfs on netbooted machine.
> > > 
> > build what?  I tried building a kernel using only NFS with today's
> > 1.6-branch and it worked fine.  (I also tried -current, it was also fine.)
> > this was on an ultra2 though, is the problem ultra10-specific?
> 
> Don't know why, but for some reason I thought I had already said that.
> It fails for me when building a distribution. It usually breaks in tools.
> And this is on ultra2.


ok, I tried some experiments using build.sh.
with a current 1.6-branch kernel, the build fails while linking the tools cc1:

cc -DCROSS_COMPILE -DIN_GCC -DHAIFA   -O   -DHAVE_CONFIG_H  -o cc1 toplev.o version.o tree.o print-tree.o stor-layout.o fold-const.o  function.o stmt.o except.o expr.o calls.o expmed.o explow.o optabs.o  intl.o varasm.o rtl.o print-rtl.o rtlanal.o emit-rtl.o genrtl.o real.o  dbxout.o sdbout.o dwarfout.o dwarf2out.o xcoffout.o bitmap.o alias.o gcse.o  integrate.o jump.o cse.o loop.o unroll.o flow.o stupid.o combine.o varray.o  regclass.o regmove.o local-alloc.o global.o reload.o reload1.o caller-save.o  insn-peep.o reorg.o haifa-sched.o final.o recog.o reg-stack.o  insn-opinit.o insn-recog.o insn-extract.o insn-output.o insn-emit.o lcm.o  profile.o insn-attrtab.o sparc.o getpwd.o  convert.o  mbchar.o dyn-string.o splay-tree.o graph.o sbitmap.o resource.o hash.o c-parse.o c-lang.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o  c-aux-info.o c-common.o c-iterate.o obstack.o        ../libiberty/libiberty.a
expr.o: In function `emit_block_move':
expr.o(.text+0x3134): undefined reference to `copy_to_mode_reg'
expr.o(.text+0x31d4): undefined reference to `copy_to_mode_reg'
expr.o(.text+0x31f0): undefined reference to `copy_to_mode_reg'
expr.o(.text+0x3218): undefined reference to `copy_to_mode_reg'
expr.o: In function `clear_storage':
expr.o(.text+0x5300): undefined reference to `copy_to_mode_reg'
expr.o(.text+0x5398): more undefined references to `copy_to_mode_reg' follow
cc: Internal compiler error: program ld got fatal signal 11
*** Error code 1


when I add my patch (cache flush in pmap_kremove() and
pmap_page_protect(VM_PROT_NONE)), then it gets past that
but fails while creating libc.a:

building standard c library
/home/chs/netbsd/tooldir/arch/sparc64/bin/sparc64--netbsd-ranlib libc.a
building profiled c library
*** Signal 11

Stop.
nbmake: stopped in /home/chs/netbsd/src/lib/libc
*** Error code 1
...


note that it's nbmake that's dumping core.  if I just do the build again,
it succeeds in creating libc_p.a but then dies creating libc_pic.a.

now the interesting bit is that if I also add the dcache_flush_page() in
pmap_clear_modify() (either your diff's way or my diff's way), nbmake dies
exactly the same way, so that change isn't helping.


gdb says:

9 ultra2:~ # gdb netbsd/tooldir/arch/sparc64/bin/nbmake netbsd/obj/sparc64/home/chs/netbsd/src/lib/libc/nbmake.core 
GNU gdb 5.0nb1
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc64--netbsd"...(no debugging symbols found)...
Core was generated by `nbmake'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/libexec/ld.elf_so...(no debugging symbols found)...
done.
Loaded symbols for /usr/libexec/ld.elf_so
Reading symbols from /usr/lib/libc.so.12...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libc.so.12
#0  0x40345568 in strlen () from /usr/lib/libc.so.12
(gdb) bt
#0  0x40345568 in strlen () from /usr/lib/libc.so.12
#1  0x114034 in Check_Cwd_av ()
#2  0x114338 in Check_Cwd_Cmd ()
#3  0x1143d4 in Check_Cwd ()
#4  0x104084 in CompatRunCommand ()
#5  0x110b1c in Lst_ForEachFrom ()
#6  0x110a94 in Lst_ForEach ()
#7  0x104588 in CompatMake ()
#8  0x110b1c in Lst_ForEachFrom ()
#9  0x110a94 in Lst_ForEach ()
#10 0x10431c in CompatMake ()
#11 0x1049fc in Compat_Run ()
#12 0x1139d0 in main ()
#13 0x1021ec in ___start ()
(gdb) 


the trouble here is that in Check_Cwd_av(), av[0] is bogus.
the av array looks like it contains the tail end of the cmd string
that was passed to Check_Cwd_Cmd().

however, if I add a call to getpid() in Check_Cwd_Cmd() right before
it calls brk_string(), then the problem goes away.  looks like more
cache-flushing problems.

-Chuck