Subject: Re: --db_more-- in recent sparc64 kernel
To: None <eeh@netbsd.org>
From: Andrey Petrov <petrov@netbsd.org>
List: port-sparc64
Date: 07/13/2001 18:04:19
On Sat, Jul 14, 2001 at 12:13:31AM -0000, eeh@netbsd.org wrote:
> 
> | One which fails to start:
> |
> | netbsd:     file format elf64-sparc
> | netbsd
> | architecture: sparc:v9, flags 0x00000012:
> | EXEC_P, HAS_SYMS
> | start address 0x0000000001000000
> |
> | Program Header:
> |     LOAD off    0x0000000000000080 vaddr 0x0000000001000000 paddr 0x0000000001000000 align 2**7
> |          filesz 0x000000000041e7d8 memsz 0x0000000000499768 flags rwx
> 
> You haven't been looking at the linker errors, have you?
> 

No, no linker errors. 

> What has happened in this case is that your kernel and data segments have
> collided.  This is a bad thing.
> 

Here is the same with sections information:

netbsd:     file format elf64-sparc
netbsd
architecture: sparc:v9, flags 0x00000012:
EXEC_P, HAS_SYMS
start address 0x0000000001000000

Program Header:
    LOAD off    0x0000000000000080 vaddr 0x0000000001000000 paddr 0x0000000001000000 align 2**7
         filesz 0x000000000041e7d8 memsz 0x0000000000499768 flags rwx

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0029fe38  0000000001000000  0000000001000000  00000080  2**7
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         0001e7d8  0000000001400000  0000000001400000  00400080  2**6
                  CONTENTS, ALLOC, LOAD, DATA
  2 .rodata       000998d1  000000000129fe38  000000000129fe38  0029feb8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .bss          0007af88  000000000141e7e0  000000000141e7e0  0041e860  2**4
                  ALLOC
  4 .comment      000060a2  0000000000000000  0000000000000000  0041e860  2**0
                  CONTENTS, READONLY
  5 .note         00000020  00000000000060a4  00000000000060a4  00424904  2**2
                  CONTENTS, READONLY
  6 .ident        0000007b  00000000000060c4  00000000000060c4  00424924  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
0000000001404008 l       .data	0000000000000000 estack0


> On sparc64 the kernel is mapped with 4MB pages that are locked into the
> TLB.  To minimize wasteage, prevent overlap, etc. both the text segment
> and the data segment are explicitly located at a 4MB aligned address by
> the arguments passed to the linker.  Text is located at 0x0000000001000000
> and data located at 0x0000000001400000.  When the text+rodata grows beyond
> 4MB, it collides with the data segment and lots of bad things happen.

text+rodata is smaller than 4MB, so it shouldn't overlap with data.
I got confused the first time I looked at the sizes that ofwboot
prints out.
The linker put everything in one segment (program header), so I expect
that the gap between .rodata and .data is filled somehow.
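
To double-check the arithmetic, here is a minimal sketch that recomputes
the layout from the section table above. The addresses and sizes are
copied from that listing; the helper names (end, overlaps) are made up
for illustration only.

```python
# Section VMAs and sizes copied from the section listing above.
PAGE_4MB = 0x400000  # size of the locked 4MB kernel pages

sections = {
    ".text":   (0x0000000001000000, 0x0029fe38),
    ".rodata": (0x000000000129fe38, 0x000998d1),
    ".data":   (0x0000000001400000, 0x0001e7d8),
    ".bss":    (0x000000000141e7e0, 0x0007af88),
}

# The single LOAD program header from the same listing.
seg_vaddr, seg_filesz, seg_memsz = 0x1000000, 0x41e7d8, 0x499768


def end(name):
    """First address past the named section."""
    vma, size = sections[name]
    return vma + size


def overlaps(a, b):
    """True if the address ranges of sections a and b intersect."""
    return sections[a][0] < end(b) and sections[b][0] < end(a)


# Where text+rodata stops, and how much room is left before .data.
text_rodata_end = end(".rodata")
gap = sections[".data"][0] - text_rodata_end

# Which 4MB page (counted from the text base) each section starts in.
page_of = {name: (vma - seg_vaddr) // PAGE_4MB
           for name, (vma, _) in sections.items()}

print(f"text+rodata ends at {text_rodata_end:#x}, gap to .data {gap:#x}")
print("rodata/data overlap:", overlaps(".rodata", ".data"))
print("segment file end:", hex(seg_vaddr + seg_filesz),
      "mem end:", hex(seg_vaddr + seg_memsz))
print("4MB page per section:", page_of)
```

If the numbers in the listing are right, text+rodata stops short of
0x1400000, and the one rwx segment runs from the start of .text to the
end of .bss, so it covers both 4MB pages. The file offsets of .rodata
and .data differ by the same amount as their addresses, which suggests
the gap is present in the file as well.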

The kernel crashes at the very beginning of main():
	p = &proc0;
	curproc = p;
	p->p_cpu = curcpu();  <-----

with a 6c (fast DMMU protection) trap. All trap levels are populated
with this trap.

> The solution is to change the linker command to move the data segment over
> to the next 4MB boundary.  It may make more sense to play with the linker
> scripts and specify 4MB alignment for sections, that way the kernel could
> grow without requiring changes to the link command, but that gets a bit
> more complicated.

I can try this. Currently I'm playing with the OFW debugger, trying to
find out how it fails.
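
For reference, a minimal sketch of the alignment arithmetic behind the
suggested fix: place the data segment at the first 4MB boundary at or
above the end of text+rodata. The function names are hypothetical, and
this only computes the address; how it gets passed to the linker is not
shown here.

```python
PAGE_4MB = 0x400000  # 4MB, the page size used for the locked mappings


def align_up(addr, alignment=PAGE_4MB):
    """Round addr up to a multiple of alignment (must be a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def next_data_base(text_base, text_rodata_size):
    """Lowest 4MB-aligned address at or above the end of text+rodata."""
    return align_up(text_base + text_rodata_size)


# With the current kernel (.text 0x29fe38 bytes + .rodata 0x998d1 bytes)
# the data base stays where it is now.
current = next_data_base(0x1000000, 0x29fe38 + 0x998d1)

# Assumed example: once text+rodata grows past 4MB, the data segment
# has to move up by one page.
grown = next_data_base(0x1000000, 0x400001)

print(hex(current), hex(grown))
```

Doing the same rounding in a linker script instead of on the command
line is what would let the kernel grow without editing the link command
each time.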

	Andrey