Subject: Re: --db_more-- in recent sparc64 kernel
To: None <eeh@netbsd.org, petrov@netbsd.org>
From: None <eeh@netbsd.org>
List: port-sparc64
Date: 07/14/2001 01:33:18
| On Sat, Jul 14, 2001 at 12:13:31AM -0000, eeh@netbsd.org wrote:
| > 
| > | One which fails to start:
| > |
| > | netbsd:     file format elf64-sparc
| > | netbsd
| > | architecture: sparc:v9, flags 0x00000012:
| > | EXEC_P, HAS_SYMS
| > | start address 0x0000000001000000
| > |
| > | Program Header:
| > |     LOAD off    0x0000000000000080 vaddr 0x0000000001000000 paddr 0x0000000001000000 align 2**7
| > |          filesz 0x000000000041e7d8 memsz 0x0000000000499768 flags rwx
| > 
| > You haven't been looking at the linker errors, have you?
| > 
|
| No, no linker errors. 

Hm, that's interesting.

| > What has happened in this case is that your kernel and data segments have
| > collided.  This is a bad thing.
| > 
|
| Here is the same with sections information:
|
| netbsd:     file format elf64-sparc
| netbsd
| architecture: sparc:v9, flags 0x00000012:
| EXEC_P, HAS_SYMS
| start address 0x0000000001000000
|
| Program Header:
|     LOAD off    0x0000000000000080 vaddr 0x0000000001000000 paddr 0x0000000001000000 align 2**7
|          filesz 0x000000000041e7d8 memsz 0x0000000000499768 flags rwx
|
| Sections:
| Idx Name          Size      VMA               LMA               File off  Algn
|   0 .text         0029fe38  0000000001000000  0000000001000000  00000080  2**7
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         0001e7d8  0000000001400000  0000000001400000  00400080  2**6
|                   CONTENTS, ALLOC, LOAD, DATA
|   2 .rodata       000998d1  000000000129fe38  000000000129fe38  0029feb8  2**3
|                   CONTENTS, ALLOC, LOAD, READONLY, DATA
|   3 .bss          0007af88  000000000141e7e0  000000000141e7e0  0041e860  2**4
|                   ALLOC
|   4 .comment      000060a2  0000000000000000  0000000000000000  0041e860  2**0
|                   CONTENTS, READONLY
|   5 .note         00000020  00000000000060a4  00000000000060a4  00424904  2**2
|                   CONTENTS, READONLY
|   6 .ident        0000007b  00000000000060c4  00000000000060c4  00424924  2**0
|                   CONTENTS, READONLY
| SYMBOL TABLE:
| 0000000001404008 l       .data	0000000000000000 estack0
|
| text+rodata is smaller then 4MB so it shouldn't overlap with data.
| I got confused first time when I looked at sizes which ofwboot
| prints out.
| Linker put everything in one segment (program header), so I expect
| that gap is filled somehow. 

Here are the start and end of each segment:

(gdb) p 1000000+29fe38
$1 = 0x129fe38
(gdb) p 129fe38+998d1
$2 = 0x1339709
(gdb) p 1400000+1e7d8
$3 = 0x141e7d8
(gdb) p 141e7e0+7af88
$4 = 0x1499768

There does not seem to be any overlap.  There are 813303 bytes between the
end of the text segment (text plus rodata) and the beginning of the data
segment.  There is surprisingly little in the data segment: under 1MB.
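
The same arithmetic can be scripted so it is easy to rerun against another
image.  This is only a sketch: the section values are copied by hand from the
objdump output quoted above, and the helper names (extents, gaps) are made up
for illustration.

```python
# Section table copied from the quoted objdump output: (name, VMA, size).
SECTIONS = [
    (".text",   0x1000000, 0x29fe38),
    (".rodata", 0x129fe38, 0x998d1),
    (".data",   0x1400000, 0x1e7d8),
    (".bss",    0x141e7e0, 0x7af88),
]


def extents(sections):
    """Return (name, start, end) sorted by start address; end is exclusive."""
    return sorted(((n, v, v + s) for n, v, s in sections), key=lambda e: e[1])


def gaps(sections):
    """Return (prev, next, gap) for adjacent sections; a negative gap is an overlap."""
    ext = extents(sections)
    return [(a[0], b[0], b[1] - a[2]) for a, b in zip(ext, ext[1:])]


if __name__ == "__main__":
    for name, start, end in extents(SECTIONS):
        print(f"{name:8s} {start:#x} .. {end:#x}")
    for prev, nxt, gap in gaps(SECTIONS):
        status = "OVERLAP" if gap < 0 else "ok"
        print(f"{prev} -> {nxt}: {gap} bytes ({status})")
```

For this image every gap is zero or positive, and the gap between .rodata
and .data is the 813303 bytes mentioned above.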

I have seen this sort of thing myself recently.  It would appear that the
linker is fouling up.  If I grab one of these corrupt kernel images, stick 
it under gdb, and dump out statically initialized data, say cn_tab, the data
is corrupt.

|
| The kernel crashes in main() in the very beginning
| 	p = &proc0;
| 	curproc = p;
| 	p->p_cpu = curcpu();  <-----
|
| with 6c (fast DMMU protection) trap. All trap levels are populated
| this.

Fire up GDB on the image and see what you have in, say, proc0.
Also, if there is only one segment, or confusion between segments,
it is quite possible that writeable data is read-only because
it has text protection.
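
Whether writeable data really ends up under text protection depends on how
the image gets mapped, which objdump cannot show.  As a rough first check,
here is a sketch that asks whether any read-only section shares a 4MB-aligned
region with a writeable one.  The 4MB granularity is an assumption taken from
the quoted remark about text+rodata being smaller than 4MB, and the helper
names (pages, shared_pages) are made up for illustration.

```python
PAGE = 4 * 1024 * 1024  # assumed mapping granularity (4MB), not read from the image

# (name, VMA, size, writeable), copied from the quoted objdump output.
SECTIONS = [
    (".text",   0x1000000, 0x29fe38, False),
    (".rodata", 0x129fe38, 0x998d1,  False),
    (".data",   0x1400000, 0x1e7d8,  True),
    (".bss",    0x141e7e0, 0x7af88,  True),
]


def pages(vma, size, page=PAGE):
    """Return the set of page indices touched by [vma, vma + size)."""
    return set(range(vma // page, (vma + size - 1) // page + 1))


def shared_pages(sections, page=PAGE):
    """Return page indices holding both read-only and writeable sections."""
    ro, rw = set(), set()
    for _name, vma, size, writeable in sections:
        (rw if writeable else ro).update(pages(vma, size, page))
    return sorted(ro & rw)


if __name__ == "__main__":
    shared = shared_pages(SECTIONS)
    if shared:
        for idx in shared:
            print(f"read-only and writeable sections share {idx * PAGE:#x}")
    else:
        print("no 4MB region holds both read-only and writeable sections")
```

For this image the result is empty: .text and .rodata sit in the region
starting at 0x1000000, and .data and .bss in the one starting at 0x1400000.
So if the data is ending up read-only, the section layout by itself would
not be the cause under that assumption.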

Eduardo