Subject: toolchain/25467: objcopy fails to handle link_set sections correctly
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <Richard.Earnshaw@arm.com>
List: netbsd-bugs
Date: 05/05/2004 09:46:05
>Number:         25467
>Category:       toolchain
>Synopsis:       objcopy fails to handle link_set sections correctly
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    toolchain-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed May 05 08:47:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Richard Earnshaw
>Release:        NetBSD 2.0E
>Organization:
ARM
-- 
>Environment:
	
	
System: NetBSD shark1.cambridge.arm.com 2.0E NetBSD 2.0E (SHARK1) #2: Tue Apr 27 17:51:37 BST 2004 rearnsha@pc960.cambridge.arm.com:/work/rearnsha/netbsd/build/src/shark/sys/arch/shark/compile/SHARK1 shark
Architecture: arm
Machine: shark, cats
>Description:

	objcopy is used to generate bootable a.out images from the ELF kernel.
	It tries to handle the link_set sections (special initialization
	tables in the kernel) by appending them to the .text section.

	Unfortunately, it doesn't take the size of these sections into account
	when writing out the kernel header, so if the appended tables push the
	text section past a page boundary, the resulting image headers are
	incorrect and part of the image isn't loaded correctly.  The kernel
	is then dead at boot time.

	THE FAILURE MODE IS SILENT, SO YOU ONLY KNOW THINGS HAVE FAILED WHEN
	YOU TRY TO BOOT THE KERNEL.

	
>How-To-Repeat:
	Build and boot various shark and cats kernels; there's about a 1 in 10
	chance that a kernel will be dead.  2.0E GENERIC kernels for shark seem
	to suffer from this problem, and I believe the distributed 1.6.2 INSTALL
	for shark is similarly broken.

	The key criterion for the failure is that, in the ELF image:

	(sizeof .text + sizeof (link_set*)) mod 4096 > (sizeof .text) mod 4096
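
	As a rough aid for checking a given kernel, here is a minimal sketch
	of the page-boundary test described under >Description.  It assumes a
	4096-byte page and compares page-rounded sizes directly rather than
	remainders.  The function names and the sizes are hypothetical; real
	sizes would have to be read from the ELF image (e.g. with size or
	objdump -h).

```python
PAGE_SIZE = 4096  # page size used in the report's expression


def pages(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to hold nbytes (rounds up)."""
    return (nbytes + page_size - 1) // page_size


def crosses_page(text_size: int, link_set_size: int,
                 page_size: int = PAGE_SIZE) -> bool:
    """True if .text plus the appended link_set data needs more pages
    than .text alone, i.e. the appended tables spill past a page boundary."""
    return pages(text_size + link_set_size, page_size) > pages(text_size, page_size)


if __name__ == "__main__":
    # Hypothetical sizes, not taken from any real kernel.
    for text, link in [(0x1F0F00, 0x80), (0x1F0F00, 0x180)]:
        print(hex(text), hex(link), crosses_page(text, link))
```

	With the hypothetical sizes above, the first case stays inside the
	last text page and the second spills into the next one.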

	
>Fix:

	Personally, I'm not convinced that the hack introduced into objcopy
	to do this sort of thing is the right approach.  My proposed fix
	would be to amend the kernel link script to incorporate each
	required link_set section explicitly, something like the
	following (for shark):

	
Index: kern.ldscript
===================================================================
RCS file: /cvsroot/src/sys/arch/shark/conf/kern.ldscript,v
retrieving revision 1.1
diff -u -r1.1 kern.ldscript
--- kern.ldscript       21 Nov 2002 01:38:41 -0000      1.1
+++ kern.ldscript       5 May 2004 08:41:19 -0000
@@ -15,6 +15,25 @@
     *(.stub)
     *(.glue_7t) *(.glue_7)
     *(.rodata) *(.rodata.*)
+    /* Special link sections for kernel data tables.  We put these in the
+       .text section because objcopy can't translate them into a.out object
+       files and get the section boundaries correct.  */
+    . = ALIGN(4);
+    PROVIDE (__start_link_set_pools = .);
+    *(link_set_pools)
+    PROVIDE (__stop_link_set_pools = .);
+    . = ALIGN(4);
+    PROVIDE (__start_link_set_sysctl_funcs = .);
+    *(link_set_sysctl_funcs)
+    PROVIDE (__stop_link_set_sysctl_funcs = .);
+    . = ALIGN(4);
+    PROVIDE (__start_link_set_malloc_types = .);
+    *(link_set_malloc_types)
+    PROVIDE (__stop_link_set_malloc_types = .);
+    . = ALIGN(4);
+    PROVIDE (__start_link_set_evcnts = .);
+    *(link_set_evcnts)
+    PROVIDE (__stop_link_set_evcnts = .);
   } =0
   PROVIDE (__etext = .);
   PROVIDE (_etext = .);

	Using this method and removing the hack from objcopy would mean that
	we'd get link failures if a new link_set section were added, but
	at least we'd then have a direct failure mode with an obvious fix.

	Jason, however, seems to think differently: see the discussion on
	port-arm circa 2004/04/27.
>Release-Note:
>Audit-Trail:
>Unformatted: