Subject: kern/5916: running a kernel with memory disk support spoils subsequent kernel on sparc
To: None <gnats-bugs@gnats.netbsd.org>
From: None <jbernard@ox.mines.edu>
List: netbsd-bugs
Date: 08/05/1998 13:54:16
>Number:         5916
>Category:       kern
>Synopsis:       running a kernel with memory disk support spoils subsequent kernel on sparc
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug  5 13:05:00 1998
>Last-Modified:
>Originator:     Jim Bernard
>Organization:
	Speaking for myself
>Release:        July 26, 1998
>Environment:
System: NetBSD spud 1.3F NetBSD 1.3F (SPUD) #0: Tue Jul 28 15:43:32 MDT 1998 jim@roo:/home/local/compile/sys/arch/sparc/compile/SPUD sparc


>Description:
	If a kernel with memory disk support is booted on a sparc (seen
	here on a Sun 4/60), later boots of other kernels misbehave, at
	least when executing shell scripts.  A workaround is to cycle
	power before booting the other kernel; a PROM "reset" is not
	sufficient.

	The specific failure seen is that scripts (including shell startup
	scripts) evidently attempt to execute some (but not necessarily
	all) variable-setting statements as commands (seen with both sh
	and bash), which then fail.

	This is likely to be more than a little disturbing to users who
	install using a floppy boot disk and then expect the subsequently
	installed system to work correctly (unless they happen to turn off
	the power before booting the installed system).

>How-To-Repeat:
	* For the system on which this was observed, dmesg reports
	  (though I expect the problem is not unique to this system):
real mem = 8335360
avail mem = 6238208
using 101 buffers containing 413696 bytes of memory
bootpath: /sbus0/esp0/sd@0,0
mainbus0 (root): Sun 4/60
cpu0 at mainbus0: MB86900/1A or L64801 @ 20 MHz, WTL3170/2 FPU
cpu0: 64K byte write-through, 16 bytes/line, sw flush: cache enabled
memreg0 at mainbus0 ioaddr 0xf4000000
clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
timer0 at mainbus0 ioaddr 0xf3000000 ipl 10 delay constant 7
auxreg0 at mainbus0 ioaddr 0xf7400000
zs0 at mainbus0 ioaddr 0xf1000000 ipl 12 softpri 6
zstty0 at zs0 channel 0
zstty1 at zs0 channel 1
zs1 at mainbus0 ioaddr 0xf0000000 ipl 12 softpri 6
kbd0 at zs1 channel 0 (console)
ms0 at zs1 channel 1
fdc0 at mainbus0 ioaddr 0xf7200000 ipl 11 softpri 4: chip 82072
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
audioamd0 at mainbus0 ioaddr 0xf7201000 ipl 13 softpri 4
audio0 at audioamd0
sbus0 at mainbus0 ioaddr 0xf8000000: clock = 25 MHz
dma0 at sbus0 slot 0 offset 0x400000: rev 1
esp0 at sbus0 slot 0 offset 0x800000 level 3: ESP100, 25MHz, SCSI ID 7
scsibus0 at esp0: 8 targets
sd0 at scsibus0 targ 3 lun 0: <HITACHI, DK515C, CP15> SCSI1 0/direct fixed
sd0: 639MB, 1361 cyl, 14 head, 68 sec, 512 bytes/sect x 1309896 sectors
st0 at scsibus0 targ 4 lun 0: <ARCHIVE, VIPER 1500 21247, 2.2G> SCSI2 1/sequential removable
st0: drive empty
le0 at sbus0 slot 0 offset 0xc00000 level 5: address 08:00:20:08:b8:e7
le0: 8 receive buffers, 2 transmit buffers
bwtwo0 at sbus0 slot 3 offset 0x0 level 7: SUNW,501-1455, 1152 x 900 (console)
bwtwo0: attached to /dev/fb
root on sd0a dumps on sd0b
root file system type: ffs

	* Boot any kernel containing memory disk support (with or without
	  an actual ramdisk image present); an INSTALL kernel will do (I
	  can provide a floppy image, compiled from July 26, 1998 sources),
	  or just add MD support to your usual kernel config file with
	  these lines:

options 	MEMORY_DISK_HOOKS
options 	MEMORY_DISK_IS_ROOT	# force root on memory disk
options 	MEMORY_DISK_SERVER=0	# no userspace memory disk support
options 	MINIROOTSIZE=3168	# 1.44M * 1.1

pseudo-device	md		1	# memory disk device (ramdisk)

	  and build a kernel with that.
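
	  The "1.44M * 1.1" comment on MINIROOTSIZE can be checked with a
	  little arithmetic.  This is only a sketch: it assumes (not
	  stated above) that MINIROOTSIZE counts 512-byte blocks, and the
	  function names are made up for illustration.

```shell
#!/bin/sh
# ASSUMPTION: MINIROOTSIZE is expressed in 512-byte blocks.

# Number of 512-byte blocks on a 1.44MB (1440 KB) floppy.
floppy_blocks() {
    echo $(( 1440 * 1024 / 512 ))
}

# Floppy size plus 10%, using integer arithmetic only.
miniroot_blocks() {
    echo $(( $(floppy_blocks) * 11 / 10 ))
}

miniroot_blocks
```

	  Under that assumption the result matches the 3168 used in the
	  config lines above.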

	* Shut down the MD kernel and boot a normal kernel without cycling
	  power (i.e., do a PROM "reset" or just "boot").

	* Log in as root with shell /bin/csh, no .cshrc or .login files
	  present (they can be present, but removing them simplifies the
	  situation).  I also moved .profile out of the way.

	* Try to execute a script, e.g.:
	    #! /bin/sh
	    date

	* Watch that fail with, e.g.:

	    SHELL=/bin/csh: Can't open SHELL=/bin/csh
	  or
	    HOME=/root: Can't open HOME=/root

	  (The exact error may vary, depending on unknown factors, but
	  it always names an environment variable assignment, and the
	  shell always appears to be trying to open that assignment
	  string as if it were a file.)
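
	  The check in the last two steps can be wrapped in a small helper
	  so it is easy to rerun after each boot.  This is a sketch, not
	  something I used for the report; the function name and the
	  temporary directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the two-line test script from the steps
# above, execute it directly (so the kernel handles the "#!" line),
# and print "ok" or "failed".
check_script_exec() {
    dir="${TMPDIR:-/tmp}/mdcheck.$$"
    mkdir "$dir" || return 2
    printf '#! /bin/sh\ndate\n' > "$dir/xxx"
    chmod +x "$dir/xxx"
    # On an affected boot the script is expected to fail here, with
    # sh printing "NAME=value: Can't open NAME=value".
    if "$dir/xxx" > /dev/null 2>&1; then
        result=ok
    else
        result=failed
    fi
    rm -rf "$dir"
    echo "$result"
}

check_script_exec
```

	  A result of "failed" right after a warm boot from an MD kernel,
	  and "ok" after a power cycle, would match the behavior
	  described here.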

	* To show what's happening more clearly, here are two ktrace
	  outputs from execution of the script above (named "xxx").  The
	  first was taken after booting a regular kernel from a power-off
	  state.  The second was taken after booting that same kernel
	  immediately after running a kernel generated from the same
	  config file plus the MD-support lines shown above.  The first
	  difference is flagged.

[After boot from power-off state:]
   210 ktrace   RET   ktrace 0
   210 ktrace   CALL  execve(0xeffffce7,0xeffffcb8,0xeffffcc0)
   210 ktrace   NAMI  "./xxx"
   210 ktrace   NAMI  "/bin/sh"
   210 sh       EMUL  "netbsd"
   210 sh       RET   execve JUSTRETURN
   210 sh       CALL  getpid
   210 sh       RET   getpid 210/0xd2
   210 sh       CALL  geteuid
   210 sh       RET   geteuid 0
   210 sh       CALL  __sysctl(0xeffffa00,0x2,0x5dd30,0xeffff9fc,0,0)
   210 sh       RET   __sysctl 0
   210 sh       CALL  break(0x5e9ac)
   210 sh       RET   break 0
   210 sh       CALL  break(0x5effc)
   210 sh       RET   break 0
   210 sh       CALL  break(0x5fffc)
   210 sh       RET   break 0
   210 sh       CALL  open(0xeffffce8,0,0x3d)
			   ^^^^^^^^^^
   210 sh       NAMI  "./xxx"
   210 sh       RET   open 3
   210 sh       CALL  fcntl(0x3,0,0xa)
   210 sh       RET   fcntl 10/0xa
   (remainder omitted)

[After booting the same kernel following a run of the MD kernel, without cycling power:]
   237 ktrace   RET   ktrace 0
   237 ktrace   CALL  execve(0xeffffce7,0xeffffcb8,0xeffffcc0)
   237 ktrace   NAMI  "./xxx"
   237 ktrace   NAMI  "/bin/sh"
   237 sh       EMUL  "netbsd"
   237 sh       RET   execve JUSTRETURN
   237 sh       CALL  getpid
   237 sh       RET   getpid 237/0xed
   237 sh       CALL  geteuid
   237 sh       RET   geteuid 0
   237 sh       CALL  __sysctl(0xeffffa00,0x2,0x5dd30,0xeffff9fc,0,0)
   237 sh       RET   __sysctl 0
   237 sh       CALL  break(0x5e9ac)
   237 sh       RET   break 0
   237 sh       CALL  break(0x5effc)
   237 sh       RET   break 0
   237 sh       CALL  break(0x5fffc)
   237 sh       RET   break 0
   237 sh       CALL  open(0xeffffcf9,0,0x3d)
			   ^^^^^^^^^^
   237 sh       NAMI  "SHELL=/bin/csh"
   237 sh       RET   open -1 errno 2 No such file or directory
   237 sh       CALL  break(0x60ffc)
   237 sh       RET   break 0
   237 sh       CALL  write(0x2,0x60000,0x2a)
   237 sh       GIO   fd 2 wrote 42 bytes
       "SHELL=/bin/csh: Can't open SHELL=/bin/csh
       "
   237 sh       RET   write 42/0x2a
   237 sh       CALL  exit(0x2)

>Fix:
	None known.  Cycling power before booting the second kernel is a
	workaround.
>Audit-Trail:
>Unformatted: