Subject: Boot NetBSD CD on Beige G3 OF 2.4 should be possible (after fix)
To: None <port-macppc@netbsd.org>
From: Christian Müller <cmue81@gmx.de>
List: port-macppc
Date: 11/02/2005 16:43:58
Ok,


some of you have described the

*Warning, unexpected short transfer 0/10240*

problem before.  The problem is that OF can read and load ofwboot.xcf 
from an iso9660 cd w/o a problem (having set OF variable boot-device to 
ide1/@0:,\ofwboot.xcf for example), but when ofwboot of the NetBSD 
project is given the kernel to load (via OF variable boot-file, e.g. 
ide1/@0:1,/NETBSD.MACPPC) it will fail, resulting in looped printing of 
the error message above.

I've looked at the code for a while now and think I understand how 
ofwboot.xcf looks for the kernel (a misunderstanding along the way made 
me wrongly post the "Bug in ofdev.c?" question on this list, 
shoot *g*).

The relevant files are (in order of appearance):
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/macppc/stand/ofwboot/boot.c?rev=1.18&content-type=text/x-cvsweb-markup
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/lib/libsa/loadfile.c?rev=1.22.2.3&content-type=text/x-cvsweb-markup
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/lib/libsa/open.c?rev=1.24&content-type=text/x-cvsweb-markup
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/macppc/stand/ofwboot/ofdev.c?rev=1.15&content-type=text/x-cvsweb-markup

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/lib/libsa/ufs.c?rev=1.45&content-type=text/x-cvsweb-markup
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/lib/libsa/ustarfs.c?rev=1.24&content-type=text/x-cvsweb-markup
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/lib/libsa/cd9660.c?rev=1.18&content-type=text/x-cvsweb-markup


Here goes:
* In boot.c//main the function loadfile(kernels[i], marks, LOAD_KERNEL) 
gets called, with kernels[i] being the string you supplied via boot-file 
or a default filled in by ofwboot code.
* In loadfile.c//loadfile the function open(fname, 0) is called to get a 
file descriptor fd, which is then passed to fdloadfile.  fname is the 
same pointer as kernels[i].
* In open.c//open the function devopen(f, fname, &file) is called, which 
is device-specific code again.  fname is still unprocessed (== 
kernels[i]).  devopen uses OF_finddevice and other functions to check 
whether OF knows the device you supplied in your boot-file string.  If 
no device is given, the device is set to bootdev, the device 
ofwboot.xcf was loaded from.
================================================================================================================================================================
ofdev.c//devopen also does the partition handling (together with 
ofdev.c//filename), in a rather obscure way: it expects letters, "a" 
for the first partition, "b" for the second and so on (OF uses 
numbers).  However, using a letter will mangle the file path, which I 
will show with the relevant code given [[ from 
src/sys/arch/macppc/stand/ofwboot/ofdev.c//devopen() ]].  Assume 
boot-file is set to   ide1/ata-disk@0:a/path/to/kernel.  If you use 
ide1/ata-disk@0:a,/path/to/kernel  the filename function will not find 
your partition (see ^^^ below)!!

	cp = filename(fname, &partition);
	if (cp) {
		strcpy(buf, cp);
		*cp = 0;
	}
	if (!cp || !*buf)
		return ENOENT;
	if (!*fname)
		strcpy(fname, bootdev);
	strcpy(opened_name, fname);
	if (partition) {
		cp = opened_name + strlen(opened_name);
		*cp++ = ':';
		*cp++ = partition;
		*cp = 0;
	}
	if (*buf != '/')
		strcat(opened_name, "/");
	strcat(opened_name, buf);
	*file = opened_name + strlen(fname) + 1;

After filename() is done, cp points at the slash after the partition 
letter in "ide1/ata-disk@0:a/path/to/kernel" and partition contains 
'a'.  After the first if block, buf contains "/path/to/kernel" and 
fname contains "ide1/ata-disk@0:a" (the slash was replaced by the 
string terminator 0 with *cp = 0).  After the if (partition) block, 
opened_name contains "ide1/ata-disk@0:a:a"; then buf gets appended, so 
opened_name contains "ide1/ata-disk@0:a:a/path/to/kernel".  
Now it really goes wrong: fname did not get the extra ":a", so 
opened_name+strlen(fname)+1 lets (*file) point to "a/path/to/kernel", 
NOT "/path/to/kernel".
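
To make the string arithmetic above easier to check, here is a small 
stand-alone sketch of it.  It is not the ofwboot code: demo_filename() 
and demo_devopen_path() are hypothetical names of mine, and 
demo_filename() only imitates what filename() returns for the 
"device:letter/path" form (a pointer to the slash plus the partition 
letter).  It does no Open Firmware lookups at all.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for ofdev.c's filename(): handles only the
 * "device:letter/path" form and returns a pointer to the slash that
 * starts the path, storing the partition letter (or 0) in *ppart.
 */
static char *
demo_filename(char *str, char *ppart)
{
	char *colon = strrchr(str, ':');
	char *slash;

	*ppart = 0;
	if (colon == NULL)
		return NULL;
	slash = strchr(colon, '/');
	if (slash == NULL)
		return NULL;
	if (colon[1] >= 'a' && colon[1] <= 'z')
		*ppart = colon[1];
	return slash;
}

/*
 * Mirrors the string handling quoted from devopen().  fname is cut in
 * place, opened_name receives the rebuilt name, and the return value is
 * what devopen() would store in *file.
 */
static const char *
demo_devopen_path(char *fname, char *opened_name, size_t len)
{
	char buf[128];
	char partition;
	char *cp;

	/* Room for fname, the extra ":a" and the terminator. */
	assert(strlen(fname) < sizeof(buf) && strlen(fname) + 4 < len);

	cp = demo_filename(fname, &partition);
	if (cp == NULL)
		return NULL;
	strcpy(buf, cp);		/* buf   = "/path/to/kernel" */
	*cp = 0;			/* fname = "ide1/ata-disk@0:a" */
	strcpy(opened_name, fname);
	if (partition) {
		cp = opened_name + strlen(opened_name);
		*cp++ = ':';
		*cp++ = partition;	/* letter appended a second time */
		*cp = 0;
	}
	if (*buf != '/')
		strcat(opened_name, "/");
	strcat(opened_name, buf);
	/* Skips strlen(fname) + 1 characters, i.e. only the added ':'. */
	return opened_name + strlen(fname) + 1;
}
```

Calling demo_devopen_path() with a writable copy of 
"ide1/ata-disk@0:a/path/to/kernel" and a large enough opened_name 
buffer returns a pointer to "a/path/to/kernel", matching the 
walkthrough.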

If you use numbers instead of letters (ide1/ata-disk@0:5/path/to/kernel) 
this bug won't affect you, but the code later on will always use 
partition zero, regardless of the number you gave, because the 
filename() function does not set partition for a digit:

		} else {
			part = partition ? partition - 'a' : 0;
			ofdev.partoff = label.d_partitions[part].p_offset;
		}


================================================================================
^^^ [[ from src/sys/arch/macppc/stand/ofwboot/ofdev.c//filename() ]]

			if (!strcmp(devtype, "block")) {
				/* search for arguments */
				for (cp = lp;
				    --cp >= str && *cp != '/' && *cp != ':';)
					;
				if (cp >= str && *cp == ':') {
					/* found arguments */
					for (cp = lp;
					    *--cp != ':' && *cp != ',';)
						;
					if (*++cp >= 'a' &&
					    *cp <= 'a' + MAXPARTITIONS)
						*ppart = *cp;
				}
			}
			return lp;

When the code reaches the statement above, lp points at the forward 
slash following the "," in "ide1/@0:a,/path/to/kernel".  After the 
first for-loop cp points at the ":", so the if is true, and the second 
for-loop leaves cp pointing at the ",".  The if that should parse the 
partition letter therefore operates on *++cp, which is the same '/' 
that lp points at, not the letter.  That is unintended.  The 
__solution__ is to delete the second for loop completely and use ":" as 
the only delimiter for the partition:

			if (!strcmp(devtype, "block")) {
				/* search for arguments */
				for (cp = lp;
				    --cp >= str && *cp != '/' && *cp != ':';)
					;
				if (cp >= str && *cp == ':') {
					/* found arguments */
					if (*++cp >= 'a' &&
					    *cp <= 'a' + MAXPARTITIONS)
						*ppart = *cp;
				}
			}
			return lp;
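
The difference between the two scans can be tried outside the boot 
loader with a sketch like the following.  The function names, the 
index-based loops (used instead of the pointer walk so that nothing 
steps before the start of the string) and the value of 
DEMO_MAXPARTITIONS are my assumptions for illustration.  The range 
check on the letter is kept as quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXPARTITIONS 16	/* assumed value, for illustration only */

/* Index of the slash that starts the path (what lp points at above). */
static size_t
demo_path_start(const char *str)
{
	const char *colon = strrchr(str, ':');
	const char *slash = strchr(colon != NULL ? colon : str, '/');

	assert(slash != NULL);
	return (size_t)(slash - str);
}

/* Original logic: after finding ':', rescan back from lp to ':' or ','. */
static char
demo_scan_original(const char *str, size_t lp)
{
	size_t i, j;

	for (i = lp; i > 0 && str[i - 1] != '/' && str[i - 1] != ':'; i--)
		;
	if (i > 0 && str[i - 1] == ':') {
		/* A ':' exists before lp, so this loop stops before index 0. */
		for (j = lp; str[j - 1] != ':' && str[j - 1] != ','; j--)
			;
		if (str[j] >= 'a' && str[j] <= 'a' + DEMO_MAXPARTITIONS)
			return str[j];
	}
	return 0;
}

/* Proposed logic: the character right after ':' is the partition. */
static char
demo_scan_fixed(const char *str, size_t lp)
{
	size_t i;

	for (i = lp; i > 0 && str[i - 1] != '/' && str[i - 1] != ':'; i--)
		;
	if (i > 0 && str[i - 1] == ':' &&
	    str[i] >= 'a' && str[i] <= 'a' + DEMO_MAXPARTITIONS)
		return str[i];
	return 0;
}
```

With "ide1/@0:a,/path/to/kernel" the original scan returns 0 (it looks 
at the '/'), while the fixed scan returns 'a'.  Both return 'a' for 
"ide1/@0:a/path/to/kernel" and 0 for a numeric partition such as 
"ide1/@0:5/path/to/kernel".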

================================================================================
================================================================================================================================================================



* In open.c//open the ofdev.c//devopen() function is done now and did 
the right thing (tm), since we used "ide1/@0:0/path/to/kernel".  Our 
struct open_file f and char *file are properly set up, let's get down to 
the part where the open function does the following:

	besterror = ENOENT;
	for (i = 0; i < nfsys; i++) {
		error = FS_OPEN(&file_system[i])(file, f);
		if (error == 0) {
			f->f_ops = &file_system[i];
			return (fd);
		}
		if (error != EINVAL)
			besterror = error;
	}
	error = besterror;

file_system and nfsys have been set up in ofdev.c//devopen as follows:

		file_system[0] = file_system_ufs;
		file_system[1] = file_system_ustarfs;
		file_system[2] = file_system_cd9660;
		file_system[3] = file_system_hfs;
		nfsys = 4;

So the standalone library's open function tries to be really useful by 
offering the user, who wants to load the kernel from the given device, 
four possible filesystems to read from at this early stage.  
Unfortunately this works only if every single 
FS_OPEN(&file_system[i])(file, f) routine returns.  The cd9660 
filesystem would be tried after ustarfs, but it never gets a chance, 
because src/sys/lib/libsa/ustarfs.c//real_fs_cylinder_read refuses to 
give up on this while loop:

	while(xferrqst > 0) {
#if !defined(LIBSA_NO_TWIDDLE)
		twiddle();
#endif
		for (i = 0; i < 3; ++i) {
			e = DEV_STRATEGY(f->f_dev)(f->f_devdata, F_READ,
			    seek2 / 512, xferrqst, xferbase, &xfercount);
			if (e == 0)
				break;
			printf("@");
		}
		if (e)
			break;
		if (xfercount != xferrqst)
			printf("Warning, unexpected short transfer %d/%d\n",
				(int)xfercount, (int)xferrqst);
		xferrqst -= xfercount;
		xferbase += xfercount;
		seek2    += xfercount;
	}

So basically we try to read 10240 bytes with the ustarfs code from an 
iso9660 filesystem.  The read transfers nothing (xfercount is 0) but 
the strategy call reports no error (e is 0), so neither break is taken 
and xferrqst never shrinks: this loop ends only when you hit the power 
button.  A quick hack might be to try iso9660 before ustarfs (of 
course, if the iso9660 code doesn't return either, you will have the 
same trouble when trying to boot a kernel from ustarfs); a good hack 
should probably make the ustarfs code return from the loop...
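
One possible shape for such a bail-out, as a stand-alone sketch rather 
than a patch: treat a successful read that transfers zero bytes as an 
error instead of looping.  Everything named demo_* is hypothetical, the 
strategy callback is a simplified stand-in for DEV_STRATEGY (it takes a 
byte offset, not a 512-byte block number), and returning EIO is my 
assumption about a reasonable error code, not something taken from 
libsa.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a device strategy routine. */
typedef int (*demo_strategy_t)(void *devdata, size_t offset, size_t want,
    char *dst, size_t *got);

/* Same retry structure as the quoted loop, plus a no-progress check. */
static int
demo_read_all(demo_strategy_t strategy, void *devdata, size_t offset,
    char *dst, size_t want)
{
	size_t got = 0;
	int e, i;

	assert(strategy != NULL && dst != NULL);
	while (want > 0) {
		e = 0;
		for (i = 0; i < 3; ++i) {
			got = 0;
			e = strategy(devdata, offset, want, dst, &got);
			if (e == 0)
				break;
		}
		if (e)
			return e;
		if (got == 0 || got > want)
			return EIO;	/* no progress: give up, don't spin */
		want -= got;
		dst += got;
		offset += got;
	}
	return 0;
}

/* Test device: serves bytes from memory, at most chunk bytes per call. */
struct demo_mem {
	const char *data;
	size_t size;
	size_t chunk;
};

static int
demo_mem_strategy(void *devdata, size_t offset, size_t want, char *dst,
    size_t *got)
{
	struct demo_mem *m = devdata;
	size_t n = offset < m->size ? m->size - offset : 0;

	if (n > want)
		n = want;
	if (n > m->chunk)
		n = m->chunk;
	memcpy(dst, m->data + offset * (n != 0), n);
	*got = n;	/* 0 past the end: "success" with no data */
	return 0;
}
```

A device that keeps reporting success with zero bytes now makes 
demo_read_all() return EIO, so a caller like the open() loop shown 
earlier could move on to the next filesystem; short but non-empty reads 
are still accumulated as before.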


Regards,
Christian