port-atari: Re: NetBSD 1.1B (BOOTX) error report (was Re: help! Working falcon kernel would be nice...)

Subject: Re: NetBSD 1.1B (BOOTX) error report (was Re: help! Working falcon kernel would be nice...)
To: None <goettsch@informatik.tu-muenchen.de>
From: Leo Weppelman <leo@waubel.ahwau.ahold.nl>
List: port-atari
Date: 06/18/1996 23:33:19
Hello Helmar,

> > > then the system hangs irretrievably. In addition to this it now
> > This is _bad_. If I am right, you are the fourth one complaining about
> > this particular problem...
> 
> Now you got the fifth... :-(
> I can agree Dan fully. I got exactly the same problems the last days:
> the original BOOTX-kernel crashed all some minutes, especially when I 
> perform large data transfer over the SCSI-bus (I'll report the commands
> below in detail). Like Dan I wasn't even able only to save my /usr or 
> /home partitions on another msdos-disk. So working with NetBSD makes no 
> sense, but spends many senseless hours without fun :-(. A  
As you are definitely not the first one who has lost some files :-(, I think
the following should be made clear to everyone:

Installing a new kernel is *never* without risk. It doesn't matter if it
is an alpha/beta or 'official' release kernel. This is a fact of (software)
life. It just means that *before* installing a new kernel, you should either
make a backup or be prepared for data loss.
I always try to make reasonably sure that a kernel is safe, of course. However,
due to differences in hardware configurations, of which the current Falcon
SCSI problems are a rather striking example :-(, there is always *some*
risk involved.
> 
> 
> Now the error-reports:
> ----------------------
> 
> 
> First a very little thing:
> 	the kernel read the wrong time (NOT date)
> 	from the clock chip of my Falcon
> 
> According to the time, when I boot GEMDOS the time in NetBSD1.1B is 
> about 7 hours earlier. So the last 'date' command delivers
> 	Jun 17 13:04:18
> and after that I reboot in MULTITOS and the time there is
> 	20:15 , 17.6.96
> Strange ?
Indeed, please check if the symlink /etc/localtime points to
/usr/share/zoneinfo/MET or CET (whatever you prefer). If it does, this
must be looked into further.
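
The symlink check above can also be scripted. The following is only a minimal sketch: the helper name `localtime_target` is made up for illustration, and it assumes the zone files live under a directory called `zoneinfo`, as in the path mentioned above.

```python
import os


def localtime_target(path="/etc/localtime"):
    """Return the zone name the symlink points at, or None if path is not a symlink.

    Hypothetical helper: it only inspects the link text and does not
    verify that the zone file actually exists.
    """
    if not os.path.islink(path):
        return None
    target = os.readlink(path)
    marker = "zoneinfo/"
    idx = target.find(marker)
    # Strip everything up to and including "zoneinfo/" when present.
    return target[idx + len(marker):] if idx != -1 else target


if __name__ == "__main__":
    # Prints the zone name (for example MET or CET), or None when
    # /etc/localtime is missing or is a regular file.
    print(localtime_target())
```

If this prints None, or a zone other than the one you expect, the time offset is most likely a configuration matter rather than a clock-chip problem.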
> 
> 
> Next the boring messages of 
> 	"St-mem pool exhausted, binpatch 'st_pool_size'to get more"
> 
> It wrecks the screen when it appears hundreds of times on commands like
> 'newfs' or 'fsck', and it seems to slow the system down.
> You wrote a short explanation for the low value of _st_pool_size,
> >> (cite from your FAQ)
> The default (BOOT) kernel supplied has a rather small St-mem pool. This
> is done to enable it to boot in systems having only 4Mb of ram. In fact
> it has a pool just big enough for 2 virtual consoles in ST-mode.
> << (cite end)
> But if it is so, why has the delivered BOOTX kernel 3 (*three*) virtual
> consoles, so this message MUST appear on any machines with any video
> configurations ? (We have discussed that already one year ago, 
> but anyway ... ;-)
There are a few reasons/tradeoffs:
   - The default value is almost *never* correct. It just can't be
     because there are so many different choices for the video mode.
     Therefore I decided for maintainability reasons to keep the st-pool
     size out of the config files.
   - The default just suffices for the BOOT kernel. This is going to be
     the kernel supplied on the boot-floppy. The BOOTX kernel has come
     to life to satisfy the needs of a large group of people that don't
     have enough resources to compile a kernel of their own. The config
     of BOOTX was based roughly on the requests I got for pre-compiled
     kernels.
   - As the BOOTX kernel is meant to be used by people who have an installed
     system *and* binpatch is going to be a part of the official 1.2
     distribution, I thought tuning the pool size could be left to the user.
> 
> So I had to follow your instructions in your FAQ (no man-page of
> 'binpatch' found on my system!) and issued the command
> 	binpatch -s _st_pool_size -o 8192 -r NNN <path to kernel>
> with NNN = 3871296 
> where I compute NNN from
> 	video-size 	= (1024x768x256col) x 3 ite's	= 2359296 byte
I think the FAQ must be wrong here... I think the correct calculation
should be: (width/8) * height * (color bytes/dot). This becomes:
   (1024/8)*768*1 = 98304 bytes per ite, so 3 ite's come to about
   320Kb (including slack).
Does anyone have other ideas?
> 			   + 8 kB slack x 3 ite's	=   24000 byte
> 	bounce-buffer	=  16kB x 3 SCSI devices	=   48000 byte
Yeah, this is correct. However, I discovered a bug in the bouncing code that
might occasionally acquire a larger buffer (+/- 64Kb). :-(
> 	floppy		=  18kB x 80 tracks (???)	= 1440000 byte
Only 1 single sector is buffered at most == 512 bytes.
> 							--------------
> 	SUM						= 3871296 byte
This yields about 350Kb total.
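
To make the arithmetic easier to redo for other video modes, here is a rough sketch of the estimate. It uses the tentative (width/8) * height * bytes formula from this message, which has not been confirmed, together with the slack and bounce-buffer figures quoted from the FAQ. All function and constant names are invented for illustration and do not come from the kernel sources.

```python
# Rough St-mem pool estimate using the figures discussed in this thread.
# Every name below is illustrative only (an assumption, not kernel code).

SLACK_PER_ITE = 8 * 1024          # 8 kB slack per virtual console (ite)
BOUNCE_PER_SCSI_DEV = 16 * 1024   # 16 kB bounce buffer per SCSI device
FLOPPY_BUFFER = 512               # at most one sector is buffered


def screen_bytes(width, height, bytes_per_dot=1):
    """Screen memory for one ite: (width/8) * height * bytes_per_dot."""
    return (width // 8) * height * bytes_per_dot


def st_pool_estimate(width, height, ites, scsi_devices, bytes_per_dot=1):
    """Sum the video, bounce-buffer and floppy contributions in bytes."""
    video = ites * (screen_bytes(width, height, bytes_per_dot) + SLACK_PER_ITE)
    bounce = scsi_devices * BOUNCE_PER_SCSI_DEV
    return video + bounce + FLOPPY_BUFFER


if __name__ == "__main__":
    total = st_pool_estimate(1024, 768, ites=3, scsi_devices=3)
    print(total, "bytes, about", total // 1024, "Kb")
```

For 1024x768 with 3 ite's and 3 SCSI devices this comes to 369152 bytes, which is in the same range as the figure above and roughly a tenth of the 3871296 that was binpatched in.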
> Is this correct ? Or is the value too big ?? 
> (Then, IMHO you have to mention the most upper value for NNN in this FAQ.)
> But anyway as I booted from this binpatched kernel I only got some lines
> of pixel-dust ... :-(
The upper limit is not easily determined, but the kernel needs at least room
for itself (if you have no TT ram, it must reside in ST-ram), buffer-cache
and program work-space.
> So I had to change to the original kernel and to bear this messages ... ;)
> 
> 
> 3) GEMDOS-partitions > 300 MB  don't work on NetBSD 1.1B
> 	==> panic: allocbuf: buffer larger than MAXBUFSIZE
> 	    when write access on this partition
> Is this so ? Is this well-known ? Why does nobody write this in the FAQ
> or another suitable place ?? It cost me two days of searching and musing
> and I nearly reinstalled the system and destroyed my NetBSD-partitions
> as I thought the netbsd partitions were already destroyed. But then I found
> an entry in my logbook about the same error I got last year ... ;)
I need to look this up. I think the cluster sizes gemdos uses for filesystems
of this size won't fit into the buffer cache. It looks configurable however.
> 
> 
> 4) the horrible instable kernel on large data transfers
> 
> All I want to do is to save my installed system (/ /usr and /home part.)
> on another disk.
> I tried this on the following two configurations:

[ ..... ]
> 
> The following command always and reproducibly delivers kernel crashes
> like 'jump to zero' or 'MMU fault' or sometimes just a kernel hang with
> no flashing lights on both disks:
> 	cd /usr; tar -cvf /tos3/netbsd-u.tar *
Dan also had the same fault, I'm trying to sort it out. Talking of Dan ;-),
he tried a kernel not using DMA interrupts on his Falcon and this at least
solves the SCSI problems...
> 
> Config B:  one SCSI disk and one removable media (ZIP)
                                    ^^^^^^^^^^^^^^^^^^^
This is a known problem. It has also been seen on the mac68k (which uses
the same SCSI-driver). Somehow large requests fail on these drives while
small transfers have no problems. I'm looking into it. Unfortunately
I don't have a ZIP drive myself.

> May 28 20:22:29 falheg /netbsd: ncrscsi0 : 
> 	Trying to reach Message-out phase, now: 0
> May 28 20:22:29 falheg /netbsd: ncrscsi0 : 
> 	Resetting SCSI-bus (Failure to abort command)
> May 28 20:22:29 falheg /netbsd: sd1(ncrscsi0:6:0): unit attention, 
> 	data = 00 10 01 05 d2 00 00 00 00 00 00 00 00 02 00 01 86
> 
> These error-messages always appear when I write "greater" data 
> (= more than one file, or files > about 1 KB) to the ZIP.
Exactly....


[ ..... ]
> some things in this case I'm ready for this -- but ONLY if someone
> guarantees me at last a stable kernel, that the work which spends also
> some time is not in vain.
As I already mentioned, a 'guaranteed' stable kernel is nearly impossible.
Unfortunately, there are always some people in trouble when these kinds of
things happen. This is nearly unavoidable. The damage can be minimised
(at least for the group) by posting these problems to the list ASAP.

> 
> So this completes my reports. I know this sounds negative.
> I hope you understand that I'm angry, because I *want* to work with NetBSD
> on my Falcon (to install good programs on Unix, there exist many 
> possibilities, it seemed an interesting hobby for me)
> BUT IT MUST WORK.
I know how it feels to have your filesystems scrambled.... It happened to me
quite a few times. If however nobody dares to try these kernels, there will
never be a stable kernel...
> Dan wrote:
> > > All I want is a stable kernel suitable for a basic 4Mb Falcon, no FPU.
> MY WORDS (but with FPU, 16 MB Falcon :-)
> 
> Unfortunately our anonymous ftp-server on my working-group is yet not
> installed. but I hope in the next days we get it again. So mail
> me if you want to put kernels (like before one year ago, you remember)
> on it.
I might be needing this offer for the ZIP problems...

Leo.