Subject: problems installing netbsd 1.5.1 on a raq
To: None <port-cobalt@netbsd.org>
From: None <marius@alchemy.franken.de>
List: port-cobalt
Date: 08/23/2001 02:47:49
hi

i recently ran into several problems installing netbsd 1.5.1 on a raq(1),
originally equipped with raq2 software
the label on the back of the raq reads:
S/N 1C02AA8500006
R15 C02 NIE
"RaQE"
and the firmware reports that it was built on Mon Aug 24 14:44:00 PDT 1998

i wasn't able to netboot it as described in the faq, when i do a
bfd /netbsd.gz nfsroot=/nfsroot
on the raq, dhcpd just shows a loop of BOOTREQUEST/BOOTREPLY messages:
# dhcpd -f -d
BOOTREQUEST from 00:10:e0:00:18:7f via ep0
BOOTREPLY for 217.5.208.61 to cobalt (00:10:e0:00:18:7f) via ep0
BOOTREQUEST from 00:10:e0:00:18:7f via ep0
BOOTREPLY for 217.5.208.61 to cobalt (00:10:e0:00:18:7f) via ep0
...
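
for anyone who wants to check their own dhcpd output for the same pattern, here is a minimal python sketch that counts BOOTREQUEST lines per mac address. the sample text is the log from above; the function name and the threshold of 2 are my own assumptions for illustration, not anything dhcpd defines:

```python
import re
from collections import Counter

# sample copied from the dhcpd -f -d output above
SAMPLE_LOG = """\
BOOTREQUEST from 00:10:e0:00:18:7f via ep0
BOOTREPLY for 217.5.208.61 to cobalt (00:10:e0:00:18:7f) via ep0
BOOTREQUEST from 00:10:e0:00:18:7f via ep0
BOOTREPLY for 217.5.208.61 to cobalt (00:10:e0:00:18:7f) via ep0
"""

REQUEST_RE = re.compile(
    r"^BOOTREQUEST from ((?:[0-9a-f]{2}:){5}[0-9a-f]{2}) via (\S+)",
    re.MULTILINE,
)


def count_requests(log_text):
    """Return a Counter mapping each client mac to its number of BOOTREQUESTs."""
    return Counter(m.group(1) for m in REQUEST_RE.finditer(log_text))


def looping_clients(log_text, threshold=2):
    """List macs that asked at least `threshold` times (hypothetical cutoff)."""
    counts = count_requests(log_text)
    return sorted(mac for mac, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    for mac in looping_clients(SAMPLE_LOG):
        print("client keeps re-requesting:", mac)
```

a client that got a usable reply should stop asking, so a mac that keeps showing up here is at least a hint that the reply is not being accepted on the client side.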

after some time, the raq gives up
this behaviour corresponds to the following post:
http://ftp.cobaltnet.com/lists/cobalt-developers/msg00253.html

as i am quite certain that my dhcp/nfs setup is correct and as i didn't
find an answer or solution to the mentioned post, i think this is a bug
in this model's firmware version

i succeeded in netbooting this raq by copying the netbsd kernel onto the
first ext2 partition (got it on the raq via ftp under the installed linux)
and invoking a
bfd /boot/netbsd.gz -a
and selecting tlp0 as root device

the second problem was that the raq froze as soon as i tried to access
wd0, e.g. by mounting an ext2fs or calling disklabel wd0
first i thought that something was wrong with the nfs exported /dev but
it turned out that this was caused by the (obviously still unresolved)
dma bug mentioned some time ago on this list

iirc, i got the "pciide0:0:0: lost interrupt" message displayed when i
didn't access wd0 and the raq was idle for a while

to work around it, i netbooted
ftp://ftp.netbsd.org/pub/NetBSD/arch/cobalt/netbsd15-noatadma.gz
and built a 1.5.1 kernel with:
wd*             at pciide? channel ? drive ? flags 0x0ff0
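
as far as i read wd(4), the flags value is three 4-bit fields (pio, dma, udma from the lowest nibble up); a field is only honoured when its top bit is set, and 0xf in the dma or udma field means "disable". a small python sketch of that reading follows (the function name is mine, and the decoding is my interpretation of the man page, not code taken from the kernel):

```python
def decode_wd_flags(flags):
    """Decode wd(4) flags into a setting per transfer class.

    Assumes the layout described in wd(4): nibble 0 = PIO, nibble 1 = DMA,
    nibble 2 = UltraDMA; bit 3 of a nibble enables the setting, the low
    3 bits give the mode, and 0xf means 'disable' for DMA and UltraDMA.
    """
    settings = {}
    for name, shift in (("pio", 0), ("dma", 4), ("udma", 8)):
        nibble = (flags >> shift) & 0xF
        if not nibble & 0x8:
            settings[name] = "default"      # field not set: driver picks
        elif nibble == 0xF and name != "pio":
            settings[name] = "disabled"
        else:
            settings[name] = "mode %d" % (nibble & 0x7)
    return settings


if __name__ == "__main__":
    print(decode_wd_flags(0x0ff0))
```

under that reading, 0x0ff0 leaves pio at the driver default and turns off both dma and udma, which is what the workaround needs.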

while trying to re-partition the disk, i hit the next problem:
# fdisk wd0                                                             
panic: TLB out of universe: ksp was 0xc9145f80                                  
Stopped at      0x8011c238:     lw      v0,392(s2)                              
db> t                                                                           
trap+3c8 (10400002,2462ffe8,d,0) ra 28421000 sz 80                              
PC 0x28421000: not in kernel space                                              
0+28421000 (10400002,2462ffe8,d,0) ra 0 sz 0                                    
User-level: curproc NULL

to finally get netbsd onto the raq, i put the disk into a pc, deleted the
4 ext2 partitions, added one 16mb ext2 partition for the netbsd kernel and
did a mke2fs with a linux rescue disk (mke2fs refused to work under freebsd)
afterwards i put the disk back into the raq, netbooted and added the netbsd
labels with disklabel -e -I wd0
disklabel wd0 now looks like:
8 partitions:                                                                   
#        size   offset     fstype   [fsize bsize cpg/sgs]                       
  a:   262144    32130     4.2BSD     1024  8192    16   # (Cyl.   31*- 291*)   
  b:   524445 19519635       swap                        # (Cyl. 19364*- 19884) 
  c: 20011950    32130     unused        0     0         # (Cyl.   31*- 19884)  
  d: 20044080        0     unused        0     0         # (Cyl.    0 - 19884)  
  e:    32067       63 Linux Ext2        0     0         # (Cyl.    0*- 31*)    
  f:  2097152   294274     4.2BSD     1024  8192    16   # (Cyl.  291*- 2372*)  
  g: 12933905  2391426     4.2BSD     1024  8192    16   # (Cyl. 2372*- 15203*) 
  h:  4194304 15325331     4.2BSD     1024  8192    16   # (Cyl. 15203*- 19364*
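
as a sanity check on the label above, a short python sketch that verifies each partition starts exactly where the previous one ends, and converts the ext2 partition size to mib. the sizes and offsets are copied from the disklabel output; the 512-byte sector size and the helper names are assumptions on my part:

```python
SECTOR_BYTES = 512  # assumption: standard 512-byte sectors

# (letter, size, offset) in sectors, copied from the disklabel output above
PARTS = [
    ("a", 262144, 32130),
    ("b", 524445, 19519635),
    ("c", 20011950, 32130),
    ("d", 20044080, 0),
    ("e", 32067, 63),
    ("f", 2097152, 294274),
    ("g", 12933905, 2391426),
    ("h", 4194304, 15325331),
]


def find_gaps(parts, skip=("c", "d")):
    """Return (prev, next, delta) for neighbours that do not abut exactly.

    delta > 0 is a gap in sectors, delta < 0 an overlap. The 'c' and 'd'
    entries are skipped because they span other partitions by design.
    """
    real = sorted((p for p in parts if p[0] not in skip), key=lambda p: p[2])
    problems = []
    for (n1, s1, o1), (n2, _s2, o2) in zip(real, real[1:]):
        delta = o2 - (o1 + s1)
        if delta != 0:
            problems.append((n1, n2, delta))
    return problems


def size_mib(parts, letter):
    """Size of one partition in MiB."""
    size = next(s for n, s, _o in parts if n == letter)
    return size * SECTOR_BYTES / (1024 * 1024)


if __name__ == "__main__":
    print("gaps/overlaps:", find_gaps(PARTS))
    print("e is %.2f MiB" % size_mib(PARTS, "e"))
```

with these numbers the partitions e, a, f, g, h, b follow each other without gaps or overlaps, and b ends at the last sector covered by d.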

i guess disklabel has added a second partition to the disk to store the
labels in; at least i didn't add one. i can't look at the fdisk output as it
still causes a panic and i'm too lazy to put the disk back into a pc to
see if disklabel added a partition

during the rest of the installation process i got an unclean /var filesystem
(wd0f and only this one!) 2 or 3 times at boot time, even though all
filesystems had been mounted before and the shutdown was _clean_
i think this is triggered by going back and forth between rc_configured=NO/YES
and/or by nfsbooting and mounting the labels on the disk afterwards
i wasn't able to reproduce this at will and it didn't occur after
finishing the installation and configuration


this post is mainly for information in order to prevent others from spending
hours searching for errors in their dhcp/nfs setup as i did

looking at the dma bug, the panic caused by fdisk and other fundamental things
that should work but actually don't, like Shaun Jurrens' tlp0 problem, i
think the netbsd/cobalt port should still be in 'experimental' state rather
than 'stable'

marius