Subject: Re: Re: Problems booting Blade150
To: Joel CARNAT <joel@carnat.net>
From: None <franceschini_a@tiscali.it>
List: port-sparc64
Date: 06/10/2005 15:42:55
On Fri, Jun 10, 2005 at 10:42:46AM +0200, Joel CARNAT wrote:
> On Fri, Jun 10 2005 - 10:22, Martin Husemann wrote:
> > On Fri, Jun 10, 2005 at 09:23:49AM +0200, franceschini_a@tiscali.it wrote:
> > 
> > > 	I tried many different versions up to 2.0.2.
> > 
> > Your problem sounds like interrupt issues - could you try a 3.0_BETA?
> > There should be builds available on ftp.netbsd.org and its mirrors,
> > in /pub/NetBSD-daily/netbsd-3 IIUC, but there is not much there yet.
> > 
> 
>  looks like there is no sparc64 snap yet :(
>  franceschini_a, you can download an iso I used to install my Ultra5 (May, 10th).
>  I didn't compile X but it should be enough to check if 3.0_BETA is OK for your SunBlade.
>  http://tumfatig.net/NetBSD-3.0_BETA.iso (84Mo)

Hi Joel,

	I tried your kernel (thanks) but the problem is still the same :(

	This is what happens when I try netbooting:

---------------
The usual lost-interrupt messages from the IDE controller...
---------------
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 512 tc_skip: 0
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 512 tc_skip: 0
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0
wd0: drive supports PIO mode 4aceride0:0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0
, DMA mode 2aceride0:0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0
, Ultra-DMA mode 5 (Ultra/100)aceride0:0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0

wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
cd0(aceride0:0:1): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 512 tc_skip: 0
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 512 tc_skip: 0
aceride0:0:0: lost interrupt
        type: ata tc_bcount: 0 tc_skip: 0
--------------
Now it's trying to get an IP address
--------------
root on gem0
nfs_boot: trying DHCP/BOOTP
gem0: device timeout
nfs_boot: timeout...
gem0: device timeout
nfs_boot: timeout...
gem0: device timeout
nfs_boot: timeout...
gem0: device timeout

---------
And this is what happens on the dhcpd side
--------
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
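
To make the pattern in this log explicit, here is a minimal Python sketch that tallies DHCP message types per client MAC address. The names `tally_dhcp` and `SAMPLE` are illustrative only, and the parser assumes the dhcpd log format shown above. A client that accepted an offer would normally go on to send a DHCPREQUEST and receive a DHCPACK; in this log only DISCOVER/OFFER pairs appear.

```python
import re
from collections import Counter

# Matches the dhcpd log lines shown above, e.g.
#   DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
#   DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
LINE_RE = re.compile(
    r"^(DHCP[A-Z]+)\b.*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b", re.IGNORECASE
)


def tally_dhcp(log_text):
    """Count DHCP message types per MAC address in a dhcpd log excerpt."""
    counts = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            msg, mac = m.group(1).upper(), m.group(2).lower()
            counts.setdefault(mac, Counter())[msg] += 1
    return counts


# Short excerpt of the log above (hypothetical variable, for illustration).
SAMPLE = """\
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
DHCPDISCOVER from 00:03:ba:68:5a:05 via bge0
DHCPOFFER on 192.168.144.250 to 00:03:ba:68:5a:05 via bge0
"""

if __name__ == "__main__":
    for mac, c in tally_dhcp(SAMPLE).items():
        # A completed handshake needs at least one REQUEST and one ACK.
        done = c["DHCPREQUEST"] > 0 and c["DHCPACK"] > 0
        print(mac, dict(c), "handshake completed:", done)
```

Run against the full log, this would show that the Blade never gets past the DISCOVER stage, which fits the idea that the offers are sent but never processed by the client.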
---
And this is what 'tcpdump ether host 00:03:ba:68:5a:05' tells us.
As you can see, DHCP is responding to the requests, but ...
---
15:15:56.220689 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from netbsd1, length: 548
15:15:56.227416 IP nfsserver.bootps > netbsd1.bootpc: BOOTP/DHCP, Reply, length: 300
15:15:57.221150 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from netbsd1, length: 548
15:15:57.221355 IP nfsserver.bootps > netbsd1.bootpc: BOOTP/DHCP, Reply, length: 300
15:15:59.222230 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from netbsd1, length: 548
15:15:59.222437 IP nfsserver.bootps > netbsd1.bootpc: BOOTP/DHCP, Reply, length: 300

-----
... and now it is trying RARP/Bootparam
------
rp who-is netbsd1 tell netbsd1
15:16:30.548778 rarp reply netbsd1 at netbsd1
15:16:31.048744 rarp who-is netbsd1 tell netbsd1
15:16:31.048970 rarp reply netbsd1 at netbsd1
15:16:31.548964 rarp who-is netbsd1 tell netbsd1
...
...

It's as if the NetBSD kernel ignores the replies to its requests...

Note that the kernel itself is netbooted correctly via RARP/TFTP, so this is neither a network
problem nor a RARP configuration problem.
In fact, as you can see, when the first-stage boot process (OpenBoot) tries
to netboot, everything works fine:

15:22:54.835928 rarp who-is netbsd1 tell netbsd1
15:22:54.836185 rarp reply netbsd1 at netbsd1
15:22:54.837764 IP netbsd1.32768 > 255.255.255.255.tftp:  17 RRQ "C0A890FA" octet 
15:22:54.840892 IP nfsserver.64181 > netbsd1.32768: UDP, length: 516
15:22:54.842632 IP netbsd1.32768 > nfsserver.64181: UDP, length: 4
15:22:54.842659 IP nfsserver.64181 > netbsd1.32768: UDP, length: 516
15:22:54.843150 IP netbsd1.32768 > nfsserver.64181: UDP, length: 4
...
..
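
As a sanity check on the trace above: the file name in the TFTP read request is simply the client's IPv4 address written as eight hexadecimal digits, so "C0A890FA" should correspond to the address the DHCP server keeps offering. A small Python sketch of that conversion follows; the function names are made up for illustration.

```python
import ipaddress


def tftp_boot_filename(ip):
    """Return an IPv4 address as 8 uppercase hex digits (e.g. for a TFTP RRQ)."""
    return "%08X" % int(ipaddress.IPv4Address(ip))


def ip_from_boot_filename(name):
    """Inverse: turn an 8-hex-digit file name back into a dotted-quad address."""
    return str(ipaddress.IPv4Address(int(name, 16)))


if __name__ == "__main__":
    print(tftp_boot_filename("192.168.144.250"))  # C0A890FA
    print(ip_from_boot_filename("C0A890FA"))      # 192.168.144.250
```

So the RARP reply received by OpenBoot carried 192.168.144.250, the same address that appears in the DHCPOFFER lines.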

And when ofwboot starts, it loads the kernel via NFS without problems (so NFS must be
configured correctly).

At the end of the process I get:

nfs_boot: trying RARP (and RPC/bootparam)
revarp failed, error=51
no file system for gem0
cannot mount root, error = 79
root device (default gem0): 
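
For reference, the two error numbers in the messages above can be decoded with a small lookup. This is only a sketch: it assumes the standard BSD numbering from sys/errno.h, and the table is hard-coded because Python's own `errno` module reflects the numbering of the host it runs on, which may differ. The names `BSD_ERRNO` and `describe` are illustrative.

```python
# Subset of the BSD errno table (sys/errno.h numbering, assumed to apply here).
BSD_ERRNO = {
    51: ("ENETUNREACH", "Network is unreachable"),
    60: ("ETIMEDOUT", "Operation timed out"),
    79: ("EFTYPE", "Inappropriate file type or format"),
}


def describe(code):
    """Return 'NAME (description)' for a known code, or a placeholder."""
    name, text = BSD_ERRNO.get(code, ("?", "not in this table"))
    return "%d: %s (%s)" % (code, name, text)


if __name__ == "__main__":
    for code in (51, 79):
        print(describe(code))
```

Under that assumption, error=51 from revarp is ENETUNREACH and error = 79 from the root mount is EFTYPE.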

...
Any ideas?

Thanks.