Subject: Re: hp700 boot-from-disk troubles
To: Jochen Kunz <jkunz@unixag-kl.fh-kl.de>
From: Chuck Silvers <chuq@chuq.com>
List: port-hp700
Date: 05/17/2005 22:30:56

On Tue, May 17, 2005 at 11:19:18AM +0200, Jochen Kunz wrote:
> On Mon, 16 May 2005 07:50:27 -0700
> Chuck Silvers <chuq@chuq.com> wrote:
> 
> > I've found more problems with booting from disk on hp700.  it looks
> > like the same problem that we originally saw on the older machines is
> > still there on the newer machines, namely that in some kernels, the
> > scsipi_xfer pool gets corrupted.
> I am slowly starting to believe that the SCSI / DMA problems are only a
> symptom and the actual problem is much deeper. The problems detecting the
> FPU may be another symptom of this deep problem. I got the "fabricating a
> geometry" message, typical for the disk boot problem, even when net
> booted:
> sd0 at scsibus0 target 2 lun 0: <SEAGATE, ST34572WS, HP00> disk fixed
> sd0: 4095 MB, 6300 cyl, 8 head, 166 sec, 512 bytes/sect x 8388314 sectors
> sd0: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
> sd0: fabricating a geometry
> boot device: tlp0
> root on tlp0

yes, I see this behaviour too.  it's the same type of corruption, where
some memory is zeroed.  in the case I described, what gets zeroed is the
scsipi_xfer "cmdstore"; in the "fabricating a geometry" case it's the
sense data on the stack.


> It adds a new view to the problem if the real problem causes the FPU
> detection to fail. The FPU detection is done very early in startup, even
> before the kernel goes virtual.

it's unclear whether this is related to the memory corruption.
in the boot-from-net root-on-disk case, the corruptions also occur
but PDC_COPROC succeeds.


> Regarding your suspicion that xxboot causes the problem, because
> xxboot is not involved when netbooting, I have an idea for booting from
> disk without xxboot. This would be a good test for CDROM booting too but
> my time is tight during the next two days.

I actually tried this with the install image that I posted about before,
but I didn't pay attention to what happened with PDC_COPROC.
I'll let you take a turn on this next.  :-)

-Chuck