Subject: Re: New IDE controller.
To: Richard Rauch <rauch@rice.edu>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-i386
Date: 02/21/2002 19:58:23
On Wed, Feb 20, 2002 at 08:51:21PM -0600, Richard Rauch wrote:
> On Wed, 20 Feb 2002, Richard Rauch wrote:
> 
> > Some time ago, I was reporting terrible performance problems with my IDE
> > hard drive under the on-board VIA IDE controller.  This afternoon, I
> > picked up a $20 controller from SIIG.  Under NetBSD dmesg, it probes as:
>  [...]
> > I've tried turning off all of the PCI and IDE frills in my BIOS, but I
> > can't get it to behave any better.  I'm going to go try running it under
> > GNU/LINUX and will post a followup.  In the meantime...
> 
> GNU/LINUX panics (apparently in a reliable spot; I had it panic twice when
> trying to mount ``vfs'', I believe; I didn't write down the details).
> 
> Further playing revealed this:
> 
> If I boot from a NetBSD CD, I can access the partitions.  Sort of.  When
> doing a large dd read test, NetBSD gave me a lost interrupt error and
> seemed to freeze.
> 
> Feeling experimental, I used the CD's bootloader to reboot, but told it to
> boot wd0a:netbsd.  This got me a ways into the boot process until (around
> the time it started sshd, I believe) it gave me another lost interrupt.
> It seemed to die a hard death, so I recorded the interrupt, killed the
> power, unplugged it, swapped the cable back to the built-in IDE, and
> brought the system back up.  (It took rather a long time to finish fsck on
> the /usr partition, but it's all happy now, as far as I can tell.)
> 
> Anyway, here's the lost interrupt information, if it helps any:
> 
>  /~~~ kernel message
> 
> pciide1:0:0: lost interrupt
>         type: ata tc_bcount 8192 tc_skip: 0
> pciide1:0:0: bus_master DMA error: missing interrupt, status 0x21
> 
>  \___ kernel message
> 
> (a) Would NetBSD have had any chance to recover if I let it?  It didn't
> actually *say* that it was panicking, but it also seemed pretty dead.

If it's the problem I'm thinking of, it really is dead, because the
motherboard is locked up at the hardware level (I suspect the PCI bus is
locked and blocks RAM access from the CPU).

> 
> (b) Are there known problems with the ``Triones/Highpoint HPT366/370 IDE
> Controller''?  Bear in mind that the NetBSD bootloader seemed to die with
> it, too, before actually loading the kernel.  Bad hardware?  Or could
> updating the bootloader and booting a -current kernel fix the problems?
> (Also bear in mind that the bootloader *worked* when loaded from a 1.5.2
> bootable CD through the old controller; the hard disk is really using the
> NetBSD boot-selector, so it may not be using the same software to boot
> the kernel.)

Some time ago I found a bug in the interrupt handling of the HPT. It has
been fixed in -current and 1.5.3_ALPHA for at least a week (I don't remember
the exact date). You can fix it by using a recent 1.5.3_ALPHA kernel,
or work around it by using only one channel of the HPT and making sure it
doesn't share its interrupt with another device.

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--