Subject: (Still) NCR card + Quantum VP32210 SCSI Drive probs
To: None <current-users@NetBSD.ORG>
From: Davyd Norris <root@eolas.fcollins.com.au>
List: current-users
Date: 08/21/1995 11:15:03
Howdy again,

I am still having problems getting a new kernel going with the
latest sup of -current.  Here is the guts of my previous message to
the list:

>Just supped the latest sources and attempted to rebuild a kernel for
>my system and I am getting errors at boot time on the SCSI to disk
>transfers.

>I built a kernel about a week ago and booted from it with no
>problems, but ended up scrapping that kernel and trying to get MH up
>and running, so I did not build the rest of the system.

I am currently running a July 3rd build of -current, which was about
the time that pppd broke, and so I commented out any ppp support in
the kernel.  Now my people are yelling for it, so I need it back.  I
also want to get my 'standard' lpt0 (hardware thinks it is at lpt1's
port!) up and running.

The last kernel that I got running was an X insecure kernel
(around the first week of August - just before the SCSI quirks code
got added), which I do not want to run now that I have configured the
X portal code.

>System is as follows:
>
>Pentium 90 PCI bus - 32Mbyte RAM
>NCR PCI SCSI card (53c810 based PCI-SC200 card)
>Quantum VP32210 Fast SCSI-II 2.02 Gbyte HDD

>That's about all you need to know for this little problem.

>The system probes all go OK, and identify all the drives, including
>the Quantum and an external NEC 4Xe CD-ROM.  However, it appears
>that when the disk is probed for sizes etc., it goes berserk with
>the following messages:

>extraneous data discarded
>COMMAND FAILED (9 0) ncr(0:0:0)
>.
>.
>.
>(repeated so rapidly that I don't know whether there is any other
>message at the start)

>It appears to be receiving the XE_EXTRA_DATA error and barfing, and
>does not write anything to the logs - this is why I can't be sure of
>what is said.

>As I said before, I don't think it is a problem with the PCI card,
>because it inits OK and finds all its devices.  It seems to be when
>it talks to the drive.

New developments:

I supped the latest sources again because I read about some changes
in the ncr and pci code, and tried once more, this time with in-kernel
debugging turned on.  This allowed me to see what was happening before
the crash.

The system correctly identifies all devices and gets their sizes and
types, including the Quantum HDD.  It gets to the line:

biomask 4040 netmask 1800 ttymask 1a

and then:

vm_fault(f86f0700,0,1,0) -> 1

kernel: page fault trap, code = 0
Stopped at _nqsrv_getlease+0x148: movl 0(%eax),%ebx

If kernel debugging is turned off I get messages as previously
detailed.

I am booting the new kernel on a system that is otherwise a full
July 3rd build.  Could the problem have anything to do with the
old/new mount code and running a new kernel on old binaries?  Does
anyone have any idea where I should start looking?

I have not done much kernel debugging, so I am at a bit of a loss as
to where to start.

AdvThanksAnce

Dave.