Subject: Re: scsi disk generic HBA error after reboot
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Dan LaBell <dan4l-nospam@verizon.net>
List: netbsd-help
Date: 04/16/2005 06:59:48
On Apr 15, 2005, at 12:44 PM, Manuel Bouyer wrote:

> On Fri, Apr 15, 2005 at 04:24:34AM -0400, Dan LaBell wrote:
>> I posted about this before with the subject trm0: parity error in 2.0
>> after reboot.
>>
>> I have 2 SCSI drives that I plan to combine with ccd.  These drives are
>> IBM Ultrastar 18XPs; they're 80-pin, and I'm using two 80->50-pin
>> converters to use them with my Tekram DC395.  I had no problems in 1.6,
>> but with 2.0 I find it works OK on initial power-up, but gives me a
>> generic HBA error and "trm0: parity error" on subsequent boots.
>> I now have both drives installed (I was waiting on a rail kit; they're
>> too tall to fit in my 3-1/2in bays), and started playing with
>> jumper settings, hoping maybe I could stumble on something that worked.
>> Besides finding combos where it wouldn't work at all in 2.0 (instead of
>> just on the 2nd boot), I noticed some differences in the dmesg output:
>> less drive info on the first boot, and on the 2nd boot more info,
>> "sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers".  I don't know
>> that much about SCSI; some explanation of what the sync line means
>> might help.  I'd like to be able to jumper my way around this, and
>> there are 2 jumpers related to sync, SP sync and Dis Ti Sy.
>> Also, can 50-pin do 16-bit transfers?
>
> No, and that is probably your problem. For some reason, after a warm boot
> the driver thinks the adapter is an Ultra-wide, and negotiates 16-bit
> transfers with the drive. This won't work across a 50-pin cable, as only
> 8 of the 16 data pins are wired.
Ok, I didn't think so, but trusted the kernel more than myself.

>
> OK, I see the problem. The driver is ignoring the card's model or EEPROM
> setting for negotiating sync/wide, because the checks are not in the
> proper place.
> Please try the (untested) attached patch
Cool, thanks.   AND it works!

%dmesg | grep 'trm0\|sd[12]'
trm0 at pci0 dev 11 function 0: Tekram DC395U, DC315/U (TRM-S1040) Fast20 Ultra SCSI Adapter
trm0: interrupting at irq 5
scsibus0 at trm0: 8 targets, 8 luns per target
sd1 at scsibus0 target 0 lun 0: <IBM, DXHS18Y, 0430> disk fixed
sd1: 17366 MB, 8154 cyl, 20 head, 218 sec, 512 bytes/sect x 35566480 sectors
sd1: sync (50.00ns offset 15), 8-bit (20.000MB/s) transfers
sd2 at scsibus0 target 1 lun 0: <IBM, DXHS18Y, 0430> disk fixed
sd2: 17366 MB, 8154 cyl, 20 head, 218 sec, 512 bytes/sect x 35566480 sectors
sd2: sync (50.00ns offset 15), 8-bit (20.000MB/s) transfers
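
As a rough sanity check on what the sync line says (a sketch only, not
anything taken from the trm driver): assuming the period is the time per
transfer and the width is the number of data bits moved per transfer,
50 ns works out to 20 million transfers per second, so 8-bit gives
20 MB/s and 16-bit gives 40 MB/s.  The offset is the REQ/ACK offset and
does not enter into the rate.  The names parse_sync_line and
expected_rate_mb_s are made up for illustration.

```python
import re

# Matches the transfer line printed for each disk, e.g.
# "sd1: sync (50.00ns offset 15), 8-bit (20.000MB/s) transfers"
SYNC_RE = re.compile(
    r"^(?P<dev>\w+): sync \((?P<period_ns>[\d.]+)ns offset (?P<offset>\d+)\), "
    r"(?P<width>\d+)-bit \((?P<rate>[\d.]+)MB/s\) transfers$"
)


def parse_sync_line(line):
    """Return the fields of a dmesg sync line as a dict, or None if no match."""
    m = SYNC_RE.match(line.strip())
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "period_ns": float(m.group("period_ns")),
        "offset": int(m.group("offset")),
        "width_bits": int(m.group("width")),
        "rate_mb_s": float(m.group("rate")),
    }


def expected_rate_mb_s(period_ns, width_bits):
    """Assumed model: one transfer of width_bits/8 bytes per sync period."""
    transfers_per_us = 1000.0 / period_ns  # millions of transfers per second
    return transfers_per_us * (width_bits / 8)


if __name__ == "__main__":
    lines = [
        "sd1: sync (50.00ns offset 15), 8-bit (20.000MB/s) transfers",
        "sd1: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers",
    ]
    for line in lines:
        info = parse_sync_line(line)
        rate = expected_rate_mb_s(info["period_ns"], info["width_bits"])
        # Flag wide (16-bit) negotiation, which a 50-pin cable cannot carry.
        note = "wide, needs a 68/80-pin path" if info["width_bits"] > 8 else "narrow"
        print(f'{info["dev"]}: {rate:.3f} MB/s computed, '
              f'{info["rate_mb_s"]:.3f} MB/s reported ({note})')
```

Piping the real dmesg output through something like this would make a
16-bit line on a 50-pin setup stand out right away.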

and I can run disklabel on /dev/sd1 and /dev/sd2 w/o getting "device not
configured".

These drives are configured not to spin up on power-up. They came that
way, and not wanting to tax my power supply, I left them configured like
that. In order to get them to spin up at power-on, I have to put jumpers
/on/ auto-start and on "Del Start"; maybe they save money that way, fewer
gates, or cheaper ones... Anyway, when configured to spin up right away,
2.0 (w/o patch) wouldn't work on any boot, first or otherwise. I haven't
tested that w/ the patch yet, though I have mounted and fsck'ed -f all
file systems, both on first power-up and on reboot, and all seems well.
Thanks!

Curious: when the drive spins up on the initial boot, the kernel prints
out less info, actually just <IBM, DXHS, 0430> disk fixed and no line
about geometry. Was it just skipping the negotiation because it didn't
have enough info about the drive? And on subsequent boots, did it see the
drive as capable (though the controller is not) and negotiate a 16-bit
transfer that the controller couldn't do? Were any of these features not
in 1.6?


> -- 
> Manuel Bouyer <bouyer@antioche.eu.org>
>      NetBSD: 26 ans d'experience feront toujours la difference
> --
> <diff>