Subject: port-i386/10060: Error on ncr driver
To: None <gnats-bugs@gnats.netbsd.org>
From: None <kivinen@ssh.fi>
List: netbsd-bugs
Date: 05/06/2000 21:03:14
>Number:         10060
>Category:       port-i386
>Synopsis:       When doing some scsi commands the NCR scsi driver hangs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 06 21:04:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Tero Kivinen
>Release:        NetBSD-current 2000-04-24
>Organization:
SSH Communications Security
>Environment:
System: NetBSD kahva.ssh.fi 1.4X NetBSD 1.4X (KAHVA) #0: Thu Apr 27 09:28:18 EEST 2000 ztk@kahva.ssh.fi:/usr/src/sys/arch/i386/compile/KAHVA i386

ncr0 at pci0 dev 15 function 0: ncr 53c875 fast20 wide scsi
ncr0: interrupting at irq 5
ncr0: minsync=12, maxsync=137, maxoffs=16, 128 dwords burst, large dma fifo
ncr0: single-ended, open drain IRQ driver, using on-chip SRAM
ncr0: restart (scsi reset).
scsibus0 at ncr0: 16 targets, 8 luns per target
...
scsibus0: waiting 2 seconds for devices to settle...
ss0 at scsibus0 target 4 lun 0: <Nikon, LS-2000, 1.31> SCSI2 6/scanner removable
cd1 at scsibus0 target 6 lun 0: <TEAC, CD-R56S, 1.0E> SCSI2 5/cdrom removable
probe(ncr0:6:1): 10.0 MB/s (100 ns, offset 15)
...

>Description:

	I am writing a Linux SANE (Scanner Access Now Easy) emulation
	driver for NetBSD so that I can run the Linux version of
	VueScan (http://www.hamrick.com/) on my machine. The driver is
	ready, but when VueScan starts it issues some SCSI commands,
	and after a few of them the ncr driver returns an error. From
	then on the ncr driver returns an error for every command sent
	to that device. This looks like an ncr driver bug, and
	power-cycling the scanner itself does not help.

	I wrote a small test program that issues the same SCSI
	commands through the native NetBSD SCIOCCOMMAND ioctl, and it
	has the same effect on the ncr driver, so the bug cannot be in
	my SANE emulation code.
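
	Every request the test program sends is a 6-byte INQUIRY
	command (opcode 0x12). A minimal sketch of how those bytes are
	laid out, assuming the SCSI-2 INQUIRY format (byte 1 bit 0 is
	EVPD, byte 2 is the VPD page code, byte 4 is the allocation
	length); the helper name build_inquiry_cdb is hypothetical and
	not part of any NetBSD API:

```c
#include <string.h>

#define INQUIRY_OPCODE 0x12	/* SCSI INQUIRY operation code */

/* Hypothetical helper: fill a 6-byte INQUIRY CDB (SCSI-2 layout assumed).
 *   evpd      - nonzero to request a vital product data (VPD) page
 *   page_code - VPD page number, only meaningful when evpd is set
 *   alloc_len - number of reply bytes the initiator will accept */
static void build_inquiry_cdb(unsigned char cdb[6], int evpd,
                              unsigned char page_code,
                              unsigned char alloc_len)
{
  memset(cdb, 0, 6);		/* byte 3 (reserved) and byte 5 (control) stay 0 */
  cdb[0] = INQUIRY_OPCODE;
  cdb[1] = evpd ? 0x01 : 0x00;	/* EVPD bit */
  cdb[2] = page_code;		/* which VPD page to return */
  cdb[4] = alloc_len;		/* allocation length */
}
```

	For example, build_inquiry_cdb(cdb, 1, 0x40, 4) yields
	12 01 40 00 04 00, the request after which the ncr driver
	reports its fatal error in the log below.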

	The bug is repeatable with the test program when it is run
	against the Nikon LS-2000 scanner, but the same test program
	seems to work against the TEAC CD-R56S. This might also be a
	problem in the Nikon LS-2000 scanner itself.

	Here is the output of the test program when run against the
	/dev/ss0 device (Nikon LS-2000 scanner):

	-----------------------------------------------------------------
	kahva (9:29) ~/sanei-linux-compat#gcc test.c
	kahva (9:29) ~/sanei-linux-compat#./a.out
	Request:
	00000000: 1200 0000 2400                           ....$.          
	Reply:
	00000000: 0680 0202 1f00 0000 4e69 6b6f 6e20 2020  ........Nikon   
	00000010: 4c53 2d32 3030 3020 2020 2020 2020 2020  LS-2000         
	00000020: 312e 3331                                1.31            
	Request:
	00000000: 1201 0000 0400                           ......          
	Reply:
	00000000: 0600 0010                                ....            
	Request:
	00000000: 1201 0000 1400                           ......          
	Reply:
	00000000: 0600 0010 0001 4041 5051 5253 5460 61c1  ......@APQRST`a.
	00000010: d1e1 f0f8                                ....            
	Request:
	00000000: 1201 0000 0400                           ......          
	Reply:
	00000000: 0600 0010                                ....            
	Request:
	00000000: 1201 0000 1400                           ......          
	Reply:
	00000000: 0600 0010 0001 4041 5051 5253 5460 61c1  ......@APQRST`a.
	00000010: d1e1 f0f8                                ....            
	Request:
	00000000: 1201 0100 0400                           ......          
	Reply:
	00000000: 0601 0007                                ....            
	Request:
	00000000: 1201 0100 0b00                           ......          
	Reply:
	00000000: 0601 0007 064d 6f75 6e74 00              .....Mount.     
	Request:
	00000000: 1201 4000 0400                           ..@...          
	ncr0:4: ERROR (a0:0) (6-a7-7) (e0/5) @ (mem a51001b4:a51001b4).
	ncr0: regdump: da 10 80 05 47 e0 04 0f 01 06 00 a7 80 00 0f 00.
	ncr0: restart (fatal error).
	ss0(ncr0:4:0): COMMAND FAILED (9 ff) @0xc09ee000.
	Scsi command 7 failed, retsts = 1
	Reply:
	00000000: 0000 0000                                ....            
	Request:
	00000000: 1201 4000 1000                           ..@...          
	ncr0: timeout ccb=0xc09ee000 (skip)
	^C
	kahva (9:29) ~/sanei-linux-compat#
	-----------------------------------------------------------------
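
	The 20-byte reply to the second page 0x00 request is the
	scanner's list of supported VPD pages. Assuming the SCSI-2
	layout of that page (byte 3 holds the page length and the page
	codes follow from byte 4), a small sketch that checks whether a
	page is advertised shows that the LS-2000 does list page 0x40,
	the page whose request triggers the error. The helper name
	vpd_page_supported is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if page_code appears in a Supported VPD
 * Pages (page 0x00) reply, 0 otherwise. Assumes the SCSI-2 layout:
 *   byte 0: peripheral qualifier / device type
 *   byte 1: page code (0x00 for this page)
 *   byte 3: page length n, followed by n page codes starting at byte 4 */
static int vpd_page_supported(const unsigned char *reply, size_t len,
                              unsigned char page_code)
{
  size_t i, n;

  if (len < 4 || reply[1] != 0x00)
    return 0;			/* too short, or not the page 0x00 list */
  n = reply[3];
  for (i = 0; i < n && 4 + i < len; i++)	/* never read past the buffer */
    if (reply[4 + i] == page_code)
      return 1;
  return 0;
}
```

	Fed the Nikon bytes 06 00 00 10 00 01 40 41 ... f0 f8 it
	returns 1 for page 0x40; fed the TEAC reply shown next
	(05 00 00 02 00 80), it returns 0 for pages 0x01 and 0x40.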

	After the last "ncr0: timeout ..." line the program hangs, and
	pressing Ctrl-C does nothing. If I turn the scanner off for a
	few seconds, the program continues and exits because of the
	Ctrl-C. If I run the program against the /dev/rcd1d device,
	the output looks like this:

	-----------------------------------------------------------------
	kahva (9:41) ~/sanei-linux-compat#./a.out /dev/rcd1d 
	cd1(ncr0:6:0): 10.0 MB/s (100 ns, offset 15)
	Request:
	00000000: 1200 0000 2400                           ....$.          
	Reply:
	00000000: 0580 0202 1f00 0098 5445 4143 2020 2020  ........TEAC    
	00000010: 4344 2d52 3536 5320 2020 2020 2020 2020  CD-R56S         
	00000020: 312e 3045                                1.0E            
	Request:
	00000000: 1201 0000 0400                           ......          
	Reply:
	00000000: 0500 0002                                ....            
	Request:
	00000000: 1201 0000 1400                           ......          
	Reply:
	00000000: 0500 0002 0080 0000 0000 0000 0000 0000  ................
	00000010: 0000 0000                                ....            
	Request:
	00000000: 1201 0000 0400                           ......          
	Reply:
	00000000: 0500 0002                                ....            
	Request:
	00000000: 1201 0000 1400                           ......          
	Reply:
	00000000: 0500 0002 0080 0000 0000 0000 0000 0000  ................
	00000010: 0000 0000                                ....            
	Request:
	00000000: 1201 0100 0400                           ......          
	Scsi command 5 failed, retsts = 3
	Reply:
	00000000: 0000 0000                                ....            
	Request:
	00000000: 1201 0100 0b00                           ......          
	Scsi command 6 failed, retsts = 3
	Reply:
	00000000: 0000 0000 0000 0000 0000 00              ...........     
	Request:
	00000000: 1201 4000 0400                           ..@...          
	Scsi command 7 failed, retsts = 3
	Reply:
	00000000: 0000 0000                                ....            
	Request:
	00000000: 1201 4000 1000                           ..@...          
	Scsi command 8 failed, retsts = 3
	Reply:
	00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
	zsh: 460 exit 9     ./a.out /dev/rcd1d
	-----------------------------------------------------------------
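
	The two runs fail differently: on the scanner the page 0x40
	request ends with retsts = 1 after the ncr fatal error, while
	on the CD-R every request for pages 0x01 and 0x40 (which the
	TEAC does not list; it advertises only 0x00 and 0x80) ends
	with retsts = 3. As I read NetBSD's <sys/scsiio.h>, these are
	SCCMD_TIMEOUT and SCCMD_SENSE respectively. A small sketch that
	makes the test output self-describing; the MY_ prefixed values
	are my reading of that header and should be checked against
	the installed copy, and retsts_name is a hypothetical helper:

```c
/* Assumed to match the SCCMD_* values in NetBSD <sys/scsiio.h>;
 * verify against the installed header before relying on them. */
#define MY_SCCMD_OK      0x00
#define MY_SCCMD_TIMEOUT 0x01
#define MY_SCCMD_BUSY    0x02
#define MY_SCCMD_SENSE   0x03
#define MY_SCCMD_UNKNOWN 0x04

/* Hypothetical helper: map a scsireq_t retsts value to a readable name. */
static const char *retsts_name(int retsts)
{
  switch (retsts)
    {
    case MY_SCCMD_OK:      return "OK";
    case MY_SCCMD_TIMEOUT: return "TIMEOUT";
    case MY_SCCMD_BUSY:    return "BUSY";
    case MY_SCCMD_SENSE:   return "SENSE (check condition, sense data valid)";
    case MY_SCCMD_UNKNOWN: return "UNKNOWN";
    default:               return "unrecognized";
    }
}
```

	In the test program the failure line would then become
	printf("Scsi command %d failed, retsts = %d (%s)\n", i,
	req.retsts, retsts_name(req.retsts)).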

	If I rerun the program against the /dev/ss0 device, it fails
	immediately:
		
	-----------------------------------------------------------------
	kahva (9:41) ~/sanei-linux-compat#./a.out /dev/ss0  
	ncr0:4: ERROR (81:0) (6-a7-7) (0/5) @ (script 1bc:a5094800).
	ncr0: script cmd = 900b0000
	ncr0: regdump: da 10 80 05 47 00 04 0f 01 06 83 a7 80 00 0f 00.
	ncr0: restart (fatal error).
	ss0(ncr0:4:0): COMMAND FAILED (9 ff) @0xc09ee000.
	^C^C
	kahva (9:41) ~/sanei-linux-compat#
	-----------------------------------------------------------------

	
	I had to turn off the scanner again to recover. I also tried
	rerunning the program against the /dev/rcd1d device, and now
	it hangs as well:
		
	-----------------------------------------------------------------
	kahva (9:49) ~/sanei-linux-compat#./a.out /dev/rcd1d
	Request:
	00000000: 1200 0000 2400                           ....$.          
	^C^Cncr0: timeout ccb=0xc09f8000 (skip)
	^C^C
	-----------------------------------------------------------------

	Because I cannot turn off the CD-ROM drive, I cannot recover
	any more (turning off the scanner does not help).

	Because the problem is completely repeatable, I can rerun the
	tests with more debugging enabled; just tell me what kind of
	debugging information would be useful and how to enable it.

>How-To-Repeat:

Compile this code and run it against some SCSI devices (the device
path is taken from the first argument and defaults to /dev/ss0):
----------------------------------------------------------------------
#include <sys/param.h>
#include <sys/ioctl.h>
#include <sys/scsiio.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <util.h>

#include <dev/scsipi/scsipi_all.h>
#include <dev/scsipi/scsi_all.h>
#include <dev/scsipi/scsi_disk.h>
#include <dev/scsipi/scsipiconf.h>

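typedef struct {
  unsigned char request[6];	/* 6-byte INQUIRY CDB */
  int reply_len;		/* allocation length, equal to byte 4 */
} scsi_requests;

/* The same INQUIRY sequence vuescan issues: byte 1 = 0x01 sets EVPD,
   byte 2 selects the vital product data page and byte 4 is the
   allocation length.  Each VPD page is first read with 4 bytes
   (header only) and then with the full length. */
scsi_requests requests[] = {
  { { 0x12, 0x00, 0x00, 0x00, 0x24, 0x00 }, 36 },	/* standard INQUIRY */
  { { 0x12, 0x01, 0x00, 0x00, 0x04, 0x00 }, 4 },	/* VPD 0x00 header */
  { { 0x12, 0x01, 0x00, 0x00, 0x14, 0x00 }, 20 },	/* VPD 0x00 full */
  { { 0x12, 0x01, 0x00, 0x00, 0x04, 0x00 }, 4 },	/* VPD 0x00 header again */
  { { 0x12, 0x01, 0x00, 0x00, 0x14, 0x00 }, 20 },	/* VPD 0x00 full again */
  { { 0x12, 0x01, 0x01, 0x00, 0x04, 0x00 }, 4 },	/* VPD 0x01 header */
  { { 0x12, 0x01, 0x01, 0x00, 0x0b, 0x00 }, 11 },	/* VPD 0x01 full */
  { { 0x12, 0x01, 0x40, 0x00, 0x04, 0x00 }, 4 },	/* VPD 0x40 header: fails */
  { { 0x12, 0x01, 0x40, 0x00, 0x10, 0x00 }, 16 }	/* VPD 0x40, 16 bytes */
};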

int num_requests = sizeof(requests) / sizeof(requests[0]);

void print_buffer(unsigned char *buffer, size_t len)
{
  size_t i, j;

  for(i = 0; i < len; i += 16)
    {
      printf("%08lx: ", (unsigned long) i);
      for(j = 0; j < 16; j++)
	{
	  if (i + j >= len)
	    printf("  ");
	  else
	    printf("%02x", buffer[i + j]);
	  if (j % 2 == 1)
	    printf(" ");
	}
      printf(" ");
      for(j = 0; j < 16; j++)
	{
	  if (i + j >= len)
	    printf(" ");
	  else if (buffer[i + j] >= ' ' &&
		   buffer[i + j] <= '~')
	    printf("%c", buffer[i + j]);
	  else
	    printf(".");
	}
      printf("\n");
    }
}

int main(int argc, char **argv)
{
  char inqbuf[64];
  scsireq_t req;
  int fd, i;
  const char *dev;

  /* Default to the scanner; any other device can be given as argv[1]. */
  dev = (argc < 2) ? "/dev/ss0" : argv[1];
  fd = open(dev, O_RDWR);
  if (fd < 0)
    err(1, "Opening device %s", dev);

  for(i = 0; i < num_requests; i++)
    {
      memset(inqbuf, 0, sizeof(inqbuf));
      memset(&req, 0, sizeof(req));

      memcpy(req.cmd, requests[i].request, 6);
      printf("Request:\n");
      print_buffer(req.cmd, 6);

      req.cmdlen = 6;
      req.databuf = inqbuf;
      req.datalen = requests[i].reply_len;
      req.timeout = 5000;		/* milliseconds */
      req.flags = SCCMD_READ;		/* data flows from device to host */
      req.senselen = SENSEBUFLEN;

      if (ioctl(fd, SCIOCCOMMAND, &req) == -1)
	err(1, "Ioctl SCIOCCOMMAND failed");

      /* retsts is the SCSI-level result; i is the 0-based table index */
      if (req.retsts != SCCMD_OK)
	printf("Scsi command %d failed, retsts = %d\n", i, req.retsts);
      printf("Reply:\n");
      print_buffer((unsigned char *) inqbuf, requests[i].reply_len);
    }

  close(fd);
  return 0;
}
----------------------------------------------------------------------
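
The request table follows a two-step pattern: a VPD page is first
read with a 4-byte allocation length to get its header, then read
again with room for the whole page. Since byte 3 of the header holds
the number of bytes that follow, the second length is header[3] + 4,
which matches the logs: page length 0x10 leads to a 0x14-byte request
and 0x07 to a 0x0b-byte request. A sketch of that computation; the
helper name vpd_full_length is hypothetical, and clamping to 255
assumes the one-byte SCSI-2 allocation length field:

```c
#include <stddef.h>

/* Hypothetical helper: given the 4-byte header of a VPD page reply,
 * return the allocation length needed to fetch the whole page
 * (4 header bytes + page length from byte 3), or 0 if the header is
 * too short. Clamped to 255 because the SCSI-2 INQUIRY CDB has a
 * one-byte allocation length field. */
static unsigned int vpd_full_length(const unsigned char *hdr, size_t len)
{
  unsigned int total;

  if (len < 4)
    return 0;
  total = 4u + hdr[3];
  return total > 255u ? 255u : total;
}
```

With the Nikon page 0x00 header 06 00 00 10 this gives 0x14 (20),
and with the page 0x01 header 06 01 00 07 it gives 0x0b (11).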

>Fix:
	None known. 

>Release-Note:
>Audit-Trail:
>Unformatted: