Subject: port-i386/6318: race condition(?): integer divide by zero caused by pciide driver
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Havard.Eidnes@runit.sintef.no>
List: netbsd-bugs
Date: 10/17/1998 21:08:44
>Number:         6318
>Category:       port-i386
>Synopsis:       race condition(?): integer divide by zero caused by pciide driver
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 17 12:20:01 1998
>Last-Modified:
>Originator:     Havard Eidnes
>Organization:
        RUNIT A/S
>Release:        NetBSD-current Oct 12 1998
>Environment:
        
System: NetBSD maaltrost.runit.sintef.no 1.3H NetBSD 1.3H (MAALTROST) #1: Thu Oct 15 22:28:32 MEST 1998     he@maaltrost.runit.sintef.no:/usr/src/sys/arch/i386/compile/MAALTROST i386


>Description:
        I booted an Oct 12 1998 kernel with the pciide driver compiled
        in.  The system in question has a CMD0640 controller (see boot
        messages below), and is equipped with two IDE drives on the
        first channel.  Initially, when the console was on the PC
        screen/keyboard, the kernel would reproducibly stop at this
        point in the bootup process:

root file system type: ffs
swapctl: adding /dev/wd0b as swap device at priority 0
swapctl: adding /dev/sd0b as swap device at priority 0
Automatic boot in progress: starting file system checks.
/dev/rwd0a: file system is clean; not checking
kernel: integer divide fault trap, code=0
stopped in fsck_ffs at _readdisklabel+0xbb
db>

        The stack backtrace was

_readdisklabel+0xbb
_wdgetdisklabel
_wdopen
_spec_open
_vn_open
_sys_open
_syscall

        The instruction at _readdisklabel+0xbb is

divl  0x38(%ecx),%eax

        and ecx was 0xf03a4a00, eax was 1.
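
        For reference, the faulting instruction can be modelled as
        below.  This is only a minimal sketch of the unsigned 32-bit
        divl semantics; the function name divl() is illustrative, and
        edx = 0 is an assumption, since edx was not noted at the time
        of the trap.

```python
def divl(edx, eax, divisor):
    """Model of unsigned 32-bit x86 divl: divide EDX:EAX by divisor."""
    if divisor == 0:
        # The CPU raises a divide error (#DE) for a zero divisor.
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        # #DE is also raised when the quotient does not fit in EAX.
        raise OverflowError("#DE: quotient overflow")
    return quotient, remainder


ecx = 0xF03A4A00
operand_address = ecx + 0x38   # memory operand of "divl 0x38(%ecx),%eax"
print(hex(operand_address))

try:
    divl(0, 1, 0)              # edx = 0 is assumed; eax = 1 as observed
except ZeroDivisionError as trap:
    print("trap:", trap)
```

        With eax = 1 and a small edx, quotient overflow is not
        plausible, which leaves a zero divisor at 0x38(%ecx) as the
        likely cause of the trap.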

        Examining the word at 0xf03a4a38 (the divisor) gave 0.  The
        surrounding memory, in hex, was:

0xf03a4a00:     82564557        0              0         0
0xf03a4a10:            0        0       74636966  6f697469
0xf03a4a20:         7375        0            200         0
0xf03a4a30:            0        0              0    102200
0xf03a4a40:            0        0          10e10         0
0xf03a4a50:            0        0              0         0

        0x102200 is the total number of sectors on the wd1 drive
        (1057280 decimal).  The int at 0x38(%ecx) is supposed to be
        calculated as the product of sectors/track and heads, giving
        the number of sectors per cylinder (if I read the code
        correctly).
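
        The dump can also be decoded field by field.  The following
        is a minimal sketch, assuming that the memory at %ecx is the
        start of a struct disklabel with the usual 4.4BSD layout.
        The field names follow <sys/disklabel.h>, but the layout as
        written out here is an assumption and has not been checked
        against the Oct 12 sources.  The wd1 geometry is taken from
        the boot messages.

```python
import struct

# The 24 words ddb printed starting at 0xf03a4a00 (values are hex).
WORDS = [
    0x82564557, 0, 0, 0,
    0, 0, 0x74636966, 0x6F697469,
    0x7375, 0, 0x200, 0,
    0, 0, 0, 0x102200,
    0, 0, 0x10E10, 0,
    0, 0, 0, 0,
]
RAW = struct.pack("<24I", *WORDS)  # i386 is little-endian

# ASSUMED layout of the first 76 bytes of struct disklabel.
FIELDS = [
    ("d_magic", "I"), ("d_type", "H"), ("d_subtype", "H"),
    ("d_typename", "16s"), ("d_packname", "16s"),
    ("d_secsize", "I"), ("d_nsectors", "I"), ("d_ntracks", "I"),
    ("d_ncylinders", "I"), ("d_secpercyl", "I"), ("d_secperunit", "I"),
    ("d_sparespertrack", "H"), ("d_sparespercyl", "H"),
    ("d_acylinders", "I"), ("d_rpm", "H"), ("d_interleave", "H"),
]
fmt = "<" + "".join(code for _, code in FIELDS)
values = struct.unpack_from(fmt, RAW, 0)
label = dict(zip((name for name, _ in FIELDS), values))

# Geometry the kernel printed for wd1 at attach time.
cyl, heads, sectors = 1120, 16, 59
expected_secpercyl = sectors * heads   # sectors/track times heads

for name in ("d_packname", "d_secsize", "d_nsectors", "d_ntracks",
             "d_ncylinders", "d_secpercyl", "d_secperunit"):
    print(name, "=", label[name])
print("expected d_secpercyl =", expected_secpercyl)
print("cyl * expected d_secpercyl =", cyl * expected_secpercyl)
```

        If the assumed layout is right, the sector size and the total
        sector count in the label are sane, but sectors/track, heads
        and cylinders are all zero, so the product stored at offset
        0x38 is zero as well.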

        I should perhaps mention that the first BSD file system
        partition for both IDE drives starts at sector 0.

        To debug this further I built and installed a bootblock and a
        kernel with serial console, but surprisingly, the problem did
        not recur.  I have not tried switching back to the PC console
        yet, but the above crash occurred two or three times in a
        row while the PC keyboard/screen was the console.


        My current best guess is that there is a race condition in the
        pciide code which corrupts the results of the
        ata_get_params() call if there is other activity ongoing on
        the other drive on the same channel (???), and that printing
        messages on the 9600 baud serial console slows the boot down
        enough that this problem doesn't surface in that configuration.


        The autoconf output from the kernel, and the output from some
        of the debugging done to try to recreate the problem, appear
        at the end of this report (under >Unformatted:).

>How-To-Repeat:
        See above.
>Fix:
        Don't know, sorry.
>Audit-Trail:
>Unformatted:
>> NetBSD/i386 BIOS Boot, Revision 2.3
>> (he@maaltrost.runit.sintef.no, Thu Oct 15 22:03:25 MEST 1998)
>> Memory: 638/31744 k
Use hd1a:netbsd to boot sd0 when wd0 is also installed
Press return to boot now, any other key for boot menu
booting wd0a:netbsd - starting in 0
type "?" or "help" for help.
> boot netbsd.new -as
booting wd0a:netbsd.new (howto 0x3)
1306624+102400+153620+[82440+98028]=0x1a990c
[ preserving 180468 bytes of netbsd symbol table ]
Copyright (c) 1996, 1997, 1998
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.3H (MAALTROST) #1: Thu Oct 15 22:28:32 MEST 1998
    he@maaltrost.runit.sintef.no:/usr/src/sys/arch/i386/compile/MAALTROST
cpu0: family 5 model 2 step 1
cpu0: Intel Pentium (P54C) (586-class)
real mem  = 33161216
avail mem = 28815360
using 430 buffers containing 1761280 bytes of memory
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o enabled, memory enabled
pchb0 at pci0 dev 0 function 0
pchb0: VLSI Technology 82C592 CPU Bridge (rev. 0x00)
pcib0 at pci0 dev 1 function 0
pcib0: VLSI Technology 82C593 ISA Bridge (rev. 0x00)
pciide0 at pci0 dev 8 function 0: CMD Technology PCI0640
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <ST51270A>
wd0: using 32-sector pio transfers, lba mode
wd0: 1223MB, 2485 cyl, 16 head, 63 sec, 512 bytes/sect x 2504880 sectors
wd0: PIO mode 4, DMA mode 2
wd1 at pciide0 channel 0 drive 1: <QUANTUM LPS540A>
wd1: using 8-sector pio transfers, lba mode
wd1: 516MB, 1120 cyl, 16 head, 59 sec, 512 bytes/sect x 1057280 sectors
wd1: PIO mode 3, DMA mode 1
pciide0: secondary channel wired to compatibility mode
pciide0: secondary channel ignored (disabled)
S3 86C864-0 ("Vision864") (VGA prehistoric) at pci0 dev 9 function 0 not configured
ahc0 at pci0 dev 11 function 0
ahc0: interrupting at irq 9
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
scsibus0 at ahc0 channel 0: 8 targets
sd0 at scsibus0 targ 3 lun 0: <MICROP, 2105-08MQ1068605, 4849> SCSI2 0/direct fixed
sd0: 532MB, 1760 cyl, 8 head, 77 sec, 512 bytes/sect x 1091355 sectors
de0 at pci0 dev 12 function 0
de0: interrupting at irq 5
de0: DEC DE500-XA 21140 [10-100Mb/s] pass 1.1
de0: address 00:00:f8:01:08:20
isa0 at pcib0
ep0 at isa0 port 0x300-0x30f irq 10: 3Com 3C509 Ethernet
ep0: address 00:20:af:ac:25:a0, 8KB byte-wide FIFO, 5:3 Rx:Tx split
ep0: 10baseT, 10base5/AUI, 10base2/BNC (default 10baseT)
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0-0xff: using exception 16
WARNING: Pentium FDIV bug detected!
vt0 at isa0 port 0x60-0x6f irq 1
vt0: unknown s3, 80 col, color, 8 scr, mf2-kbd, [R3.32]
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
biomask 4240 netmask 4660 ttymask 46e2
boot device: wd0
de0: enabling 100baseTX port
root device (default wd0a): 
dump device (default wd0b): 
file system (default generic): 
root on wd0a dumps on wd0b
root file system type: ffs
Enter pathname of shell or RETURN for sh: 
# cat /etc/fstab
/dev/wd0a / ffs rw 1 1
/dev/wd0b none swap sw 0 0
/dev/wd0b /tmp mfs rw,-s=39688 0 0
/dev/wd0e /var ffs rw 1 2
/dev/wd0f /usr ffs rw 1 2
/dev/wd1a /store ffs rw 1 3
/dev/sd0b none swap sw 0 0
/dev/sd0h /local ffs rw 1 2
#
# fsck -p /dev/rwd0a
/dev/rwd0a: 652 files, 17810 used, 11389 free (77 frags, 1414 blocks, 0.3% fragmentation)
/dev/rwd0a: MARKING FILE SYSTEM CLEAN
# fsck -p /dev/rwd1a
/dev/rwd1a: 14345 files, 318112 used, 675293 free (5141 frags, 83769 blocks, 0.5% fragmentation)
/dev/rwd1a: MARKING FILE SYSTEM CLEAN
# fsck -f -p
/dev/rwd0a: 652 files, 17810 used, 11389 free (77 frags, 1414 blocks, 0.3% fragmentation)
/dev/rwd0e: 189 files, 4093 used, 93362 free (98 frags, 11658 blocks, 0.1% fragmentation)
/dev/rwd0e: MARKING FILE SYSTEM CLEAN
/dev/rsd0h: 25819 files, 314398 used, 68404 free (8132 frags, 7534 blocks, 2.1% fragmentation)
/dev/rwd1a: 14345 files, 318112 used, 675293 free (5141 frags, 83769 blocks, 0.5% fragmentation)
/dev/rwd0f: 43940 files, 457678 used, 501176 free (6352 frags, 61853 blocks, 0.7% fragmentation)
/dev/rwd0f: MARKING FILE SYSTEM CLEAN
# reboot
...
boot device: wd0
root on wd0ade0: enabling 100baseTX port
 dumps on wd0b
root file system type: ffs
Enter pathname of shell or RETURN for sh: 
# fsck -p
/dev/rwd0a: file system is clean; not checking
/dev/rwd0e: file system is clean; not checking
/dev/rwd1a: file system is clean; not checking
/dev/rsd0h: file system is clean; not checking
/dev/rwd0f: file system is clean; not checking
# reboot
syncing disks... done
rebooting...
...
root file system type: ffs
swapctl: adding /dev/wd0b as swap device at priority 0
swapctl: adding /dev/sd0b as swap device at priority 0
Automatic boot in progress: starting file system checks.
/dev/rwd0a: file system is clean; not checking
/dev/rwd0e: file system is clean; not checking
/dev/rsd0h: file system is clean; not checking
/dev/rwd1a: file system is clean; not checking
/dev/rwd0f: file system is clean; not checking
setting tty flags
starting network
hostname: maaltrost.runit.sintef.no
...