Subject: Re: kern/33445: Fixes for Promise SATA (pdcsata) PCI driver
To: None <buhrow@lothlorien.nfbcal.org>
From: Timo Schoeler <timo.schoeler@riscworks.net>
List: netbsd-bugs
Date: 05/17/2006 16:28:15
thus Brian Buhrow spake:
>> Number:         33445
>> Category:       kern
>> Synopsis:       These patches fix a number of problems with the pdcsata driver for NetBSD-3.0 and -current..
>> Confidential:   no
>> Severity:       critical
>> Priority:       high
>> Responsible:    kern-bug-people
>> State:          open
>> Class:          sw-bug
>> Submitter-Id:   net
>> Arrival-Date:   Mon May 08 23:30:00 +0000 2006
>> Originator:     Brian Buhrow
>> Release:        NetBSD 3.0_STABLE and -current
>> Organization:
> Vianet Communications
>> Environment:
>   System: NetBSD fserv1.via.net 3.0_STABLE NetBSD 3.0_STABLE (NFBNETBSD) #0: Tue Jan 31 14:45:08 PST 2006 buhrow@lothlorien.nfbcal.org:/usr/src/sys/arch/i386/compile/NFBNETBSD i386
>   Architecture: i386
>   Machine: i386
>> Description:
>  The driver for the family of Promise SATA controllers,
> /usr/src/sys/dev/pci/pdcsata.c, is not very robust when it comes to
> handling transient drive errors or interrupt hiccups when the card
> is under load. Worse, my experience seems to indicate, and the Linux
> driver confirms, that these cards tend to fall over rather frequently
> during high load operations or if drives unexpectedly reset or go to
> sleep.  Symptoms include interrupt timeouts during heavy load, the
> inability to reset drives if they go to sleep, and a failure of the
> card to generate interrupts at all if the interrupt load gets too
> high.
> 
>> How-To-Repeat:
>  To test whether you're encountering the problems this patch fixes
>  before you apply it, try the following steps:
> 
> 1.  Install a card supported by the pdcsata driver, either one of the
> 203xx cards or one of the 205xx cards.  The Promise PDC40718 and
> PDC40719 cards are also supported by the pdcsata driver.
> 
> 2.  After the card is configured and you have a running disk on it,
> run the command:
>     atactl /dev/wd3d sleep
> assuming the drive attached to your pdcsata-driven card is wd3.
> Change the drive number to match the drive actually attached to your
> pdcsata card. Now run:
>     disklabel -r wd3
> again making the same assumptions as above. If you have the broken
> version of the driver, you won't be able to revive the drive without
> a reboot.
>> Fix:
>  These patches solve all the bugs listed above, as well as simplify 
> the driver.  I have tested these patches on production systems
> running at high volume, and they work well.  I have been working with
> abs@netbsd.org, who also has one of these cards, and they help him as
> well, although there are still some minor issues to work out with his
> setup. These patches apply cleanly against 3.0 sources as of April
> 21, 2006, but I believe they'll apply equally cleanly to current 3.0
> sources, as well as -current sources. I would like to see these fixes
> get into 3.0, as well as the 4.0 branch.
> 
> -thanks -Brian
> 
> 
> Index: pdcsata.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/pci/pdcsata.c,v
> retrieving revision 1.3.2.2
> diff -u -r1.3.2.2 pdcsata.c
> --- pdcsata.c	5 Feb 2006 17:13:57 -0000	1.3.2.2
> +++ pdcsata.c	5 May 2006 17:07:57 -0000
> @@ -48,17 +48,19 @@
> 
>  #define PDC203xx_BAR_IDEREGS 0x1c /* BAR where the IDE registers are mapped */
> 
> +#define PDC_CHANNELBASE(ch) 0x200 + ((ch) * 0x80)
> +#define PDC_ERRMASK 0x00780700
> +
>  static void pdcsata_chip_map(struct pciide_softc *, struct pci_attach_args *);
>  static void pdc203xx_setup_channel(struct ata_channel *);
> -static int  pdc203xx_pci_intr(void *);
>  static void pdc203xx_irqack(struct ata_channel *);
>  static int  pdc203xx_dma_init(void *, int, int, void *, size_t, int);
>  static void pdc203xx_dma_start(void *,int ,int);
>  static int  pdc203xx_dma_finish(void *, int, int, int);
> +static int  pdcsata_pci_intr(void *);
> +static void pdcsata_do_reset(struct ata_channel *, int);
> 
>  /* PDC205xx, PDC405xx and PDC407xx. but tested only pdc40718 */
> -static int  pdc205xx_pci_intr(void *);
> -static void pdc205xx_do_reset(struct ata_channel *, int);
>  static void pdc205xx_drv_probe(struct ata_channel *);
> 
>  static int  pdcsata_match(struct device *, struct cfdata *, void *);
> @@ -183,30 +185,8 @@
>  		return;
>  	}
>  	intrstr = pci_intr_string(pa->pa_pc, intrhandle);
> -
> -	switch (sc->sc_pp->ide_product) {
> -	case PCI_PRODUCT_PROMISE_PDC20318:
> -	case PCI_PRODUCT_PROMISE_PDC20319:
> -	case PCI_PRODUCT_PROMISE_PDC20371:
> -	case PCI_PRODUCT_PROMISE_PDC20375:
> -	case PCI_PRODUCT_PROMISE_PDC20376:
> -	case PCI_PRODUCT_PROMISE_PDC20377:
> -	case PCI_PRODUCT_PROMISE_PDC20378:
> -	case PCI_PRODUCT_PROMISE_PDC20379:
> -	default:
> -		sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> -		    intrhandle, IPL_BIO, pdc203xx_pci_intr, sc);
> -		break;
> -
> -	case PCI_PRODUCT_PROMISE_PDC40718:
> -	case PCI_PRODUCT_PROMISE_PDC40719:
> -	case PCI_PRODUCT_PROMISE_PDC20571:
> -	case PCI_PRODUCT_PROMISE_PDC20575:
> -	case PCI_PRODUCT_PROMISE_PDC20579:
> -		sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> -		    intrhandle, IPL_BIO, pdc205xx_pci_intr, sc);
> -		break;
> -	}
> +	sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> +	    intrhandle, IPL_BIO, pdcsata_pci_intr, sc);
> 
>  	if (sc->sc_pci_ih == NULL) {
>  		aprint_error("%s: couldn't establish native-PCI interrupt",
> @@ -258,6 +238,8 @@
>  	sc->sc_wdcdev.sc_atac.atac_set_modes = pdc203xx_setup_channel;
>  	sc->sc_wdcdev.sc_atac.atac_channels = sc->wdc_chanarray;
> 
> +	sc->sc_wdcdev.reset = pdcsata_do_reset;
> +
>  	switch (sc->sc_pp->ide_product) {
>  	case PCI_PRODUCT_PROMISE_PDC20318:
>  	case PCI_PRODUCT_PROMISE_PDC20319:
> @@ -281,7 +263,6 @@
>  		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, 0x00ff00ff);
>  		sc->sc_wdcdev.sc_atac.atac_nchannels = PDC40718_NCHANNELS;
> 
> -		sc->sc_wdcdev.reset = pdc205xx_do_reset;
>  		sc->sc_wdcdev.sc_atac.atac_probe = pdc205xx_drv_probe;
> 
>  		break;
> @@ -290,7 +271,6 @@
>  		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, 0x00ff00ff);
>  		sc->sc_wdcdev.sc_atac.atac_nchannels = PDC20575_NCHANNELS;
> 
> -		sc->sc_wdcdev.reset = pdc205xx_do_reset;
>  		sc->sc_wdcdev.sc_atac.atac_probe = pdc205xx_drv_probe;
> 
>  		break;
> @@ -403,53 +383,37 @@
>  }
> 
>  static int
> -pdc203xx_pci_intr(void *arg)
> +pdcsata_pci_intr(void *arg)
>  {
>  	struct pciide_softc *sc = arg;
>  	struct pciide_channel *cp;
>  	struct ata_channel *wdc_cp;
>  	int i, rv, crv;
> -	u_int32_t scr;
> -
> -	rv = 0;
> -	scr = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x00040);
> -
> -	for (i = 0; i < sc->sc_wdcdev.sc_atac.atac_nchannels; i++) {
> -		cp = &sc->pciide_channels[i];
> -		wdc_cp = &cp->ata_channel;
> -		if (scr & (1 << (i + 1))) {
> -			crv = wdcintr(wdc_cp);
> -			if (crv == 0) {
> -				printf("%s:%d: bogus intr (reg 0x%x)\n",
> -				    sc->sc_wdcdev.sc_atac.atac_dev.dv_xname,
> -				    i, scr);
> -			} else
> -				rv = 1;
> -		}
> -	}
> -	return rv;
> -}
> -
> -static int
> -pdc205xx_pci_intr(void *arg)
> -{
> -	struct pciide_softc *sc = arg;
> -	struct pciide_channel *cp;
> -	struct ata_channel *wdc_cp;
> -	int i, rv, crv;
> -	u_int32_t scr, status;
> +	u_int32_t scr, status, chanbase;
> 
>  	rv = 0;
>  	scr = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x40);
> +	if (scr == 0xffffffff) return(1);
>  	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x40, scr & 0x0000ffff);
> -
> -	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60);
> -	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, status & 0x000000ff);
> +	scr = scr & 0x0000ffff;
> +	if (!scr) return(1);
> 
>  	for (i = 0; i < sc->sc_wdcdev.sc_atac.atac_nchannels; i++) {
>  		cp = &sc->pciide_channels[i];
>  		wdc_cp = &cp->ata_channel;
>  		if (scr & (1 << (i + 1))) {
> +			chanbase = PDC_CHANNELBASE(i) + 0x48;
> +			status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +			if (status & PDC_ERRMASK) {
> +				chanbase = PDC_CHANNELBASE(i) + 0x60;
> +				status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +				status |= 0x800;
> +				bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +				status &= ~0x800;
> +				bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +				status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +				continue;
> +			}
>  			crv = wdcintr(wdc_cp);
>  			if (crv == 0) {
>  				printf("%s:%d: bogus intr (reg 0x%x)\n",
> @@ -541,24 +505,29 @@
> 
> 
>  static void
> -pdc205xx_do_reset(struct ata_channel *chp, int poll)
> +pdcsata_do_reset(struct ata_channel *chp, int poll)
>  {
>  	struct pciide_softc *sc = CHAN_TO_PCIIDE(chp);
> -	u_int32_t scontrol;
> -
> -	wdc_do_reset(chp, poll);
> +	int reset, status, i, chanbase;
> 
>  	/* reset SATA */
> -	scontrol = SControl_DET_INIT | SControl_SPD_ANY | SControl_IPM_NONE;
> -	SCONTROL_WRITE(sc, chp->ch_channel, scontrol);
> -	delay(50*1000);
> -
> -	scontrol &= ~SControl_DET_INIT;
> -	SCONTROL_WRITE(sc, chp->ch_channel, scontrol);
> -	delay(50*1000);
> -}
> +	reset = (1 << 11);
> +	chanbase = PDC_CHANNELBASE(chp->ch_channel) + 0x60;
> +	for (i = 0; i < 11;i ++) {
> +		status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +		if (status & reset) break;
> +		delay(100);
> +		status |= reset;
> +		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +	}
> +	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +	status &= ~reset;
> +	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> 
> +	wdc_do_reset(chp, poll);
> 
> +}
> 
>  static void
>  pdc205xx_drv_probe(struct ata_channel *chp)
> 
>> Unformatted:

hi,

i have massive trouble with fxp* since adding the patch to a netbsd-3 
machine (built on may 10th, see uname):

test: {11} ping -f 192.168.100.2

(takes about four or five seconds to start! should start immediately)

PING packetvermuckler.ts39-bln.riscworks.net (192.168.100.2): 56 data bytes
...............................................................................................^C...........

----packetvermuckler.ts39-bln.riscworks.net PING Statistics----
824 packets transmitted, 600 packets received, 27.2% packet loss
round-trip min/avg/max/stddev = 0.243/1.640/80.343/8.880 ms
   314.8 packets/sec sent,  268.3 packets/sec received
test: {12} ifconfig fxp0
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
         capabilities=6<TCP4CSUM,UDP4CSUM>
         enabled=0
         address: 00:02:b3:8e:29:83
         media: Ethernet autoselect (none flowcontrol,rxpause,txpause)
         status: no carrier
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         inet 192.168.100.123 netmask 0xffffff00 broadcast 192.168.100.255
         inet6 fe80::202:b3ff:fe8e:2983%fxp0 prefixlen 64 scopeid 0x1
test: {13} uname -a
NetBSD test.riscworks.net 3.0_STABLE NetBSD 3.0_STABLE (GENERIC) #0: Wed May 10 15:22:29 CEST 2006 root@deneb.ts39-bln.riscworks.net:/usr/obj/sys/arch/i386/compile/GENERIC i386

i can login via ssh, and a ping flood from the LAN gets me this:

localhost:~ tis$ sudo ping -f 192.168.100.123
PING 192.168.100.123 (192.168.100.123): 56 data bytes
...................................................................^C
--- 192.168.100.123 ping statistics ---
2319158 packets transmitted, 2319091 packets received, 0% packet loss
round-trip min/avg/max = 0.153/0.435/192.297 ms

which looks much better.

summary: the machine says 'no carrier', but it can ping external hosts. it 
also allows login via ssh (reliably; i have four shells open right now).

the problem appears with both MP and uniprocessor kernels (the machine 
is MP).

NetBSD 3.0-RELEASE runs very well on the same machine; i'll try a kernel 
without the above patch soon.

dmesg following:

NetBSD 3.0_STABLE (GENERIC) #0: Wed May 10 15:22:29 CEST 2006
 
root@deneb.ts39-bln.riscworks.net:/usr/obj/sys/arch/i386/compile/GENERIC
total memory = 1279 MB
avail memory = 1240 MB
BIOS32 rev. 0 found at 0xfd8b0
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium III (686-class), 864.02 MHz, id 0x683
cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu0: features 383fbff<FXSR,SSE>
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 256 KB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: 8 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CNB20-HE PCI bridge (rev. 0x22)
ppb0 at pci0 dev 0 function 1: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x01)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
vga1 at pci1 dev 0 function 0: Matrox MGA G400 AGP (rev. 0x85)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
pchb1 at pci0 dev 0 function 2
pchb1: ServerWorks CNB30-LE PCI bridge (rev. 0x00)
pchb2 at pci0 dev 0 function 3
pchb2: ServerWorks CNB30-LE PCI bridge (rev. 0x00)
pci2 at pchb2 bus 2
pci2: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ahc1 at pci2 dev 1 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at irq 11
ahc1: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc1: 16 targets, 8 luns per target
ahc2 at pci2 dev 1 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc2: interrupting at irq 11
ahc2: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc2: 16 targets, 8 luns per target
fxp0 at pci0 dev 1 function 0: i82550 Ethernet, rev 12
fxp0: interrupting at irq 10
fxp0: Ethernet address 00:02:b3:8e:29:83
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
acardide0 at pci0 dev 2 function 0
acardide0: Acard ATP860-A Ultra66 IDE Controller (rev. 0x01)
acardide0: bus-master DMA support present
acardide0: primary channel wired to native-PCI mode
acardide0: using irq 5 for native-PCI interrupt
atabus0 at acardide0 channel 0
acardide0: secondary channel wired to native-PCI mode
atabus1 at acardide0 channel 1
pdcsata0 at pci0 dev 3 function 0
pdcsata0: Promise PDC40718 SATA300 controller (rev. 0x02)
pdcsata0: interrupting at irq 11
pdcsata0: bus-master DMA support present
atabus2 at pdcsata0 channel 0
atabus3 at pdcsata0 channel 1
atabus4 at pdcsata0 channel 2
atabus5 at pdcsata0 channel 3
fxp1 at pci0 dev 7 function 0: i82559 Ethernet, rev 8
fxp1: interrupting at irq 11
fxp1: May need receiver lock-up workaround
fxp1: Ethernet address 00:10:83:ff:e1:5a
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 15 function 0
pcib0: ServerWorks OSB4 southbridge (rev. 0x50)
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at irq 14
atabus6 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at irq 15
atabus7 at rccide0 channel 1
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
pdcsata0:1: bogus intr (reg 0x14)
pdcsata0:3: bogus intr (reg 0x14)
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
wd0 at atabus3 drive 0sd0 at scsibus0 target 0 lun 0: <QUANTUM, ATLAS10K2-TY184L, DA40> disk fixed
sd0: 17366 MB, 17338 cyl, 5 head, 410 sec, 512 bytes/sect x 35566480 sectors
sd0: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged queueing
pdcsata0:1:0: lost interrupt
         type: ata tc_bcount: 512 tc_skip: 0
: <WDC WD2500YD-01NVB1>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(pdcsata0:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd1 at atabus5 drive 0pdcsata0:3:0: lost interrupt
         type: ata tc_bcount: 512 tc_skip: 0
: <WDC WD2500YD-01NVB1>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(pdcsata0:3:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
atapibus0 at atabus6: 2 targets
cd0 at atapibus0 drive 0: <MATSHITADVD-ROM SR-8585, , 1W21> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33) (using DMA)
raid0: RAID Level 1
raid0: Components: component0[**FAILED**] /dev/wd0a
raid0: Total Sectors: 490234624 (239372 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
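
a side note on the "bogus intr (reg 0x14)" lines in the dmesg above: the handler in the quoted patch tests bit (i + 1) of the status word for channel i, so 0x14 (bits 2 and 4) would point at channels 1 and 3, which are the two channels with drives on them (atabus3 and atabus5). below is a minimal sketch of that decoding in plain userland c, not kernel code. channel_flagged() and count_flagged() are made-up helper names, not driver functions, and the channel count of 4 is an assumption taken from the four atabus attachments listed for pdcsata0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Same test as in the quoted patch's interrupt handler: channel i is
 * flagged when bit (i + 1) of the status word is set.
 * (Hypothetical helper, for illustration only.)
 */
static int
channel_flagged(uint32_t scr, int channel)
{
	assert(channel >= 0 && channel < 31);
	return (scr & (UINT32_C(1) << (channel + 1))) != 0;
}

/*
 * Print every flagged channel and return how many there were.
 * nchannels = 4 is assumed from the dmesg (atabus2..atabus5).
 */
static int
count_flagged(uint32_t scr, int nchannels)
{
	int i, count = 0;

	for (i = 0; i < nchannels; i++) {
		if (channel_flagged(scr, i)) {
			printf("channel %d flagged (reg 0x%x)\n",
			    i, (unsigned int)scr);
			count++;
		}
	}
	return count;
}
```

calling count_flagged(0x14, 4) reports channels 1 and 3 only. this just decodes the logged value; it says nothing about why wdcintr() did not claim the interrupt on those channels.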

-- 
Timo Schoeler | http://riscworks.net/~tis | timo.schoeler@riscworks.net
RISCworks -- Perfection is a powerful message
ISP | POWER & PowerPC aficionados | Networking, Security, BSD services
GPG Key fingerprint = B5F6 68A4 EC45 C309 6770  38C4 50E8 2740 9E0C F20A

There are 10 types of people in the world. Those who understand binary
and those who don't.