Subject: kern/26358: the "lpt" at "atppc" is VERY slow in "STD" mode w/ interrupts
To: None <gnats-bugs@gnats.NetBSD.org>
From: Paul Shupak <paul@Plectere.com>
List: netbsd-bugs
Date: 07/18/2004 18:01:50
>Number:         26358
>Category:       kern
>Synopsis:       the "lpt" at "atppc" is VERY slow in "STD" mode w/ interrupts
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Mon Jul 19 01:19:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Paul Shupak
>Release:        NetBSD 2.0G
>Organization:
>Environment:
System: NetBSD cobalt 2.0G NetBSD 2.0G (COBALT-$Revision: 1.4 $) #562: Wed Jul 14 10:22:38 PDT 2004 root@cobalt:/sys/arch/i386/compile/COBALT i386
Architecture: i386
Machine: i386
>Description:
	When using the atppc attachment in non-dma "standard" mode with
interrupts (i.e. lptctl /dev/lpt1 mode standard dma  no ieee  no intr yes)
the system becomes completely unresponsive (each character causes a sleep/
wakeup pair of events and at least 2 transitions between contexts).
>How-To-Repeat:
	My test case is a ~200 megabyte Postscript file generated by Acrobat
from a ~195 MB ".pdf" generated by scanning a page at 1200dpi 48-bit color.
On a 850MHz P3 box feeding a HP LaserJet 6MP w/35M, the current code generates
a constant 26,000 to 28,000 interrupts a second a transfers about 22K/sec to
complete the job in ~9700 seconds with the machine completely unresponsive
for the entire time.  With the changes below, the interrupt rate falls to
either 17 or 18 a second (i.e. by about one thousand and five hundred fold)
 and the transfer rate goes up to ~ 33K/sec, with the machine slow, but
responsive (new time is ~6400 seconds).

	% ls -ls page.pdf page.ps
191200 -rw-r--r--   1 root        wheel     195732947 Jul 17 06:39 page.pdf
197728 -rw-r--r--   1 root        wheel     202425804 Jul 16 21:55 page.ps
	% lptctl /dev/lpt1 mode standard dma  no ieee  no intr yes
	% time cat page.ps > /dev/lpt1

	These are VERY large test cases. AND the rate is almost exactly one
tenth of what would be expected, after the "fix" below ( is "DELAY(1)"
really one usec.? - I'll have to check).  Still, the machine is usable
during printing (just slow), versus unusable.
	Any relevant dmesg out but is below (note, there are ten com ports
sharing the interrupt, and the "atppc" is behind a "puc" -- a separate PR
will eventually address the "un-modified" code crashing machine where the
ECP BAR isn't mapped exact 0x400 past the base PP BAR - like below).

--------------------------------------------------------------------------------
NetBSD 2.0G (SVCS) #235: Sat Jul 17 19:09:28 PDT 2004
        root@svcs:/sys/arch/i386/compile/SVCS
total memory = 2047 MB
avail memory = 1996 MB
...
000:11:1 0x1415 0x9523
        10h port 0x00006800 0x00000008
        14h port 0x00006400 0x00000004
        18h port 0x00006000 0x00000020
        1ch mem  0xeb800000 0x00001000
...
cpu0: Intel Pentium III (686-class), 856.07 MHz, id 0x683
...
puc0 at pci0 dev 7 function 0: SIIG PS8000 8S PCI 16C650 (20x family) (com, com,
 com, com, com, com, com, com)
com2 at puc0 port 0: interrupting at irq 10
com2: st16650a, working fifo
com3 at puc0 port 1: interrupting at irq 10
com3: st16650a, working fifo
com4 at puc0 port 2: interrupting at irq 10
com4: st16650a, working fifo
com5 at puc0 port 3: interrupting at irq 10
com5: st16650a, working fifo
com6 at puc0 port 4: interrupting at irq 10
com6: st16650a, working fifo
com7 at puc0 port 5: interrupting at irq 10
com7: st16650a, working fifo
com8 at puc0 port 6: interrupting at irq 10
com8: st16650a, working fifo
com9 at puc0 port 7: interrupting at irq 10
com9: st16650a, working fifo
...
puc1 at pci0 dev 11 function 0: Oxford Semiconductor OX16PCI952 UARTs (com, com)
com10 at puc1 port 0: interrupting at irq 10
com10: st16650a, working fifo
com11 at puc1 port 1: interrupting at irq 10
com11: st16650a, working fifo
puc1 at pci0 dev 11 function 0: Oxford Semiconductor OX16PCI952 UARTs (com, com)
com10 at puc1 port 0: interrupting at irq 10
com10: st16650a, working fifo
com11 at puc1 port 1: interrupting at irq 10
com11: st16650a, working fifo
puc2 at pci0 dev 11 function 1: Oxford Semiconductor OX16PCI952 Parallel port (l
pt)
atppc1 at puc2 port 0: AT Parallel Port
atppc1: interrupting at irq 10
atppc1: FIFO <depth,wthr,rthr>=<16,8,8>
atppc1: capabilities=3d<INTR,FIFO,PS2,ECP,EPP>
ppbus1 at atppc1
ppbus1: IEEE1284 negotiation: modes NIBBLE/ECP
ppbus1: IEEE1284 device found.
ppbus1: Probing for PnP devices.
ppbus1: <Hewlett-Packard HP LaserJet 6MP> PJL,MLC,PCLXL,PCL,POSTSCRIPT
lpt1 at ppbus1: port mode = 1<COMPATIBLE>
...

>Fix:
144d143
< static int atppc_std_intr_handler(struct atppc_softc * const);
641,649c640
< 		if (atppc->sc_use & ATPPC_USE_INTR) {
< 			claim = atppc_std_intr_handler(atppc);
< 			if (atppc->sc_outbstart
< 			   >= (atppc->sc_outb + atppc->sc_outb_nbytes))
< 				wake_up = WRITER;
< 			else
< 				wake_up = NONE;
< 		}
< 		else if (atppc->sc_outb)
---
> 		if (atppc->sc_outb)
2015,2102d2005
< /* Write the byte currentedly pointed to in the softc's buffer in "STD" mode. */
< static void
< atppc_std_write_byte(struct atppc_softc * const atppc, unsigned char ctr)
< {
< 	/*
< 	 * NOTE: the two microseconds of delays puts a 500KHz upper bound
< 	 * on the transfer speed!  If we had a "DELAY_ONE_HALF_US()" and
< 	 * a "DELAY_ONE_QUARTER_US()", we'd more than increase the maximum
< 	 * rate by more than double (or cut the overhead by at about five
< 	 * eighths).
< 	 */
< 	/* Put data in data register */
< 	atppc_w_dtr(atppc, *(atppc->sc_outbstart));
< 	atppc_barrier_w(atppc);
< 	DELAY(1); /* DELAY_ONE_QUARTER_US()/DELAY(.25) */
< 
< 	/* Pulse strobe to indicate valid data on lines */
< 	ctr |= STROBE;
< 	atppc_w_ctr(atppc, ctr);
< 	atppc_barrier_w(atppc);
< 	DELAY(1); /* DELAY_ONE_HALF_US()/DELAY(.5) */
< 	ctr &= ~STROBE;
< 	atppc_w_ctr(atppc, ctr);
< 	atppc_barrier_w(atppc);
< 
< 	return;
< }
< 
< /*
<  * If the peripheral is ready, try to write as many bytes as we can (in
<  * "STD" mode) during one interupt request. NOTE: There is no need to go
<  * between the "top" and "bottom" parts of the driver as long as the buffer
<  * isn't empty.
<  */
< static int
< atppc_std_intr_handler(struct atppc_softc * const atppc)
< {
< 	unsigned char str;
< 	unsigned char ctr;
< 	int any;
< 
< 	/*
< 	 * NOTE: the three microseconds of delays puts a 333KHz upper bound
< 	 * on the transfer speed!  If we had a "DELAY_ONE_HALF_US()" and a
< 	 * "DELAY_ONE_QUARTER_US()", we'd triple the maximum rate (or cut
< 	 * the overhead by two thirds).
< 	 */
< 	ctr = atppc_r_ctr(atppc);
< 	atppc_barrier_r(atppc);
< 
< 	any = 0;
< 	/* try to "chain" interrupts together */
< 	while (atppc->sc_outbstart < (atppc->sc_outb + atppc->sc_outb_nbytes)) {
< 		str = atppc_r_str(atppc);
< 		atppc_barrier_r(atppc);
< 		if ((str & SPP_MASK) != SPP_READY)
< 			break;
< 		atppc_std_write_byte(atppc, ctr);
< 		atppc->sc_outbstart++;
< 		any = 1;
< 		/* Give time for the peripheral to respond */
< 		DELAY(1); /* DELAY_ONE_QUARTER_US()/DELAY(.25) */
< 	}
< 	return any;
< }
< 
< /* Sleep the write while the `real' work occurs */
< static int
< atppc_wait_buf_done_or_error(struct atppc_softc * const atppc,
< 			     const caddr_t where)
< {
< 	int error;
< 
< 	/*
< 	 * NOTE: HP DeskJets can refuse data for greater than 45 seconds
< 	 * (ex. my PhotoSmart P1100 regularly pauses and refuses data for 35
< 	 * to 40 seconds on large jobs, even longer when duplexing).
< 	 */
< 	/* Wait for interrupt for 20 * MAXBUSYWAIT - after all we're sleeping */
< 	error = ltsleep(where, PPBUSPRI | PCATCH, __func__, 20 * MAXBUSYWAIT,
< 		ATPPC_SC_LOCK(atppc));
< 
< 	if (!error && (atppc->sc_irqstat & ATPPC_IRQ_nACK))
< 		atppc->sc_irqstat &= ~ATPPC_IRQ_nACK;
< 
< 	return error;
< }
<  
2126,2127c2029,2030
< 	if (atppc->sc_use & ATPPC_USE_INTR) {
< 		/* Wait for peripheral to become ready */
---
> 	while (atppc->sc_outbstart < (atppc->sc_outb + atppc->sc_outb_nbytes)) {
> 		/* Wait for peripheral to become ready for MAXBUSYWAIT */
2132,2134c2035,2038
< 		/* Start the sequence with the first byte */
< 		atppc_std_write_byte(atppc, ctr);
< 		atppc->sc_outbstart++;
---
> 		/* Put data in data register */
> 		atppc_w_dtr(atppc, *(atppc->sc_outbstart));
> 		atppc_barrier_w(atppc);
> 		DELAY(1);
2136,2154c2040,2053
< #if 0
< 		atppc->sc_outerr = atppc_wait_interrupt(atppc,
< 			atppc->sc_outb, ATPPC_IRQ_nACK);
< #else
< 		/*
< 		 * Put the writer to sleep until the buffers empty, or an error
< 		 * occurs.
< 		 */
< 		atppc->sc_outerr = atppc_wait_buf_done_or_error(atppc,
< 								atppc->sc_outb);
< #endif /* #if 0 */
< 		if (atppc->sc_outerr)
< 			return;
< 	} else {
< 		while (atppc->sc_outbstart
< 		      < (atppc->sc_outb + atppc->sc_outb_nbytes)) {
< 			/* Wait for peripheral to become ready */
< 			atppc->sc_outerr = atppc_poll_str(atppc, SPP_READY,
< 							  SPP_MASK);
---
> 		/* Pulse strobe to indicate valid data on lines */
> 		ctr |= STROBE;
> 		atppc_w_ctr(atppc, ctr);
> 		atppc_barrier_w(atppc);
> 		DELAY(1);
> 		ctr &= ~STROBE;
> 		atppc_w_ctr(atppc, ctr);
> 		atppc_barrier_w(atppc);
> 
> 		/* Wait for nACK for MAXBUSYWAIT */
> 		timecount = 0;
> 		if (atppc->sc_use & ATPPC_USE_INTR) {
> 			atppc->sc_outerr = atppc_wait_interrupt(atppc,
> 				atppc->sc_outb, ATPPC_IRQ_nACK);
2157,2162c2056
< 
< 			atppc_std_write_byte(atppc, ctr);
< 			atppc->sc_outbstart++;
< 
< 			/* Wait for nACK for MAXBUSYWAIT */
< 			timecount = 0;
---
> 		} else {
2170,2172d2063
< 
< 			/* Update buffer position, byte count and counter */
< 			atppc->sc_outbstart++;
2174a2066,2067
> 		/* Update buffer position, byte count and counter */
> 		atppc->sc_outbstart++;
>Release-Note:
>Audit-Trail:
>Unformatted: