Subject: kern/21184: com (RS-232C serial) locks the kernel when external device is turned off
To: None <gnats-bugs@gnats.netbsd.org>
From: None <itohy@netbsd.org>
List: netbsd-bugs
Date: 04/15/2003 00:25:30
>Number:         21184
>Category:       kern
>Synopsis:       com (RS-232C serial) locks the kernel when external device is turned off
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 14 08:26:00 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     ITOH Yasufumi
>Release:        NetBSD 1.6L
>Organization:
>Environment:
System: NetBSD pino.my.domain 1.6L NetBSD 1.6L (PINO) #376: Sun Apr 13 23:15:10 JST 2003 itohy@pino.my.domain:/amd/fmv/w/src/sys/arch/i386/compile/PINO i386
Architecture: i386
Machine: i386

from dmesg:
com0 at pnpbios0 index 14 (PNP0501)
com0: io 3f8-3ff, irq 4
com0: ns16550a, working fifo

>Description:
	On one of my machines (L: Toshiba Libretto 100), which serves
	as the serial console for an Alpha (A: AlphaStation),
	the kernel locks solid when A is turned off.
	The L kernel hangs completely: the keyboard, the network
	and the clock all stop.

	When this happens, the IIR register of the com device
	reads 0xcc or 0xc4 (a receive interrupt is pending;
	0xcc is a character timeout, 0xc4 is received data available),
	but the LSR register reads 0x60 (no data ready).

	The driver tries to clear the interrupt with a workaround
	(writing 0 to IER and then restoring it),
	but that does not work on this machine (L).
	So comintr() never completes, hence the hang.
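	To illustrate why comintr() never completes, here is a toy
	model of the UART on L, assuming (as the symptoms suggest)
	that toggling IER leaves the latched receive interrupt set.
	The struct and functions are hypothetical, and the real loop
	has no iteration bound; max_passes exists only so that the
	sketch terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the UART on L: a latched receive interrupt
 * is NOT cleared by writing IER.  This reproduces the reported
 * symptom; it is not taken from a data sheet. */
struct uart_model {
	uint8_t ier;
	bool rx_irq;	/* latched receive-interrupt condition */
};

static uint8_t model_iir(const struct uart_model *u)
{
	/* 0xcc: character timeout; 0xc1: FIFOs on, nothing pending */
	return u->rx_irq ? 0xcc : 0xc1;
}

static void model_write_ier(struct uart_model *u, uint8_t v)
{
	u->ier = v;	/* on this part, rx_irq is left untouched */
}

/* The pre-patch approach, simplified: clear IER, restore it, and
 * re-check IIR.  Returns the pass on which IIR showed no pending
 * interrupt, or -1 if it never did within max_passes. */
static int clear_by_ier_toggle(struct uart_model *u, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		uint8_t saved;

		if (model_iir(u) & 0x01)	/* no interrupt pending */
			return pass;
		saved = u->ier;
		model_write_ier(u, 0);
		/* the driver also waits delay(10) here */
		model_write_ier(u, saved);
	}
	return -1;
}
```

	A model started with rx_irq set returns -1 however large
	max_passes is, which is the software analogue of the hang.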

	This hang does not occur on all machines; it is probably
	caused by minor differences in behavior among
	com-compatible devices.

>How-To-Repeat:
	1. Connect Libretto 100 (L) and AlphaStation (A)
	   with a null-modem cable with flow control
	   (1-7, 2-3, 3-2, 4-8&6, 5-5, 6&8-4, 7-1).
	2. Add the following line to L's /etc/remote.
		dec:dv=/dev/tty00:br#9600:pa=none:dc:
	3. On L:
		# tip dec
	4. Turn A on, and then turn A off.
	5. The L kernel hangs.

	6. Turn A on again.
	7. The L kernel resumes working.

>Fix:
	The NS 16550 data sheet
	<http://www.national.com/pf/PC/PC16550D.html>
	says the receiver interrupt is cleared by reading
	the data register.
	Following the data sheet, the patch reads the data register
	(discarding the value) to clear the interrupt.
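	A simplified sketch of the patched check, run against a
	hypothetical register model of the stuck UART (IIR 0xcc,
	LSR 0x60).  The register offsets are the standard 16550 ones;
	struct stuck_uart, reg_read() and service() are made up for
	illustration and are not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16550 register offsets (per the data sheet). */
enum { REG_DATA = 0, REG_IIR = 2, REG_LSR = 5 };

#define IIR_NOPEND	0x01
#define IIR_RXRDY	0x04

/* Hypothetical model of L's UART in the stuck state: only a read of
 * the data register drops the receive interrupt. */
struct stuck_uart {
	bool rx_irq;
	int data_reads;	/* how many times the data register was read */
};

static uint8_t reg_read(struct stuck_uart *u, int reg)
{
	switch (reg) {
	case REG_DATA:
		u->data_reads++;
		u->rx_irq = false;	/* reading RBR clears the RX interrupt */
		return 0;		/* FIFO is empty: value is meaningless */
	case REG_IIR:
		return u->rx_irq ? 0xcc : 0xc1;
	case REG_LSR:
		return 0x60;		/* transmitter empty, no data ready */
	default:
		return 0xff;
	}
}

/* Simplified service loop following the patch: when IIR claims a
 * receive interrupt but LSR has no data, read and discard the data
 * register, then re-check.  Returns the number of passes, or -1 if
 * max_passes was hit. */
static int service(struct stuck_uart *u, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		uint8_t iir = reg_read(u, REG_IIR);
		uint8_t lsr;

		if (iir & IIR_NOPEND)
			return pass;
		lsr = reg_read(u, REG_LSR);
		if (!(lsr & 0x01) && (iir & 0x06) == IIR_RXRDY)
			(void)reg_read(u, REG_DATA);
	}
	return -1;
}
```

	Starting from the stuck state, service() returns on its second
	pass after exactly one discarded data-register read.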

	The test "(iir & 0x06) == IIR_RXRDY" matches both 0xcc and 0xc4.
	(The literal constant 0x06 should probably be a macro, though.)
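	The difference between the old and new comparisons can be
	checked directly.  This sketch assumes the driver's IIR_IMASK
	is 0x0f and IIR_RXRDY is 0x04 (compare sys/dev/ic/comreg.h);
	IIR_RX_ID_MASK is a hypothetical name for the 0x06 literal.

```c
#include <stdbool.h>
#include <stdint.h>

#define IIR_IMASK	0x0f	/* assumed: full interrupt-ID mask, bits 3..0 */
#define IIR_RXRDY	0x04	/* received data available */

/* Hypothetical macro for the 0x06 literal: ID bits 2..1 only,
 * ignoring bit 3 (timeout) and bit 0 (no interrupt pending). */
#define IIR_RX_ID_MASK	0x06

/* Comparison used before the patch. */
static bool old_test(uint8_t iir)
{
	return (iir & IIR_IMASK) == IIR_RXRDY;
}

/* Comparison introduced by the patch. */
static bool new_test(uint8_t iir)
{
	return (iir & IIR_RX_ID_MASK) == IIR_RXRDY;
}
```

	Under those assumptions the old test is true for 0xc4 but
	false for 0xcc, while the new one is true for both and still
	false for the line-status (0xc6), transmitter-empty (0xc2) and
	modem-status (0xc0) IDs.  The new test ignores bit 0, so it
	presumably relies on the surrounding loop having already
	handled the "no interrupt pending" case.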

--- src/sys/dev/ic/com.c.orig	Sat Mar 15 11:17:56 2003
+++ src/sys/dev/ic/com.c	Sun Apr 13 22:45:32 2003
@@ -2061,12 +2061,19 @@ comintr(void *arg)
 				bus_space_write_1(iot, ioh, com_ier, sc->sc_ier);
 			}
 		} else {
+#if 0
 			if ((iir & IIR_IMASK) == IIR_RXRDY) {
 				bus_space_write_1(iot, ioh, com_ier, 0);
 				delay(10);
 				bus_space_write_1(iot, ioh, com_ier,sc->sc_ier);
 				continue;
 			}
+#else
+			if ((iir & 0x06) == IIR_RXRDY) {
+				(void) bus_space_read_1(iot, ioh, com_data);
+				continue;
+			}
+#endif
 		}
 
 		msr = bus_space_read_1(iot, ioh, com_msr);
>Release-Note:
>Audit-Trail:
>Unformatted: