Subject: Re: Serial console hangs
To: None <port-i386@netbsd.org>
From: Andreas Gustafsson <gson@araneus.fi>
List: port-i386
Date: 09/01/1998 16:04:51
On August 15, Bill Studenmund <skippy@macro.stanford.edu> said:

> On Sat, 30 May 1998, David Stanhope wrote:
> 
> > I have seen this many times, and my analysis goes as follows:
> > 
> > in the kernel putc routine in 'com.c', assuming a program is outputting
> > data (something started by 'init') and then the kernel wants to do a
> > 'putc' what happens is:
> 
> > putc waits for the uart to have room, then outputs the character, then
> > clears the interrupt flag, the problem with this is that this will keep
> > an interrupt from signalling to the com interrupt handler that it needs
> > to load more characters. When this hang occurs output will still come
> 
> I'm confused. Shouldn't another interrupt be generated once the character
> output by putc gets sent? Then there'd be an interrupt for the tty layer to
> process.

After writing the character to the UART, the putc routine spins in a
loop waiting for the transmission to finish.  Only then does it clear
the interrupt.  This guarantees there will be no interrupt after putc
has finished.

> Could this be the scenario:
> 
> 1) com fifo drains, generating an interrupt
> 2) com chip transmits last character, so transmitter's empty as well as
> 	fifo
> 3) computc transmits byte
> 4) since transmitter's empty, byte goes to transmitter, and fifo
> immediately becomes empty, generating an interrupt
> 5) computc clears pending interrupt

Perhaps, but the current race can be explained even if the FIFO does
not become empty immediately, because the driver specifically waits
for it to empty.

> If the scenario I mentioned is right, just have computc test to see if the
> fifo's empty after transmitting. If so, don't clear the pending interrupt.
> Though this can be a nasty race condition. :-)

Because the driver waits for the output to finish, the FIFO is
_always_ empty after transmitting.  Therefore, your proposal is
equivalent to never clearing the interrupt, which is precisely what
David suggested.

In other words, the solution is simply to remove the
'bus_space_read_1' of 'com_iir' at the end of 'com_common_putc',
which clears any pending interrupt (as suggested in PR #4263).
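
To make the race concrete, here is a toy model in C.  It is only a
sketch and not the real com.c or a faithful 16550 model: the toy_*
names are invented for illustration, the transmitter holds a single
byte, and I am assuming that one read of the interrupt ID register
discards a latched "transmitter empty" interrupt.  toy_run() returns
the number of tty bytes left stranded once nothing more can happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy transmitter (hypothetical; one byte deep, no FIFO). */
struct toy_uart {
	bool tx_busy;		/* a byte is currently being shifted out */
	bool thre_pending;	/* "transmitter empty" interrupt is latched */
};

/* Write one byte; the caller must have waited for room first. */
static void toy_write(struct toy_uart *u)
{
	assert(!u->tx_busy);
	u->tx_busy = true;
}

/* The byte in flight completes and latches an interrupt. */
static void toy_finish_byte(struct toy_uart *u)
{
	if (u->tx_busy) {
		u->tx_busy = false;
		u->thre_pending = true;
	}
}

/* Assumed effect of reading the IIR: the latched interrupt is lost. */
static void toy_read_iir(struct toy_uart *u)
{
	u->thre_pending = false;
}

/*
 * Queue tty_bytes of ordinary output, let one console putc cut in,
 * then run the interrupt handler for as long as interrupts arrive.
 * Returns the number of tty bytes never sent (nonzero means a hang).
 * If irqs_handled is not NULL, it receives the handler call count.
 */
size_t toy_run(size_t tty_bytes, bool putc_clears_iir, size_t *irqs_handled)
{
	struct toy_uart u = { false, false };
	size_t queued = tty_bytes;
	size_t irqs = 0;

	/* The tty layer starts output by writing the first byte itself. */
	if (queued > 0) {
		toy_write(&u);
		queued--;
	}

	/* Console putc, running with interrupts blocked. */
	toy_finish_byte(&u);	/* wait for room; the tty byte drains */
	toy_write(&u);		/* output the console character */
	toy_finish_byte(&u);	/* spin until transmission finishes */
	if (putc_clears_iir)
		toy_read_iir(&u);	/* old behaviour: interrupt discarded */
	/* Interrupts are unblocked again here. */

	/* The handler runs only when an interrupt is actually pending. */
	while (u.thre_pending) {
		u.thre_pending = false;
		irqs++;
		if (queued > 0) {
			toy_write(&u);
			queued--;
			toy_finish_byte(&u);
		}
	}

	if (irqs_handled != NULL)
		*irqs_handled = irqs;
	return queued;
}
```

In this model toy_run(3, true, NULL) leaves two bytes stranded, while
toy_run(3, false, NULL) drains the queue.  With no tty output in
progress, toy_run(0, false, &n) sets n to 1, i.e. a single interrupt
that finds nothing to do.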

I have two machines that suffered from the serial console hang bug,
both with the console running at 4800 bps.  They both hung _every_
time I booted, but always resumed booting when I pressed a key on the
console keyboard (because the incoming character generates an
interrupt).  Applying the patch below fixed the problem with no
apparent ill side effects.

Could someone please commit this change and close PR #4263?

*** src/sys/dev/ic/com.c.mine.base	Sun Aug 16 14:11:11 1998
--- src/sys/dev/ic/com.c	Tue Sep  1 11:49:48 1998
***************
*** 2063,2071 ****
  	while (!ISSET(stat = bus_space_read_1(iot, ioh, com_lsr), LSR_TXRDY)
  	    && --timo)
  		;
! 
! 	/* clear any interrupts generated by this transmission */
! 	stat = bus_space_read_1(iot, ioh, com_iir);
  	splx(s);
  }
  
--- 2063,2074 ----
  	while (!ISSET(stat = bus_space_read_1(iot, ioh, com_lsr), LSR_TXRDY)
  	    && --timo)
  		;
!        /*
! 	* Do not clear pending interrupts, as doing so would hang
! 	* any non-console transmission that happened to be in progress
! 	* when the console output occurred.  Instead, there will be a benign
! 	* spurious interrupt in the case that no output was in progress.
! 	*/
  	splx(s);
  }
  
-- 
Andreas Gustafsson, gson@araneus.fi