Subject: Faster serial ports.
To: tech-kern@netbsd.org, Netbsd pc532 list <port-pc532@netbsd.org>
From: Ian Dall <Ian.Dall@dsto.defence.gov.au>
List: tech-kern
Date: 01/27/1999 10:41:24
I have been experimenting with passing data up from the device driver
to the line discipline in chunks rather than a byte at a time.

The pc532 supports 230.4 kbaud serial I/O with the right DUARTs. These
also have quite good hardware flow control support. I looped two ports
together and ran ppp between them at the maximum rate.

When ftp'ing over this link I was only getting about 8kB/s. vmstat
shows that the CPU is close to 100% in system state. Profiling the
kernel showed that about 90% of the CPU time is in children of the
soft interrupt handler and most of that time is spent in pppinput. A
faster CPU would, of course, give more throughput, but I would expect
the proportion of time spent in pppinput to be roughly the same.

So, I have modified the line discipline structure to take an extra
callback (l_brint). To take advantage of this, drivers and line
disciplines need to be modified, but the old l_rint is retained, so
new drivers work with old line disciplines and vice versa.
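
A minimal sketch of how that fallback might look from the driver side.
Only the names l_rint and l_brint come from the description above; the
l_brint signature, the cut-down struct linesw_sketch, the helper
deliver_input and the toy disciplines are assumptions made for
illustration and are not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

struct tty;	/* opaque here; the real kernel structure is much larger */

/* Hypothetical cut-down line discipline entry (not the real linesw). */
struct linesw_sketch {
	/* existing per-byte input routine, always present */
	int (*l_rint)(int c, struct tty *tp);
	/* assumed block input routine; NULL for an old discipline */
	int (*l_brint)(const unsigned char *buf, size_t len, struct tty *tp);
};

/* Driver-side helper: hand one received chunk to the line discipline. */
static int
deliver_input(const struct linesw_sketch *ld, const unsigned char *buf,
    size_t len, struct tty *tp)
{
	size_t i;

	assert(ld != NULL && ld->l_rint != NULL);
	if (ld->l_brint != NULL)
		return ld->l_brint(buf, len, tp);	/* one call per chunk */
	for (i = 0; i < len; i++)		/* old discipline: per byte */
		(void)ld->l_rint(buf[i], tp);
	return 0;
}

/* Toy disciplines that only count calls and bytes. */
static int rint_calls, brint_calls;
static size_t bytes_seen;

static int
toy_rint(int c, struct tty *tp)
{
	(void)c;
	(void)tp;
	rint_calls++;
	bytes_seen++;
	return 0;
}

static int
toy_brint(const unsigned char *buf, size_t len, struct tty *tp)
{
	(void)buf;
	(void)tp;
	brint_calls++;
	bytes_seen += len;
	return 0;
}
```

An old driver needs no change in this scheme: it keeps calling l_rint
once per byte, which a new discipline still provides.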

I have been able to approximately double throughput on the above ppp
test, though I am continuing to optimize the block version of
pppinput (pppbinput).

Is there interest in adopting this? There are a couple of design
decisions which could be debated, but the main thing I wanted to do at
this point is demonstrate the gains that are achievable to motivate
discussing the details.

On a related front, now that the input side has been sped up, one of
the largest consumers of time is getc. This is called on the output
side by scnstart in a loop to take characters off the output queue and
stick them in the UART's output FIFO. getc seems to be fairly
heavyweight and I am wondering if there is a better way. For example,
getc raises spltty, but we are already at spltty. Also, I don't think
the quote stuff is relevant on output, is it? I don't really want the
driver to have to know about the details of the output queue
implementation, but a set of macros which a) provided a pointer to the
data, b) told one how much contiguous data there was, and c) flushed n
bytes from the queue, could all be implemented cheaply if they are
always called at spltty and quoting can be ignored.
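
A rough sketch of what such a macro set could look like. It uses a
plain ring buffer as a stand-in for the real output queue, so the
struct outq_sketch layout, the OQ_* macro names and the fill_fifo
helper are all hypothetical. It assumes the caller is already at
spltty and that quoting can be ignored, as described above.

```c
#include <assert.h>
#include <stddef.h>

#define OQ_SIZE 16	/* arbitrary size for the sketch */

/* Simplified stand-in for the output queue (not the real clist). */
struct outq_sketch {
	unsigned char buf[OQ_SIZE];
	size_t head;	/* index of the next byte to send */
	size_t count;	/* bytes currently queued */
};

/* a) pointer to the first queued byte */
#define OQ_PTR(q)	(&(q)->buf[(q)->head])
/* b) bytes available from that pointer without wrapping */
#define OQ_CONTIG(q)	((q)->count < OQ_SIZE - (q)->head ? \
			    (q)->count : OQ_SIZE - (q)->head)
/* c) drop n bytes from the front; caller ensures n <= OQ_CONTIG(q) */
#define OQ_FLUSH(q, n)	do {					\
	(q)->head = ((q)->head + (n)) % OQ_SIZE;		\
	(q)->count -= (n);					\
} while (0)

/*
 * Hypothetical start routine body: move up to fifo_room bytes from the
 * queue into fifo, one contiguous run at a time, with no per-byte
 * function call. Returns the number of bytes moved.
 */
static size_t
fill_fifo(struct outq_sketch *q, unsigned char *fifo, size_t fifo_room)
{
	size_t done = 0;

	while (done < fifo_room && q->count > 0) {
		size_t n = OQ_CONTIG(q);
		size_t i;

		if (n > fifo_room - done)
			n = fifo_room - done;
		assert(n > 0);
		for (i = 0; i < n; i++)
			fifo[done + i] = OQ_PTR(q)[i];
		OQ_FLUSH(q, n);
		done += n;
	}
	return done;
}
```

The driver sees only the three macros, so the queue layout could change
without touching the driver.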

Ian