Subject: kern/26706: audio at neo: neo mismanages record intrs, audio drops record data
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <chap0@cs.purdue.edu>
List: netbsd-bugs
Date: 08/18/2004 14:57:29
>Number:         26706
>Category:       kern
>Synopsis:       audio at neo: neo mismanages record intrs, audio drops record data
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 18 17:25:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Chapman Flack
>Release:        i386 1.6.2 (but I just checked same code on cvs head)
>Organization:
Purdue CS
>Environment:
Machine is at home, can't run uname.  Dell Latitude CPi R400GT with NeoMagic MagicMedia 256ZX and audio at neo.  I reference the affected lines in neo.c and audio.c below so this pr should be unambiguous.  I cut and paste these lines from today's cvs HEAD.
>Description:
This pr describes two problems I had to correct with recording using audio at neo.  They seem to be ancient in CVS - I wonder if recording gets tested much ....

The first problem was that the userland reader process (whether audiorecord or just cat /dev/audio) simply blocked forever, with no data ever transferred.  Looking at audioctl -a while the reader was blocked showed the record error count constantly increasing, at the audio sample rate - the driver was simply dropping everything that came in, adding it to the drop statistic, and never satisfying the userland read that was active the whole time.

Here are some lines from /sys/dev/audio.c, function audio_rint:

	cb->used += blksize;
...
	if (cb->pause) {
		DPRINTFN(1, ("audio_rint: pdrops %lu\n", cb->pdrops));
...
	} else if (cb->used + blksize >= cb->usedhigh && !cb->copying) {
		DPRINTFN(1, ("audio_rint: drops %lu\n", cb->drops));

These are the two conditions where audio decides to drop data. The first is obvious, and it gets counted separately in pdrops. No problem there.

The second condition drops data if cb->used (which has already been incremented by blksize), plus blksize (again), is > OR EQUAL usedhigh (the ringbuffer highwater setting) and !copying (copying is certainly false on the first interrupt because there's nothing to copy yet).

The initial highwater setting from audio_calcwater is:
		sc->sc_rr.end - sc->sc_rr.start - sc->sc_rr.blksize;
... that is, the ringbuffer size minus one block.

The ringbuffer and block sizes are partly influenced by the hardware, which is why this code might seem to work with other hw layers.  However, with neo, the initial blocksize winds up at 1/3 of the buffer size.  Which means the initial highwater setting is buffersize -1*blksize, or 2*blksize, and the drop-data condition is immediately satisfied on the very first interrupt (and forever after that).

To get past this roadblock, I changed the >= to a strict > in the drop condition.  That works fine for me, though I haven't decided if it's the only or best way to fix the fundamental bug.  (Fundamentally I have my doubts about the overall design and maintainability of this audio.c, more later.)

That led me to the second problem.  This one's in neo.c and it's quite simple, it doesn't lead me to question the overall soundness of the code.  In /sys/dev/pci/neo.c, first look at the (working) play branch of the interrupt handler, neo_intr:

	if (status & sc->playint) {
		status &= ~sc->playint;

		sc->pwmark += sc->pblksize;
		sc->pwmark %= sc->pbufsize;

		nm_wr_4(sc, NM_PBUFFER_WMARK, sc->pbuf + sc->pwmark);

		nm_ackint(sc, sc->playint);

		if (sc->pintr)
			(*sc->pintr)(sc->parg);

		rv = 1;
	}

The neo is an interesting chip.  It constantly runs around your ringbuffer transferring data, and interrupts you now and then so you know what's been transferred.  You can't tell it a blocksize and have it interrupt for every block; instead, you write its WMARK register with some address within your ringbuffer and it interrupts you whenever it passes that address.  The code above, which works, gets interrupt-per-block behavior by doing this: keep a local copy of the wmark, advance it by blksize on every interrupt, and write the updated value to the chip's WMARK register so another interrupt is triggered one block later.

What's missing from the code in the record branch?

	if (status & sc->recint) {
		status &= ~sc->recint;

		sc->rwmark += sc->rblksize;
		sc->rwmark %= sc->rbufsize;

		nm_ackint(sc, sc->recint);
		if (sc->rintr)
			(*sc->rintr)(sc->rarg);

		rv = 1;
	}

Here the code advances its own local variable rwmark, but never gets around to telling the chip about it!  The chip's mark register still has the same value, and the chip won't interrupt again until it has run completely around the buffer and passed the same point again.  Because the initial buffer size is 3*blocksize, this means one interrupt is generated for every 3 blocks of record data.  But every interrupt calls sc->rintr (that's audio.c:audio_int) which, as we saw above, unconditionally does cb->used += blksize; that is, the fixed assumption is each interrupt represents ONE block.

The result is just what you would expect:
  dd if=/dev/sound bs=8000 count=10 of=foo
should complete in 10 seconds at a sample rate of 8000, but it really completes in 30.  The file foo is 80000 bytes and takes the correct 10 seconds to play back; what you hear is regular 3/4 sec samples from every 2 1/4 seconds of original input, smushed together.

There's no question what the right fix is in this case.  :)

		sc->rwmark += sc->rblksize;
		sc->rwmark %= sc->rbufsize;
+		nm_wr_4(sc, NM_PBUFFER_RMARK, sc->rbuf + sc->rwmark);
		nm_ackint(sc, sc->recint);

According to CVS, neo.c's been missing this line since r1.1.  :)

Now back to that larger issue of audio.c design.

There are two significant pieces of function here that I think would benefit from being factored out.  The ringbuffer manipulation is very elaborate and it doesn't convince me.  The design doesn't seem to be clear on what the low and high water marks are for (NB the audio.c water marks have nothing to do with the neo chip water marks), and the code isn't clearly doing the same thing the manpage says, either.

The basic piece of function here is a ringbuffer that has a DMA hardware device constantly running around it, with some asynchronous indication of the hardware's progress; the hw never stops running around the buffer but just flings interrupts as it flies by, so receiving an interrupt lets you make a conservative estimate how much space has been consumed.
Meanwhile, you want to satisfy uios whenever you can with available ringbuffer space.  Block size is not a fundamental concept here; the device doesn't care with what size transfers you fill or empty the buffer as long as you keep up, and there's no reason you can't fill or empty from a uio of any arbitrary size.  Making a 'blocksize' number available to userland query may be useful for letting the userland pick a reasonable transfer size for low overhead, but this is performance tuning only and should not change the normal semantics of read(2) and write(2) for any transfer size at all.  Similarly with high/low water settings; these may be useful for some performance tuning but the meanings have to be very clearly defined and reasonable.  An obvious reasonable use is to delay SIGIOs until there's a decent amount available; a possibly reasonable use is to delay poll() notification the same way.  What doesn't make as much sense is using the watermark as an excuse to block or co
 ntinue to block an existing read() or write() when you know you could satisfy it!

A dma_ringbuffer structure like that would probably be useful to other types of hardware and other drivers besides audio, and is probably just complicated enough to be worth doing once and getting the semantics clearly clean; drivers like audio should just use it.  I can't offer to put it together in the immediate future ... maybe later in the year ... I'm certainly willing to help hash out design details with anyone else interested.

Another factorable bit of function in audio.c is data conversion.
The conversions are all simple transformations (possibly expanding or
contracting) on a stream of data with a hw driver on one end and a userland request at the other.  I'm not sure I should breathe the word STREAMS, but surely netbsd has some in-kernel facility for accomplishing the same kind of thing without requiring the kind of ad-hoc code that's in audio.c, right?  In a pinch the uio api itself could do the trick, though there's probably room for more efficient ideas.
>How-To-Repeat:
audio at neo before audio.c fix applied:  cat /dev/audio
no output, cat blocks until killed, audioctl shows record error count (dropped data) incrementing at the full sampling rate.

audio.c fix applied but neo.c not fixed:
dd if=/dev/audio of=foo bs=8000 count=10
command completes in 30 secs rather than 10, file foo plays back in 10 secs but contains regular 3/4 sec samples from each 2 1/4 secs of original input.
>Fix:
audio.c: change >= to > in:
	} else if (cb->used + blksize >= cb->usedhigh && !cb->copying) {

this is more a 'workaround' than 'fix' - I am not convinced the overall audio.c design is sound except for this, or that this is really the right way to fix it, but it gets audio at neo recording in a pinch.

neo.c:
add the line marked + in the record branch of neo_intr:
		sc->rwmark += sc->rblksize;
		sc->rwmark %= sc->rbufsize;
+		nm_wr_4(sc, NM_PBUFFER_RMARK, sc->rbuf + sc->rwmark);
		nm_ackint(sc, sc->recint);

This is a 'fix' - no serious design doubts about neo.c, it's just missing this line.
>Release-Note:
>Audit-Trail:
>Unformatted: