Subject: some ugen throughput numbers with the USRP
To: None <tech-kern@netbsd.org>
From: Joanne M Mikkelson <jmmikkel@bbn.com>
List: tech-kern
Date: 07/22/2006 01:58:03
For anyone who might be interested in the improvement seen using the
USRP with the new ugen functionality, here are a few numbers
collected with the USRP testing tools. Previously we could get about
4 MB/s on NetBSD while the specialized Linux "fusb" code would do 32
MB/s. Now we are doing better than Linux on the bidirectional test
but still not quite making 32 MB/s unidirectional.
The full set of measurements taken (along with this summary) is
available at:
http://acert.ir.bbn.com/viewvc/adroitgrdevel/adroitgrdevel/radio_test/usb/test-results?view=co
Summary
=======
The following USB throughput results were collected on two systems
with the same hardware, one running NetBSD-current with our ugen
changes and the other running SuSE Linux.
The ugen changes allow specifying the length of the transfer to
request from the host controller, and here the fusb_netbsd testing
code was recompiled with different request sizes. The fusb_linux code
uses 16k requests (and says that this is the largest request
possible). In both cases the USRP library's default buffer size of 2
MB was used. The ugen driver can also be changed to avoid one copy
into the driver's internal buffer; the "-copy" rows in the tables
below show how much performance improves in that case.
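
For concreteness, here is a minimal sketch of how a user-space
program might set these parameters. It assumes the bulk
read-ahead/write-behind interface along the lines of what ugen(4)
documents (USB_SET_BULK_RA and friends, struct usb_bulk_ra_wb_opt);
the exact ioctl names and the endpoint number are assumptions for
illustration, not a quote of the fusb_netbsd code.

    /*
     * Sketch: enable bulk read-ahead on a ugen endpoint and set the
     * in-kernel buffer size and the per-transfer request size.
     * Endpoint 6 as the USRP's bulk-in endpoint is an assumption.
     */
    #include <sys/ioctl.h>
    #include <dev/usb/usb.h>
    #include <err.h>
    #include <fcntl.h>

    int
    main(void)
    {
        struct usb_bulk_ra_wb_opt opt;
        int on = 1;
        int fd = open("/dev/ugen0.06", O_RDONLY);

        if (fd == -1)
            err(1, "open /dev/ugen0.06");

        opt.ra_wb_buffer_size = 2 * 1024 * 1024;  /* 2 MB in-kernel buffer */
        opt.ra_wb_request_size = 64 * 1024;       /* 64k transfer requests */
        if (ioctl(fd, USB_SET_BULK_RA_OPT, &opt) == -1)
            err(1, "USB_SET_BULK_RA_OPT");
        if (ioctl(fd, USB_SET_BULK_RA, &on) == -1)
            err(1, "USB_SET_BULK_RA");

        /* read(2) now drains the driver's read-ahead buffer. */
        return 0;
    }
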
For reference, here is how interpolation/decimation relates to the
intended data rate:
data rate | decimation | interpolation
--------------------------------------
16   MB/s      16            32
18.3 MB/s      14            28
21.3 MB/s      12            24
25.6 MB/s      10            20
32   MB/s       8            16
42.6 MB/s       6            12
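
These rates follow from the converter clocks and the sample format
(assuming the usual USRP configuration of 64 MS/s ADCs, 128 MS/s
DACs, and 4-byte complex samples, i.e. 16-bit I plus 16-bit Q):

    RX: 64 MS/s / decimation     * 4 bytes, e.g. 64/16  * 4 = 16 MB/s
    TX: 128 MS/s / interpolation * 4 bytes, e.g. 128/32 * 4 = 16 MB/s
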
benchmark_usb.py (bidirectional test)
driver       | xfer size | maximum (read+write)
-----------------------------------------------
NetBSD         16k         32 MB/s
Linux          16k         36.57 MB/s
NetBSD         64k         32 MB/s (usually gets 36.57)
NetBSD         128k        32 MB/s
NetBSD -copy   16k         32 MB/s
NetBSD -copy   64k         42.6 MB/s
NetBSD -copy   128k        42.6 MB/s
test_standard_usrp_rx
driver       | xfer size | maximum (MB/s)
-----------------------------------------
NetBSD         16k         21.3
Linux          16k         32
NetBSD         64k         25.6
NetBSD         128k        21.3
NetBSD -copy   16k         25.6
NetBSD -copy   64k         25.6
NetBSD -copy   128k        25.6
test_standard_usrp_tx
driver       | xfer size | maximum (MB/s)
-----------------------------------------
NetBSD         16k         21.3
Linux          16k         32
NetBSD         64k         25.6
NetBSD         128k        21.3
NetBSD -copy   16k         21.3
NetBSD -copy   64k         25.6
NetBSD -copy   128k        25.6
The Linux numbers suggest that about 36 MB/s of bandwidth is
available in total (maybe more, but less than 42), and that it must
be divided between transmit and receive. So 32 MB/s can be sustained
one-way, but as soon as bidirectional traffic is needed, neither
direction can reach 32. The USRP could probably be set up to split,
say, 25.6 and 8 MB/s between read and write instead of 16 and 16
(25.6 + 8 = 33.6 fits under 36), but not 25.6 and 16 (41.6 does not).
This follows fairly well from the implementation. On Linux, USRP
reads and writes are all done via a generic request mechanism
funneled through the control endpoint, so reads and writes in
aggregate appear to be constrained by how fast data can be pushed
through that one mechanism.
With our NetBSD implementation, reads and writes are handled
independently all the way down to the host controller driver, except
when the transactions go in lock-step and one of read and write has
to wait while the other's completion interrupt is being handled. The
bidirectional numbers are therefore closer to the sum of the two
unidirectional numbers, rather than bidirectional being essentially
equal to unidirectional as we see with Linux.
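
To illustrate the independent paths, here is a rough sketch of the
user-space side: a reader and a writer running concurrently on
separate ugen endpoint nodes, with no shared state between them. The
device names (the USRP's EP6 bulk-in as /dev/ugen0.06 and EP2
bulk-out as /dev/ugen0.02) are assumptions for illustration.

    /*
     * Sketch: independent read and write loops on separate ugen
     * endpoint device nodes.  Build with -lpthread.
     */
    #include <err.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    #define XFER 16384

    static void *
    rx_loop(void *arg)
    {
        int fd = *(int *)arg;
        char buf[XFER];

        /* Each read drains the driver's read-ahead buffer. */
        while (read(fd, buf, sizeof(buf)) > 0)
            continue;
        return NULL;
    }

    int
    main(void)
    {
        pthread_t t;
        char buf[XFER];
        int rfd = open("/dev/ugen0.06", O_RDONLY);  /* bulk in */
        int wfd = open("/dev/ugen0.02", O_WRONLY);  /* bulk out */

        if (rfd == -1 || wfd == -1)
            err(1, "open");
        memset(buf, 0, sizeof(buf));
        if (pthread_create(&t, NULL, rx_loop, &rfd) != 0)
            errx(1, "pthread_create");

        /*
         * Writes queue into the write-behind buffer independently of
         * the reader; neither loop takes a lock in user space.
         */
        for (;;)
            if (write(wfd, buf, sizeof(buf)) == -1)
                err(1, "write");
    }
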
The NetBSD numbers show that 128k transfers perform worse than 64k.
As expected, 128k transfers are no longer worse once the extra copy
is removed, but they are not notably better either. So while copying
128k at a time clearly costs more than copying 64k, a lot of the cost
is not in the copy at all, since the numbers do not improve
dramatically when the copy is removed. That remaining cost is what
prevents us from reaching unidirectional rates comparable to Linux.
Copying to and from user space does not appear to be the bottleneck;
the kernel debug logs clearly show that in these tests user space
consumes and produces data faster than the bus can carry it.
Choosing a Good Buffer Size
===========================
The previous results all use a buffer size of 2 MB (with
fusb_netbsd, that is 2 MB for each of read and write). Also, all
reads and writes from user space were 16k. The following tests
indicate that the user-space read and write length does not matter
very much; however, reducing the buffer size from 2 MB demonstrably
helps bidirectional throughput.
Because the highest rate reached is not always the same, these
results reflect several runs of benchmark_usb.py: the maximum rate is
based on what benchmark_usb.py reported over five runs, allowing for
the fact that all of the higher transfer rates occasionally report
underruns or overruns.
driver | xfer size | buffer size | maximum rate (MB/s)
------------------------------------------------------
NetBSD    16k         2M            32
NetBSD    64k         2M            32
NetBSD    128k        2M            32
NetBSD    16k         1M            32
NetBSD    32k         1M            36.57
NetBSD    64k         1M            36.57
NetBSD    128k        1M            32
NetBSD    32k         256k          36.57
NetBSD    64k         256k          42.6
NetBSD    32k         128k          36.57
NetBSD    64k         128k          42.6
NetBSD    32k         64k           36.57
NetBSD    64k         64k           36.57
NetBSD    16k         64k           32
NetBSD    4k          64k           32
NetBSD    4k          32k           32
It appears that the best performance in these tests comes from 64k
transfers and a 256k buffer. The same is true with the copy removed,
although larger buffer and transfer sizes then show an improvement as
well:
driver       | xfer size | buffer size | maximum rate (MB/s)
------------------------------------------------------------
NetBSD -copy    16k          2M           32
NetBSD -copy    64k          2M           42.6
NetBSD -copy    128k         2M           42.6
NetBSD -copy    64k          1M           42.6
NetBSD -copy    128k         1M           42.6
NetBSD -copy    32k          256k         42.6
NetBSD -copy    64k          256k         42.6
NetBSD -copy    32k          128k         36.57
NetBSD -copy    64k          128k         42.6
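
Under the same assumptions as the earlier sketch, applying the best
observed configuration would just change the option values before
enabling read-ahead (and correspondingly for write-behind):

    opt.ra_wb_buffer_size = 256 * 1024;   /* 256k buffer */
    opt.ra_wb_request_size = 64 * 1024;   /* 64k transfers */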