Subject: Re: Thread benchmarks
To: Andrew Doran <ad@netbsd.org>
From: Kris Kennaway <kris@FreeBSD.org>
List: tech-kern
Date: 10/02/2007 00:26:55
Andrew Doran wrote:

>> In tests that have been run on p4 hardware, the FreeBSD system's graph
>> looks more like NetBSD's than the one presented here.  FreeBSD's kernel
>> has a lot of debugging options that hurt performance on by default.  Also,
>> FreeBSD's malloc defaults to 'AJ' in head, which would result in reduced
>> performance.
> 
> I can try turning off debugging in the allocator. What else would you like
> me to try? I would like to provide remote access to the two systems but
> unfortunately my Internet link is unreliable and I'm not in a position to
> leave them on 24x7. Some details on the test. I grabbed my.cnf from Jeff
> Roberson's weblog:

You should rebuild malloc with MALLOC_PRODUCTION defined (edit
lib/libc/stdlib/malloc.c), and also make sure that /etc/malloc.conf is
either removed or symlinked to 'aj'.  This is pretty important.
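
A minimal sketch of the malloc.conf step, assuming the usual convention
that the option string is the target of the /etc/malloc.conf symlink and
that lowercase letters switch the corresponding options off.  The
function names are made up for illustration, and the optional second
argument only exists so the helper can be tried on a scratch path first:

```shell
# Point the malloc option symlink at the given flags.
# $1 = flags (e.g. aj), $2 = symlink path (defaults to /etc/malloc.conf).
set_malloc_flags() {
    flags=$1
    conf=${2:-/etc/malloc.conf}
    ln -sf "$flags" "$conf"
}

# Print the flags currently in effect; prints nothing if no symlink exists.
show_malloc_flags() {
    readlink "${1:-/etc/malloc.conf}"
}

# Usage (as root):
#   set_malloc_flags aj
#   show_malloc_flags
```

Removing the symlink instead (rm /etc/malloc.conf) should be equivalent
once libc has been rebuilt with MALLOC_PRODUCTION.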

Could you also provide a copy of your FreeBSD kernel configuration file 
just so we can double-check?

> 	http://people.freebsd.org/~jeff/bsd.cnf

OK, the only difference to my config is that I have

innodb_log_file_size=900M

instead of 100M.

> Relevant bits of dmesg from the MySQL host:
> 
> total memory = 2047 MB
> avail memory = 2008 MB
> cpu0: Intel Pentium III Xeon (686-class), 701.64 MHz, id 0x6a1
> cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
> cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
> cpu0: features 383fbff<FXSR,SSE>
> cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
> cpu0: L2 cache 1 MB 32B/line 8-way
> cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
> cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
> fxp0 at pci1 dev 6 function 0: i82559 Ethernet, rev 8
> fxp0: interrupting at ioapic0 pin 3 (irq 3)
> fxp0: Ethernet address 00:02:a5:45:a6:48
> inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> 
> The disk subsystem doesn't matter since I was running the read-only test,
> and with 10000 rows everything fits in core. I compiled MySQL by hand on
> each system:
> 
> ./configure --prefix=/local/mysql --with-pthread --with-innodb

OK.  The FreeBSD port also defines

                 --enable-thread-safe-client
                 --without-debug
                 --enable-assembler

(and some other options that don't look relevant).  --with-pthread might
enable the first option, but if it doesn't, that could cause performance
anomalies (this matters for the client, of course).  For example, I
recently built postgresql without threaded client support by accident
and spent a while trying to work out why sysbench suddenly ran at half
speed.

> Everything but necessary processes were killed on the two systems, so they
> were running at most sshd, screen, sysbench and the minimum to be able to
> log in. I did a warm-up run and then started testing:
> 
> for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do
>         echo "=> ${i} THREADS"
>         sysbench --test=oltp --db-driver=mysql --mysql-host=${HOST} \
>             --mysql-user=root --mysql-table-engine=innodb --num-threads=${i} \
>             --max-time=60 --max-requests=0 --oltp-read-only=on run | \
> 	    tee -a ${HOST}.txt
> done

I use

sysbench --test=oltp --num-threads=$1 --mysql-user=root --max-time=120 \
    --max-requests=0 --oltp-read-only=on --db-driver=mysql \
    --mysql-host=192.168.5.120 run

which seems to be equivalent (the default table engine is innodb in our 
config).

Can you run 'vmstat -w 1' for, say, 30 seconds on your FreeBSD system
while the test is running?  On my system I see total CPU usage at 100%,
with system at 20-25% and the rest user.
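
One way to capture and summarise that, as a rough sketch.  It assumes
FreeBSD's vmstat (-w for the interval, -c for the sample count) and that
the last three columns of each sample are us, sy and id.  The function
names and the default output file are invented here:

```shell
# Take 30 one-second samples and keep a copy in a file.
capture_vmstat() {
    vmstat -w 1 -c 30 | tee "${1:-vmstat.txt}"
}

# Average the trailing us/sy/id columns; header lines are skipped
# because their last field is not a number.
avg_cpu() {
    awk '$NF ~ /^[0-9]+$/ && NF >= 3 {
             us += $(NF-2); sy += $(NF-1); id += $NF; n++
         }
         END {
             if (n) printf "user=%.1f sys=%.1f idle=%.1f\n", us/n, sy/n, id/n
         }'
}

# Usage, while sysbench is running:
#   capture_vmstat freebsd-vmstat.txt | avg_cpu
```

If idle comes out well above zero under load, the box is waiting on
something other than CPU.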

> The two systems are connected via 100Mbps switch. The sysbench host was
> running NetBSD/i386 4.99.30 and has a dual core CPU:

I tested on a quad 500 MHz p3 (i.e. about 30% slower clock speed than
your system), via 100Mbps em0.  Performance was already at the level of
the FreeBSD curve on your graph (about 320 tps across a range of loads),
and if I scale up by 700/500 then it's about the same as your NetBSD
curve.  I suspect that this actually underestimates performance a bit,
because the CPU is an older generation than yours, so the difference is
not just clock speed.  One thing that is kind of interesting is that
some of the locking optimizations that we have not yet committed don't
make a difference on this machine and workload.  Apparently they are
only important at 8 CPUs and above.
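
The clock scaling above, written out as a quick sketch.  It is purely
linear in MHz and ignores the generation and cache differences, and
scale_tps is just a name picked for the example:

```shell
# scale_tps TPS FROM_MHZ TO_MHZ
# Linear clock scaling only; says nothing about microarchitecture.
scale_tps() {
    awk -v tps="$1" -v from="$2" -v to="$3" \
        'BEGIN { printf "%.0f\n", tps * to / from }'
}

scale_tps 320 500 700    # prints 448
```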

Anyway, this all suggests to me that something is going wrong on your 
system, so if the above doesn't help then we'll have to look closer. 
One other possibility is that your NIC may be misbehaving.

Kris