Subject: ioflush is taking excessive cycles on a big-memory machine
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 04/04/2005 19:50:52
Something I've done has got my big AlphaServer (with 16GB of RAM) into a
state where the [ioflush] thread is chewing up a minimum of 25%
(according to 'top') of the (a?) CPU, and at times it jumps to 50% or
higher. This also makes the machine very sluggish -- just typing to the
command-line can have multi-second lag between keystroke and echo (with
command-line editing).
ioflush has now, after ~50hrs uptime, accumulated more CPU time than any
other long-running process (including even ntpd).
load averages: 1.65, 1.62, 1.82 18:54:49
83 processes: 82 sleeping, 1 on processor
CPU states: 0.0% user, 0.0% nice, 37.6% system, 10.4% interrupt, 52.0% idle
Memory: 7189M Act, 4120M Inact, 2112K Wired, 48M Exec, 11G File, 1053M Free
Swap: 4000M Total, 4000M Free
  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
   13 root      18    0     0K 1351M syncer/2 662:43 50.20% 50.20% [ioflush]
 5202 woods      3    0  1104K  888K ttyin/3    8:30 37.30% 37.30% systat
 5212 root      48    0    42M   42M CPU/3      3:10 23.34% 23.34% top
25675 root       2    0  2584K 1784K select/1  13:31 19.09% 19.09% sshd2
  177 root      18  -12  1280K 1288K pause/1  289:17 15.33% 15.33% ntpd
  111 root       2    0   632K  496K select/3  22:43  8.01%  8.01% syslogd
  126 dns        2    4    94M   91M select/1  74:07  4.49%  4.49% named
  272 root       2    0  3016K 2728K select/2 140:24  4.10%  4.10% httpd
 9710 cyrus      2    4  4504K  664K select/2  36:43  2.78%  2.78% idled
  190 root       2    0   224K  184K netio/1   10:07  1.46%  1.46% rwhod
 7259 root       2    0    16M   14M select/3  13:41  0.78%  0.78% emacs
  317 root       2    0  2560K 1808K select/0  20:47  0.10%  0.10% sshd2
This is after rsync'ing about 750k files (1.2GB) from another machine
and then moving them around between different filesystems a couple of
times.
However, the copying was all done over 10 hours ago and all the dirty
buffers do seem to have been flushed by now.
Typing 'sync' causes the machine to really bog down for about a minute,
but as far as I can tell from systat or vmstat not much I/O happens. It
just seems to take that long to scan for dirty buffers. Also, no matter
how many times I run 'sync', the ioflush thread stays just as busy.
I currently have vm.filemax set rather high in hopes of caching as
much file data as possible, since the first shuffling of files was a
trial run and I need to do it all over again.
# sysctl vm | sort
vm.anonmax = 60
vm.anonmin = 40
vm.execmax = 50
vm.execmin = 40
vm.filemax = 60
vm.filemin = 5
vm.loadavg: 1.39 1.59 1.78
vm.maxslp = 20
vm.nkmempages = 32768
vm.uspace = 16384
As hinted at by 'top', this did have the effect I was hoping for,
with 11G of file cache in use (though I don't know why that's almost an
order of magnitude larger than the on-disk size....).  This usage is
confirmed with 'systat buf' (which has the patches to display all these
numbers):
16384 metadata buffers using 1676960 kBytes of memory (10%).
1416345 pages for cached file data using 11330760 kBytes of memory (69%).
6162 pages for executables using 49296 kBytes of memory ( 0%).
48314 pages for anon (non-file) data using 386512 kBytes of memory ( 2%).
139709 free pages 1117672 kBytes of memory ( 7%).
File System      Bufs used   %  kB in use  %  Bufsize kB    %  Util %
/home                14756  90      29784  2     1510104   90       2
/                     1227   7      19282  1      125704    7      15
/var/spool/imap        306   1        834  0       31416    2       3
/usr/pkg                41   0        222  0        4200    0       5
/var                    35   0        256  0        3608    0       7
/var/log                10   0        264  0        1016    0      26
/tmp                     1   0          1  0         104    0       1
Total:               16376  99      50643  3     1676152  100       3
(Note all those filesystems, except the MFS /tmp, are mounted with soft
dependencies enabled)
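As a sanity check on those numbers (and a partial answer to the
order-of-magnitude question above): the Alpha uses 8 kB pages, so the
systat page counts should match its kByte figures exactly, and 750k
files each holding at least one cached page puts a sizable floor under
the cache.  A quick back-of-the-envelope check (the 8 kB page size and
the one-page-per-file minimum are my assumptions here):

```shell
# Alpha pages are 8 kB, so systat's page counts should match its kB figures:
pages=1416345
echo $(( pages * 8 ))        # kBytes of cached file data; matches 11330760

# 750k files at a minimum of one 8 kB page each is already ~5.7 GB,
# which may account for much of the gap above the 1.2 GB on-disk size
# (and shuffling copies between filesystems caches multiple copies):
files=750000
echo $(( files * 8 ))        # kBytes; lower bound if every file stays cached
```

The other systat lines check out the same way (6162 * 8 = 49296,
48314 * 8 = 386512, 139709 * 8 = 1117672), so the display is at least
internally consistent.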
However, actually using this many filecache pages seems to have
overwhelmed ioflush -- it now needs "too many" cycles just to
discover that there's nothing for it to do.
I'm going to reduce vm.filemax to maybe 25% and vm.filemin to 1%, but I
don't hold out much hope of actually being able to limit the filecache.
(It seems a reboot is the easiest way to free the filecache :-)
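Concretely, the run-time change I have in mind looks something like
this (a sketch only -- the values are the guesses above, not tested
recommendations):

```shell
# Reduce the file-cache targets at run time with sysctl(8).
# Values are illustrative guesses, not recommendations.
sysctl -w vm.filemax=25    # cap file cache at ~25% of RAM
sysctl -w vm.filemin=1     # let it shrink to 1% under memory pressure

# Make the change persistent across reboots:
echo 'vm.filemax=25' >> /etc/sysctl.conf
echo 'vm.filemin=1'  >> /etc/sysctl.conf
```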
A while back a post was made suggesting a way to force ioflush to
calm down by increasing the length of time the thread sleeps:
http://mail-index.netbsd.org/port-mac68k/2005/01/19/0002.html
I doubt that'll help here though since it seems the "current work" is
taking longer than a second to complete even when there's no real work
to do. And of course it won't make the search algorithm any faster
either....
Has anyone been able to profile ioflush (with softdep) recently?
Any other ideas how I might avoid this overhead in production?
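For what it's worth, if someone does want to profile it, my
understanding is the stock kernel-profiling route would be roughly the
following (from memory and untested here -- check config(8) and
kgmon(8) before trusting any of it, and "MYKERNEL" is just a
placeholder config name):

```shell
# Sketch of profiling the kernel (and thus ioflush) with kgmon(8).
config -p MYKERNEL                  # set up a profiling kernel build
cd ../compile/MYKERNEL.PROF
make depend && make
# ... install and boot the profiling kernel, then:
kgmon -b                            # begin collecting profile data
sync; sync                          # exercise the syncer/ioflush path
kgmon -h -p                         # halt profiling and dump gmon.out
gprof /netbsd gmon.out > ioflush.prof
```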
--
Greg A. Woods
H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>