Subject: Re: fsync performance hit on 1.6.1
To: Daniel Brewer <danielb@cat.co.za>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 07/06/2003 09:00:12
hi,

before 1.6, data from write() was cached in two separate lists,
one for clean data and one for dirty data, so if there were many
clean buffers and a few dirty ones, it was very fast for fsync()
to find the few dirty buffers to write them out.  in 1.6 and later,
the data is cached in a single list (of VM pages this time instead
of buffers) and so fsync() has to do a lot more work to find the
small number of dirty pages amidst all the clean pages.

it would probably be worthwhile to split up the pages for a file
into two different lists like the buffers were before.
that will be a fair bit of work, though, so it's not likely that
anyone will do it soon.

to work around this problem for now, if you don't continue to access
the data you're writing to the file you could try using msync() with
MS_INVALIDATE instead of fsync() to write the data to disk.  that will
eliminate all (or at least most) of the clean pages in memory so that
the kernel won't need to keep scanning them all the time.  if you do
continue to access the data after you've written it, then I don't
have any good ideas.

-Chuck


On Wed, Jul 02, 2003 at 02:18:15PM +0200, Daniel Brewer wrote:
> Hi
> 
> When comparing our software's CPU utilization running on 151 & on 161, I noticed an inexplicably high usage on 161. After digging deeper with gprof, I found that an fsync on 161 takes significantly longer than on 151. Our software writes captured images into a ring buffer in a memory file-system, so other servers can retrieve them. Could someone explain the why fsync on 1.6.1 is so significantly slower than on 1.5.1? Is there a work-around? Or perhaps a completely different way of doing this?
> 
> The process involved after capture, is as follows:
> 
> flock(fdPostBuf, LOCK_EX)
> lseek(fdPostBuf, img_offset, SEEK_SET)
> write(fdPostBuf, img_buf, img_size)
> fsync(fdPostBuf)
> flock(fdPostBuf, LOCK_UN)
> 
> Both the 151 and 161 system are running the same processor (1.2GHz Celeron) both have the same motherboard, have 128MB ram and are both running Western Digital 20Gig drives. Running the 161 box on a 2GHz celeron with 256MB ram gives comparable results to the 1.2Ghz Celeron with 128MB ram.
> 
> Here's an extract from the output of gprof on 1.5.1 :
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name   
>   1.95     43.18     0.93    16364     0.06     0.06  write
>   0.36     45.99     0.17     8025     0.02     0.02  fsync
>   0.29     46.56     0.14    16052     0.01     0.01  flock
>   0.04     47.49     0.02     8025     0.00     0.27  WritePostEventDisk(int, int, int, int)
> 
> Here's the same extract from the output of gprof on 1.6.1 :
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name  
>  60.10     23.15    23.15     8067     2.87     2.87  fsync
>   5.87     34.52     2.26    16337     0.14     0.14  write
>   0.42     36.76     0.16    16136     0.01     0.01  flock
>   0.16     37.76     0.06     8067     0.01     3.30  WritePostEventDisk(int, int, int, int)
> 
> Thanks
> -Daniel