Subject: fsync performance hit on 1.6.1
To: None <tech-kern@netbsd.org>
From: Daniel Brewer <danielb@cat.co.za>
List: tech-kern
Date: 07/02/2003 14:18:15
This is a multi-part message in MIME format.

------=_NextPart_000_010C_01C340A4.C72041A0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi

When comparing our software's CPU utilization running on 151 & on 161, I =
noticed an inexplicably high usage on 161. After digging deeper with =
gprof, I found that an fsync on 161 takes significantly longer than on =
151. Our software writes captured images into a ring buffer in a memory =
file-system, so other servers can retrieve them. Could someone explain =
the why fsync on 1.6.1 is so significantly slower than on 1.5.1? Is =
there a work-around? Or perhaps a completely different way of doing =
this?

The process involved after capture, is as follows:

flock(fdPostBuf, LOCK_EX)
lseek(fdPostBuf, img_offset, SEEK_SET)
write(fdPostBuf, img_buf, img_size)
fsync(fdPostBuf)
flock(fdPostBuf, LOCK_UN)

Both the 151 and 161 system are running the same processor (1.2GHz =
Celeron) both have the same motherboard, have 128MB ram and are both =
running Western Digital 20Gig drives. Running the 161 box on a 2GHz =
celeron with 256MB ram gives comparable results to the 1.2Ghz Celeron =
with 128MB ram.

Here's an extract from the output of gprof on 1.5.1 :
  %   cumulative   self              self     total          =20
 time   seconds   seconds    calls  ms/call  ms/call  name  =20
  1.95     43.18     0.93    16364     0.06     0.06  write
  0.36     45.99     0.17     8025     0.02     0.02  fsync
  0.29     46.56     0.14    16052     0.01     0.01  flock
  0.04     47.49     0.02     8025     0.00     0.27  =
WritePostEventDisk(int, int, int, int)

Here's the same extract from the output of gprof on 1.6.1 :
  %   cumulative   self              self     total          =20
 time   seconds   seconds    calls  ms/call  ms/call  name =20
 60.10     23.15    23.15     8067     2.87     2.87  fsync
  5.87     34.52     2.26    16337     0.14     0.14  write
  0.42     36.76     0.16    16136     0.01     0.01  flock
  0.16     37.76     0.06     8067     0.01     3.30  =
WritePostEventDisk(int, int, int, int)

Thanks
-Daniel

------=_NextPart_000_010C_01C340A4.C72041A0
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content=3D"text/html; charset=3Diso-8859-1" =
http-equiv=3DContent-Type>
<META content=3D"MSHTML 5.00.3315.2870" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY background=3D"" bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>Hi<BR><BR>When comparing our software's =
CPU=20
utilization running on 151 &amp; on 161, I noticed an inexplicably high =
usage on=20
161. After digging deeper with gprof, I found that an fsync on 161 takes =

significantly longer than on 151. Our software writes captured images =
into a=20
ring buffer in a memory file-system, so other servers can retrieve them. =
Could=20
someone explain the why fsync on 1.6.1 is so&nbsp;significantly slower =
than on=20
1.5.1? Is there a work-around? Or perhaps a completely different way of =
doing=20
this?</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>The process involved after capture, is =
as=20
follows:</FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>flock(fdPostBuf, LOCK_EX)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>lseek(fdPostBuf, img_offset, =
SEEK_SET)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>write(fdPostBuf, img_buf, =
img_size)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>fsync(fdPostBuf)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>flock(fdPostBuf, LOCK_UN)</FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Both the 151 and 161 system are running =
the same=20
processor (1.2GHz Celeron) both have the same motherboard, have 128MB =
ram and=20
are both running Western Digital 20Gig drives. Running the 161 box on a =
2GHz=20
celeron with 256MB ram gives comparable results to the 1.2Ghz Celeron =
with 128MB=20
ram.</FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Here's an extract from the output of =
gprof on 1.5.1=20
:</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; %&nbsp;&nbsp; =
cumulative&nbsp;&nbsp;=20
self&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;=20
self&nbsp;&nbsp;&nbsp;&nbsp;=20
total&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=20
<BR>&nbsp;time&nbsp;&nbsp; seconds&nbsp;&nbsp; seconds&nbsp;&nbsp;&nbsp; =

calls&nbsp; ms/call&nbsp; ms/call&nbsp; name&nbsp;&nbsp; </FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 1.95&nbsp;&nbsp;&nbsp;&nbsp;=20
43.18&nbsp;&nbsp;&nbsp;&nbsp; 0.93&nbsp;&nbsp;&nbsp;=20
16364&nbsp;&nbsp;&nbsp;&nbsp; 0.06&nbsp;&nbsp;&nbsp;&nbsp; 0.06&nbsp;=20
write</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 0.36&nbsp;&nbsp;&nbsp;&nbsp;=20
45.99&nbsp;&nbsp;&nbsp;&nbsp; 0.17&nbsp;&nbsp;&nbsp;&nbsp;=20
8025&nbsp;&nbsp;&nbsp;&nbsp; 0.02&nbsp;&nbsp;&nbsp;&nbsp; 0.02&nbsp;=20
fsync</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 0.29&nbsp;&nbsp;&nbsp;&nbsp;=20
46.56&nbsp;&nbsp;&nbsp;&nbsp; 0.14&nbsp;&nbsp;&nbsp;=20
16052&nbsp;&nbsp;&nbsp;&nbsp; 0.01&nbsp;&nbsp;&nbsp;&nbsp; 0.01&nbsp;=20
flock</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 0.04&nbsp;&nbsp;&nbsp;&nbsp;=20
47.49&nbsp;&nbsp;&nbsp;&nbsp; 0.02&nbsp;&nbsp;&nbsp;&nbsp;=20
8025&nbsp;&nbsp;&nbsp;&nbsp; 0.00&nbsp;&nbsp;&nbsp;&nbsp; 0.27&nbsp;=20
WritePostEventDisk(int, int, int, int)</FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Here's the same extract from the output =
of gprof on=20
1.6.1 :</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; %&nbsp;&nbsp; =
cumulative&nbsp;&nbsp;=20
self&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;=20
self&nbsp;&nbsp;&nbsp;&nbsp;=20
total&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=20
<BR>&nbsp;time&nbsp;&nbsp; seconds&nbsp;&nbsp; seconds&nbsp;&nbsp;&nbsp; =

calls&nbsp; ms/call&nbsp; ms/call&nbsp; name&nbsp; </FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp;60.10&nbsp;&nbsp;&nbsp;&nbsp;=20
23.15&nbsp;&nbsp;&nbsp; 23.15&nbsp;&nbsp;&nbsp;&nbsp;=20
8067&nbsp;&nbsp;&nbsp;&nbsp; 2.87&nbsp;&nbsp;&nbsp;&nbsp; 2.87&nbsp;=20
fsync</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 5.87&nbsp;&nbsp;&nbsp;&nbsp;=20
34.52&nbsp;&nbsp;&nbsp;&nbsp; 2.26&nbsp;&nbsp;&nbsp;=20
16337&nbsp;&nbsp;&nbsp;&nbsp; 0.14&nbsp;&nbsp;&nbsp;&nbsp; 0.14&nbsp;=20
write</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 0.42&nbsp;&nbsp;&nbsp;&nbsp;=20
36.76&nbsp;&nbsp;&nbsp;&nbsp; 0.16&nbsp;&nbsp;&nbsp;=20
16136&nbsp;&nbsp;&nbsp;&nbsp; 0.01&nbsp;&nbsp;&nbsp;&nbsp; 0.01&nbsp;=20
flock</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp; 0.16&nbsp;&nbsp;&nbsp;&nbsp;=20
37.76&nbsp;&nbsp;&nbsp;&nbsp; 0.06&nbsp;&nbsp;&nbsp;&nbsp;=20
8067&nbsp;&nbsp;&nbsp;&nbsp; 0.01&nbsp;&nbsp;&nbsp;&nbsp; 3.30&nbsp;=20
WritePostEventDisk(int, int, int, int)</FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Thanks</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>-Daniel</FONT></DIV></BODY></HTML>

------=_NextPart_000_010C_01C340A4.C72041A0--