Subject: 1sec+ delays using msync(2) with flags MS_ASYNC | MS_INVALIDATE
To: None <current-users@netbsd.org>
From: Brian de Alwis <bsd@cs.ubc.ca>
List: current-users
Date: 04/18/2007 10:27:47
Summary: a call to msync(2) with flags MS_ASYNC | MS_INVALIDATE
appears to be done as a synchronous call, and can take >1sec.
Should it? How can I avoid it?

I'm packaging up crm114 for pkgsrc: crm114 is a powerful text
classifier that does particularly well for spam filtering, amongst
other uses.  (It's in wip/crm114; it's marked as broken for now as
some of its more esoteric classifiers currently fail the tests,
though it does compile and work with the usual classifiers.)

I'm trying to figure out what causes a long sustained disk write of
a second or more on each spam classification.  ktrace -R reports
that almost a second is spent in a call to __msync13():

    0.9941388845 CALL  __msync13(0xbb7df000,0x2dc714,3)

corresponding to a call in the source:

    msync (map->addr, map->actual_len, MS_ASYNC | MS_INVALIDATE);

crm114 does most of its file manipulation by mmapping the files into
memory and using their contents as a hash table.  In my case there
are two spam classification mmaps (sparse files) which are 3,000,084
bytes each.  So although MS_ASYNC is specified, it appears the sync
is actually treated as synchronous.  Should it be?  How can it be
avoided? (I'm not sure why the author uses MS_INVALIDATE.)

Compounding the problem, the spam processing does this for every
message it handles.

Brian.

-- 
  Brian de Alwis | Software Practices Lab | UBC | http://www.cs.ubc.ca/~bsd/
      "Amusement to an observing mind is study." - Benjamin Disraeli