Subject: Re: direct I/O
To: Charles M. Hannum <abuse@spamalicious.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/03/2005 19:04:12
In message <200503040217.11775.abuse@spamalicious.com>,
"Charles M. Hannum" writes:

>On Friday 04 March 2005 01:28, Chuck Silvers wrote:
>>  34.52    370.95   370.95   709745     0.52     0.52  genfs_putpages
>>   5.51    695.74    59.24 529689671     0.00     0.00  pmap_clear_attrs
>>   2.17    805.80    23.33 532201987     0.00     0.00  pmap_tlb_shootnow
>
>I think it's pretty obvious here that genfs_putpages() is doing more work than
>necessary.  In particular, I have trouble imagining why it needs to call
>pmap_clear_{modify,referenced}() that many times.

Isn't this the flat profile? If so, then (while I concur) the 34.52%
(370.95 sec) in genfs_putpages() dominates the 5.51% and 2.17% (59.24
and 23.33 sec) in the pmap functions above.
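
To put numbers on that, here is a rough sketch in C that recomputes the
per-call self time and the call ratio from the "self seconds" and "calls"
columns quoted above. The struct and function names are made up for
illustration; only the three rows come from Chuck's profile.

```c
#include <assert.h>

/* One row of a gprof-style flat profile (only the columns used here). */
struct flat_row {
    const char *name;
    double self_sec;            /* "self seconds" column */
    unsigned long long calls;   /* "calls" column */
};

/* The three rows quoted from the profile above. */
static const struct flat_row rows[] = {
    { "genfs_putpages",    370.95, 709745ULL },
    { "pmap_clear_attrs",   59.24, 529689671ULL },
    { "pmap_tlb_shootnow",  23.33, 532201987ULL },
};

/* Average self time per call, in microseconds. */
static double usec_per_call(const struct flat_row *r)
{
    assert(r->calls > 0);
    return r->self_sec * 1e6 / (double)r->calls;
}

/* How many times 'callee' ran for each run of 'caller', on average.
 * This assumes every callee call originates under the caller, which the
 * flat profile alone cannot confirm. */
static double calls_ratio(const struct flat_row *callee,
                          const struct flat_row *caller)
{
    assert(caller->calls > 0);
    return (double)callee->calls / (double)caller->calls;
}
```

By this arithmetic genfs_putpages() costs about 0.52 ms of self time per
call (which agrees with the ms/call column), and pmap_clear_attrs runs
roughly 746 times for every genfs_putpages() call.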

Even when Chuck duplicated genfs_putpages, genfs_putpage_fsync plus
pmap_clear_attrs is roughly twice the combined copyin and copyout cost.
``Ouch!!''


Out of curiosity: why i386_copyin() but i486_copyout? Do we have a
486-tuned copyout but not a 486-tuned copyin? I seem to recall Linux
2.2 or newer has 3 or 4 different copy routines (and IP checksum
routines?) and dynamically chooses the fastest. Though I'd guess the
potential gains are smaller than the overhead in genfs_putpages,
relative to the FreeBSD/Linux find-dirty-page schemes. (But that is
just a guess.)
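
For what it's worth, the "time a few routines at startup and keep the
fastest" idea can be sketched in userland C as below. This is not the Linux
code nor anything in our tree; the routine names, the 4096-byte buffer and
the repeat count are all assumptions made up for illustration, and clock()
is a much coarser timer than a kernel would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Candidate 1: naive byte-at-a-time loop. */
static void copy_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
}

/* Candidate 2: one long at a time, then the trailing bytes.
 * The small memcpy calls avoid unaligned-access problems. */
static void copy_words(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= sizeof(long)) {
        long w;

        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        len -= sizeof w;
    }
    while (len--)
        *d++ = *s++;
}

/* Candidate 3: whatever the C library provides. */
static void copy_libc(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

static const copy_fn candidates[] = { copy_bytes, copy_words, copy_libc };

/* Time each candidate on a page-sized buffer; return the quickest. */
static copy_fn pick_fastest_copy(void)
{
    static unsigned char src[4096], dst[4096];
    copy_fn best = candidates[0];
    clock_t best_ticks = 0;
    size_t i;
    int rep;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        clock_t start = clock();
        clock_t ticks;

        for (rep = 0; rep < 2000; rep++)
            candidates[i](dst, src, sizeof src);
        ticks = clock() - start;
        if (i == 0 || ticks < best_ticks) {
            best = candidates[i];
            best_ticks = ticks;
        }
    }
    return best;
}
```

The caller would run pick_fastest_copy() once and store the returned
function pointer, so the selection cost is paid only at startup. Which
routine wins depends on the CPU and compiler, so no particular result
should be expected.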