Subject: Re: direct I/O
To: Charles M. Hannum <abuse@spamalicious.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/03/2005 19:04:12
In message <200503040217.11775.abuse@spamalicious.com>,
"Charles M. Hannum" writes:
>On Friday 04 March 2005 01:28, Chuck Silvers wrote:
>> 34.52 370.95 370.95 709745 0.52 0.52 genfs_putpages
>> 5.51 695.74 59.24 529689671 0.00 0.00 pmap_clear_attrs
>> 2.17 805.80 23.33 532201987 0.00 0.00 pmap_tlb_shootnow
>
>I think it's pretty obvious here that genfs_putpages() is doing more work than
>necessary. In particular, I have trouble imagining why it needs to call
>pmap_clear_{modify,referenced}() that many times.

Isn't this the flat profile? If so, then (while I concur) the 34.52%
of run time in genfs_putpages() (370.95 s of self time) dominates the
5.51% and 2.17% spent in the pmap functions above.

Even when Chuck duplicated genfs_putpages, genfs_putpage_fsync plus
pmap_clear_attrs is roughly twice the combined copyin and copyout cost.
``Ouch!!''

Out of curiosity: why i386_copyin() but i486_copyout()? Do we have a
486-tuned copyout but not a 486-tuned copyin? I seem to recall that
Linux 2.2 and newer have 3 or 4 different copy routines (and IP checksum
routines?) and dynamically choose the fastest. Though I'd guess the
potential gains there are smaller than the excess overhead in
genfs_putpages relative to the FreeBSD/Linux find-dirty-page schemes.
(But that is just a guess.)