tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: dirhash code



Hi Matt, hi folks,

On Fri, Aug 29, 2008 at 09:54:13AM -0700, Matt Thomas wrote:
> >i've committed a generic dirhash support as a kernel module.
> 
> What it is used for?  Why does it need to be generic?

This dirhash support is for accelerating directory lookup actions for 
creation/deletion and lookup in general. From the log message i posted 
earlier on commit:

------------------
Log Message:
Implement directory hashing to speed up directory traversals. Speed
improvements of at least 4 times in untarring and roughly 100 to 500 times  
on file creation in big directories. Lookup of files was O(n*n) and is now  
O(1) even for file creation. Free spaces in the directory are kept in a
seperate list for fast file creation.      

The postmark benchmark gives:

UDF old:  
pm>set transactions 2000 
pm>set number 3000
pm>run    
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
        1593 seconds total
        681 seconds of transactions (2 per second)

Files:
        3956 created (2 per second)
                Creation alone: 3000 files (4 per second)
                Mixed with transactions: 956 files (1 per second)
        990 read (1 per second)
        1010 appended (1 per second)
        3956 deleted (2 per second)
                Deletion alone: 2912 files (9 per second)
                Mixed with transactions: 1044 files (1 per second)

Data:
        5.26 megabytes read (3.38 kilobytes per second)
        21.93 megabytes written (14.10 kilobytes per second)
pm>

UDF new:
pm>set transactions 2000
pm>set number 3000
pm>run
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
        19 seconds total
        3 seconds of transactions (666 per second)

Files:
        3956 created (208 per second)
                Creation alone: 3000 files (230 per second)
                Mixed with transactions: 956 files (318 per second)
        990 read (330 per second)
        1010 appended (336 per second)
        3956 deleted (208 per second)
                Deletion alone: 2912 files (970 per second)
                Mixed with transactions: 1044 files (348 per second)

Data:
        5.26 megabytes read (283.66 kilobytes per second)
        21.93 megabytes written (1.15 megabytes per second)

---------------------

As you can see its significantly increases directory operations. Since it 
only records generic information like the name hash, name length, offset 
and slot size for entries and slot size for empty slots it can not only be 
used for UDF but also for msdosfs and even UFS. The FS is free do with the 
lookup results. The answers from the dirhash are meant to be authorative 
too. A directory is hashed or not; not in-between.

The 2nd FS i'd like to use it for is nilfs2 but msdosfs might be a willing 
target first too. I think it would be strange to replicate this code over 
and over again. That was my motivation.

With regards,
Reinoud

Attachment: pgp4bo8eEn9U0.pgp
Description: PGP signature



Home | Main Index | Thread Index | Old Index