tech-kern archive


Re: very bad behavior on overquota writes

On 11/21/12 14:02, Manuel Bouyer wrote:
I've been looking at performance issues on our NFS server, which I tracked
down to overquota writes. The problem is caused by software that does
writes without error checking. When this happens, the nfsd threads become
100% busy, and NFS requests from other clients can be delayed by
several seconds.
To reproduce this, I've used the attached program. Basically it does an
endless write, without error checking. I first ran it on an NFS client against
a test NFS server and could reproduce the problem. Then I ran it
directly on the server against the ffs-exported filesystem, and
could see similar behavior:
When the uid running it is overquota, the process starts using 100% CPU in
system and the number of write syscalls per second drops dramatically (from
about 170 to about 20). I can see there is still some write activity on the
disk (about 55 KB/s, 76 writes/s).
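
The attached program is not reproduced in this archive. As a rough idea of the kind of load described above, here is a minimal sketch in C. The name hammer_writes() and its parameters are hypothetical, and unlike the original endless loop it is bounded and counts the failed writes, so the effect can be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the test program: issue `count` writes of
 * `bufsize` bytes to fd and keep going when write() fails, as the
 * problematic software does. Returns the number of failed writes.
 */
size_t
hammer_writes(int fd, size_t count, size_t bufsize)
{
	char buf[8192];
	size_t failed = 0;

	/* A zero-length write would not exercise the allocation path. */
	assert(bufsize > 0);
	if (bufsize > sizeof(buf))
		bufsize = sizeof(buf);
	memset(buf, 'x', sizeof(buf));

	for (size_t i = 0; i < count; i++) {
		if (write(fd, buf, bufsize) < 0)
			failed++;	/* noted, but deliberately not acted on */
	}
	return failed;
}
```

Run as an overquota uid against a file on the exported filesystem, every call after the quota is hit would be expected to fail, yet the loop keeps issuing writes at full speed.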

The problem is that by the time we notice we can't do the write, ffs_write() has
already done some things that need to be undone. One of them, which is time
consuming, is to truncate the file back to its original size. Most of the time is
spent in genfs_do_putpages().

The problem here seems to be that we always do a page list walk because
endoff is 0. If the file is large enough to have lots of pages in core,
a lot of time is spent here.

The attached patch improves this a bit, by not always using a list walk,
but I wonder if this could cause some pages to be lost until the vnode
is recycled. AFAIK neither v_writesize nor v_size can be shrunk without
genfs_do_putpages() being called, but I may have missed something.

I'll also see if we can take shortcuts in ffs_write(), at least for some
trivial cases.

SunOS 4 used to have some strange behaviour with overquota writes to an NFS filesystem. It was a long time ago, but as far as I remember, the overquota writes would appear to succeed and then the close would fail with EDQUOT. There's an awful lot of code which doesn't bother to check for errors on close (and even if you did, it was difficult to know how to handle the error)!
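
For what it's worth, checking both calls need not be much code. The following is only a sketch: write_all_checked() and close_checked() are hypothetical helper names, and whether EDQUOT shows up at write() or only at close() depends on the client implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, continuing after short writes and
 * retrying on EINTR. Returns 0 on success, or -1 with errno set by
 * write() (for example EDQUOT or ENOSPC).
 */
int
write_all_checked(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;	/* caller should stop writing */
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Close fd and report the result. A deferred write error may only
 * surface here. The close is not retried, because the state of the
 * descriptor after a failed close() is unspecified.
 */
int
close_checked(int fd)
{
	return close(fd) < 0 ? -1 : 0;
}
```

A caller that sees -1 from either helper can at least stop issuing writes and report errno, which is what the misbehaving software in this thread fails to do.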




Roger Brooks,                            |  Email:
Computing Services Dept,                 |  Tel:   +44 151 794 4441
The Computer Laboratory                  |
The University of Liverpool,             | 
Liverpool L69 3BX, UK                    | 
