Re: Serious WAPL performance problems

To: Edgar Fuß <ef%math.uni-bonn.de@localhost>, tech-kern%NetBSD.org@localhost
Subject: Re: Serious WAPL performance problems
From: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
Date: Tue, 23 Oct 2012 10:11:52 -0700

On Oct 23,  6:51pm, Edgar Fuß wrote:
} Subject: Serious WAPL performance problems
} We are facing some very serious file system performance problems on 6.0 which
} we attribute to WAPBL. Comparable 4.0.1 machines with softdep are performing
} much, much better. Having essentially skipped 5, I cannot easily compare log
} to softdep on identical hardware.
} 
} The most prominent way to trigger the problem is running an svn update
} command on a certain repository (having a large number of files) with the
} working copy mounted over NFS. This will stall the file server's discs to
} the point where you get "NFS server not responding, still trying" messages.
} Tracing that svn update (both ktrace and tcpdump) reveals that the unusual
} thing it does is create some 2,500 .lock files scattered around the
} directory tree, only to unlink all of them just seconds later.
} If you run that command with the working copy on a local (WAPBL) file
} system, it finishes in under 2 seconds, but iostat shows that some seconds
} later, the disc (actually a RAID) holding the file system with the wc is
} 100% busy for 18 seconds.
} If you access the same working copy over NFS, the update takes 20 to 30 
} seconds. During that period, the discs are initially silent for 5-10 seconds, 
} then 100% busy for 8-15 seconds, then silent for 5-7 seconds, busy for 5-10s, 
} silent for 7-9s, busy for 17s. In case you didn't add the times: that too 
} extends to after the command has finished.
} Running the same command on a 4.0.1 system with the wc on a (local, I didn't 
} try NFS) fs with softdeps, it also takes under 2 seconds, but after that, the 
} discs are completely silent save a two-second period some ten seconds later.
} There are similar issues (again, on 6 but not on 4) with svn checkout or a 
} rm -rf of the wc.
} 
} How to debug/analyze/tune this? While we can move our svn working copies
} from NFS to local storage, this sounds like a problem that can hit other
} users, too.
} 
} Btw, PenguinOS's logging seems also not to have this issue: Having the wc
} on an ext3 fs also makes the disc busy for just a second or two.
>-- End of excerpt from Edgar Fuß
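As a hedged illustration (not something from the thread), the lock-file churn described above can be approximated without svn by a small Python script. The tree layout (50 × 50 directories), the file name `lock`, and the 2-second hold time are assumptions chosen to match the reported ~2,500 files unlinked seconds after creation; the script does not model what svn does internally. Running it against a local WAPBL file system while watching iostat on the server may help show whether the delayed disc activity appears with svn and NFS out of the picture.

```python
import os
import sys
import tempfile
import time


def make_tree(root, fanout=50):
    """Create fanout * fanout directories, two levels deep, under root.

    With the default fanout of 50 this gives 2,500 directories, one per
    lock file, roughly matching the count reported in the thread."""
    dirs = []
    for i in range(fanout):
        for j in range(fanout):
            d = os.path.join(root, f"dir{i:02d}", f"sub{j:02d}")
            os.makedirs(d, exist_ok=True)
            dirs.append(d)
    return dirs


def lock_churn(dirs, hold_seconds=2.0, name="lock"):
    """Create one empty lock file per directory, wait, then unlink them all.

    The file name and hold time are hypothetical choices, not svn's.
    Returns (number of files created and removed, elapsed seconds)."""
    paths = [os.path.join(d, name) for d in dirs]
    start = time.monotonic()
    for path in paths:
        # O_EXCL makes creation fail if the file already exists,
        # the usual way a lock file is taken.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    time.sleep(hold_seconds)  # the files vanish "seconds later" in the report
    for path in paths:
        os.unlink(path)
    return len(paths), time.monotonic() - start


if __name__ == "__main__":
    # Pass a directory on the file system under test (local WAPBL or an NFS
    # mount); without an argument a temporary directory is used instead.
    root = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp(prefix="lockchurn-")
    dirs = make_tree(root)
    count, secs = lock_churn(dirs)
    print(f"{count} lock files created and removed in {secs:.2f}s under {root}")
```

The directory tree is created before the timed phase so that only the create/unlink pattern is measured; the path given on the command line is a placeholder for whatever scratch directory sits on the file system being tested.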

        Hello.  If possible, I suggest trying the latest 5.1 sources.  They
contain the namei fixes David Holland put into NetBSD-6, and because 5.x
supports both WAPBL and softdep, they would also let you compare the two
directly on the same hardware.  Having said that, could you capture the
output of ps -lax on the NFS server during the 18-20 second window when
the discs are completely busy?  Perhaps that will tell us why NFS
processing stops while the logs are being played back and written to disk.
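Because the busy window starts some seconds after the command finishes and lasts only about 8 to 20 seconds, a single hand-run ps can easily miss it. A minimal Python sketch, assuming it is started on the NFS server just before the svn update is run on the client, takes timestamped ps -lax snapshots once per second; the function name `sample_ps`, the 40-second default, and the 1-second interval are illustrative choices, not part of any NetBSD tool. The long listing includes each process's state and wait channel, which is where a stall should become visible.

```python
import subprocess
import time
from datetime import datetime


def sample_ps(duration=40.0, interval=1.0, cmd=("ps", "-lax")):
    """Run `cmd` every `interval` seconds until `duration` seconds have passed.

    Returns a list of (timestamp, stdout) pairs so each snapshot can be
    lined up against iostat output afterwards."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        stamp = datetime.now().isoformat(timespec="seconds")
        # check=False: keep sampling even if one ps run exits non-zero.
        result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
        samples.append((stamp, result.stdout))
        time.sleep(interval)
    return samples
```

For example, `for stamp, out in sample_ps(): print(stamp); print(out)` dumps every snapshot with its time, so the ones taken while iostat reports 100% busy can be picked out afterwards.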

-thanks
-Brian
