Subject: Re: Meta package for pkgsrc developer tools
To: NetBSD Packages Technical Discussion List <tech-pkg@netbsd.org>
From: Ronald J. Roskens <roskens@sl.econet.com>
List: tech-pkg
Date: 08/08/2006 10:47:16
--=-fjlDWgMBkl7E6gBxsf09
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Mon, 2006-08-07 at 19:42 -0400, Greg A. Woods wrote:

> At Sun, 06 Aug 2006 02:51:13 +0200,
> Joerg Sonnenberger wrote:
> > 
> > On Sat, Aug 05, 2006 at 03:50:43PM -0400, Greg A. Woods wrote:
> > > At Sat, 05 Aug 2006 15:18:26 +0200,
> > > Joerg Sonnenberger wrote:
> > > > 
> > > > On Fri, Aug 04, 2006 at 07:43:35PM -0700, John Nemeth wrote:
> > > > > On Dec 25,  1:00pm, Lubomir Sedlacik wrote:
> > > > > } On Sat, Aug 05, 2006 at 01:09:36AM +0900, Ryo HAYASAKA wrote:
> > > > > } > I think pkgclean and pkgfind are useful tools, too.
> > > > > } 
> > > > > } remind me, why do we have such a useless package as pkgclean in pkgsrc,
> > > > > } again?
> > > > > } 
> > > > > } (if you don't want clutter in your pkgsrc tree, mount it read-only and
> > > > > }  set WRKOBJDIR.  and when you feel like it, just wipe WRKOBJDIR clean
> > > > > }  with rm.  it's even faster!)
> > > > > 
> > > > >      cd /usr/pkgsrc && rm -r */*/work is pretty fast as well.
> > > > 
> > > > But not really portable.
> > > > 	find /usr/pkgsrc -name work -exec rm -r {} \;
> > > > doesn't hit command line limits.
> > > 
> > > but that's not as fast or efficient as it could/should be.  Use xargs(1)!
> > 
> > Ironically, xargs is not necessarily faster. The command above needs
> > more execs, but has the advantage that find doesn't visit the work
> > directories.
> 
> Guess all the metadata for my pkgsrc tree must be in the buffer cache
> much of the time!  :-)
> 
> Sorry, I didn't actually think about the guts of the work directories
> themselves since I haven't actually used pkgsrc that way for half a
> decade or so.  :-) I was more worried about just the CVS subdirectories
> chewing up unnecessary time and I/Os.
> 
> 
> The sad thing is the shell glob example is about two orders of magnitude
> faster (at least when you don't have that many work subdirs) than any
> form or use of find.
> 
> This machine's pkgsrc tree hasn't been touched or looked at since it was
> booted (it also doesn't have any work subdirs, just a few work symlinks,
> but it does seem to have enough buffer cache to store the whole tree's
> metadata):
> 
> 19:19 [56] $ time sh -c 'echo */*/work > /dev/null'
>    11.05s real     0.01s user     0.57s system
> 19:19 [57] $ time sh -c 'echo */*/work > /dev/null'
>     0.04s real     0.00s user     0.03s system
> 19:20 [58] $ time sh -c 'echo */*/work > /dev/null'
>     0.04s real     0.01s user     0.03s system
> 19:20 [59] $ time sh -c 'echo */*/work > /dev/null'
>     0.04s real     0.01s user     0.02s system
> 19:20 [60] $ time find . -name work -print > /dev/null
>    37.83s real     0.39s user     2.69s system
> 19:21 [61] $ time find . -name work -print > /dev/null
>    37.76s real     0.37s user     2.17s system
> 19:22 [62] $ time find . \( -name CVS -prune \) -o \( -name work -print \) > /dev/null
>    28.24s real     0.22s user     1.25s system
> 19:23 [63] $ time find . \( -name CVS -prune \) -o \( -name work -print \) > /dev/null
>    25.98s real     0.21s user     1.10s system
> 
> Thankfully GNU Find isn't really any faster:
> 
> 19:27 [70] $ time gfind . \( -name CVS -prune \) -o \( -name work -print \) > /dev/null
>    26.76s real     0.19s user     1.29s system
> 19:28 [71] $ time gfind . -name work -print > /dev/null
>    37.26s real     0.39s user     2.05s system
> 
> 
> > Using maxdepth would have a similar result.
> 
> Hmmm.... "find: -maxdepth: unknown option"  :-)
> 
> Perhaps you're thinking about "-prune", as per above?  Or GNU Find?


What does

/usr/bin/time find . -mindepth 3 -maxdepth 3 -name work > /dev/null

result in?
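
If the native find rejects -mindepth/-maxdepth (as the "unknown option"
error quoted above suggests), a similar depth limit can probably be
approximated with -path and -prune. The following is only a sketch: it
assumes a find that supports -path, and list_work_dirs is a hypothetical
helper name, not something from pkgsrc.

```shell
#!/bin/sh
# Hypothetical helper: print "work" entries exactly three levels below
# the given root (root/category/package/work) without using -maxdepth.
list_work_dirs() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" || exit 1
        # -path './*/*/*' is true for anything at depth 3; -prune stops
        # find from descending further, so the contents of work
        # directories (and anything else below depth 3) are never walked.
        # Only pruned entries named "work" are printed.
        find . -path './*/*/*' -prune -name work -print
    )
}
```

Timing it the same way as the runs above would be something like
time sh -c 'find . -path "./*/*/*" -prune -name work -print > /dev/null'
from the top of the pkgsrc tree. It still has to read every category and
package directory, so it is unlikely to beat the shell glob.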

Ron

--=-fjlDWgMBkl7E6gBxsf09--