Subject: Re: pkg_summary
To: Jeremy C. Reed <reed@reedmedia.net>
From: Aleksey Cheusov <cheusov@tut.by>
List: pkgsrc-users
Date: 06/13/2007 00:09:29
> I can write a cron job to check for these. (But I still don't understand 
> the FTP layout for the packages, as some have an overlay and are 
> available in two places but some are only available in one place.)
mk/bulk/upload uses the following:

   ls -t | grep '\.t[gb]z$' | while read n; do pkg_info -X "$n"; done

Hint: xargs makes this 12.2% faster, shorter, and even simpler.
12% for nothing! ;-)

0 All>time ls -t | grep '\.t[gb]z$' | while read n; do pkg_info -X "$n"; done >/tmp/summary1
   98.02s real    72.08s user    19.75s system
0 All>time ls -t | grep '\.t[gb]z$' | xargs pkg_info -X >/tmp/summary2
   86.07s real    69.98s user    15.27s system
0 All>bc
(98.02-86.07)/98.02
.12191389512344419506
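
The xargs variant timed above could be wrapped in a small function, roughly as sketched here. The function name build_summary and the PKG_INFO override variable are made up for illustration and are not part of mk/bulk/upload. The sketch assumes package file names contain no whitespace, since xargs splits its input on it.

```shell
#!/bin/sh
# PKG_INFO is a hypothetical override so a stub can stand in for the
# real pkg_info; by default the real tool is used.
: "${PKG_INFO:=pkg_info}"

# build_summary DIR
# Print summary records for every .tgz/.tbz file in DIR, newest first.
# xargs passes many file names to each pkg_info call instead of
# starting one process per package, which is where the saving comes from.
build_summary() {
    (
        cd "$1" || exit 1
        ls -t | grep '\.t[gb]z$' | xargs "$PKG_INFO" -X
    )
}

# Example (not run here):
#   build_summary /path/to/packages/All | gzip > pkg_summary.gz
```

Note that some xargs implementations run the command once even when no file names match, so an empty directory may need a separate check.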

> As for updating or rebuilding -- Maybe we can make a script that removes 
> non-existent data from pkg_summary and adds new data. That should be way 
> faster than creating entire pkg_summary each time.
Rebuilding the entire index is fast enough.
Building an index for 700 .tbz packages (bzip2 is slower than gzip) on
my 5-year-old machine (!!!) takes less than 100 seconds, see the timings above.
For the entire repository (fewer than 7000 packages) it will take
less than 1000 seconds, i.e. less than 17 minutes.
17 minutes per day on an 800 MHz machine! ;-)
Also note that most repositories are unchanged most of the time, so a check like
(test "`ls -t | head -1`" -nt pkg_summary.gz) may help to skip needless rebuilds.
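
A minimal sketch of such a freshness check follows. The function name needs_rebuild and its two arguments are invented for illustration. It relies on the -nt operator of test, which is not in POSIX but is supported by most shells, and it only notices added or updated packages, not deleted ones.

```shell
#!/bin/sh
# needs_rebuild DIR SUMMARY
# Succeeds (exit status 0) when SUMMARY is missing or when the newest
# .tgz/.tbz file in DIR has been modified more recently than SUMMARY.
needs_rebuild() {
    dir=$1
    summary=$2

    # No summary yet: it has to be built.
    [ -f "$summary" ] || return 0

    # Newest package file, by modification time.
    newest=$(ls -t "$dir" | grep '\.t[gb]z$' | head -1)

    # No packages at all: nothing to rebuild from.
    [ -n "$newest" ] || return 1

    [ "$dir/$newest" -nt "$summary" ]
}

# Example (not run here):
#   if needs_rebuild . pkg_summary.gz; then
#       ls -t | grep '\.t[gb]z$' | xargs pkg_info -X | gzip > pkg_summary.gz
#   fi
```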

-- 
Best regards, Aleksey Cheusov.