pkgsrc-Changes-HG archive


[pkgsrc/trunk]: pkgsrc/www/crawl Initial import of crawl-0.4 into the NetBSD ...



details:   https://anonhg.NetBSD.org/pkgsrc/rev/97f1c91888f4
branches:  trunk
changeset: 487686:97f1c91888f4
user:      peter <peter%pkgsrc.org@localhost>
date:      Tue Jan 18 17:46:31 2005 +0000

description:
Initial import of crawl-0.4 into the NetBSD Packages Collection.

The crawl utility starts a depth-first traversal of the web at the specified
URLs. It stores all JPEG images that match the configured constraints.
Crawl is fairly fast and allows for graceful termination. After terminating
crawl, it is possible to restart it at exactly the same spot where it was
terminated. Crawl keeps a persistent database that allows multiple crawls
without revisiting sites.

The main features of crawl are:

 * Saves encountered images or other media types
 * Media selection based on regular expressions and size constraints
 * Resume previous crawl after graceful termination
 * Persistent database of visited URLs
 * Very small and efficient code
 * Asynchronous DNS lookups
 * Supports robots.txt
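
The "persistent database of visited URLs" idea can be illustrated with a
minimal shell sketch. This is not crawl's code: the package is built against
Berkeley DB, while the sketch below stands in a plain text file with one URL
per line. The names VISITED_DB and mark_visited are hypothetical and exist
only for this illustration.

```shell
# Hypothetical illustration, not crawl's implementation: a text file with one
# URL per line plays the role of the persistent database.
VISITED_DB="${VISITED_DB:-./visited.txt}"

# Record a URL the first time it is seen.
# Returns 0 if the URL was new (and is now recorded), 1 if it was seen before.
mark_visited() {
    url="$1"
    touch "$VISITED_DB"
    # -F: fixed string, -x: whole-line match, -q: no output
    if grep -Fxq -- "$url" "$VISITED_DB"; then
        return 1
    fi
    printf '%s\n' "$url" >> "$VISITED_DB"
}
```

Because the file outlives the process, a second run that calls mark_visited
with the same URL gets a non-zero status and can skip the fetch, which is the
property the description attributes to crawl's database.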

diffstat:

 www/crawl/DESCR            |  16 ++++++++++++++++
 www/crawl/Makefile         |  30 ++++++++++++++++++++++++++++++
 www/crawl/PLIST            |   5 +++++
 www/crawl/distinfo         |   7 +++++++
 www/crawl/patches/patch-aa |  19 +++++++++++++++++++
 www/crawl/patches/patch-ab |  14 ++++++++++++++
 www/crawl/patches/patch-ac |  13 +++++++++++++
 7 files changed, 104 insertions(+), 0 deletions(-)

diffs (132 lines):

diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/DESCR
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/DESCR   Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,16 @@
+The crawl utility starts a depth-first traversal of the web at the specified
+URLs. It stores all JPEG images that match the configured constraints.
+Crawl is fairly fast and allows for graceful termination. After terminating
+crawl, it is possible to restart it at exactly the same spot where it was
+terminated. Crawl keeps a persistent database that allows multiple crawls
+without revisiting sites.
+
+The main features of crawl are:
+
+ * Saves encountered images or other media types
+ * Media selection based on regular expressions and size constraints
+ * Resume previous crawl after graceful termination
+ * Persistent database of visited URLs
+ * Very small and efficient code
+ * Asynchronous DNS lookups
+ * Supports robots.txt
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/Makefile
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/Makefile        Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,30 @@
+# $NetBSD: Makefile,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+
+DISTNAME=      crawl-0.4
+CATEGORIES=    www
+MASTER_SITES=  http://monkey.org/~provos/
+
+MAINTAINER=    peter%pointless.nl@localhost
+HOMEPAGE=      http://monkey.org/~provos/crawl/
+COMMENT=       Small and efficient HTTP crawler
+
+GNU_CONFIGURE= yes
+USE_PKGINSTALL=        yes
+USE_BUILDLINK3=        yes
+USE_DB185=     yes
+
+CONF_FILES=    ${PREFIX}/share/examples/${PKGBASE}/crawl.conf ${PKG_SYSCONFDIR}/crawl.conf
+
+post-install:
+       ${INSTALL_DATA_DIR} ${PREFIX}/share/examples/${PKGBASE}
+       ${INSTALL_DATA} ${WRKSRC}/crawl.conf ${PREFIX}/share/examples/${PKGBASE}/crawl.conf
+
+SUBST_CLASSES=         path
+SUBST_STAGE.path=      post-patch
+SUBST_FILES.path=      cfg.h
+SUBST_SED.path=                -e 's,crawl.conf,${PKG_SYSCONFDIR}/crawl.conf,g'
+SUBST_MESSAGE.path=    "Fixing hardcoded path."
+
+.include "../../devel/libevent/buildlink3.mk"
+.include "../../mk/bdb.buildlink3.mk"
+.include "../../mk/bsd.pkg.mk"
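
The SUBST_SED.path rule in the Makefile above rewrites the bare crawl.conf
name in cfg.h to a full path. A rough sketch of its effect follows. The value
/usr/pkg/etc for PKG_SYSCONFDIR, the helper name fix_conf_path, and the sample
input line are all assumptions made for illustration; the real contents of
cfg.h are not shown in this commit.

```shell
# Assumed stand-in for the directory pkgsrc substitutes for PKG_SYSCONFDIR.
PKG_SYSCONFDIR=/usr/pkg/etc

# Hypothetical helper: apply the same sed expression as SUBST_SED.path to stdin.
fix_conf_path() {
    sed -e "s,crawl.conf,${PKG_SYSCONFDIR}/crawl.conf,g"
}

# Made-up input line; the actual definition in cfg.h may look different.
printf '%s\n' '#define CONFFILE "crawl.conf"' | fix_conf_path
```

Note that the unescaped dot in the pattern matches any character, so the
expression is slightly broader than a literal match on crawl.conf.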
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/PLIST
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/PLIST   Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,5 @@
+@comment $NetBSD: PLIST,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+bin/crawl
+man/man1/crawl.1
+share/examples/${PKGBASE}/crawl.conf
+@dirrm share/examples/${PKGBASE}
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/distinfo
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/distinfo        Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,7 @@
+$NetBSD: distinfo,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+
+SHA1 (crawl-0.4.tar.gz) = b53be27b572ba6a88ab80243b177873aed0b314b
+Size (crawl-0.4.tar.gz) = 111084 bytes
+SHA1 (patch-aa) = 874cb3b73cbc56e320c58039ecc9fd98ab258a0b
+SHA1 (patch-ab) = 9c934c5c7f03e4acbd02222a30267aded4d01e26
+SHA1 (patch-ac) = 079c792e55fa3e60dead7ff9c1c46132d01a00d4
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/patches/patch-aa
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/patches/patch-aa        Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,19 @@
+$NetBSD: patch-aa,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+
+--- configure.orig     2003-05-18 03:50:55.000000000 +0200
++++ configure  2004-06-11 23:51:00.000000000 +0200
+@@ -2669,6 +2669,14 @@
+       DBINC="-I/usr/include/db2"
+       DBLIB="-ldb2"
+       have_db=yes
++     elif test -f /usr/include/db1/db.h; then
++
++cat >>confdefs.h <<\_ACEOF
++#define HAVE_DB1_H 1
++_ACEOF
++
++        DBLIB="-ldb"
++        have_db=yes
+      elif test -f /usr/include/db_185.h; then
+ 
+ cat >>confdefs.h <<\_ACEOF
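
The branch order that patch-aa creates in configure can be sketched as a
standalone shell function. Only the db1/db.h and db_185.h tests are taken from
the patch; the db2 test is an assumed stand-in for the earlier branch whose
condition lies outside the patch context. The function name detect_db_header
and its include-root argument are hypothetical.

```shell
# Hypothetical helper: report which Berkeley DB header layout exists under an
# include root (defaults to /usr/include).
detect_db_header() {
    root="${1:-/usr/include}"
    if test -f "$root/db2/db.h"; then       # assumption: condition not shown in the patch
        echo db2
    elif test -f "$root/db1/db.h"; then     # branch added by patch-aa
        echo db1
    elif test -f "$root/db_185.h"; then     # existing fallback shown as context
        echo db_185
    else
        echo none
    fi
}
```

Under this ordering, a system that ships both db1/db.h and db_185.h takes the
db1 branch, which links with -ldb and defines HAVE_DB1_H.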
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/patches/patch-ab
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/patches/patch-ab        Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,14 @@
+$NetBSD: patch-ab,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+
+--- config.h.in.orig   2003-05-18 02:54:45.000000000 +0200
++++ config.h.in        2004-06-12 00:06:58.000000000 +0200
+@@ -42,6 +42,9 @@
+ /* Define if your system has libdb */
+ #undef HAVE_DB_H
+ 
++/* Define if your system has libdb */
++#undef HAVE_DB1_H
++
+ /* Define to 1 if you have the `dirname' function. */
+ #undef HAVE_DIRNAME
+ 
diff -r 5cd688125d2c -r 97f1c91888f4 www/crawl/patches/patch-ac
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/www/crawl/patches/patch-ac        Tue Jan 18 17:46:31 2005 +0000
@@ -0,0 +1,13 @@
+$NetBSD: patch-ac,v 1.1.1.1 2005/01/18 17:46:31 peter Exp $
+
+--- crawldb.c.orig     2003-05-17 18:59:51.000000000 +0200
++++ crawldb.c  2004-06-11 23:56:47.000000000 +0200
+@@ -44,6 +44,8 @@
+ #include <db_185.h>
+ #elif HAVE_DB_H
+ #include <db.h>
++#elif HAVE_DB1_H
++#include <db1/db.h>
+ #endif
+ #include <compat/md5.h>
+ 


