pkgsrc-Changes-HG archive


[pkgsrc/trunk]: pkgsrc/biology/filter-fastq biology/filter-fastq: add filter-...



details:   https://anonhg.NetBSD.org/pkgsrc/rev/f2816d7b1b74
branches:  trunk
changeset: 453312:f2816d7b1b74
user:      brook <brook%pkgsrc.org@localhost>
date:      Thu May 27 17:11:42 2021 +0000

description:
biology/filter-fastq: add filter-fastq version 0.0.0.20210527

Filter reads from a FASTQ file using a list of identifiers.

Each entry in the input FASTQ file (or files) is checked against all
entries in the identifier list. Matches are included by default, or
excluded if the --invert flag is supplied. Paired-end files are kept
consistent (in order).

This is almost certainly not the most efficient way to implement this
filtering procedure. I tested a few different strategies and this one
seemed the fastest. Current timing with 16 processes is about 10
minutes per 1M paired reads with gzip'd input and output, depending on
the length of the identifier list to filter by.

usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]
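
For illustration only, the include/exclude behaviour described above can be
sketched in a few lines of Python 3. This is not the code from
filter_fastq.py (which the package runs under Python 2.7); the function
names read_fastq, read_id, filter_fastq and filter_pairs are hypothetical,
and the sketch assumes well-formed four-line FASTQ records, exact
identifier matching through a set, and that a trailing "/1" or "/2" mate
suffix is not part of the identifier.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (assumes no blank lines between records)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def read_id(header):
    """Assumed convention: '@r1/1 extra' has the identifier 'r1'."""
    return header[1:].split()[0].split("/")[0]


def filter_fastq(handle, wanted, invert=False):
    """Keep records whose identifier is in `wanted`; drop them if invert."""
    for record in read_fastq(handle):
        if (read_id(record[0]) in wanted) != invert:
            yield record


def filter_pairs(handle1, handle2, wanted, invert=False):
    """Filter two paired-end streams together so mates stay in step."""
    for rec1, rec2 in zip(read_fastq(handle1), read_fastq(handle2)):
        # The decision is made once per pair, using the first mate's identifier.
        if (read_id(rec1[0]) in wanted) != invert:
            yield rec1, rec2


# Small in-memory example with two reads and a one-entry identifier list.
example = "@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n"
kept = [rec[0] for rec in filter_fastq(io.StringIO(example), {"r1"})]
dropped = [rec[0] for rec in filter_fastq(io.StringIO(example), {"r1"}, invert=True)]
```

Holding the identifiers in a set makes each lookup independent of the list
length, which is a design choice of this sketch rather than a statement
about how the packaged script behaves.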

diffstat:

 biology/filter-fastq/DESCR    |  15 +++++++++++++++
 biology/filter-fastq/Makefile |  32 ++++++++++++++++++++++++++++++++
 biology/filter-fastq/PLIST    |   3 +++
 biology/filter-fastq/distinfo |   6 ++++++
 4 files changed, 56 insertions(+), 0 deletions(-)

diffs (72 lines):

diff -r df3a84f62dbe -r f2816d7b1b74 biology/filter-fastq/DESCR
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/biology/filter-fastq/DESCR        Thu May 27 17:11:42 2021 +0000
@@ -0,0 +1,15 @@
+Filter reads from a FASTQ file using a list of identifiers.
+
+Each entry in the input FASTQ file (or files) is checked against all
+entries in the identifier list. Matches are included by default, or
+excluded if the --invert flag is supplied. Paired-end files are kept
+consistent (in order).
+
+This is almost certainly not the most efficient way to implement this
+filtering procedure. I tested a few different strategies and this one
+seemed the fastest. Current timing with 16 processes is about 10
+minutes per 1M paired reads with gzip'd input and output, depending on
+the length of the identifier list to filter by.
+
+usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
+                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]
diff -r df3a84f62dbe -r f2816d7b1b74 biology/filter-fastq/Makefile
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/biology/filter-fastq/Makefile     Thu May 27 17:11:42 2021 +0000
@@ -0,0 +1,32 @@
+# $NetBSD: Makefile,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+PKGNAME=       filter-fastq-0.0.0.20210527
+GITHUB_PROJECT=        filter-fastq
+GITHUB_TAG=    d2c9218
+DISTNAME=      filter-fastq
+CATEGORIES=    biology
+MASTER_SITES=  ${MASTER_SITE_GITHUB:=stephenfloor/}
+EXTRACT_SUFX=  .zip
+DIST_SUBDIR=   ${GITHUB_PROJECT}
+
+MAINTAINER=    pkgsrc-users%NetBSD.org@localhost
+HOMEPAGE=      https://github.com/stephenfloor/filter-fastq/
+COMMENT=       Filter reads from a FASTQ file
+LICENSE=       mit
+
+WRKSRC=                ${WRKDIR}/filter-fastq-d2c92182674a6d5aa257fb63eb60ac24ddb8b4a0
+USE_LANGUAGES= # none
+NO_BUILD=      yes
+
+PYTHON_VERSIONS_ACCEPTED=      27
+
+REPLACE_PYTHON+=       filter_fastq.py
+
+INSTALLATION_DIRS+=    bin share/doc/filter_fastq
+
+do-install:
+       ${INSTALL_SCRIPT} ${WRKSRC}/filter_fastq.py ${DESTDIR}${PREFIX}/bin
+       ${INSTALL_DATA} ${WRKSRC}/README.md ${DESTDIR}${PREFIX}/share/doc/filter_fastq
+
+.include "../../lang/python/application.mk"
+.include "../../mk/bsd.pkg.mk"
diff -r df3a84f62dbe -r f2816d7b1b74 biology/filter-fastq/PLIST
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/biology/filter-fastq/PLIST        Thu May 27 17:11:42 2021 +0000
@@ -0,0 +1,3 @@
+@comment $NetBSD: PLIST,v 1.1 2021/05/27 17:11:42 brook Exp $
+bin/filter_fastq.py
+share/doc/filter_fastq/README.md
diff -r df3a84f62dbe -r f2816d7b1b74 biology/filter-fastq/distinfo
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/biology/filter-fastq/distinfo     Thu May 27 17:11:42 2021 +0000
@@ -0,0 +1,6 @@
+$NetBSD: distinfo,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+SHA1 (filter-fastq/filter-fastq-d2c9218.zip) = 44b8bbef2690b598a2f06930396fbbf5828e364c
+RMD160 (filter-fastq/filter-fastq-d2c9218.zip) = 715b0e52b5714cea1fa4a64bfe8cbef919cee2ce
+SHA512 (filter-fastq/filter-fastq-d2c9218.zip) = c5ab23b86ac8690f58bf05bd0a16f3b315bd7a71f67bce267fe9f36b5e528ac228c57c2521cad8c547159915cf77433848be58d463100f407693927493ad8f5f
+Size (filter-fastq/filter-fastq-d2c9218.zip) = 4249 bytes


