pkgsrc-Changes archive

CVS commit: pkgsrc/textproc/sentencepiece



Module Name:    pkgsrc
Committed By:   wiz
Date:           Mon Mar 13 14:17:12 UTC 2023

Added Files:
        pkgsrc/textproc/sentencepiece: DESCR Makefile Makefile.common PLIST
            buildlink3.mk distinfo

Log Message:
textproc/sentencepiece: import sentencepiece-0.1.97

SentencePiece is an unsupervised text tokenizer and detokenizer
mainly for Neural Network-based text generation systems where the
vocabulary size is predetermined prior to the neural model training.
SentencePiece implements subword units (e.g., byte-pair-encoding
(BPE)) and unigram language model with the extension of direct
training from raw sentences. SentencePiece allows us to make a
purely end-to-end system that does not depend on language-specific
pre/postprocessing.
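
As a rough illustration of the byte-pair-encoding idea mentioned above, the
following Python sketch learns a few merge rules from a toy corpus. It is not
SentencePiece's implementation: the function names (most_frequent_pair,
merge_pair, learn_bpe) are hypothetical, and the sketch splits on whitespace
for brevity, whereas SentencePiece trains directly on raw sentences.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    # Ties are resolved in favour of the pair seen first.
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one concatenated symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus.

    Assumption for this sketch only: words are pre-split on whitespace and
    start out as sequences of single characters.
    """
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_bpe("low low low lower", 2)
    print(merges)
    print(words)
```

Each merge adds one entry to the vocabulary, so the number of merges is how a
BPE-style tokenizer reaches a vocabulary size fixed before model training.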


To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 pkgsrc/textproc/sentencepiece/DESCR \
    pkgsrc/textproc/sentencepiece/Makefile \
    pkgsrc/textproc/sentencepiece/Makefile.common \
    pkgsrc/textproc/sentencepiece/PLIST \
    pkgsrc/textproc/sentencepiece/buildlink3.mk \
    pkgsrc/textproc/sentencepiece/distinfo

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Added files:

Index: pkgsrc/textproc/sentencepiece/DESCR
diff -u /dev/null pkgsrc/textproc/sentencepiece/DESCR:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/DESCR Mon Mar 13 14:17:12 2023
@@ -0,0 +1,8 @@
+SentencePiece is an unsupervised text tokenizer and detokenizer
+mainly for Neural Network-based text generation systems where the
+vocabulary size is predetermined prior to the neural model training.
+SentencePiece implements subword units (e.g., byte-pair-encoding
+(BPE)) and unigram language model with the extension of direct
+training from raw sentences. SentencePiece allows us to make a
+purely end-to-end system that does not depend on language-specific
+pre/postprocessing.
Index: pkgsrc/textproc/sentencepiece/Makefile
diff -u /dev/null pkgsrc/textproc/sentencepiece/Makefile:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/Makefile      Mon Mar 13 14:17:12 2023
@@ -0,0 +1,8 @@
+# $NetBSD: Makefile,v 1.1 2023/03/13 14:17:12 wiz Exp $
+
+PKGCONFIG_OVERRIDE+=   sentencepiece.pc.in
+
+.include "Makefile.common"
+
+.include "../../devel/cmake/build.mk"
+.include "../../mk/bsd.pkg.mk"
Index: pkgsrc/textproc/sentencepiece/Makefile.common
diff -u /dev/null pkgsrc/textproc/sentencepiece/Makefile.common:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/Makefile.common       Mon Mar 13 14:17:12 2023
@@ -0,0 +1,16 @@
+# $NetBSD: Makefile.common,v 1.1 2023/03/13 14:17:12 wiz Exp $
+#
+# used by textproc/sentencepiece/Makefile
+# used by textproc/py-sentencepiece/Makefile
+
+DISTNAME=      sentencepiece-0.1.97
+CATEGORIES=    textproc
+MASTER_SITES=  ${MASTER_SITE_GITHUB:=google/}
+GITHUB_TAG=    v${PKGVERSION_NOREV}
+
+MAINTAINER=    pkgsrc-users@NetBSD.org
+HOMEPAGE=      https://github.com/google/sentencepiece/
+COMMENT=       Unsupervised text tokenizer for Neural Network-based text generation
+LICENSE=       apache-2.0
+
+USE_LANGUAGES= c c++17
Index: pkgsrc/textproc/sentencepiece/PLIST
diff -u /dev/null pkgsrc/textproc/sentencepiece/PLIST:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/PLIST Mon Mar 13 14:17:12 2023
@@ -0,0 +1,17 @@
+@comment $NetBSD: PLIST,v 1.1 2023/03/13 14:17:12 wiz Exp $
+bin/spm_decode
+bin/spm_encode
+bin/spm_export_vocab
+bin/spm_normalize
+bin/spm_train
+include/sentencepiece_processor.h
+include/sentencepiece_trainer.h
+lib/libsentencepiece.a
+lib/libsentencepiece.so
+lib/libsentencepiece.so.0
+lib/libsentencepiece.so.0.0.0
+lib/libsentencepiece_train.a
+lib/libsentencepiece_train.so
+lib/libsentencepiece_train.so.0
+lib/libsentencepiece_train.so.0.0.0
+lib/pkgconfig/sentencepiece.pc
Index: pkgsrc/textproc/sentencepiece/buildlink3.mk
diff -u /dev/null pkgsrc/textproc/sentencepiece/buildlink3.mk:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/buildlink3.mk Mon Mar 13 14:17:12 2023
@@ -0,0 +1,12 @@
+# $NetBSD: buildlink3.mk,v 1.1 2023/03/13 14:17:12 wiz Exp $
+
+BUILDLINK_TREE+=       sentencepiece
+
+.if !defined(SENTENCEPIECE_BUILDLINK3_MK)
+SENTENCEPIECE_BUILDLINK3_MK:=
+
+BUILDLINK_API_DEPENDS.sentencepiece+=  sentencepiece>=0.1.97
+BUILDLINK_PKGSRCDIR.sentencepiece?=    ../../textproc/sentencepiece
+.endif # SENTENCEPIECE_BUILDLINK3_MK
+
+BUILDLINK_TREE+=       -sentencepiece
Index: pkgsrc/textproc/sentencepiece/distinfo
diff -u /dev/null pkgsrc/textproc/sentencepiece/distinfo:1.1
--- /dev/null   Mon Mar 13 14:17:12 2023
+++ pkgsrc/textproc/sentencepiece/distinfo      Mon Mar 13 14:17:12 2023
@@ -0,0 +1,5 @@
+$NetBSD: distinfo,v 1.1 2023/03/13 14:17:12 wiz Exp $
+
+BLAKE2s (sentencepiece-0.1.97.tar.gz) = 969788b6d87e8c992f6df4349f984fb2d6e80f978d4007127174222ec7fcb3ab
+SHA512 (sentencepiece-0.1.97.tar.gz) = 4c35488e3661e45be677b04299c0d0b1f0d46421098f0b1625a1bb5e7725d175dfd55328a5a7bbf88badeb03c2ba087aef942b0d7520a29f6bf34eae211a99eb
+Size (sentencepiece-0.1.97.tar.gz) = 11945436 bytes
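
The distinfo entries above record two digests and a size for the distfile.
As a minimal sketch of how such a record could be checked outside of pkgsrc,
the Python below parses lines of that shape and compares them against a
local file. The helper names (parse_distinfo, verify_distfile) are
hypothetical and this is not how pkgsrc itself performs the check; it only
assumes the "ALGORITHM (file) = value" layout shown in the diff.

```python
import hashlib
import os
import re

# Matches lines such as "SHA512 (name.tar.gz) = abc..." or "Size (name) = 123 bytes".
_ENTRY = re.compile(r"^(\S+) \((.+)\) = (\S+)")


def parse_distinfo(text):
    """Return {filename: {algorithm: value}} for distinfo-style lines."""
    entries = {}
    for line in text.splitlines():
        match = _ENTRY.match(line.strip().lstrip("+"))
        if match:
            algorithm, filename, value = match.groups()
            entries.setdefault(filename, {})[algorithm] = value
    return entries


def verify_distfile(path, record):
    """Check size, SHA512 and BLAKE2s of `path` against one parsed record."""
    if "Size" in record and os.path.getsize(path) != int(record["Size"]):
        return False
    sha512 = hashlib.sha512()
    blake2s = hashlib.blake2s()  # default 32-byte digest, 64 hex characters
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha512.update(chunk)
            blake2s.update(chunk)
    if "SHA512" in record and sha512.hexdigest() != record["SHA512"]:
        return False
    if "BLAKE2s" in record and blake2s.hexdigest() != record["BLAKE2s"]:
        return False
    return True
```

For example, verify_distfile("sentencepiece-0.1.97.tar.gz",
parse_distinfo(text)["sentencepiece-0.1.97.tar.gz"]) would return True only
if the size and both digests match the recorded values.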


