pkgsrc-Users archive


textproc/dict-mueller7, illegal byte sequences



With my current locale:

LANG="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="C"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=""

NetBSD's sed chokes on the bytes in
work/usr/local/share/dict/to-dict. The file seems to be ISO-8859-1,
so in a UTF-8 locale sed sees invalid byte sequences.

===> Building for dict-mueller7-1.2nb10
cd /home/adr/pkgsrc/textproc/dict-mueller7/work/usr/local/share/dict &&  sh to-dict --src-data Mueller7GPL.koi mueller7.data > /dev/null &&  /usr/pkg/bin/perl mueller2utf8 < mueller7.data > tmp_1 &&  /usr/pkg/bin/dictfmt --utf8 -p --columns 0  -s 'Mueller English-Russian Dictionary'  -u 'http://www.chat.ru/~mueller_dic'  --headword-separator ', ' mueller7 < tmp_1 &&  /usr/pkg/bin/dictzip *.dict
sed: 1: "/^_/,/_.  Japan  ...": RE error: illegal byte sequence
sed: 6: "s/$/\
/g; s/[^]]*\ \ /% ...": RE error: illegal byte sequence
           6 headwords
=======================
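
One way to confirm that a file is not valid UTF-8 is to round-trip it through iconv. This is only a sketch: the sample file is hypothetical and stands in for to-dict, and it assumes an iconv that exits non-zero on an invalid input sequence.

```shell
# Hypothetical sample standing in for a script with ISO-8859-1 bytes:
# \351 is "é" in ISO-8859-1 and starts an invalid sequence in UTF-8.
latin1_sample=$(mktemp)
printf 'caf\351\n' > "$latin1_sample"

# Assumption: iconv exits non-zero when the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$latin1_sample" > /dev/null 2>&1; then
    utf8_status=valid
else
    utf8_status=invalid
fi
echo "$utf8_status"
```

The same check could be pointed at work/usr/local/share/dict/to-dict instead of the sample file.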

The worst part is that the package still gets built, so in a bulk build
this can go unnoticed.

Note that gsed simply does not match the invalid sequences, so the script
silently fails to do what it should, without reporting an error.

Setting LC_CTYPE=C solves the issue.
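
A minimal sketch of that workaround on a single sed call. The sample file and the substitution are hypothetical, not taken from to-dict. LC_ALL is cleared for the command because a non-empty LC_ALL would override LC_CTYPE.

```shell
# Hypothetical input: one line containing the ISO-8859-1 byte \351.
sed_input=$(mktemp)
sed_output=$(mktemp)
printf 'caf\351 _x\n' > "$sed_input"

# In the C locale every byte is a single character, so sed does not
# try to decode \351 as the start of a UTF-8 sequence.
LC_ALL= LC_CTYPE=C sed 's/_x$/_y/' "$sed_input" > "$sed_output"
```

The non-ASCII byte passes through unchanged; only the ASCII part of the line is rewritten.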

I imagine this kind of problem has been discussed in the past.

What is the best practice in this case?

Regards.
adr.

