pkgsrc-Users archive
textproc/dict-mueller7, illegal byte sequences
With my current locale:
LANG="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="C"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=""
NetBSD's sed chokes on the bytes in
work/usr/local/share/dict/to-dict. The file seems to be ISO-8859-1,
so under a UTF-8 locale sed sees invalid byte sequences.
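A quick way to check the encoding guess is to ask iconv whether a file decodes as UTF-8. This is only a sketch: the helper name is_valid_utf8 is made up for illustration, and the sample file merely stands in for to-dict.

```sh
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid
# in the source encoding.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo on a throwaway file containing the byte 0xE9 ("e acute" in ISO-8859-1).
sample=$(mktemp)
printf 'caf\351\n' > "$sample"
if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
rm -f "$sample"
```

Running the same check against work/usr/local/share/dict/to-dict should show whether the script really contains non-UTF-8 bytes.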
===> Building for dict-mueller7-1.2nb10
cd /home/adr/pkgsrc/textproc/dict-mueller7/work/usr/local/share/dict && sh to-dict --src-data Mueller7GPL.koi mueller7.data > /dev/null && /usr/pkg/bin/perl mueller2utf8 < mueller7.data > tmp_1 && /usr/pkg/bin/dictfmt --utf8 -p --columns 0 -s 'Mueller English-Russian Dictionary' -u 'http://www.chat.ru/~mueller_dic' --headword-separator ', ' mueller7 < tmp_1 && /usr/pkg/bin/dictzip *.dict
sed: 1: "/^_/,/_. Japan ...": RE error: illegal byte sequence
sed: 6: "s/$/\
/g; s/[^]]*\ \ /% ...": RE error: illegal byte sequence
6 headwords
=======================
The worst part is that the package still gets built, so in a bulk build
this can go unnoticed.
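One way such a failure could be made visible, purely as an assumption on my part and not something pkgsrc does today, is to treat any stderr output from the step as fatal. The helper name fail_on_stderr is hypothetical.

```sh
# Hypothetical helper: run a command and fail if it wrote anything to stderr,
# even when the command itself exits with status 0.
fail_on_stderr() {
    errlog=$(mktemp) || return 1
    "$@" 2>"$errlog"
    status=$?
    if [ -s "$errlog" ]; then
        # Replay the captured diagnostics, then force a failure.
        cat "$errlog" >&2
        status=1
    fi
    rm -f "$errlog"
    return "$status"
}

# Example: this step "succeeds" but complains, so the wrapper reports failure.
if fail_on_stderr sh -c 'echo "RE error: illegal byte sequence" >&2; exit 0'; then
    echo "step ok"
else
    echo "step failed"
fi
```

This only catches tools that actually print a diagnostic; a tool that silently skips the bad input would still pass.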
Note that gsed simply does not match the invalid sequences, so the script
silently fails to do what it should, without reporting any error.
Setting LC_CTYPE=C solves the issue.
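For illustration, the workaround can be limited to a single sed invocation instead of the whole session. This is a minimal sketch: the wrapper name sed_bytes is made up, and unsetting LC_ALL is there only because a non-empty LC_ALL would override LC_CTYPE.

```sh
# Hypothetical wrapper: run sed with LC_CTYPE=C so that every byte is
# treated as a single character. The subshell keeps the locale change
# from leaking into the caller's environment.
sed_bytes() {
    (
        unset LC_ALL
        LC_CTYPE=C
        export LC_CTYPE
        exec sed "$@"
    )
}

# The byte 0xE9 is not valid UTF-8 on its own, but in the C locale
# "." matches it like any other byte.
printf 'caf\351 x\n' | sed_bytes 's/caf./OK/'
```

In the build above, the equivalent would be running the to-dict step with LC_CTYPE=C in its environment.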
I imagine this kind of problem has been discussed in the past.
What is the best practice in this case?
Regards.
adr.