tech-repository archive


Re: The first step away from CVS



On 2010-01-07 08:44 +1300 (Thu), Lloyd Parkes wrote:

> We will probably need to assign character sets to individual
> developers and then re-encode commit messages based on who did the
> commit. I'm not entirely sure how to implement this....

Assigning a conversion regime to each developer would probably be
better. By this I mean having a small set of heuristics per developer
that look at the text, guess a reasonable encoding, and do the
conversion.

This would let us use, e.g., NKF (Network Kanji Filter) for Japanese
users. It does very good detection of the various Japanese encodings
and, I believe, applies a few tricks during conversion to get into
UTF-8 some text on which a straight conversion (such as with iconv)
will fail.
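
As a rough illustration of what a per-developer conversion regime could
look like, here is a minimal sketch in Python. It is not a worked-out
design: the developer names, the REGIMES table and the to_utf8()
function are all hypothetical, and the "heuristic" is simply trying
strict decodes in a fixed order. A real Japanese regime would more
likely hand the bytes to NKF than rely on this kind of guessing.

```python
# Hypothetical table: committer name -> ordered list of candidate encodings.
# ISO-2022-JP is tried first because it is pure 7-bit; if UTF-8 were tried
# first, ISO-2022-JP text would "decode" as ASCII with stray escape bytes.
REGIMES = {
    "alice": ["utf-8", "iso-8859-1"],
    "taro": ["iso2022_jp", "utf-8", "euc_jp", "shift_jis"],
}

# Assumed fallback for committers with no entry in the table.
DEFAULT_REGIME = ["utf-8", "iso-8859-1"]


def to_utf8(author: str, raw: bytes) -> bytes:
    """Re-encode a raw commit message as UTF-8 using the author's regime."""
    for encoding in REGIMES.get(author, DEFAULT_REGIME):
        try:
            # Strict decoding: any invalid byte sequence rejects the candidate.
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    # Nothing matched: keep what we can and mark the rest as replacement chars.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    message = "日本語のコミット".encode("euc_jp")
    print(to_utf8("taro", message).decode("utf-8"))
```

The order of the candidates matters as much as the list itself, since a
permissive encoding such as ISO-8859-1 accepts any byte string and so
has to come last.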

I have a lot of experience with I18N, character set encodings and so on,
basic knowledge of English, French and Japanese, and a bit of knowledge
about Chinese as well. I'd be happy to help with advice on this sort of
thing.

cjs
-- 
Curt Sampson         <cjs%cynic.net@localhost>         +81 90 7737 2974
             http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

