NetBSD-Bugs archive


[RFC] introducing new locale-db implementation (Re: lib/39662: shortcomings in LC_{MONETARY,NUMERIC,TIME,MESSAGES} db format)

The following reply was made to PR lib/39662; it has been noted by GNATS.

From: "Takehiko NOZAKI" <>
Subject: [RFC] introducing new locale-db implementation (Re: lib/39662: 
shortcomings in LC_{MONETARY,NUMERIC,TIME,MESSAGES} db format)
Date: Fri, 2 Jan 2009 02:41:37 +0900

 happy new year, all!
 let's recall the earlier discussion about the locale-db format.
 let me summarize:
 1. the lack of a magic number and of any versioning mechanism is a killer
    for backward binary compatibility of libc itself.
 2. a plain-text db file can't store wide string data, and converting
    "on the fly" is not a good idea; we need a more efficient format
    that can easily handle the byteorder(3) issue.
 3. making /usr/share/locale/*/LC_MESSAGES a monolithic file
    conflicts with gettext(3)'s namespace (gettext expects LC_MESSAGES
    to be a directory holding *.mo files).
 4. we already have too many locale db formats:
    LC_CTYPE(rune), *.cat(catgets), *.mo(gettext), citrus_db(iconv).
    introducing yet another format is not a good idea.
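To make points 1 and 2 concrete, here is a minimal sketch of a db header that carries a magic number and a version and stores its integers in a fixed (big-endian) byte order. The layout, the "LOCALEDB" magic and every `ex_` name are hypothetical and chosen only for illustration; this is not the actual citrus_db on-disk format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout, NOT the real citrus_db format:
 *   bytes  0..7   magic "LOCALEDB"
 *   bytes  8..11  format version, big-endian
 *   bytes 12..15  number of entries, big-endian
 */
#define EX_MAGIC      "LOCALEDB"
#define EX_MAGIC_LEN  8
#define EX_HEADER_LEN 16
#define EX_VERSION    1u

struct ex_header {
	uint32_t version;
	uint32_t nentries;
};

/* Store a 32-bit value big-endian, independent of the host byte order. */
void
ex_put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Load a big-endian 32-bit value, independent of the host byte order. */
uint32_t
ex_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a header into buf; returns 0 on success, -1 if buf is too small. */
int
ex_write_header(unsigned char *buf, size_t len, uint32_t nentries)
{
	if (len < EX_HEADER_LEN)
		return -1;
	memcpy(buf, EX_MAGIC, EX_MAGIC_LEN);
	ex_put_be32(buf + 8, EX_VERSION);
	ex_put_be32(buf + 12, nentries);
	return 0;
}

/* Parse a header; -1 on short buffer, bad magic, or unknown version. */
int
ex_read_header(const unsigned char *buf, size_t len, struct ex_header *h)
{
	if (len < EX_HEADER_LEN)
		return -1;
	if (memcmp(buf, EX_MAGIC, EX_MAGIC_LEN) != 0)
		return -1;
	h->version = ex_get_be32(buf + 8);
	h->nentries = ex_get_be32(buf + 12);
	if (h->version != EX_VERSION)
		return -1;
	return 0;
}
```

With a header like this, an older libc can reject a file it does not understand instead of misparsing it, and the same file is readable on both little-endian and big-endian ports.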
 before shipping 5.0 we have to fix these problems (they are
 already filed as PR/39662, which is a blocker for netbsd-5).
 so i wrote a brand new localedata implementation for LC_*.
 it uses the citrus_db framework as its backend (we already use citrus_db
 to implement iconv).
 here is the patch for HEAD and netbsd-5.
 i've already checked that this patch doesn't break the release build on:
    i386, amd64, hpcarm, hpcmips, hpcsh, vax.
 i want to commit this patch to HEAD and send a pullup-5 request.
 are there any objections or comments?
 i think it is better to merge only the libc change
 and not install the LC_{MONETARY,NUMERIC,TIME,MESSAGES} locale-db for 5.0
 (currently, this patch **installs** all kinds of locale-db),
 for the following reasons:
 1. our regex(3) doesn't support multibyte encodings such as UTF-8,
 so it can't parse a multibyte LC_MESSAGES yesexpr/noexpr correctly.
 we have to introduce a multibyte-aware regex(3).
 2. some locales (ja_JP.eucJP, ko_KR.eucKR) assign LC_MONETARY's currency_symbol
 as 0x5c(\), a zombie derived from the international version of ISO646;
 i'm afraid this breaks some shell scripts.
    $ LANG=ja_JP.eucJP locale -k currency_symbol
 in Solaris' locale(1), 0x5c is properly escaped.
 we have to fix our locale(1).
 3. date(1) output is quite strange under some locales (ja_JP.eucJP and so on),
 because the format string is hardcoded:
    format = "%a %b %e %H:%M:%S %Z %Y";
 this format should be "%+", but it seems that our strftime(3) lacks
 the "%+" conversion.  moreover, LC_TIME's d_t_fmt field doesn't include
 %a (weekday) and %Z (timezone),  so we have to fix date(1), add a new field to
 implement "%+", and maintain the locale definition files.
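For reason 1, this is roughly how a program consumes yesexpr: it hands the locale's pattern to regcomp(3) and matches the user's answer against it. The sketch uses only standard POSIX calls; the `ex_` function names are made up for illustration. If the regex engine works byte by byte, a bracket expression containing multibyte characters is seen as a set of single bytes, which is the failure described above.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/*
 * Match response against an extended regular expression.
 * Returns 1 on match, 0 on no match, -1 if the pattern does not compile.
 */
int
ex_match_expr(const char *pattern, const char *response)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, response, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}

/* Check response against the current locale's LC_MESSAGES yesexpr. */
int
ex_is_yes(const char *response)
{
	return ex_match_expr(nl_langinfo(YESEXPR), response);
}
```

A caller would first call setlocale(LC_ALL, "") and then pass the line read from the user to `ex_is_yes()`.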
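For reason 2, a sketch of the kind of escaping meant here: quote the keyword value and put a backslash in front of every byte that is special inside shell double quotes, so that a script can safely eval the output. The function name and the exact set of escaped characters are assumptions for illustration; this is not taken from Solaris' or NetBSD's locale(1) source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy value into out, surrounded by double quotes, with a backslash
 * in front of each of: \ " $ `
 * Returns 0 on success, -1 if out is too small.
 *
 * This is byte-oriented.  It assumes 0x5c never occurs as the trailing
 * byte of a multibyte character, which holds for EUC encodings but not
 * for e.g. Shift_JIS.
 */
int
ex_quote_value(const char *value, char *out, size_t outlen)
{
	size_t n = 0;

	if (outlen < 3)		/* room for the two quotes and the NUL */
		return -1;
	out[n++] = '"';
	for (; *value != '\0'; value++) {
		int special = strchr("\\\"$`", *value) != NULL;

		/* keep room for the closing quote and the NUL */
		if (n + (special ? 2 : 1) + 2 > outlen)
			return -1;
		if (special)
			out[n++] = '\\';
		out[n++] = *value;
	}
	out[n++] = '"';
	out[n] = '\0';
	return 0;
}
```

With this, a currency_symbol consisting of the single byte 0x5c is written as the four bytes `"\\"` rather than as a lone backslash before the closing quote.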
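For reason 3, the difference between the hardcoded format and a locale-supplied one can be sketched with standard calls only. `ex_format_locale()` takes the format from the locale's d_t_fmt via nl_langinfo(3); it deliberately does not use "%+", since that conversion is not available everywhere. Both function names are hypothetical.

```c
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/* The hardcoded format quoted above from date(1). */
#define EX_HARDCODED_FMT "%a %b %e %H:%M:%S %Z %Y"

/*
 * Format tm with the hardcoded field order; only the names substituted
 * for %a and %b follow the locale.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
ex_format_hardcoded(char *buf, size_t len, const struct tm *tm)
{
	return strftime(buf, len, EX_HARDCODED_FMT, tm);
}

/*
 * Format tm with the locale's own date-and-time format (LC_TIME d_t_fmt),
 * so the field order follows the locale as well.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
ex_format_locale(char *buf, size_t len, const struct tm *tm)
{
	return strftime(buf, len, nl_langinfo(D_T_FMT), tm);
}
```

Whether d_t_fmt carries the weekday and the timezone depends on the locale definition, which is why a separate field for the date(1) style output is suggested above.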
 i don't think there is time to fix these problems before the 5.0 release...
 i once suggested making LC_* sub-directories for versioning,
 but i abandoned this, because we already have a monolithic LC_CTYPE db.
 so my previous idea of localedef(1) at tech-userlevel@ is
 hard to realize ;-< and i think it is quite confusing for
 monolithic and modular dbs to exist at the same time.
 anyway, forward compatibility is no problem: the new setlocale(3) can read
 the previous plain-text type as well as the new citrus_db locale-db.
 but backward compatibility is a problem, because localeio.c never
 validates the locale-db: no S_ISREG, no magic, no size checking :-<
 # that's why i was strongly against localeio at tech-userlevel@.
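The three missing checks (regular file, size, magic) can be sketched as below, together with the fallback that lets a new reader still accept an old plain-text file. This is an illustration under assumptions, not the code in the patch or in localeio.c: the "LOCALEDB" magic, the size limit and the `ex_` names are all hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define EX_MAGIC     "LOCALEDB"		/* hypothetical magic */
#define EX_MAGIC_LEN 8
#define EX_MAX_SIZE  (1024 * 1024)	/* arbitrary sanity limit */

enum ex_kind { EX_INVALID = -1, EX_LEGACY_TEXT = 0, EX_NEW_DB = 1 };

/*
 * Classify a locale-db file before trusting its contents:
 *   EX_INVALID      not openable, not a regular file, or empty/too large
 *   EX_NEW_DB       starts with the magic number
 *   EX_LEGACY_TEXT  anything else, to be parsed as the old text format
 */
enum ex_kind
ex_classify(const char *path)
{
	struct stat st;
	unsigned char magic[EX_MAGIC_LEN];
	enum ex_kind kind = EX_INVALID;
	ssize_t n;
	int fd;

	/* O_NONBLOCK keeps open(2) from hanging on a FIFO. */
	if ((fd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
		return EX_INVALID;
	if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode))
		goto out;
	if (st.st_size <= 0 || st.st_size > EX_MAX_SIZE)
		goto out;
	n = read(fd, magic, sizeof(magic));
	if (n == -1)
		goto out;
	if ((size_t)n == sizeof(magic) &&
	    memcmp(magic, EX_MAGIC, EX_MAGIC_LEN) == 0)
		kind = EX_NEW_DB;
	else
		kind = EX_LEGACY_TEXT;
out:
	close(fd);
	return kind;
}
```

An old reader without such checks has no way to tell that it was handed a new-format file, which is the backward compatibility problem described above.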
 very truly yours.
 Takehiko NOZAKI <>
