tech-userlevel archive


Re: tar issue on netbsd-5

>> anything writing standard tar format has this limit, because [...]
> As I said in the last discussion of this topic and the corresponding
> PR, it strongly depends on which standard you are talking about.  For
> ustar the limit is 8GB, for POSIX Interchange Format it is no
> problem.
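
For reference, the 8GB figure for ustar falls out of the header layout:
the size field is 12 bytes holding an octal number, which leaves 11
octal digits once one byte goes to the terminator.  A quick check of the
arithmetic (the names here are made up for illustration, not taken from
any tar source):

```python
# ustar stores the file size as an octal string in a 12-byte header field.
SIZE_FIELD_BYTES = 12


def ustar_max_size(field_bytes: int = SIZE_FIELD_BYTES) -> int:
    """Largest size representable, assuming one byte is the terminator."""
    octal_digits = field_bytes - 1
    return 8 ** octal_digits - 1


limit = ustar_max_size()
print(limit)                      # 8589934591
print(limit + 1 == 8 * 2 ** 30)   # True: one byte short of 8 GiB
```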

Are you talking about the "pax Extended Header" spec, such as is found
(Normally, I wouldn't take a pax spec as relevant to tar at all, but I
have a copy of that page saved alongside my tar source and marked as
being standards-relevant, and the format looks like an extension to
least-common-denominator tar format.)

Strictly, it's unimplementable on NetBSD, and probably assorted other
OSes, since it demands UTF-8 encoding for paths and link-to strings,
which means it demands that names be character strings, but NetBSD file
names are octet strings, not character strings.  They look like
character strings, but aren't; the actual name is the octet string,
with conversion between characters and octets, when it happens,
happening elsewhere.  If you doubt, suppose you call readdir() and find
that d_name[] contains 0xe1, 0xa5, 0xac, 0x00.  Is that the single
Unicode character TAI LE LETTER AUE encoded in UTF-8, or is that the
three ISO-8859-1 characters a-acute, yen-sign, not-sign?  Or is it due
to some bit of software generating file names based on some
non-character mechanism (such as representing the number 14428640, or
maybe 11009345, in base 254)?  Or maybe something else?  Without some
way to tell, there's no way to generate UTF-8 for it - and, in the last
case, it's not even clear what the correct UTF-8 would be.  (Anyone
happen to know what existing implementations do?  I'm curious, but not
quite curious enough to find and build one to see.)
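
To make the ambiguity concrete, here is a rough Python sketch that reads
the same three octets each way.  It uses the byte sequence 0xe1 0xa5
0xac, which is the UTF-8 encoding of U+196C.  The function names are
invented for illustration, and the base-254 digit mapping is only an
assumption (every octet except NUL and '/' used as a digit, in order),
not something any particular program is known to do.

```python
import unicodedata


def as_utf8(name: bytes):
    """Return the name decoded as UTF-8, or None if it is not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


def as_latin1(name: bytes) -> str:
    """ISO-8859-1 maps every octet to a character, so this never fails."""
    return name.decode("iso-8859-1")


def as_base254(name: bytes, big_endian: bool = True) -> int:
    """Read the name as a base-254 number.

    Assumed (hypothetical) digit mapping: octets 0x01..0x2e are digits
    0..45 and octets 0x30..0xff are digits 46..253, skipping NUL and '/'.
    """
    if 0x00 in name or 0x2F in name:
        raise ValueError("NUL and '/' cannot appear in a file name component")
    digits = [b - 1 if b < 0x2F else b - 2 for b in name]
    if not big_endian:
        digits.reverse()
    value = 0
    for d in digits:
        value = value * 254 + d
    return value


name = b"\xe1\xa5\xac"  # d_name[] contents without the terminating NUL
text = as_utf8(name)
print("UTF-8:     ", None if text is None else [unicodedata.name(c, "?") for c in text])
print("ISO-8859-1:", [unicodedata.name(c, "?") for c in as_latin1(name)])
print("base 254:  ", as_base254(name), "or", as_base254(name, big_endian=False))
```

All three readings succeed on the same octets, and nothing in the
directory entry says which one the creator of the file had in mind.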

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML      
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
