Subject: Re: UTF8 and nroff manpages
To: None <tech-misc@NetBSD.org>
From: Alan Barrett <apb@cequrux.com>
List: tech-misc
Date: 09/22/2005 17:23:38
On Thu, 22 Sep 2005, David Brownlee wrote:
> 	If locale is en_US.UTF-8 then manpages do not display '-' as
> 	'-', but instead as the hex sequence e2 88 92. You can easily
> 	see this by comparing the output of
> 		nroff -mandoc /usr/share/man/man1/ls.1
> 	with LC_CTYPE=en_US.UTF-8 and undefined.
> 
> 	I can understand for text formatting that e2 88 92 may be a
> 	'better' Unicode entity for a hyphen, but for a manpage it's
> 	very much not.

Actually, that's the UTF-8 encoding of the Unicode character U+2212 (minus
sign).  It's not a hyphen at all.  Hyphen would be U+2010, and the plain
ASCII '-' is U+002D (hyphen-minus).

$ printf "0xe2, 0x88, 0x92" | recode utf8/x..dump
UCS2   Mne   Description

2212   -2    minus sign
$ printf "-" | recode ascii..dump
UCS2   Mne   Description

002D   -     hyphen-minus
$ printf "0x2010" | recode ucs2/x2..dump
UCS2   Mne   Description

2010   -1    hyphen
$ printf "0x2010" | recode ucs2/x2..utf8/x
0xE2, 0x80, 0x90
$
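
If recode isn't installed, the same mapping can probably be checked with
iconv and od.  This is only a sketch: the helper name utf8_to_codepoint is
made up for illustration, and it assumes your iconv accepts the encoding
names UTF-8 and UTF-16BE (those names are implementation-defined).

```shell
# Decode UTF-8 bytes from stdin and print the resulting UTF-16BE code
# unit(s) as hex.  For characters in the Basic Multilingual Plane this
# is simply the code point.  (Hypothetical helper, not a standard tool.)
utf8_to_codepoint() {
    iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
    echo
}

printf '\342\210\222' | utf8_to_codepoint   # bytes e2 88 92 -> 2212 (minus sign)
printf '-'            | utf8_to_codepoint   # byte  2d       -> 002d (hyphen-minus)
printf '\342\200\220' | utf8_to_codepoint   # bytes e2 80 90 -> 2010 (hyphen)
```

The octal escapes in the printf format strings are just the byte values
e2 88 92 and e2 80 90 written in a form that POSIX printf understands.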

> 	Does anyone have any ideas?

Sorry, no.

--apb (Alan Barrett)