Current-Users archive


Re: Hyphens in man pages are no longer hyphens



On Thu, Dec 31, 2009 at 12:50:05 -0700, Sverre Froyen wrote:

> For a while now I have been using "LC_CTYPE=en_US.UTF-8" in my .profile.  I 
> recently noticed that I can no longer copy and paste commands containing 
> hyphens from a man page because the hyphens get formatted as "342 200 220" 
> (from od -c /usr/share/man/cat1/send-pr.0).  Unsetting LC_CTYPE results in man 
> pages containing regular hyphens.  The strange thing, however, is that this 
> used to work with LC_CTYPE set.  Formatted man pages from a catman on 22 Nov. 
> contain normal hyphens, whereas current as of a week ago produces the UTF-8 
> specific hyphens.  Does anyone know what changed?  Is there a way to restore 
> the old behavior?

In utf-8 mode (or in PS output) groff translates:

* unescaped - to \u2010 HYPHEN (or /hyphen glyph)

  You get this in command names like "send-pr", where the unescaped
  '-' becomes a hyphen.

* escaped \- to \u2212 MINUS SIGN (or /minus glyph)

  You get this in options, because the mdoc Fl macro uses \-, and
  before that it was customary practice to use an escaped \-X for
  options.
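
To tie this back to the od output quoted above, here is a small Python
sketch (mine, not anything groff ships) that decodes the three octal
bytes "342 200 220" and prints the names and UTF-8 encodings of the
code points involved.  The helper name decode_octal_bytes is made up
for illustration.

```python
import unicodedata


def decode_octal_bytes(octal_text: str) -> str:
    """Decode whitespace-separated octal byte values (as shown by od) as UTF-8."""
    raw = bytes(int(token, 8) for token in octal_text.split())
    return raw.decode("utf-8")


# The byte sequence reported in the original question.
reported = decode_octal_bytes("342 200 220")
print(f"U+{ord(reported):04X} {unicodedata.name(reported)}")

# The code points discussed in this message, with their UTF-8 bytes.
for code_point in (0x002D, 0x2010, 0x2212):
    char = chr(code_point)
    encoded = " ".join(f"{byte:03o}" for byte in char.encode("utf-8"))
    print(f"U+{code_point:04X} {unicodedata.name(char):<14} octal bytes: {encoded}")
```

So the bytes in the cat page are U+2010 HYPHEN, which is what the
unescaped '-' case predicts.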


src/gnu/dist/groff/font/devutf8/NOTES says:

  Character 0x002D has not been given a name because its Unicode name
  HYPHEN-MINUS is so ambiguous that it is unusable for serious typographic
  use.

so you cannot even refer to ASCII '-' in UTF-8 mode unless you modify
the font files.

I don't have a PDF distiller handy to test what happens if you
convert groff -Tps output to PDF and then try to copy-paste a command
example from the PDF document, but you'll probably get the same
problem with the /minus used for options (the Adobe Glyph List says
that /hyphen is ASCII '-' \u002D).

Of course in copy-pastable command line examples we don't want
"serious typographic use", we want ASCII '-' for its literal character
value :), but there's a catch.  Let's say you want to copy-paste

    eval `ssh-agent -s`

but in the roff source the first '-' is plain (hyphen) and the second
is escaped (minus).
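
A rough Python simulation of that example, assuming only the
substitutions described in this message (plain '-' to \u2010, escaped
\- to \u2212, back-tick to \u2018).  It is not groff's actual code path,
and simulate_utf8_render is a hypothetical name.

```python
def simulate_utf8_render(roff_text: str) -> str:
    """Apply the three substitutions described in this message (illustrative only)."""
    out = []
    i = 0
    while i < len(roff_text):
        if roff_text.startswith("\\-", i):
            out.append("\u2212")  # escaped \- becomes MINUS SIGN
            i += 2
        elif roff_text[i] == "-":
            out.append("\u2010")  # unescaped - becomes HYPHEN
            i += 1
        elif roff_text[i] == "`":
            out.append("\u2018")  # back-tick becomes LEFT SINGLE QUOTATION MARK
            i += 1
        else:
            out.append(roff_text[i])
            i += 1
    return "".join(out)


# Roff source as described: first '-' plain, second one escaped.
source = r"eval `ssh-agent \-s`"
rendered = simulate_utf8_render(source)
print(rendered)
print("contains ASCII '-':", "-" in rendered)
print("contains ASCII back-tick:", "`" in rendered)
```

The rendered line looks about right on screen but contains no ASCII
'-' or back-tick at all, so the shell will not accept it when pasted.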

The only way to solve this properly as far as I can tell is to use
some special font for examples that are intended to be copy-pastable
in which both hyphen (-) and minus (\-) look the same and both are
represented by something that will get you ASCII '-' when copied.  For
PS output that could be a special alias for Courier that uses /hyphen
for both - and \-.  For UTF-8 output it would use ASCII '-' for both
(instead of fancy Unicode chars).


PS: The back-tick has the same problem, as it ends up as \u2018
LEFT SINGLE QUOTATION MARK :)
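
Until something like that font exists, a stopgap on the pasting side
is to map the characters back before handing the text to the shell.
This is only a sketch of a workaround, not the fix proposed above; the
table covers just the three characters mentioned in this message, and
to_ascii_command is a hypothetical helper.  Note that mapping \u2018
back to a back-tick is lossy, since it can also be a genuine opening
quote.

```python
# Characters this message says groff emits in UTF-8 mode, mapped back to ASCII.
# Assumption: any U+2018 in the pasted text was originally a back-tick.
TO_ASCII = str.maketrans({
    "\u2010": "-",  # HYPHEN (from unescaped -)
    "\u2212": "-",  # MINUS SIGN (from escaped \-)
    "\u2018": "`",  # LEFT SINGLE QUOTATION MARK (from back-tick)
})


def to_ascii_command(pasted: str) -> str:
    """Return the pasted text with the three typographic characters made ASCII."""
    return pasted.translate(TO_ASCII)


# Text as it would be copied from a UTF-8 formatted man page.
pasted = "eval \u2018ssh\u2010agent \u2212s\u2018"
print(to_ascii_command(pasted))
```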

SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen

