tech-userlevel archive


Google SoC, mandoc, and PostScript



Hello,

This mail introduces another GSoC project: adding PostScript output to
mandoc ("mandoc -Tps"), http://mdocml.bsd.lv/.  My mentors are
T Klausner (wiz@), J Sonnenberger (joerg@), and D Baron (dillo@).  A
copy of the proposal, edited for brevity, is included below with the
details.  The abstract follows:

 "mandoc -Tps is missing. While mandoc is fast becoming
  byte-compatible with GNU troff for terminal output (-Tascii)
  and has advanced X/HTML output (-Thtml, -Txhtml), there does
  not yet exist PostScript output (-Tps). I propose implementing
  -Tps, initially as a shim over terminal output, then, with this
  milestone complete, bringing in more typographic awareness
  (variable-font, justification, etc.)."
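
To make the "shim over terminal output" idea concrete, here is a
minimal sketch, not mandoc code.  It assumes the terminal formatter has
already produced fixed-width lines and simply places them on a page in
Courier.  The function names (ps_escape, ps_page), the US Letter page,
the 10pt font, the 12pt leading, and the one-inch margins are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` into `dst` as the body of a
 * PostScript string literal, escaping '(', ')' and '\'.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if `dst` is too small.
 */
int
ps_escape(char *dst, size_t dstsz, const char *src)
{
	size_t n = 0;

	assert(dst != NULL && src != NULL);
	if (dstsz == 0)
		return -1;
	for ( ; *src != '\0'; src++) {
		size_t need = (*src == '(' || *src == ')' ||
		    *src == '\\') ? 2 : 1;

		/* Leave room for the terminating NUL. */
		if (n + need >= dstsz)
			return -1;
		if (need == 2)
			dst[n++] = '\\';
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return (int)n;
}

/*
 * Hypothetical shim: write pre-formatted terminal lines as a single
 * PostScript page.  Assumes US Letter, 10pt Courier, 12pt leading and
 * one-inch margins; does not break pages.  Returns 0, or -1 if a line
 * is too long for the local buffer.
 */
int
ps_page(FILE *out, const char *const *lines, size_t nlines)
{
	char buf[512];
	size_t i;

	fputs("%!PS\n/Courier findfont 10 scalefont setfont\n", out);
	for (i = 0; i < nlines; i++) {
		if (ps_escape(buf, sizeof(buf), lines[i]) < 0)
			return -1;
		fprintf(out, "72 %d moveto (%s) show\n",
		    720 - 12 * (int)i, buf);
	}
	fputs("showpage\n", out);
	return 0;
}
```

A real driver would also need page breaks, font changes for bold and
italic, and non-ASCII handling, none of which this sketch attempts.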

The project will involve an initial implementation of the driver
(1.5 to 2.0 months), then focus on typography (1.5 to 1.0 months).  In
my experience, most of the first part will be spent hair-pulling over
lists (cf. mdoc_term.c, mdoc_html.c in mandoc).

If I finish ahead of time, I plan on adding -Tpdf, although this is
beyond the scope of GSoC.  I'll be calling upon Dieter's expertise to
pull this part off.

Prior to SoC, I'll add initial typographic cues, such as for sentential
spacing, which must occur before AST serialisation.  This will relieve
front-ends of considerable complexity (see, e.g., the OpenBSD local
patches for EOS spacing).
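
As an illustration of the kind of cue meant here, the following is a
minimal sketch of end-of-sentence detection on a text line.  It is not
mandoc's implementation: the name ends_sentence and the exact set of
closing delimiters are assumptions, and it ignores escapes and
abbreviations such as "e.g.".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the text line `p` of length `sz`
 * ends a sentence, else 0.  A sentence end is taken to be '.', '!'
 * or '?', optionally followed by closing delimiters.
 */
int
ends_sentence(const char *p, size_t sz)
{
	assert(p != NULL);

	/* Skip trailing closing delimiters such as ')' or '"'. */
	while (sz > 0 && p[sz - 1] != '\0' &&
	    strchr(")]'\"", p[sz - 1]) != NULL)
		sz--;
	if (sz == 0)
		return 0;
	return p[sz - 1] == '.' || p[sz - 1] == '!' || p[sz - 1] == '?';
}
```

A parser could call something like this once per input line and record
the result on the text node, so that each output mode only has to
consult a flag when deciding whether to emit extra space.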

That's the long and short of it.  If you have any questions, please let
me know.  Regarding myself, I wrote mandoc.  In the distant past I built
the "mult" forks of NetBSD and OpenBSD, the "sysjail" utility of days
gone by, and other goodies at http://bsd.lv/.  I'm a PhD student at KTH
in Stockholm studying math and theoretical computer science (game theory
and abstract algebra).  I also work in the club industry.

If you'd like to follow progress, you can sign up for the mdocml mailing
list (see http://mdocml.bsd.lv/), or just ask.

Thanks again to Thomas, Joerg, Dieter, NetBSD, and Google,

Kristaps "Mischka"

Proposal:

mandoc compiles UNIX "-man" and "-mdoc" manual input into a variety of
output forms. At this time, it is in the base distribution for NetBSD,
OpenBSD, and DragonFly BSD (and a third-party port for FreeBSD). In
OpenBSD, it is linked to the build and slated to be the default
formatter by 4.8. Its sole developer is Kristaps Dzonsons (me), with
significant contributions from several downstream (primarily OpenBSD and
NetBSD) maintainers, listed below. mandoc has existed in its current
form for roughly 16 months as dated by CVS.

Why mandoc over Heirloom or GNU troff? Beyond licensing (groff, GNU
troff, is GPLv3-licensed; Heirloom is variously-licensed, at times with
CDDL), programming language (GNU and Heirloom troff both use C++
components), complexity (see Heirloom and GNU troff source packages),
and packaging (both GNU and Heirloom troff have binary bits, which
prevent cross-compilation), mandoc is Fast. Recent measures (see OpenBSD
"misc" mailing list, 08-04-2010) peg it at twenty to twenty-five times
faster on a variety of architectures, including VAX.

mandoc, technically-speaking a compiler, is composed of input parsers
for its two distinct input languages, "-mdoc" and "-man", and output
formatters for these formats onto a variety of media. At this time, this
includes terminal-encoded ASCII (-Tascii), HTML-4.01-strict and CSS2
(-Thtml), and XHTML-1.0-strict and CSS2 (-Txhtml). PostScript is missing.

PostScript represents the last of a triptych of output modes:
fixed-font, on-screen text (-Tascii); cross-referenced, on-line
hypermedia (-Thtml, -Txhtml); and print-friendly, off-line text (-Tps).
Although formatting on-screen manuals creates no immediate need for
PostScript, there is not-insignificant pressure from the community to
implement this functionality.

The reason for delay is one of complexity: mandoc, while striving to be
as simple as possible, is necessarily proportional in complexity to its
input and output. Although "-mdoc" and "-man" input is managed by the
back-end compiler libraries and normalised into abstract syntax trees,
properly formatting output is no easy task. The existing output modes
consist of nearly 4,000 lines of code each. Adding output modes requires
considerable knowledge of traditional troff output as well as input
caveats that are not directly normalised in the abstract syntax tree,
e.g., list handling.

I propose adding -Tps as the last in-system output device for mandoc.

This proposal may be broken down into the following milestones. First,
the basic framework for another output mode must be added. This will
involve small design changes to both documentation and source, which
currently assume only two main output modes (mandoc actually has
further modes, -Ttree and -Tlint). Second, the basic output
functionality must be added to iterate over the abstract syntax trees
of both input modes. Third, bits must be added to perform the necessary
formatting for all macros. The completion of these steps will mark the
main milestone of this project and a working -Tps for all parsable
manuals.
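
The second milestone, iterating over the syntax tree on behalf of an
output mode, might be sketched as follows.  This is a simplified
illustration under stated assumptions, not mandoc's actual data
structures: struct node, struct outdev, walk, and count_text are
hypothetical names, and the real trees carry far more information
(macro type, arguments, line numbers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified syntax tree node. */
struct node {
	const char	*text;	/* NULL for a macro or block node */
	struct node	*child;	/* first child, or NULL */
	struct node	*next;	/* next sibling, or NULL */
};

/* Callbacks supplied by one output mode (hypothetical). */
struct outdev {
	void	(*pre)(const struct node *, void *);
	void	(*post)(const struct node *, void *);
	void	*arg;		/* private state of the output mode */
};

/* Depth-first walk: pre-callback, children, then post-callback. */
void
walk(const struct node *n, const struct outdev *dev)
{
	for ( ; n != NULL; n = n->next) {
		if (dev->pre != NULL)
			dev->pre(n, dev->arg);
		walk(n->child, dev);
		if (dev->post != NULL)
			dev->post(n, dev->arg);
	}
}

/* Example back end: count text nodes into the int behind `arg`. */
void
count_text(const struct node *n, void *arg)
{
	if (n->text != NULL)
		(*(int *)arg)++;
}
```

The third milestone then amounts to supplying pre and post actions for
every macro, which is where list handling and similar caveats surface.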

If more time remains, the initial output may be modified for greater
typographic awareness. I anticipate focussing on line justification,
hyphenation (a long-standing issue with other output modes),
inter-sentence spacing (English/French spacing), and variable-width
fonts.

Having written all other mandoc modes, I have a good grasp of the
complexities of each step. I anticipate some overhead as I learn about
PostScript, but consider this to be negligible in comparison to
practical implementation time, especially as it relates to the first and
most significant milestone.

Beyond providing -Tps, this effort will also result in a leaner common
code-base for all output formatters. For the time being, the assumption
of only two connected output modes has resulted in some unnecessary code
duplication and inelegance regarding linkage between outputs and the
main driver. I anticipate, in the development process, pushing the
output modes into distinct libraries, further consolidating shared code,
and breaking apart un-shared code in a more design-intuitive fashion.
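
One way to loosen the linkage between the main driver and the output
modes is a single lookup table keyed on the -T argument, so that adding
a mode means adding one table entry.  The sketch below is an assumption
about how that could look, not a description of mandoc's driver; the
enum, table, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of output types, one per -T argument. */
enum outt {
	OUTT_ASCII,
	OUTT_HTML,
	OUTT_XHTML,
	OUTT_TREE,
	OUTT_LINT,
	OUTT_PS,
	OUTT_MAX
};

/* Names indexed by enum outt. */
static const char *const outnames[OUTT_MAX] = {
	"ascii", "html", "xhtml", "tree", "lint", "ps"
};

/*
 * Map the argument of -T to an output type.
 * Returns the enum outt value, or -1 if the name is unknown.
 */
int
outt_lookup(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < OUTT_MAX; i++)
		if (strcmp(name, outnames[i]) == 0)
			return i;
	return -1;
}
```

The same table could later carry per-mode function pointers, keeping
the driver itself free of mode-specific branches.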

[snip long-winded biography]

[snip acks & contribs (you know who you are, awesome people you)]

