NetBSD-Docs archive


Re: Generating pdf in base

On Sat, Jul 05, 2014 at 20:08:02 +0000, David Holland wrote:

> Another possibility is to write a PDF driver for the groff we have.
> This is still work, but possibly not that much (e.g. maybe one could
> rewrite the upstream gplv3 Perl groff driver in Lua...); on the other
> hand the artefact it produces is of limited long-term value. And it's
> not like groff is doing a super job of typesetting the articles...
> and there's long been a desire to kick groff out of base.

I took a look at what needs to be done for this and hacked together a
quick prototype (using the Perl PDF::Haru binding for libharu) that
handles text only (no D commands from groff_out(5)).  It's about 200
lines of sparse code, though without much error handling and without a
full-blown groff_out(5) parser that can handle arbitrary input, not
just what groff produces.
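
As a rough illustration of the "text only" subset (this is not the
prototype itself, which was written in Perl), a line-oriented reader
for groff intermediate output might look like the Python sketch below.
It assumes one command per line, as groff itself tends to emit, plus
the common case of an argument-less "w" glued to the next command.
The function name parse_groff_out and the sample input are made up for
the example.

```python
from typing import Iterator, Tuple


def parse_groff_out(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (command, argument) pairs from groff intermediate output.

    Simplifying assumption: one command per line, which is roughly what
    groff produces but is NOT the full groff_out(5) grammar.
    """
    for line in text.splitlines():
        line = line.strip()
        # 'w' takes no argument, so the next command may follow it
        # directly on the same line (e.g. "wh2500").
        while line.startswith("w"):
            yield ("w", "")
            line = line[1:].lstrip()
        if not line or line.startswith("#"):  # blank line or comment
            continue
        yield (line[0], line[1:].strip())


# Hand-written sample in the style of groff output (hypothetical values).
SAMPLE = """x T ps
x res 72000 1 1
p1
f5
s10000
V12000
H72000
thello
wh2500
tworld
n12000 0
"""

if __name__ == "__main__":
    for cmd, arg in parse_groff_out(SAMPLE):
        print(cmd, repr(arg))
```

A parser that accepts arbitrary groff_out(5) input would also have to
handle several commands on one line and the D drawing commands, which
this sketch deliberately ignores.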

I haven't touched PS/PDF in a while, but I had hoped it would be
simple, and unfortunately it's not (exaggerating a bit, you can
generate PS from groff intermediate output with sed - I was hoping for
this level of "simple").

PDF is much more restrictive than PS and there are some obstacles in
mapping groff_out(5) to PDF.  As far as I can tell, a PDF driver must
have access to font metrics, which seems gross, as groff has already
done all the layout.

One obstacle is that c/C commands do not change the current position.
Since showing text does change position, that change needs to be
undone in the generated PDF.  Unfortunately gsave/grestore is not
available for this in PDF, so you need to know the width of the
character to emit the move backwards.  [In my prototype I used a
totally gross hack of printing the same char backwards in invisible
rendering mode, which gets me the right position without knowing the
character width, but doesn't play nice with some programs that extract
text from PDF.]
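
For comparison, the width-based variant (the one that needs font
metrics, not the invisible-text hack) can be sketched in Python as
below.  It relies on the PDF TJ operator, where a number inside the
array is subtracted from the horizontal position in thousandths of a
text-space unit.  The WIDTHS table and the helper names are
hypothetical, and the sketch assumes zero character spacing and 100%
horizontal scaling.

```python
def pdf_string(s: str) -> str:
    """Return s as a PDF literal string, escaping '\\', '(' and ')'."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


# Hypothetical glyph widths in 1/1000 of a text-space unit; a real
# driver would have to read these from the font's metrics.
WIDTHS = {"A": 722, "i": 278}


def show_char_in_place(ch: str, widths=WIDTHS) -> str:
    """Show one character, then move back by its width.

    The trailing number in the TJ array cancels the glyph's advance, so
    the text position ends up where it started (assuming Tc = 0 and
    Tz = 100).
    """
    return f"[{pdf_string(ch)} {widths[ch]}] TJ"


if __name__ == "__main__":
    print(show_char_in_place("A"))
```

The generated operator only makes sense inside a BT ... ET text object
with a font already selected.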

Another obstacle is that the PDF text matrix is a separate matrix that
is gone when the text object ends.  This makes it impossible, it seems,
to mix text and graphics without tracking the current position, so,
again, you need to know character widths.
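
A minimal Python sketch of such position tracking follows, under
several assumptions that are not from the prototype: a device
resolution of 72000 units per inch, a 792pt page height, a font
resource named /F1, and a made-up width table in 1/1000 em.  Each word
gets its own text object placed with an absolute Tm, and the tracked
horizontal position is advanced by the word's width afterwards.

```python
class Position:
    """Current position in groff device units (origin at top left)."""

    def __init__(self, res: int = 72000, page_height_pt: float = 792.0):
        self.res = res                    # units per inch (assumed)
        self.page_height_pt = page_height_pt
        self.h = 0
        self.v = 0

    def to_pdf(self):
        """Convert to PDF points; the PDF origin is at the bottom left."""
        x = self.h * 72 / self.res
        y = self.page_height_pt - self.v * 72 / self.res
        return x, y


def emit_word(pos: Position, word: str, size_pt: float, widths: dict) -> str:
    """Emit a self-contained text object for word and advance pos.h."""
    x, y = pos.to_pdf()
    esc = word.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    ops = f"BT /F1 {size_pt:g} Tf 1 0 0 1 {x:g} {y:g} Tm ({esc}) Tj ET"
    # The text matrix is lost at ET, so the advance is tracked here;
    # this is exactly where the character widths are needed.
    advance_pt = sum(widths[c] for c in word) * size_pt / 1000
    pos.h += round(advance_pt * pos.res / 72)
    return ops


if __name__ == "__main__":
    pos = Position()
    pos.h, pos.v = 72000, 72000           # one inch from the top left
    print(emit_word(pos, "Ai", 10, {"A": 722, "i": 278}))
    print(pos.h)
```

Because the position lives in the driver rather than in the text
matrix, graphics operators can be emitted between text objects using
the same tracked coordinates.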

Neither of these is unsolvable (or even hard), but both feel icky.

